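I'm trying to create a procedural game using Metal, and I'm using an octree-based chunk approach to implement Level of Detail.
The method I'm using has the CPU create the octree nodes for the terrain; each node then has its mesh generated on the GPU by a compute shader. The mesh is stored in a vertex buffer and an index buffer in the chunk object for rendering.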
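All of this seems to work fairly well, but when it comes to rendering the chunks I'm hitting performance issues early on. Currently I gather an array of chunks to draw and submit it to my renderer, which creates an MTLParallelRenderCommandEncoder and then uses it to create an MTLRenderCommandEncoder for each chunk, which is then submitted to the GPU.
By the looks of it, around 50% of the CPU time is spent creating the MTLRenderCommandEncoder for each chunk. At the moment I'm only creating a simple 8-vertex cube mesh for each chunk, and with a 4x4x4 array of chunks I'm already dropping to around 50fps at this early stage. (In practice it seems there can only be up to 63 MTLRenderCommandEncoder objects in each MTLParallelRenderCommandEncoder, so it's not quite the full 4x4x4.)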
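I've read that the point of MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread, but I haven't had much luck getting that to work. Multithreading also wouldn't get around the cap of 63 chunks being rendered at most.
I feel that consolidating the vertex and index buffers of all chunks into one or two larger buffers for submission would help, but I'm not sure how to do that without copious memcpy() calls, or whether it would even improve efficiency.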
Here's my code that takes in the array of nodes and draws them:
func drawNodes(nodes: [OctreeNode], inView view: AHMetalView){
    // For control of several rotating buffers
    dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)

    makeDepthTexture()
    updateUniformsForView(view, duration: view.frameDuration)
    let commandBuffer = commandQueue.commandBuffer()

    let optDrawable = layer.nextDrawable()
    guard let drawable = optDrawable else{
        return
    }

    let passDescriptor = MTLRenderPassDescriptor()
    passDescriptor.colorAttachments[0].texture = drawable.texture
    passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
    passDescriptor.colorAttachments[0].storeAction = .Store
    passDescriptor.colorAttachments[0].loadAction = .Clear

    passDescriptor.depthAttachment.texture = depthTexture
    passDescriptor.depthAttachment.clearDepth = 1
    passDescriptor.depthAttachment.loadAction = .Clear
    passDescriptor.depthAttachment.storeAction = .Store

    let parallelRenderPass = commandBuffer.parallelRenderCommandEncoderWithDescriptor(passDescriptor)

    // Currently 63 nodes as a maximum
    for node in nodes{
        // This line is taking up around 50% of the CPU time
        let renderPass = parallelRenderPass.renderCommandEncoder()

        renderPass.setRenderPipelineState(renderPipelineState)
        renderPass.setDepthStencilState(depthStencilState)
        renderPass.setFrontFacingWinding(.CounterClockwise)
        renderPass.setCullMode(.Back)

        let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex

        renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
        renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)

        renderPass.setTriangleFillMode(.Lines)

        renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)

        renderPass.endEncoding()
    }

    parallelRenderPass.endEncoding()

    commandBuffer.presentDrawable(drawable)

    commandBuffer.addCompletedHandler { (commandBuffer) -> Void in
        self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
        dispatch_semaphore_signal(self.displaySemaphore)
    }

    commandBuffer.commit()
}
rickster.. 9
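You note:

"I've read that the point of the MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread..."

And you're right about that. What you're doing is sequentially creating, encoding with, and ending command encoders. Nothing here runs in parallel, so MTLParallelRenderCommandEncoder is doing nothing for you. You'd get roughly the same performance if you eliminated the parallel encoder and just created an encoder with renderCommandEncoderWithDescriptor(_:) on each pass through your for loop; that is, you'd still have the same performance problem, caused by the overhead of creating all those encoders.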
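As a rough illustration of what parallel encoding would actually need, the sketch below splits the chunk list into a few contiguous batches, so that each batch could be encoded by its own MTLRenderCommandEncoder on its own thread instead of one encoder per chunk. `batchRanges` is a hypothetical helper, not a Metal API, and the batch count of 4 is an arbitrary assumption. The code is plain Swift for a current toolchain and only computes the ranges; it does not touch Metal.

```swift
/// Splits `count` items into at most `batches` contiguous, near-equal ranges.
/// Hypothetical helper for illustration; not part of Metal.
func batchRanges(count: Int, batches: Int) -> [Range<Int>] {
    guard count > 0, batches > 0 else { return [] }
    let n = min(count, batches)   // never produce empty batches
    let base = count / n
    let extra = count % n         // the first `extra` batches get one more item
    var ranges: [Range<Int>] = []
    var start = 0
    for i in 0..<n {
        let size = base + (i < extra ? 1 : 0)
        ranges.append(start..<(start + size))
        start += size
    }
    return ranges
}

// A 4x4x4 grid of chunks (64 nodes) spread over an assumed 4 encoders.
// Each range would be handed to one thread, which encodes nodes[range]
// with a single encoder obtained from the parallel encoder.
let ranges = batchRanges(count: 64, batches: 4)
for range in ranges {
    print("encoder handles nodes \(range.lowerBound)..<\(range.upperBound)")
}
```

With a handful of batches the number of encoders stays far below the limit of 63 mentioned in the question, regardless of how many chunks are drawn.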
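So, if you're going to encode sequentially, just reuse the same encoder. You should also reuse as much of your other shared state as possible. Here's a quick pass at a possible refactoring (untested):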
let passDescriptor = MTLRenderPassDescriptor()

// call this once before your render loop
func setup() {
    makeDepthTexture()

    passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
    passDescriptor.colorAttachments[0].storeAction = .Store
    passDescriptor.colorAttachments[0].loadAction = .Clear

    passDescriptor.depthAttachment.texture = depthTexture
    passDescriptor.depthAttachment.clearDepth = 1
    passDescriptor.depthAttachment.loadAction = .Clear
    passDescriptor.depthAttachment.storeAction = .Store

    // set up render pipeline state and depthStencil state
}

func drawNodes(nodes: [OctreeNode], inView view: AHMetalView) {
    updateUniformsForView(view, duration: view.frameDuration)

    // Set up completed handler ahead of time
    let commandBuffer = commandQueue.commandBuffer()
    commandBuffer.addCompletedHandler { _ in // unused parameter
        self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
        dispatch_semaphore_signal(self.displaySemaphore)
    }

    // Semaphore should be tied to drawable acquisition
    dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)
    guard let drawable = layer.nextDrawable() else { return }

    // Set up the one part of the pass descriptor that changes per-frame
    passDescriptor.colorAttachments[0].texture = drawable.texture

    // Get one render command encoder and reuse it for every node
    let renderPass = commandBuffer.renderCommandEncoderWithDescriptor(passDescriptor)
    renderPass.setTriangleFillMode(.Lines)
    renderPass.setRenderPipelineState(renderPipelineState)
    renderPass.setDepthStencilState(depthStencilState)

    for node in nodes {
        // Update offsets and draw
        let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex
        renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
        renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)
        renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)
    }

    renderPass.endEncoding()
    commandBuffer.presentDrawable(drawable)
    commandBuffer.commit()
}
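Then, profile with Instruments to see what further performance issues, if any, you might have. There's a great WWDC 2015 session on this that shows several of the common "gotchas", how to diagnose them in profiling, and how to fix them.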
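On the question's idea of consolidating the per-chunk buffers: one possible approach, sketched here under assumptions, is to reserve a slot for each chunk inside one large buffer and bind it by offset, using the offset: parameter of setVertexBuffer and the indexBufferOffset: parameter of drawIndexedPrimitives, both of which the code above currently passes as 0. Since the meshes are generated by a compute shader, that shader could in principle write straight into each slot, which would avoid CPU-side memcpy() calls. `alignUp`, `PackedLayout`, and `packLayout` are hypothetical names, and the 256-byte alignment is a placeholder assumption; the real requirement depends on the platform. The code is plain Swift for a current toolchain and only computes the layout.

```swift
/// Rounds `value` up to the next multiple of `alignment` (alignment must be > 0).
func alignUp(value: Int, to alignment: Int) -> Int {
    return (value + alignment - 1) / alignment * alignment
}

/// Byte offset of each chunk's slot inside one shared buffer, plus the
/// total length that buffer would need. Hypothetical type for illustration.
struct PackedLayout {
    let offsets: [Int]
    let totalLength: Int
}

/// Assigns every chunk an aligned slot, in order, with no overlap.
func packLayout(sizes: [Int], alignment: Int) -> PackedLayout {
    var offsets: [Int] = []
    var cursor = 0
    for size in sizes {
        offsets.append(cursor)
        cursor = alignUp(value: cursor + size, to: alignment)
    }
    return PackedLayout(offsets: offsets, totalLength: cursor)
}

// Three chunks with assumed mesh sizes in bytes, 256-byte alignment (assumption).
let layout = packLayout(sizes: [256, 100, 256], alignment: 256)
for (index, offset) in layout.offsets.enumerated() {
    // offsets[index] is what would be passed as `offset:` when binding chunk `index`.
    print("chunk \(index) starts at byte \(offset)")
}
print("shared buffer length: \(layout.totalLength)")
```

Whether this helps at all is something to confirm by profiling; it reduces the number of buffer objects, not the number of draw calls.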