Metal by Tutorials


15. GPU-Driven Rendering
Written by Caroline Begbie

So far, you’ve created an engine where you can load complex models with textures and materials, animate or update them each frame, and render them. As you develop your game, your scenes will become more and more complicated, and you’ll want to find more performant ways of rendering them.

In this chapter, you’ll take a simple scene and, instead of encoding the render commands on the CPU each frame, you’ll set up a list of all the commands before you even start the rendering loop. Along the way, you’ll learn about argument buffers, resource heaps and indirect command buffers. Finally, you’ll move the command list creation to the GPU, and have a fully GPU-driven pipeline.

As you progress through the chapter, you may not see the immediate gains. However, once you’ve centralized your rendering at the end of the chapter, your app will have more complex initial loading, but much simpler rendering.

If you feel that you haven’t spent enough time so far digging around inside buffers and examining memory, then this is the chapter for you.

Note: Indirect command buffers are supported by: iOS - Apple A9 devices and up; iMacs - models from 2015; and MacBook and MacBook Pro - models from 2016. As you’re going to be delving deep into using the hardware directly, much of this chapter won’t work on the iOS simulator.

Argument buffers

In previous chapters, you sent up to five textures to the fragment shader: the base color, normals, roughness, metalness and ambient occlusion textures. During the frame render loop, each of these incurs a renderEncoder.setFragmentTexture(_:index:) command. Using argument buffers, you can group pointers to these five textures into one buffer, and set this buffer on the render command encoder with just one command. An argument buffer isn’t limited to textures; it can contain any other data necessary to render the frame.

When you come to draw time, instead of setting the five fragment textures on the render command encoder, you set the single argument buffer. You then perform renderEncoder.useResource(_:usage:) for each texture, which places all five textures onto the GPU as indirect resources.

Once you’ve set up an argument buffer, you can refer to it in a shader, using a struct as a parameter to the shader function.

The starter project

With the concepts under your belt, open up the starter project for this chapter and examine it. The project is minimal, and it’s not the most exciting scene, but the two models have very few vertices, so you can examine the loaded model buffers more easily. The barn has a color texture, and the grass has color and normal textures.

Create the struct

Combine both textures into a new struct. Add this before fragment_main:

struct Textures {
  texture2d<float> baseColorTexture;
  texture2d<float> normalTexture;
};

In fragment_main, replace the two texture parameters:

texture2d<float> baseColorTexture [[texture(BaseColorTexture)]],
texture2d<float> normalTexture [[texture(NormalTexture)]],

with the single argument buffer parameter:

constant Textures &textures [[buffer(BufferIndexTextures)]],

Then read the base color through the struct (textureSampler here is the sampler already defined in the shader):

float3 baseColor = textures.baseColorTexture.sample(textureSampler,
                            in.uv * modelParams.tiling).rgb;

Create the argument buffer

To pass these textures, you create an argument buffer that matches the struct.

var texturesBuffer: MTLBuffer!

func initializeTextures() {
  // 1
  let textureEncoder = fragmentFunction.makeArgumentEncoder(
    bufferIndex: Int(BufferIndexTextures.rawValue))
  // 2
  texturesBuffer = Renderer.device.makeBuffer(
            length: textureEncoder.encodedLength,
            options: [])!
  texturesBuffer.label = "Textures"
  // 3
  textureEncoder.setArgumentBuffer(texturesBuffer, offset: 0)
  textureEncoder.setTexture(colorTexture, index: 0)
  if let normalTexture = normalTexture {
    textureEncoder.setTexture(normalTexture, index: 1)
  }
}

The draw call

In Renderer.swift, in draw(in:), replace the two setFragmentTexture calls:

renderEncoder.setFragmentTexture(model.colorTexture,
                        index: Int(BaseColorTexture.rawValue))
renderEncoder.setFragmentTexture(model.normalTexture,
                        index: Int(NormalTexture.rawValue))

with the single argument buffer:

renderEncoder.setFragmentBuffer(model.texturesBuffer,
                       offset: 0, 
                       index: Int(BufferIndexTextures.rawValue))

Because the textures are now indirect resources, tell Metal they’ll be used during the frame:

if let colorTexture = model.colorTexture {
  renderEncoder.useResource(colorTexture, usage: .read)
}
if let normalTexture = model.normalTexture {
  renderEncoder.useResource(normalTexture, usage: .read)
}

Resource heaps

You’ve grouped textures into an argument buffer, but you can also combine all your app’s textures into a resource heap: a single memory allocation from which Metal sub-allocates each texture.

import MetalKit

class TextureController {
  static var textures: [MTLTexture] = []

  static func addTexture(texture: MTLTexture?) -> Int? {
    guard let texture = texture else { return nil }
    TextureController.textures.append(texture)
    return TextureController.textures.count - 1
  }
}
In Model, change the texture properties from textures to indices into TextureController.textures. Replace:

let colorTexture: MTLTexture?
let normalTexture: MTLTexture?

with:

let colorTexture: Int?
let normalTexture: Int?

Then replace the assignments:

colorTexture = textures.baseColor
normalTexture = textures.normal

with:

colorTexture = 
    TextureController.addTexture(texture: textures.baseColor)
normalTexture = 
    TextureController.addTexture(texture: textures.normal)
In initializeTextures(), look the textures up by index when encoding them into the argument buffer:

if let index = colorTexture {
  textureEncoder.setTexture(TextureController.textures[index],
                            index: 0)
}
if let index = normalTexture {
  textureEncoder.setTexture(TextureController.textures[index],
                            index: 1)
}

In draw(in:), replace:

if let colorTexture = model.colorTexture {
  renderEncoder.useResource(colorTexture, usage: .read)
}
if let normalTexture = model.normalTexture {
  renderEncoder.useResource(normalTexture, usage: .read)
}

with:

if let index = model.colorTexture {
  renderEncoder.useResource(TextureController.textures[index],
                            usage: .read)
}
if let index = model.normalTexture {
  renderEncoder.useResource(TextureController.textures[index],
                            usage: .read)
}
In TextureController, add a property for the heap and a method to build it:

static var heap: MTLHeap?

static func buildHeap() -> MTLHeap? {
  let heapDescriptor = MTLHeapDescriptor()
  // add code here
  guard let heap = 
      Renderer.device.makeHeap(descriptor: heapDescriptor)
    else { fatalError() }
  return heap
}
Replace // add code here with code that sizes the heap. Gather a descriptor for each stored texture, query the size and alignment each one needs on the heap, then total them up:

let descriptors = textures.map { texture in
  MTLTextureDescriptor.descriptor(from: texture)
}
let sizeAndAligns = descriptors.map { 
  Renderer.device.heapTextureSizeAndAlign(descriptor: $0)
}
heapDescriptor.size = sizeAndAligns.reduce(0) { 
  $0 + $1.size - ($1.size & ($1.align - 1)) + $1.align
}
if heapDescriptor.size == 0 {
  return nil
}
Each texture’s size is padded out using its alignment. For example, a texture of size 129 with an alignment of 128 reserves:

129 - (129 & (128 - 1)) + 128 = 256
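The padding arithmetic is plain integer math, so you can sanity-check it on its own. Here’s a minimal sketch; alignedSize is a hypothetical helper, not part of the project, and align must be a power of two:

```swift
// Rounds `size` down to the nearest multiple of `align` (a power
// of two), then adds one full alignment, matching the reduce
// expression above: size - (size & (align - 1)) + align.
func alignedSize(size: Int, align: Int) -> Int {
  size - (size & (align - 1)) + align
}

print(alignedSize(size: 129, align: 128))  // 256
print(alignedSize(size: 100, align: 128))  // 128
print(alignedSize(size: 128, align: 128))  // 256
```

Note that a size that’s already aligned still gains a full extra alignment of slack, so this estimate over-allocates slightly rather than risking a heap that’s too small.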
Still in buildHeap(), create a new texture on the heap for each original texture, then blit the originals across:

let heapTextures = descriptors.map { descriptor -> MTLTexture in
  descriptor.storageMode = heapDescriptor.storageMode
  return heap.makeTexture(descriptor: descriptor)!
}
guard
  let commandBuffer = Renderer.commandQueue.makeCommandBuffer(),
  let blitEncoder = commandBuffer.makeBlitCommandEncoder() 
  else { fatalError() }
zip(textures, heapTextures).forEach { (texture, heapTexture) in
  var region = MTLRegionMake2D(0, 0, texture.width, 
                               texture.height)
  for level in 0..<texture.mipmapLevelCount {
    for slice in 0..<texture.arrayLength {
      blitEncoder.copy(from: texture,
                       sourceSlice: slice,
                       sourceLevel: level,
                       sourceOrigin: region.origin,
                       sourceSize: region.size,
                       to: heapTexture,
                       destinationSlice: slice,
                       destinationLevel: level,
                       destinationOrigin: region.origin)
    }
    region.size.width /= 2
    region.size.height /= 2
  }
}
blitEncoder.endEncoding()
commandBuffer.commit()
TextureController.textures = heapTextures
After loading all the models, build the heap:

func initialize() {
  TextureController.heap = TextureController.buildHeap()
}

Then, in draw(in:), replace the per-texture calls:

if let index = model.colorTexture {
  renderEncoder.useResource(TextureController.textures[index],
                            usage: .read)
}
if let index = model.normalTexture {
  renderEncoder.useResource(TextureController.textures[index],
                            usage: .read)
}

with a single call that makes the whole heap resident:

if let heap = TextureController.heap {
  renderEncoder.useHeap(heap)
}
models.forEach { model in
  // set buffers and draw each model as before
}

Indirect command buffers

You’ve created several levels of indirection with your textures by using an argument buffer and a heap, but you can also create indirection with commands on command encoders.

1. Uniform buffers

In Renderer.swift, create three new properties to hold the uniforms and model constants in buffers:

var uniformsBuffer: MTLBuffer!
var fragmentUniformsBuffer: MTLBuffer!
var modelParamsBuffer: MTLBuffer!

Then create the buffers:
var bufferLength = MemoryLayout<Uniforms>.stride
uniformsBuffer = 
  Renderer.device.makeBuffer(length: bufferLength, options: [])
uniformsBuffer.label = "Uniforms"
bufferLength = MemoryLayout<FragmentUniforms>.stride
fragmentUniformsBuffer = 
  Renderer.device.makeBuffer(length: bufferLength, options: [])
fragmentUniformsBuffer.label = "Fragment Uniforms"
bufferLength = models.count * MemoryLayout<ModelParams>.stride
modelParamsBuffer = 
  Renderer.device.makeBuffer(length: bufferLength, options: [])
modelParamsBuffer.label = "Model Parameters"
Update the buffer contents each frame:

// 1
var bufferLength = MemoryLayout<Uniforms>.stride
uniformsBuffer.contents().copyMemory(from: &uniforms,
                                     byteCount: bufferLength)
bufferLength = MemoryLayout<FragmentUniforms>.stride
fragmentUniformsBuffer.contents().copyMemory(
             from: &fragmentUniforms,
             byteCount: bufferLength)

// 2
var pointer = 
  modelParamsBuffer.contents().bindMemory(to: ModelParams.self,
                                  capacity: models.count)
// 3
for model in models {
  pointer.pointee.modelMatrix = model.modelMatrix
  pointer.pointee.tiling = model.tiling
  pointer = pointer.advanced(by: 1)
}
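The pattern in steps 2 and 3, binding raw buffer memory to a type and stepping through it one stride at a time, works on any raw allocation, not just an MTLBuffer’s contents(). Here’s a minimal self-contained sketch, with a stand-in Params struct in place of the real ModelParams:

```swift
// Stand-in for ModelParams: just enough to show the striding.
struct Params {
  var tiling: UInt32 = 0
}

let count = 3
let raw = UnsafeMutableRawPointer.allocate(
  byteCount: count * MemoryLayout<Params>.stride,
  alignment: MemoryLayout<Params>.alignment)
defer { raw.deallocate() }

// Bind the raw bytes to Params, then write one element per stride.
var pointer = raw.bindMemory(to: Params.self, capacity: count)
for i in 0..<count {
  pointer.pointee.tiling = UInt32(i + 1)
  pointer = pointer.advanced(by: 1)
}

// Read the elements back through a typed view of the same memory.
let params = UnsafeBufferPointer(
  start: raw.assumingMemoryBound(to: Params.self), count: count)
print(params.map { $0.tiling })  // [1, 2, 3]
```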

2. Indirect command buffer

You’re now ready to create some indirect commands. Take a look at draw(in:) to refresh your memory on all the render commands that you set in the rendering for loop. You’re going to move all these commands to an indirect command list. You’ll set up this command list at the start of the app, and simply call executeCommandsInBuffer on the render command encoder each frame. This will execute the entire command list with just that one command.

var icb: MTLIndirectCommandBuffer!

func initializeCommands() {
  let icbDescriptor = MTLIndirectCommandBufferDescriptor()
  icbDescriptor.commandTypes = [.drawIndexed]
  icbDescriptor.inheritBuffers = false
  icbDescriptor.maxVertexBufferBindCount = 25
  icbDescriptor.maxFragmentBufferBindCount = 25
  icbDescriptor.inheritPipelineState = false
  guard let icb = Renderer.device.makeIndirectCommandBuffer(
    descriptor: icbDescriptor,
    maxCommandCount: models.count,
    options: []) 
    else { fatalError() }
  self.icb = icb
}

3. Indirect commands

Now that you’ve set up an indirect command buffer, you’ll add the list of commands to it. Add this to initializeCommands():

for (modelIndex, model) in models.enumerated() {
  let icbCommand = icb.indirectRenderCommandAt(modelIndex)
  icbCommand.setRenderPipelineState(model.pipelineState)
  icbCommand.setVertexBuffer(uniformsBuffer, offset: 0,
    at: Int(BufferIndexUniforms.rawValue))
  icbCommand.setFragmentBuffer(fragmentUniformsBuffer,
    offset: 0,
    at: Int(BufferIndexFragmentUniforms.rawValue))
  icbCommand.setVertexBuffer(modelParamsBuffer, offset: 0,
    at: Int(BufferIndexModelParams.rawValue))
  icbCommand.setFragmentBuffer(modelParamsBuffer, offset: 0,
    at: Int(BufferIndexModelParams.rawValue))
  icbCommand.setVertexBuffer(model.vertexBuffer, offset: 0,
    at: Int(BufferIndexVertices.rawValue))
  icbCommand.setFragmentBuffer(model.texturesBuffer, offset: 0,
    at: Int(BufferIndexTextures.rawValue))
  icbCommand.drawIndexedPrimitives(.triangle,
    indexCount: model.submesh.indexCount,
    indexType: model.submesh.indexType,
    indexBuffer: model.submesh.indexBuffer.buffer,
    indexBufferOffset: model.submesh.indexBuffer.offset,
    instanceCount: 1,
    baseVertex: 0,
    baseInstance: modelIndex)
}

4. Update the render loop

You can now remove most of the render encoder commands from draw(in:). Remove all the code after setting the heap down to, but not including, renderEncoder.endEncoding().

func draw(in view: MTKView) {
  guard
    let descriptor = view.currentRenderPassDescriptor,
    let commandBuffer = 
       Renderer.commandQueue.makeCommandBuffer() else { return }

  guard let renderEncoder =
  commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) 
      else { return }

  if let heap = TextureController.heap {
    renderEncoder.useHeap(heap)
  }

  renderEncoder.endEncoding()
  guard let drawable = view.currentDrawable else { return }
  commandBuffer.present(drawable)
  commandBuffer.commit()
}

If you build and run now, the app crashes with:

failed assertion `Setting a pipeline that does not have supportIndirectCommandBuffers = YES is invalid’

To fix it, set this where you create the render pipeline descriptor:

pipelineDescriptor.supportIndirectCommandBuffers = true

Before ending the encoder, make all the indirect buffers and resources resident:

renderEncoder.useResource(uniformsBuffer, usage: .read)
renderEncoder.useResource(fragmentUniformsBuffer, usage: .read)
renderEncoder.useResource(modelParamsBuffer, usage: .read)
for model in models {
  renderEncoder.useResource(model.vertexBuffer, usage: .read)
  renderEncoder.useResource(model.submesh.indexBuffer.buffer,
                            usage: .read)
  renderEncoder.useResource(model.texturesBuffer, usage: .read)
}

5. Update the shader functions

Both vertex_main and fragment_main use modelParams, which holds each model’s matrix and the tiling of textures for the model. You’ve changed the single instance of modelParams to be an array, so now you’ll change the shader functions to match the incoming buffers and access the correct element in the model parameters array.

In vertex_main, change the parameter:

constant ModelParams &modelParams [[buffer(BufferIndexModelParams)]]

to an array, and add the base instance:

constant ModelParams *modelParamsArray 
  [[buffer(BufferIndexModelParams)]],
uint baseInstance [[base_instance]]

In the function body, index the array:

ModelParams modelParams = modelParamsArray[baseInstance];

Add the index to VertexOut, so the fragment function can use it too:

uint modelIndex [[flat]];

and set it in vertex_main:

.modelIndex = baseInstance

In fragment_main, change:

constant ModelParams &modelParams [[buffer(BufferIndexModelParams)]]

to:

constant ModelParams *modelParamsArray [[buffer(BufferIndexModelParams)]]

and index it with the interpolated value:

ModelParams modelParams = modelParamsArray[in.modelIndex];

6. Execute the command list

All the code you have written in this chapter so far has been building up to one command. Drum roll… In draw(in:), just before renderEncoder.endEncoding(), add:

renderEncoder.executeCommandsInBuffer(icb,
                                      range: 0..<models.count)

GPU-driven rendering

You’ve achieved indirect CPU rendering by setting up a command list and rendering it. However, you can go one better and get the GPU to create this command list. Open Renderer.swift and take a look at the for loop in initializeCommands().

Compute shader function

You’ll start by creating the compute shader, so that you can see what data you have to pass. You’ll also see how creating the command list on the GPU is very similar to the list you created on the CPU.

#import "Common.h"

struct ICBContainer {
  command_buffer icb [[id(0)]];
};

struct Model {
  constant float *vertexBuffer;
  constant uint *indexBuffer;
  constant float *texturesBuffer;
  render_pipeline_state pipelineState;
};

kernel void encodeCommands(
  uint modelIndex [[thread_position_in_grid]],
  constant Uniforms &uniforms [[buffer(BufferIndexUniforms)]],
  constant FragmentUniforms &fragmentUniforms 
    [[buffer(BufferIndexFragmentUniforms)]],
  constant MTLDrawIndexedPrimitivesIndirectArguments 
    *drawArgumentsBuffer [[buffer(BufferIndexDrawArguments)]],
  constant ModelParams *modelParamsArray 
    [[buffer(BufferIndexModelParams)]],
  constant Model *modelsArray [[buffer(BufferIndexModels)]],
  device ICBContainer *icbContainer [[buffer(BufferIndexICB)]]) {
  Model model = modelsArray[modelIndex];
  MTLDrawIndexedPrimitivesIndirectArguments drawArguments
    = drawArgumentsBuffer[modelIndex];
  render_command cmd(icbContainer->icb, modelIndex);
  cmd.set_render_pipeline_state(model.pipelineState);
  cmd.set_vertex_buffer(&uniforms, BufferIndexUniforms);
  cmd.set_fragment_buffer(&fragmentUniforms,
                          BufferIndexFragmentUniforms);
  cmd.set_vertex_buffer(modelParamsArray, BufferIndexModelParams);
  cmd.set_fragment_buffer(modelParamsArray, BufferIndexModelParams);
  cmd.set_vertex_buffer(model.vertexBuffer, 0);
  cmd.set_fragment_buffer(model.texturesBuffer,
                          BufferIndexTextures);
  cmd.draw_indexed_primitives(
    primitive_type::triangle,
    drawArguments.indexCount,
    model.indexBuffer + drawArguments.indexStart,
    drawArguments.instanceCount,
    drawArguments.baseVertex,
    drawArguments.baseInstance);
}

The compute pipeline state

In Renderer.swift, create these new properties:

let icbPipelineState: MTLComputePipelineState
let icbComputeFunction: MTLFunction

Add a method to build the compute pipeline state:

static func buildComputePipelineState(function: MTLFunction) -> 
  MTLComputePipelineState {
  let computePipelineState: MTLComputePipelineState
  do {
    computePipelineState = try 
      Renderer.device.makeComputePipelineState(
                 function: function)
  } catch {
    fatalError(error.localizedDescription)
  }
  return computePipelineState
}

Then initialize the two properties:

icbComputeFunction = 
  Renderer.library.makeFunction(name: "encodeCommands")!
icbPipelineState = 
  Renderer.buildComputePipelineState(function: icbComputeFunction)

The argument buffers

In the compute shader, you created two structs — one for the ICB, and one for the model. In Renderer, create two buffer properties for the argument buffers to match these structs:

var icbBuffer: MTLBuffer!
var modelsBuffer: MTLBuffer!

Encode the indirect command buffer into its argument buffer:

let icbEncoder = icbComputeFunction.makeArgumentEncoder(
                   bufferIndex: Int(BufferIndexICB.rawValue))
icbBuffer = Renderer.device.makeBuffer(
              length: icbEncoder.encodedLength,
              options: [])
icbEncoder.setArgumentBuffer(icbBuffer, offset: 0)
icbEncoder.setIndirectCommandBuffer(icb, index: 0)
Next, build an argument buffer per model, then concatenate them into one buffer that the compute shader can index:

var mBuffers: [MTLBuffer] = []
var mBuffersLength = 0
for model in models {
  let encoder = icbComputeFunction.makeArgumentEncoder(
                  bufferIndex: Int(BufferIndexModels.rawValue))
  let mBuffer = Renderer.device.makeBuffer(
                  length: encoder.encodedLength, options: [])!
  encoder.setArgumentBuffer(mBuffer, offset: 0)
  encoder.setBuffer(model.vertexBuffer, offset: 0, index: 0)
  encoder.setBuffer(model.submesh.indexBuffer.buffer,
                    offset: 0, index: 1)
  encoder.setBuffer(model.texturesBuffer!, offset: 0, index: 2)
  encoder.setRenderPipelineState(model.pipelineState, index: 3)
  mBuffers.append(mBuffer)
  mBuffersLength += mBuffer.length
}
modelsBuffer = Renderer.device.makeBuffer(length: mBuffersLength, 
                                          options: [])
modelsBuffer.label = "Models Array Buffer"
var offset = 0
for mBuffer in mBuffers {
  var pointer = modelsBuffer.contents()
  pointer = pointer.advanced(by: offset)
  pointer.copyMemory(from: mBuffer.contents(),
                     byteCount: mBuffer.length)
  offset += mBuffer.length
}
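The concatenation loop is ordinary raw-memory copying, so you can exercise the same advanced(by:)/copyMemory pattern in isolation, with plain byte arrays standing in for the per-model argument buffers:

```swift
// Two source chunks standing in for the per-model argument buffers.
let chunks: [[UInt8]] = [[1, 2, 3], [4, 5]]
let total = chunks.reduce(0) { $0 + $1.count }

let dest = UnsafeMutableRawPointer.allocate(byteCount: total,
                                            alignment: 1)
defer { dest.deallocate() }

// Copy each chunk at a running offset, like the modelsBuffer loop.
var offset = 0
for chunk in chunks {
  chunk.withUnsafeBytes { src in
    dest.advanced(by: offset)
      .copyMemory(from: src.baseAddress!, byteCount: src.count)
  }
  offset += chunk.count
}

let combined = Array(UnsafeRawBufferPointer(start: dest,
                                            count: total))
print(combined)  // [1, 2, 3, 4, 5]
```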

Draw arguments

At the top of Renderer, create a new buffer property for the draw arguments:

var drawArgumentsBuffer: MTLBuffer!

Then create and fill the buffer, one draw argument struct per model:

let drawLength = models.count * 
  MemoryLayout<MTLDrawIndexedPrimitivesIndirectArguments>.stride
drawArgumentsBuffer = 
     Renderer.device.makeBuffer(length: drawLength,
                                options: [])!
drawArgumentsBuffer.label = "Draw Arguments"
// 1
var drawPointer = 
  drawArgumentsBuffer.contents().bindMemory(
    to: MTLDrawIndexedPrimitivesIndirectArguments.self,
    capacity: models.count)
// 2
for (modelIndex, model) in models.enumerated() {
  var drawArgument = MTLDrawIndexedPrimitivesIndirectArguments()
  drawArgument.indexCount = UInt32(model.submesh.indexCount)
  drawArgument.instanceCount = 1
  drawArgument.indexStart = 
    UInt32(model.submesh.indexBuffer.offset)
  drawArgument.baseVertex = 0
  drawArgument.baseInstance = UInt32(modelIndex)
  // 3
  drawPointer.pointee = drawArgument
  drawPointer = drawPointer.advanced(by: 1)
}

The compute command encoder

You’ve done all the preamble and setup code. All that’s left to do now is create a compute command encoder to run the compute shader function. This will create a render command to render every model.

guard
  let computeEncoder = commandBuffer.makeComputeCommandEncoder()
  else { return }
computeEncoder.setComputePipelineState(icbPipelineState)
computeEncoder.setBuffer(uniformsBuffer, offset: 0, 
  index: Int(BufferIndexUniforms.rawValue))
computeEncoder.setBuffer(fragmentUniformsBuffer, offset: 0, 
  index: Int(BufferIndexFragmentUniforms.rawValue))
computeEncoder.setBuffer(drawArgumentsBuffer, offset: 0, 
  index: Int(BufferIndexDrawArguments.rawValue))
computeEncoder.setBuffer(modelParamsBuffer, offset: 0, 
  index: Int(BufferIndexModelParams.rawValue))
computeEncoder.setBuffer(modelsBuffer, offset: 0, 
  index: Int(BufferIndexModels.rawValue))
computeEncoder.setBuffer(icbBuffer, offset: 0, 
  index: Int(BufferIndexICB.rawValue))
computeEncoder.useResource(icb, usage: .write)
computeEncoder.useResource(modelsBuffer, usage: .read)

if let heap = TextureController.heap {
  computeEncoder.useHeap(heap)
}

for model in models {
  computeEncoder.useResource(model.vertexBuffer, usage: .read)
  computeEncoder.useResource(model.submesh.indexBuffer.buffer,
                             usage: .read)
  computeEncoder.useResource(model.texturesBuffer!,
                             usage: .read)
}
let threadExecutionWidth = icbPipelineState.threadExecutionWidth
let threads = MTLSize(width: models.count, height: 1, depth: 1)
let threadsPerThreadgroup = MTLSize(width: threadExecutionWidth, 
                                    height: 1, depth: 1)
computeEncoder.dispatchThreads(threads,
  threadsPerThreadgroup: threadsPerThreadgroup)
computeEncoder.endEncoding()
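dispatchThreads(_:threadsPerThreadgroup:) supports non-uniform threadgroup sizes, so the GPU covers exactly models.count threads. The equivalent number of threadgroups is just a ceiling division, sketched here with a hypothetical helper:

```swift
// Ceiling division: how many groups of `width` threads it takes
// to cover `threads` work items.
func threadgroupCount(threads: Int, width: Int) -> Int {
  (threads + width - 1) / width
}

print(threadgroupCount(threads: 2, width: 32))   // 1
print(threadgroupCount(threads: 64, width: 32))  // 2
print(threadgroupCount(threads: 65, width: 32))  // 3
```

With only two models, a single threadgroup covers everything; on devices without non-uniform threadgroup support, you’d round up like this and use dispatchThreadgroups(_:threadsPerThreadgroup:) instead.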
After ending the compute encoder, optimize the indirect command buffer with a blit encoder, then execute it on the render encoder as before:

let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
blitEncoder.optimizeIndirectCommandBuffer(icb,
                                 range: 0..<models.count)
blitEncoder.endEncoding()

renderEncoder.executeCommandsInBuffer(icb,
                                      range: 0..<models.count)

Where to go from here?

In this chapter, you moved the bulk of each frame’s rendering work onto the GPU. The GPU is now responsible for creating render commands and for determining which objects you actually render. Shifting work to the GPU is generally a good thing, since it frees the CPU for expensive tasks like physics and collisions, but you should follow it up with performance analysis to see where the bottlenecks actually are. You can read more about this at the end of the next section.

© 2023 Kodeco Inc.
