Metal by Tutorials

Fifth Edition · macOS 26, iOS 26 · Swift 6, Metal 3 · Xcode 26

27. GPU Command Encoding
Written by Marius Horga & Caroline Begbie

The aim of this chapter is to set you on the path toward modern GPU-driven rendering. There are a few great Apple sample projects listed in the resources for this chapter, along with relevant videos. However, the samples can be quite intimidating. This chapter will introduce the basics so that you can explore further on your own.

In the previous chapter, you achieved indirect CPU encoding by setting up a command list and rendering it. You created a loop that executes serially on the CPU. This loop is one that you can easily parallelize.

Each ICB draw call command executes one after another, but by moving the command creation loop to the GPU, you can create each command at the same time over multiple GPU cores:

GPU command creation

When you come to write real-world apps, setting up the render loop at the very start of the app is impractical. In each frame, you’ll be determining which models to render. Is the model in front of the camera? Is it occluded by another model? Should you render it with a lower level of detail? By creating the command list every frame, you have complete flexibility in which models you render and which you ignore.

As you’ll see, the GPU is amazingly fast at creating these render command lists, so you can include this process each frame.

The Starter Project

➤ In Xcode, open the starter project, and build and run the app.

The starter app

The starter project is almost the same as the final project from the previous chapter with these exceptions:

  • The radio button options are both for indirect encoding, one on the GPU and one on the CPU.
  • The two render passes are held in IndirectRenderPass.swift and GPURenderPass.swift. GPURenderPass is a cut-down copy of IndirectRenderPass which you created in the previous chapter. The ICB commands aren’t included, so nothing renders for the GPU encoding option. You’ll add the commands in a shader function that runs on the GPU.
  • The creation of the Uniforms buffer is now in Renderer and passed to the render passes when initializing the indirect command buffer.

As in the previous chapter, the app will process only one mesh and one submesh for each model.

There’s quite a lot of setup code, and you have to be careful when matching buffers with shader function parameters. If you make an error, it’s difficult to debug, and your computer may lock up. Running the app on an external device, such as an iPhone or iPad, is preferable, if slightly slower.

These are the steps you’ll take through this chapter:

  1. Organize your scene data.
  2. Add the scene data to one big buffer.
  3. Create the compute shader function.
  4. Create the compute pipeline state object.
  5. Encode the ICB.
  6. Set up the compute shader threads and arguments.

1. Organizing Your Scene

Instead of handing the GPU one model at a time to encode, you’ll give a GPU compute shader function your whole scene organized into buffers. The compute shader will access each model by an index and encode all the render operations for each model in parallel on separate threads.

Creating commands per thread

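#if __METAL_VERSION__
// MARK: - Metal Shading Language

#include <metal_stdlib>
using namespace metal;

// GPU view of one model: typed pointers into Metal buffers.
struct SceneData {
  constant float3* positionsAndNormals;
  constant float2* uvs;
  constant uint32_t* indices;
  uint32_t indexType;   // 0 = uint16 indices, 1 = uint32 indices
  uint32_t indexCount;
  constant ShaderMaterial* materials;
  constant ModelParams* modelParams;
};

#else
// MARK: - Swift side

#include <Metal/Metal.h>

// CPU view of the same memory: each pointer is stored as a 64-bit
// GPU address, so field order and sizes must match the struct above.
struct SceneData {
  uint64_t positions;
  uint64_t uvs;
  uint64_t indices;
  uint32_t indexType;
  uint32_t indexCount;
  uint64_t materials;
  uint64_t modelParams;
};

#endif

// Separate snippet: import the new header so that SceneData is visible.
#import "SceneData.h"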
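
Because the Swift side stores each pointer as a 64-bit address, the two definitions only line up if every field sits at the same byte offset. The following is a minimal sketch rather than code from the starter project: `SceneDataLayout` is a hypothetical pure-Swift mirror of the CPU-side struct, used only to print the offsets you would expect. Swift doesn’t formally guarantee the layout of native structs, so treat the numbers as what current toolchains produce, not as a contract.

```swift
// Hypothetical mirror of the CPU-side SceneData, for illustration only.
struct SceneDataLayout {
  var positions: UInt64 = 0
  var uvs: UInt64 = 0
  var indices: UInt64 = 0
  var indexType: UInt32 = 0
  var indexCount: UInt32 = 0
  var materials: UInt64 = 0
  var modelParams: UInt64 = 0
}

// Three 8-byte addresses come first, so indexType follows them.
let indexTypeOffset = MemoryLayout<SceneDataLayout>.offset(of: \.indexType)
// The two 4-byte fields share one 8-byte slot, so the remaining
// addresses need no padding in front of them.
let materialsOffset = MemoryLayout<SceneDataLayout>.offset(of: \.materials)
// The stride is the distance between consecutive elements in a buffer.
let sceneDataStride = MemoryLayout<SceneDataLayout>.stride

print(indexTypeOffset ?? -1, materialsOffset ?? -1, sceneDataStride)
```

If the offsets on one side ever disagreed with the other, the shader would read addresses from the wrong bytes, which is the kind of mismatch that is hard to debug on the GPU.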

2. Creating the Scene Data Buffer

Your current model data structure looks like this:

Model data hierarchy

Scene data

Simplified scene data

// New properties
var sceneBuffer: MTLBuffer!
var modelParamsBufferArray: [MTLBuffer] = []

// Allocate one SceneData element per model
let sceneBufferSize = MemoryLayout<SceneData>.stride * models.count
sceneBuffer = Renderer.device.makeBuffer(length: sceneBufferSize)!
sceneBuffer.label = "Scene Buffer"
var scenePtr = sceneBuffer.contents()
  .assumingMemoryBound(to: SceneData.self)
for model in models {
  let mesh = model.meshes[0]
  let submesh = mesh.submeshes[0]

  // add data to the scene buffer
  scenePtr.pointee.positions = mesh.vertexBuffers[0].gpuAddress
  scenePtr.pointee.uvs = mesh.vertexBuffers[1].gpuAddress
  scenePtr.pointee.indices = submesh.indexBuffer.gpuAddress
  scenePtr.pointee.indexType = submesh.indexType == .uint16 ? 0 : 1
  scenePtr.pointee.indexCount = UInt32(submesh.indexCount)
  scenePtr.pointee.materials = submesh.materialBuffer.gpuAddress

  // encode ModelParams
  // 1. Gather the per-model parameters
  var modelParams = ModelParams(
    modelMatrix: model.transform.modelMatrix,
    tiling: model.tiling)
  // 2. Copy them into their own buffer
  let modelParamsBufferSize = MemoryLayout<ModelParams>.stride
  let modelParamsBuffer = Renderer.device.makeBuffer(
    bytes: &modelParams, length: modelParamsBufferSize)!
  modelParamsBuffer.label = "Model Params"
  // 3. Store the buffer's GPU address in the scene data
  scenePtr.pointee.modelParams = modelParamsBuffer.gpuAddress
  // 4. Keep a reference to the buffer for later use
  modelParamsBufferArray.append(modelParamsBuffer)

  // Move on to the next SceneData element
  scenePtr = scenePtr.advanced(by: 1)
}
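
The loop above fills the buffer by binding raw memory to a type and stepping a pointer forward one element at a time. Here is a minimal sketch of that pattern without Metal, so you can try it in isolation. `Entry` and `fillEntries(indexCounts:)` are hypothetical names, a plain heap allocation stands in for the contents of an `MTLBuffer`, and the addresses are made up.

```swift
// Hypothetical element type standing in for SceneData.
struct Entry {
  var address: UInt64
  var indexCount: UInt32
  var indexType: UInt32
}

func fillEntries(indexCounts: [UInt32]) -> [Entry] {
  guard !indexCounts.isEmpty else { return [] }
  let count = indexCounts.count
  // Raw, untyped bytes: a stand-in for a Metal buffer's contents.
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Entry>.stride * count,
    alignment: MemoryLayout<Entry>.alignment)
  defer { raw.deallocate() }

  // Bind the bytes to Entry, then walk them element by element.
  let base = raw.bindMemory(to: Entry.self, capacity: count)
  var cursor = base
  for (i, indexCount) in indexCounts.enumerated() {
    cursor.initialize(to: Entry(
      address: UInt64(i) * 0x1000, // made-up address
      indexCount: indexCount,
      indexType: 0))
    // advanced(by:) moves by whole elements, not by bytes.
    cursor = cursor.advanced(by: 1)
  }
  // Copy the elements out before the deferred deallocation runs.
  return Array(UnsafeBufferPointer(start: base, count: count))
}

let entries = fillEntries(indexCounts: [36, 6])
```

Sizing the allocation with `stride` rather than `size` matters for the same reason it does in the chapter’s code: the stride includes any padding between consecutive elements.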

3. Creating the Compute Shader Function

Now you’ll create the indirect command buffer on the GPU. Creating the command list on the GPU is very similar to creating it on the CPU, as you did in the previous chapter.

#import "Common.h"

// Argument buffer struct that wraps the indirect command buffer
struct ICBContainer {
  command_buffer icb [[id(0)]];
};

kernel void encodeICB(
  // Scene data, one element per model, plus the shared uniforms
  constant SceneData* scene [[buffer(0)]],
  constant Uniforms &uniforms [[buffer(UniformsBuffer)]],
  // The container holding the ICB to encode into
  device ICBContainer *icbContainer [[buffer(ICBBuffer)]],
  // One thread per model
  uint modelIndex [[thread_position_in_grid]])
{
  // Fetch this thread's model and the ICB
  SceneData model = scene[modelIndex];
  command_buffer icb = icbContainer->icb;

  // Placeholder visibility test: every model is treated as visible
  bool isVisible = true;
  // Get the render command in the slot that matches this model
  render_command cmd(icb, modelIndex);
  if (isVisible) {
    cmd.set_vertex_buffer(&uniforms, UniformsBuffer);
    cmd.set_vertex_buffer(model.positionsAndNormals, VertexBuffer);
    cmd.set_vertex_buffer(model.uvs, UVBuffer);
    cmd.set_vertex_buffer(model.modelParams, ModelParamsBuffer);
    cmd.set_fragment_buffer(model.materials, MaterialBuffer);
    cmd.set_fragment_buffer(model.modelParams, ModelParamsBuffer);
    // Encode the draw call with the matching index type
    if (model.indexType == 0) {
      // uint16 indices
      cmd.draw_indexed_primitives(
        primitive_type::triangle,
        model.indexCount,
        (constant ushort*) model.indices,
        1);
    } else {
      // uint32 indices
      cmd.draw_indexed_primitives(
        primitive_type::triangle,
        model.indexCount,
        (constant uint32_t*) model.indices,
        1);
    }
  } else {
    // Clear the command so that nothing is drawn for this model
    cmd.reset();
  }
}

Incorrect indices
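
The kernel follows a simple pattern: every thread owns exactly one command slot, chosen by its index, and either fills it with a draw or leaves it empty. As a rough CPU analogy only, the sketch below reproduces that pattern with `DispatchQueue.concurrentPerform`. `Command` and `encodeCommands(indexCounts:isVisible:)` are hypothetical names, and nothing here touches Metal.

```swift
import Dispatch

// Hypothetical stand-in for one slot of an indirect command buffer.
enum Command: Equatable {
  case draw(indexCount: Int)
  case empty // the state a reset slot is left in
}

func encodeCommands(
  indexCounts: [Int],
  isVisible: (Int) -> Bool
) -> [Command] {
  var commands = [Command](repeating: .empty, count: indexCounts.count)
  commands.withUnsafeMutableBufferPointer { slots in
    // Copy the pointer so the concurrent closure captures a constant.
    let buffer = slots
    // Each iteration plays the role of one GPU thread. Iterations never
    // share a slot, so no locking is required.
    DispatchQueue.concurrentPerform(iterations: buffer.count) { index in
      buffer[index] = isVisible(index)
        ? .draw(indexCount: indexCounts[index])
        : .empty
    }
  }
  return commands
}

// Treat the model at index 1 as hidden.
let commands = encodeCommands(indexCounts: [36, 6, 12]) { $0 != 1 }
```

The design point carried over from the kernel is that the slot index equals the thread index, which is what makes the work safe to run in parallel.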

4. Creating the Compute Pipeline State Object

➤ Open GPURenderPass.swift, and create these new properties in GPURenderPass:

// New properties
let icbPipelineState: MTLComputePipelineState
let icbComputeFunction: MTLFunction

// In the initializer: look up the kernel and build its pipeline state
icbComputeFunction =
  Renderer.library.makeFunction(name: "encodeICB")!
icbPipelineState = PipelineStates.createComputePSO(
  function: "encodeICB")

5. Encoding the ICB

The encodeICB compute function requires as input a buffer that contains the indirect command buffer.

// New property: the argument buffer that wraps the ICB
var icbContainer: MTLBuffer!

// Size the buffer from the kernel's ICBContainer argument,
// then encode the ICB at [[id(0)]]
let icbEncoder = icbComputeFunction.makeArgumentEncoder(
  bufferIndex: ICBBuffer.index)
icbContainer = Renderer.device.makeBuffer(
  length: icbEncoder.encodedLength,
  options: [])
icbEncoder.setArgumentBuffer(icbContainer, offset: 0)
icbEncoder.setIndirectCommandBuffer(icb, index: 0)

6. Setting up the Compute Command Encoder

You’ve done all the preamble and setup code. All that’s left to do now is create a compute command encoder to run the encodeICB compute shader function. The function will create a render command to render every model.

func encodeICB(
  commandBuffer: MTLCommandBuffer,
  models: [Model],
  uniforms: MTLBuffer
) {
  guard let computeEncoder =
    commandBuffer.makeComputeCommandEncoder() else { return }
  computeEncoder.label = "GPU Encoding"

  // Bind the kernel and its three buffer arguments
  computeEncoder.setComputePipelineState(icbPipelineState)
  computeEncoder.setBuffer(sceneBuffer, offset: 0, index: 0)
  computeEncoder.setBuffer(
    uniforms, offset: 0, index: UniformsBuffer.index)
  computeEncoder.setBuffer(
    icbContainer, offset: 0, index: ICBBuffer.index)

  // Dispatch threads: one thread per draw call
  let threadExecutionWidth = icbPipelineState.threadExecutionWidth
  let drawCount = models.count // should be number of draw calls
  let threads = MTLSize(width: drawCount, height: 1, depth: 1)
  let threadsPerThreadgroup = MTLSize(
    width: threadExecutionWidth, height: 1, depth: 1)
  computeEncoder.dispatchThreads(
    threads, threadsPerThreadgroup: threadsPerThreadgroup)
  computeEncoder.endEncoding()
}

// Call site: run the compute pass so that the ICB is filled
// before it is executed
encodeICB(
  commandBuffer: commandBuffer,
  models: scene.models,
  uniforms: uniforms)

The rendered scene
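
The dispatch code relies on `dispatchThreads(_:threadsPerThreadgroup:)`, which accepts a thread count that isn’t a multiple of the threadgroup width. On GPUs without non-uniform threadgroup support, the usual alternative is `dispatchThreadgroups(_:threadsPerThreadgroup:)`, where you round the count up yourself and have the kernel ignore out-of-range indices. The arithmetic can be sketched in plain Swift as follows. Both function names are hypothetical and are not part of the chapter’s project.

```swift
// Number of threadgroups needed to cover `threadCount` threads when
// each group holds `width` threads (ceiling division).
func threadgroupCount(threadCount: Int, width: Int) -> Int {
  precondition(width > 0, "threadgroup width must be positive")
  return (threadCount + width - 1) / width
}

// Threads launched beyond the real work. A kernel dispatched this way
// would have to return early for these indices.
func paddingThreads(threadCount: Int, width: Int) -> Int {
  threadgroupCount(threadCount: threadCount, width: width) * width
    - threadCount
}

// Example: 10 draw calls with a threadgroup width of 32.
let groups = threadgroupCount(threadCount: 10, width: 32)
let padding = paddingThreads(threadCount: 10, width: 32)
print(groups, padding)
```

With only a handful of models, a single threadgroup covers every draw call, and most of its threads would be padding under this scheme.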

// The ICB's commands reach these buffers indirectly, so mark them
// as used to make them resident for the render pass
encoder.useResource(sceneBuffer, usage: .read, stages: [.vertex, .fragment])
modelParamsBufferArray.forEach {
  encoder.useResource($0, usage: .read, stages: [.vertex, .fragment])
}

Compute pass bound resources

Formatted scene data

Key Points

Where to Go From Here?

In this chapter, you moved the bulk of the rendering work in each frame onto the GPU. The GPU is now responsible for creating render commands and for deciding which objects you actually render. Shifting work to the GPU is generally a good thing, because it frees the CPU for expensive tasks like physics and collisions at the same time. Even so, you should follow up with performance analysis to see where the bottlenecks are. You can read more about this in Chapter 30, “Profiling”.

Apple sample: Modern Rendering With Metal

© 2025 Kodeco Inc.
