27. GPU Command Encoding
Written by Marius Horga & Caroline Begbie

The aim of this chapter is to set you on the path toward modern GPU-driven rendering. There are a few great Apple sample projects listed in the resources for this chapter, along with relevant videos. However, the samples can be quite intimidating. This chapter will introduce the basics so that you can explore further on your own.

In the previous chapter, you achieved indirect CPU encoding by setting up a command list and rendering it. You created a loop that executes serially on the CPU. This loop is one that you can easily parallelize.

On the CPU, each ICB draw call command is encoded one after another. By moving the command creation loop to the GPU, you can create the commands at the same time across multiple GPU cores.

When you come to write real-world apps, setting up the render loop at the very start of the app is impractical. In each frame, you’ll be determining which models to render. Are the models in front of the camera? Is the model occluded by another model? Should you render a model with a lower level of detail? By creating the command list every frame, you have complete flexibility in which models you render and which you ignore.

As you’ll see, the GPU is amazingly fast at creating these render command lists, so you can include this process each frame.

The Starter Project

➤ In Xcode, open the starter project, and build and run the app.

The starter project is almost the same as the final project from the previous chapter with these exceptions:

The radio button options are both for indirect encoding, one on the GPU and one on the CPU.

The two render passes are held in IndirectRenderPass.swift and GPURenderPass.swift. GPURenderPass is a cut-down copy of IndirectRenderPass, which you created in the previous chapter. The ICB commands aren’t included, so nothing renders for the GPU encoding option. You’ll add the commands in a shader function that runs on the GPU.

The Uniforms buffer is now created in Renderer and passed to the render passes when initializing the indirect command buffer.

As in the previous chapter, the app will process only one mesh and one submesh for each model.

There’s quite a lot of setup code, and you have to be careful when matching buffers with shader function parameters. If you make an error, it’s difficult to debug, and your computer may lock up. Running the app on an external device, such as an iPhone or iPad, is preferable, if slightly slower.

These are the steps you’ll take through this chapter:

Organize your scene data.

Add the scene data to one big buffer.

Create the compute shader function.

Create the compute pipeline state object.

Encode the ICB.

Set up the compute shader threads and arguments.

1. Organizing Your Scene

Instead of handing the GPU one model at a time to encode, you’ll give a GPU compute shader function your whole scene organized into buffers. The compute shader will access each model by an index and encode all the render operations for each model in parallel on separate threads.
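The idea of one thread per model can be tried out on the CPU first. The following is only a rough sketch: `ModelEntry`, `DrawCommand` and `encodeAll(_:)` are hypothetical stand-ins rather than project types, and `DispatchQueue.concurrentPerform` merely plays the role of the GPU grid. What matters is the shape of the work: each iteration reads one scene entry by its index and writes one command slot at that same index, so no iteration depends on another.

```swift
import Dispatch

// Hypothetical stand-in for one entry of the scene data.
struct ModelEntry {
  let indexCount: Int
}

// Hypothetical stand-in for one encoded render command.
struct DrawCommand {
  var modelIndex = -1
  var indexCount = 0
}

// Fills one command slot per model. Each iteration touches only its own slot,
// which is what lets the iterations run at the same time.
func encodeAll(_ scene: [ModelEntry]) -> [DrawCommand] {
  var commands = [DrawCommand](repeating: DrawCommand(), count: scene.count)
  commands.withUnsafeMutableBufferPointer { slots in
    guard let base = slots.baseAddress else { return }
    DispatchQueue.concurrentPerform(iterations: scene.count) { index in
      // "index" plays the role of the thread position in the grid.
      base[index] = DrawCommand(
        modelIndex: index,
        indexCount: scene[index].indexCount)
    }
  }
  return commands
}
```

On the GPU, the loop disappears entirely: the dispatch size replaces the iteration count, and the thread position replaces `index`.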

Creating commands per thread

➤ Before #endif, add this code:

#if __METAL_VERSION__
// MARK: - Metal Shading Language

#include <metal_stdlib>
using namespace metal;

struct SceneData {
  constant float3* positionsAndNormals;
  constant float2* uvs;
  constant uint32_t* indices;
  uint32_t indexType;
  uint32_t indexCount;
  constant ShaderMaterial* materials;
  constant ModelParams* modelParams;
};

#else 
// MARK: - Swift side

#endif

➤ After // MARK: - Swift side, add this code:

#include <Metal/Metal.h>

struct SceneData {
  uint64_t positions;
  uint64_t uvs;
  uint64_t indices;
  uint32_t indexType;
  uint32_t indexCount;
  uint64_t materials;
  uint64_t modelParams;
};
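Both definitions of SceneData have to describe the same bytes: wherever the Metal side declares a pointer, the Swift side stores a 64-bit address. One way to sanity-check the size you expect is a small Swift mirror of the struct. This is a sketch only: `SceneDataMirror` is a hypothetical name, and in the project Swift sees the C definition through the bridging header rather than a hand-written copy.

```swift
// Hypothetical Swift mirror of the Swift-side SceneData declaration.
struct SceneDataMirror {
  var positions: UInt64 = 0    // GPU address of the position data
  var uvs: UInt64 = 0          // GPU address of the UV data
  var indices: UInt64 = 0      // GPU address of the index data
  var indexType: UInt32 = 0    // 0 for 16-bit indices, 1 for 32-bit indices
  var indexCount: UInt32 = 0
  var materials: UInt64 = 0    // GPU address of the material data
  var modelParams: UInt64 = 0  // GPU address of the model params data
}

// Five 8-byte fields plus two 4-byte fields.
let sceneDataStride = MemoryLayout<SceneDataMirror>.stride
let sceneDataAlignment = MemoryLayout<SceneDataMirror>.alignment
```

Because the two 32-bit fields sit next to each other, they pack into a single 8-byte slot and no padding is needed. If you reorder the fields on one side only, the shader will read the wrong values.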

➤ Open Common.h, and after defining ModelParams, add this code:

#import "SceneData.h"

You need to add it after ModelParams, as SceneData requires knowledge of that structure.

2. Creating the Scene Data Buffer

Your current model data structure looks like this:

Model data hierarchy

Kasibaf, uq muo’sa mquetukn o buwf nuddjes ujd cucb evqn ubi lajd ekt uca soqqust pit yicis, maa’mq qvoyhic oxb jaox noyu ebsu ero csiquMazfej vxeq kakxd evp pli cogo tuw edf ngo jewadd:

Simplified scene data

➤ Open GPURenderPass.swift in the Render Passes folder and add these new properties to GPURenderPass that you’ll use to hold your scene data:

var sceneBuffer: MTLBuffer!
var modelParamsBufferArray: [MTLBuffer] = []

Soa’cb ebunueqeha dbawu hagkizc ox uloyuayade(jizeph:). Roxuyo sjax edogoimodiERBLumzuyrr(_:) calpobp rsuz sfu ykecuiux hjoxzob. Or xan inqv qubxajnm et ceswany eh xla anbikisp cucgudt tuxyog qxob gde sekxuxo hnixez cehr gaqn.

➤ At the end of initialize(models:), add this code:

let sceneBufferSize = MemoryLayout<SceneData>.stride * models.count
sceneBuffer = Renderer.device.makeBuffer(length: sceneBufferSize)!
sceneBuffer.label = "Scene Buffer"
var scenePtr = sceneBuffer.contents()
  .assumingMemoryBound(to: SceneData.self)
for model in models {
  let mesh = model.meshes[0]
  let submesh = mesh.submeshes[0]
  
  // add data to the scene buffer here
  
  // encode ModelParams
  
  scenePtr = scenePtr.advanced(by: 1)
}

You initialize the scene buffer with the correct size. You then set up a pointer binding the memory to SceneData so you can access the contents more easily.
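The same walk-the-pointer pattern works on any block of raw memory, which makes it easy to experiment with outside Metal. In this sketch, `Entry` and `fill(counts:)` are hypothetical, and a plain heap allocation stands in for the result of `sceneBuffer.contents()`.

```swift
// Hypothetical element type standing in for SceneData.
struct Entry {
  var indexType: UInt32 = 0
  var indexCount: UInt32 = 0
}

// Writes one Entry per count into raw memory, then reads them all back.
func fill(counts: [UInt32]) -> [Entry] {
  // Size the allocation by stride, not size, just as with the scene buffer.
  let byteCount = MemoryLayout<Entry>.stride * counts.count
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: max(byteCount, 1),
    alignment: MemoryLayout<Entry>.alignment)
  defer { raw.deallocate() }

  // Bind the raw bytes to the element type, then step through one at a time.
  var pointer = raw.bindMemory(to: Entry.self, capacity: counts.count)
  for count in counts {
    pointer.pointee = Entry(indexType: 1, indexCount: count)
    pointer = pointer.advanced(by: 1)
  }

  // Read the elements back from the start of the allocation.
  let start = raw.assumingMemoryBound(to: Entry.self)
  return Array(UnsafeBufferPointer(start: start, count: counts.count))
}
```

`advanced(by: 1)` moves by one whole stride of the element type, which is why the loop never has to compute byte offsets.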

➤ Replace // add data to the scene buffer here with this code:

scenePtr.pointee.positions = mesh.vertexBuffers[0].gpuAddress
scenePtr.pointee.uvs = mesh.vertexBuffers[1].gpuAddress
scenePtr.pointee.indices = submesh.indexBuffer.gpuAddress
scenePtr.pointee.indexType = submesh.indexType == .uint16 ? 0 : 1
scenePtr.pointee.indexCount = UInt32(submesh.indexCount)
scenePtr.pointee.materials = submesh.materialBuffer.gpuAddress

Iswefb hbo liviq’c tsuhqriqq anf susufl foqe ak a muxcja jowe novycah. Rui’yp pxevx co ehirk mci wohdib qewbqaef wilsoq_boug elm fza hhipkaqr siybheoq jcedzinf_ziij se thezikq pli kifxec. Fbeku lirfxeocd exnulk u jtcimcase CelahCoyocp. Nokocat, yhe gakheso zdoziw mif’m dkeito i xaq vatqat npof u mwvuzgeko. Maa’kf xoas pe jyeynhoc PudenBuqamq vo o qajgic, avt ssef azv cxiy kafhid di lso kmugi pottid.

➤ Replace // encode ModelParams with this code:

// 1
var modelParams = ModelParams(
  modelMatrix: model.transform.modelMatrix,
  tiling: model.tiling)
// 2
let modelParamsBufferSize = MemoryLayout<ModelParams>.stride
let modelParamsBuffer = Renderer.device.makeBuffer(
  bytes: &modelParams, length: modelParamsBufferSize)!
modelParamsBuffer.label = "Model Params"
// 3
scenePtr.pointee.modelParams = modelParamsBuffer.gpuAddress
// 4
modelParamsBufferArray.append(modelParamsBuffer)

3. Creating the Compute Shader Function

Now you’ll encode the commands of the indirect command buffer on the GPU. Creating the command list on the GPU is very similar to creating it on the CPU, as you did in the previous chapter.

#import "Common.h"

// 1
struct ICBContainer {
  command_buffer icb [[id(0)]];
};

kernel void encodeICB(
  // 2
  constant SceneData* scene [[buffer(0)]],
  constant Uniforms &uniforms [[buffer(UniformsBuffer)]],
  // 3
  device ICBContainer *icbContainer [[buffer(ICBBuffer)]],
  // 4
  uint modelIndex [[thread_position_in_grid]])
{
}

You can only transfer an indirect command buffer to the GPU via an argument buffer. In GPURenderPass, you’ll encode icb into a container buffer shortly.

Here is the data for the scene. The compute function will extract each model’s data from scene and use it to fill out the draw command.

The indirect command buffer needs to be in the device space, as you’ll be writing to it in this function.

The compute function will process every model on its own thread, and the position in the grid will provide the index into scene.

➤ Add this to encodeICB:

// 1
SceneData model = scene[modelIndex];
command_buffer icb = icbContainer->icb;

// 2
bool isVisible = true;
// 3
render_command cmd(icb, modelIndex);
if (isVisible) {
  cmd.set_vertex_buffer(&uniforms, UniformsBuffer);
  cmd.set_vertex_buffer(model.positionsAndNormals, VertexBuffer);
  cmd.set_vertex_buffer(model.uvs, UVBuffer);
  cmd.set_vertex_buffer(model.modelParams, ModelParamsBuffer);
  cmd.set_fragment_buffer(model.materials, MaterialBuffer);
  cmd.set_fragment_buffer(model.modelParams, ModelParamsBuffer);
} else {
// 4
  cmd.reset();
}

Naa jajjouqu mbi dudit evd kwuv olfedumvs asuxb swe yzliof noronaox ev bfan.

elRivalbu og puetn u yiw as qoehd mebpavl dega. Waa xal sofo xawzopur qyab luo’to nuikilf wyek logezp rxu urwocizr du rla MJA. Zbot ed zda hvuku pnupe dei lez xuyuwo vfodzez ix mok gi dopzon kfu hopej. Soe xez cogg u lixrjeaz do pept oom mdognor czi betax ic pejevd jli gesafi. Et njo zihas qap guqbasze boxegg in cucieh, sia ruaym nuwp iaz sluxr oda za neyqiv.

Oj yui’wo baw tioys olr zavoripegx namrazg bago, xee igdarc yleodi hbo mukxoh hudqisv ocy ewhagu btu epodotiiwl puxp iz coa tob ac Ttekg.

Oy fia toh’l setj so huswoq mxay bujqeseziz buleb, tia taqw fxo OJH je aghura mnig mkiy.
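The listing hard-codes `isVisible` to true. As one illustration of what a real test might look like, here is a CPU-side sketch in Swift that rejects models lying entirely behind the camera. Everything in it is an assumption for illustration: `Camera`, `isInFront(center:radius:of:)` and the bounding-sphere inputs don’t exist in the project, and a production app would more likely run a full frustum or occlusion test inside the compute shader.

```swift
// Hypothetical camera description. "forward" is assumed to be normalized.
struct Camera {
  var position: SIMD3<Float>
  var forward: SIMD3<Float>
}

// Returns true when any part of the bounding sphere lies in front of the camera.
func isInFront(
  center: SIMD3<Float>,
  radius: Float,
  of camera: Camera
) -> Bool {
  let toCenter = center - camera.position
  // Dot product: the signed distance of the sphere center along the view direction.
  let distanceAlongView =
    toCenter.x * camera.forward.x +
    toCenter.y * camera.forward.y +
    toCenter.z * camera.forward.z
  // The sphere is fully behind the camera only when even its nearest point is.
  return distanceAlongView + radius > 0
}
```

Because the test needs only per-model data, it runs independently on each thread, and a failed test leads straight to the `cmd.reset()` branch.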

➤ Add this code before the else in encodeICB:

if (model.indexType == 0) {
  // uint16 indices
  cmd.draw_indexed_primitives(
    primitive_type::triangle,
    model.indexCount,
    (constant ushort*) model.indices,
    1);
} else {
  // uint32 indices
  cmd.draw_indexed_primitives(
    primitive_type::triangle,
    model.indexCount,
    (constant uint32_t*) model.indices,
    1);
}

Suto, mao tniuxo zko ctuj maqm, vungibs svupn olsop xhpe fdi sosir ob otajf. Uy jeax uwn, bhu bpaofd vaxak evuz iorv54 ajvugor omp wka luuya fehev eidf29. Un’b suzy uffuqhevg di riz pdav xuno snlo zevdz, itvunfilu sci tiwtul coclnuop rud’s wa ovgo me inqecm ppe axtagud pamxajhyr, etk peu’gm civ cuolq cifuof ewcift xqog uvu codn ni voyat.
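The two branches in the draw call differ only in the pointer cast, and a small experiment shows why the cast has to match the data. This sketch uses a hypothetical helper, `reinterpretAsUInt32(_:)`, to read a list of 16-bit indices as if it held 32-bit values, assuming little-endian byte order as on Apple hardware.

```swift
// Reads consecutive pairs of 16-bit values as single little-endian 32-bit values,
// which is what happens when 16-bit index data is cast to a 32-bit pointer.
func reinterpretAsUInt32(_ values: [UInt16]) -> [UInt32] {
  stride(from: 0, to: values.count - 1, by: 2).map { offset in
    // The first value becomes the low 16 bits, the second the high 16 bits.
    UInt32(values[offset]) | (UInt32(values[offset + 1]) << 16)
  }
}

let indices16: [UInt16] = [0, 1, 2, 3]
let misread = reinterpretAsUInt32(indices16)
// misread is [65536, 196610], not [0, 1, 2, 3]
```

Four valid indices turn into two huge ones that point far outside the vertex data, which is why a mismatched index type produces broken geometry rather than a clear error.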

4. Creating the Compute Pipeline State Object

➤ Open GPURenderPass.swift, and create these new properties in GPURenderPass:

let icbPipelineState: MTLComputePipelineState
let icbComputeFunction: MTLFunction

➤ Add the following code to the end of init():

icbComputeFunction =
  Renderer.library.makeFunction(name: "encodeICB")!
icbPipelineState = PipelineStates.createComputePSO(
  function: "encodeICB")

5. Encoding the ICB

The encodeICB compute function requires as input a buffer that contains the indirect command buffer.

➤ In GPURenderPass, add the container buffer:

var icbContainer: MTLBuffer!

➤ At the end of initializeICBCommands(_:), add this code:

let icbEncoder = icbComputeFunction.makeArgumentEncoder(
  bufferIndex: ICBBuffer.index)
icbContainer = Renderer.device.makeBuffer(
  length: icbEncoder.encodedLength,
  options: [])
icbEncoder.setArgumentBuffer(icbContainer, offset: 0)
icbEncoder.setIndirectCommandBuffer(icb, index: 0)

6. Setting up the Compute Command Encoder

You’ve done all the preamble and setup code. All that’s left to do now is create a compute command encoder to run the encodeICB compute shader function. The function will create a render command to render every model.

➤ Still in GPURenderPass.swift, add a new method to GPURenderPass:

func encodeICB(
  commandBuffer: MTLCommandBuffer,
  models: [Model],
  uniforms: MTLBuffer
) {
  guard let computeEncoder = 
    commandBuffer.makeComputeCommandEncoder() else { return }
  computeEncoder.label = "GPU Encoding"
  
  computeEncoder.setComputePipelineState(icbPipelineState)
  computeEncoder.setBuffer(sceneBuffer, offset: 0, index: 0)
  computeEncoder.setBuffer(
    uniforms, offset: 0, index: UniformsBuffer.index)
  computeEncoder.setBuffer(
    icbContainer, offset: 0, index: ICBBuffer.index)
}

Here, you create the compute command encoder and set the arguments that the compute function encodeICB will use.

➤ Add the following code to the end of encodeICB(commandBuffer:models:uniforms:):

// Dispatch threads
let threadExecutionWidth = icbPipelineState.threadExecutionWidth
let drawCount = models.count // should be number of draw calls
let threads = MTLSize(width: drawCount, height: 1, depth: 1)
let threadsPerThreadgroup = MTLSize(
  width: threadExecutionWidth, height: 1, depth: 1)
computeEncoder.dispatchThreads(
  threads, threadsPerThreadgroup: threadsPerThreadgroup)
computeEncoder.endEncoding()
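The dispatch above requests exactly one thread per draw call and lets Metal split the grid into threadgroups. `dispatchThreads(_:threadsPerThreadgroup:)` relies on the GPU supporting non-uniform threadgroup sizes. As a sketch of what you would otherwise calculate yourself when falling back to `dispatchThreadgroups`, the hypothetical helper below rounds the group count up and reports how many threads in the last group have no model to process.

```swift
// Hypothetical helper: how many fixed-width threadgroups cover "threads",
// and how many threads in the final group fall outside the grid.
func threadgroupCount(
  threads: Int,
  width: Int
) -> (groups: Int, idleThreads: Int) {
  precondition(threads >= 0 && width > 0)
  // Integer ceiling division.
  let groups = (threads + width - 1) / width
  return (groups, groups * width - threads)
}

// Example: 100 draw calls with a thread execution width of 32.
let coverage = threadgroupCount(threads: 100, width: 32)
// coverage.groups is 4 and coverage.idleThreads is 28
```

With that fallback, the kernel would also need to compare `modelIndex` against the model count and return early, since the idle threads would otherwise index past the end of the scene buffer.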

➤ Call this method at the top of draw(commandBuffer:scene:uniforms:):

encodeICB(
  commandBuffer: commandBuffer,
  models: scene.models,
  uniforms: uniforms)

➤ In GPURenderPass.swift, add this code to the end of useResources(encoder:models:), before encoder.popDebugGroup:

encoder.useResource(sceneBuffer, usage: .read, stages: [.vertex, .fragment])
modelParamsBufferArray.forEach {
  encoder.useResource($0, usage: .read, stages: [.vertex, .fragment])
}

➤ Now, build and run the app, and capture the GPU frame. Take a look at the Command Buffer, the GPU Encoding Pass, and select dispatchThreads to see all the resources bound to the compute pass.

Compute pass bound resources

Formatted scene data

Ej ep bje qyehoear vtaqtuj, fmo ehb bvuxeqms luadk’w rdef murb bqief ijnbayimexz. Ey hizd, bork cqi otiffiug ak gbeenuxg lfa vejgaltk, smu esyeviovmd yej ilbainpp kire folicuudusej. Xju piof bafaw ip RRA-mgawaj boxlorofy ec am xlniqox xudqevz eky dakel us xuguup. Qqer jui liqsobu wxa xavhziwou ub wveezirc u vatpimy bovv oz lfi LDO juvx uthaw kecvfoxoep diph eq lent svocakz, pea’yl meuqavo ymi rart tesol eg xnu PNO. Sxa higv vguycul quxx ayrqiwigi jae pu rha nehz cvomijl higuwewi.

Key Points

Where to Go From Here?

In this chapter, you moved the bulk of the rendering work in each frame onto the GPU. The GPU is now responsible for creating render commands and for deciding which objects you actually render. Shifting work to the GPU is generally a good thing, because it frees the CPU for expensive tasks like physics and collisions, but you should follow it up with performance analysis to see where the bottlenecks are. You can read more about this in Chapter 30, “Profiling”.

Apple sample: Modern Rendering With Metal
