
Metal by Tutorials

Third Edition · macOS 12 · iOS 15 · Swift 5.5 · Xcode 13


31. Performance Optimization
Written by Caroline Begbie & Marius Horga


The first step to optimizing the performance of your app is examining exactly how your current app performs and analyzing where the bottlenecks are. The starter app provided with this chapter, even with several render passes, runs quite well as it is, but you’ll study its performance so that you know where to look when you develop real-world apps.

The Starter App

➤ In Xcode, build and run the starter app for this chapter.

The starter app

There are several render passes involved:

  • ShadowRenderPass: Renders models to a depth texture.
  • ForwardRenderPass: Renders all models aside from rocks and grass.
  • NatureRenderPass: Renders rocks and grass.
  • SkyboxRenderPass: Renders the skybox.
  • Bloom: Post-processes the image with bloom.

You may find that the app runs very slowly. On my 2018 11” iPad Pro, it runs at 33 FPS. This is mostly due to the number of skeletons and quantity of grass. If your app runs too slowly, you can reduce these in GameScene.

Profiling

There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You should also check out Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Using Metal System Trace in Instruments to Profile Your App.

GPU History

GPU History is a macOS tool in the Activity Monitor app (choose Window ▸ GPU History), so it isn't part of Xcode. It shows basic GPU activity in real time for each of your GPUs, including any external GPUs (eGPUs).

GPU History

The GPU Report

➤ With your app running, in Xcode on the Debug navigator, click FPS.

The GPU report
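The FPS gauge is easier to reason about as a time budget: at 60 FPS, each frame has 1000 / 60 ≈ 16.7 ms for all of its work, while the 33 FPS measured on the iPad Pro corresponds to roughly 30.3 ms per frame. The following Swift sketch shows that arithmetic, plus a hypothetical rolling FPS counter (FrameRateCounter is an illustrative helper, not part of the starter project) that averages over the most recent frames.

```swift
/// Converts a target frame rate into the time budget for one frame, in milliseconds.
func frameBudgetMilliseconds(fps: Double) -> Double {
  precondition(fps > 0, "Frame rate must be positive")
  return 1000.0 / fps
}

/// A hypothetical rolling frame-rate counter (not part of the starter project).
/// Feed it the time, in seconds, at which each frame finishes.
struct FrameRateCounter {
  private var timestamps: [Double] = []
  let windowSize: Int

  init(windowSize: Int = 60) {
    precondition(windowSize > 0, "Window must hold at least one interval")
    self.windowSize = windowSize
  }

  /// Records a frame's completion time, keeping only the most recent window.
  mutating func recordFrame(at time: Double) {
    timestamps.append(time)
    // windowSize intervals need windowSize + 1 timestamps.
    if timestamps.count > windowSize + 1 {
      timestamps.removeFirst()
    }
  }

  /// Average frames per second over the window, or nil until two frames are recorded.
  var averageFPS: Double? {
    guard timestamps.count >= 2,
          let first = timestamps.first, let last = timestamps.last,
          last > first else { return nil }
    return Double(timestamps.count - 1) / (last - first)
  }
}
```

In an app, you might call recordFrame(at:) with CACurrentMediaTime() once per draw(in:) call; treat the result only as a rough cross-check of Xcode's GPU report.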

GPU Workload Capture

In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured the way you think they are, and that they contain sensible data.

Summary

➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.

The summary of your frame

.worldTangent = uniforms.normalMatrix * in.tangent,
.worldBitangent = uniforms.normalMatrix * in.bitangent,
Insights into possible issues

The Shader Profiler

The shader profiler is perhaps the most useful profiling tool for the shader code you write. It doesn't measure the rendering code the CPU sets up, the passes you run or the resources you send to the GPU. Instead, it tells you how your MSL code performs, line by line, and how long each part takes to execute.

The shader profiler

Pie Chart

constant half3 sunlight = half3(2, 4, -4);

fragment half4 fragment_nature(
  VertexOut in [[stage_in]],
  texture2d_array<float> baseColorTexture [[texture(0)]],
  constant Params &params [[buffer(ParamsBuffer)]])
{
  constexpr sampler s(
    filter::linear,
    address::repeat,
    mip_filter::linear,
    max_anisotropy(8));
  half4 baseColor = half4(baseColorTexture.sample(s, in.uv, in.textureID));
  half3 normal = half3(normalize(in.worldNormal));

  half3 lightDirection = normalize(sunlight);
  half diffuseIntensity = saturate(dot(lightDirection, normal));
  half4 color = mix(baseColor*0.5, baseColor*1.5, diffuseIntensity);
  return color;
}
Reload Shaders

Reloaded Pie Chart

GPU Timeline

The GPU timeline tool gives you an overview of how your vertex, fragment and compute functions perform, broken down by render pass.

Capture the GPU workload

Render Passes

The GPU timeline

Encoder attachments

GPU counters

static var cullFaces = true
Face culling implemented

Memory

➤ In the Debug navigator, click the Memory tool (below Performance) to see the total memory used and how the various resources are allocated in memory:

Resources in memory
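When reading the memory report, a rough estimate of what a texture ought to cost helps you spot resources that are larger than expected. This Swift sketch is a simplification (estimatedTextureBytes is a hypothetical helper, and it ignores alignment, padding and compressed formats): it multiplies pixels by bytes per pixel and adds each mip level, where every level halves both dimensions down to 1 pixel.

```swift
/// Estimates the bytes needed for an uncompressed 2D texture, optionally with a
/// full mipmap chain. Real GPU allocations add alignment and padding, and
/// compressed formats such as ASTC use far less, so treat this as a ballpark.
func estimatedTextureBytes(width: Int, height: Int, bytesPerPixel: Int,
                           mipmapped: Bool) -> Int {
  precondition(width > 0 && height > 0 && bytesPerPixel > 0)
  var total = 0
  var w = width, h = height
  while true {
    total += w * h * bytesPerPixel
    if !mipmapped || (w == 1 && h == 1) { break }
    // Each mip level halves both dimensions, never going below 1 pixel.
    w = max(1, w / 2)
    h = max(1, h / 2)
  }
  return total
}
```

For example, a 1024×1024 RGBA8 texture (4 bytes per pixel) needs 4,194,304 bytes without mipmaps and 5,592,404 bytes with them, about one third more. Doubling it to 2048×2048 roughly quadruples the cost, which is why the right texture size for each device matters.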

Instancing

Currently, you load ten skeleton meshes and draw them independently. The skeleton system could do with more efficient instanced drawing. Reducing the number of draw calls is one of the best ways of improving performance. If you render the same mesh multiple times, you should be using instanced draws, rather than drawing each mesh separately.
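To see how instancing reduces draw calls, here's a hypothetical Swift sketch that groups scene objects by the mesh they use. SceneObject, InstancedBatch and makeInstancedBatches are illustrative names, not types from the starter project. With ten skeletons, ten separate draws collapse into one batch with an instance count of ten.

```swift
/// A hypothetical scene object: which mesh it uses and where it sits.
struct SceneObject {
  let meshName: String
  let position: SIMD3<Float>
}

/// One instanced draw call: a mesh plus the per-instance data to upload.
struct InstancedBatch {
  let meshName: String
  var positions: [SIMD3<Float>]
  var instanceCount: Int { positions.count }
}

/// Groups objects that share a mesh so each mesh needs only one draw call.
/// Batches keep the order in which each mesh first appears.
func makeInstancedBatches(_ objects: [SceneObject]) -> [InstancedBatch] {
  var batches: [InstancedBatch] = []
  var batchIndexForMesh: [String: Int] = [:]
  for object in objects {
    if let index = batchIndexForMesh[object.meshName] {
      // Mesh already seen: just add another instance to its batch.
      batches[index].positions.append(object.position)
    } else {
      // First time this mesh appears: start a new batch.
      batchIndexForMesh[object.meshName] = batches.count
      batches.append(InstancedBatch(meshName: object.meshName,
                                    positions: [object.position]))
    }
  }
  return batches
}
```

Each batch would then map to a single drawIndexedPrimitives call with instanceCount set to the batch size, with the per-instance positions in a buffer that the vertex shader indexes using [[instance_id]].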

The Procedural Nature System

Using homeomorphic models, you can choose a different shape for each model. Two models are homeomorphic when they use the same vertices in the same order, but the vertices are in different positions. A famous example of this is Spot the cow by Keenan Crane.

Spot by Keenan Crane

Homeomorphic rocks
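Because homeomorphic shapes share their vertex order, they can also share a single index buffer, and a vertex shader only needs a shape number to find the right positions. The following Swift sketch assumes one possible layout, with each shape's vertices packed back to back; ShapeMesh and PackedShapes are hypothetical types, and the book's nature system may organize its buffers differently.

```swift
/// A hypothetical mesh: vertex positions plus a triangle index list.
struct ShapeMesh {
  let positions: [SIMD3<Float>]
  let indices: [UInt16]
}

/// In the chapter's sense, two shapes are homeomorphic when they have the same
/// vertices in the same order (same count, same index list); only positions differ.
func areHomeomorphic(_ a: ShapeMesh, _ b: ShapeMesh) -> Bool {
  a.positions.count == b.positions.count && a.indices == b.indices
}

/// Homeomorphic shapes packed back to back so they can share one index buffer.
struct PackedShapes {
  let positions: [SIMD3<Float>]   // shape 0's vertices, then shape 1's, and so on
  let indices: [UInt16]           // shared by every shape
  let verticesPerShape: Int

  /// Fails if any shape's topology differs from the first shape's.
  init?(_ shapes: [ShapeMesh]) {
    guard let first = shapes.first,
          shapes.allSatisfy({ areHomeomorphic($0, first) }) else { return nil }
    positions = shapes.flatMap { $0.positions }
    indices = first.indices
    verticesPerShape = first.positions.count
  }

  /// The position a vertex shader would fetch for a given shape and vertex ID.
  func position(shape: Int, vertex: Int) -> SIMD3<Float> {
    positions[shape * verticesPerShape + vertex]
  }
}
```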

// One call draws every instance of this submesh
encoder.drawIndexedPrimitives(
  type: .triangle,
  indexCount: submesh.indexCount,
  indexType: submesh.indexType,
  indexBuffer: submesh.indexBuffer.buffer,
  indexBufferOffset: submesh.indexBuffer.offset,
  instanceCount: instanceCount)

Removing Duplicate Textures

Textures use memory, and you should always check that you use the appropriate size for the device. The asset catalog makes this easy for you. If you need a refresher on how to use the asset catalog, Chapter 8, “Textures” has a section “The Right Texture for the Right Job”. However, you should also check that you aren’t duplicating textures.
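One simple way to avoid duplicates is to load textures through a cache keyed by name, so that several models referencing the same image share one texture. The Swift sketch below is generic and hypothetical: Resource stands in for MTLTexture, and the load closure stands in for whatever actually reads the file, so ResourceCache isn't the starter project's code.

```swift
/// A hypothetical cache that loads each named resource once. `Resource` stands in
/// for `MTLTexture`; `load` stands in for the real texture loader.
final class ResourceCache<Resource> {
  private var cache: [String: Resource] = [:]
  private let load: (String) -> Resource?
  /// How many resources the loader has successfully produced (for checking deduplication).
  private(set) var loadCount = 0

  init(load: @escaping (String) -> Resource?) {
    self.load = load
  }

  /// Returns the cached resource, calling the loader only on the first request.
  func resource(named name: String) -> Resource? {
    if let cached = cache[name] { return cached }
    guard let loaded = load(name) else { return nil }
    loadCount += 1
    cache[name] = loaded
    return loaded
  }

  /// Number of distinct resources held in memory.
  var count: Int { cache.count }
}
```

Comparing count against the number of texture references in your models is a quick way to confirm that nothing is loaded twice.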

The heap textures

barrel.obj
A reduced heap

CPU-GPU Synchronization

Managing dynamic data can be a little tricky. Take the case of Uniforms. You usually update uniforms once per frame on the CPU, which means that the GPU must wait until the CPU has finished writing the buffer before it can read it. Conversely, the CPU mustn't overwrite the buffer while the GPU is still reading it for an earlier frame.

Triple Buffering

Triple buffering is a well-known synchronization technique. The idea is to cycle through a pool of three buffers: while the CPU writes to one buffer for an upcoming frame, the GPU reads from a buffer that the CPU filled for an earlier frame, so the two don't need to access the same buffer at the same time.

// Renderer properties: one Uniforms struct for each frame in flight
static let buffersInFlight = 3
var uniforms = [Uniforms](
  repeating: Uniforms(), count: buffersInFlight)
var currentUniformIndex = 0

// Before: every frame overwrites the same Uniforms struct
uniforms.projectionMatrix =
  scene.camera.projectionMatrix
uniforms.viewMatrix = scene.camera.viewMatrix
uniforms.shadowProjectionMatrix = shadowCamera.projectionMatrix
uniforms.shadowViewMatrix = shadowMatrix

// After: write only the element for the current frame
uniforms[currentUniformIndex].projectionMatrix =
  scene.camera.projectionMatrix
uniforms[currentUniformIndex].viewMatrix = scene.camera.viewMatrix
uniforms[currentUniformIndex].shadowProjectionMatrix =
  shadowCamera.projectionMatrix
uniforms[currentUniformIndex].shadowViewMatrix = shadowMatrix

// Move to the next element, wrapping back to 0 after buffersInFlight frames
currentUniformIndex =
  (currentUniformIndex + 1) % Self.buffersInFlight

// Where you pass uniforms to the render passes, use the current element
let uniforms = uniforms[currentUniformIndex]
Result of triple buffering
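The index arithmetic behind triple buffering can be isolated in a small ring type. In this Swift sketch, FrameRing is a hypothetical generic helper where Element stands in for Uniforms: each frame writes the current slot and then advances, so the slots written for the two previous frames stay untouched while the GPU may still be reading them.

```swift
/// A fixed-size ring of per-frame values: the CPU writes the slot for the current
/// frame while earlier slots may still be read by the GPU.
/// `Element` stands in for the chapter's `Uniforms` struct.
struct FrameRing<Element> {
  private var slots: [Element]
  private(set) var currentIndex = 0

  init(repeating value: Element, count: Int) {
    precondition(count > 0, "A ring needs at least one slot")
    slots = Array(repeating: value, count: count)
  }

  /// The slot the CPU may write this frame.
  var current: Element {
    get { slots[currentIndex] }
    set { slots[currentIndex] = newValue }
  }

  /// Read-only access to any slot, such as the one an earlier frame submitted.
  subscript(index: Int) -> Element { slots[index] }

  /// Moves to the next slot, wrapping around after `count` frames.
  mutating func advance() {
    currentIndex = (currentIndex + 1) % slots.count
  }
}
```

On its own, the ring doesn't stop the CPU from lapping the GPU when the CPU gets more than three frames ahead; that's the problem a blocking wait or a semaphore solves.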

Resource Contention

commandBuffer.waitUntilCompleted()

Semaphores

A more performant approach is to use a synchronization primitive known as a semaphore, which is a convenient way of keeping count of the available resources: in this case, your three uniform buffers.

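// Renderer property
var semaphore: DispatchSemaphore

// In the renderer's initializer: one permit per uniform buffer
semaphore = DispatchSemaphore(value: Self.buffersInFlight)

// Each frame, before encoding: wait until a uniform buffer is free
_ = semaphore.wait(timeout: .distantFuture)

// Before committing the command buffer: free the buffer when the GPU finishes
commandBuffer.addCompletedHandler { _ in
  self.semaphore.signal()
}

// No longer needed, because the semaphore now does the waiting:
// commandBuffer.waitUntilCompleted()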
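To see the semaphore pattern in isolation, this Swift sketch simulates it without Metal. FramePacer is a hypothetical class: a serial dispatch queue plays the GPU, and its completion block plays the role of the command buffer's completed handler. However fast the CPU loop runs, the semaphore keeps at most three frames in flight.

```swift
import Dispatch
import Foundation

/// A hypothetical frame pacer that simulates the semaphore pattern without Metal.
/// A serial queue stands in for the GPU; its completion block stands in for
/// `commandBuffer.addCompletedHandler`.
final class FramePacer {
  static let buffersInFlight = 3

  private let semaphore = DispatchSemaphore(value: FramePacer.buffersInFlight)
  private let gpuQueue = DispatchQueue(label: "simulated-gpu")
  private let group = DispatchGroup()
  private let lock = NSLock()
  private var inFlight = 0
  private(set) var maxInFlight = 0
  private(set) var completedFrames = 0
  private(set) var currentIndex = 0

  /// Called once per frame on the CPU's render-loop thread.
  func drawFrame() {
    // Block until one of the buffersInFlight uniform slots is free.
    _ = semaphore.wait(timeout: .distantFuture)

    lock.lock()
    inFlight += 1
    maxInFlight = max(maxInFlight, inFlight)
    lock.unlock()

    // The CPU would write uniforms[currentIndex] here, then move on.
    currentIndex = (currentIndex + 1) % Self.buffersInFlight

    group.enter()
    // "Submit" the frame: the simulated GPU finishes it about 2 ms later.
    gpuQueue.asyncAfter(deadline: .now() + .milliseconds(2)) {
      self.lock.lock()
      self.inFlight -= 1
      self.completedFrames += 1
      self.lock.unlock()
      // Equivalent of the completed handler: free the slot for a later frame.
      self.semaphore.signal()
      self.group.leave()
    }
  }

  /// Waits for all submitted frames so the simulation can finish cleanly.
  func waitUntilIdle() {
    group.wait()
  }
}
```

The CPU only blocks when it's three frames ahead, instead of waiting for every frame as waitUntilCompleted() does.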

Key Points

  • GPU History, in Activity Monitor, gives an overall picture of the performance of all the GPUs attached to your computer.
  • The GPU Report in Xcode shows you the frames per second that your app achieves. This should be 60 FPS for smooth running.
  • Capture the GPU workload for insight into what's happening on the GPU. You can inspect buffers and see insights about possible errors and optimizations. The shader profiler analyzes the time spent in each part of your shader functions. The GPU timeline shows how all your shader functions perform, broken down by render pass.
  • GPU counters show statistics and timings for every possible GPU function you can think of.
  • When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
  • Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.
© 2024 Kodeco Inc.
