Metal by Tutorials

Fifth Edition · macOS 26, iOS 26 · Swift 6, Metal 3 · Xcode 26

Section I: Beginning Metal (10 chapters)
Section II: Intermediate Metal (8 chapters)
Section III: Advanced Metal (8 chapters)

30. Profiling
Written by Marius Horga & Caroline Begbie

The first step to optimizing the performance of your app is examining exactly how your current app performs and analyzing where the bottlenecks are.

Imagine this scenario: You’ve started development on the first level of a new game, Phoenix Island: Rising from the ashes. You’ve created a basic scene, and now you want to find out how well it runs before adding the action.

The app runs fine at 60 FPS on an M1 Max Mac and an M3 iPad Air, but you’re horrified to discover that the iPad mini 6, with its older chip and less memory, runs the app at a mere 40 FPS.
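Frame rates are easier to reason about while profiling when you convert them into a per-frame time budget: 1000 ms divided by the frames per second. This small Swift helper, whose name is only illustrative, shows the arithmetic for the two rates in the scenario.

```swift
/// Converts a frame rate into the time available to produce each frame.
func frameBudgetMilliseconds(fps: Double) -> Double {
  precondition(fps > 0, "Frame rate must be positive")
  return 1000.0 / fps
}

let budgetAt60 = frameBudgetMilliseconds(fps: 60) // roughly 16.67 ms
let budgetAt40 = frameBudgetMilliseconds(fps: 40) // 25 ms
// How much longer each frame takes on the slower device than 60 FPS allows.
let shortfall = budgetAt40 - budgetAt60           // roughly 8.33 ms
```

At 40 FPS, each frame takes roughly 8.3 ms longer than a 60 FPS target allows, and that's the time you're looking to win back.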

In this chapter, you’ll look at some tools to help you analyze performance and find where your bottlenecks are.

Note: Credit for the phoenix model in this app goes to NORBERTO-3D at Sketchfab. All the other models and the HDRI sky were created by the folks at Poly Haven.

The Starter App

➤ In Xcode, review the project for this chapter. There are a number of interesting features.

Assets

First, there are two Assets folders. The one directly under the top-level Profiling folder contains a lot of data, so it points to a folder outside of the Profiling hierarchy. If the content names appear in red, Xcode can’t find the files. To reconnect them, select both Assets and game-scene.usda, and in the File inspector, click the folder icon. Then, locate and select the assets folder, which is the folder that contains both Assets and game-scene.usda.

Reconnect asset files

The USD Scene

assets/game-scene.usda is an editable text file that describes the scene. If your scene is running too slowly or you want to isolate an object, you can remove elements from the file. For example, to remove the landscape, delete the following lines:

def Mesh "Landscape" (
    prepend references ...
  )
{
    token visibility = "inherited"
    matrix4d xformOp:transform ...
    uniform token[] xformOpOrder = ["xformOp:transform"]
}

The Render Passes

In Renderer.swift, you can see the usual render passes, along with these new ones:

The starter app

Profiling

There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You can also use Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Analyzing the performance of your Metal app.

Metal Performance HUD

A great place to start looking at information about how your app is running is the Metal Performance HUD.

Edit scheme

Diagnostics

The Metal Performance HUD

Culling Back Faces

You can achieve a quick performance win by not rendering so many primitives. Currently, you’re rendering every triangle, whether or not it faces the camera. Culling back faces means discarding the primitives that face away from the camera before they’re rasterized, so that only the faces pointing toward the camera render.

let cullFaces = true
Face culling implemented
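In Metal, you enable culling on the render command encoder with setCullMode(.back), and setFrontFacing(_:) tells the GPU which winding order counts as front facing. The GPU then checks each triangle's winding for you. The following pure-Swift sketch only illustrates that check, using the sign of a triangle's 2D signed area. It assumes a y-up coordinate system like normalized device coordinates, and the function and enum names are hypothetical.

```swift
/// Which vertex order counts as front facing (mirrors MTLWinding's two cases).
enum Winding {
  case clockwise
  case counterClockwise
}

/// Twice the signed area of a 2D triangle in a y-up coordinate system.
/// Positive when a -> b -> c winds counterclockwise, negative when clockwise.
func signedDoubleArea(_ a: SIMD2<Float>, _ b: SIMD2<Float>, _ c: SIMD2<Float>) -> Float {
  let ab = b - a
  let ac = c - a
  return ab.x * ac.y - ac.x * ab.y
}

/// True when back-face culling would discard the triangle.
/// Zero-area triangles aren't treated as back facing in this sketch.
func isBackFacing(_ a: SIMD2<Float>, _ b: SIMD2<Float>, _ c: SIMD2<Float>,
                  frontFacing: Winding) -> Bool {
  let area = signedDoubleArea(a, b, c)
  switch frontFacing {
  case .counterClockwise: return area < 0
  case .clockwise: return area > 0
  }
}
```

Swapping any two vertices flips the sign of the area, which is why a consistent winding order across your meshes matters once culling is on.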

The GPU Report

➤ With your app running, click FPS in Xcode’s Debug navigator.

The GPU report

GPU Workload Capture

In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured in the way that you think they are, and that they contain sensible information.

Summary

➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.

The summary of your frame

.worldTangent = model->normalMatrix * in.tangent,
.worldBitangent = model->normalMatrix * in.bitangent,
Tangent buffer enabled

Bandwidth issues

API Usage insights

Encoded Command Performance

The next place to look when profiling your app is the Debug navigator, which details the performance of render passes and pipeline states.

Group by Pipeline State

Large draw call

Memory

Inefficient use of memory can do a lot of damage to performance.

Resources in memory

Landscape texture
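To get a feel for why texture sizes matter, you can estimate a texture's footprint from its dimensions and pixel format. The sketch below assumes an uncompressed format with a fixed number of bytes per pixel and ignores any padding or alignment the driver adds, so treat its result as an estimate. The memory view in a GPU capture, or a resource's allocatedSize property, tells you what was actually allocated. The function name is hypothetical.

```swift
/// Estimates the bytes used by an uncompressed 2D texture.
/// Ignores any padding or alignment the driver adds.
func estimatedTextureBytes(width: Int, height: Int,
                           bytesPerPixel: Int, mipmapped: Bool) -> Int {
  precondition(width > 0 && height > 0 && bytesPerPixel > 0)
  var total = 0
  var levelWidth = width
  var levelHeight = height
  while true {
    total += levelWidth * levelHeight * bytesPerPixel
    // Stop after the base level, or once the 1×1 level has been counted.
    if !mipmapped || (levelWidth == 1 && levelHeight == 1) { break }
    levelWidth = max(1, levelWidth / 2)
    levelHeight = max(1, levelHeight / 2)
  }
  return total
}

// A 4096×4096 RGBA8 texture (4 bytes per pixel) without mipmaps: 64 MiB.
let largeTexture = estimatedTextureBytes(width: 4096, height: 4096,
                                         bytesPerPixel: 4, mipmapped: false)
```

A full mip chain adds roughly a third on top of the base level, while halving both dimensions of a texture cuts its memory to about a quarter.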

GPU Timeline

The GPU timeline tool gives you an overview of how your vertex, fragment and compute functions perform, broken down by render pass.

Capture the GPU workload

Render Passes

The GPU timeline

bloom.postProcess(
  view: view,
  commandBuffer: commandBuffer,
  inputTexture: descriptor.colorAttachments[0].texture)
Without the bloom render pass
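Outside of a GPU capture, you can keep an eye on GPU time at runtime: each MTLCommandBuffer exposes gpuStartTime and gpuEndTime, in seconds, which you can read in a completed handler. The sketch below covers only the bookkeeping, a moving average over the last few frames, so that it runs without a GPU. GPUFrameTimer and its default window size are hypothetical.

```swift
/// Keeps a moving average of GPU frame durations, in milliseconds.
/// Feed it the start and end times (in seconds) of each completed command buffer.
struct GPUFrameTimer {
  private var samples: [Double] = []
  private var nextSlot = 0
  let windowSize: Int

  init(windowSize: Int = 60) {
    precondition(windowSize > 0)
    self.windowSize = windowSize
  }

  mutating func record(gpuStartTime: Double, gpuEndTime: Double) {
    let milliseconds = (gpuEndTime - gpuStartTime) * 1000
    if samples.count < windowSize {
      samples.append(milliseconds)
    } else {
      // Once the window is full, overwrite the oldest sample.
      samples[nextSlot] = milliseconds
    }
    nextSlot = (nextSlot + 1) % windowSize
  }

  /// Average of the samples collected so far, or 0 before the first frame.
  var averageMilliseconds: Double {
    samples.isEmpty ? 0 : samples.reduce(0, +) / Double(samples.count)
  }
}
```

You might call record(gpuStartTime: buffer.gpuStartTime, gpuEndTime: buffer.gpuEndTime) from addCompletedHandler(_:). That handler can run on a different thread from the one that draws, so in a real renderer you'd protect the timer with a lock or update it on a single queue.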

Instancing

Reducing the number of draw calls is one of the best ways of improving performance. Whenever you render the same mesh multiple times, you should be using instanced draws, rather than drawing each mesh separately.

Procedural rocks

The Procedural Nature System

Using homeomorphic models, you can choose different shapes for each model. Two models are homeomorphic when they share the same number of vertices, listed in the same order and connected into the same triangles, but with the vertices in different positions. A famous example of this is Spot the cow by Keenan Crane.

Spot by Keenan Crane

Homeomorphic rocks
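Because homeomorphic shapes list the same vertices in the same order, you can pack several of them back to back and describe all their triangles with one index buffer. A per-instance shape index then picks which set of positions to use. This sketch models that lookup on the CPU with plain arrays. HomeomorphicShapeSet and its packing scheme are assumptions for illustration, not the project's actual layout.

```swift
/// Several homeomorphic shapes packed back to back in one position array.
/// Every shape has the same vertex count and vertex order, so one index
/// buffer describes the triangles of all of them.
struct HomeomorphicShapeSet {
  let vertexCount: Int
  private(set) var positions: [SIMD3<Float>] = []

  init(vertexCount: Int) {
    precondition(vertexCount > 0)
    self.vertexCount = vertexCount
  }

  var shapeCount: Int { positions.count / vertexCount }

  mutating func append(shape: [SIMD3<Float>]) {
    precondition(shape.count == vertexCount,
                 "Homeomorphic shapes must share the same vertex count")
    positions.append(contentsOf: shape)
  }

  /// Looks up a vertex of one shape. A vertex function could do the same
  /// with a per-instance shape index and the vertex ID.
  func position(shape: Int, vertex: Int) -> SIMD3<Float> {
    positions[shape * vertexCount + vertex]
  }
}
```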

encoder.drawIndexedPrimitives(
  type: .triangle,
  indexCount: submesh.indexCount,
  indexType: submesh.indexType,
  indexBuffer: submesh.indexBuffer.buffer,
  indexBufferOffset: submesh.indexBuffer.offset,
  instanceCount: instanceCount)
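For instanced draws, the per-instance data usually lives in one buffer, and the vertex function indexes it with the instance ID, which Metal Shading Language provides through the [[instance_id]] attribute. Here's a sketch of building that data in Swift. RockInstance, its fields and the random placement are hypothetical; using two SIMD4<Float> fields keeps the layout simple to mirror in a shader struct.

```swift
/// Hypothetical per-instance data for one rock. Two SIMD4<Float> fields
/// give a 32-byte stride that's easy to mirror in a shader struct.
struct RockInstance {
  var position: SIMD4<Float>      // xyz = world position, w = 1
  var scaleAndShape: SIMD4<Float> // x = uniform scale, y = shape index
}

/// Scatters rocks randomly over a square with the given half-width.
func makeRockInstances(count: Int, shapeCount: Int, radius: Float) -> [RockInstance] {
  precondition(count >= 0 && shapeCount > 0 && radius >= 0)
  return (0..<count).map { _ in
    RockInstance(
      position: SIMD4(Float.random(in: -radius...radius), 0,
                      Float.random(in: -radius...radius), 1),
      scaleAndShape: SIMD4(Float.random(in: 0.5...1.5),
                           Float(Int.random(in: 0..<shapeCount)), 0, 0))
  }
}

/// Byte length for one buffer that holds every instance.
func instanceBufferLength(_ instances: [RockInstance]) -> Int {
  MemoryLayout<RockInstance>.stride * instances.count
}
```

You'd then create the buffer with something like device.makeBuffer(bytes:length:options:), using instanceBufferLength(_:) for the length, and bind it with setVertexBuffer(_:offset:index:) before the draw call. One instanced draw then replaces instanceCount separate draws.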

Inspecting Shaders

It’s easy to debug Swift code by using breakpoints and printing out values. But how do you find out what your Metal Shading Language code is doing? The shader editor has you covered. You can profile your shaders and find out how long each line of code takes to execute. You can also step through your vertex shader line by line and examine its values for a particular vertex, or do the same with your fragment shader for a particular pixel.

An ocean view

The Water Render Pass

The Debug Shader icon

Choose vertex or fragment

Shader function values

float3 nearColor = float3(1, 0, 0);
The Reload Shaders icon

A red sea

The Shader Profiler

➤ Click the clock icon next to the Reload Shaders icon in the toolbar above the Debug console, and click Profile in the pop-up window.

The Shader Profiler

return half4(half3(color), alpha);

CPU-GPU Synchronization

Measuring GPU performance is important, but you should also consider the interaction between the CPU and the GPU. Poor coordination can cause stalls, where the GPU waits for CPU work to complete, or the CPU idles while the GPU finishes a task. Synchronization issues can also cause frame stutters.

Triple Buffering

Triple buffering is a well-known technique in the realm of synchronization. The idea is to keep a pool of three buffers and cycle through them, one per frame. While the CPU writes to one buffer in the pool, the GPU reads from a buffer that the CPU filled earlier, the aim being that the CPU never overwrites data the GPU is still reading.

let maxFramesInFlight = 3
Self.currentFrameIndex =
  (Self.currentFrameIndex + 1) % maxFramesInFlight
Result of triple buffering
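The frame index arithmetic in currentFrameIndex is the heart of triple buffering. This generic sketch wraps it in a small ring of per-frame resources. FrameRing is a hypothetical name, and in a renderer each slot would typically hold something like an MTLBuffer of uniforms rather than the plain values used in the test.

```swift
/// A fixed pool of per-frame resources, cycled in round-robin order.
struct FrameRing<Resource> {
  private var resources: [Resource]
  private(set) var currentIndex = 0

  /// Creates `count` slots, building each one from its index.
  init(count: Int, make: (Int) -> Resource) {
    precondition(count > 0)
    resources = (0..<count).map(make)
  }

  /// The slot the CPU may write to for the current frame.
  var current: Resource {
    get { resources[currentIndex] }
    set { resources[currentIndex] = newValue }
  }

  /// Moves to the next slot, wrapping around after the last one.
  mutating func advance() {
    currentIndex = (currentIndex + 1) % resources.count
  }
}
```

A ring on its own doesn't stop the CPU from lapping the GPU; it only spreads writes across slots, which is why you also need the synchronization covered next.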

Resource Contention

commandBuffer.waitUntilCompleted()

Semaphores

A more performant approach is to use a synchronization primitive known as a semaphore, which is a convenient way of keeping count of the available resources: in this case, the buffers in your triple buffer.

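// Property: counts the frame buffers that are free to write
var semaphore: DispatchSemaphore
// Initialization: every buffer starts out free
semaphore = DispatchSemaphore(value: maxFramesInFlight)
// Before writing this frame's buffer: block until one is free
_ = semaphore.wait(timeout: .distantFuture)
// When the GPU finishes the command buffer: release its buffer
commandBuffer.addCompletedHandler { _ in
  self.semaphore.signal()
}
// The semaphore makes this blocking call unnecessary, so remove it:
// commandBuffer.waitUntilCompleted()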
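To see why the semaphore caps the number of frames in flight, here's a CPU-only simulation. The loop plays the part of the render loop, a concurrent queue stands in for the GPU, and the signal() call plays the role of the completed handler. The type and function names and the 2 ms sleep are made up for the demonstration.

```swift
import Dispatch
import Foundation

/// Thread-safe count of simulated frames currently in flight.
final class InFlightCounter: @unchecked Sendable {
  private let lock = NSLock()
  private var current = 0
  private var peak = 0

  func begin() {
    lock.lock()
    current += 1
    peak = max(peak, current)
    lock.unlock()
  }

  func end() {
    lock.lock()
    current -= 1
    lock.unlock()
  }

  var peakInFlight: Int {
    lock.lock()
    defer { lock.unlock() }
    return peak
  }
}

/// Submits `frameCount` simulated frames and returns the most that were
/// ever in flight at the same time.
func simulateFrames(frameCount: Int, maxFramesInFlight: Int) -> Int {
  let semaphore = DispatchSemaphore(value: maxFramesInFlight)
  let counter = InFlightCounter()
  let gpu = DispatchQueue(label: "simulated-gpu", attributes: .concurrent)
  let group = DispatchGroup()

  for _ in 0..<frameCount {
    // CPU side: wait for a free frame buffer, like the render loop does.
    semaphore.wait()
    counter.begin()
    group.enter()
    gpu.async {
      // Stand-in for the GPU executing the frame's command buffer.
      Thread.sleep(forTimeInterval: 0.002)
      counter.end()
      // Stand-in for the completed handler releasing the buffer.
      semaphore.signal()
      group.leave()
    }
  }
  group.wait()
  return counter.peakInFlight
}
```

However many frames you submit, the peak never exceeds maxFramesInFlight, because a frame only starts after wait() returns, and its slot is only released after the simulated GPU work ends.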

MetalFX Upscaling

You probably noticed that when you run your app full-screen rather than in a small window, your frame rate drops. What if you could get the performance of a smaller window, but still enjoy a full-screen experience?

let doUpscaling = true
Result of upscaling
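The idea behind upscaling is to render at a reduced resolution and let MetalFX scale the result up to the drawable's size. MetalFX's MTLFXSpatialScalerDescriptor takes the two sizes through its inputWidth, inputHeight, outputWidth and outputHeight properties. This sketch only computes the reduced render size; PixelSize, renderSize(for:scale:) and the 0.5 scale in the usage are assumptions, not values taken from the project.

```swift
/// A width and height in pixels.
struct PixelSize: Equatable {
  var width: Int
  var height: Int
  var pixelCount: Int { width * height }
}

/// The size to render at before upscaling to `output`.
/// `scale` is the fraction of the output size used along each axis.
func renderSize(for output: PixelSize, scale: Float) -> PixelSize {
  precondition(scale > 0 && scale <= 1, "Upscaling renders at or below the output size")
  return PixelSize(
    width: max(1, Int((Float(output.width) * scale).rounded())),
    height: max(1, Int((Float(output.height) * scale).rounded())))
}

let drawableSize = PixelSize(width: 2048, height: 1536)
let reducedSize = renderSize(for: drawableSize, scale: 0.5) // 1024 × 768
// Fraction of the output pixel count that actually gets rendered.
let shadedFraction = Double(reducedSize.pixelCount) / Double(drawableSize.pixelCount) // 0.25
```

Rendering at half the width and half the height shades only a quarter of the pixels, which is where the saving comes from when fragment work dominates your frame.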

Visibility Culling

The fastest geometry to render is geometry that you don’t have to render because it’s not in the frame. Currently, you render all objects in the app, whether or not the camera can see them. You process the fire particles even though they might not be on screen. Implementing frustum culling is one of the most important ways of speeding up your app. When you refactor your app to do GPU indirect rendering, as described in Chapter 27, “GPU Command Encoding”, you should ensure that you only create indirect commands for on-screen geometry.
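A common way to implement frustum culling is to give each object a bounding sphere and test it against the six planes of the camera frustum, which are often extracted from the view-projection matrix. This sketch shows only the sphere-plane test. It assumes normalized planes whose normals point into the frustum, and the type and function names are hypothetical.

```swift
/// A plane stored as (normal.x, normal.y, normal.z, d), with a unit-length
/// normal pointing into the frustum, so inside points satisfy dot(n, p) + d >= 0.
typealias Plane = SIMD4<Float>

struct BoundingSphere {
  var center: SIMD3<Float>
  var radius: Float
}

/// Signed distance from a point to a plane (positive on the inside).
func signedDistance(_ plane: Plane, _ point: SIMD3<Float>) -> Float {
  let normal = SIMD3(plane.x, plane.y, plane.z)
  return (normal * point).sum() + plane.w
}

/// Conservative test: false only when the sphere lies entirely outside at
/// least one plane. Spheres straddling a plane count as visible.
func isPotentiallyVisible(_ sphere: BoundingSphere, frustum planes: [Plane]) -> Bool {
  planes.allSatisfy { signedDistance($0, sphere.center) >= -sphere.radius }
}
```

Objects that fail the test can skip their draw calls entirely or, with indirect rendering, never get an indirect command. The test is conservative, so a few objects just outside a frustum corner may still be drawn.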

Key Points

  • The Metal Performance HUD is the easiest way to profile your app.
  • Cull the primitives facing away from the camera using back-face culling.
  • Capture the GPU workload for insight into what’s happening on the GPU. You can inspect buffers, and Xcode warns you about possible errors and suggests optimizations. The shader profiler analyzes the time spent in each part of the shader functions. The GPU timeline gives you an overview of all your shader functions, broken down by render pass.
  • When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
  • Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.

Where to Go From Here

The resources for this chapter contain a list of Apple articles and videos on profiling. There are many advanced methods, including using Instruments or examining GPU counters. The Apple documentation and videos are very good on this topic. The resources also contain links to blog posts whose authors tear down and examine the render passes of games.

© 2025 Kodeco Inc.
