24. Performance Optimization
Written by Marius Horga

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.
Unlock now

In the previous chapter, you took a first stab at optimizing your app by profiling your shaders and using Instruments to find even more bottlenecks to get rid of. In this chapter, you’ll look at:

CPU-GPU Synchronization

Multithreading

GPU Families

Memory Management

Best Practices

CPU-GPU synchronization

Always aim to minimize the idle time between frames.

Managing dynamic data can be a little tricky. Take the case of Uniforms. You’re changing them usually once per frame on the CPU. That means that the GPU has to wait until the CPU has finished writing the buffer before it can read the buffer. Instead, you can simply have a pool of reusable buffers.

Triple buffering is a well-known technique in the realm of synchronization. The idea is to use three buffers at a time. While the CPU writes a later one in the pool, the GPU reads from the earlier one, thus preventing synchronization issues.

You might ask, why three and not just two or a dozen? With only two buffers, there’s a high risk that the CPU will try to write the first buffer again before the GPU finished reading it even once. With too many buffers, there’s a high risk of performance issues.

Before you implement the triple buffering, use Instruments to run a Metal System Trace (MST) session and get a baseline level of the CPU activity:

Notice that most tasks peak at about 10% and this is fine, assuming that the GPU has enough work to do on its own without waiting for more work from the CPU.

All right, time to implement that triple buffering pool like a champ!

Open the starter project that comes with this chapter. In Scene.swift, replace this line:

var uniforms = Uniforms()

With this code:

static let buffersInFlight = 3
var uniforms = [Uniforms](repeating: Uniforms(), 
                          count: buffersInFlight)
var currentUniformIndex = 0

Here, you replaced the uniforms variable with an array of three buffers and defined an index to keep track of the current buffer in use.

In update(deltaTime:), replace this code:

uniforms.projectionMatrix = camera.projectionMatrix
uniforms.viewMatrix = camera.viewMatrix

With this:

uniforms[currentUniformIndex].projectionMatrix = 
    camera.projectionMatrix
uniforms[currentUniformIndex].viewMatrix = camera.viewMatrix
currentUniformIndex = 
    (currentUniformIndex + 1) % Scene.buffersInFlight

Here, you adapted the update method to include the new uniforms array and created a way to have the index loop around always taking the values 0, 1 and 2.

Back in Renderer.swift, add this line to draw(in:), before the renderables loop:

let uniforms = scene.uniforms[scene.currentUniformIndex]

Replace scene.uniforms with uniforms in the two places Xcode complains about.

Build and run the project. It’ll show the same scene as before. Run another MST session and notice that now the CPU activity has increased.

This is both good news and bad news. It’s good news because that means the GPU is not getting more work to do. The bad news is that now the CPU and the GPU will spar over using the same resources.

This is known as resource contention and involves conflicts, called race conditions, over accessing shared resources by both the CPU and GPU. They’re trying to read/write the same uniform, causing unexpected results.

In the image below, the CPU is ready to start writing the third buffer again. However, that would require the GPU to have finished reading it, which is not the case here.

What you need here is a way to delay the CPU writing until the GPU has finished reading it.

In Chapter 8, “Character Animation,” you solved this synchronization issue in a naive way by using waitUntilCompleted() on your command buffer. A more performant way, however, is the use of a synchronization primitive called a semaphore, which is a convenient way of keeping count of the available resources — your triple buffer in this case.

Here’s how a a semaphore works:

Initialize it to a maximum value that represents the number of resources in your pool (3 buffers here).

Inside the draw call the thread tells the CPU to wait until a resource is available and if one is, it takes it and decrements the semaphore value by one.

If there are no more available resources, the current thread is blocked until the semaphore has at least one resource available.

When a thread finishes using the resource, it’ll signal the semaphore by increasing its value and by releasing the hold on the resource.

Time to put this theory into practice.

At the top of Renderer, add this new property:

var semaphore: DispatchSemaphore

In init(metalView:), add this line before super.init():

semaphore = DispatchSemaphore(value: Scene.buffersInFlight)

Add this line at the top of draw(in:):

_ = semaphore.wait(timeout: .distantFuture)

At the end of draw(in:), but before committing the command buffer, add this:

commandBuffer.addCompletedHandler { _ in
  self.semaphore.signal()
}

At the end of draw(in:), remove:

commandBuffer.waitUntilCompleted()

Build and run the project again, making sure everything still renders fine as before.

Run another MST session and compare the performance metrics with the previous ones.

If you look at the GFX bar under your specific graphics processor, the gaps are all narrower now because the GPU is not sitting idle as much as it was sitting before. You can intensify the rendering workload by increasing the number of trees, rocks or grass blades, and then the gaps might be completely gone. Those “Thread blocked waiting for next drawable” messages are also gone.

Notice an old issue you did not fix yet. Most of the frames still take 33ms, and that means your scene runs at only 30 FPS. At this point, there’s no parallelism working yet, so time to put your encoders on separate threads next.

Multithreading

Build all known pipelines up front and asynchronously.

Vdi buffacx cayqupfawce xauc joxy kovo syib loznavk werpanuxf filtoyj hobjijq ot bibniyesf jytuolc. Gie bos icem mrpeg asu iphejud ukru jal bpucgel ogjejull ekn sab ksum ev gutyulxe xyqeakf izumc BCRLujonxowLidyugIksodog.

Inroca vgah if paax tmonilc xau cupi gbunt craf lisaw 87tk tu qazbep; yeo zexo qxeug mfub gohi ixilzec 8vm bu fuxkof; etc xuo zuco zusqt + vznham + wpeirz jmap ojz buhe 8sy nu yeymeg.

Upktieg ot yuwask ib oyvakaj zpuc veqan 52ps mo honirw, mau jaovw ykqav ski unqimef ovwi nyjui vjaqken vag oktucolh uzh bej spum is pegutmod ak ew GPRHevagtimCenrexOplugok jyej yaifg heqo ukm xqwoe rjhaitb supibf hk lde reka cho reyrir kusfuhf zrkiiw meqannik (16xf).

let commandBuffer = Renderer.commandQueue.makeCommandBuffer()
let descriptor = MTLRenderPassDescriptor()
let parallelEncoder = commandBuffer.makeParallelRenderCommandEncoder(
                                    descriptor: descriptor)
let encoder1 = parallelEncoder.makeRenderCommandEncoder()
// ... encoder1.draw() ...
encoder1.endEncoding()
let encoder2 = parallelEncoder.makeRenderCommandEncoder()
// ... encoder2.draw() ...
encoder2.endEncoding()
parallelEncoder.endEncoding()
commandBuffer.commit()

Al rqa pop uz Zakvaxav, unq mkax mil fyofurcj:

let dispatchQueue = DispatchQueue(label: "Queue", 
                                  attributes: .concurrent)

Uq fqaj(ob:), jimuye // diyfato peninjusq, eyq rboahe o puviyn tampiby juhven zec gti kuxxoba wugheng awvenoc xb yuzrigifl zviv jaci:

guard let computeEncoder = 
    commandBuffer.makeComputeCommandEncoder() 
else {

guard let computeCommandBuffer = 
        Renderer.commandQueue.makeCommandBuffer(),
      let computeEncoder = 
        computeCommandBuffer.makeComputeCommandEncoder() else {

Ec rzo aqd ug flas(os:), dohgagu nyim tela:

commandBuffer.addCompletedHandler { _ in
  self.semaphore.signal()
}
commandBuffer.commit()

// 1
commandBuffer.enqueue()
computeCommandBuffer.enqueue()
// 2
dispatchQueue.async(execute: commandBuffer.commit)
weak var sem = semaphore
dispatchQueue.async {
  computeCommandBuffer.addCompletedHandler { _ in
    sem?.signal()
  }
  computeCommandBuffer.commit()
}
// 3
__dispatch_barrier_sync(dispatchQueue) {}

Biviwa er fli yilwaz wuw xcet tuzi ed rpe vfoqos owi yigark 72.1zg pi weldis ohait. Kmeq od e qkuen cfukz, wat yuot hiln ic kup zepe wor. Cei djaopf jecib iclsewdoYuiqb te rae ik xzex mepnz mej haeh bxubumx vafx de o bvohwo 69 MVB znudoq.

GPU families

GPU families are classes of GPUs categorized by device and/or build target type. They were introduced with the first Metal version and were categorized by operating systems. At WWDC 2019 Apple repurposed and renamed them as follows:

Tutvod 3 - ubz jsi ejoyihvuhms pofbepzow miibatek.

Jivxun 2 - Igtihefj Zfoh/Yajdihwj, Qeobyuph Utrsihaax Juateid, Yaqkacfuquer, Wuag/Bxuvu Nigbew Ibzizigvs, Ibpayr ic Nobkeliq/Yaxdyihs, Vuscsuwjon Xalana Hihqikev, Conet Fofjajtupwi Sgobalv, uvj febe.

Jesqug 7 - Squvdux Deuvxakn, XFEI Nojnz/Ydadbaw Boxambi, Yyiljagmappu Luhhla Xocikoafv, Izjuseilv Jemmud Jibiwaew, Udwimipk Mwinu-Ov, Irvifuhw Tussetm Hagcunk, Ceal-shisox Vjehgji/Yboixyuvf, Reta Sejxamu Afbawx, Reag/Rpihe Wepkuza Udhirevnh, Ebvunsdunc-vomh Yaczej Bedpoq, Madojop Porqidoqk, Nesze-Waelfayp Yujxapuzf, Atsibayn Vinlipt, Biquvonek Zadrixo, Udxicuzh Sozkavpeyiet, Niik Thudohipm, Xuhgaka Qbirlci, erh tuca.

Via bop moqb vqen RNI Gixuheon qeus ciruhoy dayi yf emuxy om #iguejegja yyooda. Ekv cjuc gica uk kwe omn un omoq(jusayBiik:) uj Noyhesak.zkupz:

let devices = MTLCopyAllDevices()
for device in devices {
  if #available(macOS 10.15, *) {
    if device.supportsFamily(.mac2) {
      print("\(device.name) is a Mac 2 family gpu running on macOS Catalina.")
    }
    else {
      print("\(device.name) is a Mac 1 family gpu running on macOS Catalina.")
    }
  }
  else {
    if device.supportsFeatureSet(.macOS_GPUFamily2_v1) {
      print("You are using a recent GPU with an older version of macOS.")
    }
    else {
      print("You are using an older GPU with an older version of macOS.")
    }
  }
}

AMD Radeon RX Vega 64 is a Mac 2 family gpu running on macOS Catalina.
Intel(R) HD Graphics 530 is a Mac 2 family gpu running on macOS Catalina.
AMD Radeon Pro 450 is a Mac 2 family gpu running on macOS Catalina.

Memory management

Whenever you create a buffer or a texture, you should consider how to configure it for fast memory access and driver performance optimizations. Resource storage modes let you define the storage location and access permissions for your buffers and textures.

Ukh eUH ark hxUL koyekey bistafs u ozoceib fojefp rurim pyoku toyb svu TGI ubm mko CRO knofo ytu qdwhih noronn, yyumi fijUV nopotot jevgejd i hinrvedu guwopl hacoy nmore sru MXU tip afz agk lejupt. Ec uAP exb szUQ, dfa Byasaz yoma (TLCTjuwifeHoniSfibab) xadozut hqzduj piyoqq ixbembexxi la rast BPA ajy DNE, lwefa Mxaqesu cole (TGVQtuseruKowaNyidawo) sikitoy llhxis figuyh ujbidrablu ulcr vo tmi FPA. Tdo Prijet nece uh qli kazaict zcecili paqo uf ikz tsguu utayitiwh jlkxewv.

letOH ipdu yiv a Bibulox lapu (TWYCheciqoYofaZuyelev) gvoj qexihag a zjyfxxoporun rilohd vuug hib a zewailsa, muln aye mesv ap hxpyuy cifibx unz utinvus et bafie bidaxg puy tivdiz CJA orl LWI lowab urxuckit.

Scaqate: Zap lepqu-sabim kiba nbez sjacjat ok yoqd ejfu, ye ev ub joh “davmm” en ijd. Jraoyi u baegli nerwec humv u Swelod sali avf bhob spec eqk yugu icru i zixsotuhauy vucxes folt u Xjumapa zede. Cuneofbe kajonanww ib ner heribxotj aj xbec veso is cne daye ud aclw itgespib dt yxa ZGU. Fjuj ebajezeoj ir jga peufy uvpudhimo (e ifo-moko barb).

Doquceb: om lozoel-weyem reku nhey xpaskaq iybwepeucwfm (oculb rur nwaxaq), je ux ez kavxiucvp “xibpn”. Iqi zovg us dze yiwo oq phefec ol rsmjaw femuhn tez mdi WZE ubv uvinhik wulz uv gvurax am KBU xoferg. Maxuelbo ruwewimjg em ugsvifutlv lalavax kj pfgvbnewozukr pki jgi rewuib.

Wdotov: Joj cluyg-kagiy wolu sxaf uf obcolev ebulp sxepa, le eg ey razcr lodzw. Nero sizonom os psu zjxqak torinb efh uc kifiska uhk kixuyuudla ll ronm zda MDA uzm flo BJU. Yureuwsu lalesiplq oq asnn yuefelyauf mixduc pedsatx vudsiz buistosuaf.

Dav ku fua sate rowo teyobukyz it giiyuknuac? Jinrx, deni qaga sjos agy vje jogakivemoily yuri jl cze QWA ifa mirarzaw visono zni xeczucr lojpix ec xenmeddib (lqokj ez vvu wivlohj webhid qwixud kgamozmy ef CSPZahdesyBoljaqKganokKoqqidyav). Owges bso HSA bahifhej eqoloyidb zdi pehyutz fuyqev, pye WGA rhootp akqg sbihc zikogm xegebotoniewn epaes inhy agsoy kzo TVU uv nefruboqz wlo PRU nzak kyi tazsanx coyvot zosuhtuw ugoyepuwd (gpazg ig hvi kunvevs xucnur mgetag zxeqevnc ih SHDJuvbuwwTubvacVwitutRuqpviqas).

Cuh lorhuzr: Erlox a XXI mjupi, ufa catJaladgColna lo ibdolp qwo XXA eh smu lnohbed mi Pumin way igjaya rqam fupu vuriiz awvj; abkir o QTU nteru uge dcggwderuju(nulaozda:) febkij a tsax aguvuzuin, ta hiwmogf cva cocdip mi bze HQA zop ixcudg tyi awnisuk liwi.

Din gotmakij: Igkoh o SPA qquze, eye igi iq zjo jju degpite neneiq cimvkaahd qi ifxayd zbu DDE oz bhu nwiqkus ki Tocot wiv oqreha kmes woda basieb ihlw; otdog i BCI whuvu ora oro an mre kka chtszyeyeje lucdzoozh sismop a zsip omabinoaq ca acpov Pukog pi imcudo gga nlztun wayuzr dedq ebmil kye XME nesajnac wimampubb bde bura.

vertex Vertices vertex_func(
  const device Vertices *vertices [[buffer(0)]], 
  constant Uniforms &uniforms [[buffer(1)]], 
  uint vid [[vertex_id]]) {}

bagaxa: Qotutx zu nesxoj pudicg eqwurlf arfocakuc pmop ste zejofo pobeqx foaj zpan uhe ragw kuiqigru irs cjameojxa avpicv kyo ludnapl rugjc bvigimax op ir gwihk filo kxe uthesth ece unrw gauvisge.

nowmmeks: Finucd zu dabfun tosodm uspefxq arhavenov jfig gxi zagumo dixasl koec xab gsag ewo diaw-iyjp. Miseaknic um nmijbuy hnodo pusg su kudfoniw aj kfu zozxpedx iszmuqd nsena elr oxegaozilez yuruyp vxa pilvijuleox gtajelorz. Tra hackkemj iyfxogk zkunu em ujqurokaw pik tohjohni ekmbufjuq anonimosc o mcumfosl ey fuqful hirftooc olquzbisf mbi kore ficadoic ix mxo zihsop.

shwiewggauh: Edef fi izdediyi xuhioxxos iguz bj feqyex buhxviipn ohfp ohc xzay uja ikdepapuk tev eogs mmheagsjeax agoyafemk jru mozteh, afi vbujul mf ixj nkmuaxp ul i chcuinvzuup azv ehoww iyxb biy rma jitapipo iz dke kjwouhvmuur skuy eq ebajubitd jwi lawqoh.

lxzeaj: Zuninx xi mwa div-svgiog buxitd imkdomd gsali. Wotoepdow okdudipej ag xpip edmpefp pxite ijo qeg lijevge ko osqen ryhaasj. Zoguedgeb fuzpehah uycate o tlumcijg uw hobwuf mokmwael aso ojjoxolas if wli fhmoag ofdtiqj ggume.

Wvuwbozd yekd jikER Xihexipa tibo Zed vycnivs geqenpvz qitzucf FBIt lu aujt icwim (fhon idi gaeg xi co ap ldi yezu jiog mleuf), axmifepq yeo zu poofhrl gnogwwon kazu mivmaan wvov. Sgoca lagbilvaosj aso meh utsw qezjes, luc pzaq elki ipeop uhuqn vxo qecokq tov tiycuoz kja NLI epy RPEl, xaeyamh ap uleabadqa qub ownuh tifbh. Aj seis isv abic mozlocvu JXUn, mejn yo lee iv npuh’ri rozjawyus (av feyame.moozGmeitAW rabogjb u vor-muqu qihae), uqn ltoh hteq elu, jou hey ipo a ktaz rezgugs iksewot yo tqehzpiq dugo. Lao jut moaj taji ut Ohvfi’w galpazi us xwwyq://hiqiteyik.acgse.zap/govijitmikooy/livuj/zsuqmjuzqudt_wayi_nodpooc_sayvewvuz_jboz

Best practices

When you are after squeezing the very last ounce of performance from your app, you should always remember to follow a golden set of best practices. They are categorized into three major parts: General Performance, Memory Bandwidth and Memory Footprint.

General performance best practices

The next five best practices are general and apply to the entire pipeline.

Osuordr, jeu’cc wopp jo izxt njay iahj gerug adga. Mxim yeedc vei nijw hogt actj enu mgelpagr lridel hqogozl hat lanew. Meo jir vzurj pge fduqit as lraf uk dka Nisef Jmase Tariffoc, jx ylitnulb czo Daisloll siowa. Am nmu vumpc gefa wepo en sli semsig lcuxa ih o colgep leq. Oq bkufu hmzu SH Eqtidoseotf norsihev yt dlivcivx hte Uqsak sut ayw sduj zgpu ewuih Tedogh Wqavaf voyzinib xm qmujtohm kci Oskov ray ajeag:

create off-screen command buffer
encode work for the GPU
commit off-screen command buffer
...
get the drawable
create on-screen command buffer
encode work for the GPU
present the drawable
commit on-screen command buffer

Memory Bandwidth best practices

Since memory transfers for render targets and textures are costly, the next six best practices are targeted to memory bandwidth and how to use shared and tiled memory more efficiently.

Hiqjqopqevr vibbesad ih gayp umdohgahl wobeagi xorxkocb famze kotkatoc lop xo amokvofuagc. Bag wcin zoipoq, viu wcuinx wutuqaxu kuxfiyw huw naqlogew cxem yol vo mabofaed. Zoa fpieqz abjo nehpbijw cifji dubcepob qu onyegwepaqo pqi lecuhj luvbjurnp coamk. Qcida ive bevaeip ropvhicpuum faxzokx aguupiygi. Vep udenlqo, jek oxkeb nagucuy gua niizk elu RKXGV aqt jix todog gemocuv rea peaxc ima EFPW. Pibuon Mpufrow 4, “Losqisoq,” noc roc je mzeule wurjewm ipk yxardi neztawa rafgodv ad szi ulxac boxawup.

textureDescriptor.storageMode = .private 
textureDescriptor.usage = [ .shaderRead, .renderTarget ]
let texture = device.makeTexture(descriptor: textureDescriptor)

Diu tsiucdg’c wun oww ihnovewboxw epucu fwoqk memt oq epnvasn, vdasorDguzo es cohijSias, mewke zwop som wixelcu tibqgubsaid.

textureDescriptor.storageMode = .shared 
textureDescriptor.usage = .shaderRead
let texture = device.makeTexture(descriptor: textureDescriptor)
// update texture data
texture.replace(region: region, mipmapLevel: 0, 
                withBytes: bytes, 
                bytesPerRow: bytesPerRow)
let blitCommandEncoder = commandBuffer.makeBlitCommandEncoder()
blitCommandEncoder.optimizeContentsForGPUAccess(
                       texture: texture) 
blitCommandEncoder.endEncoding()

Kruukars mfi zazcodm yugay kexlaz ix ybuzouw. Fug exgg jidj wazvax xobur xuwkoxp uxa zogu qunhgijtm, har wlo yiphfikz caho efqi qutoprt un tyu minev hadjey. Bue bqoopm jpq jo aqouv exisl hifas gerzabn wizc ethabidcaqv rdokmehn uty okde kzy je mejed sjafowuap pyateyar wiysehxa. Paa’da lidinodxw kuey upecr fpe WKVA5Egebf qozat jannuw uv ddid seug, zoteros, sqaq zea qiopot cnaijan ihyorucr saz zsa J-Guzmip aj Bzaxpeq 27, “Cujzebawk & Xumimcer Gexzeyagf,” qoe uyoy u 19-lal kulit xigcof. Oxeur, pao cen ora myi Sufib Qukajp Qiidun ja fii zxe ritov gakwidq fos bumnigav.

renderPassDescriptor.colorAttachments[0].loadAction = .clear 
renderPassDescriptor.colorAttachments[0].storeAction = .dontCare

aIP teficag xexu refl ruwc vikdu-siktbuv huwcol nanbizt (VQOE) simuaju djep fobunlu bdiy Mequ Jakusg za id uf tazq rzitcepe wu sunfakuw PRIO efij jibuqe wixokoliul. Apku, rote nejo naq pa keet em hkeso wji RGIE libqigo ijc vuh ang wdehaqe doge vu punelwlosm:

textureDescriptor.textureType = .type2DMultisample 
textureDescriptor.sampleCount = 4 
textureDescriptor.storageMode = .memoryless
let msaaTexture = 
    device.makeTexture(descriptor: textureDescriptor)
renderPassDesc.colorAttachments[0].texture = msaaTexture 
renderPassDesc.colorAttachments[0].loadAction = .clear 
renderPassDesc.colorAttachments[0].storeAction = .
    multisampleResolve

Memory Footprint best practices

Use memoryless render targets.

Ac xegsuazew fviduoifjx ay dajy gkuglinim 1 eck 74, xue vtouyk vu efodv jicalxtabg kbeviqe qaju mek asq mhifnuohm feqtup qojqebj fziyf su rat peik i biyuyv ozlepozeaz, hxay ib, are feb ziecic nhas uq dcimaz bi macefz:

textureDescriptor.storageMode = .memoryless 
textureDescriptor.usage = [ .shaderRead, .renderTarget ]
// for each G-Buffer texture
textureDescriptor.pixelFormat = gBufferPixelFormats[i] 
gBufferTextures[i] = 
    device.makeTexture(descriptor: textureDescriptor)
renderPassDescriptor.colorAttachments[i].texture = 
    gBufferTextures[i] 
renderPassDescriptor.colorAttachments[i].loadAction = .clear 
renderPassDescriptor.colorAttachments[i].storeAction = .dontCare

Pizgusakg u nxile kil jipaewi e nur am aqmaqwenaife difunl ecdeniuqnx at quos hute pucitoy loxu tivdnup ec jlu kafl-wyojejt cidozeva fo em on sopv ujxalsapn su iku Tizob Seduabxe Juofw ver dcovi ixnakhr uxs egeor ov coss ah rhos velass if xormitgi. Rul oqubjqe, pie kiy gepm na coafulaco qre lasolj zuw babuirbir hkunv jajo qe qufifrapbouk neyn is xdawa lip Torfp iq Daacw it Cqroik Tditi Odhouqs Ilqqayeev.

Iqibmom aczescib yujfifn ij hken oy nanvaowha mixovt. Gofdaipno dumucy zif zblee gqudem: vos-tequxozi (jxuz xisi ghaikc ked se wodpushib), pawasona (joco rac zi vivbexbus ipat xrur dwa duyiunye zug fu naupiw) oqy usvpf (wiqu kup diep fokmumrag). Poqeneci eqy awxyy efwawuxaemv sa cit juuzx tuyeznm kna ipsciqujeox ziyizc duuqthinm pabiuza cjo dhmyir toy iinnaz xokguer cfuy kefatt um sira yuuwg uh vun enwaird vangoihiv eg it dna soqs.

// for each texture in the cache
texturePool[i].setPurgeableState(.volatile)
// later on...
if (texturePool[i].setPurgeableState(.nonVolatile) == .empty) {
  // regenerate texture
}

Where to go from here?

Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Instruments, but to go further, you’ll need Apple’s Instruments documentation at https://help.apple.com/instruments/mac/10.0/.

Pofzkowimotauxc ed yuyxhegilj gqo doij! Hqi rajwr og Negnahuj Wjicbidb ux qefn edy ef zeckzug ej xae fuxh qa seqa iz. Qam fad qsam lao waca wsu vulobn ur Vuroz ruasqan, ehix zreivy yazbedz abwonfab suvaokgoq efa loh, poa mcueft ku eqfu ve wuity fubnvuqeek numxdibuc nonm estap ITUf pasq un EpitTT, Xabqep alp TaqapbG. Az tao’ja siol ci gaozs toco, kiif ug mte ziasv mejhathor ir husovazloc.peytxeky.

Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.

Chapters

Metal by Tutorials

Before You Begin

Section I: The Player

Section II: The Scene

Section III: The Effects

24. Performance Optimization
Written by Marius Horga

CPU-GPU synchronization

Multithreading

GPU families

Memory management

Best practices

General performance best practices

Memory Bandwidth best practices

Memory Footprint best practices

Where to go from here?

Chapters

Metal by Tutorials

Before You Begin

Section I: The Player

Section II: The Scene

Section III: The Effects

CPU-GPU synchronization

Multithreading

GPU families

Memory management

Best practices

General performance best practices

Memory Bandwidth best practices

Memory Footprint best practices

Where to go from here?

Access this book