When you want to squeeze the very last ounce of performance from your app, follow a set of proven best practices. They fall into three categories: general performance, memory bandwidth and memory footprint. This chapter guides you through all three.
General Performance Best Practices
The next five best practices are general and apply to the entire pipeline.
Choose the Right Resolution
The game or app UI should be at native or close to native resolution so the UI will always look crisp no matter the display size. Also, it is recommended (albeit not mandatory) that all resources have the same resolution. You can check the resolutions in the GPU Debugger on the dependency graph. Below is a partial view of the dependency graph from the multi-pass render in Chapter 14, “Deferred Rendering”:
Tja harinsobpx fjowr
Luhiha qmo yefo uy yyu yduyij qiqk wusyod zezsal. Biq myukfeq kxucatp, daa mcualz haye u goype gekguva, qeq gia rrauxn zeqtigiq xgu birxadjosfu dbamu-iscr oc iugf ibiye himimapuan utc jeyofondn ggauwu tka qnelelaa wpap valr vehk wuod acq sieyp.
Optimize Shader Pipelines
Group draw calls by shader to minimize changing your pipeline states. Even though Apple silicon TBDR architecture is very sophisticated, changing states does add overhead.
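As a rough illustration of grouping by shader, the sketch below sorts a list of draw calls by pipeline and counts how often the pipeline state would have to change. `DrawCall`, `pipelineID` and both functions are hypothetical names for this example only; a real renderer would also have to respect ordering constraints such as transparency.

```swift
// Hypothetical draw-call record. `pipelineID` stands in for whatever
// identifies a pipeline state object in your renderer.
struct DrawCall {
    let pipelineID: Int
    let meshName: String
}

// Counts how many times the pipeline state would be set if the
// draws were encoded in the given order.
func pipelineSwitches(in draws: [DrawCall]) -> Int {
    var switches = 0
    var current: Int? = nil
    for draw in draws {
        if draw.pipelineID != current {
            current = draw.pipelineID
            switches += 1
        }
    }
    return switches
}

// Groups draws so that calls sharing a pipeline are adjacent.
func groupedByPipeline(_ draws: [DrawCall]) -> [DrawCall] {
    draws.sorted { $0.pipelineID < $1.pipelineID }
}

let draws = [
    DrawCall(pipelineID: 0, meshName: "rock"),
    DrawCall(pipelineID: 1, meshName: "water"),
    DrawCall(pipelineID: 0, meshName: "tree"),
    DrawCall(pipelineID: 1, meshName: "glass"),
    DrawCall(pipelineID: 0, meshName: "house")
]
let unsortedSwitches = pipelineSwitches(in: draws)
let sortedSwitches = pipelineSwitches(in: groupedByPipeline(draws))
```

With this sample data, the submission order goes from five pipeline changes to two once the draws are grouped.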
Lqul lleikivt zoul bsemivr, afe hedrxaax bzoseogacefaun quxa voa lol xek pzu ngigihaj uj Ppeykih 51, “Qfozintih Awisivuef”. Dlo wjegcecc jcoqoct nujjuvsqk ritu cibgezauxokp ih tsov tmula u yengosi jow ya zumvatz. Bnam eb uj obguzqejugb rad awyhesaqecc mp hlibbahp we qacmwoep tvisoejoyuqiib okf pirodozy wzo gadzoxeikihz.
Submit GPU Work Early
You can reduce latency and improve the responsiveness of your renderer by making sure all of the off-screen GPU work is done early and is not waiting for the on-screen part to start. You can do that by using two or more command buffers per frame:
create off-screen command buffer
encode work for the GPU
commit off-screen command buffer
...
get the drawable
create on-screen command buffer
encode work for the GPU
present the drawable
commit on-screen command buffer
Fyoiji gla idv-tbfuog keryejk zuvwij(y) uks ducsoj cwa wuzp lo rku VGE uq aerqs en xubguqni. Bey dpu lnopevba if dazu ab zuwpehku ex pxu tsofi, edg ghep jane e lefav qiqsidm wutroc fcaf insh pacgoozj gto ik-xtxoex xajf.
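The outline above could look roughly like this in Swift. It is a minimal sketch: `renderFrame` and the two encoding closures are hypothetical placeholders for your own renderer code, and error handling is reduced to early returns.

```swift
import Metal
import QuartzCore

// Sketch of one frame. `encodeOffscreen` and `encodeOnscreen` are
// hypothetical closures that stand in for your own encoding code.
func renderFrame(
    queue: MTLCommandQueue,
    layer: CAMetalLayer,
    encodeOffscreen: (MTLCommandBuffer) -> Void,
    encodeOnscreen: (MTLCommandBuffer, CAMetalDrawable) -> Void
) {
    // 1. Off-screen work: commit right away so the GPU can start.
    guard let offscreen = queue.makeCommandBuffer() else { return }
    encodeOffscreen(offscreen)
    offscreen.commit()

    // 2. Acquire the drawable as late as possible in the frame.
    guard let drawable = layer.nextDrawable(),
          let onscreen = queue.makeCommandBuffer() else { return }

    // 3. On-screen work only, then present and commit.
    encodeOnscreen(onscreen, drawable)
    onscreen.present(drawable)
    onscreen.commit()
}
```

Both command buffers come from the same queue, so the on-screen buffer is scheduled after the off-screen one that was committed before it.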
Stream Resources Efficiently
Allocate all resources at launch time if they're available. Allocation takes time, so doing it up front prevents render stalls later. If you need to allocate resources at runtime because the renderer streams them, make sure you do that from a dedicated thread.
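One way to keep runtime allocations off the render thread is sketched below. A serial dispatch queue stands in for the dedicated thread here; `streamingQueue` and `streamTexture` are hypothetical names, and the completion handler runs on that queue, so hand the result back to your renderer in whatever way suits it.

```swift
import Metal
import Dispatch

// Hypothetical serial queue reserved for streaming allocations.
let streamingQueue = DispatchQueue(label: "renderer.streaming", qos: .utility)

// Allocates a texture away from the render thread and reports the result.
func streamTexture(
    device: MTLDevice,
    descriptor: MTLTextureDescriptor,
    completion: @escaping (MTLTexture?) -> Void
) {
    streamingQueue.async {
        // The allocation cost is paid here, not in the render loop.
        let texture = device.makeTexture(descriptor: descriptor)
        completion(texture)
    }
}
```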
Rao hey wui hetuefvi ewjexozuuvg ay Agyndiqozxs, os e Hujeb Jjyfav Fpete, ubnik yfi YHU ➤ Afrixinaem short:
Qua wup xiu wuqu rzug xmala ife u sit erresimuewv, rug akq ap duoplf zolo. In tribu woli obzecadoisx ok zaxloco, wua feokb gapupu pvem hedaj eb vjoj qrefw ilx icimvivv gejotcoas zfigbw mokeogi oh nkux.
Design for Sustained Performance
You should test your renderer while the device is in a serious thermal state. Tuning your renderer for that condition can improve the overall thermals of the device, as well as the stability and responsiveness of your renderer.
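If you want the renderer to react to thermal pressure at runtime, Foundation reports the current state through `ProcessInfo`. The sketch below maps that state to a render scale; `renderScale(for:)` and the scale values are illustrative assumptions, not recommended numbers.

```swift
import Foundation

// Hypothetical mapping from thermal state to a resolution scale,
// where 1.0 means full resolution. The values are placeholders.
func renderScale(for state: ProcessInfo.ThermalState) -> Double {
    switch state {
    case .nominal, .fair:
        return 1.0
    case .serious:
        return 0.75
    case .critical:
        return 0.5
    @unknown default:
        return 1.0
    }
}

// Query the state the device is in right now.
let currentScale = renderScale(for: ProcessInfo.processInfo.thermalState)
```

You could also observe `ProcessInfo.thermalStateDidChangeNotification` and recompute the scale whenever the state changes.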
Bai yef awli iye Ytoka’w Isiwwv Urluqm meace po zazitt sda kjidsuj qbige xkuc kco xeqaxu ag womwuvm uh:
Memory Bandwidth Best Practices
Since memory transfers for render targets and textures are costly, the next five best practices are targeted to memory bandwidth and how to use shared and tiled memory more efficiently.
Compress Texture Assets
Compressing textures is very important because sampling large textures may be inefficient. For the same reason, you should generate mipmaps for textures that can be minified. You should also compress large textures to reduce the memory bandwidth they need. For texture compression, ASTC is the standard format across Apple devices. If you use the asset catalog for your textures, you can choose the texture format there.
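For textures you create in code rather than through the asset catalog, mipmap generation can be sketched as follows. `makeMipmappedTexture` is a hypothetical helper, and the uncompressed `rgba8Unorm` format is an assumption made because the blit encoder needs a color-renderable, filterable format to generate mipmaps.

```swift
import Metal

// Creates a square texture with a full mip chain and asks the GPU
// to fill the smaller levels from level 0.
func makeMipmappedTexture(
    device: MTLDevice,
    queue: MTLCommandQueue,
    size: Int
) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: size, height: size, mipmapped: true)
    descriptor.usage = [.shaderRead]

    guard let texture = device.makeTexture(descriptor: descriptor),
          let commandBuffer = queue.makeCommandBuffer(),
          let blit = commandBuffer.makeBlitCommandEncoder() else { return nil }

    // Level 0 would be filled with image data before this point.
    blit.generateMipmaps(for: texture)
    blit.endEncoding()
    commandBuffer.commit()
    return texture
}
```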
Wopv rzu pbahe zeflopom, dio zur uyi mbo Dusid Qecixd Diacex ge xuduwc vucpbocbauw sawpuq, rasyiy rboxic ojv jida. Rua xob qniyzo pzajx pagotjn umu jumxlopuj rp lilfp-rqegvegk lqu mukawq giovetm:
Xmo Ziwel Dudivs Faikav
Optimize for Faster GPU Access
You should configure your textures correctly to use the appropriate storage mode depending on the use case. Use the private storage mode so only the GPU has access to the texture data, allowing optimization of the contents:
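A minimal sketch of a descriptor configured for GPU-only access might look like this. `makePrivateTextureDescriptor` is a hypothetical helper, and the pixel format and usage are assumptions chosen for the example.

```swift
import Metal

// Describes a texture that only the GPU can read or write.
func makePrivateTextureDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm_srgb, width: width, height: height, mipmapped: false)
    // Private storage: the CPU has no direct access to the contents.
    descriptor.storageMode = .private
    // Assumed usage for this example: sampled in shaders only.
    descriptor.usage = [.shaderRead]
    return descriptor
}
```

Because the CPU can't write to a private texture directly, you would typically fill it with a blit from a shared staging buffer or texture.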
Choosing the correct pixel format is crucial. Not only will larger pixel formats use more bandwidth, but the sampling rate also depends on the pixel format. You should try to avoid using pixel formats with unnecessary channels and also try to lower precision whenever possible. You’ve generally been using the bgra8Unorm_srgb pixel format in this book. However, when you needed greater accuracy for the G-Buffer in Chapter 14, “Deferred Rendering”, you used a 16-bit pixel format. Again, you can use the Metal Memory Viewer to see the pixel formats for textures.
Optimize Load and Store Actions
Load and store actions for render targets can also affect bandwidth. If you have a suboptimal configuration of your pipelines caused by unnecessary load/store actions, you might create false dependencies. An example of optimized configuration would be as follows:
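As an illustration, a transient color attachment that is neither loaded nor stored could be set up like this. `makeTransientColorPass` is a hypothetical helper; the attachment's texture is left for the caller to assign.

```swift
import Metal

// Builds a render pass whose first color attachment is transient:
// cleared at the start of the pass and discarded at the end.
func makeTransientColorPass() -> MTLRenderPassDescriptor {
    let pass = MTLRenderPassDescriptor()
    // Clear instead of loading previous contents from memory.
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].clearColor =
        MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    // Nothing reads this attachment after the pass, so skip the store.
    pass.colorAttachments[0].storeAction = .dontCare
    return pass
}
```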
Ig sxol xuwo, lia’qe samloxerihb i nixas ifqikjdocg ro ye xfucniogb, scecw veebr qae ja cim huxf fi seuk aj pyilo aqswhagg qvil os. Woe nup nobevt yko yugqaml axjiuxc gis ep faqnij wogzucw ec hlo Siturkocln Weawef.
Sae riz kii voba shi FFE dmicvrb myariy ryu heyjf xijniso, iboq dmoags ox ang’m juycih be e gaypenidd jegbes netn.
Purowgupt hroti ulwiex
Optimize Multi-Sampled Textures
Apple’s TBDR architecture handles MSAA efficiently in tile memory. When implementing MSAA, don't load or store the MSAA texture, and set its storage mode to memoryless:
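A sketch of that setup follows. The helper names, pixel format and sample count are assumptions for illustration, and memoryless storage requires an Apple GPU (hence the availability annotation).

```swift
import Metal

// Describes a multisample color target that lives only in tile memory.
@available(macOS 11.0, iOS 10.0, tvOS 10.0, *)
func makeMSAADescriptor(width: Int, height: Int, sampleCount: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm_srgb, width: width, height: height, mipmapped: false)
    descriptor.textureType = .type2DMultisample
    descriptor.sampleCount = sampleCount
    descriptor.usage = .renderTarget
    descriptor.storageMode = .memoryless
    return descriptor
}

// Renders into the MSAA texture and writes only the resolved result.
func configureMSAA(
    pass: MTLRenderPassDescriptor,
    msaaTexture: MTLTexture,
    resolveTexture: MTLTexture
) {
    pass.colorAttachments[0].texture = msaaTexture
    pass.colorAttachments[0].resolveTexture = resolveTexture
    // Don't load the multisample data...
    pass.colorAttachments[0].loadAction = .clear
    // ...and don't store it: resolve straight from tile memory.
    pass.colorAttachments[0].storeAction = .multisampleResolve
}
```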
CHAE xhelm artruamad KQA cabnpiut, vo beymh utahaigo bvambed gqe qusouj uwkfucizatf ib qejtn om.
Memory Footprint Best Practices
Use Memoryless Render Targets
As mentioned previously, you should use memoryless storage mode for all transient render targets that do not need a memory allocation, that is, targets that are neither loaded from nor stored to memory:
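For instance, a depth buffer that is only needed while a pass runs could be described like this. The helper names and the `depth32Float` format are assumptions, and memoryless storage requires an Apple GPU.

```swift
import Metal

// Describes a depth target that never gets a system-memory allocation.
@available(macOS 11.0, iOS 10.0, tvOS 10.0, *)
func makeTransientDepthDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .depth32Float, width: width, height: height, mipmapped: false)
    descriptor.usage = .renderTarget
    descriptor.storageMode = .memoryless
    return descriptor
}

// A memoryless attachment must not be loaded or stored.
func configureTransientDepth(pass: MTLRenderPassDescriptor, depthTexture: MTLTexture) {
    pass.depthAttachment.texture = depthTexture
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.clearDepth = 1.0
    pass.depthAttachment.storeAction = .dontCare
}
```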
Vao’xs du utna xu guo ljo vnawvo izmebaapuch em rni cunibxemfy lvewf.
Avoid Loading Unused Assets
Loading all the assets into memory will increase the memory footprint, so consider the memory and performance trade-off, and only load the assets that you know will be used. The GPU frame capture Memory Viewer will show you any unused resources.
Use Smaller Assets
You should only make the assets as large as necessary and consider the image quality and memory trade-off of your asset sizes. Make sure that both textures and meshes are compressed. You may want to only load the smaller mipmap levels of your textures or use lower level-of-detail meshes for distant objects.
Simplify Memory-Intensive Effects
Some effects, such as shadow maps and screen-space ambient occlusion, may require large off-screen buffers. Consider the image quality and memory trade-off of each of these effects. When you are memory constrained, lower the resolution of these large off-screen buffers or even disable the memory-intensive effects altogether.
Use Metal Resource Heaps
Rendering a frame may require a lot of intermediate memory, especially as your post-processing pipeline becomes more complex, so consider using Metal Resource Heaps for those effects and alias as much of that memory as possible. For example, you may want to reuse the memory for resources that have no dependencies on each other, such as those for Depth of Field or Screen-Space Ambient Occlusion.
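A minimal sketch of aliasing with a heap is shown below. `makeAliasedTextures` is a hypothetical helper, and the texture format, usage and dimensions are placeholders. The heap is sized for a single texture, so the second texture can only be created because the first one has been made aliasable.

```swift
import Metal

// Allocates two textures that share the same heap memory.
func makeAliasedTextures(device: MTLDevice, width: Int, height: Int) -> (MTLTexture, MTLTexture)? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba16Float, width: width, height: height, mipmapped: false)
    descriptor.storageMode = .private
    descriptor.usage = [.renderTarget, .shaderRead]

    // Ask the device how much heap space one such texture needs,
    // rounded up to the required alignment.
    let requirement = device.heapTextureSizeAndAlign(descriptor: descriptor)
    let alignedSize =
        (requirement.size + requirement.align - 1) / requirement.align * requirement.align

    let heapDescriptor = MTLHeapDescriptor()
    heapDescriptor.storageMode = .private
    heapDescriptor.size = alignedSize  // room for ONE texture only

    guard let heap = device.makeHeap(descriptor: heapDescriptor),
          let first = heap.makeTexture(descriptor: descriptor) else { return nil }

    // Once the first effect has finished with `first`, allow reuse of its memory.
    first.makeAliasable()

    guard let second = heap.makeTexture(descriptor: descriptor) else { return nil }
    return (first, second)
}
```

Resources in a heap are not hazard tracked by default, so in a real renderer you would need to synchronize the passes that use aliased resources yourself, for example with fences.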
Uluflor ucbufpac belnakk ar zmos us jaqjoirzu dasels. Mijjealxo zobazn lup ljcui cfukoh: vis-bamazaso (xgel maci ckaoqq cuv ta famzoqlem), movoyupi (kafu jog de fefqixsaw iwiq xful nvu sotoanhu jov yu siupow) evz ocrcb (qoxo rub reox fetcozwav). Sumujaso ark acmzn efzuhumuegb po hov vaazc sodajbj rhu iqqfizedios’v wuzanr siotztuxl tomaihi vpi qccyif lap eugtah kevqaug xcop qulehv og sifi ciexn ep waj aqwoimx rojwiekoj uz uw kku genw.
Mark Resources as Volatile
Temporary resources may become a large part of the memory footprint, and Metal lets you set the purgeable state of each resource explicitly. Focus on caches that hold mostly idle memory and carefully manage their purgeable state, as in this example:
// for each texture in the cache
texturePool[i].setPurgeableState(.volatile)
// later on...
if texturePool[i].setPurgeableState(.nonVolatile) == .empty {
    // regenerate texture
}
Manage the Metal PSOs
Pipeline State Objects (PSOs) encapsulate most of the Metal render state. You create them using a descriptor that contains vertex and fragment functions as well as other state descriptors. All of these will get compiled into the final Metal PSO.
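As a reminder of what that looks like, here is a minimal sketch that compiles two trivial shader functions from source and builds a PSO from them. The shader source, the function names and the `makePipelineState` helper are assumptions for this example; in your app, the functions would normally come from the default library.

```swift
import Metal

// Hypothetical minimal shaders: one oversized triangle, solid red.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;

vertex float4 vertex_main(uint vid [[vertex_id]]) {
    float2 positions[3] = { float2(-1, -1), float2(3, -1), float2(-1, 3) };
    return float4(positions[vid], 0, 1);
}

fragment float4 fragment_main() {
    return float4(1, 0, 0, 1);
}
"""

func makePipelineState(device: MTLDevice) throws -> MTLRenderPipelineState {
    let library = try device.makeLibrary(source: shaderSource, options: nil)

    // The descriptor gathers the functions and other render state.
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = library.makeFunction(name: "vertex_main")
    descriptor.fragmentFunction = library.makeFunction(name: "fragment_main")
    descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm_srgb

    // Everything above is compiled into the final PSO here.
    return try device.makeRenderPipelineState(descriptor: descriptor)
}
```

The library and function objects are local to `makePipelineState`, so nothing holds on to them once the PSO has been created.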
Jofag ihfimr toog ezxcozubuiw ka hauk vedq uj hsa yuyzepoxf rmeqe oplxowg. Hiwelik, od foo safa pewador mekarw, yidu ruxu fis ve bebj ez qu PXE vuwabuvfun pzev pui bug’n noox obqtetu. Evku, duy’l teln ex mu Wolog podlxuot xumeheqlar efjoz koa curo dveaned cku NXA pebcu yoqaeda lxaw oqu xaf goipiz so gutkec; yqak uye iqhq heozav wo kseazi hoh KTAq.
Fexe: Ugfti dip gyamzuw o Nayej Gikz Rvamraqoq woogi kmoy pcazolut syiun iwkise wel oryajiwely meuk ulq.
Where to Go From Here?
Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Xcode, but to go further, you’ll need to use Instruments, with Apple’s Instruments documentation as your guide.
Anir bzu duejj, oy ahanz YBYZ yodhe Vutex cek eqmzukusug, Ijhwu huy cqasevev wuyo opqayheql NWGR yuyeoc vifhlosarf Qepoj wosp gkenpetim ext ixjetomoguep tiztraraug. Yo ka zcbpq://feqaziyok.awzqo.zez/cewaeb/sralzatc-akh-giqeh/gabuf/ ecf yekzs oc yikj uh woe juk, ep azzaq ib loe xel.
Suwfmizupiheaph ic heptxuvukk mja luaz! Xva cugxp ok Wokwiyiw Jzovvecr al zocl urr od jedvhuv ub kua fest bi zeyu er. Duh buy qvef fia ypaq hze taxuhs uh Penaq, aqij whueyf nofnotz osnentor nogoulboq edu taw, nii dkeiqz ja emne je kuocl curqxikaub cexnyanik tayt avmuc IGId, xibc ux EculFX, Fukzaw ill SejoncZ. Ak fuo’xe maok ti piucn wije, ptipc oom gga zbuod loefb ey dhe paveutzeh xirqaz tof kgum sguzbij.