While working with the models in this lesson, you've likely noticed that they can be quite large. Yet these are still tiny compared to some of the largest models, such as Stable Diffusion, which can run as large as 8 GB, and the recent Llama models, which can reach tens of gigabytes.
These large sizes can be a poor fit for mobile devices where storage and RAM are at a premium. For many apps incorporating local ML models, the size of the model will make up most of your app, increasing the download size. Putting off the download until later only pushes the problem into the future without solving it.
Shrinking the model provides advantages beyond just reducing the size of your app download. A smaller model can also run faster because less data needs to move between the device's memory and CPU.
The first approach to addressing this problem is to reduce the model size during training. You'll see that many models come trained with different numbers of parameters. For example, the Meta Llama 3 model comes in versions with eight billion and 70 billion parameters.
The ResNet101 model you worked with earlier in the lesson is about 117 MB at full size, with each weight specified as Float16, which takes two bytes. Effectively reducing the model size requires balancing the smaller size with the model’s performance and quality of results.
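You can turn this per-weight arithmetic into a quick back-of-the-envelope estimate: total size is simply the weight count times the bytes per weight. The ten-million weight count below is a made-up example for illustration, not the parameter count of any real model.

```python
# Back-of-the-envelope model sizing: size = number of weights x bytes per weight.
# The ten-million weight count is a made-up example, not a real model's count.
def model_size_mb(num_weights, bytes_per_weight):
    return num_weights * bytes_per_weight / (1024 * 1024)

float16_mb = model_size_mb(10_000_000, 2)  # Float16: two bytes per weight
int8_mb = model_size_mb(10_000_000, 1)     # Int8: one byte per weight

print(f"Float16: {float16_mb:.1f} MB, Int8: {int8_mb:.1f} MB")
```

Halving the bytes per weight halves the model size, which is why the storage format matters so much on mobile devices.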
Reduction Techniques
There are three primary techniques used in Core ML Tools to reduce model size. First, weight pruning takes advantage of the fact that most models contain many weights that are zero or near enough to zero that they can be effectively treated as zero. If you store only the non-zero values, you save two bytes for each value dropped. For the ResNet101 model, that can save about half the size. You can tune the amount of compression by adjusting the threshold below which a weight is set to zero.
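To make the idea concrete, here's a minimal sketch of magnitude pruning in plain Python (this illustrates the concept only, not the Core ML Tools API; the threshold is an arbitrary example value):

```python
# Magnitude pruning sketch: weights whose absolute value falls below a
# threshold are treated as exactly zero, so only the survivors need storing.
def prune(weights, threshold):
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.003, -0.9, 0.0001, 0.42, -0.002, 1.7, 0.0, -0.0004]
pruned = prune(weights, threshold=0.01)

# Fraction of weights that are now zero and need not be stored individually
sparsity = pruned.count(0.0) / len(pruned)
print(f"sparsity: {sparsity:.1%}")
```

The higher the threshold, the higher the sparsity, and the more you save — at the cost of discarding small but possibly meaningful weights.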
The second technique is quantization. This technique reduces the precision from a Float16 to a smaller data type, usually Int8. An Int8 stores values between -128 and 127. This will save half the size of the original model.
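The arithmetic behind this is straightforward. The following plain-Python sketch (again a concept illustration, not the Core ML Tools API) shows linear symmetric quantization: a single scale maps the largest absolute weight to the edge of the symmetric Int8 range, and every weight is rounded to the nearest step:

```python
# Linear symmetric quantization sketch: map floats into [-127, 127] using a
# single scale derived from the largest absolute weight.
def quantize_symmetric(weights):
    scale = max(abs(w) for w in weights) / 127
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

quantized, scale = quantize_symmetric([0.4, -1.0, 0.25, 0.0])

# Dequantize to see the small rounding error the compression introduces
restored = [q * scale for q in quantized]
```

Each stored value now takes one byte instead of two, and the rounding step is where the (usually small) accuracy loss comes from.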
The third technique replaces each weight value with an index into an index table. This is known as palettization, which works by replacing weights with similar values with a single value and storing that value in the index table. You then replace the weight with the index value. The amount of compression depends on the number of values in the index table. For some models, you can need as few as four index values, resulting in a compression of 8x. Some model types also support using different index tables for different model parts.
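Here's a toy sketch of that lookup-table idea in plain Python (not the Core ML Tools API): a four-entry palette means each weight needs only a two-bit index. The palette values here are arbitrary examples; real palettization chooses the palette by clustering the actual weights.

```python
# Palettization sketch: store one small index per weight plus a lookup table.
def palettize(weights, palette):
    # Each weight maps to the index of its nearest palette entry.
    return [min(range(len(palette)), key=lambda i: abs(w - palette[i]))
            for w in weights]

palette = [-1.0, 0.0, 0.5, 1.0]          # 4 entries -> 2-bit indices
weights = [0.9, -0.8, 0.1, 0.45, -0.1]
indices = palettize(weights, palette)

# Reconstruct the approximate weights from the table
restored = [palette[i] for i in indices]
```

Going from 16 bits per weight to a 2-bit index is where the 8x figure comes from; the price is that every weight snaps to its nearest palette entry.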
Each method works best for different distributions of model weights. However, all lose information found in the original model. This means that you must balance the amount of compression with the reduction of model accuracy and find the best compression for your use case.
This compression can be done either after the training, as you'll do in this lesson, or during training. Doing compression during training usually lets you get the same accuracy at a higher compression rate at the cost of added complexity and time in the training process.
Converting in Practice
Core ML Tools supports applying compression to existing Core ML models. Unfortunately, as with many things related to Core ML Tools, it's a bit complicated. A separate set of packages works on the older .mlmodel files compared to the newer .mlpackage files. In this section, you'll work a bit with the latter.
Open your Python development environment as usual and then start Python. Now enter the following code one line at a time:
import coremltools as ct
import coremltools.optimize as cto
Sjic ixreckw JaluJZ Yuosc ixt tbu uzxegalaruap calpiriob feh na Soto XF Yaarc 2. Nim ufnap:
This code creates a configuration that tells Core ML Tools to quantize the model by converting it using linear symmetric quantization. You then create an OptimizationConfig object set to these values and run the method to compress the weights. Now enter:
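Gathered into one place, the steps described above might look like the following sketch. The file names are assumptions for illustration; the configuration and function names come from the coremltools.optimize.coreml module:

```python
import coremltools as ct
import coremltools.optimize as cto

# Load the existing .mlpackage model (the file name is an assumption)
model = ct.models.MLModel("ResNet101.mlpackage")

# Configure linear symmetric Int8 quantization for every weight in the model
op_config = cto.coreml.OpLinearQuantizerConfig(mode="linear_symmetric")
config = cto.coreml.OptimizationConfig(global_config=op_config)

# Compress the weights and save the result under a new name
compressed = cto.coreml.linear_quantize_weights(model, config=config)
compressed.save("ResNet101-quantized.mlpackage")
```

Because this operates on a saved model after training, it needs no training data — only the .mlpackage file on disk.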
This will save your model to the disk with a different name. If you open the two files, you'll notice the new file is half the size of the previous one. You can see that converting from a 16-bit value to an eight-bit value should reduce the size by half.
Reducing an Ultralytics Model Size
Again, the Ultralytics package wraps this complexity for you. Enter the following code:
from ultralytics import YOLO

# Load the YOLOv8x model trained on the Open Images V7 dataset
model = YOLO("yolov8x-oiv7.pt")

# Export to Core ML with non-maximum suppression and Int8 quantization
model.export(format="coreml", nms=True, int8=True)
This differs from your earlier export by adding the int8=True parameter, which activates Int8 quantization. This will take a bit longer to run, but when it completes, you'll have a file that's roughly half the size of the original file.
How does this compression and optimization affect the speed and accuracy of the models? You'll explore that in the next lesson as you integrate these models into an iOS app.
This content was released on Sep 19 2024. The official support period is 6 months from this date.
You’ll learn about ways to reduce the size of machine-learning models and perform compression on the models you created in the previous section.