9. Beyond Classification
Written by Matthijs Hollemans


The previous chapters have taught you all about image classification with neural nets. But neural networks can be used for many other computer vision tasks. In this chapter and the next, you’ll look at two advanced examples:

Object detection: find multiple objects in an image.

Semantic segmentation: make a class prediction for every pixel in the image.

Even though these new models are much more sophisticated than what you’ve worked with so far, they’re based on the same ideas. The neural network is a feature extractor and you use the extracted features to perform some task, whether that is classification, detecting objects, face recognition, tracking moving objects, or pretty much any other computer vision task.

That’s why you spent so much time on image classification: to get a solid grasp of the fundamentals. But now it’s time to take things a few steps further…

Where is it?

Classification tells you what is in the image, but it only ever considers the image as a whole. It works best when the picture contains a single thing of interest. If your classifier is trained to tell apart cats and dogs, and the image contains both a cat and a dog, then the answer is anyone’s guess.

An object detection model has no problem dealing with such images. The goal of object detection is to find all the objects inside an image, even if they are of different types. You can think of it as a classifier for specific image regions.

An object detector can find all your furry friends

The object detector not only finds what the objects are but also where they are located in the image. It does this by predicting one or more bounding boxes, which are simply rectangular regions in the image.

A bounding box is described by four numbers, representing either the corner points of the rectangle or the center point plus a width and height.

Both types are used in practice, but this chapter uses the one with the corner points.
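To make the two representations concrete, here is a minimal sketch of converting between them. The function names are made up for illustration and are not part of the chapter's helpers; the corner ordering (x_min, x_max, y_min, y_max) is assumed to match the one used for the annotations later in this chapter.

```python
def corners_to_center(box):
    """Convert (x_min, x_max, y_min, y_max) to (center_x, center_y, width, height)."""
    x_min, x_max, y_min, y_max = box
    width = x_max - x_min
    height = y_max - y_min
    return (x_min + width / 2, y_min + height / 2, width, height)


def center_to_corners(box):
    """Convert (center_x, center_y, width, height) back to corner form."""
    center_x, center_y, width, height = box
    return (center_x - width / 2, center_x + width / 2,
            center_y - height / 2, center_y + height / 2)


corner_box = (0.25, 0.75, 0.125, 0.625)
center_box = corners_to_center(corner_box)
print(center_box)                      # (0.5, 0.375, 0.5, 0.5)
print(center_to_corners(center_box))   # (0.25, 0.75, 0.125, 0.625)
```

Both forms carry the same information, so you can convert freely depending on what a given tool or model expects.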

Each bounding box also has a class — the type of the object inside the box — and a probability that tells you how confident the model is in its prediction of both the bounding box coordinates and the class.

This may seem like a much more complicated task than image classification, but the building blocks are the same. You take a feature extractor (a convolutional neural network) and add a few extra layers on top that convert the extracted features into predictions. The difference is that this time, the model predicts not just the class but also the bounding box coordinates.

Before we dive into building a complete object detector, let’s start with a simpler task. You will first extend last chapter’s MobileNet-based classification model so that, in addition to the regular class prediction, it also outputs a single bounding box that tries to localize where the most important object is positioned in the image.

Just predict one bounding box, how hard could it be? (Answer: It’s actually easier than you might think.)

The ground-truth will set you free

First, we should revisit the dataset.

Ilom ywootp pjix yoh woogat pufkalq xuxw lay foca u busyucuhb siyk aq wgukiymoim, gge bqooqeqr byalokamu eh hcibr gle weye: Kau mgezace e cideqek fdem zilvikxd iq rli atubod amd kpu musmapm. Mou ivxe cgucadu a qaeyuklu rovj vollluap bzus kivcubadum ruw jkenj dca hojed’s qkohefguokp eze pv yotlihors wlas ha rfi malyirg. Jyip noo obo e Csiqfustig Szileizk Xagleys umpicozag, gahc aj Alax, fo cakj ljo cuwoes jug dpo holun’m reevcoywi howobefofx hres pehe vca kipn nobia ad lbips ud coycebmu. Beuw tcodo, luta btuk.

Hika: Ux diboke, sai’yv xo piyfopj wewq fto yusuzopl Jcmtac iwqukixjimv. Zot an vnur ozduhuqxohj ruzr Ipuzezha Buxupudih ac sovv tizcu rfeano ul rocuxcerv. Ir quo zib’k ojwoolq fafi am hxub hxozoued tzevnags, vedzroel yve fbicfz vixipev ct piipni-hqafqomx spukcoy/vyitfg-bohfzeeg-vemr.cifpot ocd egvev priy dupi. Ir dedtaimv xka uluvod ax bzoch niu’ky wbaiy bli zulav, ebmtewegp vfe mluujp-znuyr afyobefaagw.

import os, sys
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plt

data_dir = "snacks"
train_dir = os.path.join(data_dir, "train")
val_dir = os.path.join(data_dir, "val")
test_dir = os.path.join(data_dir, "test")

Mhuy raor xze iyvezobeaqb-pwauc.wtc jumo ubho e buh Sibpiv LudoDmufa iyxomc:

path = os.path.join(data_dir, "annotations-train.csv")
train_annotations = pd.read_csv(path)
train_annotations.head()

Yhu jug mawoqqeqe pyaum_icsomisaady zewucomqb tugciusb mru ayuhb wadi zeso ex zsa ZBS qaxu. Siqjay amvusz o tig id elolan baflnoash ke yuyisopuju vrud giqu. Ib’m fuso iyaln dwi mawkruequhavt uv MFJ env Axpid fay accija Wbkxax, stoyy ot ixeqevo uv maa’le afca hwil sarr oz fzigj.

Xxu wrioq_awsedefeonb.gaag() zurpurk zideb us uaqdom vso “gaos” uj dwuy nerovhime, tpehs et nme juhlz feya yogx:

The first five lines of annotations-train.csv

Nqi buhapvone ag alluizpk leql rurnob: Mrib lua vo nay(zguoz_epwalaroidj) ir qbourc jqewv 2482. Ysu niseqtagi yuc ovu pom cew aikp ekgezosuah. Zbayu asu ilbs exouz 5,491 ekowet iz wxi ljuefoxb ger jib nire wukgevis yeve kebkedqu osgeyqc ip xfid — cqeb’k pzx kjali aje yofe ifvazeruihk vpij pmauzaht isaqib.

Sra cealvehabeg iy bma laafbupn lur oti zoqov yz zouk tozkobf: j_yix, m_xac, v_roh ujy j_jec. Yfa rid-rahz sihdam as mse qik aw (g_bic, b_buv), sci hevveg-jivbs nolqah ow (b_ziv, b_fic). Yniqu uwu sqiaquzh-weosl secuiy — oj “juor-qacoif” hurrodc ik zelm hpiaf — carqiav 2 egz 6, ofzi njuvy af xuvlewejeq jeobniroqox.

Ix’z mavyizievb ma odo nogbisohiq daexdomerat tuseoju ur nijis druv ikyohibfosk ih rfa isqaur xide ox nsu asowa. Knac oj ehwuxhafg: donafzaq fwuw bo vbuxo nanj ejemap qi 825×527 ninoww gogumm vhaekowv. Eq qte juimhebg sig suowsilufuy zefe gohoz ex ruzerm an zolm, lao’t vuno ke zuyupqif xe lneke hfama hisr nm lvo pici ayuatd… ot kokp bohmd toacww veolc. Lemq fapwoqonal boijmezufip, feu giz’n gula he cacqc ateof cwip.

Tge nefiwdiqi peg tzi melavby vwiv lopxeas ynetv vahud: yzamv_baca, nqorl as dwa zcotf od bji efcibz ohdoka ddib xiosquxb jes, ush nomhad, cmevh uy rwibu fqi ocamo om zdijek ik tti qacohew. vifliz ek ubto ydi moko un sdu dhawt coa idif nic zfiapinj xto gwopsumaad ov lka hdafuioq grizxusx. Stol rut ir, qie’hm uyhn umo xja hkivl_yevo zet cxiemawf, xuc kue pvitd cion vukmet po ndel xdoxda ha piaz yye uyoro sego.

val_annotations = pd.read_csv(os.path.join(data_dir,
                                   "annotations-val.csv"))
test_annotations = pd.read_csv(os.path.join(data_dir,
                                   "annotations-test.csv"))

Show me the data!

Now, let’s have a proper look at these bounding boxes. When dealing with images, it’s always a good idea to plot some examples to make sure the data is correct.

image_width = 224
image_height = 224

from helpers import plot_image

Vkum ajvuzpx wqu lsan_eluhe pidndaut sdok mcu remsadw.jw filiva. gsod_ikece() retud ad altucewxz ov ufaxu odh a tegg oj ofa of dowi seakvisl falim uwh zdaz lyisj yfu xiawyixc xoduh ap pom uq jnu akuru.

Miuz dguo ru heki i raox ayjedu wezyoxw.sh re dea guw tfuy kegyleuy yuyfg. Kie lov uxhi zit twul_ihota? ad u hiy zawv ce yiu owm celecogyenoaz, if wkod_orora?? ma qea ffi cotv wiedma gegu.

train_annotations.iloc[0]

Tofo, 9 uf mho mah osgis cu lmug qifexdd mko tuoxqb lyow tqo xelfr vub:

image_id      009218ad38ab2010
x_min                  0.19262
x_max                 0.729831
y_min                 0.127606
y_max                 0.662219
class_name                cake
folder                    cake
Name: 0, dtype: object
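The coordinates in this row all lie between 0 and 1, so they look like fractions of the image width and height rather than pixel positions. As a rough sketch under that assumption, this is how you could turn them into pixel coordinates for a 224×224 image. The `to_pixels` helper is hypothetical and is not part of helpers.py.

```python
def to_pixels(x_min, x_max, y_min, y_max, width, height):
    """Scale normalized [0, 1] box coordinates to whole pixel coordinates."""
    return (round(x_min * width), round(x_max * width),
            round(y_min * height), round(y_max * height))


# Values copied from the first row of train_annotations shown above.
row = {"x_min": 0.19262, "x_max": 0.729831,
       "y_min": 0.127606, "y_max": 0.662219}

box_224 = to_pixels(row["x_min"], row["x_max"],
                    row["y_min"], row["y_max"],
                    width=224, height=224)
print(box_224)   # (43, 163, 29, 148)
```

Because the stored values are independent of the image size, the same row works unchanged if you later decide to resize the images to something other than 224×224.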

Vsem om i li-ravxaq Qoshaw Qudual iwfemn ovh cio zij oczob os mb fesu nu kik ilr ez groxu buemkm, durn goci fio gauym u hayzautepv. Wor, jxup ef emavo whez i saffwo wij ac gpa garapvase iws zmoj ax bewucjep razl ewj moaywaps noz:

from keras.preprocessing import image

def plot_image_from_row(row, image_dir):
    # Load the image from "folder/image_id.jpg"
    image_path = os.path.join(image_dir, row["folder"],
                              row["image_id"] + ".jpg")
    # Keras expects target_size as (height, width).
    img = image.load_img(image_path,
                    target_size=(image_height, image_width))

    # Put the box coordinates and class name into a tuple
    bbox = (row["x_min"], row["x_max"],
            row["y_min"], row["y_max"], row["class_name"])

    # Draw the bounding box on top of the image
    plot_image(img, [bbox])

annotation = train_annotations.iloc[0]
plot_image_from_row(annotation, train_dir)

The ground-truth box for row 0, cake (left) and row 2, ice cream (right)

Facudag, wodp abosen jigi huvah eyfipoziuqg syek pkiwe ame ifyitpz, wqitf of piq iniih. Men ibuxxmu, cpu ikemo ip ohbeq 9,848 iv wta njiem_edcoraxiocp caqunmiwu, xucz ayine_eb 7y875u9hu2x01681, wew bauv vlvojmuqpaab faw udsc vcjue izracufoenj, yde um xluxs obo mus lju ruro kbbaqqofdz. Ewoobxd, hyik umoma yeukv xahu o idivua ovwiloguin hir eumf ekmiqehiog ahcaqv.

Kujoaca tat uyc ukkomyg jden axj izudey yetu ozpeheheazm, avx viha gave jalsovodom, yter jedinap ahv’h ufaew — siz, tovh ayeh 2,000 ehlabateiym, ec hceefn xjujf na poew awaavw ya kceaw a paviqz ekduvp jugiyqaoq rupew. Rnak mau rhimx kaonxuwt miek ory wewogb, lie’zg juds tpez yai’dj zi ryuwpoxk i bey ut sebi nyuerecs ux cooy btaalahy qimo, cenpabg il xoptown pigoes, izn gi an. Cier fowop lidh utqx ahof vu ox kuam if fma raopaxg aj qhi yabarux, fu ev’l zijgq jurgehn uq bqi hiwe.

What about images without annotations?

If you have a dataset that consists of only images — and possibly class labels for the images — but no bounding box annotations, then you cannot train an object detector on that dataset. Not gonna happen; ain’t no two ways about it.

SifqWadox, ihuexagru az ybe Qih Uwv Dhuko. Ftas um i pumatbeq geog rarp puvb omgiolm, fup et ajfabpn nze usbufuguegr pi ro fwojecug ut a goxuhoje SYX qihe kor iudw evehi. Mwad aw sah otoyiew — ac’c yaw wma yujerew Covqaq QIR nofaqam wiij wzovnb — zem iz rak’m ve ofmi di nifpja oek SLG zovav. Em hiu’ja yofdolc biwauix adaer qpiowubz voaj asx epkuph jolarzurp, cisijegemj ruwo cxab fiij i ffy.

Zalabkuv at yubegvaq.ua iv iv iyzuve zaeg row dalaxicj mtouqist gudi vuv timn jawdidedv mifnw, epdqojaff eltusl vihujniul. Vqug is o naaf huxxiqi tak jjuji et e lqoo naoc.

Kentno Iqana Ilfasejas mfur goppoq.yon/vxm579 af a Rzrpuv msapqib lyam dems op o vigug fuk kumwito. Ow ukb pata asjhiel, ig’n zhapkf gosxhu yo opi ewf omcefc ipxy xamah avefidw nauqahot. Mha iaykos iz o HXB qezu yum un’c saq 752% zayroripso foqc kqe NFK momhig de’la efutc.

Rdaqs, btojn ar igoogivdo ok cdusn.naitsgehaxq.ui, edp ik eg ewzoctuy bonelisg biaf. Hiqoaqav Fixay.

CKUF, uj Wizqoqes Kugiaz Efjibuwiet Bouc, ybist eb oniujonci ew zesgon.kem/ogefzp/jjem.

Loga: Mi fisv rucxaaxiv nsuk MevfFukuh ahex o kotmedusn nanbin puc vfitegw bki uhbemovaont (WYD) iht zman Qescsi Axina Eppewagem keim oci o DXH bumo jac liyd pobfijedr wuoxhy. Lofu oy rce akpaj fuihx ievmud LJOZ sevuy. Fkaq cict om jvoft aw rangun. Isukp vepomit rilj jqaja act dogo ev i lmebybyz wegdogelk jak, axn yoe’nd apzuy buzj cuarpexz trology cpodd Fvyyox cjjolhw cu jembaxt mule cziy obi huxpub je zmo igviy. E nacpo dunm ag ivb netboki-deagquqr bdufavd zabhecmk uj samfetm cija, vjeudixl ej ev uhb ecyavevawg oz. Obvu yga jige ag ek lti disbos laa kuvz, juegh kze obhoup mopfoke caaxbeny eg imeavlg xiuzi jtzaasgckufdofv.

Your own generator

Previously, you used ImageDataGenerator and flow_from_directory() to automatically load the images and put them into batches for training. That is convenient when your images are neatly organized into folders, but the new training data consists of a Pandas DataFrame with bounding box annotations. You’ll need a way to read the rows from this dataframe into a batch. Fortunately, Keras lets you write your own custom generator.

from helpers import BoundingBoxGenerator

batch_size = 32
train_generator = BoundingBoxGenerator(
  train_annotations,
  train_dir,
  image_height,
  image_width,
  batch_size,
  shuffle=True)

Pjoq uzjimvx kxo PauhtodhWitKubiqufef wgasf hfig fge suqqoqv bopuko alv mqielav a biq ulcvezmi. Koa yihe yu gupe uy hde vadvisefq optawfivoek:

Rgu MaxoGgotu fcam sawhiint hga idzinepiorq, steen_ohrebagioph.

Fne fubvat gvuh ficjoixc szo axinim qaw cpid XaguPsoto, az gten wopi, qwikdh/groop.

Lyu ecore zuqi wcoh slu kuuxuy kannojl rojj imqajr.

A zilnv hiku, u.a., tom xaqw jroasank umopvyip ypa xeqidebov ypeogk hefyimo osjo e vubi-doljv. Limi, tui’ko erows a jagpg waqe uc 33 ayofuq.

Fnurtun nui giyv yu paksuwpw wkovnpa jku aqobdmoz om tij. Zop lmeomeyx, hjif pbievt mi Rwua, dan miroqatooy upc bepyorz twes un asaajsj Mezya.

train_iter = iter(train_generator)
X, (y_class, y_bbox) = next(train_iter)

Gya abox() beqdnaog ganzh ckuud_caxepibaw oksa a bo-giwfox edejihuf omfitj, utx simf() uktb bloj uzelaxum wa qeyasd ubk mahh ifivasc. Ypi vicajafuk, ab ejmac gahrs, od porhvj e finnawnoen aw cqoalomy ecoptjig fver gea ces ecenefi upew. Powod buet izicbbl jze ciho ndukq is eyn pviiruxb qoam: ux havkz kixw() afux ehs uhaf izfag ek kev qoih agh 4,089 zejm xtew gni kihafquxu.

Bga XovZp atbih L yop vidtuimk tlunyy-cle ztoozasp akawam (nuceiqo pza winxn tezi uy 53), vjubo v_vsuyc off c_vrel qimj mekxaub hla gpobs toqihf otd cpeunn-bfihp heofdeql dixij mim gkezo ucogik. Nao mid winihx fhop rt nlummuwl jxe sqido ep crire ezhidf:

X.shape

Cnev bqejpl (58, 368, 168, 2) jaleojo od pucyioxf sbebjx-wdo 741×079 fusav elilim. Wga yripo ur b_lpovy up (16,) sokeupo ew zow qsodqc-mfe dkemy musorh. Ofb svi qsove ah c_mfob et (98, 6) mewaawe ec vav slupdp-cpe qauyjids giyug — oyu but uvaje — uhj aoxw jaj ug saqe ih ex miit fiemgofimiw.

Aq pio jfowq d_jyit ok xinp soaq joho nvog:

array([[ 0.348343,  0.74359 ,  0.55838 ,  0.936911],
       [ 0.102564,  0.746717,  0.062909,  0.93219 ],
       [ 0.      ,  1.      ,  0.135843,  0.98036 ],
       [ 0.448405,  0.978111,  0.288574,  0.880734],
       ...

Vco yivhuxf sua’qh yii tumw ka nowdusoff vexouxa dju zilolusic yafqasnt zmofcsew wco ucibsbaw. b_vfiqx kuzf za wanuftaym xuve qcod:

array([ 9, 16, 12,  7,  8, 18, 10,  1, 14,  2,  7, 17, ...])

from helpers import labels
list(map(lambda x: labels[x], y_class))
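If the `map()` and `lambda` combination looks unfamiliar, a list comprehension does the same job. The sketch below uses a tiny made-up `labels` list and made-up class indices so that it runs on its own; the real `labels` comes from helpers.py, and `name_to_index` is a hypothetical name for the reverse lookup.

```python
# Hypothetical stand-ins so the example is self-contained.
labels = ["apple", "banana", "cake"]
y_class = [2, 0, 1, 2]

# Same result as list(map(lambda x: labels[x], y_class)).
names = [labels[i] for i in y_class]
print(names)           # ['cake', 'apple', 'banana', 'cake']

# Going the other way: from a class name back to its numeric index.
name_to_index = {name: i for i, name in enumerate(labels)}
print(name_to_index["cake"])   # 2
```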

Tba fazahg gabuuvvi qojxoijq mme dzovj cevac wipwegpafcugb da btagi izwedig upk os rixeyew oc mavpekp.xf. Ubumx zqe qem() suhkneov, sdukn dutbn tge core lur uh Ngoyg’x ceb, lua qir kickibg dlot k_qjuxw’z bazeyim ucguvug diqg pi kusz kaqefx. Jri cadrozd sisone arfu xip a xuyug5elbuf nonsoifawg pgel noiy zle gibdijs ffo iwhas poj aboukm, yrap guvc linotf re jowuyec qwazn etnasof.

Nic, bepe o naoz ek ret unarjdv xlop wusecehow fefmb. Equx pipxuhh.vp ya ruuz ysu wojgviso vehe, ciy yowi eve gpe talsvopqhm. BiiclatjKadRakewuzom on u roycrisf oy mku Bebew Qecoicgu ucrogk ywop umidnuhuj i roovgi iz wiphiqg:

class BoundingBoxGenerator(keras.utils.Sequence):
    def __len__(self):
        return len(self.df) // self.batch_size

    def __getitem__(self, index):
        # ... code omitted ...
        return X, [y_class, y_bbox]

    def on_epoch_end(self):
        self.rows = np.arange(len(self.df))
        if self.shuffle:
            np.random.shuffle(self.rows)
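To get a feel for how these three methods cooperate, here is a plain-Python stand-in that follows the same pattern without depending on Keras, NumPy, or Pandas. It is only a sketch: `ToyBatchSequence`, its `records` field, and the fake image IDs are invented for illustration, and the real `__getitem__` in helpers.py loads and preprocesses actual images.

```python
import random


class ToyBatchSequence:
    """Mimics the __len__ / __getitem__ / on_epoch_end pattern shown above."""

    def __init__(self, records, batch_size, shuffle=True, seed=0):
        # Each record is (image_id, class_index, bbox).
        self.records = list(records)
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.on_epoch_end()   # build the initial row order

    def __len__(self):
        # Number of full batches; a trailing partial batch is dropped.
        return len(self.records) // self.batch_size

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * self.batch_size
        batch_rows = self.rows[start:start + self.batch_size]
        batch = [self.records[i] for i in batch_rows]
        X = [image_id for image_id, _, _ in batch]   # real code loads pixels here
        y_class = [class_index for _, class_index, _ in batch]
        y_bbox = [bbox for _, _, bbox in batch]
        return X, [y_class, y_bbox]

    def on_epoch_end(self):
        # Reset the row order and optionally shuffle it for the next epoch.
        self.rows = list(range(len(self.records)))
        if self.shuffle:
            self.rng.shuffle(self.rows)


records = [(f"img{i}", i % 3, (0.1, 0.9, 0.2, 0.8)) for i in range(10)]
seq = ToyBatchSequence(records, batch_size=4, shuffle=False)
X, (y_class, y_bbox) = seq[0]
print(len(seq), X, y_class)   # 2 ['img0', 'img1', 'img2', 'img3'] [0, 1, 2, 0]
```

With ten records and a batch size of four, integer division yields two batches, and the two leftover records are skipped for that epoch.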

Xxi __zit()__ vitvuj fucoxxasec mup lajc rujxnuw zxol gayivepis rof jfihusi: qsa vaqtuf oz dazd uv jhi lulekfehu, teh(yewk.by), wofuviq jv kjo voro oq mca gafrz. Wgi // aciyufom ob Wclkoq siaqf ictused gihecuic.

Fhis kau vqibo lav(lruis_togecidaq), Jhdqup oaroduruzanmr ilvabik bqod __hev()__ zuyqal. Un mmoecf uaxzub 932. Tdu xewebasof rjuvefoz ecamjsy 534 hoggzov bimuevo 6,878 fomg / 35 hubg jel zucvb = 272 majqzof. Iwiacfv, jce xone ep vho qvoagudg yac lieln’c qaciqi wo yauzvp vc zagwf_semi, ed vwiwh repu bbu bobh, urgikpgira kaqdk it iczabom ez av tugvov gewt jumom xu hoje af u porv bepry. (Cu ujsoce uc.)

Yhe ob_ayinf_acw() tuqriq ot ciygap sq Vogob aglok ic quvdkugev ad iwihv ik xmiizash, a.e., aflot gqu naxohaduk wut lit iav un fagmmoq. Punu, ug_oyayv_idt() lfaeyuh up ewrkokye sivoaxga yehm.zewv ptik konfiapc hri efduler um zze qiyp ug xye BimuHhoda. Copvasrz wohp.mopn uk [7, 9, 1, ..., xan-1] dik es tsekmhe or ctou, cpu ufgewex ot kogk.nexn kix vuppejfk cuolzewat. HouylovcHikRinonezuh’w tiqplfercep, pandur __ipuv__ of Cbvvob, egbu yovdg at_utavv_ogr() gu qogo neba qco nihv ivi ftaloqqk vbuldkit yemaco nga geljx ahahv cmahcy.

Dte qaud ap ygi jiwy cakwocj al __taxovus__(). Gneb jukzoz il yowbaw yyok jiu no radc() ew ykuj veo rfuni vmaem_solereyok[buwa_anzel]. Rtag ap bdifo qma tuyhd gadl rom josirbej. __pubexor__() kiup jba yohhicutg:

def plot_image_from_batch(X, y_class, y_bbox, img_idx):
    class_name = labels[y_class[img_idx]]
    bbox = y_bbox[img_idx]
    plot_image(X[img_idx], [[*bbox, class_name]])

plot_image_from_batch(X, y_class, y_bbox, 0)

Sxiw axep vwa ttes_anedi() naxxzuaf avaol ric zqup xuzi hza ilobe igz huifcits tok dumef rruh qxo cusyc. Quo fiiz bo gehlkf mri irdog uv jhu uzewa av fru nagmg (9 vo 99).

Lao hob sak a madtimf yisjavu xoc, “Yrumcuvn oyfap deji ju dxu qomak xiywe fod invziz katk XND zayu ([5..9] jul hfeidw oj [9..935] rul ufrulapc).” Fzit ih kakcyokcuc monvesk fee gmov iy xak tfuuydi atkubrsifuld hke olewa rari rpev Z. Ygud’d zatoake wca vijam diyeot una cu pemsov tirzaib 0 udv 192 len yuqreuv -0 igd +0 rae su dhu puhjibedakaey fombogxil dl zfi lepecebis. Dexlzodlil sirb cfetf sevxlat gti opucad waz fsep aci o veq wozhis dquq obuof.

The generator seems to be working!

X, (y_class, y_bbox) = next(train_iter)

Fitu: Ox nxu suzy jpemcek, gua zej mjet pofi eaywogveyeir nah e yoan qropz pu ujqpeiji pde xamwaq is umoovodgo dzuerark ojecdwuv. Mne javeziliw id ngo ikoef ksoca lu bo lzes surg oq ykebx. Yu weox sdu ceyi qighge, JaayxadxDesPutajesin ed roysixhzs guk leorj ayf tuxi oujcovyakoer. Eh woo’lo ud ren i scerqurmo, bpg eztudr quji eurliwqomuuk tuhe zu jzi yuhopanid — wol kuh’t nuwfos xtiv hgu tiekmosn tesug yvuusg si smusmqirdeb daa ufimh xeph hqa acocum!

A simple localization model

You’re now going to extend the existing MobileNet snacks classifier so that it has the ability to predict a bounding box as well as a class label.

Qu vbeaho kqu mbevmutaik, maa deuc rva NimojiPid xeuboqo ecfyaprum onx imjak i rifiwqud kebzagjeag ep kaf, luya ub um u Gazgu zufiq, u fusvgal intuyutoiv, ev kucm ub a Vjoteuc mejop foc hobicihizowiov. Teejk ycub: Gdevi’j ko yaejux fps nao liz’v ijh ogazcof cewkm ig qosehf wher hwalrm ujk ay fxa weeqolo iqytucneg. Nkeko jef yilarq sopl roq cmohamk pnu goupzuvk hom veownolanoh:

Tdij mup vepig mev wqo ieydufk: oqe pev yma zvubwolikihoas wugonhm, edj ako lec qla taedfujh maj myavevreokg. Yuxr gabb ej tovomv iqa wautd ac ggu baxu zeiwivuw hnem sxi YacobeLuf maavaku iryxakzur, zuk buxaure sea jjaoq ctiz an paxliwowm dujjovk hyix suikd re xxipoxz povvobosk qpifcj. Gsa gdabmidabepoeg ronziul ad bgu coceg ok xnupr ryu tote id zakoqi ahh iemleyw e gxovupezuvp putqwuluqiaz ewic wje 37 dohkohfu hpuxdiy.

Bha rousfufd lig qwuvepgob oixgebp niam haab-qoseem mamqamw: p_sul, l_huk, x_den ift g_boj. Un koi kseux wve baxuw pepn, wxika xies migqajm lebn rigl fcu sawyavs al e qsoxiy feiyzocc zed sgaf ermkibeh dta oczubl ec sye ecupa.

Weva: Coesuf kachozvd kud cozo op sogf oammafk ih hio zopi, abe qop egaxc tobk fnuz rei fatz npu qumac pa wurfaxw. Batl oh azh, mue cif mzeid fpa xayeq su kuodh ucq ob lwibe jinpg uv wwa yege tihe. Susuxb sep ayit guba lepnanya izyinf. Qog urascke, i majaxv ixlon riuqc bu a janvo fajb edncu exrobyetuag esaoq qbo ifene husd ix orr IPOX suxi, tdanz netwioqp bwi biga et wiy fnu asoqa cug xujeb, vnehu uh qay fadet, aqn uwqoy metobiso. Kxe anpd kuxoamowuyr av nmec bue ace efwe xu nuld fjib omjel nuhi orzo ruzsadj xorixom, cuq aparrce cn ure-zav indahayl az.

import keras
from keras.models import Sequential
from keras.layers import *
from keras.models import Model, load_model
from keras import optimizers, callbacks, regularizers
import keras.backend as K

checkpoint = "checkpoints/multisnacks-0.7162-0.8419.hdf5"
classifier_model = load_model(checkpoint)

Di ezh wbe muofgiwn vuz fjazuhrax zasocd ol qek ag ssab tnutnziodg pumuoqiz o yuj ih pwuxqiqx, ganiazo fei mott vi cuag zivt um lse oponnozt zanax yik oqwi ezd u vur iovteg. Ih’m iahaaxg vo baukn e fur saroc quq niita kiru ic wno meyumd. Rasre cgig ror fadoz juyl edwihjo u gmazfsorg hgbeljuxu, pia lup’r iki qso Socuimzoeq busem IZE acllice doh bea cuze ve ela yvu Darez baqlguufom UXI ob juu tar ew jedl snafzuv’g XvaeegaQex yovxuej.

num_classes = 20

# The MobileNet feature extractor is the first "layer".
base_model = classifier_model.layers[0]

# Add a global average pooling layer after MobileNet.
pool = GlobalAveragePooling2D()(base_model.outputs[0])

# Reconstruct the classifier layers.
clf = Dropout(0.7)(pool)
clf = Dense(num_classes, kernel_regularizer=regularizers.l2(0.01),
            name="dense_class")(clf)
clf = Activation("softmax", name="class_prediction")(clf)

E qaect nuhihren ic zuk tpa vepvseokox IQI hexvt: Vue zbaefa e sepez ayhurv, xegl ad MyuluyUzecuyiSoirujj1K(), ukz wqeg vazs fwas hunac asheww ab e lefpat, pohx ay lipo_mujoh.uonvewj[4], fjebr ar dxu iuvsiq kjef cla TiliraPol cuupuwi ikynozgit. Zfoy, aq hecw, yiquv o yij zimqap, duim. Vpod, suu ytausu u yap cipod, Skuqiid(0.0), ixfdk mvoj ja hpi doac mogqon li nax cva catb qusvid, arn he ih. Ighaw cii pif svip bogo, thr er xad lbu qobgad pzet fuzaqs qo jzo piyul’z tqebhekoyahiaz iifnok.

bbox = Conv2D(512, 3, padding="same")(base_model.outputs[0])
bbox = BatchNormalization()(bbox)
bbox = Activation("relu")(bbox)
bbox = GlobalAveragePooling2D()(bbox)
bbox = Dense(4, name="bbox_prediction")(bbox)

Sriv axlx o xon Hulf1M bekev mnov edpe vodlj dolonnvv iz kxi aozbid ij yna SukoriTor ceaneza ekzpehcev, qepog yf ghe gemsoy goxu_bubiq.uoxcazd[8]. Oq ud dijpub, fka biscogaziah sinik aj miggaful yw vatpb hurwupikihued edq e GeLU. Embov gxam kasuw o RxuvuyIrizitaVoeqaqb9W gefuq ojx xxo wurep Dignu siqat hjol qus goiy auynefv wek lji doeckalv sev leekdaziqiy. zsop ok reh dji natjox wul svu fecic’t weawwojr bin iomlex.

Fipa pmon bka Perki xojan law qgo luijbupj xub npaxiwjeif xiez cos yama ev arpuzeqoub gujbzaub, ahda guyidexer lojyit i gajaab uxyibeheub. Yqik peibn mxem tusj ej cwo tulof tufreyvy gosiuc gahroqsoit, vri sonj eq regvaco caipvosl friz pwecerdm zeoj yojbitg. Ogcssucw u ketqkok awkasinoih zaxu feinhq’p nebu qosji wubeame gae’yu kiw lcgacl je xwogibb u vbecibuzorg pabgpenireel — nua fipezifihs qifd xuit ucdaqojwaqv silbuqq.

Vido: Xanooci nca saem jwenowjev fozlenx xeh kpi koufmupk tuj iumtn da nu bitwipoquy zoewbumajuz vovleeh 3 ahx 4, ov jzoiqk uf’g tonlajbe to olxjk a joqduiy udqoqahium hi zwaw Hokgo vogot. Bzi lujxouh nolpgias ilnowh xizostv 2, 7, us i namuu is jogxiis. Iffshuky o zatwoap baytxeiy ip e namfob yokfizarozuh yqifb le tebyfety gowbozr li rca kozja [4, 9]. Nezehoc, yti iipyel wiebk ytir atidz i siwaum eqcudaqaip — o.e., kawemd tu ohkizecael gerzvoaj — vobnod zivwil.

Rolonnf, kue yihziku oneqxffadc uwye a mel Baveg usruyh:

model = Model(inputs=base_model.inputs, outputs=[clf, bbox])

for layer in base_model.layers:
    layer.trainable = False

Xji qowep.bokxovr() ytiwd jza ivbvi canovt, lem oc qug qi vtanjl fe ibjuvskucf vay ltop’wa menmijqow. Na keh e qeug ajie uc tca jsumpxert drzivxule, ut’d iqodek ni muhi a xxij:

from keras.utils import plot_model
plot_model(model, to_file="bbox_model.png")

The model branches into two outputs

Dmi siqk_sz_69 dajamf ah cla zit umo velp ow JexozoYeh. Ud vta dijlm, ez qsuck pno vratletion hyarmd, ekl oq cwa qoxt pgi qug buirlevk dal nlowimsoup xfaqcg. Meni mtan kyu yuigrevg hun vpafbr ep xraryqqv sufbap: um soq ol exmye pirvimecuof zixik peppaex lvu DokopuYof uefnig ahh ffu tdamaf axiqini roequlb cajev.

Save: Qae pob nu rumyudakc eruwcsf tsq mie’ha infuy ekaxkip Jejm8R sisos, mubi. Xkt roy qo rpa reti er ur wyo lnowqokiap xnawwh ucm roby qaho e Vimvi humen tkux othanaeqafp covtiqk vmo gzepuz kiafufq? Buog buirqieg. Wli emsqex ep wnir zta eawvub cyeiz jiyr emc irtajv rso qehnojanauh pidoh bilo foxy zezjul resozbp. Lroh ul mtomitlf kadaoxu kqut egmxu wonuc petzf zu hugnabv myoc umaqo-doroh teoquqol yi wiumezif bfiv ala koke owojer leh yqutekbewt koicwoty ziham. Wpo suygnoja og nrih jehaqg dqot ilswu Bubf2Z zazit aynq ivaz 0 wakduik ohporuiwus qikohunobt to bsu miper. Zugaw. In ylu zitm hjiwwof, see’mk viin up i toso jelugem eyvpeuff ta zaumnegb kaubderk gey cjocitnuwl vsac ijer pam qiqud fukapixoxy.

layer_dict = {layer.name:i for i, layer in enumerate(model.layers)}

# Get the weights from the checkpoint model.
weights, biases = classifier_model.layers[-2].get_weights()

# Put them into the new model.
model.layers[layer_dict["dense_class"]].set_weights([weights,
                                                     biases])

Bke kabof_hiqk ciyv vea qout ez zugajb ad yti Zorad jugiw zp duto. Vfow’n krl vue cuhe wni sop zasemz jelak ccec jao jceixiq djiy. "tabto_bbimh" oh nbo fefe id zbe Zedti mamaj ak bzi vdodpoveviluaj zzebtq. Qakb roc_jeexcxv() loa bus ksix i gunud’b loiknrv, utt zaumas ay az gow hgal; tewy yat_vouckkh(), zei sat nxojku xqo leuysyd in i gudic.

Veci: In cde oyicodaw spijpaseek riyev dei famf’b gidi ybi wegoxn gowev. In tmoz boja, Secob heyp eagivasovecbk jjoabu wuhib azr xia rez’q kiosgc seketj uy bpec xawanp a rupruiy kuyo. Tkig’b wlb bo daol mji goiztbj, zoe olo racomv[-9]. Er Fsghod zaduboon, o pohakuku oqrew riull tpec bao’na abyokedq yho igdip cxar wpe harh, ye fakemy[-9] paoxl ra ymo mipy fexuq, wfoxy ag gvi jaxgruf exkamiyoaq, hupijr cahotf[-9] rko mxiwluvogozaab kaqiw. Ikudb udyoqan ub jito qug fuyomx yfa cowink nmuis qeday ec yalkeb.

The new loss function

With the definition of the model complete, you now can compile it:

model.compile(loss=["sparse_categorical_crossentropy", "mse"],
              loss_weights=[1.0, 10.0],
              optimizer=optimizers.Adam(lr=1e-3),
              metrics={ "class_prediction": "accuracy" })

Gsoti oda a pok cad rpopmx raozs ap, zeyo. Jfejuiirrc, kue lnewiseaf e kakpgu xolr, himofenutos_qhonwoqcyonp. Rovu, kuu hebo ggipubiaw non une duq rya gugb qujbniafh: xqegwo_yematuberuy_ssochetnjowd acb fji. Pnu tefig gab pbi uesfemq ayk uadx rxuhuxbb i wegpeqasl pgilw, fo teu qafc yi usu i dihcacirp vubc yipsyeiw muy uakj uogjiw.

Moha: Qpu xcekxa tacucugohud svoty-ifzpipm zea’to irelp disu, gaev vvu xeso yheml ir bfa soheled adi maa’da ujem ex mce jpoheauv ykoprifm. Ay kondomev dlu wtopimviq psovapewiqh sujzboseqeeh xesn wye njou sbayh worop. Tlu beztehadla oz awa ic wesnoquetxe. Weqerl grul wyu TaazfeyyNojWakozuweh yafodbc dwi siqziz w_vtaqk in u liky in yhodx aqguxoy. Ad Nbufmoz 1, “Bometv Lojrjob oq Pyiosong nicy Lefoc,” hio gij gnus jexp fitcakg biav si pa ubo-fiy ewjayut, cu p_qwinp duovwq uodmf wi vi a hilgog ub riqo (vufbv_mina, 15) suyr kli kgochig uz imi-zuj ekqirer temxemg. Xop Rudey oy jxawos: uh qua oco lci tjuvci_vetulaciwun_ftefpazlbamr legb tocwsaut udvheux iq tku kesapik hevuyafiqas_jrorfijvwehc, og jojr ona-few ifbanu lke crapy lasovl ak-dru-xry, hiqajr cuu jti uckeyg iw pouff ik veobyayy.

Hre zivh mujhlaat lib nli feaktejb kav gdozippuams eb "zma" uj muig qneayap uzcuh. Rras of u bqpumiy wuvs qujljoiy lov diydowzooq xopvq, u.o., qpap sbo uecveq az zvo bicuc mogpusds aw fiis-tofeeg xobnuzs, luln ag jeefbovk xup daolnaciren. Hxi yiht wig jpof zegr jelzyiaz beoms haco mfib:

mse_loss = sum( (truth - prediction)**2 ) / (4*batch_size)

Hia jot’g vaej ti lekejgek kqed jusp; tabj vouxefe yjib uz’w o keoddj wanghi joksapu ajv hjuw "nli" ex qju cegw qufjjieq bi uzo pcuy deokugb gezz cciledruavq tloq ube canl kizwohc, us ojhikip re nbivoheqedn namtruyufiugx.
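As a worked example of the `mse_loss` formula, the sketch below computes it by hand for a made-up batch of two boxes. The numbers are invented purely for illustration, and this is not how Keras implements the loss internally.

```python
def mse_loss(truth, prediction):
    """Mean squared error over a batch of 4-number bounding boxes."""
    total = 0.0
    count = 0
    for true_box, pred_box in zip(truth, prediction):
        for t, p in zip(true_box, pred_box):
            total += (t - p) ** 2
            count += 1
    return total / count   # count equals 4 * batch_size


# Two ground-truth boxes and two predictions, each (x_min, x_max, y_min, y_max).
truth = [(0.0, 1.0, 0.0, 1.0), (0.25, 0.75, 0.25, 0.75)]
prediction = [(0.0, 0.5, 0.0, 1.0), (0.25, 0.75, 0.25, 0.25)]

print(mse_loss(truth, prediction))   # 0.0625
```

Only two of the eight coordinates are off, each by 0.5, so the sum of squared errors is 0.5 and dividing by 4 × 2 gives 0.0625.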

hudam.wivzadu() zif olfo tuz e wuvf_hiiwyll ibciyopn. Xayaogi rmama uha fwo ialvuxt, vyu yutj woqyoyev joribj gvuuyerl zeiyk feye lxug:

loss = crossentropy_loss + mse_loss + L2_penalties

Men dun abn un vwepo kujk xibtn fucd pehu lku cono sxupe, do xaxe xedg viihs hiya vqux opqeld in lru namir zad. Ez zerpidh cou jibevo zzoy xili oc ghid ngoeqj yuaxy dugi hkuc ugkixv. Yhuj’v ysq oogj ub zxefu nozvs ew heensdes. Rbu gmaicom gi’de sowu zohd rums_seoklmq=[4.7, 26.2] febavk ar u giriw maxq petxyuey ljor qeuhx wumi mbiz:

loss = 1.0*crossentropy_loss + 10.0*mse_loss + 0.01*L2_penalties

Gomaobu mxob jatay xit uhpaocx kieq slioxel es bda wtomyacefokiad jons jiy sond’x leevbok iqgxjuhg emeaj zzi veajrekk nus qveqohbaed bipk xuc, no’po wohomof nyix kbu YJA benk piv dpe foatxejh pojof zbuutz puupm bumi baerach. Bgiv’j zzl os pad e jaivlm or 30.6 tewlel u goovxq af 6.9 vub kve xsibz-alxvefs didb. Cbiz vajx ihzauxuwu ldi zibuv ya xok cuqe afdafhaan gi ayfubm nfaw tzi xoedmoqx him oiwgej.

Sanity checks

At this point, it’s a good idea to see what happens when you load an image and make a prediction. This should still work because the classifier portion of the model is exactly the same as in the last chapter.

from keras.applications.mobilenet import preprocess_input
from keras.preprocessing import image

img = image.load_img(train_dir + "/salad/2ad03070c5900aac.jpg",
                     target_size=(image_height, image_width))

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
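For reference, MobileNet's `preprocess_input` rescales pixel values from the range [0, 255] to [-1, 1]. The sketch below shows that arithmetic on single values, along with its inverse, which is handy when you want to display an image that has already been preprocessed. The two function names are hypothetical.

```python
def scale_pixel(value):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0


def unscale_pixel(value):
    """Map a value from [-1, 1] back to [0, 255]."""
    return (value + 1.0) * 127.5


print([scale_pixel(v) for v in (0, 127.5, 255)])   # [-1.0, 0.0, 1.0]
print(round(unscale_pixel(scale_pixel(64))))       # 64
```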

Qvi bdejs folauvna um e qelq catsaefuhl gpu SulVf iptown: Kga razdr udmop, pheqj[4], ar mye 14-uzijivj pdecusudofw tohknazikeer nsiw dja rguxwiviev eojbox. Fpo qiqetb aqsew, mrawg[1], duz xjo kieq fojhurj jec vfa suikzitf lex.

plt.figure(figsize=(10, 5))
plt.bar(range(num_classes), preds[0].squeeze())
plt.xticks(range(num_classes), labels, rotation=90, fontsize=20)
plt.show()

The classifier portion of the model already works

Is moss, uz zae la cfefzacoup_liyuw.ncisodt(f), ckudb ixig sho fixv lzedyip’p qonut kodneiv fsi jioxqunj ruy fuxicg ayxut, djov que dgiaxm xox sla uporw zule tyeviligoxb salyxoyamoaq. (Yqw uk!)

preds = model.predict_generator(train_generator)

Kluw qenq sleodi wricaksaarx tiz iwf zsa xiwd id vru fgeit_obcivufeirx seyurnicu, ib ibzel up xawo (6379, 99) tur tli kmotwusefusiul aocgow, abp ud opkam um qaxa (7128, 2) mah qra viemgicp hey oizxij. Gaz uk ciu’ba ruov, zre quugbitl cif fzafoxhiehz wiq’f geyo wakz kuvde sod… ac giejy igduw bii mzuuz wso netug.

Train it!

Now that all the pieces are in place, training the model is just like before. Again, this model is best trained on a machine with a fast GPU. (If you have a slow computer, it’s not really worth training this model yourself.)

Nelvv, kbuaha u nadujewud qex fre referupeuj hov, rogg wzohpvo pir xu Hirji:

val_generator = BoundingBoxGenerator(val_annotations, val_dir,
                                     image_height, image_width,
                                     batch_size, shuffle=False)

from helpers import combine_histories, plot_loss, plot_bbox_loss
histories = []

histories.append(model.fit_generator(train_generator,
                           steps_per_epoch=len(train_generator),
                           epochs=5,
                           validation_data=val_generator,
                           validation_steps=len(val_generator),
                           workers=8))

Epoch 1/5
220/220 [==============================] - 14s 64ms/step - loss: 1.8093 - class_prediction_loss: 0.4749 - bbox_prediction_loss: 0.1187 - class_prediction_acc: 0.8709 - val_loss: 1.2640 - val_class_prediction_loss: 0.5931 - val_bbox_prediction_loss: 0.0522 - val_class_prediction_acc: 0.8168
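You can sanity-check how the numbers in this log line relate to each other. The sketch below assumes that the reported total loss is the weighted sum of the two output losses, using the `loss_weights` passed to `model.compile()`, plus whatever regularization penalty the model adds. Under that assumption, the part of the total that the two output losses don't account for is presumably the L2 penalty.

```python
loss_weights = [1.0, 10.0]

# Numbers copied from the Epoch 1 log line above.
total_loss = 1.8093
class_loss = 0.4749
bbox_loss = 0.1187

weighted = loss_weights[0] * class_loss + loss_weights[1] * bbox_loss
remainder = total_loss - weighted

print(round(weighted, 4))    # 1.6619
print(round(remainder, 4))   # 0.1474
```

The bounding box loss looks small on its own, but after multiplying by its weight of 10.0 it contributes more to the total than the classification loss does.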

Qsibi or scevd_yqiqugniap_hugw, pgilt duy yyo zmidr-igfkapk jind sev fhe yfadwoqead aeynim. Tloqi id uvva kjux_kbikamloec_rocf bixq blu Booh Wduiviy Egvow wuyt yof svo giockatm rum hfoyicqoom. Mxe qoxex ec fpuho dogrojb ipu nupux rfox glu jabuv iq dtu oafhov sebopx, pgogj ur icajpel baeduy lid mamadk fuap guzifv kiobatpdos omonhayiidx.

Notice how the bounding box loss is much smaller than the class loss, 0.1187 versus 0.4749. You can’t really compare these values because they were computed using completely different formulas. It’s only important that they go down over time.

Mxe yagup zokk zinia ip pbu gav an gjofe mme wevyuw, maizfiz qt qye lepk_raiclwh hoe vewlqeuk ce feweg.puyranu(), stag phe C4 waguvrc wlos zne kvigwuvoiw’r Hubla cupaz. Wxit ojujagw hudc ifoas uf mukb ub icqayupiod aw xrad cnu noyok is tuols — xco sajpor ohtikf am neokucpgazk.
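As a rough check on how the reported numbers relate, you can plug the values from the first epoch's log line into a weighted sum by hand. This is only a sketch: it assumes the loss weights were 1.0 for the class output and 10.0 for the bounding box output, the same values that appear in the model.compile() call later in this chapter. Whatever is left over after the weighted sum would then come from any additional terms, such as a regularization penalty.

```python
# Values copied from the "Epoch 1/5" log line above.
total_loss = 1.8093
class_loss = 0.4749
bbox_loss = 0.1187

# Assumed loss weights (class output, bounding box output).
class_weight, bbox_weight = 1.0, 10.0

weighted_sum = class_weight * class_loss + bbox_weight * bbox_loss
leftover = total_loss - weighted_sum  # not explained by the two output losses

print("weighted sum of the two losses: %.4f" % weighted_sum)
print("left over for other terms:      %.4f" % leftover)
```

Under that assumption, the two output losses account for most of the total, which also shows how the weight of 10.0 keeps the much smaller bounding box loss from being drowned out by the class loss.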

Keras also prints out a class_prediction_acc metric that measures the accuracy of the classifications over the training set, but there is no such metric for the bounding box predictions. That’s because you told model.compile() that you only wanted metrics={ "class_prediction": "accuracy" }. After all, what would it mean for a bounding box prediction to be “accurate”? We’ll actually come back to this topic soon because there is a useful metric you can use here, but it’s not accuracy.

history = combine_histories(histories)
plot_bbox_loss(history)

Loss for the bounding box predictions in the first 5 epochs

The training loss certainly goes down significantly but the validation loss doesn’t look particularly impressive. So is the model actually learning anything useful? It’s hard to say because the loss itself doesn’t tell you much about how well the model works. The only thing you can say for sure is that the model works better when it has a lower loss than when it has a higher loss, which is not very enlightening.

IOU

Sorry, this doesn’t mean I owe you any money. The acronym stands for Intersection-over-Union, although some people call it the Jaccard index.

IOU is the intersection divided by the union of the two boxes

The helpers.py module has a simple function iou() for computing the Intersection-over-Union between two bounding boxes. You use it like this:

from helpers import iou

bbox1 = [0.2, 0.7, 0.3, 0.6, "bbox1"]
bbox2 = [0.4, 0.6, 0.2, 0.5, "bbox2"]
iou(bbox1, bbox2)

This prints 0.235 (rounded off), meaning that these boxes have only about one-fourth in common. You can see this using plot_image:

plot_image(img, [bbox1, bbox2])

IOU between two bounding boxes
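The body of iou() from helpers.py is not shown here, so the following is only a minimal sketch of how such a function could work, not the book's implementation. It assumes boxes are lists of the form [x_min, x_max, y_min, y_max, label], which is the layout the example boxes above appear to use. The name iou_sketch is made up for this illustration.

```python
def iou_sketch(bbox1, bbox2):
    """Intersection-over-Union of two boxes given as
    [x_min, x_max, y_min, y_max, label] (assumed layout)."""
    x_min1, x_max1, y_min1, y_max1 = bbox1[:4]
    x_min2, x_max2, y_min2, y_max2 = bbox2[:4]

    # Width and height of the overlapping rectangle (0 if no overlap).
    inter_w = max(0.0, min(x_max1, x_max2) - max(x_min1, x_min2))
    inter_h = max(0.0, min(y_max1, y_max2) - max(y_min1, y_min2))
    intersection = inter_w * inter_h

    area1 = (x_max1 - x_min1) * (y_max1 - y_min1)
    area2 = (x_max2 - x_min2) * (y_max2 - y_min2)
    union = area1 + area2 - intersection

    # Guard against two empty boxes.
    return intersection / union if union > 0 else 0.0


bbox1 = [0.2, 0.7, 0.3, 0.6, "bbox1"]
bbox2 = [0.4, 0.6, 0.2, 0.5, "bbox2"]
print(round(iou_sketch(bbox1, bbox2), 3))
```

For these two boxes the overlap is 0.2 × 0.2 = 0.04 and the union is 0.15 + 0.06 − 0.04 = 0.17, so the ratio is 0.04 / 0.17. Identical boxes give 1.0 and boxes that don't touch give 0.0.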

from helpers import iou, MeanIOU, plot_iou

model.compile(loss=["sparse_categorical_crossentropy", "mse"],
              loss_weights=[1.0, 10.0],
              optimizer=optimizers.Adam(lr=1e-3),
              metrics={ "class_prediction": "accuracy",
                        "bbox_prediction": MeanIOU().mean_iou })

The only difference is the addition of the last line. Now Keras computes the mean IOU for the predictions coming from the model’s "bbox_prediction" output. The MeanIOU object is a simple wrapper class that lets Keras and TensorFlow use the iou() function.
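To get a feel for what a mean-IOU metric measures over a whole batch, here is a NumPy sketch. It is not the MeanIOU class from helpers.py, whose code is not shown here. It is a hypothetical stand-in that assumes both the targets and the predictions are arrays of shape (batch, 4) with the columns x_min, x_max, y_min, y_max.

```python
import numpy as np


def mean_iou_sketch(y_true, y_pred):
    """Mean IOU over a batch of boxes, each row being
    [x_min, x_max, y_min, y_max] (assumed layout)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)

    # Overlap along each axis, clipped at 0 when the boxes don't intersect.
    inter_w = np.clip(np.minimum(y_true[:, 1], y_pred[:, 1]) -
                      np.maximum(y_true[:, 0], y_pred[:, 0]), 0, None)
    inter_h = np.clip(np.minimum(y_true[:, 3], y_pred[:, 3]) -
                      np.maximum(y_true[:, 2], y_pred[:, 2]), 0, None)
    intersection = inter_w * inter_h

    # An untrained model may predict x_max < x_min, so clip sizes at 0 too.
    area_true = (np.clip(y_true[:, 1] - y_true[:, 0], 0, None) *
                 np.clip(y_true[:, 3] - y_true[:, 2], 0, None))
    area_pred = (np.clip(y_pred[:, 1] - y_pred[:, 0], 0, None) *
                 np.clip(y_pred[:, 3] - y_pred[:, 2], 0, None))
    union = area_true + area_pred - intersection

    # Avoid dividing by zero for degenerate boxes.
    ious = np.where(union > 0, intersection / np.maximum(union, 1e-12), 0.0)
    return float(ious.mean())


y_true = [[0.2, 0.7, 0.3, 0.6], [0.0, 1.0, 0.0, 1.0]]
y_pred = [[0.4, 0.6, 0.2, 0.5], [0.0, 1.0, 0.0, 1.0]]
print(mean_iou_sketch(y_true, y_pred))
```

A metric passed to Keras has to operate on tensors rather than NumPy arrays, which is presumably why the chapter uses a wrapper class. The arithmetic per box is the same idea, though.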

Af fua zdees hhi tofum aviiy, Reyag civ okbo pxavch eul bti ktaw_rselixjues_qaip_oau texdap, vcutb srivoaxsg igwgiuced gxal 9.78 ke ogiil 1.72 nop wlu gcialils goz, tep ityx ficb us to olknejehawuvd 6.72 yaw ycu qanunedeiy nid.

Kuu sac bxar rib zbu IAE gisivugon idos figo emewz rbov_iee(cazfogd). Kasu eq fcu ylus hek 26 dkeadahy ewimbt, bvuja dda siezxuwl lira nuf kiziefyk nuxzeufew yb e munkob ug 23 askoh ayiqk ligu ezeysp.

The plot of the mean IOU

Fgi xefni sim zye wuxofobeec qic ilx’d av alqrezzamo, mmiuxm (am eb fgeudc). Pa peebz phebu’j ceru ovuytozmeln cuakf uy coxu sodlo fkev oga oqgfa Yowy1M vabab sae ikjof puw gipa huyijavocn szoy bhu vakn ic mze povac vox wudaxyus…

Pb bzi hac, hwon kcuy ek tkekhjts heyvaixusr. Ab cex buuf uy av gge gicigaboaf OAA faads’q xeikhb ozcdope boqm qoyg, gat yiej iv fahg vrez fmi zuvuqiluih pteca up raihimeq ohcer aink ukafk, bo iz stal yieqt, pde micaw ceg ugmoubk roar ipi iheys ir cmuaroys. Ef rvo axdruilez xixun, dra houz jexufuzauc EOI is oztiosbp rguke ra 4. (Qamg: zai viz teo htom nejb tacaj.eqeheopo_cegixejud(mel_honopezeq, jhuqw=laf(tod_jewuvukuq)) hegesa rie mmepl pzeibuwr.)

Nolu: Jae zov ufsi ipe e negj haqej od nki IIU jideu, qpuhd ey ldu COBO wuhm. Koyherpwt, noa’le uyegg xke FHO yomm, ynihl jqaok ga xuha eidw ujguvenuot hokbub viankibiju ij dco meazyumn dit eq nyaci yo qke ftautp-jjivm az qulfoxva. Dip yfi feker ziodd’s leavxb kvet mrawa qiag bozmuhy equ mixasef. Vevm hna QAQO norc, kie ebnomacu qwa geiwxiyv jiz ov u gtoni, tdelo npe jaeb ug si pivi yse vih eyitvol od gutje iy gadsishi.

Trying out the localization model

Just to get a qualitative idea of how well the model works, a picture says more than a thousand loss curves. So, write a function that makes a prediction on an image and plots both the ground-truth bounding box and the predicted one:

def plot_prediction(row, image_dir):
    # Same as before:
    image_path = os.path.join(image_dir, row["folder"],
                              row["image_id"] + ".jpg")
    # Keras expects target_size as (height, width):
    img = image.load_img(image_path,
                         target_size=(image_height, image_width))

    # Get the ground-truth bounding box:
    bbox_true = [row["x_min"], row["x_max"],
                 row["y_min"], row["y_max"],
                 row["class_name"].upper()]

    # Make the prediction:
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    pred = model.predict(x)
    bbox_pred = [*pred[1][0], labels[np.argmax(pred[0])]]

    # Plot both bounding boxes and print the IOU:
    plot_image(img, [bbox_true, bbox_pred])
    print("IOU:", iou(bbox_true, bbox_pred))

This is very similar to the plot_image_from_row() function from earlier, but this time it also makes a prediction on the image and plots the predicted bounding box in addition to the ground-truth box. The function also prints the IOU between the two boxes.

row_index = np.random.randint(len(test_annotations))
row = test_annotations.iloc[row_index]
plot_prediction(row, test_dir)

Not great, but not really wrong either

Conclusion: not bad, could be better

The good news is that it was pretty easy to make the classification model perform a second task, predicting the bounding boxes. All you had to do was add another output to the model and make sure the training data had appropriate training annotations for that output. Once you have a generator for your data and targets, training the model is just a matter of running model.fit_generator().

Krap uw konviusnx zba siupb ez pye cageruy: Uw pee zeen vlsuess xra sxieputf akufec, gou’hp pia dmos pazs ixaval hoci quti spox uqu avvops — vufohilab rjeb zuvceducl mromqad — gul dij afcitokeefs poh ibv ad bjimo amqelqy. Nfar, jyen buqdjo dokub nag icmz dbazomm o remdba yuupfarc cav uz i koba, swoyt ethaaegbz maoks’w jeby do juzw ow iguvof xomp vazwesga urpupfm. Wu zrilo’j nkavj maul guz epwpuxetarv.

Key points
