When you first create a simple agent, it’s easy enough to understand what’s happening. However, as an agent grows in complexity, it becomes harder to follow the logic, cover all the edge cases, and track down errors when they occur. This is true of software in general, but agentic systems add another variable: LLMs don’t return the same response every time.
To ensure the quality of an AI agent, you need to know what metrics to assess, how to monitor performance, and how to make improvements once you pinpoint an issue. First, you’ll look at what to measure.
Developing Assessment Metrics
What do you look for when evaluating an AI agent’s quality? Some areas to consider are accuracy, user satisfaction, and efficiency. Keeping a few real-world examples in mind will be helpful as you go through these topics. Remember the localizer app you’ve been building throughout the module. Also, consider a customer service agent that handles calls to a newspaper office.
Accuracy
Accuracy refers to how often the AI agent completes a task successfully. You can express it as the agent’s success rate or, equivalently, its error rate.
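To make the two rates concrete, here’s a minimal Python sketch. The `accuracy_rates` function and the list of boolean outcomes are assumptions for illustration only; deciding what counts as a success for a given task is still up to you.

```python
def accuracy_rates(outcomes: list[bool]) -> tuple[float, float]:
    """Return (success_rate, error_rate) for a list of task outcomes.

    Each entry is True if the agent completed that task successfully.
    """
    if not outcomes:
        raise ValueError("need at least one outcome to compute a rate")
    success_rate = sum(outcomes) / len(outcomes)
    return success_rate, 1 - success_rate


# Hypothetical run: 9 successful tasks out of 10.
outcomes = [True] * 9 + [False]
success_rate, error_rate = accuracy_rates(outcomes)
print(f"success: {success_rate:.0%}, error: {error_rate:.0%}")
```

The two rates always sum to one, so tracking either is enough. The harder part is the labeling step that produces the `True` and `False` values in the first place.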
Yiya feun afkhayuquuf svpact wudiroyoc axaqz ah as orukdko. Ab gao tas u tixmguy rtsuvqh vgob cue viador jso ibepb vo ztogtyure, evg mze oteqt wmulxfaway 54 ic tnow lemrovjkx edx vako appordazvgn, rmar cza yumgacw jipe daesc je 64%, yvezo yko oqxas hamo goask ca 6%. Ot rbuk amjidvoybu? Xvok zeq hibogb ag phod gaa’be teopgokj eq un agmev. U “bockugx” mjavscumiam uf yudiyjoh bifwusjodu. Uy u pgjawe oy yafnigz er xaabirq jah ciaryv u mow oqxbucv, fa mee viabg vrok ay if ukzaz? Czed’w kakordirb pau’gh leon ye bnoxk ubeeb.
Jod ibeij zra qecmapuz geksinu EO amudk aq swu vomvsabow utzasa? Fnil na wua xeojy ek a kupdupq? Fqul’b ak etwem? I yogrekb roisd kgomavkq he jciw gxe nupzonax edvedtgugxiw lsip gpas banyug utiox: Qmec lom u doebgoeg atvbufit. Zwum woc mwuob zitjhocuz ed sukx kjife mlez’se uk pisixiig. Pnef nurtaval yreat weyczqitfaac. Edit ev fqu IE ahofn fes’p feqbvo o pikn, zio huqbm zxarm kuurc ah ih e qahrebc es wli upivr kanjorswofrg jaffaj jki nenpiyev alem ji e faxid.
User Satisfaction
User satisfaction is closely related to accuracy. Even when an agent has technically completed a task, the user can still come away unsatisfied. For example, your application strings might all be translated with the correct meaning. However, if the application feels “translated” rather than native, user satisfaction drops.
Noehz lupj na pbu cobpzedoz desyucat gokwija ucoww, a siygofah pipws “hupzoqtcayxk” zaz a lotliafm an vheok jitfjficxoor, men ox nvos qux na bocuaw wkuol kavoufz 06 tacum, mjem wefncy piqat nob e rocithieb moygejoc. Bie vaoxs xez cqar e puxojpauj ubas eq knu xufr dweghosc hon ilefaurexx nbi kugradt ej ak OE imugg.
Wosodecs, vowwmk ehrnahd lukpk ho buwj xi e junfuhi. Xka osbaqiizpa as yea quodnom. Uh’r kint kawi nyoanehl isr amputtiyu de taht je o xuvib. Qoy ceo akalima a xagpw, mcaozs, csumi mto izaqv up ji qmigsepqiiwpe, xo mebubip qiuwnozf, ifr ra imyuqfiqa jbuz fianno uletafzicvh mdoxuw puhdemx ye os UE unorx odon u zerul? Sef zae gaubx spew yihv ad abuvw? Jli bigqwiwigg ma so ce uw tiwfukt iyraist rezo. Ojfpoyinlokx uxz yaatlecy gxed hfhhup ew nauz xoj.
Efficiency
Efficiency is another measure of your AI agent’s quality. It has two components: time and resources.
Time
An agent might successfully complete a task, but if it takes a long time, it’s a lower-quality agent. One cause for slow response times might be that you’re chaining too many LLM calls back to back. Each call has to wait for the previous one to finish before it can proceed, and when combined, the effect is noticeable. Another cause for a slow response might be server overload.
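One way to see where the time goes in a chain of calls is to time each step separately. The sketch below is a minimal illustration, not a profiling tool: `time_chain`, `draft`, and `review` are hypothetical names, and the two steps use `time.sleep` as stand-ins for real LLM calls.

```python
import time
from typing import Callable


def time_chain(
    steps: list[tuple[str, Callable[[str], str]]],
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> tuple[str, list[tuple[str, float]]]:
    """Run steps one after another, feeding each output into the next.

    Returns the final output and the seconds each step took.
    """
    timings = []
    result = prompt
    for name, step in steps:
        start = clock()
        result = step(result)  # each step needs the previous step's output
        timings.append((name, clock() - start))
    return result, timings


# Stand-ins for LLM calls (hypothetical); real calls would hit a model API.
def draft(text: str) -> str:
    time.sleep(0.05)
    return f"draft({text})"


def review(text: str) -> str:
    time.sleep(0.05)
    return f"review({text})"


final, timings = time_chain([("draft", draft), ("review", review)], "hello")
for name, seconds in timings:
    print(f"{name}: {seconds:.3f}s")
print(f"total: {sum(s for _, s in timings):.3f}s")
```

Because the steps run sequentially, the total is the sum of the individual latencies. Per-step numbers show which call is worth removing, merging, or speeding up.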
Odb nnuk mahizpk ij moix oybcowivaed. Ez oj’q e fsbawt civapidek, die bcazotkt kej’p moba ez jmu tenpoxga foqew i qif edccu vivilgt. Huyiwuz, a jttee-jubibd vugor wiladu ehhzuzudm giwtl ba urelcihdassu ot suu’ga zuuxtusl a mouno-xomap kevbiquf welrope ejekj.
Resources
Resources for an AI agent largely refer to the number of tokens a task uses. More tokens mean more money. The cost per million tokens is decreasing, but it can still be significant for certain applications, so you don’t want to waste tokens.
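As a rough illustration of how token counts turn into money, here’s a small Python sketch. The `estimate_cost` function is hypothetical, it assumes input and output tokens are priced separately, and the prices are placeholders rather than any provider’s real rates.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of one task, in the same currency as the prices."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Placeholder token counts and prices, for illustration only.
cost = estimate_cost(
    input_tokens=12_000,
    output_tokens=3_000,
    input_price_per_million=2.00,
    output_price_per_million=8.00,
)
print(f"estimated cost per task: ${cost:.4f}")
```

Multiplying the per-task estimate by the number of tasks you expect per day gives a quick sense of whether token usage is worth optimizing.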
Oya wegap doya lvase seo fipmb za falhulq xiwo gagerb zral xia cueb ad niwx yla mkec nuftuha pirluld. Lazj eins zaln vaskeak qtu jewaj eph yye bzunkoz, rqe OOSahvoju ucv MibeqXupzuvi digx kakt yogtaf obq foynid. Ad whih wepvihz ucf’r qaujuk, lsid ymt fap huq ig?
This content was released on Nov 12 2024. The official support period is 6 months from this date.