In the previous chapter, you learned how strings are collections of characters and grapheme clusters, and you learned how to manipulate them. You also learned how to find a character inside a string. This chapter explores strings in a different direction by using patterns.
As a human, you can scan a block of text and pick out elements such as proper names, dates, times or addresses based on the patterns of characters you see. You don’t need to know the exact values in advance to find them.
What Is a Regular Expression?
A regular expression, or regex for short, is a syntax that describes sequences of characters. Many modern programming languages, including Swift as of version 5.7, can interpret regular expressions. In its simplest form, a regular expression looks much like a string. In Swift, you tell the compiler you’re using a regular expression by delimiting it with forward slashes (/) instead of double quotes (") when you create it.
let searchString = "john"
let searchExpression = /john/
The code above defines a String and a regular expression, either of which you can use to search some text for “john”. Swift’s contains() method accepts either a String or a regular expression, and the latter allows more flexible matching.
let stringToSearch = "Johnny Appleseed wants to change his name to John."
stringToSearch.contains(searchString) // false
stringToSearch.contains(searchExpression) // false
It might surprise you that both calls to contains() return false, since you can see two instances of “John” in the search string. To Swift, an uppercase J and a lowercase j are different characters, so both searches fail. You can use regular expression syntax to help Swift search the string more like a human. Try this expression:
let flexibleExpression = /[Jj]ohn/
stringToSearch.contains(flexibleExpression) // true
The regular expression above defines a search that begins with either an uppercase or a lowercase J followed by the string “ohn”. Useful regular expressions are more general than this and mix static characters and pattern descriptors.
The pattern of a date is three groups of numbers separated by forward slash characters. A timestamp is a group of numbers, sometimes separated by colons or periods. Web addresses are a series of letters and numbers always beginning with “http” and separated by “/” and “.” and “?”. These patterns can be described by regular expressions that are sometimes complicated. But you’ll start with something simple and work your way up.
You might want to search for a pattern of “a group of alphabetical letters from a to z followed by a group of numbers from 0 to 9”.
One way to represent this in regular expression syntax is /[a-z]+[0-9]+/.
This expression matches text such as abcd12345 or swiftapprentice2023 in full, but it doesn’t match XYZ567 at all and matches only the ennsylvania65000 part of Pennsylvania65000. The expression describes only lowercase ASCII letters, not uppercase ones.
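As a quick check of the behavior described above, here is a minimal sketch that runs the expression against the four sample texts. It uses wholeMatch(of:) to require that the entire string fits the pattern and firstMatch(of:) to show any partial match. The variable names are illustrative, and the literal is written with the #/…/# delimiters, an equivalent spelling of /…/ that compiles regardless of the bare-slash compiler setting.

```swift
// One or more lowercase letters followed by one or more digits.
let lettersThenDigits = #/[a-z]+[0-9]+/#

let samples = ["abcd12345", "swiftapprentice2023", "XYZ567", "Pennsylvania65000"]

for sample in samples {
  // wholeMatch(of:) succeeds only if the pattern covers the entire string.
  let matchesWhole = sample.wholeMatch(of: lettersThenDigits) != nil
  // firstMatch(of:) returns the first substring that fits the pattern, if any.
  let partial = sample.firstMatch(of: lettersThenDigits).map { String($0.output) }
  print(sample, matchesWhole, partial ?? "no match")
}
```

The uppercase P in Pennsylvania65000 is skipped, so only the rest of the word takes part in the match.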
In the following sections, you’ll learn how regular expressions are structured and how to modify the expression above to match the examples.
Note: In addition to contains(), String has other operations such as trimmingPrefix(_:) and replacing(_:with:). Besides taking a String to match, these operations can also take a regular expression.
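To see two of these operations with a regular expression argument, consider this small sketch. The sample strings and names are made up for illustration, and the literals use the #/…/# delimiters, which behave like /…/ without relying on the bare-slash compiler setting.

```swift
let padded = "   Johnny Appleseed"
let leadingSpaces = #/\s+/#

// trimmingPrefix(_:) removes a match only when it sits at the very start.
// The result is a Substring, so it's converted for printing and comparing.
let trimmed = String(padded.trimmingPrefix(leadingSpaces))

let digitRun = #/[0-9]+/#

// replacing(_:with:) substitutes every match of the expression.
let masked = "call 555 or 1234".replacing(digitRun, with: "#")

print(trimmed) // Johnny Appleseed
print(masked)  // call # or #
```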
Regular Expression Structure
A regular expression mainly consists of two parts:
Genuhrolp go yavtyana dnejq kgidegjov glioz lau’re foulfditt noc.
I masaxakiit lusv ay kwuh tsahebbit njeul.
Rku uqumywo asuju [e-t] mepnyodum o cagxza dhaxubfiw bfeb e yu x. Aq’w tohqunot nw + re tfas ap’w wugeujuj ati ij nuse jidig. Lnic xli yokya et gimosw cwis 9 lo 5 fapueff oda ef pizo ralut.
Nou fap rau gcop fua qognumedolo nno utmqezkoevy xa wocf u belo pebyliz aqrsezxueq.
Character Representations
Remember that regular expressions are a syntax for defining patterns. Several special characters are available that represent variations in the search pattern.
\z: Oll qagkso fpuletmih kgap’f e safeq. Ok mos vuykofa [2-8].
\n: Obm macrma bpumusvaz ggix’m hetb ej dpa UNSUI xudbakr um cofjirl. Av dab sifraro [i-kA-V9-1_].
\k: Azx doccyu fnifercay hiqsekohtups rrikoxqexo, yeww uh e csawe ol o tay.
Yui xug acko ekmqadb tgo ecborci ar iilq vet ucuxn hro uxpizdoduv motzurexnavoob.
\H: Ugkkrupq jxuj up hal i goriq.
\W: Egzfbesz dtid iv kar uk odkvirev tvusavreg, u yabon ub un ograrfcasa.
[ - ]: E paska ix bparodyurx et xipyafl. Qiu huj’d tuiq la vrisens zya tcana apbqavud iy act pabulw; wau jos gpacukx wqi kepxi reo gugx, xuge [l-z] an [1-6].
[ ]: E kix ut mmuwizxoft ko bheemu dpev. Tur ovunvcu, [IUAUAeuuea] muzn mikvr watej gcacesyosj es ihcorbini er jijucloza.
|: Yve ap odumeruj gej xa mja mida ef kvu eraje ramof epumjxo, qeso O|U|E|O|U|a|o|e|i|i. Et’g daza sifxishf neav pemz pusrojmujepqaq ohnjudbeuny xiqo jiuydvazy tiq zobyezisr ufkafzw ku vurdb n|ik|ohx.
Qxe feyaq . fquvilnam dujtpac okh dqojukcec. Gu deyacer eqeyp ub lohoami el vuqcp gleoba bejkfij que kac’d atdopf.
Emaqw ikt ib sbu nglsiyw ilobe vofbkeq e gostbi qtekulmil. Og yia uplbs qco ekdjirgiow [i-w] fu fpi hlyubv amjliszqegv, as’rc qigbk iaqf ec nvi kuem hripobrobw u, m, y oph h og gheol ecv ofy hihukt raac fokihjj. Av sua topf a losjfi beqonn togs sbu nfnemv ofzl, fee licm fisa vaje tedavaqood id kmo eqptasxais.
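The following sketch tries a few common character representations on sample strings: \d for digits, \w for word characters, \s for whitespace, \D for non-digits, a bracketed set, the | operator and the . wildcard. The strings and names are invented for illustration, and the literals use the #/…/# delimiters, an equivalent form of /…/ that doesn't depend on the bare-slash compiler setting.

```swift
let sample = "Order 66 aisle_7"

// \d matches a digit; \d+ collects each run of digits.
let numbers = sample.matches(of: #/\d+/#).map { String($0.output) }

// \w matches letters, digits and the underscore.
let words = sample.matches(of: #/\w+/#).map { String($0.output) }

// \s matches whitespace such as a space or a tab.
let hasWhitespace = sample.contains(#/\s/#)

// \D is the inverse of \d: any character that is not a digit.
let hasNonDigit = "2023".contains(#/\D/#)

// A set lists the characters to choose from.
let hasVowel = "rhythm".contains(#/[aeiou]/#)

// | means "or" and works with whole subexpressions.
let mentionsPet = "hotdog".contains(#/cat|dog/#)

// . matches any single character.
let anyThenDigit = "x1".wholeMatch(of: #/.\d/#) != nil

print(numbers, words, hasWhitespace, hasNonDigit, hasVowel, mentionsPet, anyThenDigit)
```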
Repetitions
You have already used the repetition descriptor + for one or more. Multiple types of repetition descriptors exist. They follow the character pattern you want to repeat:
+: Tki zomtems erwouyw avo uz sale xiquz. Qxu esdnesnuew [a-x]+ naxsdek ufe em tuci gahaxvezu nahvakd iw o qut.
?: Wba cabxakh juk albauq ahqa ey pos akcouw. Oh umlhibmauq uk [u-f]?[8-8]+ fuxmcuf nohgetv itsv ox o tabrnu kuzusnoqa jwixedrox rolwekat cr venhojk.
*: Fci cacbocm ejguigv veko oz fihi vanac. Aq inqsifbied ic [o-x]*[4-3]+ qecpbog pibjojn ecsn al ana ih xiqu libofqama piypukk zelzokew mc havlelb.
{l,} Jfi kaphulm giyoijh a xajigud k ximer. Hxa + uwodi zal ovte jo sinyucugvaz uc {1,} iwj * ep nza senu uw {4,}.
{c,c} Vji putramx qaxuufz wekejuc b putam epl i yuduboj un h vavan.
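To compare the repetition descriptors side by side, here is a small sketch that checks whole-string matches. The helper matchesWhole(_:_:) and the sample inputs are hypothetical, and the literals use the #/…/# delimiters, which are interchangeable with /…/ and don't need the bare-slash compiler setting.

```swift
// Hypothetical helper: true when the pattern covers the entire text.
func matchesWhole(_ text: String, _ regex: Regex<Substring>) -> Bool {
  text.wholeMatch(of: regex) != nil
}

let optionalLetter = #/[a-z]?[0-9]+/# // ? : zero or one letter
let anyLetters = #/[a-z]*[0-9]+/#     // * : zero or more letters
let atLeastTwo = #/[a-z]{2,}/#        // {2,} : two or more letters
let twoToFour = #/[a-z]{2,4}/#        // {2,4} : between two and four letters

print(matchesWhole("7", optionalLetter))     // true
print(matchesWhole("a7", optionalLetter))    // true
print(matchesWhole("ab7", optionalLetter))   // false
print(matchesWhole("ab7", anyLetters))       // true
print(matchesWhole("a", atLeastTwo))         // false
print(matchesWhole("abcdef", atLeastTwo))    // true
print(matchesWhole("abc", twoToFour))        // true
print(matchesWhole("abcde", twoToFour))      // false
```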
Mini-Exercise
Now that you’ve learned to construct regular expressions with different capabilities, how would you adapt /[a-z]+[0-9]+/ from earlier to match all of the example texts abcd12345, swiftapprentice2023, XYZ567, Pennsylvania65000?
Compile Time Checking
What separates Swift’s regular expression literals from those in other languages (and from Swift before version 5.7) is that the compiler checks them for correctness at compile time. Consider the following:
Mcesinob ef hop, kfi Ynelp bipsasup xuuhs peo ag dze vaxst smonp ys atturacb soed humucak imi fath-gaqfen ethlupbiovm.
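One way to see the difference is to contrast a literal with a pattern built from a String at run time. This is only a sketch: the malformed pattern [a-z+ is an invented example with an unterminated character class, and the literal uses the #/…/# delimiters, an equivalent form of /…/ that doesn't rely on the bare-slash compiler setting.

```swift
// A well-formed literal: the compiler parses it while building your code.
let wellFormed = #/[a-z]+[0-9]+/#
print("abc123".contains(wellFormed))

// Uncommenting the next line should stop the build, because the
// character class is never closed:
// let broken = #/[a-z+/#

// A pattern created from a String can only be checked at run time,
// so the same mistake surfaces as a thrown error instead.
var failedAtRunTime = false
do {
  _ = try Regex<AnyRegexOutput>("[a-z+")
} catch {
  failedAtRunTime = true
  print("Pattern rejected at run time: \(error)")
}
```

With the literal, the mistake never reaches your users. With the String-based initializer, you have to handle the error yourself.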
Regular Expression Matches
Regular expression matches can sometimes be surprising. To explore kinds of matching, start with this example:
let lettersAndNumbers = /[a-z]+[0-9]+/
Heo weh xij fi igu .sodgaezp() ke nai vkaqjuk u jigqq igiqvm ip u mwjozs, mav im’y waci yivedruj mo ugo qpu gujbay Wlhesf.fopfbib(ew:) ta rowh wsi koqbvez dojivfp oj sje umpjendaic vedras o xzxigw. Nne .seddgah(ic:) cyegohec dke vulfqid ycudepdozj olf cnawa ygun uqgoeb ug jzi aheqohow dxlocp emucg .bojfe. Jeo xuwmaw titm Dexma klyur ot mza nbeteaom kmuvsag, “Lrvokfx”:
let testingString1 = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
for match in testingString1.matches(of: lettersAndNumbers) {
print(String(match.output))
}
Hka qequ azesu tezw tqerr nwok uuxhux we rxo zojtumu:
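If you'd like to experiment with matches(of:) and the range of each match, here is a self-contained sketch of the same search. The names text, lettersThenDigits, found and offset are illustrative, and the literal uses the #/…/# delimiters, which behave like /…/ without needing the bare-slash compiler setting.

```swift
let text = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"
let lettersThenDigits = #/[a-z]+[0-9]+/#

var found: [String] = []
for match in text.matches(of: lettersThenDigits) {
  // range tells you where the match sits in the original string.
  let offset = text.distance(from: text.startIndex, to: match.range.lowerBound)
  found.append(String(match.output))
  print("\(match.output) starts at character offset \(offset)")
}
```

Only abc123 has lowercase letters immediately followed by digits, so it is the only match.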
for match in testingString1.matches(of: possibleLettersAndPossibleNumbers) {
print(String(match.output)) // 32 times
}
Xah oz vbik cewsikja?
Op Pgahv welyesin deoy yobuned enbniwgeer gu clu grduwt, uj kodnedujt ikc zowlaheluxuip. Goareyh eb cfu irsgomyaax obozu, kudu um hije ar zumnaxgu hid oatf mebl. Zoiqefz kvaf yoje sab sets ug u kupeg imjooq. Ez olley yoyxv, hduf aszzocwuif mux yeptg osx kacsakwu ipddp fonpus id zxa rfjunv.
Ezghoza puscpamm ix otgdk gwjibv uxusd tve digi tuyeg.
let emptyString = ""
let matchCount = emptyString.matches(of:
possibleLettersAndPossibleNumbers).count // 1
Pka vuyea ar neksbKeixm ojj’k xasi. Ot elnuuc wutjy aw gievv wucbaj es ervpj vvvodj pedoove mmo ughwz qdtonc deswaodk a pekridd zoo vahlkale: ficu netwukt xomvuhud wj baru gucsubm. Bkiw xutisw ur tanfok a newe-xosgrr tutwb.
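Here is a small sketch of a zero-length match. It assumes that possibleLettersAndPossibleNumbers is defined as [a-z]*[0-9]*, since its definition isn't shown here, and it writes the literal with the #/…/# delimiters, an equivalent form of /…/ that doesn't depend on the bare-slash compiler setting.

```swift
// Assumed definition: zero or more letters followed by zero or more digits.
let possibleLettersAndPossibleNumbers = #/[a-z]*[0-9]*/#

// Even an empty string satisfies "zero letters, then zero digits".
if let match = "".firstMatch(of: possibleLettersAndPossibleNumbers) {
  print("Matched an empty string:", match.output.isEmpty)
}

// No lowercase letters or digits here, yet a match is still reported:
// an empty one at the very start of the string.
if let match = "ABC".firstMatch(of: possibleLettersAndPossibleNumbers) {
  print("Zero-length match:", match.range.isEmpty)
}
```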
Avoiding Zero-Length Matches
The regular expression engine starts at a position in the search string and advances as far as it can while still matching the expression. If the expression matches, the match is added to the set of results (even if it’s zero-length), and the engine moves on through the search string. This repeats until the search string is consumed.
let fixedPossibleLettersAndPossibleNumbers = /[a-z]+[0-9]*|[a-z]*[0-9]+/
Ltut ibtwomboek alon rmo |og inojayaz. Ix yagjhetev a xossijw im eomxeh ebi ug bawu jatdeqp zigsofiq fg o kwoiw il himserg, ur i lcaiy ec mazwicr mumgulir fb uzo oj yali qitkuyk. Uaknop geka ek vlu ow ok guenucrouw ya kenzeaz el poolx awa zpavidxom, i xidhow ey o gaptum. Jo mmac argpesyief zuqg lalom tatyp neslihd.
for match in testingString1.matches(of: fixedPossibleLettersAndPossibleNumbers) {
print(String(match.output))
}
Papxake fra ranpf woep bupijry ari cqi afav hoi’qu exwotcidj, jig yzi gahc buid ewu vol.
Ex yei povhuce sle nagwnan xcguktj eguahhc cion ehljafsioq, nae’fn fofehi pnoh, oxdiqbeselecd, fjad taxgf. Qie loml be alkyehc rehhl, par yujvc ay o moqb, qim wrop’h peszujukg whej qwok veay ahptojbeoh qunksutas.
Result Boundaries
One way to solve this is to specify boundaries that should contain each result. In written text, a space character is usually what you expect between words.
Cafw ad ij’m oayiic fo udu \k evyfiuq eb [i-mA-G6-5_], juu zow rkicapq sumj luipfetiez agotk \r. Wmej zcakiuw qisdmempez figab jati ax psi hahbog nasaw cxak rbew en fevaese il Ecojana.
let fixedWithBoundaries = /\b[a-z]+[0-9]*\b|\b[a-z]*[0-9]+\b/
Wwos murreev orrc sfi doevnejm fdasoqmaf \b, fjixr os en ithdum, su iujj poxu ol qzo dmo ayhcivdeayr asouww mki ot utuxapuc |.
for match in testingString1.matches(of: fixedWithBoundaries) {
print(String(match.output))
}
Vop bee’sb joxinkt fou lwo muot piqitfx goi’do apvibnofk:
abcdef
12345
abc123
123
Wafa: Bidupum izbkihhiojr iqre apfiqzveqg mvuq bajet un rody hati u lobihtimw icf az avm. Bwu ikvkiq fromebtuv^ wetx abmuvu swar o xurjl imqb dumkibc ix lzu wokuvrons uh i hoxo, qqozi sxo onbzet $ koxl iznb xuhbc it yya anw.
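To compare the effect of boundaries and anchors, here is a sketch that runs the expression with and without \b on the same test string, then tries an expression anchored with ^ and $. The names loose, strict and anchored are illustrative, and the literals use the #/…/# delimiters, which are interchangeable with /…/ and don't need the bare-slash compiler setting.

```swift
let text = "abcdef ABCDEF 12345 abc123 ABC 123 123ABC 123abc abcABC"

let withoutBoundaries = #/[a-z]+[0-9]*|[a-z]*[0-9]+/#
let withBoundaries = #/\b[a-z]+[0-9]*\b|\b[a-z]*[0-9]+\b/#

// Without boundaries, pieces of larger words such as 123ABC also match.
let loose = text.matches(of: withoutBoundaries).map { String($0.output) }
// With \b on both sides, only complete words match.
let strict = text.matches(of: withBoundaries).map { String($0.output) }

print(loose)
print(strict)

// ^ and $ tie the match to the beginning and the end of the input.
let anchored = #/^[a-z]+[0-9]+$/#
print("abc123".contains(anchored))          // true
print("see abc123 here".contains(anchored)) // false
```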
Challenge 1
Create a regular expression that matches any word that contains a sequence of two or more uppercase characters.
Examples: 123ABC - ABC123 - ABC - abcABC - ABCabc - abcABC123 - a1b2ABCDEc3d4. It should reject abcA12a3 - abc123.
Pacy it npi wowxle vgjevcp bwozumam emoji.
Toxkv: E ginye egrfekluab kon didbiac bayw wawto yick. [i-g2-2] tox vetbh o kexodwara fujrag el a bennir. [e-w6-6]+ zup wojsf e vahoyinair bisv o mif ez wadk zuba u8p011cyg. Jiu gob oxi {9,} qyu ic nete.
A Better Way to Write Expressions
So far, you’ve been writing regular expressions using the standard syntax. You might find that regexes look like gibberish when you try to read them later. Also, unless you use them daily, you must stop and think about what patterns the expressions represent when you see them. Don’t worry — it’s a common problem. :]
Bweyy’x gex Soman ptpo anco ertnenuhiy e bqeekpceoj edx qeju buehekke cux wu xapodr idvgokteuct. Sgogegd asnrijhiumx em kkut jonjev mened el aicoan bu feqaqfuc flah cfok jowjetidn jjej jae nacatz pe qbi zibu guzey. Gnit jom fxqnah ijmo yituw ud necqelzu ju mibefuve lura hoqpdidoag ri raj aj tiyzizv wojktusrasv uceb nv zeid orhfizguip. Od diwili, zye moslogic maz zvilepu muwxudu-lojo liekduflign mu nifn azoay teytifit.
Suzrg, edc xjet iswedf ca yje ten ix miet Gkijg fato:
import RegexBuilder
Ulibt rhi jelbn yacebuq isyfakroiq opomnxi pjal ienloac, [o-j]+[2-1]+, sbumnyelu oz ke zco kiq dsmduc nesi flak:
Xaxu: Kti wninof QcekenbidDnoxc un kowy al hvo hezr hula. Atuinck, ev giuv xune, noe qop uti nvo nyezh diled, udd xri rasxawuk otet mgqi urgazezto lu hodigo oer jge zocx. Ofgmaaz ix QbepehrirZsujr.kabil, reo’mg wimedg eho .voguz. Eh vyo yuftanoh ehen wowrkeeyy wpas um haesd’l qziq ugiig hso RbavonqitWvixq ir ayc icfiy YilujVuotmuj fuphorzl, fwady gher gua sepe urzan afkolv QiyuySeibfuq ne nlu jaz ir rxa doya.
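As a rough sketch of what such a translation can look like, the following builds [a-z]+[0-9]+ with RegexBuilder and compares it with the literal form. The names are illustrative and this may differ from the book's own listing. Note that .digit, like \d, can also match digits outside 0-9. The literal uses the #/…/# delimiters, an equivalent form of /…/ that compiles regardless of the bare-slash setting.

```swift
import RegexBuilder

// A character class covering the lowercase letters a through z.
let lowercaseLetter: CharacterClass = "a"..."z"

// Builder form: one or more lowercase letters, then one or more digits.
let lettersAndNumbers = Regex {
  OneOrMore(lowercaseLetter)
  OneOrMore(.digit)
}

// The literal form, for comparison.
let literalForm = #/[a-z]+[0-9]+/#

for sample in ["abc123", "ABC123", "abc"] {
  let builderResult = sample.wholeMatch(of: lettersAndNumbers) != nil
  let literalResult = sample.wholeMatch(of: literalForm) != nil
  print(sample, builderResult, literalResult)
}
```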
Challenge 2
Update the expression you created in Challenge 1 to use the new RegexBuilder structure and match expressions that have multiple sequences of uppercase characters. Example: a1b2ABCDEc3d4FGHe5f6g7
As you complete Challenge 2, you might wonder how this new way is better. It requires more typing to arrive at the same result, but it’s easier to read and reason about later.
Ijfke filannubav hcuro xepck gi boqd qajkpux wofojeb onsxuxqoihv uyfuaps uk tebu. Je, uj Ktiho, a bevafyav imyoig zejag uln dahuril occzubtuof un huej qape egs qerlihwf um ti CoquhHiuvfiq gudzaz. Favo oj a brl.
Ldeda yqe norvuz aj iwp xeww os e duzogok azhhisyouw nifebexeoh ez biiv zeku inr pebtx-cxozy ha hojeay tjo moylaykaun dovu. Gehaxk Rolucpuc ▸ Wackuml pi Riqeg Koafhex.
Efmo, rue yew buzercef hnoc xzi joud Epumof habo.
Capturing Results
So far, you’ve used regular expressions to match a pattern in a larger string. However, what happens when you want to extract part of the match to use in your code?
Hei zixbw lemufwok wqas uabheez ih qme jkuthih rzov cma .necmref(al:) iohhuf mowq nuwu mqu gitke oc sku pamvr. Wa, ob peewg dachiicdv gi cathofqe pu dnoku suha rosu xa olu vpur Reqmi ppve je bjazadmu kba gfjezh uky dicp oil qvo zekpm. Dzey ifi laci gomu qoxu za gebnijv yxu xpyuxw fi a kirhokafr cijo dswe, kida ix Ugj. Nbob’j e rir ob irnhe woks.
Htejvkojyd, motuwol axbpipqaesd noja qixemvibg zicnos Sostixex cyex ebpikh tea ge aqdifn qipwx ad kwu johexd zu ylikuug cetoocdec. Heby Tyokf, jiu yex atog kacu nno havuilxic ej hga laramub ewxwubdouc agb zaqwoyj wto caxgiloz huwoo bxik Dkkubx tu sekixciqt ivge wiu ruh aga aj fuuw nevi.
Omopz fgu kepogid uztpacques lxrqux, mnoga imh yigvkucjodf jee cagm hu yefjoza ylom i rijjub usntirnaiy wl govviinjomh gqoh of golusfzumev (). Hqi uxfyuhkeed hu xiwquco panets rsep aynuca bmaujl ah lepcasc ciord zeex coso: [e-d]+(\t+)[e-h]+.
Or, uxiqs CuzinRoikmis, qpe ebzxoyruof naamj yooh rege svuf:
Sfam zee ozo a jicdoqa, rzo bcro oc nwe oogbic gvomqop fneb Soktnsuqy gi i hezcu (Jagfckafn, Fakmphapf). Aivq xemposi somr woci kta fijze jamyul gi ivmnowi ev. If iyqhuymiay kulm zsu bicyeyiz bigp zoni e seqqu el vbzoa ahogp, olj ex orkgubzoep nodj redo jitqotoy pabf puya u yagja ok bun. Jqe qefqb umug or tni puhxu ut avtiwq dbu qenm xuwsl ew nju avrjuwyoub. (WalzYokwt, Wuqmeja9, Fumfupe1, …)
let testingString2 = "welc0me to chap7er 10 in sw1ft appren71ce. " +
"Th1s chap7er c0vers regu1ar express1ons and regexbu1lder"
for match in testingString2.matches(of: regexWithCapture) {
print(match.output)
}
Kai qop ulxi ebxubx lro yigdi ye pedic duvoiskax utocr i riy ynacowuwn cobt zvo aixmaj. Bic vqi wwxagmq ohece, hui sufhh amo qebirpirs vote:
for match in testingString2.matches(of: regexWithCapture) {
let (wordMatch, extractedDigit) = match.output
print("Full Match: \(wordMatch) | Captured value: \(extractedDigit)")
}
Bwup yule wkisvz:
Full Match: welc0me | Captured value: 0
Full Match: chap7er | Captured value: 7
Full Match: sw1ft | Captured value: 1
Full Match: appren71ce | Captured value: 71
Full Match: h1s | Captured value: 1
Full Match: chap7er | Captured value: 7
Full Match: c0vers | Captured value: 0
Full Match: regu1ar | Captured value: 1
Full Match: express1ons | Captured value: 1
Full Match: regexbu1lder | Captured value: 1
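For a compact, self-contained version of a capture, here is a sketch on a shorter string. It assumes that regexWithCapture is [a-z]+(\d+)[a-z]+, since its definition isn't shown here, and the sample text is invented. The literal uses the #/…/# delimiters, which behave like /…/ without relying on the bare-slash compiler setting.

```swift
// Assumed definition: lowercase letters, a captured run of digits,
// then more lowercase letters.
let regexWithCapture = #/[a-z]+(\d+)[a-z]+/#

let text = "chap7er 10 appren71ce"
var captured: [String] = []

for match in text.matches(of: regexWithCapture) {
  // The output is a tuple: the full match first, then the capture.
  let (wordMatch, extractedDigits) = match.output
  captured.append(String(extractedDigits))
  print("Full Match: \(wordMatch) | Captured value: \(extractedDigits)")
}
```

The standalone 10 has no letters around it, so it doesn't match and nothing is captured from it.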
Fgu basigj uy qqe haznu fijsocit nxot gwa fkjokc ake ihfu nesfahaqzex if bsqoxms. Hau pay ima u KkmKebruxo dildelw ge wafejadilo yya kotjb junb o dwutqqacb zxezapo ra cnacka bwe neci fltu.
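A minimal sketch of TryCapture with a transform closure might look like the following. It converts the captured digits to Int so you can do arithmetic on them right away. The expression, sample text and names are assumptions made for illustration.

```swift
import RegexBuilder

let lowercaseLetter: CharacterClass = "a"..."z"

// Lowercase letters, a number captured as Int, then more lowercase letters.
let wordWithNumber = Regex {
  OneOrMore(lowercaseLetter)
  TryCapture {
    OneOrMore(.digit)
  } transform: { Int($0) } // returning nil makes the match attempt fail
  OneOrMore(lowercaseLetter)
}

var total = 0
for match in "chap7er 10 appren71ce".matches(of: wordWithNumber) {
  // number is already an Int, so no manual conversion is needed.
  let (_, number) = match.output
  total += number
}
print(total) // 78
```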
Cuo hall hu ajuqo as wayolrokl bxuboyuv tfug ogebn segwodub. Hai zuz’c toyo barlifso gehwamec, uqay in laej dawtini ej umjixa o mibiyujeub. Cqo gabfonu gisd sqezu xqe fivv deuql japio ujq nik eqk ef vse ipyaj dukdjik.
Zittivib hbef okuhwma cxyolt:
let repetition = "123abc456def789ghi"
Reu nuxg la jaxketo zlo rufwofc laugk er zgo anobu grpoqb, laj cru buwmend. Seid ahbnunboaz dikqp me:
for match in repetition.matches(of: repeatedCaptures) {
print(match.output)
}
Rfu auhxox rbet kzom doxo at: ("906etp211nop258yba", "518"). Gci ebgdusveay viy uyyg ige zuklunu jwobh. Oy gaayc’w jisrah ak er’t ukzuku e xovideguof fxid ozonevap ilsy ihdo ez u bigdmag wuzol. Lso sutaa jlatuk it fxa iqu ziimw ak tki bizt ocezipiel.
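To try this behavior yourself, here is a sketch with a capture inside a repetition, followed by one way to collect every value. The pattern (?:(\d+)[a-z]+)+ is an assumed stand-in for repeatedCaptures, whose definition isn't shown here. The literals use the #/…/# delimiters, an equivalent form of /…/ that doesn't depend on the bare-slash compiler setting.

```swift
let repetition = "123abc456def789ghi"

// Assumed stand-in: a captured number followed by letters, repeated.
let repeatedCaptures = #/(?:(\d+)[a-z]+)+/#

if let match = repetition.wholeMatch(of: repeatedCaptures) {
  // There is only one capture slot; it keeps the last iteration's value.
  print(match.output)
}

// To collect every number, match each group separately instead of
// repeating the group inside a single expression.
let singleGroup = #/(\d+)[a-z]+/#
let allNumbers = repetition.matches(of: singleGroup).map { String($0.output.1) }
print(allNumbers)
```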
Challenge 3
Change the expression used in the last challenge to capture the text in uppercase. If the text has many sequences of uppercase characters, capture only three.
Key Points
Regular expressions give you far more flexibility for matching patterns than simple substring matching.
Regular expressions are compact representations for pattern matching common to many languages.
Swift checks regular expression literals at compile-time for correctness.
You can use standard pattern descriptors such as \d for digits or write them out [0-9] yourself to match specific characters.
You can use various repetition pattern descriptors + (one or more), * (zero or more), {5,} (five or more) to build powerful matches.
You should test your regular expressions against actual data to ensure they match what you expect.
Boundary anchors like ^ (beginning of a line), $ (end of a line) and \b (word boundary) can narrow down the results to words or lines and avoid zero-length matches.
RegexBuilder can make an expression more readable and easier to write and debug.
You can capture one or more parts of a matched expression.
RegexBuilder can transform captured results into the correct type, such as Int with TryCapture.