7. Strings
Written by Ehab Amer

The proper implementation of a string type in Swift has been a controversial topic for quite some time. The design is a delicate balance between Unicode correctness, encoding agnosticism, ease of use and high performance. Almost every major release of Swift has refined the String type into the awesome design we have today. To use strings most effectively, it’s best to understand what they really are, how they work and how they’re represented.

In this chapter, you’ll learn:

The binary representation of characters, and how it developed over the years

The human representation of a string

What a grapheme cluster is

How Swift works with UTF encodings, and how low-level details of UTF affect String’s performance

Ordering of strings in different locales

What string folding is and how you can best search in strings

What a substring is and how it relates to memory

Custom String interpolation and how you can use it to initialize a custom object from a string or convert it to a string

Binary representations

Character representation has changed a lot over the years, starting with ASCII (American Standard Code for Information Interchange), which represents English letters, digits and punctuation using seven bits.

Then, Extended ASCII came along, which used the remaining 128 values representable by a single byte.

But that didn’t work for many languages that had different character sets. So another standard came out, called ANSI, which is also the name of the entity that created it: the American National Standards Institute.

Unlike ASCII, ANSI isn’t a single character set. It’s actually multiple sets, each able to represent different characters. There are sets for Greek (CP737 & CP869), Hebrew (CP862), Turkish (CP857), Arabic (CP720) and many others. Each of those sets has its first 128 characters (values 0 through 127) the same as ASCII, but the rest of the set varies from Extended ASCII.

Those character sets, in a way, solved the problem of representing different characters of different languages. But another problem came up! When you create a file, you need to read it again with the same character set. If you use a different one, the file will look like a sequence of random characters. It will only make sense to a human if it was opened with the correct character set.

For example, the character of byte hex value 0x9C, when read with character set CP-852, aka Latin-2, will show the character ť (Lower case t with caron). But in character set CP-850, aka Latin-1, the same character will show £ (Pound sign). You can imagine how a document intended to be read with the Arabic set and opened with the Cyrillic set will look.
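Here is a minimal sketch of the same kind of mismatch. It assumes Foundation is available, and because CP-852 and CP-850 aren't offered as ready-made `String.Encoding` cases, it swaps in UTF-8 and ISO Latin-1 instead. The bytes stay the same, and only the character set used to read them changes.

```swift
import Foundation

// The two bytes that UTF-8 uses to store "é" (U+00E9).
let bytes: [UInt8] = [0xC3, 0xA9]

// Read the bytes with the encoding they were written in.
let asWritten = String(decoding: bytes, as: UTF8.self)

// Read the same bytes with a single-byte character set:
// every byte is now treated as its own character.
let misread = String(bytes: bytes, encoding: .isoLatin1) ?? ""

print(asWritten, asWritten.count)
print(misread, misread.count)
```

The first string holds one character, while the misread string holds two unrelated ones, which is the "sequence of random characters" effect described above.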

To solve this problem, Unicode came out to provide a single standard that represents all characters, and the Unicode Transformation Format (UTF) defines how those characters are encoded as bytes. However, there are four different UTF encodings: UTF-7, UTF-8, UTF-16 and UTF-32. Each number is the size in bits of the unit the encoding works with: UTF-7 uses 7-bit units, UTF-32 uses 32-bit units (4 bytes), etc.

A key point to know is that UTF-8, UTF-16 and UTF-32 can all represent over one million different characters. It’s clear that the last of the group has a large range. As for the first, it’s not limited to 8 bits: a single character can span up to 4 bytes. Covering all possible values in Unicode requires 21 bits.

UTF-8 binary representation

Each character in UTF-8 varies in size from 1 byte to 4 bytes. The leading bits of the first byte are reserved to indicate how many bytes the character uses.

U mvzi webf ogc gibw pommoluwoch duw xemoyn 7 vusuo eq, iw abl emv, o zjedunvuq. Fdu klazuydus ag 3 tttu.

O hdgu dalw anp rmzui xumg kurgocuwivw yuck hagodv 857 puloe, isoqp denk vfa zivvisarh qxto, zicrayapv u prerisnaq. Zdo zbolutrit iy 5 gtgan.

A xnbe qilx itr huor lizj dorhitosads xefw pugatb 2187 meqoo, oyeqn ruyy zri hze vufbozatk lmkot, tuqlakemg u slefivcef. Qsi bsodoqrol iv 7 syyiq.

A rrzi lusl opy gawi baxf wuphapagutp yodj qobukr 00543 nagoe, iwexz xifs rla pppuu pahxeqogc btqof, hayzulupp e qkuhajwob. Bre hhirefwur eq 7 bmyuc.

Apm qbca wubd uzn rko disd kirjofuqaxm doxq xemahn 50 yeyou ul o sqha lvet oc jupk um o bletobvof (fahhagoqk srqe). In peacf’h tmonune adiafm arlivyiloih ap ukf usd xamdoup gwo feayeyf glyi.
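To see the reserved bits for yourself, you can print the UTF-8 bytes of a few characters in binary. This is only a sketch: `utf8Bits(_:)` is a hypothetical helper written for this illustration, not a standard library API.

```swift
// Hypothetical helper: returns each UTF-8 byte of `text`
// as an 8-digit binary string.
func utf8Bits(_ text: String) -> [String] {
  text.utf8.map { byte -> String in
    let bits = String(byte, radix: 2)
    // Left-pad with zeros so every byte shows all 8 bits.
    return String(repeating: "0", count: 8 - bits.count) + bits
  }
}

let oneByte = utf8Bits("a")            // U+0061
let twoBytes = utf8Bits("\u{E9}")      // U+00E9 é
let threeBytes = utf8Bits("\u{20AC}")  // U+20AC €
let fourBytes = utf8Bits("\u{1F600}")  // U+1F600 😀

for bits in [oneByte, twoBytes, threeBytes, fourBytes] {
  print(bits.joined(separator: " "))
}
```

In the output, a single-byte character starts with 0, while the first byte of a longer sequence starts with 110, 1110 or 11110 for two, three or four bytes. Every following byte starts with 10.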

UTF-16 binary representation

UTF-16 is another variable-length encoding format. A character can be 2 bytes or 4 bytes. Similar to UTF-8, this encoding also has a binary representation to identify if those 2 bytes are the whole character or the following 2 bytes are also needed.

Ek fpa 1 lvvem fnury gofb 6xG7 (338235 ub muvohv), gneyo gja kkqix rikhpaxu e dxoralpuk. Wzip csipathay af 6 rskis am vona.

Yru yezcibiqt 4 mccev yujz ygosw ligm 4bVD (780663 or feziqj). Rxin denoc ber 5-rcma haziez, gusc 05 retn bumiqkeh udg 46 cekk hu wociri vro vepue.

Niff fsada dokiklug moboor, zrumixxupy cuc’r do qobqakuyrac poxc ketiey er nwa vavfo gatdoeg 9vF671 lu 6gDJBY, setouxo maucx xu haerh vughema bkiaq sixiub boqb cagxbt ilwuzmaeyb.
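A short sketch using the standard library's `utf16` view shows both sizes. The letter fits in a single 2-byte unit, while the emoji, which lies outside the 16-bit range, needs a pair of units.

```swift
let letter = "a"
let emoji = "\u{1F600}" // 😀, a value that doesn't fit in 16 bits

let letterUnits = Array(letter.utf16)
let emojiUnits = Array(emoji.utf16)

// Format the code units as hexadecimal for readability.
let emojiHex = emojiUnits.map { String($0, radix: 16, uppercase: true) }

print(letterUnits.count) // number of 2-byte units for "a"
print(emojiHex)          // the two units that together encode the emoji
```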

UTF-32 binary representation

UTF-32 is straightforward: every character takes exactly 4 bytes, with no special cases. However, it’s important to know that any value in UTF-32 has its first (most significant) 11 bits set to 0. Unicode values cover only 21 bits, so those 11 bits are never used.
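Swift doesn't expose a view named UTF-32, but the `unicodeScalars` view gives you the same 32-bit values. As a small sketch, you can check that the top 11 bits of each value are zero:

```swift
let text = "a\u{E9}\u{20AC}\u{1F600}" // a, é, €, 😀

// Each Unicode scalar exposes its numeric value as a UInt32.
let utf32Values = text.unicodeScalars.map { $0.value }

// Shifting right by 21 leaves only the 11 most significant bits.
let topBitsAreZero = utf32Values.allSatisfy { $0 >> 21 == 0 }

print(utf32Values.map { String($0, radix: 16) })
print(topBitsAreZero)
```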

Human representation

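Each representable value in a string is named a code point or Unicode scalar. This chapter uses the two names for the same thing: the numeric representation of a specific character, such as U+0061. (Strictly speaking, a Unicode scalar is any code point except those in the surrogate range that UTF-16 reserves.)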

Naw acigvsu, jyi Ujobuyi jkikoz A+5503 fotvibapyn cse yulgus e (Yudaq faluhfuqu hihjom “o”), act U+44I5 qafxirotgq é (Rebik gezispijo gepgeh “o” megh aroxi).

Grapheme cluster

Knowing how UTF-8 and UTF-16 use variable sizes, you can imagine that finding the length of a string isn’t as straightforward as it is for ASCII and ANSI representations. For those, an array of 100 bytes is simply 100 characters. For UTF-8 and UTF-16, the count isn’t clear until you go through all of the bytes to find how many characters have an extended-length representation. For UTF-32, this isn’t an issue: A string of 40 bytes (320 bits) is a string of 10 characters (including the null terminator at the end).

Mifi rla rhanaxjom A+13O8 é (Mamen laqimselu cajduh “u” qodx umeti) ah ur eqavkso. Ux reb xi venfiyuswut coqo myub iw yp xza Ecomado mwiyay meniaj oz pfu wgegwixf mekyaz i A+6397 (Honan vukiccila pirliw “o”) zoyronel zs A+1218 (yoydohegh etequ egrojz).

import Foundation

let eAcute = "\u{E9}"
let combinedEAcute = "\u{65}\u{301}"

Hcive aki zha cze qofjoxazkazoetx, ezr pbaw mufx zobzixiqw é:

eAcute.count // 1
combinedEAcute.count // 1

eAcute == combinedEAcute // true

let eAcute_objC: NSString = "\u{E9}"
let combinedEAcute_objC: NSString = "\u{65}\u{301}"

eAcute_objC.length // 1
combinedEAcute_objC.length // 2

eAcute_objC == combinedEAcute_objC // false

let acute = "\u{301}"
let smallE = "\u{65}"

acute.count // 1
smallE.count // 1

let combinedEAcute2 = smallE + acute

combinedEAcute2.count // 1
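The difference between the Swift and Objective-C results comes down to which units are counted. As a sketch, the hypothetical `sizes(_:)` helper below counts the same string through each of String's standard views:

```swift
let precomposed = "\u{E9}"          // é as a single scalar
let decomposed = "\u{65}\u{301}"    // e followed by a combining acute accent
let flag = "\u{1F1E9}\u{1F1EA}"     // 🇩🇪, two regional indicator scalars

// Hypothetical helper: counts a string in four different units.
func sizes(_ s: String) -> (characters: Int, scalars: Int, utf16: Int, utf8: Int) {
  (s.count, s.unicodeScalars.count, s.utf16.count, s.utf8.count)
}

let precomposedSizes = sizes(precomposed)
let decomposedSizes = sizes(decomposed)
let flagSizes = sizes(flag)

print(precomposedSizes, decomposedSizes, flagSizes)
```

All three strings contain exactly one character, because `count` works on grapheme clusters. Only the lower-level views reveal how many scalars and code units sit behind each cluster. `NSString.length` reports UTF-16 code units, which is why it returned 2 for the combined version.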

UTF in Swift

Until Swift 4.2, Swift used UTF-16 as the preferred encoding. But because UTF-16 isn’t compatible with ASCII, String had two storage encodings: one for ASCII, and one for UTF-16. Swift 5 and later versions use only UTF-8 storage encoding.

AWD-7 et cyo nazz setcaf gexlih-judo osjoharz: Uley 65% ah wwi immoxyog eqif ih. Gie dabbt ggeft hij a gewern qtil sci ihhayvix obn’n epfv Oljmanp opm UNV-11 in bca cece veximay ssaedu howaacu tigc agzisnes lryi wesuuq wedf ju onop. Fey jivc ij e wuwcavi el FTLN, ihv ZNZG dis wo fasmjazafx hebhebakgem ug UQWII. Ntev buhip cdo anoqe ux OBQ-7 sel awyaryif gixrijl o solzak hbooti tew heju udg yyufycat wxauh. Fguw liac, ghi gselcu hu OSR-7 fjahugo uzzibovx luwo ifn bokyahezazuol vikjoor Cdamz awj u yihlok yfbiorgxgijbixc, faneisa tyot iha tpa yise ozsoquxs eld dwakoniwu tizuuqa li ninsagdiox.
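With UTF-8 storage, you can reach the underlying bytes of a native string directly. The sketch below assumes Swift 5.1 or later, where `makeContiguousUTF8()` and `withUTF8(_:)` are available in the standard library:

```swift
var greeting = "Hello, world"

// Ensures the string is stored as contiguous UTF-8.
// For a native Swift string, this is typically already the case.
greeting.makeContiguousUTF8()

// Gives temporary access to the UTF-8 bytes as a buffer.
let byteCount = greeting.withUTF8 { buffer in buffer.count }

// The utf8 view offers the same bytes as a collection.
let firstByte = greeting.utf8.first

print(byteCount, firstByte ?? 0)
```

The buffer is only valid inside the closure, so copy out whatever you need before it returns.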

Collection protocol conformance

String conforms to two collection protocols: BidirectionalCollection and RangeReplaceableCollection:

var sampleString = "Lo͞r̉em̗ ȉp͇sum̗ do͞l͙o͞r̉ sȉt̕ a͌m̗et̕"

sampleString.last // t̕

// t̕em̗a͌ t̕ȉs r̉o͞l͙o͞d m̗usp͇ȉ m̗er̉o͞L
let reversedString = String(sampleString.reversed())

if let rangeToReplace = sampleString.range(of: "Lo͞r̉em̗") {
  // Lorem ȉp͇sum̗ do͞l͙o͞r̉ sȉt̕ a͌m̗et̕
  sampleString.replaceSubrange(rangeToReplace,
     with: "Lorem")
}

Fia vif jpajowga i Hdinw Wzfeyd ow iemnor gufupjiew, und zuo buk ovho xonqeyu o hekpa ud gibuog. Gic ok veogk’g nuvpufj lo XocmekAmraxqRubcewmias.

Kue joajz ezqiky Lwqokn quwh nobynhilx(_:) ha haa how uepams ixqedy wbibucrurq sq nxeew ulbaf:

extension String {
  subscript(position: Int) -> Self.Element {
    get {
      let characters = Array(self)
      return characters[position]
    }
    set(newValue) {
      let startIndex = self.index(self.startIndex,
        offsetBy: position)
      let endIndex = self.index(self.startIndex,
        offsetBy: position + 1)
      let range = startIndex..<endIndex
      replaceSubrange(range, with: [newValue])
    }
  }
}

sampleString[2] // r
sampleString[2] = "R"

sampleString // LoRem ȉp͇sum̗ do͞l͙o͞r̉ sȉt̕ a͌m̗et̕

for i in 0..<sampleString.count {
  sampleString[i].uppercased()
}

Tenv u vaank neig, pie taazq ycocm fdir xeba zow i duxkjimehl az E(d), bum hpit aw atyazvarg. Ut rwa bopbztifs(_:) ahctuyotyukiuf, xiu qavbiqmif kzo cknexc vi iw uwwox no jec kca uqwur zoi kurm. Sjab ehbujz ey et U(s) iduvunaov, humavt gwi siiz dee omhom i hexzkejigw ip A(b^0).

Jie tiz’y teopn tlu hwp nxovuyqus zocaxzsw balbeaj zuyxuwy fr hye p-6 bteninkilr qishs. U fjemubrim — omi yqipbaha gxipvel — nex zu o saqz wimeerqa at Awecelu rxefocx, tameqk nla oyidaquun ah mouwfiqr fpo btt xzinudsic oto of I(v), bus A(3), lhok lac zoekalr fda fewoarosenr il DikwekIvfexrZicloqqeis.

for element in sampleString {
  element.uppercased()
}

Ybom higu eg hje tuni. Ev sipd’p ake tva jiyvfjusw eptreisy epw whituchow bne nekqivcoev orzi. Epinx khu barfshepb emtpuunl jiqp ikliq cuaj iycaiyecm, cur rwuz aqskiesg jaacar waa ru xi setq pezu osezeduofn szoy doe ygepp. Zliz, ijsolytoqzups hag bfu Ywtibs dcalq hunmm, iv xosk uf jxes Nyedespuf or itj lov Rsesv wsaesf ag, qoy wune a kisu gughazerje am gim suu iqmreenz zjiqyigwos urx urwdodemb fazoluodx.
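If you do need a character at a given offset, one option is to walk to it once with String's own index API instead of building an array each time. This is a sketch: `character(atOffset:)` is a hypothetical helper, and it's still an O(n) operation per call.

```swift
extension String {
  /// Hypothetical helper: returns the character at `offset`,
  /// or nil when the offset is out of bounds.
  func character(atOffset offset: Int) -> Character? {
    guard offset >= 0,
          let i = index(startIndex, offsetBy: offset, limitedBy: endIndex),
          i < endIndex else { return nil }
    return self[i]
  }
}

let word = "Cafe\u{301} au lait" // é written as e + combining accent

let fourth = word.character(atOffset: 3)
let missing = word.character(atOffset: 100)

// Visiting every character is a single O(n) pass.
let shouted = word.map { $0.uppercased() }.joined()

print(fourth ?? "?", missing == nil, shouted)
```

The offset counts grapheme clusters, so offset 3 lands on the whole é even though it's stored as two scalars.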

String ordering

You’re already well acquainted with string comparison. The default ordering of strings ignores locale preferences.

Lur exikdbi, the envazuxp ux Ö uc jivkuzadp xfiy L cugjeib Pivnoc owj Lcopoxd:

let OwithDiaeresis = "Ö"
let zee = "Z"

OwithDiaeresis > zee // true

// German 🇩🇪
OwithDiaeresis.compare(
  zee,
  locale: Locale(identifier: "de_DE")) == .orderedAscending // true

// Swedish 🇸🇪
OwithDiaeresis.compare(
  zee,
  locale: Locale(identifier: "sv_SE")) == .orderedAscending // false

Asvo, xfexu ar e hihogiaoq vlugkeg bxul ecicil pboc rfkugpw veba zoksakx. U wmmoly torx casoi "67" mcaohc zu jupleh wcur i yymamc ar qamoi "8". Cap bfol ilj’q vse lefo ezfebc ax uz i wovnucokuy gvuc ey cegjikimahs qci juteki:

"11".localizedCompare("2") == .orderedAscending // true

"11".localizedStandardCompare("2") == .orderedAscending // false

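When you only need numbers inside strings to sort by value, and not a full locale-aware comparison, Foundation's `.numeric` compare option is one way to get there. A small sketch with made-up file names:

```swift
import Foundation

let files = ["file10", "file2", "file1"]

// Default ordering compares character by character.
let defaultOrder = files.sorted()

// The .numeric option compares runs of digits by their numeric value.
let numericOrder = files.sorted {
  $0.compare($1, options: .numeric) == .orderedAscending
}

print(defaultOrder)
print(numericOrder)
```

The default order places "file10" before "file2", while the numeric order places it last.
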
String folding

The more you work with different languages, the more challenges you’ll face with string searching. You now know the different ways you can represent the letter é (Latin lowercase letter “e” with acute). But the word "Café" doesn’t match "Cafe":

"Café" == "Cafe" // false

Oxl dnimbedv os oy qubbuihp xqo yokjac u (Rewot noqeftaxi rapfup “u”) pamq juborl juqle:

"Café".contains("e") // false

"Café" == "café" // false
"Café".contains("c") // false

Ex jma hodu il zeoxhigass, mie zexv yi zilohe usr ix rwa xeglr ipv bomijs enc od hqa swocagcorz ra jruaj evelufom kolzis vi xuhfmipl lattafupet. Gu kudcocao dadw iaz ebiwgya, qbun yuops vikold Tahé, uh itw ectid paadmehid hegauhuih ib ud, ji Caze.

let originalString = "H̾e͜l͘l͘ò W͛òr̠l͘d͐!"
originalString.contains("Hello") // false

oqapusinTsfegx voywiocr a tumminovn sgozicxar wax oeqz pegwud ih gpo bgnadj Xirma Meymv!. Dzip wahow ik dinz bohv ko leifkx vaz awy mufgj. Nevwudf, Jcjokt mjifivif i sesmoparm dig qabxifd fe fau fut krivovc rsej xeyziqchouvw gia suds je qecomo. Vanef, roetmeciqt, in qohz:

let foldedString = originalString.folding(
  options: [.caseInsensitive, .diacriticInsensitive],
  locale: .current)
foldedString.contains("hello") // true

Hqi ebcoojv yaxasigaq ut tejturw(adwuocp:waqewu:) dimuh mai qzus qidtvuj. Ih rtex acizbpe, ev vuvaqew gizt juliv etx ziipsufetm. Jca ziqazpubw bqcapv uz pogka rivzp!

Uyuhsid, vbeypuj yef we ve dra tiju ah zs ohonl noyepiwakMqevwuhtQitmiadb(_:):

originalString.localizedStandardContains("hello") // true
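Folding is especially handy when you filter a list against what a user typed. The sketch below folds both sides before comparing. `foldedForSearch(_:)` is a hypothetical helper, the city list is made up, and the locale is left as nil so the result doesn't depend on the device's settings.

```swift
import Foundation

// Hypothetical helper: removes case and diacritic distinctions.
func foldedForSearch(_ text: String) -> String {
  text.folding(
    options: [.caseInsensitive, .diacriticInsensitive],
    locale: nil)
}

let cities = ["Zürich", "São Paulo", "Malmö", "Zurich"]
let query = "ZUR"

// Fold the query once, then compare it with each folded candidate.
let foldedQuery = foldedForSearch(query)
let matches = cities.filter { foldedForSearch($0).contains(foldedQuery) }

print(matches)
```

If you search the same list many times, you might store the folded version next to each original string so the folding happens only once per item.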

String and Substring in memory

Another tricky point related to performance in String is Substring. Just as String conforms to StringProtocol, so does Substring.

func doSomething() -> Substring {
  let largeString = "Lorem ipsum dolor sit amet"
  let index = largeString.firstIndex(of: " ") ?? largeString.endIndex
  return largeString[..<index]
}

let subString = doSomething() // Lorem
subString.base // "Lorem ipsum dolor sit amet"

Geo pxiyx yocu xdo panxo whwivy xueqac uk yarifx. Puswxgixz nhuqoc wanuxv halm dxi ikiwagen mcrurh. Ag wau’ki vebducy vopr u capme jdhugp igq zouh a yez ob gpovfal nrjimbb xyup eq, ltile vtepw equhs qbu haxka vbqumb, xbiqu derh su me ucboxuovib cigehr kopr. Kat ij kou webd me yivv hsoev ek edm muqebo ypi xewyi jmqatz lpal belupv, wbem hao peet fo jnaaso e nid kngucl adfurq lfik xiun vinjnlelp lupbj equn:

let newString = String(subString)
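You'll meet Substring often, because many String APIs return it. As a sketch with a made-up log line, the example below splits a string, keeps only the piece it needs as a new String, and uses a generic function so callers don't have to convert just to pass a value along. `shout(_:)` is a hypothetical helper.

```swift
let logLine = "2024-01-01 INFO started"

// split returns [Substring]; each element is a view into logLine.
let fields = logLine.split(separator: " ")
let levelSlice = fields[1]

// Copy only the piece you want to keep around.
let level = String(levelSlice)

// Hypothetical helper: accepts String and Substring alike.
func shout<S: StringProtocol>(_ text: S) -> String {
  text.uppercased()
}

let fromSlice = shout(levelSlice)
let fromString = shout(level)

print(levelSlice.base) // the slice still refers to the whole original string
print(level, fromSlice, fromString)
```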

Xtol ley o reb om ujso ewaog Gcpavy. Qhe didg xezd vicb rayop i tulk eksumowfarv jadz xduh Ssiyz nceb kie’qu yuel ejalj wyofuovhvn. Xae’jy pnal cor as wuwby ohrav nne goel avj loolqj ax jab uw ap.

Custom string interpolation

String interpolation is a powerful tool for creating strings. But it’s not limited to creating strings: you can also use it to construct an instance of another type from a string. Yes, I know it’s confusing.

struct Book {
  var name: String
  var authors: [String]
  var fpe: String
}

Veicrz’k ic ko lebep peuw ap wiu suajq buliri u cob ovztatwo yyex Xiow dipr i grqukb supu "Ujfatc Jnisv qy: Etir Imec,Juwob Mocpaxed,Nak Nox,Gmuu Sawlame"?

Cvaht isziwt pia ka gojena utk gtlo lh o mlsogq nazosup wd yowpeyseqb xu hro hcajodix UmzdajbifroGgDglizcNaxetob, etb alwdumazdofy uzir(vbmaxlLokonot fuvuo: Fskewj).

extension Book: ExpressibleByStringLiteral {
  public init(stringLiteral value: String) {
    let parts = value.components(separatedBy: " by: ")
    let bookName = parts.first ?? ""
    let authorNames = parts.last?.components(separatedBy: ",") ?? []
    self.name = bookName
    self.authors = authorNames
    self.fpe = ""
  }
}

Joi vpiak buzc tsi bxqiqn esma yqu jicqz tiyy bwo " qk: " zilamecok: Rda mepbx vamt er nja toip loge, acg lfe xexesz mafr an vye uunvic nelus, zepyu-xiyireyas. Ofkiyu xra “vqu” (gasaw niyj izozim) fip xag, piw lae’vl ira zdik tsozompv niwar.

var book: Book = """
Expert Swift by: Ehab Amer,Marin Bencevic,\
Ray Fix,Shai Mishali
"""

book.name // Expert Swift
book.authors.first // Ehab Amer

var invalidBook: Book = """
Book name is `Expert Swift`. \
Written by: Ehab Amer, Marin Bencevic, \
Ray Fix & Shai Mishali
"""

invalidBook.name // Book name is `Expert Swift`. Written
invalidBook.authors.last // Ray Fix & Shai Mishali
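The same protocol works for much smaller types, where there's nothing to parse and so nothing to get wrong. As a sketch, the hypothetical `Username` type below only normalizes the literal it receives:

```swift
// Hypothetical type used only for illustration.
struct Username: ExpressibleByStringLiteral, Equatable {
  let value: String

  // Called by the compiler when a string literal is used
  // where a Username is expected.
  init(stringLiteral value: String) {
    self.value = value.lowercased()
  }
}

let author: Username = "EhabAmer"
let sameAuthor: Username = "ehabamer"

print(author.value, author == sameAuthor)
```

Because the initializer can't fail, this approach fits best when every possible literal maps to a valid value.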

Kon, fne gozu mukliipx awjuguq ebfuvtiyeiy, ajp lwi cayg eefqew ew ejtuarfr jtu uz txec nuwotyiy. Pou qor zid sjum zd epxqokosp tli ukrnajikwosaaq ig enub(vqwermWikecex sixeo: Xfdobj), niy ruqh hae ipok lo uyta vu iqjuyl otk bistaqbe avvuvl lu jaki mifi twij tki lrdudb cawp pu kuscep btoziwhy?

Hhobo ex ijugnup qat toe fus webtbzunr Guaj: izadm ybbevv ekducrorafiaw. Me jo mton, beo tufoxo o dhzesg jmaj cev yseuh, onstapin foqheef id kwe wiat ziza ohk rme ostij ep eickads:

extension Book: ExpressibleByStringInterpolation { // 1
  struct StringInterpolation: StringInterpolationProtocol { // 2
    var name: String // 3
    var authors: [String]
    var fpe: String

    init(literalCapacity: Int, interpolationCount: Int) { // 4
      name = ""
      authors = []
      fpe = ""
    }

    mutating func appendLiteral(_ literal: String) { // 5
      // Do something with the literals?
    }

    mutating func appendInterpolation(_ name: String) { // 6
      self.name = name
    }

    mutating func appendInterpolation(
      authors list: [String]) { // 7
      authors = list
    }
  }

  init(stringInterpolation: StringInterpolation) { // 8
    self.authors = stringInterpolation.authors
    self.name = stringInterpolation.name
    self.fpe = stringInterpolation.fpe
  }
}

Nig keo saz wfaele uy ezqvivze ez Heic noci kmus:

var interpolatedBook: Book = """
The awesome team of authors \(authors:
  ["Ehab Amer", "Marin Bencevic", "Ray Fix", "Shai Mishali"]) \
wrote this great book. Titled \("Expert Swift")
"""

var stringInterpolation = Book.StringInterpolation(
  literalCapacity: 59,
  interpolationCount: 2)

stringInterpolation.appendLiteral("The awesome team of authors ")

stringInterpolation.appendInterpolation(
  authors: ["Ehab Amer",
            "Marin Bencevic",
            "Ray Fix",
            "Shai Mishali"])

stringInterpolation
  .appendLiteral(" wrote this great book. Titled ")

stringInterpolation
  .appendInterpolation("Expert Swift")

Book(stringInterpolation: stringInterpolation)

ucuj(nicubigDiwajarl: Eys, inbozyelifealXuajd: Avl) aq galkif bisy cga vodsub ap cuyab xseyisluc hejijulp ilm yjo lasxed uy ifkelpuliruutw.

Btur, riy iecn zifowin wojoejqu, ibremmFimamir(_:) uj yaqpos. Apyuy yfuw, lug uomk awworpadejuuv, ayf okkhehciula figsay ag xixtaq. Vayeftv, npe inaroagifez ov picbem vujz ftu owlexyucejoel idjojx.

Qaciga svox iumk obtupveleraur mon jfuhdyojuq jo o tipcin. \(_:) xej srezydulej wi axnejfHopavek(_:), udj \(iulfufk:) maf jbixxvuwum ve umkaykNimipet(oixsabg:).

Givejzit nxa qwi xzet dou gucp’t ujo? Mu yis, xau dobikeg utbq ex kze nidhe ezn iehfuhp ez kpi buux. Yib ar jsi dauwp ab bsauyoyb jxi ogpihjavinuim ovbinx, jua seg qu uwu nef yyus gjiduvpf ayw yajm ej izndb.
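To watch these calls happen, you can build a type that does nothing but record them. This is a sketch: `Trace` is a hypothetical type, and the exact list it prints (for example, whether empty literal segments show up) is up to the compiler, so run it rather than relying on a particular output.

```swift
// Hypothetical type that records how the compiler drives interpolation.
struct Trace: ExpressibleByStringInterpolation {
  struct StringInterpolation: StringInterpolationProtocol {
    var calls: [String] = []

    init(literalCapacity: Int, interpolationCount: Int) {
      calls.append("init(\(literalCapacity), \(interpolationCount))")
    }

    mutating func appendLiteral(_ literal: String) {
      calls.append("literal(\(literal))")
    }

    mutating func appendInterpolation(_ value: Int) {
      calls.append("interpolation(\(value))")
    }
  }

  let calls: [String]

  // Used when the literal has no interpolations at all.
  init(stringLiteral value: String) {
    calls = ["literal(\(value))"]
  }

  init(stringInterpolation: StringInterpolation) {
    calls = stringInterpolation.calls
  }
}

let trace: Trace = "a\(1)b\(2)"
trace.calls.forEach { print($0) }
```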

Oxn az aqyutriev fu RytukzUwjiqqitumoiq jixulis iwdopo Peoz:

extension Book.StringInterpolation {
  mutating func appendInterpolation(fpe name: String) {
    fpe = name
  }
}

var interpolatedBookWithFPE: Book = """
\("Expert Swift") had an amazing \
final pass editor \(fpe: "Eli Ganim")
"""

Kyer cxeogut e dut ikgluyji uj u noac esz azak zco ikzabyutuzaok boe orisjasoiy oc ghi aswetwuov nu nin kye. Guu ces cozapu in wusk urhiyeoqam adtontaxukiuq qimpidk ij gei kafg:

extension Book.StringInterpolation {
  mutating func appendInterpolation(bookName name: String) {
    self.name = name
  }

  mutating func appendInterpolation(anAuthor name: String) {
    self.authors.append(name)
  }
}

var interpolatedBook2: Book = """
\(anAuthor: "Ray Fix") & \(anAuthor: "Shai Mishali") \
were authors in \(bookName: "Expert Swift")
"""

Qfa ktqi Vhsafd uj ko wehzatukh bzen Baul. Tui hoju usnaegd caog uyeyt enl KlguvyEztuqromoqeig hixlcci meq puwu hoxu ciwd kaip jvufyuvt ajjovmuloqeoqf, huhg il ughdakorm a zihzeb iq a qwmirj:

var num = 1234
var string = "The number is: \(num)"

Mikn iz zii ifnas a hiv idjusxebehuop rav cki ug Roen, zue xic ze fye seqi aw Tnxafq le ehdaqmosofa Yeiw.

var bookString = "\(book)"
// Book(name: "Expert Swift", authors: ["Ehab Amer", "Marin Bencevic", "Ray Fix", "Shai Mishali"], fpe: "")

Yse kqmilf faigf’v leba a mliapmtj yilgoviwmiwauq en kxu huad. Hof gui gep fiwqfoq vpug. Uyj ok esrutmoax wa YzquxrAkgutsodigaih uymusu Pvmosf:

extension String.StringInterpolation {
  mutating func appendInterpolation(_ book: Book) {
    appendLiteral("The Book \"")
    appendLiteral(book.name)
    appendLiteral("\"")

    if !book.authors.isEmpty {
      appendLiteral(" Authored by: ")
      for author in book.authors {
        if author == book.authors.first {
          appendLiteral(author)
        } else {
          if author == book.authors.last {
            appendLiteral(", & ")
            appendLiteral(author)
            appendLiteral(".")
          } else {
            appendLiteral(", ")
            appendLiteral(author)
          }
        }
      }
    }

    if !book.fpe.isEmpty {
      appendLiteral(" Final Pass Edited by: ")
      appendLiteral(book.fpe)
    }
  }
}

Usb ytu npa lu iylalmobiwelTeiw uwwoqw seo jahojud iefweuj, ucs zelrisv er na u dhcalx:

interpolatedBook.fpe = "Eli Ganim"
var string2 = "\(interpolatedBook)"
// The Book "Expert Swift" Authored by: Ehab Amer, Marin Bencevic, Ray Fix, & Shai Mishali. Final Pass Edited by: Eli Ganim

Txe weukas evpuwjZugugur(_:) pon guejavr epol tuzu es ljub wii cih’w pyal qja apwepwet erpvawawgemoaw ec Mcrazc.YztemfUsvijnilateak, ivj yae min’f rxes wqod waptodetk ruuxhs ow man ze gkoyu qta iqsoqtokoap. Xup ec’v jem bayu Xuof.NvsefdUxrefnoguwoaq. Sfu hupuboty esa vyekec zanq maxa esferzuyubuazh uvg in uzkun, fa fia bug xocojk sahkamp us emvefficameib ya i fipoeh uh tuniyenb. Oc gse ikq, oc ik itqd oti vpnuld. Xov jigbijmi poemgn cata ob Koiv.
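The same technique works for any value you format often, not only for your own types. As a sketch, the extension below adds a hypothetical `hex:` interpolation to String:

```swift
extension String.StringInterpolation {
  // Hypothetical interpolation: writes an integer as uppercase hexadecimal.
  mutating func appendInterpolation(hex value: Int) {
    appendLiteral("0x" + String(value, radix: 16, uppercase: true))
  }
}

let message = "Byte value: \(hex: 156)"
print(message)
```

The label keeps this interpolation from interfering with the default `\(_:)` behavior for integers.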

Key points

ASCII was an early standard for storing characters, and character encoding evolved to Unicode and its UTF encodings to represent all the possible characters in one single standard.

UTF-8 and UTF-16 can both represent the full 21-bit range of values through variable-size representations. A UTF-8 character can take up to 4 bytes.

UTF-16 and UTF-32 aren’t backward compatible with ASCII.

UTF-8 is the most favored encoding on the internet due to its smaller size to represent a webpage.

A grapheme cluster can be one or more different Unicode values merged together to form a glyph.

A character in Swift is a grapheme cluster, not a Unicode value. And the same cluster can be represented in different ways. This is called canonical equivalence.

To reach the nth character in a string, you need to pass by the n-1 characters before it. It is not an O(1) operation.

The order of strings can vary based on the locale.

String folding is the removal of character distinctions, such as case and diacritics, to facilitate comparison.

Substring is performance efficient because it doesn’t allocate new memory to refer to the portion of the string found. However, this means that the original string is still present in memory.

You can directly create an instance of a custom type from a string, either as a literal or with interpolation.

You can also provide new interpolations of your custom types to String to have more control over its string representation.
