Swift Regex Tutorial: Getting Started

Master the pattern-matching superpowers of Swift Regex. Learn to write regular expressions that are easy to understand, work with captures and try out RegexBuilder, all while making a Marvel Movies list app! By Ehab Amer.

Leave a rating/review
Download materials
Save for later

Searching within text doesn’t always mean searching for an exact word or sequence of characters.

Sometimes you want to search for a pattern. Perhaps you’re looking for words that are all uppercase, words that have numeric characters, or even a word that you may have misspelled in an article you’re writing and want to find to correct quickly.

For that, regular expressions are an ideal solution. Luckily, Apple has greatly simplified using them in Swift 5.7.

In this tutorial, you’ll learn:

  • What a regular expression is and how you can use it.
  • How Swift 5.7 made it easier to work with regular expressions.
  • How to capture parts of the string you’re searching for.
  • How to use RegexBuilder to construct a complex expression that’s easy to understand.
  • How to load a text file that is poorly formatted into a data model.
  • How to handle inconsistencies while loading data.
Note: This tutorial assumes some familiarity with iOS programming, including working with strings. If you’re new, consider starting with a book like SwiftUI Apprentice or a video course like Your First iOS and SwiftUI App: An App From Scratch. That said, feel free to give this tutorial a go and jump into the forum (linked below) to ask questions!

Getting Started

Download the starter project by clicking Download Materials at the top or bottom of the tutorial.

The app you’ll be working on here is MarvelProductions. It shows a list of movies and TV shows that Marvel has already published or announced.

Here’s what you’ll see when you first build and run:

First build and run shows the same movie repeated. You need to fix the data with regular expressions.

You’ll notice that there’s only one repeated entry, which is great for Moon Knight fans but a bit disheartening otherwise. That’s because the app’s data needs some work before it’s ready to display. You’ll use Swift Regex to accomplish this.

Understanding Regular Expressions

Before you get into the app directly, you need to understand what regular expressions, also known as regex, are and how to write them.

Most of the time, when you search for text, you know the word you want to look for. You use the search capability in your text editor and enter the word, then the editor highlights matches. If the word has a different spelling than your search criteria, the editor won’t highlight it.

Regex doesn’t work that way. It’s a way to describe sequences of characters that most text editors nowadays can interpret and find text that matches. The results found don’t need to be identical. You can search for words that have four characters and this can give you results like some and word.

To try things out, open MarvelProductions.xcodeproj file in the starter folder. Then select the MarvelMovies file in the Project navigator.

Xcode open with the file MarvelMovies open

While having the text file focused in Xcode, press Command-F to open the search bar. Click on the dropdown with the word Contains and choose Regular Expression.

Note: If your screen looks messy because lines wrap around, you can toggle this by pressing Control-Shift-Command-L or choosing Wrap Lines from Xcode’s Editor menu.

Selecting Regular Expression search mode

In the search text field, you can still enter a word to search for in the file like you normally would, but now you can do much more. Enter \d in the text field. This will select every digit available in the file.

Xcode highlighting all the digit characters found in the text file

Try to select numbers that aren’t part of the id values that start with tt. Enter the following into the Search field:


Xcode highlighting all the digits in the file except the digits in the id

The regex you just entered matches any digits in a word that don't start with tt. The breakdown of this regex is as follows:

  • Word boundary: \b.
  • Negative lookbehind for tt: (?<!tt).
  • One or more digits: \d+.
Note: To better understand regular expression syntax, consider searching online for some cheat sheets. An Introduction to Regular Expressions is a good starting point. Be sure to check out its playground, included in the downloadable materials.

Swiftifying Regular Expressions

Swift 5.7 introduces a new Regex type that's a first-degree citizen in Swift. It isn't a bridge from Objective-C's NSRegularExpression.

Swift Regex allows you to define a regular expression in three different ways:

  1. As a literal:
  2. let digitsRegex = /\d+/
  3. From a String:
  4. let regexString = #"\d+"#
    let digitsRegex = try? Regex(regexString)
  5. Using RegexBuilder:
  6. let digitsRegex = OneOrMore {

The first two use the standard regular expression syntax. What's different about the second approach is that it allows you to create Regex objects dynamically by reading the expressions from a file, server or user input. Because the Regex object is created at runtime, you can't rely on Xcode to check it, which is a handy advantage to the first approach.

The third is the most novel. Apple introduced a new way to define regular expressions using a result builder. You can see it's easier to understand what it's searching for in the text. An arguable drawback is that this approach is more verbose.

Now it's time to see Swift Regex in action. Open the Project navigator and select the file ProductionsDataProvider.swift.

ProductionsDataProvider.swift open in Xcode

Loading the Marvel Movies List

As you can see, the data provider only loads a few sample objects and isn't loading the data from the MarvelMovies file. You'll use regular expressions to find the values and load them into an array of MarvelProductionItem objects. You might wonder, "Why do I need to use regular expressions to load the file? It looks clear, and I can separate it with normal string operations."

MarvelMovies file open in Xcode

The answer is "looks can be deceiving". The file looks organized to the human eye, but that doesn't mean the data itself is organized.

If you look closely, empty spaces separate the fields. This space can be two or more space characters, one or more tab characters or a collection of both of them together.

Using usual string splitting is possible if separators are explicit and unique, but in this case, the separator string varies. Also, spaces appear in the content, making it hard to use conventional means to break down each line to parse the values. Regex is ideal here!

Reading the Text File

The first thing you need to do is load the text file. Replace the existing implementation of loadData() in ProductionsDataProvider.swift with:

func loadData() -> [MarvelProductionItem] {
  // 1
  var marvelProductions: [MarvelProductionItem] = []

  // 2
  var content = ""
  if let filePath = Bundle.main.path(
    forResource: "MarvelMovies",
    ofType: nil) {
    let fileURL = URL(fileURLWithPath: filePath)
    do {
      content = try String(contentsOf: fileURL)
    } catch {
      return []

  // TODO: Define Regex
  // 3
  return marvelProductions

This code does three things:

  1. Defines marvelProductions as an array of objects that you'll add items to later.
  2. Reads the contents of the MarvelMovies file from the app's bundle and loads it into the property content.
  3. Returns the array at the end of the function.

You'll do all the work in the TODO part.

If you build and run now, you'll just see a blank screen. Fear not, you're about to get to work writing the regular expressions that find the data to fill this.

Defining the Separator

The first regular expression you'll define is the separator. For that, you need to define the pattern that represents what a separator can be. All of the below are valid separator strings for this data:

  • SpaceSpace
  • SpaceTab
  • Tab
  • TabSpace

However, this is not a valid separator in the MarvelMovies file:

  • Space

A valid separator can be a single tab character, two or more space characters, or a mix of tabs and spaces but never a single space, because this would conflict with the actual content.

You can define the separator object with RegexBuilder. Add this code before the return marvelProductions:

let fieldSeparator = ChoiceOf { // 1
  /[\s\t]{2,}/ // 2
  /\t/ // 3

In regular expression syntax, \s matches any single whitespace character, so a space or a tab, whereas \t only matches a tab.

The new code has three parts:

  1. ChoiceOf means only one of the expressions within it needs to match.
  2. The square brackets define a set of characters to look for and will only match one of those characters, either a space or a tab character, in the set. The curly braces define a repetition to the expression before it, to run two or more times. This means the square brackets expression repeats two or more times.
  3. An expression of a tab character found once.

fieldSeparator defines a regex that can match two or more consecutive spaces with no tabs, a mix of spaces and tabs with no specific order or a single tab.

Sounds about right.

Now, for the remaining fields.

Defining the Fields

You can define the fields in MarvelProductionItem as follows:

  • id: A string that starts with tt followed by several digits.
  • title: A string of a different collection of characters.
  • productionYear: A string that starts with ( and ends with ).
  • premieredOn: A string that represents a date.
  • posterURL: A string beginning with http and ends with jpg.
  • imdbRating: A number with one decimal place or no decimal places at all.

You can define those fields using regular expressions as follows. Add this after the declaration of fieldSeparator before the function returns:

let idField = /tt\d+/ // 1

let titleField = OneOrMore { // 2

let yearField = /\(.+\)/ // 3

let premieredOnField = OneOrMore { // 4

let urlField = /http.+jpg/ // 5

let imdbRatingField = OneOrMore { // 6

These regex instances are a mix between RegexBuilders and literals.

The objects you created are:

  1. idField: An expression that matches a string starting with tt followed by any number of digits.
  2. titleField: Any sequence of characters.
  3. yearField: A sequence of characters that starts with ( and ends with ).
  4. premieredOnField: Instead of looking for a date, you'll search for any sequence of characters, then convert it to a date.
  5. urlField: Similar to yearField, but starting with http and ending with jpg.
  6. imdbRatingField: Similar to premieredOnField, you'll search for any sequence of characters then convert it to a Float.

Matching a Row

Now that you have each row of the MarvelMovies file broken down into smaller pieces, it's time to put the pieces together and match a whole row with an expression.

Instead of doing it all in one go, break it down into iterations to ensure that each field is properly matched and nothing unexpected happens.

Add the following Regex object at the end of loadData(), just before return marvelProductions:

let recordMatcher = Regex { // 1

let matches = content.matches(of: recordMatcher) // 2
print("Found \(matches.count) matches")
for match in matches { // 3
  print(match.output + "|") // 4

This code does the following:

  1. Defines a new Regex object that consists of the idField regex followed by a fieldSeparator regex.
  2. Gets the collection of matches found in the string you loaded from the file earlier.
  3. Loops over the found matches.
  4. Prints the output of each match followed by the pipe character, |.

Build and run. Take a look at the output in the console window:

Found 49 matches
tt10857160	|
tt10648342	|
tt13623148	|
tt9419884	  |
tt10872600	|
tt10857164	|
tt9114286	  |
tt4154796	  |
tt10234724	|

Notice the space between the text and pipe character. This means the match included the separator. So far, the expression is correct. Now, expand the definition of recordMatcher to include titleField:

let recordMatcher = Regex {

Build and run, then take a look at the console output:

Found 1 matches
tt10857160	She-Hulk: Attorney at Law ........|

What just happened? Adding the title expression caused the rest of the file to be included in the first match except for the final rating value.

Well... this unfortunately makes sense. The title expression covers any character type. This means that even separators, numbers, URLs and anything gets matched as part of the title. To fix this, you want to tell the expression to consider looking at the next part of the expression before continuing with a repetition.

Looking Ahead

To get the expression to look ahead, you want the operation called NegativeLookAhead. In regular expression syntax, it's denoted as (?!pattern), where pattern is the expression you want to look ahead for.

titleField should look ahead for fieldSeparator before resuming the repetition of its any-character expression.

Change the declaration of titleField to the following:

let titleField = OneOrMore {
  NegativeLookahead { fieldSeparator }

Build and run. Observe the output in the console log:

Found 49 matches
tt10857160	She-Hulk: Attorney at Law	                  |
tt10648342	Thor: Love and Thunder	                    |
tt13623148	I Am Groot	                                |
tt9419884	  Doctor Strange in the Multiverse of Madness	|
tt10872600	Spider-Man: No Way Home	                    |

Excellent. You fixed the expression, and it's back to only picking up the fields you requested.

Before you add the remaining fields, update ones with an any-character-type expression to include a negative lookahead. Change the declaration of premieredOnField to:

let premieredOnField = OneOrMore {
  NegativeLookahead { fieldSeparator }

Then, change imdbRatingField to:

let imdbRatingField = OneOrMore {
  NegativeLookahead { CharacterClass.newlineSequence }

Since you expect the rating at the end of the line, the negative lookahead searches for a newline character instead of a field separator.

Update recordMatcher to include the remaining fields:

let recordMatcher = Regex {

Build and run. The console will show that it found 49 matches and will print all the rows correctly. Now, you want to hold or capture the relevant parts of the string that the expressions found so you can convert them to the proper objects.

Capturing Matches

Capturing data inside a Regex object is straightforward. Simply wrap the expressions you want to capture in a Capture block.

Change the declaration of recordMatcher to the following:

let recordMatcher = Regex {
  Capture { idField }
  Capture { titleField }
  Capture { yearField }
  Capture { premieredOnField }
  Capture { urlField }
  Capture { imdbRatingField }

Then change the loop that goes over the matches to the following:

for match in matches {
  print("Full Row: " + match.output.0)
  print("ID: " + match.output.1)
  print("Title: " + match.output.2)
  print("Year: " + match.output.3)
  print("Premiered On: " + match.output.4)
  print("Image URL: " + match.output.5)
  print("Rating: " + match.output.6)

Build and run. The console log should output each row in full with a breakdown of each value underneath:

Found 49 matches
Full Row: tt10857160	She-Hulk: Attorney at Law......

ID: tt10857160
Title: She-Hulk: Attorney at Law
Year: (2022– )
Premiered On: Aug 18, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BMjU4MTkxNz......jpg
Rating: 5.7
Full Row: tt10648342	Thor: Love and Thunder.....

ID: tt10648342
Title: Thor: Love and Thunder
Year: (2022)
Premiered On: July 6, 2022
Image URL: https://m.media-amazon.com/images/M/MV5BYmMxZWRiMT......jpg
Rating: 6.7

Before you added any captures, the output object contained the whole row. By adding captures, it became a tuple whose first value is the whole row. Each capture adds a value to that tuple. With six captures, your tuple has seven values.

Naming Captures

Depending on order isn't always a good idea for API design. If the raw data introduces a new column in an update that isn't at the end, this change will cause a propagation that goes beyond just updating the Regex. You'll need to revise what the captured objects are and make sure you're picking the right item.

A better way is to give a reference name to each value that matches its column name. That'll make your code more resilient and more readable.

You can do this by using Reference. Add the following at the top of loadData():

let idFieldRef = Reference(Substring.self)
let titleFieldRef = Reference(Substring.self)
let yearFieldRef = Reference(Substring.self)
let premieredOnFieldRef = Reference(Substring.self)
let urlFieldRef = Reference(Substring.self)
let imdbRatingFieldRef = Reference(Substring.self)

You create a Reference object for each value field in the document using their data types. Since captures are of type Substring, all the References are with that type. Later, you'll see how to convert the captured values to a different type.

Next, change the declaration of recordMatcher to:

let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  Capture(as: titleFieldRef) { titleField }
  Capture(as: yearFieldRef) { yearField }
  Capture(as: premieredOnFieldRef) { premieredOnField }
  Capture(as: urlFieldRef) { urlField }
  Capture(as: imdbRatingFieldRef) { imdbRatingField }

Notice the addition of the reference objects as the as parameter to each capture.
Finally, change the contents of the loop printing the values of data to:

print("Full Row: " + match.output.0)
print("ID: " + match[idFieldRef])
print("Title: " + match[titleFieldRef])
print("Year: " + match[yearFieldRef])
print("Premiered On: " + match[premieredOnFieldRef])
print("Image URL: " + match[urlFieldRef])
print("Rating: " + match[imdbRatingFieldRef])

Notice how you are accessing the values with the reference objects. If any changes happen to the data, you'll just need to change the regex reading the values, and capture it with the proper references. The rest of your code won't need any updates.

Build and run to ensure everything is correct. You won't see any differences in the console log.

At this point, you're probably thinking that it would be nice to access the value like a property instead of a key path.

The good news is that you can! But you'll need to write the expression as a literal and not use RegexBuilder. You'll see how it's done soon. :]

Transforming Data

One great feature of Swift Regex is the ability to transform captured data into different types.

Currently, you capture all the data as Substring. There are two fields that are easy to convert:

  • The image URL, which doesn't need to stay as a string — it's more convenient to convert it to a URL
  • The rating, which works better as a number so you'll convert it to a Float

You'll change these now.

In ProductionsDataProvider.swift, change the declaration of urlFieldRef to:

let urlFieldRef = Reference(URL.self)

This changes the expected type to URL.

Then, change imdbRatingFieldRef to:

let imdbRatingFieldRef = Reference(Float.self)

Similarly, this changes the expected data type to Float.

Next, change the declaration of recordMatcher to the following:

let recordMatcher = Regex {
  Capture(as: idFieldRef) { idField }
  Capture(as: titleFieldRef) { titleField }
  Capture(as: yearFieldRef) { yearField }
  Capture(as: premieredOnFieldRef) { premieredOnField }
  TryCapture(as: urlFieldRef) { // 1
  } transform: {
    URL(string: String($0))
  TryCapture(as: imdbRatingFieldRef) { // 2
  } transform: {

Notice how you captured urlField and imdbRatingField changed from just Capture(as::) to TryCapture(as::transform:). If successful, the later attempts to capture the value will pass it to transform function to convert it to the desired type. In this case, you converted urlField to a URL and imdbRatingField to a Float.

Now that you have the proper types, it's time to populate the data source.

Replace the code you have inside the loop to print to the console with:

let production = MarvelProductionItem(
  imdbID: String(match[idFieldRef]), // 1
  title: String(match[titleFieldRef]),
  productionYear: ProductionYearInfo.fromString(String(match[yearFieldRef])), // 2
  premieredOn: PremieredOnInfo.fromString(String(match[premieredOnFieldRef])), // 3
  posterURL: match[urlFieldRef], // 4
  imdbRating: match[imdbRatingFieldRef]) // 5


This creates an instance of MarvelProductionItem and appends it to the array, but there's a little more happening:

  1. You convert the first two Substring parameters to strings.
  2. ProductionYearInfo is an enum. You're creating an instance from the string value. You'll implement this part in the next section. For now, the value is always ProductionYearInfo.unknown.
  3. PremieredOnInfo is also an enum you'll implement in the next section. The value for now is PremieredOnInfo.unknown.
  4. The value provided for the poster is a URL and not a string.
  5. The rating value is already a Float.

Build and run. You should see the Movies and TV shows listed on the app.

The app showing data presented on the screen

Creating a Custom Type

Notice that Production Year displays Not Produced and Premiered On shows Not Announced, even for old movies and shows. This is because you haven't implemented the parsing of their data yet so .unknown is returned for their values.

The production year won't always be a single year:

  • If it's a movie, the year will be just one value, for example: (2010).
  • If it's a TV show, it can start in one year and finish in another: (2010-2012).
  • It could be an ongoing TV show: (2010- ).
  • Marvel Studios may not have announced a date yet, making it truly unknown: (I).

The value for PremieredOnInfo is similar:

  • An exact date may have been set, such as: Oct 10, 2010.
  • An exact date may not yet be set for a future movie or show, in which case only the year is defined: 2023.
  • Dates may not yet be announced: -.

This means the data for these fields can have different forms or patterns. This is why you captured them as text and didn't specify what exactly to expect in the expression.

You'll create an expression for each possibility and compare it with the value provided. The option you'll set is the expression that matches the string as a whole.

For example, if the movie is in the future and only a year is mentioned in the Premiered On field, then the expression that's expecting a word and two numbers with a comma between them will not succeed. Only the expression that is expecting a single number will.

Conditional Transformation

Start breaking down what you'll do with the year field. The value in the three cases will be within parentheses:

  • If it's a single year, the expression is: \(\d+\).
  • If it's a range between two years, it's two numbers separated by a dash: \(\d+-\d+\).
  • If it's an open range starting from a year, it's a digit, a dash then a space: \(\d+-\s\).

Open ProductionYearInfo.swift and change the implementation of fromString(_:) to:

public static func fromString(_ value: String) -> Self {
  if let match = value.wholeMatch(of: /\((?<startYear>\d+)\)/) { // 1
    return .produced(year: Int(match.startYear) ?? 0) // 2
  } else if let match = value.wholeMatch(
    of: /\((?<startYear>\d+)-(?<endYear>\d+\))/) { // 3
    return .finished(
      startYear: Int(match.startYear) ?? 0, 
      endYear: Int(match.endYear) ?? 0) // 4
  } else if let match = value.wholeMatch(of: /\((?<startYear>\d+)–\s\)/) { // 5
    return .onGoing(startYear: Int(match.startYear) ?? 0) // 6

  return .unknown

Earlier, you read that there is a different way to name captured values using the regex literal. Here is how.

To capture a value in a regex, wrap the expression in parentheses. Name it using the syntax (?<name>regex) where name is how you want to refer to the capture and regex is the regular expression to be matched.

Time to break down the code a little:

  1. You compare the value against an expression that expects one or more digits between parentheses. This is why you escaped one set of parentheses.
  2. If the expression is a full match, you capture just the digits without the parentheses and return .produced using the captured value and casting it to an Int. Notice the convenience of using the captured value.
  3. If it doesn't match the first expression, you test it against another that consists of two numbers with a dash between them.
  4. If that matches, you return .finished and use the two captured values as integers.
  5. If the second expression didn't match, you check for the third possibility, a number followed by a dash, then a space to represent a show that's still running.
  6. If this matches, you return .onGoing using the captured value.

Each time, you use wholeMatch to ensure the entire input string, not just a substring inside it, matches the expression.

Build and run. See the new field properly reflected on the UI.

The app running and showing the production year info for each item

Next, open PremieredOnInfo.swift and change the implementation of fromString(_:) there to:

public static func fromString(_ value: String) -> Self {
  let yearOnlyRegexString = #"\d{4}"# // 1
  let datesRegexString = #"\S{3,4}\s.{1,2},\s\d{4}"#

  guard let yearOnlyRegex = try? Regex(yearOnlyRegexString),
    let datesRegex = try? Regex(datesRegexString) else { // 2
    return .unknown

  if let match = value.wholeMatch(of: yearOnlyRegex) { // 3
    let result = match.first?.value as? Substring ?? "0"
    return .estimatedYear(Int(result) ?? 0)
  } else if let match = value.wholeMatch(of: datesRegex) { // 4
    let result = match.first?.value as? Substring ?? ""
    let dateStr = String(result)
    let date = Date.fromString(dateStr)
    return .definedDate(date)

  return .unknown

This time, instead of regular expression literals, you store each regular expression in a String and then create Regex objects from those strings.

  1. You create two expressions to represent the two possible value cases you expect, either a four-digit year, or a full date consisting of a three-character full or shortened month name or a four-character month name, followed by one or two characters for the date, followed by a comma, a whitespace and a four-digit year.
  2. Create the two Regex objects that you'll use to compare against.
  3. Try to match the whole string against the yearOnlyRegexString`. If it matches, you return .estimatedYear and use the provided value as the year.
  4. Otherwise, you try to match the whole string against the other Regex object, datesRegexString. If it matches, you return .definedDate and convert the provided string to a date using a normal formatter.
Note: This approach doesn't allow for named captures since you can't define both the expression and the capture name at runtime. If you need to capture parts of the expression while using a regex built from a string literal, recall that you define a capture with parentheses. You can then reference the capture with \n where n is the number of the capture. Take care to access the captures safely in the correct order.

Build and run. Behold the Premiered On date correctly displayed. Nice work!

The app showing all the fields of the items

Where to Go From Here

Understanding regular expressions can turn you into a string manipulation superhero! You'll be surprised at what you can achieve in only a few lines of code. Regular expressions may challenge you initially, but they're worth it. There are many resources to help, including An Introduction to Regular Expressions, which links to even more resources!

To learn more about Swift Regex, check out this video from WWDC 2022.

You can download the completed project files by clicking the Download Materials button at the top or bottom of this tutorial.

Swift's Regex Builders are a kind of result builder. Learn more about this fascinating topic in Swift Apprentice Chapter 20: Result Builders.

We hope you enjoyed this tutorial. If you have any questions or comments, please join the forum discussion below!