Swift Regex Tutorial: Getting Started
Master the pattern-matching superpowers of Swift Regex. Learn to write regular expressions that are easy to understand, work with captures and try out RegexBuilder, all while making a Marvel Movies list app! By Ehab Amer.
Contents
Swift Regex Tutorial: Getting Started
30 mins
- Getting Started
- Understanding Regular Expressions
- Swiftifying Regular Expressions
- Loading the Marvel Movies List
- Reading the Text File
- Defining the Separator
- Defining the Fields
- Matching a Row
- Looking Ahead
- Capturing Matches
- Naming Captures
- Transforming Data
- Creating a Custom Type
- Conditional Transformation
- Where to Go From Here?
Searching within text doesn’t always mean searching for an exact word or sequence of characters.
Sometimes you want to search for a pattern. Perhaps you’re looking for words that are all uppercase, words that have numeric characters, or even a word that you may have misspelled in an article you’re writing and want to find to correct quickly.
For that, regular expressions are an ideal solution. Luckily, Apple has greatly simplified using them in Swift 5.7.
In this tutorial, you’ll learn:
- What a regular expression is and how you can use it.
- How Swift 5.7 made it easier to work with regular expressions.
- How to capture parts of the string you’re searching for.
- How to use RegexBuilder to construct a complex expression that’s easy to understand.
- How to load a text file that is poorly formatted into a data model.
- How to handle inconsistencies while loading data.
Getting Started
Download the starter project by clicking Download Materials at the top or bottom of the tutorial.
The app you’ll be working on here is MarvelProductions. It shows a list of movies and TV shows that Marvel has already published or announced.
Here’s what you’ll see when you first build and run:
You’ll notice that there’s only one repeated entry, which is great for Moon Knight fans but a bit disheartening otherwise. That’s because the app’s data needs some work before it’s ready to display. You’ll use Swift Regex to accomplish this.
Understanding Regular Expressions
Before you get into the app directly, you need to understand what regular expressions, also known as regex, are and how to write them.
Most of the time, when you search for text, you know the word you want to look for. You use the search capability in your text editor and enter the word, then the editor highlights matches. If the word has a different spelling than your search criteria, the editor won’t highlight it.
Regex doesn’t work that way. It’s a way to describe sequences of characters that most text editors nowadays can interpret and find text that matches. The results found don’t need to be identical. You can search for words that have four characters and this can give you results like some and word.
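The four-character search can be sketched in Swift code as well, using the regex support this tutorial introduces later. The sample sentence is made up, and the pattern is just one way to express "a word of exactly four characters":

```swift
// Hypothetical sample text; any sentence works.
let text = "You can find some word here"

// \b marks a word boundary and \w{4} matches exactly four word characters.
// The #/.../# delimiters are the extended form of the /.../ literal and
// need no extra compiler flag.
let fourLetterWords = text.matches(of: #/\b\w{4}\b/#).map { String($0.output) }

print(fourLetterWords) // ["find", "some", "word", "here"]
```

None of the results is identical to another, yet each one fits the description the pattern gives.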
To try things out, open the MarvelProductions.xcodeproj file in the starter folder. Then select the MarvelMovies file in the Project navigator.
While having the text file focused in Xcode, press Command-F to open the search bar. Click on the dropdown with the word Contains and choose Regular Expression.
In the search text field, you can still enter a word to search for in the file like you normally would, but now you can do much more. Enter \d in the text field. This will select every digit available in the file.
Try to select numbers that aren’t part of the id values that start with tt. Enter the following into the Search field:
\b(?<!tt)\d+
The regex you just entered matches any run of digits that begins at a word boundary and isn't preceded by tt. The breakdown of this regex is as follows:
- Word boundary: \b.
- Negative lookbehind for tt: (?<!tt).
- One or more digits: \d+.
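To try the same pattern outside the search field, here is a minimal sketch using Foundation's NSRegularExpression, which accepts the pattern as written, lookbehind included. The helper name digitsNotInIDs and the sample line are hypothetical and not part of the starter project:

```swift
import Foundation

// Hypothetical helper: returns every run of digits matched by \b(?<!tt)\d+.
func digitsNotInIDs(in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: #"\b(?<!tt)\d+"#) else {
        return []
    }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).compactMap { match in
        // Convert the NSRange of each match back to a Swift string range.
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Hypothetical line; the id and year are made up for illustration.
let line = "tt0000001  Sample Movie  2008"
print(digitsNotInIDs(in: line)) // ["2008"]
```

The digits inside the id are skipped because no word boundary sits between tt and the first digit.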
Swiftifying Regular Expressions
Swift 5.7 introduces a new Regex type that's a first-class citizen in Swift. It isn't a bridge from Objective-C's NSRegularExpression.
Swift Regex allows you to define a regular expression in three different ways:
- As a literal:
let digitsRegex = /\d+/
- From a String:
let regexString = #"\d+"#
let digitsRegex = try? Regex(regexString)
- Using RegexBuilder:
let digitsRegex = OneOrMore {
    CharacterClass.digit
}
The first two use the standard regular expression syntax. What's different about the second approach is that it allows you to create Regex objects dynamically by reading the expressions from a file, server or user input. The trade-off is that a Regex created at runtime can't be checked by Xcode as you type, whereas a literal is validated at compile time, which is a handy advantage of the first approach.
The third is the most novel. Apple introduced a new way to define regular expressions using a result builder. You can see it's easier to understand what it's searching for in the text. An arguable drawback is that this approach is more verbose.
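As a rough side-by-side comparison, the sketch below runs all three forms against the same text. The allMatches helper, the variable names and the sample string are assumptions made for illustration, and the literal uses the #/.../# extended delimiters so it compiles without the bare-slash compiler setting:

```swift
import RegexBuilder

// Hypothetical helper: returns the text of every match for any regex form.
func allMatches<R: RegexComponent>(of regex: R, in text: String) -> [String] {
    text.matches(of: regex).map { String(text[$0.range]) }
}

// Hypothetical sample text.
let sample = "Iron Man 2008, Thor 2011"

// 1. Literal: the compiler validates the pattern.
let literalRegex = #/\d+/#

// 2. From a String: parsed at runtime, so try? yields nil for a bad pattern.
let stringRegex = try? Regex(#"\d+"#)
let brokenRegex = try? Regex(#"\d+("#) // unbalanced parenthesis

// 3. RegexBuilder: the same pattern spelled out with a result builder.
let builderRegex = OneOrMore {
    CharacterClass.digit
}

print(allMatches(of: literalRegex, in: sample)) // ["2008", "2011"]
if let stringRegex {
    print(allMatches(of: stringRegex, in: sample)) // ["2008", "2011"]
}
print(allMatches(of: builderRegex, in: sample)) // ["2008", "2011"]
print(brokenRegex == nil) // true
```

All three find the same matches. Only the string-based form can fail at runtime, which is why it has to be unwrapped before use.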
Now it's time to see Swift Regex in action. Open the Project navigator and select the file ProductionsDataProvider.swift.
Loading the Marvel Movies List
As you can see, the data provider only loads a few sample objects and isn't loading the data from the MarvelMovies file. You'll use regular expressions to find the values and load them into an array of MarvelProductionItem objects. You might wonder, "Why do I need to use regular expressions to load the file? It looks clear, and I can separate it with normal string operations."
The answer is "looks can be deceiving". The file looks organized to the human eye, but that doesn't mean the data itself is organized.
If you look closely, whitespace separates the fields. A separator can be two or more space characters, one or more tab characters, or a mix of both.
Using usual string splitting is possible if separators are explicit and unique, but in this case, the separator string varies. Also, spaces appear in the content, making it hard to use conventional means to break down each line to parse the values. Regex is ideal here!
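To make the problem concrete, here is a small sketch comparing a plain split on spaces with a split on a separator regex. The row, its id and its field layout are made up, and the pattern is one possible way to describe the separator, not necessarily the one built later in the tutorial:

```swift
// Hypothetical row: the id and field layout are made up for illustration.
// Separators used here: three spaces, two tabs, and a space-tab-space mix.
let row = "Moon Knight   tt0000002\t\tTV Series \t 2022"

// Naive approach: splitting on every space also breaks the title apart
// and leaves tabs inside the fields.
let naiveFields = row.split(whereSeparator: { $0 == " " }).map(String.init)

// Assumed separator pattern: two or more whitespace characters,
// or one or more tabs.
let separator = #/\s{2,}|\t+/#
let fields = row.split(separator: separator).map(String.init)

print(naiveFields.count) // 6
print(fields) // ["Moon Knight", "tt0000002", "TV Series", "2022"]
```

The single spaces inside Moon Knight and TV Series survive because one space alone never satisfies the separator pattern.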