An Introduction to Regular Expressions

This introduction to regular expressions teaches you the basics of regular expressions and how to use them. By Tom Elliott.

A regular expression (commonly known as a “regex”) is a string or a sequence of characters that specifies a pattern. Think of it as a search string — but with super powers!

A plain old search in a text editor or word processor allows you to find simple matches. A regular expression can also perform these simple searches, but it takes things a step further and lets you search for patterns, such as two digits followed by a letter, or three letters followed by a hyphen.

This pattern matching allows you to do useful things like validate fields (phone numbers, email addresses), check user input, perform advanced text manipulation and much, much more.

Use the Download Materials button at the top or bottom of this tutorial to download a Regular Expressions Cheat Sheet PDF and a Swift playground to practice with. You can print out the Cheat Sheet and use it as a reference as you’re developing. Use the Swift playground, which contains examples, to try out lots of different regular expressions. All of the regular expressions that appear in this tutorial and its successor have live examples in that playground, so be sure to check them out.

/The (Basics|Introduction)/

If you are new to regular expressions and are wondering what all the hype is about, here’s a simple explanation: regular expressions provide a way to search a given text document for matches to a specific pattern, and they may alter the text based on those matches. There are many awesome books and tutorials written about regular expressions — you’ll find a short list of them at the end of this tutorial.

Regular Expressions Playground

In this tutorial, you’ll create a lot of regular expressions. If you want to try them out visually as you’re working with them, then a Swift playground is an excellent way to do so!

The playground in the materials you’ve downloaded contains a number of functions at the top to highlight the search results from a regular expression within a piece of text, display a list of matches or groups in the results pane of the playground, and replace text. Don’t worry about the implementation of these functions for now, though; you can learn about them in the next tutorial. Instead, scroll down to the Basic Examples and Cheat Sheet sections and follow along with the examples.

In the results sidebar of the playground, you’ll see a list of matches alongside each example. For “highlight” examples, you can hover over the result and click the eye or the empty circle icons to display the highlighted matches in the search text.

Viewing results in the playground

You’ll learn how to create NSRegularExpressions later, but for now you can use this playground to get a feeling for how various regular expressions work, and to try out your own patterns.

Examples

Let’s start with a few brief examples to show you what regular expressions look like.

Here’s an example of a regular expression that matches the word “jump”:

jump

That’s about as simple as regular expressions get. You can use some APIs that are available in iOS to search a string of text for any part that matches this regular expression — and once you find a match, you can find where it is or replace the text.
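As a rough sketch of what that looks like in code, the snippet below uses Foundation's NSRegularExpression to locate the first match and to replace every match. The sample sentence and the replacement word "leap" are made up for illustration.

```swift
import Foundation

// Sample text and replacement word are illustrative assumptions.
let text = "The quick brown fox will jump over the lazy dog."

// The pattern is a known-valid literal, so try! is acceptable in a sketch.
let regex = try! NSRegularExpression(pattern: "jump")
let fullRange = NSRange(text.startIndex..., in: text)

// Where is the first match? The range is expressed in UTF-16 units.
let firstRange = regex.firstMatch(in: text, range: fullRange)?.range

// Replace every match with another word.
let replaced = regex.stringByReplacingMatches(in: text, range: fullRange, withTemplate: "leap")

print(firstRange?.location ?? -1, replaced)
```

Here the match starts at offset 25 and is 4 units long, and the replaced string reads "The quick brown fox will leap over the lazy dog."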

Here’s a slightly more complicated example — this one matches either of the words “jump” or “jumps”:

jump(s)?

This is an example of using some special characters that are available in regular expressions. The parentheses create a group, and the question mark says “match the previous element (the group in this case) 0 or 1 times”.
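To see the optional group at work, here is a minimal sketch that lists every match in a made-up sentence. The matches(of:in:) helper is a hypothetical convenience written for this example, not part of Foundation or of the tutorial's playground.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// The sentence is an illustrative assumption.
let found = matches(of: "jump(s)?", in: "I jump, she jumps, they jumped.")
print(found)
```

This yields "jump", "jumps" and "jump". The third match comes from inside "jumped", because nothing in the pattern requires the match to be a whole word.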

Now for a really complex example. This one matches a pair of opening and closing HTML tags and the content in between.

<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>

Wow, looks complicated, eh? :] Don’t worry, you’ll be learning about all the special characters in this regular expression in the rest of this tutorial and, by the time you’re done, you’ll understand how this works! :]

If you want more details about the previous regular expression, check out this discussion for an explanation.
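Purely to illustrate the two capture groups and the \1 back-reference, the sketch below runs that pattern against a made-up HTML fragment. It assumes Swift 5 or later, whose raw string literals (#"..."#) let the pattern appear exactly as written above; the group(_:) helper is hypothetical.

```swift
import Foundation

// Raw string literals keep the backslashes in the pattern as-is.
let pattern = #"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>"#
let html = #"<h1 class="title">Regex</h1>"#   // illustrative input

let regex = try! NSRegularExpression(pattern: pattern)
let match = regex.firstMatch(in: html, range: NSRange(html.startIndex..., in: html))

// Hypothetical helper: the text captured by group `index`, or nil if unavailable.
func group(_ index: Int) -> String? {
    guard let match = match,
          let range = Range(match.range(at: index), in: html) else { return nil }
    return String(html[range])
}

let tagName = group(1)   // the tag name captured by the first parentheses
let content = group(2)   // the text between the opening and closing tags
print(tagName ?? "none", content ?? "none")
```

For this input, the first group captures "h1" and the second captures "Regex".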

Note: In real-world usage, you probably shouldn’t use regular expressions alone to parse HTML. Use a dedicated HTML or XML parser instead!

Overall Concepts

Before you go any further, it’s important to understand a few core concepts about regular expressions.

Literal characters are the simplest kind of regular expression. They’re similar to a “find” operation in a word processor or text editor. For example, the single-character regular expression t will find all occurrences of the letter “t”, and the regular expression jump will find all appearances of “jump”. Pretty straightforward!

Just like a programming language, there are some reserved characters in regular expression syntax, as follows:

  • [
  • ( and )
  • \
  • *
  • +
  • ?
  • { and }
  • ^
  • $
  • .
  • | (pipe)
  • /

These characters are used for advanced pattern matching. If you want to search for one of these characters literally, you need to escape it with a backslash. For example, to search for all periods in a block of text, the pattern is not . but \. (a backslash followed by a period).

Each environment, be it Python, Perl, Java, C#, Ruby or whatever, has special nuances in its implementation of regular expressions. And Swift is no exception!

Both Objective-C and Swift require you to escape special characters in literal strings (i.e., precede them by a backslash \ character). One such special character is the backslash itself! Since the patterns used to create a regular expression are also strings, this creates an added complication in that you need to escape the backslash character when working with String and NSRegularExpression.

That means the standard regular expression \. will appear as \\. in your Swift (or Objective-C) code.

To clarify the above concept in point form:

  • The literal "\\." defines a string that looks like this: \.
  • The regular expression \. will then match a single period character.
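The two levels of escaping can be checked with a short sketch. It also shows a raw string literal, which assumes Swift 5 or later and avoids the doubled backslash; the sample text "a.b.c" is made up.

```swift
import Foundation

let escaped = "\\."    // regular string literal: the backslash is doubled
let raw = #"\."#       // raw string literal: written exactly as the regex

// Illustrative input containing two periods.
let text = "a.b.c"
let range = NSRange(text.startIndex..., in: text)

// \. matches only literal periods.
let periodCount = try! NSRegularExpression(pattern: escaped)
    .numberOfMatches(in: text, range: range)

// An unescaped . matches any character, so every character counts.
let anyCount = try! NSRegularExpression(pattern: ".")
    .numberOfMatches(in: text, range: range)

print(escaped == raw, periodCount, anyCount)
```

Both literals produce the same two-character string. The escaped pattern finds 2 matches in "a.b.c", while the unescaped one finds 5.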

Capturing parentheses are used to group part of a pattern. For example, 3 (pm|am) would match the text “3 pm” as well as the text “3 am”. The pipe character here (|) acts like an OR operator. You can include as many pipe characters in your regular expression as you would like. As an example, (Tom|Dick|Harry) is a valid pattern that matches any of those three names.
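A quick way to try alternation is Foundation's range(of:options:) with the .regularExpression option, which reports whether a pattern matches anywhere in a string. The containsMatch helper and the sample strings below are assumptions made for this sketch.

```swift
import Foundation

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
func containsMatch(_ pattern: String, in text: String) -> Bool {
    return text.range(of: pattern, options: .regularExpression) != nil
}

let afternoon = containsMatch("3 (pm|am)", in: "See you at 3 pm")
let morning = containsMatch("3 (pm|am)", in: "The alarm rang at 3 am")
let noon = containsMatch("3 (pm|am)", in: "Lunch is at 3 noon")
let name = containsMatch("(Tom|Dick|Harry)", in: "Say hi to Harry")

print(afternoon, morning, noon, name)
```

The first two strings and the last one match; "3 noon" does not, because neither alternative follows the "3 ".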

Grouping with parentheses comes in handy when you need to optionally match a certain text string. Say you are looking for “November” in some text, but it’s possible the user abbreviated the month as “Nov”. You can define the pattern as Nov(ember)? where the question mark after the capturing parentheses means that whatever is inside the parentheses is optional.
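The optional group can be sketched as follows, using a made-up sentence that contains both spellings. The matches(of:in:) helper is hypothetical and exists only to keep the example self-contained.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// The sentence is an illustrative assumption.
let months = matches(of: "Nov(ember)?", in: "Due in Nov, extended to November 30.")
print(months)
```

Both "Nov" and "November" are found. When the longer form is present, the optional group matches it in full rather than stopping at "Nov".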

These parentheses are called “capturing” because they capture the matched content and allow you to reference it in other places in your regular expression.

As an example, assume you have the string “Say hi to Harry”. If you created a search-and-replace regular expression to replace any occurrences of (Tom|Dick|Harry) with that guy $1, the result would be “Say hi to that guy Harry”. The $1 allows you to reference the first captured group of the preceding rule.
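That replacement can be sketched with NSRegularExpression's stringByReplacingMatches(in:range:withTemplate:), where $1 in the template stands for the first captured group. The variable names are arbitrary.

```swift
import Foundation

let sentence = "Say hi to Harry"
let regex = try! NSRegularExpression(pattern: "(Tom|Dick|Harry)")

// $1 in the template is replaced by whatever the first group captured.
let result = regex.stringByReplacingMatches(
    in: sentence,
    range: NSRange(sentence.startIndex..., in: sentence),
    withTemplate: "that guy $1"
)

print(result)
```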

Capturing and non-capturing groups are somewhat advanced topics. You’ll encounter examples of capturing and non-capturing groups in the follow-up tutorial.

Character classes represent a set of possible single-character matches. Character classes appear between square brackets ([ and ]).

As an example, the regular expression t[aeiou] will match “ta”, “te”, “ti”, “to”, or “tu”. You can have as many character possibilities inside the square brackets as you like, but remember that any single character in the set will match. [aeiou] looks like five characters, but it actually means “a” or “e” or “i” or “o” or “u”.
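Here is a small sketch of that character class applied to a made-up list of words. The matches(of:in:) helper is hypothetical.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Illustrative input: five words with "t" + vowel, one with "t" + consonant.
let pairs = matches(of: "t[aeiou]", in: "tap ten tip top tub try")
print(pairs)
```

Each match is exactly two characters long: "ta", "te", "ti", "to" and "tu". The word "try" contributes nothing, since "r" is not in the set.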

You can also define a range in a character class if the characters appear consecutively. For example, to search for a number from 100 to 109, the pattern would be 10[0-9]. This returns the same results as 10[0123456789], but using ranges makes your regular expressions much cleaner and easier to understand.

But character classes aren’t limited to numbers; you can do the same thing with letters. For instance, [a-f] will match “a”, “b”, “c”, “d”, “e”, or “f”.
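Both kinds of range can be tried with the sketch below. The input strings are made up, and matches(of:in:) is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// Digit range: "1", "0", then any one digit.
let rooms = matches(of: "10[0-9]", in: "Rooms 99, 100, 105, 109, 110 and 1007")

// Letter range: any single character from "a" through "f".
let letters = matches(of: "[a-f]", in: "face it")

print(rooms, letters)
```

The first search returns "100", "105", "109" and a second "100" taken from the start of "1007". The pattern matches characters, not numeric values, so it cannot tell that 1007 is a larger number. The second search returns "f", "a", "c" and "e".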

Character classes usually contain the characters you want to match, but what if you want to explicitly not match a character? You can also define negated character classes, which start with the ^ character. For example, the pattern t[^o] will match “t” followed by any one character other than “o”, so it matches “ta” or “ti” but not “to”.
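A short sketch shows two details of the negated class that are easy to miss: a space counts as "one other character", and a "t" at the very end of the text has nothing after it to match. The inputs are made up and matches(of:in:) is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: returns every substring of `text` that matches `pattern`.
func matches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { result in
        Range(result.range, in: text).map { String(text[$0]) }
    }
}

// "t " (t plus space) and "ti" match; the "to" in "took" is skipped.
let inSentence = matches(of: "t[^o]", in: "it took time")

// The final "t" has no following character, so nothing matches.
let atEnd = matches(of: "t[^o]", in: "cat")

print(inSentence, atEnd)
```

The first search returns "t " and "ti"; the second returns an empty array.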