Regular Expressions in Kotlin

Learn how to improve your string manipulation with the power of regular expressions in Kotlin. You’ll love them! By Arjuna Sky Kok.

Flag expression

RegexOption’s purpose is to alter the behavior of a regex. But you can achieve the same result without RegexOption by writing the rule in the regex string itself, like this:

val pattern = Regex("(?i)batman(?-i)")

You get the same result as using RegexOption.IGNORE_CASE.

This strange syntax is a flag expression. Flag expressions have special meanings. (?i)batman(?-i) doesn’t mean the regex string matches the (?i)batman(?-i) string exactly.

The regex engine interprets the flag expressions differently than normal characters. (?i) tells the regex engine to treat the characters case-insensitively from now on. On the other hand, (?-i) tells the regex engine to treat the characters case-sensitively from this point on.

So (?i)b(?-i)atman means only b is case-insensitive. The rest of the characters are case-sensitive.
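
To see the flags in action, here is a minimal sketch. It assumes the check uses containsMatchIn, and containsMatch is a hypothetical helper written only for this illustration, not part of the project:

```kotlin
// Hypothetical helper for illustration: true if the regex matches
// anywhere inside the input.
fun containsMatch(regexString: String, input: String): Boolean =
    Regex(regexString).containsMatchIn(input)

fun main() {
    // (?i) makes the whole word case-insensitive.
    println(containsMatch("(?i)batman(?-i)", "BATMAN"))  // true

    // Only b is case-insensitive; atman must be lowercase.
    println(containsMatch("(?i)b(?-i)atman", "Batman"))  // true
    println(containsMatch("(?i)b(?-i)atman", "BATMAN"))  // false
}
```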

But for this example, you’ll use only RegexOption.

Understanding Character Classes, Groups, Quantifiers and Boundaries

Another problem appears. A superhero called catman enters Supervillains Club. How do you forbid both catman and batman?

With a standard string method, you can use the if condition with a logical operator. But you’ll use regex.

You want to check whether the string is batman or catman. Notice that only one character is different. The remaining characters, atman, are the same.

Using Character Classes

You can use a character class to group b and c. Replace your pattern line with:

val pattern = Regex("[bc]atman", RegexOption.IGNORE_CASE)

The [ and ] create a character class. A character class matches exactly one character from the set inside the brackets. [bc] means either b or c. [aiueo] means any one vowel.

Some characters have a special meaning inside square brackets. If you want to negate the set, put ^ at the start. [^aiueo] means any single character other than a vowel.

You can also use - to create a range of characters. [a-z] means any one lowercase letter from a to z.
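
The three kinds of character class can be tried out in a short sketch. firstMatch is a hypothetical helper introduced only for this illustration, and the sample inputs are made up:

```kotlin
// Hypothetical helper for illustration: returns the first substring
// that matches the regex, or null if nothing matches.
fun firstMatch(regexString: String, input: String): String? =
    Regex(regexString).find(input)?.value

fun main() {
    val pattern = Regex("[bc]atman", RegexOption.IGNORE_CASE)
    println(pattern.containsMatchIn("Catman"))  // true
    println(pattern.containsMatchIn("ratman"))  // false

    // [^aiueo] matches one character that is not a listed vowel.
    println(firstMatch("[^aiueo]", "audio"))  // d

    // [a-z] matches one lowercase letter.
    println(firstMatch("[a-z]", "42b"))  // b
}
```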

Build and run the app. Try to input catman. The validation works flawlessly.

Supervillains catman Validation in Registration Form

Next, you’ll take a look at groups and quantifiers.

Using Groups and Quantifiers

All is well until batwoman breaks into Supervillains Club. Now you need to block batwoman as well as batman. Notice that the difference is the wo string, so you can’t use a character class to solve this problem.

bat[wo]man means batwman or batoman. It doesn’t match batwoman.

What you want is a group.

Add this new rule to the existing regex syntax. Replace your pattern line with:

val pattern = Regex("[bc]at(wo)?man", RegexOption.IGNORE_CASE)

Here, you use ( and ) to create a group. (wo) means a group of the wo string. Groups make characters a single unit.

You want to make this group optional, so you apply ? after the group. That’s the (wo)? part of the regex string.

? is a quantifier. A quantifier defines how many times a unit may occur. There are a few varieties of quantifiers in regex:

  • ?: 0 or 1 occurrence.
  • +: 1 or more occurrences.
  • *: 0 or more occurrences.

You could use quantifiers to match occurrences of a unit:

  • ba+ matches ba and baaaaa fully, but doesn’t match b.
  • ba* matches b, ba and baaaaa fully.
  • ba? matches b and ba fully, but only matches baaaa partially.

In your group, (wo)?, the ? means the group to its left occurs either once or not at all.

That’s the purpose of the group: It keeps w and o together, so the quantifier applies to wo as a whole.
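
The difference between a full match and a partial match can be checked with a small sketch. matchesFully is a hypothetical helper added only for this illustration; it relies on the matches method of Regex, which succeeds only when the whole input matches:

```kotlin
// Hypothetical helper for illustration: true only when the entire
// input matches the regex.
fun matchesFully(regexString: String, input: String): Boolean =
    Regex(regexString).matches(input)

fun main() {
    println(matchesFully("ba+", "baaaaa"))  // true
    println(matchesFully("ba+", "b"))       // false
    println(matchesFully("ba*", "b"))       // true
    println(matchesFully("ba?", "baaaa"))   // false

    // A partial match: find returns only the part that fits.
    println(Regex("ba?").find("baaaa")?.value)  // ba

    // The optional group lets one regex cover both names.
    val pattern = Regex("[bc]at(wo)?man", RegexOption.IGNORE_CASE)
    println(pattern.containsMatchIn("Batman"))    // true
    println(pattern.containsMatchIn("Batwoman"))  // true
}
```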

Build and run the app. Check to see that batwoman and catwoman can’t enter Supervillains Club.

Supervillains Club batwoman Validation

Supervillains Club catwoman Validation

What if you hadn’t used a group, so that the regex string was batwo?man?

[spoiler title=”Solution”]
That means the ? quantifier only applies to the o character. So batwo?man matches batwoman and batwman, but no longer matches batman.

You don’t want this. You want either batman or batwoman, but not batwman.
[/spoiler]

Using Boundaries

You’re satisfied with your superb code: You protected Supervillains Club from superheroes. Then one day, a supervillain named I'm not Batman tries to register, and the validation stops the supervillain.

You get a complaint from your employer.

Now, you need to add logic so that the regex matches batman, catman, batwoman and catwoman only if they appear at the beginning of the string.

Use a boundary to solve this problem. Add ^ at the front of the regex string by replacing your pattern line with:

val pattern = Regex("^[bc]at(wo)?man", RegexOption.IGNORE_CASE)

The ^ character here doesn’t have the same meaning as the ^ character inside brackets. ^bat means bat at the beginning of the string. [^bat] means any single character other than b, a and t.

Build and run the app. Now I'm not Batman can register successfully in Supervillains Club.

Supervillains Successful Registration for "I'm not Batman"

Note: If you want to match the regex string at the end of the string, use $. So bat$ means bat at the end of the string.
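
Here is a minimal sketch of both boundaries. isSuperhero is a hypothetical function name chosen for this illustration, and it assumes the validation uses containsMatchIn:

```kotlin
// Hypothetical validation function for illustration: true when the
// name starts with one of the forbidden superhero names.
fun isSuperhero(name: String): Boolean =
    Regex("^[bc]at(wo)?man", RegexOption.IGNORE_CASE).containsMatchIn(name)

fun main() {
    println(isSuperhero("Batman"))          // true
    println(isSuperhero("I'm not Batman"))  // false

    // $ anchors the match at the end of the string.
    println(Regex("bat$").containsMatchIn("wombat"))  // true
    println(Regex("bat$").containsMatchIn("batman"))  // false
}
```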

Regex Helper Tools from IntelliJ IDEA

Sometimes when writing your regex pattern, you want to check if it works as soon as possible, even without running your app. For this purpose, use regex helper tools from IntelliJ IDEA.

Move your caret to the regex pattern and press Alt-Enter on Linux/Windows or Option-Enter on Mac:

Check RegExp Menu

You have two helper tools dealing with regex. One edits the regex fragment, and the other checks the regex pattern.

Choose Check RegExp:

Valid Result in Check RegExp Form

You have a form to validate an input string with your regex pattern. If the input string matches the regex pattern, you’ll see a green check mark.

If you put in an invalid input string:

Invalid Result in Check RegExp Form

You’ll see a red exclamation mark.

If you find that your regex pattern doesn’t work as expected, go back to your regex pattern. Press Alt-Enter or Option-Enter again:

Edit RegExp Fragment

Then choose Edit RegExp Fragment:

RegExp Fragment Editor

You’ll see a dedicated editor for your regex pattern where you can edit the pattern and get hints. For example, delete the ) after the letter o:

RegExp Fragment Editor Validation

You’ll see a warning about the missing ).

For this regex pattern, an editor is overkill. But it can be handy while editing a complex regex.

Understanding Predefined Classes and Groups

You were working on the signup page when you got a distress call: Some superheroes have infiltrated Supervillains Club, and you need to root them out!

Open http://localhost:8080/impostors and you’ll see some names:

Supervillains Club Finding Impostors Form

Click Find Impostors and you’ll get… nothing. The clue is that anyone with Captain in their name is a superhero. Based on that information, it’s time to write a new regex.

You’ll use findAll, a different method of Regex. This time, you don’t want to check whether a regex matches a string. You want to extract every substring that matches the regex.

In RegexValidator.kt, replace the content of filterNames with:

val pattern = Regex("""Captain""")
return pattern.findAll(names).map {
  it.value
}.toList()

The findAll method returns a sequence of MatchResult objects. To get the matched string, you use the value property of MatchResult.
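
To see what findAll hands back, here is a standalone sketch. The sampleNames string is made-up input for illustration, since the app’s real list of names isn’t shown here, and findCaptains is a hypothetical stand-in for filterNames:

```kotlin
// Made-up sample input for illustration only.
val sampleNames = "Captain America, Joker, Captain Marvel"

// Hypothetical stand-in for filterNames: collects every matched substring.
fun findCaptains(names: String): List<String> =
    Regex("""Captain""").findAll(names).map { it.value }.toList()

fun main() {
    // Each element of the sequence is a MatchResult.
    val matches: Sequence<MatchResult> = Regex("""Captain""").findAll(sampleNames)
    for (match in matches) {
        // value is the matched text; range is its position in the input.
        println("${match.value} at ${match.range}")
    }

    println(findCaptains(sampleNames))  // [Captain, Captain]
}
```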

Build and run the app. Submit the form, and you’ll get this result:

Supervillains Club Captain Captain Result

Not good! You could write the regex string like this: Captain (America|Marvel). It works for this case, but it’s not scalable.

What if there’s another impostor named Captain Saving the World or Captain Love? Then you’d need to rewrite your regex string.

There’s a better way. You can use predefined classes and the + quantifier.

Replace your Regex with:

val pattern = Regex("""Captain\s\w+""")

\s and \w are predefined classes. \s matches any whitespace character, like a space or a tab. \w matches any word character: a letter, a digit or an underscore.
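
A short sketch shows what this pattern captures. The input strings are made up for illustration, and findImpostors is a hypothetical stand-in for filterNames:

```kotlin
// Hypothetical stand-in for filterNames, using the predefined classes.
fun findImpostors(names: String): List<String> =
    Regex("""Captain\s\w+""").findAll(names).map { it.value }.toList()

fun main() {
    println(findImpostors("Captain America, Joker, Captain Marvel"))
    // [Captain America, Captain Marvel]

    // \w+ stops at the first non-word character, so only the first
    // word after Captain is captured.
    println(findImpostors("Captain Saving the World"))
    // [Captain Saving]
}
```

Because \w doesn’t match whitespace, a multi-word name is cut off after its first word. That’s still enough to flag the impostor.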

Build and run the app. Click Find Impostors and you’ll get this result:

Supervillains Club Finding Impostors Result

Bingo! You successfully rooted them out!

Note: Sometimes you might prefer an explicit character class that’s easier to read. Because \w is the same as [a-zA-Z_0-9], you can replace your regex with this more readable one:
val pattern = Regex("""Captain\s[a-zA-Z_0-9]+""")