Chapters

Hide chapters

Data Structures & Algorithms in Kotlin

First Edition · Android 10 · Kotlin 1.3 · IDEA

Before You Begin

Section 0: 3 chapters
Show chapters Hide chapters

10. Tries
Written by Irina Galata, Kelvin Lau and Vincent Ngo

Heads up... You're reading this book for free, with parts of this chapter shown beyond this point as scrambled text.

The trie (pronounced try) is a tree that specializes in storing data that can be represented as a collection, such as English words:

A trie containing the words CAT, CUT, CUTE, TO, and B
A trie containing the words CAT, CUT, CUTE, TO, and B

Each character in a string is mapped to a node. The last node in each string is marked as a terminating node (a dot in the image above). The benefits of a trie are best illustrated by looking at it in the context of prefix matching.

In this chapter, you’ll first compare the performance of the trie to the array. You’ll then implement the trie from scratch.

Example

You are given a collection of strings. How would you build a component that handles prefix matching? Here’s one way:

class EnglishDictionary {

  private val words: ArrayList<String> = ...

  fun words(prefix: String) = words.filter { it.startsWith(prefix) }

}

words() goes through the collection of strings and returns the strings that match the prefix.

If the number of elements in the words array is small, this is a reasonable strategy. But if you’re dealing with more than a few thousand words, the time it takes to go through the words array will be unacceptable. The time complexity of words() is O(k*n), where k is the longest string in the collection, and n is the number of words you need to check.

Imagine the number of words Google needs to parse
Imagine the number of words Google needs to parse

The trie data structure has excellent performance characteristics for this type of problem; like a tree with nodes that support multiple children, each node can represent a single character.

You form a word by tracing the collection of characters from the root to a node with a special indicator — a terminator — represented by a black dot. An interesting characteristic of the trie is that multiple words can share the same characters.

To illustrate the performance benefits of the trie, consider the following example in which you need to find the words with the prefix CU.

First, you travel to the node containing C. This quickly excludes other branches of the trie from the search operation:

Next, you need to find the words that have the next letter, U. You traverse to the U node:

Since that’s the end of your prefix, the trie returns all collections formed by the chain of nodes from the U node. In this case, the words CUT and CUTE are returned. Imagine if this trie contained hundreds of thousands of words.

The number of comparisons you can avoid by employing a trie is substantial.

Implementation

Open up the starter project for this chapter.

TrieNode

You’ll begin by creating the node for the trie. Create a new file named TrieNode.kt. Add the following to the file:

class TrieNode<Key>(var key: Key?, var parent: TrieNode<Key>?) {

  val children: HashMap<Key, TrieNode<Key>> = HashMap()

  var isTerminating = false

}

Trie

Next, you’ll create the trie itself, which will manage the nodes. Create a new file named Trie.kt. Add the following to the file:

class Trie<Key> {

  private val root = TrieNode<Key>(key = null, parent = null)

}

Insert

Tries work with lists of the Key type. The trie takes the list and represents it as a series of nodes in which each node maps to an element in the list.

fun insert(list: List<Key>) {
  // 1
  var current = root

  // 2
  list.forEach { element ->
    if (current.children[element] == null) {
      current.children[element] = TrieNode(element, current)
    }
    current = current.children[element]!!
  }

  // 3
  current.isTerminating = true
}

Contains

contains is similar to insert. Add the following method to Trie:

fun contains(list: List<Key>): Boolean {
  var current = root

  list.forEach { element ->
    val child = current.children[element] ?: return false
    current = child
  }

  return current.isTerminating
}
"insert and contains" example {
  val trie = Trie<Char>()
  trie.insert("cute".toList())
  if (trie.contains("cute".toList())) {
    println("cute is in the trie")
  }
}
---Example of insert and contains---
cute is in the trie
fun Trie<Char>.insert(string: String) {
  insert(string.toList())
}

fun Trie<Char>.contains(string: String): Boolean {
  return contains(string.toList())
}
"insert and contains" example {
  val trie = Trie<Char>()
  trie.insert("cute")
  if (trie.contains("cute")) {
    println("cute is in the trie")
  }
}

Remove

Removing a node in the trie is a bit more tricky. You need to be particularly careful when removing each node since nodes can be shared between multiple different collections. Write the following method immediately below contains:

fun remove(collection: CollectionType) {
  // 1
  var current = root

  collection.forEach {
    val child = current.children[it] ?: return
    current = child
  }

  if (!current.isTerminating) return

  // 2
  current.isTerminating = false

  // 3
  val parent = current.parent
  while (current.children.isEmpty() && !current.isTerminating) {
    parent?.let {
      it.children[current.key] = null
      current = it
    }
  }
}
fun Trie<Char>.remove(string: String) {
  remove(string.toList())
}
"remove" example {
  val trie = Trie<Char>()

  trie.insert("cut")
  trie.insert("cute")

  println("\n*** Before removing ***")
  assert(trie.contains("cut"))
  println("\"cut\" is in the trie")
  assert(trie.contains("cute"))
  println("\"cute\" is in the trie")

  println("\n*** After removing cut ***")
  trie.remove("cut")
  assert(!trie.contains("cut"))
  assert(trie.contains("cute"))
  println("\"cute\" is still in the trie")
}
---Example of: remove---

*** Before removing ***
"cut" is in the trie
"cute" is in the trie

*** After removing cut ***
"cute" is still in the trie

Prefix matching

The most iconic algorithm for the trie is the prefix-matching algorithm. Write the following at the bottom of Trie:

fun collections(prefix: List<Key>): List<List<Key>> {
  // 1
  var current = root

  prefix.forEach { element ->
    val child = current.children[element] ?: return emptyList()
    current = child
  }

  // 2
  return collections(prefix, current)
}
private fun collections(prefix: List<Key>, node: TrieNode<Key>?): List<List<Key>> {
  // 1
  val results = mutableListOf<List<Key>>()

  if (node?.isTerminating == true) {
    results.add(prefix)
  }

  // 2
  node?.children?.forEach { (key, node) ->
    results.addAll(collections(prefix + key, node))
  }

  return results
}
fun Trie<Char>.collections(prefix: String): List<String> {
  return collections(prefix.toList()).map { it.joinToString(separator = "") }
}
"prefix matching" example {
  val trie = Trie<Char>().apply {
    insert("car")
    insert("card")
    insert("care")
    insert("cared")
    insert("cars")
    insert("carbs")
    insert("carapace")
    insert("cargo")
  }

  println("\nCollections starting with \"car\"")
  val prefixedWithCar = trie.collections("car")
  println(prefixedWithCar)

  println("\nCollections starting with \"care\"")
  val prefixedWithCare = trie.collections("care")
  println(prefixedWithCare)
}
---Example of prefix matching---

Collections starting with "car"
[car, carapace, carbs, cars, card, care, cared, cargo]

Collections starting with "care"
[care, cared]

Challenges

Challenge 1: Adding more features

The current implementation of the trie is missing some notable operations. Your task for this challenge is to augment the current implementation of the trie by adding the following:

Solution 1

For this solution, you’ll implement lists as a computed property. It’ll be backed by a private property named storedLists.

private val storedLists: MutableSet<List<Key>> = mutableSetOf()

val lists: List<List<Key>>
  get() = storedLists.toList()
storedLists.add(list)
storedLists.remove(list)
val count: Int
  get() = storedLists.count()

val isEmpty: Boolean
  get() = storedLists.isEmpty()

Key points

  • Tries provide great performance metrics in regards to prefix matching.
  • Tries are relatively memory efficient since individual nodes can be shared between many different values. For example, “car”, “carbs”, and “care” can share the first three letters of the word.
Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2024 Kodeco Inc.

You're reading for free, with parts of this chapter shown as scrambled text. Unlock this book, and our entire catalogue of books and videos, with a Kodeco Personal Plan.

Unlock now