Collection Data Structures in Swift
- Getting Started
- What is “Big-O” Notation?
- Common iOS Data Structures
- Expected Performance and When to Use Arrays
- Sample App Testing Results
- When to use Dictionaries
- Expected Performance
- Sample App Testing Results
- When to use Sets
- Sample App Testing Results
- Lesser-known Foundation Data Structures
- NSHashTable and NSMapTable
- Pop Quiz!
- Where to Go From Here?
Imagine you have an application that needs to work with a lot of data. Where do you put that data? How do you keep it organized and handle it efficiently?
If your program manages only one number, you store it in one variable. If it has two numbers, you use two variables.
What if it has 1000 numbers, 10,000 strings or the ultimate library of memes? And wouldn’t it be nice to be able to find a perfect meme in an instant?
In that case, you’ll need one of the fundamental collection data structures, such as arrays, dictionaries, and sets. As you’ll learn, these collection data structures allow you and your users to manipulate huge data sets with a swipe across a screen. Thanks to Swift and its ever-evolving nature, you’ll be able to use native data structures that are blazing fast!
Here’s how the tutorial will flow:
- First, you’ll review what a data structure is, and then you’ll learn about Big-O notation. It’s the standard tool for describing the performance of different data structures.
- Next you’ll observe these data structures by measuring the performance of arrays, dictionaries, and sets — the most basic data structures available in Cocoa development. Incidentally, it’ll also double as a rudimentary introduction to performance testing.
- As you proceed, you’ll compare the performance of mature Cocoa structures with newer, Swift-only counterparts.
- Finally, you’ll briefly review some related types offered by Cocoa. These are data structures that you might be surprised to learn are already at your fingertips!
Before you dive in and explore the data structures used in iOS, you should review these basic, key concepts about what they are and how to measure their performance.
There are many types of collection data structures, and each represents a specific way to organize and access a collection of items. A specific collection type might make some activities especially efficient, such as adding a new item, finding the smallest item, or ensuring you’re not adding duplicates.
Without collection data structures, you’d be stuck trying to manage items one by one. A collection allows you to:
- Handle all those items as one entity
- Impose some structure on them
- Efficiently insert, remove, and retrieve items
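To put names to these ideas, here’s a quick sketch of the three Swift collection types this tutorial covers (the element values are arbitrary placeholders):

```swift
// An Array keeps elements in order and allows duplicates.
let numbers: [Int] = [3, 1, 4, 1, 5]

// A Dictionary maps unique keys to values for fast lookup by key.
let ages: [String: Int] = ["Alice": 30, "Bob": 25]

// A Set stores unique, unordered elements and tests membership quickly.
let primes: Set<Int> = [2, 3, 5, 7]

print(numbers.count)        // 5 — duplicates are kept
print(ages["Alice"] ?? -1)  // 30
print(primes.contains(4))   // false
```

Each type imposes a different structure — ordering, key-to-value mapping, or uniqueness — and that structure is what determines which operations are fast.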
What is “Big-O” Notation?
Big-O notation — that’s the letter O, not the number zero — is a way of describing the efficiency of an operation on a data structure. There are various kinds of efficiency: you could measure how much memory the data structure consumes, how much time an operation takes in the worst case, or how much time an operation takes on average. It’s not the raw memory or time that matters here, though; it’s how the memory or time scales as the size of the data structure grows.
In this tutorial, you’ll measure how much time an operation takes on average.
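That on-average measurement can be sketched in a few lines. Here’s a minimal, hypothetical timing helper — the sample app’s actual measurement code may differ:

```swift
import Foundation

// A hypothetical helper for illustration: times a closure over several
// runs and returns the average number of seconds per run.
func averageTime(runs: Int, _ operation: () -> Void) -> TimeInterval {
    let start = Date()
    for _ in 0..<runs {
        operation()
    }
    return Date().timeIntervalSince(start) / Double(runs)
}

let array = Array(0..<100_000)
let seconds = averageTime(runs: 10) {
    // Worst case for a linear scan: the target is the last element.
    _ = array.contains(99_999)
}
print("average: \(seconds) seconds per lookup")
```

Averaging over several runs smooths out one-off hiccups such as caching effects, which is why performance tests rarely rely on a single measurement.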
Common sense will tell you that an operation takes longer to perform when there are larger quantities of data. But sometimes there is little or no slowdown, depending on the data structure.
Big-O notation is a precise way of describing this.
You write an exact functional form that roughly describes how the running time changes based on the number of elements in the structure.
When you see Big-O notation written as O(something-with-n):
- n is the number of items in the data structure, and
- something-with-n is roughly how long the operation will take.
“Roughly”, ironically enough, has a specific meaning: it describes the behavior of the function at the asymptotic limit of very large n. In other words, n is a really, really large number — you’re thinking about how the performance of an operation will change as you go from a small collection to an enormous one.
The most commonly seen Big-O performance measures are as follows, in order from best to worst performance:
O(1) — (constant time)
No matter how many items are in the data structure, this function performs the same number of operations. This is considered ideal performance.
O(log n) — (logarithmic)
The number of operations this function calls grows at the rate of the logarithm of the number of items in the data structure. This is good performance, since it grows considerably slower than the number of items in the data structure.
O(n) — (linear)
The number of operations this function calls will grow linearly with the size of the structure. This is considered decent performance, but it can grind along with larger data collections.
O(n log n) — (“linearithmic”)
The number of operations called by this function grows in proportion to the number of items in the structure multiplied by the logarithm of that number. Predictably, this is about the lowest level of real-world tolerance for performance: larger data structures perform noticeably more operations, but the increase stays reasonable for data structures with small numbers of items.
O(n²) — (quadratic)
The number of operations called by this function grows at a rate that equals the size of the data structure, squared — poor performance at best. It grows quickly enough to become unusably slow even if you’re working with small data structures.
O(2^n) — (exponential)
The number of operations called by this function grows by two to the power of the size of the data structure. The resulting very poor performance becomes intolerably slow almost immediately.
O(n!) — (factorial)
The number of operations called by this function grows by the factorial of the size of the data structure. This is essentially the worst-case scenario for performance. For example, in a structure with just 100 items, the multiplier of the number of operations is 158 digits long. Witness it for yourself on wolframalpha.com.
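To make the difference between these categories concrete, here’s a small sketch contrasting an O(n) linear membership test on an Array with a Set’s hash-based test, which runs in O(1) on average. Exact timings will vary by machine:

```swift
import Foundation

let count = 1_000_000
let array = Array(0..<count)
let set = Set(array)
let target = count - 1  // worst case for the linear scan: the last element

// O(n): Array.contains compares elements one by one until it finds a match.
let arrayStart = Date()
let inArray = array.contains(target)
let arrayTime = Date().timeIntervalSince(arrayStart)

// O(1) on average: Set.contains hashes the value and checks one bucket.
let setStart = Date()
let inSet = set.contains(target)
let setTime = Date().timeIntervalSince(setStart)

print("found: \(inArray), \(inSet)")
print("Array scan: \(arrayTime)s; Set lookup: \(setTime)s")
```

On typical hardware the set lookup is dramatically faster, and the gap widens as count grows — exactly the scaling behavior Big-O notation captures.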
Here’s a more visual representation of performance and how it degrades when there are more items in a collection, going from one to 25 items:
Did you notice that you can’t even see the green O(log n) line, because it is so close to the ideal O(1) at this scale? That’s pretty good! On the other hand, operations with Big-O performance of O(2^n) degrade so quickly that by the time you have more than 10 items in a collection, the number of operations spikes completely off the chart.
Yikes! As the chart clearly demonstrates, the more data you handle, the more important it is to choose the right structure for the job.
Now that you’ve seen how to compare the performance of operations on data structures, it’s time to review the three most common types used in iOS and explore how they perform in theory and in practice.