26.1 Introduction to Tries

Do or do not. There is no Trie.

Let us first consider a potential improvement for our current HashMap implementation.

HashMaps are already incredibly fast. For our Resizing Separate Chaining Hash Table, the contains(x) method has Θ(1) runtime (assuming even spread), and the add(x) method also has Θ(1) runtime (assuming even spread, amortized over resizes). This is generally the best that we can do, but if we have some additional insight into the data we are storing, we can get even faster.

Special Case 1: Character Keyed Map

If we know that our keys are only ASCII characters, we can do away with our general-purpose HashMap and instead use an array where each index in the array corresponds to a specific ASCII character:

public class DataIndexedCharMap<V> {
    private V[] items;

    public DataIndexedCharMap(int R) {
        // R is the number of possible characters, e.g. 128 for ASCII.
        items = (V[]) new Object[R];
    }

    public void put(char c, V val) {
        // The character itself is the index into the array.
        items[c] = val;
    }

    public V get(char c) {
        return items[c];
    }
}

The above is a possible implementation for a map with character keys. The value R represents the number of possible characters (e.g. 128 for ASCII). We no longer have to store any buckets, which could hurt our runtime; instead, at the cost of additional memory, every character gets its own index, so we know that our data will be evenly spread.
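
For instance, a quick usage of this class might look like the following (a hypothetical example, assuming ASCII keys and R = 128):

DataIndexedCharMap<Integer> m = new DataIndexedCharMap<>(128);
m.put('a', 5);      // stored directly at index 97, the ASCII value of 'a'
m.put('b', 2);
m.get('a');         // returns 5 with a single array access
m.get('z');         // returns null; nothing was ever stored at index 'z'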

Special Case 2: String Keyed Map

Suppose we know that our keys are always strings. We can use a special data structure called a Trie. This data structure stores each letter of a string as a node in a tree. It has great performance for getting words, adding words, and some special string operations.

Trie Demo

Suppose we wanted to store the words "sam", "sad", "sap", "same", "a", and "awls". We want to create a data structure that lets us add these words and makes it clear that our set contains exactly those words, and not any prefixes or suffixes of them.

There are a few key ideas for Tries:

  • Every node stores only one letter.

  • Nodes can be shared by multiple keys.

Consider a Trie with the words "sam" and "sad" already in it:

When we add the word "sap", we can make use of the fact that we already have the prefix "sa" in the Trie:

Adding "same" follows a similar procedure. We have the prefix "sam" in the Trie, so we can use it to our advantage:

When adding "a", our first instinct may be to add an edge between the root and the existing "a" in our Trie:

However, this would be misleading: sharing that node would make it ambiguous whether a path through it spells, say, "ame" or is just part of "same". Instead, we create an entirely new node:

Now, when adding "awls", we follow the same procedure, reusing the existing prefix "a":

This is looking great already! We can see the words that we added very clearly in the Trie. However, there is an issue. We are supposed to have just the words "sam", "sad", "sap", "same", "a", and "awls" in the Trie. With our current structure, we cannot say for sure which prefixes should be considered to be in the Trie and which should not be. For example, we want the prefix "sam" to be in the Trie, but we do not want "awl" or "aw" to be considered to be in the Trie.

To address this issue, we will color the node for the last character of each word blue, to indicate that a word ends at that character:

Now we are finished! To search, we traverse our Trie from the root, following the child that matches each successive character of the string we are searching for. Thus, there are only two cases in which we wouldn't be able to find a string: either the final node is white, or we fall off the tree (there is no child matching the next character).

Examples:

  • contains("sam"): true, blue node

  • contains("sa"): false, white node

  • contains("a"): true, blue node

  • contains("saq"): false, fell off tree

Tries as Maps

Tries can also be maps if we also store data in the nodes. For example, we can have each blue node also hold a number, which could represent the multiplicity of that word.
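
A hypothetical sketch of that idea, mirroring the set sketch above (here the value field, rather than a boolean, marks the word-ending "blue" nodes):

public class TrieMap<V> {
    private static final int R = 128;                 // assuming ASCII keys
    private final Node<V> root = new Node<>();

    private static class Node<V> {
        V value;                                      // non-null only for "blue" nodes
        @SuppressWarnings("unchecked")
        Node<V>[] next = (Node<V>[]) new Node[R];
    }

    public void put(String key, V val) {
        Node<V> curr = root;
        for (char c : key.toCharArray()) {
            if (curr.next[c] == null) {
                curr.next[c] = new Node<>();
            }
            curr = curr.next[c];
        }
        curr.value = val;                             // e.g. the word's multiplicity
    }

    public V get(String key) {
        Node<V> curr = root;
        for (char c : key.toCharArray()) {
            if (curr.next[c] == null) {
                return null;                          // fell off the tree
            }
            curr = curr.next[c];
        }
        return curr.value;                            // null if key is only a prefix
    }
}

With this sketch, put("sam", 2) stores 2 in the blue "m" node, while get("sa") returns null because "sa" is only a prefix.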

Summary

See an animated demo of the creation of a Trie map here.