Gotta [know] them all!

Aldrin Brillante
7 min readMay 11, 2022

Reconstructing the idea of the pokedex to implement a trie data structure — prefix tree.

Decades after the launch of the Pokémon enterprise, fans of the old and trainers of today can easily distinguish the majority of the original Pokémon family. We knew that the powerful Charizard came from the evolution of an original Charmander (unless you’re fighting a Ditto). These family evolutions, or family trees, is what connects the first evolution of Pokémon to their next and following evolution(s) thereafter.

Reminiscing the adventures of our childhood as well as learning the various data structures in computer science, we cannot help but connect these links between the evolution of Pokémon with the visualization of the trie data structure, or prefix tree. A lot of connections within familial pokemon circles contain prefixes that connects them in a grammatical sense, while other familial pokemon connections have suffixes that distinctly place them together. There are even family ties that have know grammatical connection altogether! As an avid pokemon trainer, we depend on Professor Oak’s Pokedex hardware to assist us in linking and distinguishing the different types of pokemon. This is what we want to recreate using a trie data structure. We were determined to recreate the idea of a pokedex implementing the familial and grammatical connections of these pokemon using the prefix tree method. This is why we decided to base our project on visualizing a trie data structure, or prefix tree, to be able to assist users in distinguishing the connections between one evolution to the next.

Image credit: Reddit

What is a Prefix Tree?

A prefix tree is a special type of tree data structure. In the world of computer science, a prefix tree has many names. A prefix tree is also known as a “trie”, which comes from “retrieval” (but pronounced as “try”), since prefix trees are ideal for retrieving relevant patterns in a given set of strings. It can also be called a digital tree. Essentially, this is a type of k-ary search tree, meaning that this is a tree data structure that is used for locating specific keys within itself, within its set. These keys contain links within the nodes that are defined not by the entire key but by the characters individually. We use this feature of prefix trees to create a Pokédex that highlights related Pokémon based on the prefixes of their names. This will become clearer after we first explain the concept of prefix trees in more detail.

Consider reading this excellent primer on tree by Vaidehi Joshi titled “How To Not Be Stumped By Trees” if they are unfamiliar to you.

Overview of Prefix Trees

Prefix trees are generally structured with an empty string at the root node (“”) that is linked to a set of child nodes, with each child node representing one letter of the alphabet in use. Since the English alphabet consists of 26 letters, there are 26 possible child nodes stemming from the root node, and each of those child nodes can in turn link to another set of 26 possible child nodes, and so on. In theory, each node initially contains an array of empty pointers that are replaced with actual pointers as strings are inserted into the trie. However, our Python implementation of the prefix tree uses a dictionary to keep track of references to child nodes rather than an array of empty pointers. This saves some memory space and makes the code for the nodes a little more intuitive.

Image Credit: HackerEarth

Excerpts:

Here is an excerpt of the implementation of the prefix tree node class (all comments and doc-strings and some methods have been removed for the sake of brevity):

We then use our PrefixTreeNode class to implement our PrefixTree as follows (again, all comments and doc-strings and some methods have been removed for the sake of brevity):

Prefix trees are ideal for traversing strings quickly, which is why they are involved in applications like autocomplete.

Image Credit: Wikipedia

Algorithm Analysis

No data structure or algorithm is capable of perfect time complexity and space complexity; typically, time complexity is sacrificed to improve space complexity, or vice versa, or time complexity for one operation is improved at the expense of another operation (such as insertion, deletion, etc.). Prefix trees are designed to strike the balance between time complexity and space complexity by utilizing a fair amount, but not too much, of each. There is usually less work required to add strings after many strings have already been added, since it is likely that prefixes from existing strings can be reused for new insertions. The worst-case scenario for creating a trie depends on the longest string inserted as well as the total number of strings inserted. Thus, if you are familiar with Big-O notation, creating a trie entails a time complexity of O(mn), where m is the size of the longest string and n is the total number of strings in the trie. The same time complexity is involved for searching, inserting, and deleting a string from a trie, so O(mn) applies to these operations as well; however, m represents the length of the target string rather than the longest string (and n still represents the total number of strings).

The space complexity of a trie is very similar to its time complexity: O(mn), where m is the size of the longest string and n is the total number of strings in the trie. If arrays are used for referencing node children, then the space complexity is actually a lot worse, because every node would have an array of mostly empty pointers. However, since our implementation uses dictionaries instead of arrays, each node contains a dictionary that corresponds directly to the number of child nodes it has, which is typically 1 or 2 (but can vary depending on the strings in the trie).

Pokémon and Prefix Trees

We are interested in applying prefix trees to the realm of Pokémon to illustrate the genealogy of Pokémon based on their names. Every Pokémon has multiple evolutionary states. The evolutionary tree for each Pokémon can be modeled using tries because all evolutionary states in such a tree begin with the same few letters to indicate the “genus” for that Pokémon. Thus, each Pokémon genus can be grouped based on the prefix for all of its evolutionary states. Therefore, the prefix tree is ideal for representing the genealogical trees of Pokémon!

However, it is also worth noting that not all Pokémon use the same prefix for their evolutionary states. Some use the same suffix (such as Spearow and Fearow) while some have no shared patterns at all (such as Ekans and Arbok, although there is still a clear linguistic pattern between these two states). Thus, our Prefix Tree Pokédex is only useful for Pokémon whose evolutionary states use similar prefixes, which is the case for most Pokémon.

We can first add one Pokémon as follows (the “True” label indicates the end of the string, which is useful to demarcate strings that are themselves prefixes of other strings):

Charmander Prefix Tree Visualization

Here is a visual representation of the evolutionary tree consisting of Charizard, Charmeleon, and Charmander (note the shared use of the “char” prefix):

Here is a tree with many more Pokémon (note that common suffixes are not joined, as with “pikachu” and “raichu”):

Visualization

Besides implementing a prefix tree, we also wanted to showcase a visualization for the Pokemon genealogy. We considered a few options for this, such as `d3.js` for a neat web visualization, as well as some kind of Command Line visualization. Ultimately, we decided to use a Python graph visualization library, since a prefix tree is fundamentally a graph anyway (and we thought it would be a more interesting challenge for us). We had three options for Python graph libraries: NetworkX, PyGraphViz, and PyDot. NetworkX is great for graph analysis, but not really suited for graph visualization (as proclaimed in the Networkx Documentation). PyGraphViz was more viable than NetworkX, but we found a helpful StackOverflow post that tipped us more in favor of PyDot, which had more documentation than PyGraphViz anyway. To summarize, there are a myriad of ways to visualize graphs programmatically, and we eventually decided to opt for the PyDot library for this project.

Future Work

One way to extend this project would be to implement suffix trees, which operate the same as prefix trees but join common characters at the end of the strings rather than the beginning of the strings. Suffix trees seem much more difficult to implement than prefix trees, so this project expansion is left as an exercise to the reader. It might also be nice to add some color or other visual flair to the PyDot visualizations.

Our code for this project can be found at this GitHub repository: https://github.com/GSPuniani/prefix-tree-pokedex

Further Reading

Here is an excellent introduction to prefix trees (and a big inspiration for this article): https://medium.com/basecs/trying-to-understand-tries-3ec6bede0014

--

--