I was recently pointed to an article about a new kind of trie, the QP trie. “QP” stands for “quadbit-popcount patricia”, and the QP trie combines techniques from DJB’s critbit trees and Bagwell’s hash array mapped tries.
In previous posts, I explored shortcomings of a particular implementation of a critbit tree versus a red-black tree for construction time and lookup. To summarize, it turns out that red-black trees can be significantly more efficient than critbit trees for string lookups if you sort your tree by length and reduce the number of comparisons done at each level. The red-black tree takes a little more memory, but the performance win is significant.
Fundamentally, the critbit tree underperforms because of the amount of work it has to do to at each level. This time appears to be dominated by both branching and indirection overhead.
When I read the QP trie post, I was curious how it would perform from both a memory and time perspective. It turns out that it beats the socks off of both critbit and specialized red-black trees in terms of construction time, search time, and total memory usage.
I chatted with the author of QP tries, Tony Finch, on Twitter and offered to (at a minimum) benchmark the QP trie against other competing data structures. Since I already have a project benchmarking various implementations of string searching algorithms / data structures, I felt this was only fair. Furthermore, my previous posts are not particularly rigorous in terms of expressing benchmark results.
Just so this isn’t an entirely wasted blog post, here are some preliminary results from an Intel Core i7-4702HQ fixed at 2.20GHz. All tests are on a set of 5 million unsorted items with a 12-byte shared prefix (and an average of 70.8 trailing bytes). Construction time and lookup time are measured; items are looked up in insertion order; construction time necessarily includes overhead of memory allocation.
The wordlist size is 207,020,714 bytes, including newline characters.
- Construction time: 65,304,179,282 cycles (~29.68 seconds)
- Lookup time: 21,181,297,328 cycles (~9.62 seconds)
- Memory consumption: 2,162,020,522 bytes (10.44x overhead)
- Construction time: 22,665,676,482 cycles (~10.30 seconds)
- Lookup time: 18,272,427,268 cycles (~8.31 seconds)
- Memory consumption: 362,020,714 bytes (~1.75x overhead)
- Construction time: 16,092,474,335 cycles (~7.31 seconds)
- Lookup time: 12,663,792,573 cycles (~5.76 seconds)
- Memory consumption: 402,020,714 bytes (~1.94x overhead)
- Construction time: 11,149,795,662 cycles (~5.07 seconds!)
- Lookup time: 7,072,997,623 cycles (~3.21 seconds!)
- Memory consumption: 274,871,024 bytes (~1.33x overhead!)
The QP trie is impressive. I look forward to using it in real projects soon as well as exploring how we can develop it further and how it compares to more separate data structures under varying read/update/delete heavy loads.