Favorite Data Structures
During technical interviews for software engineers I used to ask people what their favorite data structure was. If they didn’t have one or hadn’t thought very deeply about data structures since maybe university, I would tell them that mine was the mighty hash table. Then I would ask them, if you had to build a hash table from scratch (assume one wasn’t available in the standard library of your choice). They could use any language they wanted or pseudo code if they liked. The point wasn’t to see if they could build a production quality implementation during our time together but just to see how they would reason through it. That’s why I was excited to see Facebook open source a new hash table implementation to the community.
The mighty hash table is useful because it provides O(1) lookup, insert, and erase. No matter how big the table gets the average number of steps performed by the algorithm is constant. Two very important considerations for any hash table implementation is how it will map keys to slots (the hash function) and how it will handle collisions of keys. In a hash function you want speed and a great distribution of keys to slots with minimum collisions. There are two common ways to handle collisions when they occur.
- Separate Chaining - where you use another data structure like a lined list to store the keys and then search that data structure to find the key for the operation.
- Probing - where the key is stored at some un-occupied offset from the index the key hashed to.
Speed like all things in engineering is about trade offs and the F14 implementation uses a technique to do parallel lookup of key blocks (instead of hashing to a single slot in F14 the data structure hashs to a block of 14 slots) and utilizes vector instructions to search all fourteen slots in parallel.
Pretty cool, and if a candidate brought up F14 during an interview I would be “reasonably” impressed :-)
Photo by Arif Wahid on Unsplash