Introducing the ALS HashMap: a general-purpose, fast, and memory-efficient HashTable algorithm.
It has been ages since my last blog post on this site and even more since my last contribution to this hacking community or the programming world in general.
For that reason, and seeing the great work several developers are putting forward, I’ve decided to create this blog post about my most extensive research work so far, in the hope that it will be useful for others to build their tools upon.
This article will be centered on a more generic topic that doesn’t directly involve console hacking or homebrew development, but rather computer science in general, more specifically data structures and algorithmic complexity.
For a few years I have been developing a library for the C language that aims at providing high-level features and an extension to the standard. One of these features was the ability to code dynamically typed objects, and this required using a HashTable to allow for constant operations on the attributes and methods that such objects contained.
The inner workings of a hashmap consist of a table (an array or buffer in memory) where we store elements by index, similarly to vectors. Unlike a vector, however, the index into a hashmap does not correspond to a position within the array, but rather to a unique key that we use to identify the stored element.
This combination of keys and values is called a key-value pair.
This is identical to how variables work; the name of the variable is a unique identifier that holds a given value.
What the hashmap does is run the key through a function called the Hash Function, which maps a key to a slot within the table where we can store the value.
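As a concrete illustration of such a hash function, here is a minimal sketch in C using the classic djb2 string hash (the article does not prescribe any particular function; the names `djb2_hash` and `slot_for` are my own):

```c
#include <stddef.h>

/* Classic general-purpose string hash (djb2). The modulo at the end
   maps the practically unbounded hash value onto one of the table's
   slots. */
unsigned long djb2_hash(const char *key)
{
    unsigned long h = 5381;
    for (const char *p = key; *p != '\0'; p++)
        h = h * 33 + (unsigned char)*p;
    return h;
}

/* Map a key to a slot index in a table of `table_size` entries. */
size_t slot_for(const char *key, size_t table_size)
{
    return (size_t)(djb2_hash(key) % table_size);
}
```

The same key always maps to the same slot, which is what makes lookup possible in the first place.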
There is however a problem with hashmaps: the number of possible keys is theoretically infinite but the number of entries within the table is extremely small. This means that completely different keys will end up mapping to the same slot, thus causing what is known as a hash collision.
How you handle this collision in order to store both elements is one of the main areas of study within the realm of hashing.
The simplest technique used to handle a hash collision is called separate chaining.
The idea behind this is to store a linked list or some other data structure (self-balancing binary trees are often also used) in each of the table’s slots. Whenever a key is mapped to a slot we delegate the operation to the auxiliary structure it contains, thus allowing more than one key to hash to the same slot.
This form of HashMap is extremely simple to program and generates relatively good results in most general cases.
Java’s HashMap implementation uses this form of collision resolution with Red Black Trees for logarithmic search within a given bucket.
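A minimal sketch of separate chaining in C, with a linked list per slot (all names here — `chain_map`, `chain_put`, and so on — are illustrative, not from any real implementation):

```c
#include <stdlib.h>
#include <string.h>

#define CHAIN_TABLE_SIZE 8

/* Each slot holds the head of a singly linked list of key-value pairs
   that hashed to that slot. */
typedef struct chain_node {
    char *key;
    int value;
    struct chain_node *next;
} chain_node;

typedef struct {
    chain_node *slots[CHAIN_TABLE_SIZE];
} chain_map;

size_t chain_slot(const char *key)
{
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return (size_t)(h % CHAIN_TABLE_SIZE);
}

/* Insert: prepend to the list in the hashed slot (updating duplicate
   keys is omitted for brevity). */
void chain_put(chain_map *m, const char *key, int value)
{
    size_t i = chain_slot(key);
    chain_node *n = malloc(sizeof *n);
    n->key = malloc(strlen(key) + 1);
    strcpy(n->key, key);
    n->value = value;
    n->next = m->slots[i];
    m->slots[i] = n;
}

/* Lookup: walk the list stored in the hashed slot. */
int chain_get(const chain_map *m, const char *key, int *out)
{
    for (chain_node *n = m->slots[chain_slot(key)]; n; n = n->next)
        if (strcmp(n->key, key) == 0) { *out = n->value; return 1; }
    return 0; /* not found */
}
```

Note that every insertion allocates memory on the heap, which is exactly the drawback discussed next.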
There are a few problems with this technique, however: every operation requires dynamic memory allocation, and there may be cases where many table slots are never occupied, thus wasting space.
This is especially true if your hash function does not generate well-distributed hashes.
Chaining is also not cache-friendly due to the nature of dynamic memory, and we can have cases where most of the slots are empty while a few occupied slots hold huge auxiliary structures, greatly harming time performance and wasting memory.
To help solve the issue a completely different technique was developed that reduces the time to perform operations and guarantees good usage of the table: probing.
With chaining we used separate linked lists stored within each of the table’s slots, where we could insert any number of colliding elements; this, however, proves inefficient in both space and time.
Probing however relies on a totally different approach for handling collisions; the idea is to find an empty slot within the table where we can store a colliding element.
There are different ways to perform this lookup for an empty spot, but they mostly fall into three categories: linear probing, quadratic probing and double hashing.
Linear probing looks for an empty spot by scanning the table linearly starting from the colliding slot; for example, if we have a collision at index 4 in a table of size 16, we check indices 5, 6, 7, 8, and so on until an empty spot is found.
Quadratic probing is essentially the same, except the index increment is calculated by a quadratic function instead of linearly iterating the table, while double hashing runs the key through a second hash function that calculates the index increment based on the key.
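Linear probing can be sketched in a few lines of C. This toy version uses a fixed-size table of string keys, with `NULL` marking an empty slot; deletion (which requires tombstones) is omitted, and all names are illustrative:

```c
#include <stddef.h>
#include <string.h>

#define PROBE_TABLE_SIZE 16

typedef struct {
    const char *keys[PROBE_TABLE_SIZE]; /* NULL means the slot is empty */
    int values[PROBE_TABLE_SIZE];
} probe_map;

size_t probe_hash(const char *key)
{
    unsigned long h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return (size_t)(h % PROBE_TABLE_SIZE);
}

/* Insert: start at the hashed slot and walk forward (wrapping around)
   until an empty slot or the same key is found. Returns 0 if full. */
int probe_put(probe_map *m, const char *key, int value)
{
    size_t i = probe_hash(key);
    for (size_t n = 0; n < PROBE_TABLE_SIZE; n++) {
        size_t j = (i + n) % PROBE_TABLE_SIZE;
        if (m->keys[j] == NULL || strcmp(m->keys[j], key) == 0) {
            m->keys[j] = key;
            m->values[j] = value;
            return 1;
        }
    }
    return 0; /* table full */
}

/* Lookup follows the same probe sequence; hitting an empty slot means
   the key was never inserted. */
int probe_get(const probe_map *m, const char *key, int *out)
{
    size_t i = probe_hash(key);
    for (size_t n = 0; n < PROBE_TABLE_SIZE; n++) {
        size_t j = (i + n) % PROBE_TABLE_SIZE;
        if (m->keys[j] == NULL) return 0;
        if (strcmp(m->keys[j], key) == 0) { *out = m->values[j]; return 1; }
    }
    return 0;
}
```

Notice how the probe loops get longer as the table fills up, which is precisely the degradation discussed below.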
Probing allows the table space to be used exclusively as the container for our elements, and it improves lookup time thanks to more cache hits, since iterations walk consecutively stored memory locations; with a good second hash function we can get well-distributed tables and operations close to logarithmic. This is the technique used by Python and some C++ implementations.
There is however a drawback with probing: as the table gets filled it takes more iterations to find an empty slot, meaning that as the table approaches full capacity, operations approach linear O(n) complexity and the table becomes slow. To prevent this we need to define a maximum allowed table usage, and this is usually around 70% of the table’s size.
In other words, for a table of size 32, you cannot store more than 22 elements (0.7 × 32 ≈ 22).
And it gets even worse in some scenarios. Suppose you want to store 30 elements in a hash table; since this number exceeds 70% of a table of size 32, you need a table of 64 entries, of which 34 will sit empty. That is more than 50% of the table space wasted, despite the fact that probing is supposed to have better table usage.
Different techniques have been developed to improve probing, the most important ones for this article are cuckoo and hopscotch hashing.
Cuckoo hashing is a relatively simple technique. When a new item collides on insertion, you would normally find an empty spot for the new item; with cuckoo hashing, however, if the item already stored there was itself placed by relocation while the new item actually hashes to that slot, then we reinsert the already-contained item elsewhere and keep the new item in its hashed slot. In other words, items that belong to a given slot as mandated by the hash function take precedence over items placed there by probing, and this way we increase the number of items with constant O(1) access.
Hopscotch hashing is built around limiting the number of iterations needed to operate on inserted items. The idea is fairly simple: use probing to find an empty spot for the colliding element, but if this new slot is too far away from the original (as mandated by the hopscotch region) then we rearrange some of the elements to free a closer spot, usually by finding a third occupied slot closer to the hashed index whose element can be moved to the empty slot without breaking the hopscotch region limitation.
Both cuckoo and hopscotch rearrange the location of stored elements to improve table performance, but all of these techniques still fail to guarantee a logarithmic worst case: you may still have to do O(n) iterations to find an empty slot, so the maximum-table-usage restriction still applies and we still waste a lot of table space.
ALS HashMap: One technique to rule them all.
Despite all the different techniques and studies done to improve hash tables, we still find ourselves with less-than-optimal memory usage and possible degradation of table performance to O(n) in the worst case.
I ended up saying to myself, “This is intolerable! HashMaps should guarantee an always-logarithmic worst case and high table usage (as little empty space as possible)”, so I set out to create an algorithm that could achieve exactly that: high load factors with a guaranteed logarithmic runtime for all operations. Thus the ALS HashMap was born.
The ALS HashMap is an algorithm that unifies all of the above mentioned techniques into one, using the best of each world to allow for as much table usage as possible without ever having to do an operation above the log(n) scale.
The algorithm is therefore a hybrid; it uses both probing and chaining. Probing allows us to allocate elements within the table, requiring less dynamic memory and wasting less table space, while chaining allows us to store more elements than the table would otherwise permit, avoiding expensive O(n) rehashing. Cuckoo and hopscotch techniques are also used to rearrange elements and prevent probing from doing more iterations than allowed.
This is mainly achieved by constraining both collision-resolution strategies: probing is limited to a certain number of slots near the hashed index (the hopscotch region), while chaining is used only when probing fails, and only a certain number of elements can be chained before a table resize is required.
In the ALS HashMap, each bucket in the table contains two bits of information that tell the algorithm what is stored in that bucket and how, as well as either a key-value pair or an auxiliary data collection (a generic pointer or C union type can be used to model this). There are 4 different states a given bucket can have, and this is where the algorithm gets its name.
– E: mark that signifies the bucket being empty.
– A: mark signifying the bucket contains an auxiliary data collection. Items hashing to an A type bucket will be inserted into the contained collection as normally done with chaining.
– L: this mark signifies the bucket contains a single key-value pair that has been allocated here by the hash function; in other words, the stored key belongs here and all operations on this item will be O(1).
– S: this mark is used for buckets that contain a single key-value pair that has been allocated here by probing. In other words, the pair stored here hashes to a different slot that is already occupied, so the algorithm has used probing to find a nearby empty bucket to store this item.
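One possible way to model such a bucket in C is a small tagged union: a state field plus either an inline key-value pair or a pointer to an auxiliary chain. The article does not specify the exact layout, so the names below are my own illustration:

```c
#include <stddef.h>

/* The four bucket states; only 2 bits are actually needed to store
   this, but an enum keeps the sketch readable. */
typedef enum {
    BUCKET_E, /* empty */
    BUCKET_A, /* holds an auxiliary data collection (chaining) */
    BUCKET_L, /* holds a pair that hashes to this slot: O(1) access */
    BUCKET_S  /* holds a pair placed here by probing */
} bucket_state;

/* Node of a hypothetical auxiliary linked list used by A buckets. */
typedef struct aux_node {
    const char *key;
    int value;
    struct aux_node *next;
} aux_node;

typedef struct {
    bucket_state state;
    union {
        struct { const char *key; int value; } pair; /* states L and S */
        aux_node *aux;                               /* state A */
    } u;
} als_bucket;
```

The union keeps every bucket the same size whether it stores a pair directly or points at a chain.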
When a new item is being inserted we run the key through the hash function as usual and we operate depending on the state of the bucket the key has been hashed to.
– E: the bucket is empty so we just store the key-value pair in it and mark it as type L.
– A: the bucket contains an auxiliary data collection, so we insert the key-value pair into it.
– L: the bucket contains an item that belongs here, so we must resolve the collision. Here’s where the magic happens. The ALS algorithm first attempts to find a nearby empty bucket within the hopscotch region by linearly probing in both directions starting from the hashed slot. The number of allowed probes is called the probe range, and it’s a value that should be no more than the base-2 logarithm of the table size. If probing succeeds at finding an empty slot within the hopscotch region we insert the colliding element there and mark the slot as type S. When probing fails to find an empty bucket within the limited region, we resort to chaining: we create an auxiliary data structure inside the hashed slot and mark it as type A, then we insert into it all the items that belong to this slot (which can all be found within the hopscotch region).
– S: this is when cuckoo hashing comes into play. The key-value pair contained within this slot has been allocated here by probing, which means operations on this item will be within the logarithmic scale, however the new item to be inserted has been hashed to this slot, meaning its complexity will be constant if inserted here. Therefore we remove and reinsert the contained element, while inserting the new element in this slot, changing its mark from S to L.
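The bounded two-way probe used in the L case above can be sketched as follows. This is a minimal version that assumes the hopscotch region does not wrap around the table edges (a real implementation would handle wrap-around); the names are illustrative:

```c
#include <stddef.h>

typedef enum { ST_E, ST_A, ST_L, ST_S } als_state;

/* Starting from the hashed slot, look outward in both directions, at
   most `probe_range` steps each way, for an empty bucket. Returns the
   index found, or -1 to signal that the algorithm must fall back to
   chaining in the hashed slot. */
long als_probe_for_empty(const als_state *table, size_t table_size,
                         size_t hashed, size_t probe_range)
{
    for (size_t step = 1; step <= probe_range; step++) {
        if (hashed + step < table_size && table[hashed + step] == ST_E)
            return (long)(hashed + step);
        if (hashed >= step && table[hashed - step] == ST_E)
            return (long)(hashed - step);
    }
    return -1; /* no empty bucket in the hopscotch region */
}
```

Because the loop never takes more than `probe_range` iterations, and the probe range is capped at log2 of the table size, this step can never degrade to O(n).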
Inserting items is the trickiest part of the algorithm, as it requires rearranging items according to the marks of the buckets that items are being hashed to. However, the way the algorithm works makes lookup and deletion much simpler and generally faster, as you only really have to take into consideration buckets of type L and A.
For lookup and deletion, a key being hashed to a bucket of type E means the key doesn’t exist, otherwise it would have been found in this bucket or a nearby one. The same happens with buckets of type S: when a lookup or deletion hashes to a bucket of type S we can conclude the key has never been inserted, otherwise the bucket would be of type L, since the cuckoo technique would have moved the S element elsewhere to give the L element constant-time access.
This means that for lookup and deletion only two bucket types are of use: L and A.
If our hash function points to a bucket of type A, we simply search the contained data structure for the element. If the bucket is of type L, we check whether the contained key matches the key being searched; if not, we probe, and if that fails the key is not found.
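Putting those rules together, lookup can be sketched like this. The bucket layout and helper names here are my own simplification (the caller passes the already-hashed index, and S buckets are checked against the key during probing):

```c
#include <stddef.h>
#include <string.h>

typedef enum { ST_E, ST_A, ST_L, ST_S } als_state;

typedef struct aux_node {
    const char *key; int value; struct aux_node *next;
} aux_node;

typedef struct {
    als_state state;
    const char *key;   /* valid for L and S buckets */
    int value;
    aux_node *aux;     /* valid for A buckets */
} als_bucket;

int als_get(const als_bucket *table, size_t table_size, size_t hashed,
            size_t probe_range, const char *key, int *out)
{
    const als_bucket *b = &table[hashed];
    /* E or S at the hashed slot: the key was never inserted. */
    if (b->state == ST_E || b->state == ST_S)
        return 0;
    /* A: delegate to the auxiliary structure, as with chaining. */
    if (b->state == ST_A) {
        for (aux_node *n = b->aux; n; n = n->next)
            if (strcmp(n->key, key) == 0) { *out = n->value; return 1; }
        return 0;
    }
    /* L: check the slot itself, then probe the hopscotch region for an
       S bucket holding our key. */
    if (strcmp(b->key, key) == 0) { *out = b->value; return 1; }
    for (size_t step = 1; step <= probe_range; step++) {
        if (hashed + step < table_size) {
            const als_bucket *r = &table[hashed + step];
            if (r->state == ST_S && strcmp(r->key, key) == 0) {
                *out = r->value; return 1;
            }
        }
        if (hashed >= step) {
            const als_bucket *l = &table[hashed - step];
            if (l->state == ST_S && strcmp(l->key, key) == 0) {
                *out = l->value; return 1;
            }
        }
    }
    return 0;
}
```

Note how the E and S cases bail out immediately, which is what makes lookup cheaper than insertion.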
Since probing is limited by the hopscotch region we are guaranteed that no probing operation will ever go above the log(n) runtime, and auxiliary data structures are also limited to a certain number of elements for the same reason, resorting to table resize when this happens.
Now you may be asking: how well does this algorithm perform?
I’ve run tests that consisted of inserting 500,000 elements and comparing the results against std::unordered_map, the standard HashMap implementation for C++. The results were astonishing:
– C++’s HashMap failed to store that amount of items, as it needed a table of 2^20 entries (roughly one million) to keep table usage below 70% (as explained above, probing requires this limitation to keep operations logarithmic), and my computer could not allocate that much space in one go. Even if it were possible, it would still require a table of roughly one million entries to store only half that many items.
– The ALS HashMap not only manages to store all the elements, it does it using a table of size 2^19 (524,288 entries), only slightly more than 500,000.
– The table usage factor for the ALS HashMap is above 90%, in other words, more than 90% of the table space is used, meaning there is an extremely efficient usage of table space.
– As explained, all operations done by the ALS algorithm are guaranteed to stay at or below the logarithmic scale; in this test the probe range was 19 and the biggest auxiliary structure found contained only 9 items. This means that for a hashmap containing 500,000 elements, the worst that could happen is either having to do 19 linear iterations or iterating through a simple linked list of 9 elements.
Another test conducted with real data (more specifically, Python’s string methods to simulate the class dictionary) showed the same results: a faster running time with high table usage and less overall wasted memory.
Now I have to ask myself what the future of this algorithm will be and whether it will prove useful for real-life applications; only time will tell. For now I will leave you with the link on GitHub where you can find the source code and documentation for the project.
I hope you’ve enjoyed the read and hopefully this won’t be the last you hear from me.