Concurrent Hash Map with Hopscotch Hashing
Here is a link to our final report: report
We aim to implement a concurrent hash map using hopscotch hashing, one of the modern hashing techniques popular in academia. The idea is inspired by earlier researchers' results on Cuckoo hashing, which implemented Cuckoo hashing for read-dominant workloads and integrated it with Memcached. Since Hopscotch hashing was published more recently and claims better overall performance in its sequential version, we are driven by curiosity to discover the performance of a concurrent version, compared with Cuckoo hashing and other popular concurrent hash map implementations.
The project interests us from two perspectives:
Unlike Cuckoo hashing, which combines two different hash functions, Hopscotch hashing uses only a single hash function. Items hashed to the same entry are resolved as follows: if the entry is empty, the item is stored there; otherwise, it is stored somewhere within distance H of the entry it originally hashed to, where H is a constant. Every entry has hop-information associated with it: an H-bit bitmap indicating which of the next H neighboring entries hold items that originally hashed to the current entry. The candidate search for collision resolution uses linear probing; we record the first empty position as j. If j is not within distance H, we find an item y lying between j and the original hashed position i, and displace y to j to create an empty slot closer to i. We repeat this process until the empty slot is within the neighborhood; if no suitable displacement is possible, we rehash.
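The displacement logic above can be sketched as follows. This is a minimal sequential sketch in Java with a tiny neighborhood (H = 4) for illustration; the class and method names are ours, and a real implementation would grow the table via rehashing instead of failing when a neighborhood fills up.

```java
// Minimal sequential hopscotch-hashing sketch (illustrative names, int keys).
public class HopscotchSketch {
    static final int H = 4;           // neighborhood size (small for illustration)
    static final int CAPACITY = 16;   // fixed size; real code rehashes to grow

    Integer[] slots = new Integer[CAPACITY];
    int[] hopInfo = new int[CAPACITY]; // H-bit bitmap per home bucket

    int home(int key) { return Math.floorMod(Integer.hashCode(key), CAPACITY); }

    // Circular distance from slot `from` to slot `to`.
    int dist(int from, int to) { return Math.floorMod(to - from, CAPACITY); }

    boolean contains(int key) {
        int i = home(key);
        // Only the H slots flagged in the bitmap can hold this key.
        for (int d = 0; d < H; d++) {
            if ((hopInfo[i] & (1 << d)) != 0) {
                Integer v = slots[(i + d) % CAPACITY];
                if (v != null && v == key) return true;
            }
        }
        return false;
    }

    boolean add(int key) {
        if (contains(key)) return true;
        int i = home(key);
        // Linear probe from i for the first empty slot j.
        int j = -1;
        for (int d = 0; d < CAPACITY; d++) {
            int s = (i + d) % CAPACITY;
            if (slots[s] == null) { j = s; break; }
        }
        if (j < 0) return false; // table full: real code would rehash
        // While j is outside i's neighborhood, displace an earlier item y into j
        // to move the empty slot closer to i.
        while (dist(i, j) >= H) {
            int newJ = -1;
            for (int d = H - 1; d >= 1; d--) {
                int y = Math.floorMod(j - d, CAPACITY);
                Integer item = slots[y];
                if (item == null) continue;
                int h = home(item);
                if (dist(h, j) < H) {              // j stays in item's neighborhood
                    slots[j] = item;               // displace item from y to j
                    hopInfo[h] &= ~(1 << dist(h, y));
                    hopInfo[h] |= 1 << dist(h, j);
                    slots[y] = null;
                    newJ = y;
                    break;
                }
            }
            if (newJ < 0) return false;            // no movable item: rehash in real code
            j = newJ;
        }
        slots[j] = key;
        hopInfo[i] |= 1 << dist(i, j);
        return true;
    }
}
```

For example, after inserting keys 3, 0, 16, 32 (the last three all hash home to bucket 0), inserting 48 forces a displacement: key 3 hops from slot 3 to slot 4 so that 48 can land inside bucket 0's neighborhood.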
Here are the deliverables that we plan to achieve:
The fifth point above is a 'hope-to-achieve' stretch goal for which we currently have no concrete design or implementation idea. The nature of a write-expensive storage medium such as non-volatile memory can lead to different data structure designs.
Concurrent hashing techniques have been explored by researchers for years. The baseline we select for benchmarking is ConcurrentHashMap from java.util.concurrent, which, to the best of our knowledge, is the most efficient concurrent hashing implementation. We also mentioned a similar but older technique, Cuckoo hashing, implemented by Bin Fan et al. [3]. To evaluate our result, we will release benchmarking results comparing our Hopscotch hashing-based implementation against those existing implementations, with respect to scalability, throughput and memory efficiency. The final results will be illustrated in charts showing performance and scalability versus the number of threads.
The challenge comes from the rigor required in the data structure design: we need to ensure the generality of the data structure and handle race conditions and deadlocks, while optimizing for read-heavy workloads. Furthermore, rehashing is always a major difficulty in hash map implementations.
For the stretch goal, there are obstacles in both implementation and experiments. For the implementation, as we have mentioned, there is no specific paper or research result that we can refer to for handling workloads on a write-expensive storage medium with this particular data structure. For the experiments, we may need to apply for access to the non-volatile memory emulator from Prof. Andy Pavlo, which is also not guaranteed.
What have we been doing for this project in the past few weeks? We have thoroughly read the paper in which Hopscotch hashing was proposed. By reading this paper and related work, such as Lea's implementation of Java's ConcurrentHashMap, we have obtained a general idea of what affects the performance of a concurrent hash table, along with optimization techniques. After understanding the details of Hopscotch hashing, we implemented a serial version and a coarse-grained concurrent version of it in Java.
What have we learnt so far?
Hopscotch hashing is an open addressing hashing algorithm. In the paper where Hopscotch hashing is proposed, the authors give pseudocode for a concurrent version of Hopscotch hashing. It is worth mentioning that the authors use bucket-level locking, which is fine if only the contains(), add() and remove() methods are needed. In practice, however, some operations require a table-level lock, such as calls to getSize(), isEmpty() or rehash(). In that case, having only bucket-level locks is not a good idea.
The solution was proposed by Doug Lea in his implementation of ConcurrentHashMap. Instead of having a single table-level lock, we can have N reentrant locks, each covering a contiguous portion of the array of buckets. To insert a new item, a thread only needs to acquire the lock covering the target bucket's stripe, while table-wide operations acquire all N locks.
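A minimal sketch of this lock-striping idea, assuming a stand-in bucket array and hypothetical class names of our own:

```java
// Lea-style lock striping sketch: N reentrant locks, each guarding a
// contiguous stripe of buckets. Per-bucket operations take one lock;
// table-wide operations such as size() take all of them.
import java.util.concurrent.locks.ReentrantLock;

public class StripedTable {
    static final int STRIPES = 16;
    final ReentrantLock[] locks = new ReentrantLock[STRIPES];
    final Object[] buckets = new Object[1024];

    StripedTable() {
        for (int s = 0; s < STRIPES; s++) locks[s] = new ReentrantLock();
    }

    // Each lock covers a contiguous run of buckets.length / STRIPES buckets.
    ReentrantLock lockFor(int bucket) {
        return locks[bucket / (buckets.length / STRIPES)];
    }

    void put(int bucket, Object value) {
        ReentrantLock lock = lockFor(bucket);
        lock.lock();
        try {
            buckets[bucket] = value; // stand-in for the real insertion logic
        } finally {
            lock.unlock();
        }
    }

    int size() {
        // Table-wide operation: acquire every stripe lock, always in the
        // same order, to avoid deadlock.
        for (ReentrantLock l : locks) l.lock();
        try {
            int n = 0;
            for (Object b : buckets) if (b != null) n++;
            return n;
        } finally {
            for (ReentrantLock l : locks) l.unlock();
        }
    }
}
```

Threads writing to buckets in different stripes never contend, while size() still sees a consistent snapshot because it holds every stripe lock at once.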
As mentioned in the first part, we have implemented a serial version and a coarse-grained concurrent version of the Hopscotch hash table in Java. We have also made it clear what we will benchmark our final implementation of Hopscotch hashing against. In particular, we will compare the speedup of fixed-size data insertion and static data reads, on the latedays machines (Intel Xeon E5645 CPU), in the following settings:
java.util.concurrent.ConcurrentHashMap is, to the best of our knowledge, one of the fastest thread-safe hash tables available on the JVM. We will compare our Hopscotch hash table with this widely used implementation in terms of read/write throughput, scalability and space efficiency. In particular, since Hopscotch hashing claims to maintain satisfactory performance even when the hash table is over 90% full [2], we expect to see its superiority in read/write throughput under such circumstances.
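To make the throughput comparison concrete, here is a minimal read-throughput harness of the kind we have in mind. This is our own sketch, not a rigorous benchmark (it does no warm-up and the method names are ours); it spawns reader threads over a pre-populated ConcurrentHashMap and reports aggregate reads per second.

```java
// Minimal multi-threaded read-throughput sketch over ConcurrentHashMap.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class ReadThroughputBench {
    // Runs `threads` reader threads, each doing `opsPerThread` get() calls on
    // the pre-populated map; returns the total number of successful lookups.
    static long runReads(ConcurrentHashMap<Integer, Integer> map,
                         int threads, int opsPerThread) throws InterruptedException {
        AtomicLong hits = new AtomicLong();
        int keys = map.size();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                long local = 0;
                for (int i = 0; i < opsPerThread; i++) {
                    if (map.get(i % keys) != null) local++;
                }
                hits.addAndGet(local); // one atomic update per thread
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        return hits.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 100_000; i++) map.put(i, i);

        int threads = Runtime.getRuntime().availableProcessors();
        int ops = 1_000_000;
        long start = System.nanoTime();
        long hits = runReads(map, threads, ops);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d threads: %d hits, %.0f reads/sec%n",
                          threads, hits, hits / secs);
    }
}
```

Swapping the map for our Hopscotch implementation (behind the same interface) gives the head-to-head numbers; write throughput and high-load-factor runs follow the same pattern with put() calls and a smaller table.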
We have to admit, though, that we are slightly behind schedule, so we have both decided to prioritize this project and spend more time on it during the next few weeks. Hopefully, by spending the Thanksgiving vacation wisely, we will get back on track.
For the poster session, we will present several graphs, including speedup and read/write throughput with respect to different numbers of cores. We will also have our machines set up so that people can come and see running instances of our code, though that hardly qualifies as a demo.
Our biggest challenge is our lack of experience in designing and implementing this kind of fundamental data structure for speed. In addition, the separation of Hopscotch path discovery and execution requires subtle lock management. We are currently still facing a lot of nondeterministic bugs.
[1] http://stackoverflow.com/questions/8882885/is-it-possible-to-force-an-existing-java-application-to-use-no-more-than-x-cores
[2] Herlihy, M., Shavit, N., & Tzafrir, M. (2008, September). Hopscotch hashing. In International Symposium on Distributed Computing (pp. 350-364). Springer Berlin Heidelberg.
[3] Fan, B., Andersen, D. G., & Kaminsky, M. (2013). MemC3: Compact and concurrent memcache with dumber caching and smarter hashing. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13) (pp. 371-384).