Categories
Blog

Hashtables in javascript

I was looking into some performance problems with knockout yesterday and starting digging into the code. In particular I was wondering why it would take their mapping plugin several seconds to wrap an array of 500 objects. Profiling lead me to look deeper into the code, and the following code stroke me as very strange:

Basically this is a hash implementation of only put and get operations using two arrays. I don’t think I need to tell you how bad the performance of this will be once you start adding lots of items 🙂

The reason why they didn’t just use an ordinary object to store their values, was that their keys were objects and not just simple strings or integers. I started wondering what other people did with this problem. There must be a library for this. And low an behold there was.

Then I started thinking, why don’t you just jsonize the object and then store that as the key in a plain js object. Both things should be really fast, since every new browser nowadays has a very fast JSON implementation.

With the 3 implementations in hand I did a simple benchmark. Generate 10.000 objects, store them in the hashtable with the object as a key. Followed by getting all the objects again.

Using Firefox 12: naive double array: 1370ms, json: 77ms, jshashtable: 170. Using Chrome 19 and IE9 the performance gap is very similar. Chrome is of course a tad faster.

Update: reran the benchmark using suggestions from Tim.

 

Code is available here.

4 replies on “Hashtables in javascript”

jshashtable is designed to be flexible and therefore makes no assumptions about how keys should be considered equal or how to hash them. Keys that hash to the same value go into a bucket (essentially an array). Searching for a key therefore hashes the key, gets the correct bucket and then iterates through all keys in the bucket until it finds the correct one. The default hash function is to call toString() on the key. Therefore for strings and numbers it’s nice and fast because everything goes in separate buckets; for objects, if no hashing function is present, everything goes into the “[object Object]” bucket and things quickly get slow. However, you can provide a hashing function to the constructor. JSON isn’t really suitable as a hashing algorithm because there’s no guarantee about the order in which object properties are enumerated and theoretically the same object could produce different JSON representations but in practice it’s likely to be OK. So what you could do is:

var ht = new Hashtable(function(obj) {
return JSON.stringify(obj);
});

The jshashtable documentation covers this in detail: http://www.timdown.co.uk/jshashtable/

I would rather not use JSON as a default. The biggest reason is that JSON.stringify is not guaranteed to be present in non-ECMAScript 5 environments (which includes IE < 9, which I don't intend to stop supporting). Also, as mentioned, it's not guaranteed to produce the same serialization each time it's called for the same object. It could also be inefficient for a large object. It's pretty trivial to use JSON as a hashing function though (see last comment). I may update the docs.

Comments are closed.