
Title:
Cuckoo Hashing Part3  Intro to Parallel Programming

Description:

So a few notes on implementation.

The real benefits of this particular formulation of hashing

are that our lookups are again constant time,

and that's terrific for parallelism for a machine like the GPU,

because we keep all of our threads busy.

There's no thread divergence at all, and the construction algorithm is actually pretty simple and pretty fast.

It's actually a very efficient hash table construction when it's done in shared memory with a small of item to hash,

but it turns out that it actually works fairly well with large data sets

when the hash table needs to be constructed in main memory.

The really important thing to keep in mind there

is that the operation of write my item into the hash table and kick the other item out needs to be an atomic operation

in our implementation we used atomic exchange

to make sure the 2 competing threads don't have any possibility of stomping on top of each other.

And a final note, there's more than one way we might solve this general problem of check if an element is in a set.

Instead of using hash tables, we could choose a fairly brute force way to do this.

We could choose to sort all the elements in the set as a construction step,

and then, for the lookup step, we could do binary into the set

to see if that particular element is present.

So for instance, we might have 5 keys here in a set.

To construct this particular data structure, we sort these in order of the key,

and to do lookup we just do binary search within this set.

So for instance, if we're looking up value k6 with one thread,

we'd go see k5my keys bigger than that.

I'd then pop this way and so on until I get to k6.

I might have another thread that's looking up k3.

It would start in the middle at k5 and then do binary search until it finds k3.

Now sorting is quite fast on GPUs, and even though hash tables are faster for this particular computation,

they're not so much faster that should never consider a sort to be a good idea.

Often on the GPU, sometimes a brute force approach,

like sort and binary search, might be your best option.