
Title:
Cuckoo Hashing Part2  Intro to Parallel Programming

Description:

No, it definitely will not always succeed.

There are some nice probabilistic guarantees about how often it will succeed

depending on the size and number of the hash tables,

but the easy counterexample is to say that, well, here we have 2 hash tables.

If we had 3 items, each of which had the same H1 and H2

so for instance if we had 3 items where H1 and H2 were both 0,

there's no possible way that we can fit them into the hash table,

because we only have 2 slots where any hash function is equal to 0.

So in practice we choose a certain number of iterations, and we continue to iterate,

trying to fill up this hash table until we decide that we've done too many iterations.

And so if that's the case then we just stop, we choose new hash functions, and we start over.

And again, there's very nice probabilistic guarantees about how often this is going to finish.

So in the research that inspired this work,

the guarantee that we tried to use

was that we could guarantee that it was going to fail less than 1 out of every million times.

So once we construct the hash table, the lookup procedure is really simple.

We're going to calculate all the hash functions for the item that we want to look up.

So for instance, if I want to look up item B, I calculate item B's hash functions.

Here hash function 1 is equal to 0, and hash function 2 is equal to 1.

So what I'm going to do is I'm going to look in all tables

using the particular hash functions until I find what I'm looking for.

So first, I'm going to look in table 1, and I know that I'm going to look in slot 0.

Here I look in slot 0, and I say, "Wait a second, that's not B."

So then I have to go to table 2, look and see that its hash value is equal to 1,

so I'll look in slot 1 in table 2, and I'll say, "Ah, there's B!"

I've now found the value that I'm looking for.

Now if we don't find it in any of these locations, it's just not in the has table.

Now the nice part here is that this is a constant time lookup.

It just requires T lookups, and T is a constant.

It might be 2, it might be 3, and so on.

This is different than chaining.

Chaining has a variable time lookup.

It depends on how many items are in the bucket,

and if we have many items in the bucket, and we have to look all the way to the end,

it can potentially take a very long time,

whereas we can guarantee exactly how much work,

and it's a constant amount of work to look up any item in these hash tables.