1
00:00:00,136 --> 00:00:02,589
No, it definitely will not always succeed.
2
00:00:02,589 --> 00:00:05,978
There are some nice probabilistic guarantees about how often it will succeed
3
00:00:05,978 --> 00:00:08,444
depending on the size and number of the hash tables,
4
00:00:08,444 --> 00:00:12,414
but the easy counterexample is to say that, well, here we have 2 hash tables.
5
00:00:12,414 --> 00:00:16,720
If we had 3 items, each of which had the same H1 and H2--
6
00:00:16,720 --> 00:00:20,557
so for instance if we had 3 items where H1 and H2 were both 0,
7
00:00:20,557 --> 00:00:23,360
there's no possible way that we can fit them into the hash table,
8
00:00:23,360 --> 00:00:27,097
because we only have 2 slots where any hash function is equal to 0.
9
00:00:27,097 --> 00:00:31,534
So in practice we choose a certain number of iterations, and we continue to iterate,
10
00:00:31,534 --> 00:00:35,910
trying to fill up this hash table until we decide that we've done too many iterations.
11
00:00:35,910 --> 00:00:40,813
And so if that's the case then we just stop, we choose new hash functions, and we start over.
12
00:00:40,813 --> 00:00:44,347
And again, there's very nice probabilistic guarantees about how often this is going to finish.
13
00:00:44,347 --> 00:00:47,003
So in the research that inspired this work,
14
00:00:47,003 --> 00:00:49,186
the guarantee that we tried to use
15
00:00:49,186 --> 00:00:55,022
was that we could guarantee that it was going to fail less than 1 out of every million times.
16
00:00:55,022 --> 00:00:58,505
So once we construct the hash table, the lookup procedure is really simple.
17
00:00:58,505 --> 00:01:02,012
We're going to calculate all the hash functions for the item that we want to look up.
18
00:01:02,012 --> 00:01:07,242
So for instance, if I want to look up item B, I calculate item B's hash functions.
19
00:01:07,242 --> 00:01:10,954
Here hash function 1 is equal to 0, and hash function 2 is equal to 1.
20
00:01:10,954 --> 00:01:13,306
So what I'm going to do is I'm going to look in all tables
21
00:01:13,306 --> 00:01:16,416
using the particular hash functions until I find what I'm looking for.
22
00:01:16,416 --> 00:01:20,982
So first, I'm going to look in table 1, and I know that I'm going to look in slot 0.
23
00:01:20,982 --> 00:01:24,029
Here I look in slot 0, and I say, "Wait a second, that's not B."
24
00:01:24,029 --> 00:01:28,915
So then I have to go to table 2, look and see that its hash value is equal to 1,
25
00:01:28,915 --> 00:01:33,023
so I'll look in slot 1 in table 2, and I'll say, "Ah, there's B!"
26
00:01:33,023 --> 00:01:35,313
I've now found the value that I'm looking for.
27
00:01:35,313 --> 00:01:38,329
Now if we don't find it in any of these locations, it's just not in the has table.
28
00:01:38,329 --> 00:01:41,539
Now the nice part here is that this is a constant time lookup.
29
00:01:41,539 --> 00:01:44,074
It just requires T lookups, and T is a constant.
30
00:01:44,074 --> 00:01:46,383
It might be 2, it might be 3, and so on.
31
00:01:46,383 --> 00:01:48,111
This is different than chaining.
32
00:01:48,111 --> 00:01:49,947
Chaining has a variable time lookup.
33
00:01:49,947 --> 00:01:52,315
It depends on how many items are in the bucket,
34
00:01:52,315 --> 00:01:55,118
and if we have many items in the bucket, and we have to look all the way to the end,
35
00:01:55,118 --> 00:01:57,165
it can potentially take a very long time,
36
00:01:57,165 --> 00:01:59,456
whereas we can guarantee exactly how much work,
37
00:01:59,456 --> 00:02:03,000
and it's a constant amount of work to look up any item in these hash tables.