1 00:00:00,136 --> 00:00:02,589 No, it definitely will not always succeed. 2 00:00:02,589 --> 00:00:05,978 There are some nice probabilistic guarantees about how often it will succeed 3 00:00:05,978 --> 00:00:08,444 depending on the size and number of the hash tables, 4 00:00:08,444 --> 00:00:12,414 but the easy counterexample is to say that, well, here we have 2 hash tables. 5 00:00:12,414 --> 00:00:16,720 If we had 3 items, each of which had the same H1 and H2-- 6 00:00:16,720 --> 00:00:20,557 so for instance if we had 3 items where H1 and H2 were both 0, 7 00:00:20,557 --> 00:00:23,360 there's no possible way that we can fit them into the hash table, 8 00:00:23,360 --> 00:00:27,097 because we only have 2 slots where any hash function is equal to 0. 9 00:00:27,097 --> 00:00:31,534 So in practice we choose a certain number of iterations, and we continue to iterate, 10 00:00:31,534 --> 00:00:35,910 trying to fill up this hash table until we decide that we've done too many iterations. 11 00:00:35,910 --> 00:00:40,813 And so if that's the case then we just stop, we choose new hash functions, and we start over. 12 00:00:40,813 --> 00:00:44,347 And again, there's very nice probabilistic guarantees about how often this is going to finish. 13 00:00:44,347 --> 00:00:47,003 So in the research that inspired this work, 14 00:00:47,003 --> 00:00:49,186 the guarantee that we tried to use 15 00:00:49,186 --> 00:00:55,022 was that we could guarantee that it was going to fail less than 1 out of every million times. 16 00:00:55,022 --> 00:00:58,505 So once we construct the hash table, the lookup procedure is really simple. 17 00:00:58,505 --> 00:01:02,012 We're going to calculate all the hash functions for the item that we want to look up. 18 00:01:02,012 --> 00:01:07,242 So for instance, if I want to look up item B, I calculate item B's hash functions. 19 00:01:07,242 --> 00:01:10,954 Here hash function 1 is equal to 0, and hash function 2 is equal to 1. 20 00:01:10,954 --> 00:01:13,306 So what I'm going to do is I'm going to look in all tables 21 00:01:13,306 --> 00:01:16,416 using the particular hash functions until I find what I'm looking for. 22 00:01:16,416 --> 00:01:20,982 So first, I'm going to look in table 1, and I know that I'm going to look in slot 0. 23 00:01:20,982 --> 00:01:24,029 Here I look in slot 0, and I say, "Wait a second, that's not B." 24 00:01:24,029 --> 00:01:28,915 So then I have to go to table 2, look and see that its hash value is equal to 1, 25 00:01:28,915 --> 00:01:33,023 so I'll look in slot 1 in table 2, and I'll say, "Ah, there's B!" 26 00:01:33,023 --> 00:01:35,313 I've now found the value that I'm looking for. 27 00:01:35,313 --> 00:01:38,329 Now if we don't find it in any of these locations, it's just not in the has table. 28 00:01:38,329 --> 00:01:41,539 Now the nice part here is that this is a constant time lookup. 29 00:01:41,539 --> 00:01:44,074 It just requires T lookups, and T is a constant. 30 00:01:44,074 --> 00:01:46,383 It might be 2, it might be 3, and so on. 31 00:01:46,383 --> 00:01:48,111 This is different than chaining. 32 00:01:48,111 --> 00:01:49,947 Chaining has a variable time lookup. 33 00:01:49,947 --> 00:01:52,315 It depends on how many items are in the bucket, 34 00:01:52,315 --> 00:01:55,118 and if we have many items in the bucket, and we have to look all the way to the end, 35 00:01:55,118 --> 00:01:57,165 it can potentially take a very long time, 36 00:01:57,165 --> 00:01:59,456 whereas we can guarantee exactly how much work, 37 00:01:59,456 --> 00:02:03,000 and it's a constant amount of work to look up any item in these hash tables.