No, it definitely will not always succeed.
There are some nice probabilistic guarantees about how often it will succeed
depending on the size and number of the hash tables,
but an easy counterexample: say we have 2 hash tables.
If we had 3 items, each of which had the same H1 and H2
(for instance, 3 items where H1 and H2 were both 0),
there's no possible way to fit them into the tables,
because across the 2 tables we only have 2 slots where any hash function is equal to 0.
So in practice we choose a certain number of iterations, and we continue to iterate,
trying to fill up this hash table until we decide that we've done too many iterations.
And so if that's the case then we just stop, we choose new hash functions, and we start over.
And again, there are very nice probabilistic guarantees about how often this is going to finish.
So in the research that inspired this work,
the guarantee we aimed for
was that this construction would fail less than 1 out of every million times.
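The construct-and-retry loop just described might look something like the following. This is a minimal sketch, not the lecture's actual implementation: it assumes 2 tables, uses salted hashing as a stand-in for H1 and H2, and all names (`CuckooHash`, `max_iterations`, etc.) are illustrative.

```python
import random

class CuckooHash:
    """Sketch of a 2-table cuckoo-style hash with bounded retries."""

    def __init__(self, size=8, max_iterations=32):
        self.size = size
        self.max_iterations = max_iterations
        self.tables = [[None] * size, [None] * size]
        self._pick_hash_functions()

    def _pick_hash_functions(self):
        # Each "hash function" is just a random salt mixed into hash();
        # picking new salts plays the role of choosing new hash functions.
        self.salts = [random.randrange(1 << 30) for _ in self.tables]

    def _hash(self, t, key):
        return hash((self.salts[t], key)) % self.size

    def insert(self, key):
        for _ in range(self.max_iterations):
            for t in range(len(self.tables)):
                slot = self._hash(t, key)
                if self.tables[t][slot] is None:
                    self.tables[t][slot] = key
                    return
                # Slot taken: evict the occupant and try to place it instead.
                self.tables[t][slot], key = key, self.tables[t][slot]
        # Too many iterations: choose new hash functions and start over.
        self._rehash(key)

    def _rehash(self, pending_key):
        old_keys = [k for table in self.tables for k in table if k is not None]
        self.tables = [[None] * self.size for _ in self.tables]
        self._pick_hash_functions()
        for k in old_keys + [pending_key]:
            self.insert(k)
```

Note that `_rehash` can in principle recurse if the new hash functions also fail; at low load factors that is exactly the "less than 1 in a million" event the guarantees bound.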
So once we construct the hash table, the lookup procedure is really simple.
We're going to calculate all the hash functions for the item that we want to look up.
So for instance, if I want to look up item B, I calculate item B's hash functions.
Here hash function 1 is equal to 0, and hash function 2 is equal to 1.
So what I'm going to do is I'm going to look in all tables
using the particular hash functions until I find what I'm looking for.
So first, I'm going to look in table 1, and I know that I'm going to look in slot 0.
Here I look in slot 0, and I say, "Wait a second, that's not B."
So then I have to go to table 2, look and see that its hash value is equal to 1,
so I'll look in slot 1 in table 2, and I'll say, "Ah, there's B!"
I've now found the value that I'm looking for.
Now if we don't find it in any of these locations, it's just not in the hash table.
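The lookup walk just described can be sketched as follows. The tables, toy hash functions, and item names here are illustrative, chosen only to reproduce the B example from above (B hashes to slot 0 under H1 and slot 1 under H2):

```python
def lookup(tables, hash_fns, key):
    """Probe slot hash_fns[t](key) in each table t; at most T probes."""
    for table, h in zip(tables, hash_fns):
        if table[h(key) % len(table)] == key:
            return True
    return False  # not in any candidate slot, so not in the structure

# Worked example mirroring the transcript: table 1 holds A in slot 0,
# table 2 holds B in slot 1; h1 and h2 are toy stand-ins for H1 and H2.
tables = [["A", None], [None, "B"]]
h1 = lambda k: 0 if k in ("A", "B") else 1
h2 = lambda k: 1 if k == "B" else 0

lookup(tables, [h1, h2], "B")  # probes table 1 slot 0 (finds A), then table 2 slot 1
```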
Now the nice part here is that this is a constant time lookup.
It requires at most T lookups, and T is a constant.
It might be 2, it might be 3, and so on.
This is different from chaining, which has a variable-time lookup.
It depends on how many items are in the bucket,
and if we have many items in the bucket, and we have to look all the way to the end,
it can potentially take a very long time,
whereas here we can guarantee exactly how much work it takes:
a constant amount of work to look up any item in these hash tables.
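For contrast, a chained lookup has to walk its bucket, so its cost grows with the bucket's length. A minimal sketch (the bucket layout and names are illustrative):

```python
def chained_lookup(buckets, hash_fn, key):
    """Chaining: hash to one bucket, then scan it item by item."""
    bucket = buckets[hash_fn(key) % len(buckets)]
    for item in bucket:  # up to len(bucket) comparisons, not a constant
        if item == key:
            return True
    return False

# If everything collides into one bucket, every lookup scans the whole list.
buckets = [["a", "b", "c"], []]
hash_to_first = lambda k: 0  # degenerate hash function for illustration
```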