## 02-39 Strong Collision Resistance

• 0:00 - 0:05
[Evans] The key point is that this was for weak collision resistance.
• 0:05 - 0:08
For weak collision resistance, our assumption,
• 0:08 - 0:13
an attacker needs to do about 2 to the b work where b is the number of hash bits
• 0:13 - 0:15
to have a good chance of finding a collision.
• 0:15 - 0:18
Strong collision resistance is harder than weak.
• 0:18 - 0:21
To obtain strong collision resistance is actually much harder,
• 0:21 - 0:24
and we'll see that we need about twice as many bits for that--
• 0:24 - 0:28
that the attacker's effort is more like 2 to the b over 2,
• 0:28 - 0:33
so we'll need about twice as many output bits in our hash function to provide this.
• 0:33 - 0:37
The reason for this is because of what's known as the birthday paradox.
• 0:37 - 0:40
• 0:40 - 0:44
It's a surprise to people who don't follow the mathematics on this,
• 0:44 - 0:46
• 0:46 - 0:49
• 0:49 - 0:53
The way this is usually framed is you have some group of k people--
• 0:53 - 0:56
let's say it's a kindergarten classroom--
• 0:56 - 1:01
and you want to know what the probability that 2 people have the same birthday.
• 1:01 - 1:05
The reason this is called a paradox is the answer is actually much higher
• 1:05 - 1:09
than most people expect. So let's compute this.
• 1:09 - 1:13
We'll assume that there are no leap years and that birthdays are uniformly distributed.
• 1:13 - 1:17
We assume the same property for our outputs of our ideal hash function.
• 1:17 - 1:22
So we'll compute the complement probability that there are no duplicates,
• 1:22 - 1:26
and then the probability that there is at least 1 duplicate is 1 minus this.
• 1:26 - 1:30
And the way to think about that is we can go through the people in the class.
• 1:30 - 1:32
We can assign some birthday to each one.
• 1:32 - 1:37
So we'll assign b0 to the first one, b1 to the second one,
• 1:37 - 1:39
keep assigning birthdays.
• 1:39 - 1:41
There are lots of different ways we could do that.
• 1:41 - 1:44
In order to assign birthdays with no duplicates,
• 1:44 - 1:48
then there's a limited number of ways that we can pick any of 365 days
• 1:48 - 1:50
for the first birthday.
• 1:50 - 1:53
For the second one, if we want no duplicates,
• 1:53 - 1:56
we can't use whatever day we pick for the first one.
• 1:56 - 2:01
So it will be that times 364 times 363 and so forth,
• 2:01 - 2:05
and that's the number of ways to assign birthdays with no duplication.
• 2:05 - 2:08
We're trying to compute the probability, so we're going to divide that
• 2:08 - 2:12
by the number of ways to assign with duplication,
• 2:12 - 2:17
which is just 365 choices for each person.
• 2:17 - 2:24
So in general, this first value is n factorial divided by n minus k factorial,
• 2:24 - 2:27
and the bottom result is just n to the k.
• 2:27 - 2:30
N is the number of possible days or the number of possible hash outputs.
• 2:30 - 2:32
K is the number of trials.
• 2:32 - 2:37
And so our probability that there are duplicates is just 1 minus this.
• 2:37 - 2:40
This is what we need for strong collision resistance.
• 2:40 - 2:44
It's higher than what we needed for weak collision resistance,
• 2:44 - 2:48
which was 1 minus 1 minus 1 over n to the k.
• 2:48 - 2:51
So to actually compute this, these numbers would get really big
• 2:51 - 2:53
if we tried to actually compute the factorials.
• 2:53 - 2:57
Instead we need to use approximations to do this.
• 2:57 - 2:59
But to give you some idea what the results are like,
• 2:59 - 3:05
if we have the duplicate birthday question where we have 365 days and 20 people,
• 3:05 - 3:09
the probability that there's at least 1 duplicate exceeds 0.4.
• 3:09 - 3:15
If we're thinking about hash collisions, if we only had a 64-bit hash
• 3:15 - 3:19
and our attacker was much weaker than the one we hypothesized--
• 3:19 - 3:25
let's say they can only do 2 to the 32--that's already almost a 40% chance of success.
• 3:25 - 3:27
This success goes up quite quickly.
• 3:27 - 3:35
If the attacker can do 2 to the 34 hashes, then the success probability is very close to 1.
• 3:35 - 3:37
So the conclusion is that given an ideal hash function
• 3:37 - 3:43
with N possible outputs, an attacker needs about N guesses to find an input
• 3:43 - 3:48
that hashes to a particular value but only needs about the square root of N guesses
• 3:48 - 3:50
to find a pair that collide.
• 3:50 - 3:55
This assumes the attacker can store all those hash values as they try the guesses
• 3:55 - 3:59
and compare it to all the previous ones.
• 3:59 - 4:04
This is the reason why hash functions need to have large outputs.
• 4:04 - 4:10
SHA-1, which was a widely used hash function, used 160 bits in its output.
• 4:10 - 4:12
This was actually broken.
• 4:12 - 4:17
There is a way to find collisions using SHA-1 with only 2 to the 51 operations--
• 4:17 - 4:22
much fewer than the 2 to the 80 that one would expect from this square root estimate,
• 4:22 - 4:27
and that's because of mathematical weaknesses in the way SHA-1 works.
• 4:27 - 4:33
SHA-2, the output can be either 256 or 512 bits.
• 4:33 - 4:37
As long as there aren't mathematical weaknesses in the cipher,
• 4:37 - 4:43
if it was a really ideal hash function, this would be big enough to defeat any realistic attacker,
• 4:43 - 4:48
but there are beginning to be suggestions that there may be ways to break this.
• 4:48 - 4:50
No practical breaks have been found yet.
• 4:50 - 4:54
And SHA-3, the winner is expected to be announced later this year.
• 4:54 - 4:58
So for the rest of this class we're going to assume that we do have an ideal hash function
• 4:58 - 5:03
and that it has enough outputs to provide strong collision resistance
• 5:03 -
against a motivated attacker.
Title:
02-39 Strong Collision Resistance
Description:

more » « less
Team:
Udacity
Project:
CS387 - Applied Cryptography
Duration:
05:06