Return to Video

02-39 Strong Collision Resistance

  • 0:00 - 0:05
    [Evans] The key point is that this was for weak collision resistance.
  • 0:05 - 0:08
    For weak collision resistance, our assumption,
  • 0:08 - 0:13
    an attacker needs to do about 2 to the b work where b is the number of hash bits
  • 0:13 - 0:15
    to have a good chance of finding a collision.
  • 0:15 - 0:18
    Strong collision resistance is harder than weak.
  • 0:18 - 0:21
    To obtain strong collision resistance is actually much harder,
  • 0:21 - 0:24
    and we'll see that we need about twice as many bits for that--
  • 0:24 - 0:28
    that the attacker's effort is more like 2 to the b over 2,
  • 0:28 - 0:33
    so we'll need about twice as many output bits in our hash function to provide this.
  • 0:33 - 0:37
    The reason for this is because of what's known as the birthday paradox.
  • 0:37 - 0:40
    It's called a paradox. It's not really a paradox.
  • 0:40 - 0:44
    It's a surprise to people who don't follow the mathematics on this,
  • 0:44 - 0:46
    but there's nothing paradoxical about it.
  • 0:46 - 0:49
    It's not like the Berry paradox that leads to a contradiction.
  • 0:49 - 0:53
    The way this is usually framed is you have some group of k people--
  • 0:53 - 0:56
    let's say it's a kindergarten classroom--
  • 0:56 - 1:01
    and you want to know what the probability that 2 people have the same birthday.
  • 1:01 - 1:05
    The reason this is called a paradox is the answer is actually much higher
  • 1:05 - 1:09
    than most people expect. So let's compute this.
  • 1:09 - 1:13
    We'll assume that there are no leap years and that birthdays are uniformly distributed.
  • 1:13 - 1:17
    We assume the same property for our outputs of our ideal hash function.
  • 1:17 - 1:22
    So we'll compute the complement probability that there are no duplicates,
  • 1:22 - 1:26
    and then the probability that there is at least 1 duplicate is 1 minus this.
  • 1:26 - 1:30
    And the way to think about that is we can go through the people in the class.
  • 1:30 - 1:32
    We can assign some birthday to each one.
  • 1:32 - 1:37
    So we'll assign b0 to the first one, b1 to the second one,
  • 1:37 - 1:39
    keep assigning birthdays.
  • 1:39 - 1:41
    There are lots of different ways we could do that.
  • 1:41 - 1:44
    In order to assign birthdays with no duplicates,
  • 1:44 - 1:48
    then there's a limited number of ways that we can pick any of 365 days
  • 1:48 - 1:50
    for the first birthday.
  • 1:50 - 1:53
    For the second one, if we want no duplicates,
  • 1:53 - 1:56
    we can't use whatever day we pick for the first one.
  • 1:56 - 2:01
    So it will be that times 364 times 363 and so forth,
  • 2:01 - 2:05
    and that's the number of ways to assign birthdays with no duplication.
  • 2:05 - 2:08
    We're trying to compute the probability, so we're going to divide that
  • 2:08 - 2:12
    by the number of ways to assign with duplication,
  • 2:12 - 2:17
    which is just 365 choices for each person.
  • 2:17 - 2:24
    So in general, this first value is n factorial divided by n minus k factorial,
  • 2:24 - 2:27
    and the bottom result is just n to the k.
  • 2:27 - 2:30
    N is the number of possible days or the number of possible hash outputs.
  • 2:30 - 2:32
    K is the number of trials.
  • 2:32 - 2:37
    And so our probability that there are duplicates is just 1 minus this.
  • 2:37 - 2:40
    This is what we need for strong collision resistance.
  • 2:40 - 2:44
    It's higher than what we needed for weak collision resistance,
  • 2:44 - 2:48
    which was 1 minus 1 minus 1 over n to the k.
  • 2:48 - 2:51
    So to actually compute this, these numbers would get really big
  • 2:51 - 2:53
    if we tried to actually compute the factorials.
  • 2:53 - 2:57
    Instead we need to use approximations to do this.
  • 2:57 - 2:59
    But to give you some idea what the results are like,
  • 2:59 - 3:05
    if we have the duplicate birthday question where we have 365 days and 20 people,
  • 3:05 - 3:09
    the probability that there's at least 1 duplicate exceeds 0.4.
  • 3:09 - 3:15
    If we're thinking about hash collisions, if we only had a 64-bit hash
  • 3:15 - 3:19
    and our attacker was much weaker than the one we hypothesized--
  • 3:19 - 3:25
    let's say they can only do 2 to the 32--that's already almost a 40% chance of success.
  • 3:25 - 3:27
    This success goes up quite quickly.
  • 3:27 - 3:35
    If the attacker can do 2 to the 34 hashes, then the success probability is very close to 1.
  • 3:35 - 3:37
    So the conclusion is that given an ideal hash function
  • 3:37 - 3:43
    with N possible outputs, an attacker needs about N guesses to find an input
  • 3:43 - 3:48
    that hashes to a particular value but only needs about the square root of N guesses
  • 3:48 - 3:50
    to find a pair that collide.
  • 3:50 - 3:55
    This assumes the attacker can store all those hash values as they try the guesses
  • 3:55 - 3:59
    and compare it to all the previous ones.
  • 3:59 - 4:04
    This is the reason why hash functions need to have large outputs.
  • 4:04 - 4:10
    SHA-1, which was a widely used hash function, used 160 bits in its output.
  • 4:10 - 4:12
    This was actually broken.
  • 4:12 - 4:17
    There is a way to find collisions using SHA-1 with only 2 to the 51 operations--
  • 4:17 - 4:22
    much fewer than the 2 to the 80 that one would expect from this square root estimate,
  • 4:22 - 4:27
    and that's because of mathematical weaknesses in the way SHA-1 works.
  • 4:27 - 4:33
    SHA-2, the output can be either 256 or 512 bits.
  • 4:33 - 4:37
    As long as there aren't mathematical weaknesses in the cipher,
  • 4:37 - 4:43
    if it was a really ideal hash function, this would be big enough to defeat any realistic attacker,
  • 4:43 - 4:48
    but there are beginning to be suggestions that there may be ways to break this.
  • 4:48 - 4:50
    No practical breaks have been found yet.
  • 4:50 - 4:54
    And SHA-3, the winner is expected to be announced later this year.
  • 4:54 - 4:58
    So for the rest of this class we're going to assume that we do have an ideal hash function
  • 4:58 - 5:03
    and that it has enough outputs to provide strong collision resistance
  • 5:03 -
    against a motivated attacker.
Title:
02-39 Strong Collision Resistance
Description:

more » « less
Team:
Udacity
Project:
CS387 - Applied Cryptography
Duration:
05:06
Amara Bot added a translation

English subtitles

Revisions