
Semantic Security (16 min)

  • 0:00 - 0:04
    My goal for the next few segments is to
    show you that if we use a secure PRG we'll
  • 0:04 - 0:08
    get a secure stream cipher. The first thing
    we have to do is define, what does it
  • 0:08 - 0:12
    mean for a stream cipher to be secure? So
    whenever we define security we always
  • 0:12 - 0:15
    define it in terms of what can the
    attacker do? And what is the attacker
  • 0:15 - 0:19
    trying to do? In the context of
    stream ciphers remember these are only used
  • 0:19 - 0:23
    with a one-time key, and as a result the
    most the attacker is ever going to see is
  • 0:23 - 0:27
    just one cipher text encrypted using the
    key that we're using. And so we're gonna
  • 0:27 - 0:31
    limit the attacker to basically
    obtaining just one cipher text. And in
  • 0:31 - 0:35
    fact later on we're going to allow the
    attacker to do much, much, much more, but
  • 0:35 - 0:38
    for now we're just gonna give him one
    cipher text. And we want to define,
  • 0:38 - 0:43
    what does it mean for the cipher to
    be secure? So the first definition that
  • 0:43 - 0:47
    comes to mind is basically to say, well
    maybe we wanna require that the attacker
  • 0:47 - 0:51
    can't actually recover the secret key.
    Okay so given ciphertext you
  • 0:51 - 0:55
    shouldn't be able to recover the secret
    key. But that's a terrible definition
  • 0:55 - 0:59
    because think about the following brilliant
    cipher where the way we encrypt the
  • 0:59 - 1:03
    message using a key K is basically
    we just output the message. Okay this
  • 1:03 - 1:07
    is a brilliant cipher, yeah, of course it
    doesn't do anything: given a message it just
  • 1:07 - 1:12
    outputs the message as the cipher text.
    This is not a particularly good encryption
  • 1:12 - 1:16
    scheme; however, given the cipher text,
    namely the message, the poor attacker
  • 1:16 - 1:20
    cannot recover the key because he doesn't
    know what the key is. And so as a result
  • 1:20 - 1:25
    this cipher which clearly is insecure,
    would be considered secure under this
  • 1:25 - 1:29
    requirement for security. So this
    definition will be no good. Okay so
  • 1:29 - 1:32
    recovering the secret key is the wrong way
    to define security. So the next thing we
  • 1:32 - 1:36
    can kinda attempt is basically just say,
    well maybe the attacker doesn't care about
  • 1:36 - 1:40
    the secret key, what he really cares about
    is the plain text. So maybe it should be
  • 1:40 - 1:44
    hard for the attacker to recover the
    entire plain text. But even that doesn't
  • 1:44 - 1:48
    work because let's think about the
    following encryption scheme. So suppose
  • 1:48 - 1:53
    what this encryption scheme does is it
    takes two messages, so I'm gonna use two
  • 1:53 - 1:58
    vertical bars to denote concatenation of two
    messages: M0 || M1 means
  • 1:58 - 2:03
    concatenate M0 and M1. And imagine
    what the scheme does is really it outputs
  • 2:03 - 2:08
    M0 in the clear and concatenates to
    that the encryption of M1. Perhaps even
  • 2:08 - 2:13
    using the One Time Pad okay? And
    so here the attacker is gonna be given one
  • 2:13 - 2:17
    cipher text. And his goal would be to
    recover the entire plain text. But the
  • 2:17 - 2:22
    poor attacker can't do that because here
    maybe we've encrypted M1 using the One
  • 2:22 - 2:26
    Time Pad so the attacker can't
    actually recover M1 because we know the
  • 2:26 - 2:30
    One Time Pad is secure given just
    one cipher text. So this construction
  • 2:30 - 2:34
    would satisfy the definition but
    unfortunately clearly this is not a secure
  • 2:34 - 2:38
    encryption scheme because we just leaked
    half of the plain text. M0 is
  • 2:38 - 2:42
    completely available to the attacker so
    even though he can't recover all of the
  • 2:42 - 2:46
    plain text he might be able to recover
    most of the plain text, and that's clearly
  • 2:46 - 2:51
    insecure. So of course we already know
    the solution to this and we talked about
  • 2:51 - 2:55
    Shannon's definition of security, perfect
    secrecy, where Shannon's idea was that in
  • 2:55 - 2:59
    fact when the attacker intercepts a cipher
    text he should learn absolutely no
  • 2:59 - 3:03
    information about the plain text. He
    shouldn't even learn one bit about the
  • 3:03 - 3:07
    plain text; he shouldn't even
    be able to predict a single
  • 3:07 - 3:11
    bit of the plain text.
    Absolutely no information about the plain text.
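Before moving on, the earlier counterexample (the scheme that outputs M0 in the clear and one-time-pads M1) can be sketched concretely. This is illustrative code, not from the lecture; the function name and the sample messages are made up:

```python
import secrets

def broken_encrypt(key: bytes, m0: bytes, m1: bytes) -> bytes:
    """E(k, m0 || m1) = m0 || (m1 XOR k): the second half is one-time-padded
    with k, but the first half is output in the clear."""
    assert len(key) == len(m1)
    return m0 + bytes(a ^ b for a, b in zip(m1, key))

key = secrets.token_bytes(8)                      # random one-time key
ct = broken_encrypt(key, b"meet me ", b"at 10pm!")

# The attacker cannot recover ALL of the plain text (m1 is perfectly
# hidden by the pad), yet m0 sits in the cipher text in the clear:
assert ct[:8] == b"meet me "
```

So the scheme passes the "can't recover the entire plain text" test while leaking half the message, which is exactly why that definition is rejected.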
  • 3:11 - 3:15
    So let's recall very briefly
    Shannon's concept of perfect secrecy
  • 3:15 - 3:19
    basically we said that, you know, given a
    cipher, we say the cipher has perfect
  • 3:19 - 3:25
    secrecy if, given two messages of the same
    length, it so happens that the distribution
  • 3:25 - 3:30
    of cipher texts is the same: if we pick a random
    key and we look at the distribution of
  • 3:30 - 3:35
    cipher texts when we encrypt M0, we get
    exactly the same distribution as when we
  • 3:35 - 3:39
    encrypt M1. The intuition here was that if
    the adversary observes the cipher text
  • 3:39 - 3:44
    then he doesn't know whether it came from
    the distribution that is the result of encrypting
  • 3:44 - 3:48
    M0 or it came from the distribution as
    the result of encrypting M1. And as a
  • 3:48 - 3:53
    result he can't tell whether we encrypted
    M0 or M1. And that's true for all
  • 3:53 - 3:57
    messages of the same length and as a
    result a poor attacker doesn't really know
  • 3:57 - 4:01
    what message was encrypted. Of course we
    already said that this definition is too
  • 4:01 - 4:05
    strong in the sense that it requires
    really long keys; a cipher with short
  • 4:05 - 4:10
    keys can't possibly satisfy this
    definition, and in particular stream ciphers
  • 4:10 - 4:14
    cannot satisfy this definition. Okay so let's
    try to weaken the definition a little bit
  • 4:14 - 4:19
    and let's think back to the previous segment,
    and we can say that instead of requiring
  • 4:19 - 4:24
    the two distributions to be absolutely
    identical, what we can require is that
  • 4:24 - 4:29
    two distributions just be computationally
    indistinguishable. In other words a poor,
  • 4:29 - 4:33
    efficient attacker cannot distinguish the
    two distributions even though the
  • 4:33 - 4:38
    distributions might be very, very, very
    different. That is, just given a sample from
  • 4:38 - 4:43
    one distribution or a sample from another
    distribution the attacker can't tell which
  • 4:43 - 4:47
    distribution he was given a sample from.
    It turns out this definition is actually
  • 4:47 - 4:52
    almost right, but it's still a little too
    strong, and still cannot be satisfied, so
  • 4:52 - 4:56
    we have to add one more constraint, and
    that is that instead of saying that this
  • 4:56 - 5:01
    definition should hold for all M0 and M1, it only has to hold for pairs M0, M1
  • 5:01 - 5:05
    that the attacker could
    actually exhibit. Okay so this actually
  • 5:05 - 5:10
    leads us to the definition of semantic
    security. And so, again this is semantic
  • 5:10 - 5:15
    security for one time key in other words
    when the attacker is only given one cipher
  • 5:15 - 5:20
    text. And so the way we define semantic
    security is by defining two experiments,
  • 5:20 - 5:25
    okay we'll define experiment 0 and
    experiment 1. And more generally we will
  • 5:25 - 5:29
    think of these as experiment(B),
    where B can be zero or one. Okay so the
  • 5:29 - 5:33
    way the experiment is defined is as
    follows, we have an adversary that's
  • 5:33 - 5:37
    trying to break the system. An adversary
    A, that's kinda the analog of statistical
  • 5:37 - 5:41
    tests in the world of pseudo random
    generators. And then the challenger does
  • 5:41 - 5:45
    the following, so really we have two
    challengers, but the challengers are so
  • 5:45 - 5:49
    similar that we can just describe them as
    a single challenger that in one case takes
  • 5:49 - 5:54
    its input bit set to zero, and in the other
    case takes its input bit set to
    one. And let me show you what these
    challengers do. The first thing the
  • 5:57 - 6:01
    challenger is gonna do is it's gonna pick
    a random key and then the adversary
  • 6:01 - 6:06
    basically is going to output two messages
    M0 and M1. Okay so this is an explicit
  • 6:06 - 6:11
    pair of messages that the attacker wants
    to be challenged on and as usual we're not
  • 6:11 - 6:16
    trying to hide the length of the messages,
    we require that the messages be equal
  • 6:16 - 6:22
    length. And then the challenger basically
    will output either the encryption of M0
  • 6:22 - 6:26
    or the encryption of M1, okay so in
    experiment 0 the challenger will output
  • 6:26 - 6:30
    the encryption of M0. In experiment 1 the challenger will output the encryption
  • 6:30 - 6:34
    of M1. Okay so that the difference between
    the two experiments. And then the
  • 6:34 - 6:39
    adversary is trying to guess basically
    whether he was given the encryption of M0
  • 6:39 - 6:44
    or given the encryption of M1. Okay so
    here's a little bit of notation let's
  • 6:44 - 6:50
    define the event Wb to be the event that
    in experiment B the adversary outputs one.
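The two experiments and the events Wb can be sketched as a toy harness. This is illustrative Python, not from the lecture; the adversary interface and the sampling-based advantage estimate are assumptions made for the sketch:

```python
import secrets

def experiment(b: int, keygen, enc, adversary) -> int:
    """EXP(b): the challenger picks a random key, the adversary submits two
    equal-length messages, gets back E(k, m_b), and outputs a guess bit."""
    k = keygen()
    m0, m1 = adversary.choose_messages()
    assert len(m0) == len(m1)         # lengths are not hidden, so must match
    c = enc(k, m1 if b == 1 else m0)
    return adversary.guess(c)         # W_b is the event "this returns 1"

def ss_advantage(keygen, enc, adversary, trials: int = 2000) -> float:
    """Adv_SS[A, E] = |Pr[W_0] - Pr[W_1]|, estimated here by sampling."""
    w0 = sum(experiment(0, keygen, enc, adversary) for _ in range(trials))
    w1 = sum(experiment(1, keygen, enc, adversary) for _ in range(trials))
    return abs(w0 - w1) / trials
```

An adversary that ignores the cipher text and always outputs the same bit behaves identically in both experiments, so its estimated advantage is zero.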
  • 6:50 - 6:55
    Okay so that is just an event that
    basically in experiment zero W0 means that
  • 6:55 - 7:00
    the adversary outputs one and in experiment
    one W1 means the adversary outputs one. And
  • 7:00 - 7:05
    now we can define the advantage of this
    adversary, basically to say that this is
  • 7:05 - 7:10
    called the semantic security advantage of
    the adversary A against the scheme E,
  • 7:10 - 7:15
    to be the difference of the probability of
    these two events. In other words we are
  • 7:15 - 7:20
    looking at whether the adversary behaves
    differently when he was given the
  • 7:20 - 7:25
    encryption of M0 from when he was given
    the encryption of M1. And I wanna make
  • 7:25 - 7:29
    sure this is clear so I'm gonna say it one
    more time. So in experiment zero he was
  • 7:29 - 7:34
    given the encryption of M0 and in
    experiment one he was given the encryption
  • 7:34 - 7:38
    of M1. Now we're just interested in
    whether the adversary output 1 or not
  • 7:38 - 7:42
    in these experiments. If in both
    experiments the adversary outputs 1 with
  • 7:42 - 7:47
    the same probability that means the
    adversary wasn't able to distinguish the
  • 7:47 - 7:52
    two experiments. Experiments zero
    basically looks to the adversary the same
  • 7:52 - 7:56
    as experiment one because in both cases
    he outputs one with the same probability.
  • 7:56 - 8:01
    However if the adversary is able to output
    1 in one experiment with significantly
  • 8:01 - 8:06
    different probability than in the other
    experiment, then the adversary was
  • 8:06 - 8:10
    actually able to distinguish the two
    experiments. Okay so... To say this more
  • 8:10 - 8:14
    formally, essentially the advantage again
    because it's the difference of two
  • 8:14 - 8:19
    probabilities the advantage is a number
    between zero and one. If the advantage is
  • 8:19 - 8:23
    close to zero that means that the
    adversary was not able to distinguish
  • 8:23 - 8:27
    experiment zero from experiment one.
    However if the advantage is close to one,
  • 8:27 - 8:32
    that means the adversary was very well
    able to distinguish experiment zero from
  • 8:32 - 8:36
    experiment one and that really means that
    he was able to distinguish an encryption
  • 8:36 - 8:40
    of M0 from an encryption of M1, okay? So that's our definition. Actually
  • 8:40 - 8:44
    that is just the definition of the
    advantage and the definition is just what
  • 8:44 - 8:48
    you would expect basically we'll say that
    a symmetric encryption scheme is
  • 8:48 - 8:52
    semantically secure if for all efficient
    adversaries here I'll put these in quotes
  • 8:52 - 8:57
    again, "For all efficient adversaries the
    advantage is negligible." In other words,
  • 8:57 - 9:02
    no efficient adversary can distinguish the
    encryption of M0 from the encryption
  • 9:02 - 9:06
    of M1 and basically this is what it
    says, that for these two
  • 9:06 - 9:11
    messages that the adversary was able to
    exhibit he wasn't able to distinguish
  • 9:11 - 9:15
    these two distributions. Now I want to
    show you that this is actually a very
  • 9:15 - 9:20
    elegant definition. It might not seem so
    right away, but I want to show you some
  • 9:20 - 9:24
    implications of this definition and you'll
    see exactly why the definition is the way
  • 9:24 - 9:29
    it is. Okay so let's look at some
    examples. So the first example is suppose
  • 9:29 - 9:33
    we have a broken encryption scheme, in
    other words suppose we have an adversary A
  • 9:33 - 9:38
    that somehow given the cipher text he is
    always able to deduce the least
  • 9:38 - 9:44
    significant bit of the plain text. Okay so
    given the encryption of M0, this adversary
  • 9:44 - 9:49
    is able to deduce the least significant
    bit of M0. So that is a terrible
  • 9:49 - 9:53
    encryption scheme because it basically
    leaks the least significant bit of the
  • 9:53 - 9:57
    plain text just given the cipher text. So
    I want to show you that this scheme is
  • 9:57 - 10:02
    therefore not semantically secure, so that kind
    of shows that if a system is semantically
  • 10:02 - 10:06
    secure then there is no attacker of this
    type. Okay so let's see why the system is
  • 10:06 - 10:10
    not semantically secure, well, so what we
    are gonna do is we're gonna basically use
  • 10:10 - 10:14
    our adversary who is able to learn the
    least significant bit, we're going to
  • 10:14 - 10:18
    use him to break semantic security so
    we're going to use him to distinguish
  • 10:18 - 10:23
    experiment zero from experiment one. Okay
    so here is what we are going to do. We are
  • 10:23 - 10:27
    going to be algorithm
    B and this algorithm B is going to use
  • 10:27 - 10:31
    algorithm A in its belly. Okay so the
    first thing that is going to happen is of
  • 10:31 - 10:36
    course the challenger is going to choose a
    random key. The next thing that is going
  • 10:36 - 10:40
    to happen is that we need to output two
    messages. The messages that we are going
  • 10:40 - 10:43
    to output basically are going to have
    different least significant bits. So one
  • 10:43 - 10:48
    message is going to end with zero and one
    message is going to end with one. Now what
  • 10:48 - 10:51
    is the challenger going to do? The
    challenger is going to give us the
  • 10:51 - 10:55
    encryption of either M0 or M1,
    depending on whether we are in experiment 0 or
  • 10:55 - 10:59
    in experiment 1. And then we just
    forward this cipher text to the adversary
  • 10:59 - 11:04
    okay so the adversary A. Now what is the
    property of adversary A? Given the cipher
  • 11:04 - 11:09
    text, adversary A can tell us what the
    least significant bit of the plain text is.
  • 11:09 - 11:13
    In other words the adversary is going
    to output the least significant bit of M0 or M1
  • 11:13 - 11:18
    but lo and behold that's
    basically the bit B. And then we're just
  • 11:18 - 11:23
    going to output that as our guess, so let's
    call this thing B prime. Okay so now this
  • 11:23 - 11:28
    describes the semantic security adversary.
    And now you tell me what is the semantic
  • 11:28 - 11:34
    security advantage of this adversary? Well
    so let's see. In experiment zero, what is
  • 11:34 - 11:39
    the probability that adversary B outputs
    one? Well in experiment zero it is always
  • 11:39 - 11:44
    given the encryption of M zero and
    therefore adversary A will always output the
  • 11:44 - 11:49
    least significant bit of M zero which
    always happens to be zero. In experiment
  • 11:49 - 11:53
    zero, B always outputs zero. So the
    probability that B outputs one is zero.
  • 11:53 - 11:58
    However in experiment one, we're given the
    encryption of M1. So how likely is
  • 11:58 - 12:03
    adversary B to output one in experiment
    one? Well, it always outputs one, again by
  • 12:03 - 12:07
    the properties of algorithm A and
    therefore the advantage basically is one.
  • 12:07 - 12:12
    So this is a huge advantage, it's as big
    as it's gonna get. Which means that this
  • 12:12 - 12:17
    adversary completely broke the system.
    Okay so, under semantic
  • 12:17 - 12:22
    security basically just deducing the least
    significant bits is enough to completely
  • 12:22 - 12:27
    break the system under a semantic security
    definition. Okay, now the interesting
  • 12:27 - 12:32
    thing here of course is that this is not
    just about the least significant bit, in
  • 12:32 - 12:37
    fact any bit of the message, for
    example the most significant bit, you know,
  • 12:37 - 12:42
    bit number seven, maybe the XOR of all the bits in the message, and so on
  • 12:42 - 12:47
    and so forth any kind of information, any
    bit about the plain text that can be
  • 12:47 - 12:51
    learned basically would mean that the
    system is not semantically secure. So
  • 12:51 - 12:56
    basically all the adversary would have to
    do would be to come up with two messages
  • 12:56 - 13:00
    M0 and M1 such that under one the thing that
    he learns has the value zero and under
  • 13:00 - 13:05
    the other, the value is one. So for
    example if A was able to learn the XOR
  • 13:05 - 13:09
    of all the bits of the message then M0
    and M1 would just have a different
  • 13:09 - 13:13
    XOR of all the bits of their
    messages, and then this adversary A would
  • 13:13 - 13:18
    also be sufficient to break semantic
    security. Okay so, basically for cipher
  • 13:18 - 13:23
    is semantically secure then no
    bit of information is revealed to an
  • 13:23 - 13:27
    efficient adversary. Okay so this is
    exactly the concept of perfect secrecy, only
  • 13:27 - 13:31
    applied just to efficient adversaries
    rather than all adversaries. So the next
  • 13:31 - 13:35
    thing I wanna show you is that in fact the
    one time pad is
  • 13:35 - 13:39
    semantically secure; it had better be
    semantically secure because in fact
  • 13:39 - 13:43
    it's more than that, it's perfectly secure.
    So let's see why that's true. Well so
  • 13:43 - 13:47
    again it's one of these experiments, so
    suppose we have an adversary that claims
  • 13:47 - 13:51
    to break semantic security of the one time
    pad. The first thing the adversary is gonna do
  • 13:51 - 13:56
    is he's gonna output two messages M0
    and M1 of the same length.
  • 13:56 - 14:00
    Now what does he get back as a challenge? As a
    challenge, he gets either the encryption
  • 14:00 - 14:04
    of M0 or the encryption of M1 under
    the one time pad.
  • 14:04 - 14:08
    And he's trying to distinguish between those two possible
    cipher texts that he gets, right?
  • 14:08 - 14:12
    In experiment zero, he gets the encryption of
    M0 and in experiment one, he gets the
  • 14:12 - 14:17
    encryption of M1. Well so let me ask
    you, so what is the advantage of adversary
  • 14:17 - 14:21
    A against the one time pad? So
    remember that the property of the one time
  • 14:21 - 14:26
    pad is that K XOR M0 is
    distributed identically to K XOR M1.
  • 14:26 - 14:31
    In other words, these distributions are
    absolutely identical.
  • 14:31 - 14:36
    This is a
    property of XOR. If we XOR the random pad
  • 14:36 - 14:41
    K with anything, either M0 or M1,
    we get the uniform distribution. So in both
  • 14:41 - 14:45
    cases, algorithm A is given as input
    exactly the same distribution, namely the
  • 14:45 - 14:50
    uniform distribution on cipher texts. And
    therefore it's gonna behave exactly the
  • 14:50 - 14:55
    same in both cases because it was given
    the exact same distribution as input. And as a
  • 14:55 - 15:00
    result, its advantage is zero, which means
    that the one time pad is semantically
  • 15:00 - 15:04
    secure. Now the interesting thing here is
    not only is it semantically secure, it's
  • 15:04 - 15:08
    semantically secure for all adversaries.
    We don't even have to restrict the
  • 15:08 - 15:12
    adversaries to be efficient. No adversary,
    doesn't matter how smart you are, no
  • 15:12 - 15:17
    adversary will be able to distinguish K XOR M0 from K XOR M1 because the two
  • 15:17 - 15:21
    distributions are identical. Okay, so the
    one time pad is semantically secure. Okay,
  • 15:21 - 15:26
    so that completes our definition of
    semantic security so the next thing we're
  • 15:26 - 15:30
    going to do is prove that a secure PRG
    in fact implies that the stream cipher is
  • 15:30 - 15:31
    semantically secure.
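The identical-distribution argument for the one time pad can be checked exhaustively on a toy message space. This is a sketch assuming 3-bit messages (the message values are arbitrary; not part of the lecture):

```python
from collections import Counter

# One time pad on 3-bit messages: enumerate every possible key.
BITS = 3
m0, m1 = 0b101, 0b010

dist0 = Counter(k ^ m0 for k in range(2 ** BITS))   # cipher texts for m0
dist1 = Counter(k ^ m1 for k in range(2 ** BITS))   # cipher texts for m1

# Both are exactly the uniform distribution on cipher texts, so even an
# all-powerful adversary sees the same thing in both experiments and the
# advantage of ANY adversary, efficient or not, is zero.
assert dist0 == dist1
assert all(count == 1 for count in dist0.values())
```

Because the equality holds for the full distributions, not just for samples, no restriction to efficient adversaries is needed, which mirrors the stronger-than-semantic-security conclusion above.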