
Lecture 9: Security and Cryptography (2020)

  • 0:01 - 0:05
    okay so again let's get started with
  • 0:03 - 0:07
    today's lecture so today we're going to
  • 0:05 - 0:09
    be talking about security and
  • 0:07 - 0:10
    cryptography and today's lecture is
  • 0:09 - 0:13
    gonna be a little bit different than our
  • 0:10 - 0:15
    treatment of this topic in last year's
  • 0:13 - 0:17
    class so last year we focused a little
  • 0:15 - 0:19
    bit more on security and privacy and
  • 0:17 - 0:22
    from the perspective of a user of a
  • 0:19 - 0:23
    computer but today we're going to focus
  • 0:22 - 0:25
    a little bit more on security and
  • 0:23 - 0:27
    cryptography concepts that are relevant
  • 0:25 - 0:29
    in understanding some of the tools that
  • 0:27 - 0:31
    we talked about earlier in this class so
  • 0:29 - 0:32
    for example we talked about hash
  • 0:31 - 0:35
    functions or cryptographic hash
  • 0:32 - 0:38
    functions like sha-1 in the Git lecture
  • 0:35 - 0:39
    or we talked about public keys when we
  • 0:38 - 0:41
    talked about SSH in the command line
  • 0:39 - 0:43
    environment in a command line
  • 0:41 - 0:45
    environment lecture and so today we'll
  • 0:43 - 0:46
    talk about these different
  • 0:45 - 0:48
    cryptographic primitives in more detail
  • 0:46 - 0:49
    and get an understanding of how they
  • 0:48 - 0:51
    work and how they're used in these
  • 0:49 - 0:53
    different tools that we're teaching in
  • 0:51 - 0:56
    this class this lecture is not a
  • 0:53 - 0:58
    substitute for a more rigorous class in
  • 0:56 - 1:00
    security so there are a bunch of
  • 0:58 - 1:01
    really good classes at MIT like 6.858
  • 1:00 - 1:04
    which is on computer
  • 1:01 - 1:06
    systems security or 6.857
  • 1:04 - 1:09
    and 6.875 which are more
  • 1:06 - 1:11
    focused on cryptography so don't do
  • 1:09 - 1:13
    security work without formal training in
  • 1:11 - 1:16
    security from these classes or elsewhere
  • 1:13 - 1:18
    and unless you're an expert don't roll
  • 1:16 - 1:20
    your own crypto don't build your own
  • 1:18 - 1:22
    crypto implementations or protocols and
  • 1:20 - 1:24
    the same principle applies to computer
  • 1:22 - 1:26
    system security this lecture is not
  • 1:24 - 1:28
    about building your own stuff it's about
  • 1:26 - 1:29
    understanding what's already out there
  • 1:28 - 1:31
    and so this lecture will have a very
  • 1:29 - 1:33
    informal but we think practical
  • 1:31 - 1:35
    treatment of these basic cryptography
  • 1:33 - 1:37
    concepts and yeah hopefully it'll help
  • 1:35 - 1:40
    you understand some of the tools we
  • 1:37 - 1:41
    talked about earlier in this class any
  • 1:40 - 1:45
    questions about the plan for today's
  • 1:41 - 1:47
    lecture great so the first topic for
  • 1:45 - 1:49
    today is something called entropy
  • 1:47 - 1:51
    entropy is a measure of randomness and
  • 1:49 - 1:52
    this is useful for example when trying
  • 1:51 - 1:55
    to determine the strength of a password
  • 1:52 - 1:58
    so let's take a look at this comic from
  • 1:55 - 2:00
    xkcd we're a big fan of xkcd comics so
  • 1:58 - 2:02
    this comic raise your hand if you've
  • 2:00 - 2:05
    seen this before okay a good number of
  • 2:02 - 2:08
    you so this comic is complaining about
  • 2:05 - 2:09
    this common pattern that's been taught
  • 2:08 - 2:12
    to users of computers that when you
  • 2:09 - 2:13
    design passwords they should be things
  • 2:12 - 2:17
    like that "Tr0ub4dor&3"
  • 2:13 - 2:19
    string
  • 2:17 - 2:21
    in the top-left like we should design
  • 2:19 - 2:22
    passwords that are full of funny
  • 2:21 - 2:25
    characters and things like that to make
  • 2:22 - 2:27
    it hard for attackers to guess and yet
  • 2:25 - 2:29
    turns out that passwords like that are
  • 2:27 - 2:30
    actually pretty weak and guessable by
  • 2:29 - 2:33
    computers that can guess passwords really
  • 2:30 - 2:35
    fast in brute-force attacks and on the
  • 2:33 - 2:38
    other hand passwords which maybe
  • 2:35 - 2:39
    intuitively don't look as secure like
  • 2:38 - 2:42
    the one on the bottom left correct horse
  • 2:39 - 2:45
    battery staple that one turns out to be
  • 2:42 - 2:48
    way more secure so how do I actually
  • 2:45 - 2:50
    quantify the security of these different
  • 2:48 - 2:52
    passwords it's by measuring the amount
  • 2:50 - 2:55
    of randomness in the password how many
  • 2:52 - 2:58
    bits of randomness are in there and so
  • 2:55 - 2:59
    entropy is measured in bits this is like
  • 2:58 - 3:07
    the same bits from information theory
  • 2:59 - 3:09
    and we're only going to talk about the
  • 3:07 - 3:11
    simple case where you're trying to
  • 3:09 - 3:13
    measure the amount of randomness when
  • 3:11 - 3:16
    you're choosing from a set of things
  • 3:13 - 3:17
    uniformly at random so for example when
  • 3:16 - 3:20
    you're constructing a password that's in
  • 3:17 - 3:22
    the format of four random words you're
  • 3:20 - 3:24
    kind of considering all possible
  • 3:22 - 3:26
    sequences of four random words made from
  • 3:24 - 3:27
    some dictionary you have you might have
  • 3:26 - 3:28
    a dictionary with say a hundred
  • 3:27 - 3:31
    thousand words and you're selecting each
  • 3:28 - 3:32
    word uniformly at random how many
  • 3:31 - 3:34
    possibilities are there
  • 3:32 - 3:36
    well you can go and figure that out in
  • 3:34 - 3:38
    that example once you know how many
  • 3:36 - 3:41
    possibilities there are the measure of
  • 3:38 - 3:49
    entropy is log base 2 of the number of
  • 3:41 - 3:51
    possibilities and as that comic suggests
  • 3:49 - 3:53
    this is related to how long it'll take
  • 3:51 - 3:55
    an attacker to try to brute-force
  • 3:53 - 3:57
    through your different passwords like if
  • 3:55 - 3:58
    you have a thousand possibilities you're
  • 3:57 - 4:00
    guessing passwords at a thousand
  • 3:58 - 4:04
    passwords a second that's not a very
  • 4:00 - 4:08
    good password so this is a couple of
  • 4:04 - 4:09
    quick examples a coin flip has two
  • 4:08 - 4:15
    possibilities and let's assume we have a
  • 4:09 - 4:20
    fair coin and so a coin flip has log
  • 4:15 - 4:21
    base 2 of 2 is one bit of entropy and
  • 4:20 - 4:25
    another thing we might look at is
  • 4:21 - 4:27
    something like a dice roll so there's
  • 4:25 - 4:33
    six possibilities and log
  • 4:27 - 4:34
    two of six is something like 2.6 bits of
  • 4:33 - 4:36
    entropy
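A quick way to check these numbers yourself, assuming GNU bc is available (log base 2 of x is ln(x)/ln(2)); the 100,000-word dictionary is just the hypothetical example from above:

```bash
echo 'l(2)/l(2)' | bc -l          # coin flip: 1 bit of entropy
echo 'l(6)/l(2)' | bc -l          # fair die roll: ~2.58 bits
echo '4*l(100000)/l(2)' | bc -l   # four words from a 100,000-word dictionary: ~66 bits
```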
  • 4:34 - 4:40
    so that's how we quantify the amount of
  • 4:36 - 4:42
    randomness in something so now going
  • 4:40 - 4:43
    back to that example in the xkcd comic
  • 4:42 - 4:45
    when we want to figure out how much
  • 4:43 - 4:47
    entropy is in a password we have to
  • 4:45 - 4:49
    consider the model for how the
  • 4:47 - 4:51
    password was generated for example in
  • 4:49 - 4:53
    the top left you could consider okay we
  • 4:51 - 4:55
    take one dictionary word make some
  • 4:53 - 4:58
    substitutions of some of the characters
  • 4:55 - 5:00
    with numbers that look similar to that
  • 4:58 - 5:01
    character add one punctuation mark at
  • 5:00 - 5:04
    the end and add one numeral after that
  • 5:01 - 5:06
    and we can take that model and then use
  • 5:04 - 5:07
    combinatorics to figure out how many
  • 5:06 - 5:09
    possibilities there are and from that we
  • 5:07 - 5:11
    can derive how many bits of entropy are
  • 5:09 - 5:13
    in that password so in that particular
  • 5:11 - 5:14
    example I don't know exactly what model
  • 5:13 - 5:17
    they were using for the password but
  • 5:14 - 5:19
    they calculated there are 28 bits of entropy
  • 5:17 - 5:22
    whereas in the bottom-left example that
  • 5:19 - 5:24
    correct horse battery staple they assume
  • 5:22 - 5:27
    that you're working from a dictionary of
  • 5:24 - 5:29
    about 2,000 words and so when you
  • 5:27 - 5:31
    combine four of those words together you
  • 5:29 - 5:32
    get about 44 bits of entropy from that
  • 5:31 - 5:35
    so it's much more secure than the
  • 5:32 - 5:37
    example before it so any questions about
  • 5:35 - 5:44
    this definition of entropy or why it's
  • 5:37 - 5:46
    useful and so when you're generating
  • 5:44 - 5:47
    your own passwords keep this in mind you
  • 5:46 - 5:49
    want a high entropy password and the
  • 5:47 - 5:50
    exact number you need depends on exactly
  • 5:49 - 5:52
    what you're trying to protect against
  • 5:50 - 5:54
    like in general a concept in security is
  • 5:52 - 5:56
    you have to keep in mind what your
  • 5:54 - 5:57
    threat model is like what attackers
  • 5:56 - 5:58
    you're concerned about what kinds of
  • 5:57 - 6:01
    techniques the attackers might be using
  • 5:58 - 6:03
    for example this comic refers to an
  • 6:01 - 6:05
    attacker that can guess a thousand
  • 6:03 - 6:07
    passwords a second this might be
  • 6:05 - 6:09
    something that's possible for say a web
  • 6:07 - 6:11
    service that allows people to try to log
  • 6:09 - 6:13
    in with your email and then random
  • 6:11 - 6:15
    passwords that the attacker is trying
  • 6:13 - 6:18
    but this thousand passwords a second
  • 6:15 - 6:20
    model might not be accurate for other
  • 6:18 - 6:22
    scenarios for example an offline
  • 6:20 - 6:24
    password cracking scenario where maybe the
  • 6:22 - 6:26
    attacker has broken into a website and
  • 6:24 - 6:28
    downloaded their database and they have
  • 6:26 - 6:29
    some obfuscated form of your password
  • 6:28 - 6:31
    and they're trying to figure out what
  • 6:29 - 6:32
    the password is maybe they can parallelize
  • 6:31 - 6:34
    this attack and make it go to a million
  • 6:32 - 6:36
    guesses a second and so exactly how much
  • 6:34 - 6:37
    entropy you need depends on exactly what
  • 6:36 - 6:39
    you're trying to protect against but
  • 6:37 - 6:40
    roughly forty bits of entropy might be
  • 6:39 - 6:43
    good enough for
  • 6:40 - 6:44
    something that's protected by a website and
  • 6:43 - 6:47
    you're concerned about online password
  • 6:44 - 6:49
    guesses and then maybe something like 80
  • 6:47 - 6:51
    bits of entropy might be good if you're
  • 6:49 - 6:53
    concerned about offline attacks and you
  • 6:51 - 6:58
    want to be really really secure so
  • 6:53 - 7:00
    they're rough guidelines you can use and
  • 6:58 - 7:02
    then how do you actually generate strong
  • 7:00 - 7:04
    passwords well you have some model for
  • 7:02 - 7:05
    passwords for example the four dictionary
  • 7:04 - 7:07
    words thing and you can actually get a
  • 7:05 - 7:09
    dictionary and then you can use methods
  • 7:07 - 7:11
    like Diceware so there's something we
  • 7:09 - 7:13
    linked to in the lecture notes where you
  • 7:11 - 7:14
    can actually get physical dice and roll
  • 7:13 - 7:16
    them and then map dice rolls to
  • 7:14 - 7:18
    dictionary words in order to eventually
  • 7:16 - 7:19
    turn that into a password and doing
  • 7:18 - 7:22
    something like this using some kind of
  • 7:19 - 7:24
    physical token that you know is random
  • 7:22 - 7:26
    like a balanced die or a coin that you
  • 7:24 - 7:28
    know is balanced is a good thing to do
  • 7:26 - 7:30
    because humans are actually not good at
  • 7:28 - 7:31
    choosing random numbers right if I just
  • 7:30 - 7:33
    asked you to name a random number for 1
  • 7:31 - 7:35
    to 100 chances are that you're probably
  • 7:33 - 7:36
    not doing so uniformly at random very
  • 7:35 - 7:37
    well and so that's why it's actually
  • 7:36 - 7:42
    good to use these physical tokens in
  • 7:37 - 7:45
    order to produce randomness so entropy
  • 7:42 - 7:49
    that's our first concept we're covering any
  • 7:45 - 7:52
    questions about that or about this comic
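A rough Diceware-style sketch for generating such a passphrase, assuming GNU shuf and a word list at /usr/share/dict/words (the Diceware method linked in the lecture notes uses physical dice and a curated word list instead):

```bash
# Pick four words uniformly at random, drawing randomness from /dev/urandom.
shuf --random-source=/dev/urandom -n 4 /usr/share/dict/words | paste -sd ' '
```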
  • 7:49 - 7:55
    great so getting into slightly more
  • 7:52 - 7:56
    interesting and complicated topics the
  • 7:55 - 7:59
    next thing we're going to talk about is
  • 7:56 - 8:01
    hash functions so hopefully most of you
  • 7:59 - 8:02
    were here during the Git lecture where
  • 8:01 - 8:05
    we talked about the sha-1 hash function
  • 8:02 - 8:12
    used in Git so now going into that topic
  • 8:05 - 8:14
    in a little bit more detail hash
  • 8:12 - 8:17
    functions at a high level are functions
  • 8:14 - 8:20
    that map a variable amount of data into
  • 8:17 - 8:23
    a fixed size output so for example the
  • 8:20 - 8:25
    sha-1 hash function is one example of a
  • 8:23 - 8:30
    hash function takes in some input of
  • 8:25 - 8:35
    some number of bytes and outputs exactly
  • 8:30 - 8:37
    160 bits of output so that's kind of the
  • 8:35 - 8:40
    type signature of this particular hash
  • 8:37 - 8:41
    function and then these functions have
  • 8:40 - 8:43
    some number of properties that are
  • 8:41 - 8:47
    useful so at a high level
  • 8:43 - 8:48
    these can be thought about as hard to
  • 8:47 - 8:50
    invert functions that have
  • 8:48 - 8:53
    random-looking outputs we can actually
  • 8:50 - 8:54
    try this out on some random piece of
  • 8:53 - 8:57
    data
  • 8:54 - 9:00
    for example if I enter into my terminal
  • 8:57 - 9:02
    printf hello this does exactly what you
  • 9:00 - 9:04
    would expect it does prints this to
  • 9:02 - 9:07
    standard out and I can pipe this to the
  • 9:04 - 9:09
    sha-1 sum command so this is a command
  • 9:07 - 9:12
    line program that accepts input via
  • 9:09 - 9:13
    standard in and computes this sha-1
  • 9:12 - 9:14
    function which takes in some variable
  • 9:13 - 9:17
    number of bytes from the input and
  • 9:14 - 9:20
    produces a 160-bit output which in this
  • 9:17 - 9:22
    particular case is represented or encoded
  • 9:20 - 9:24
    as a hexadecimal string so it's a length
  • 9:22 - 9:26
    40 hexadecimal string and you see this
  • 9:24 - 9:28
    output right here this - just means it
  • 9:26 - 9:31
    took it it took its input from
  • 9:28 - 9:32
    standard in so this output just looks
  • 9:31 - 9:34
    like some random number but one
  • 9:32 - 9:37
    important thing is that this is a
  • 9:34 - 9:39
    deterministic number if you try the same
  • 9:37 - 9:41
    command on your own computer printf
  • 9:39 - 9:43
    hello piped to sha1sum you will get the
  • 9:41 - 9:44
    same number out so sha-1 is some
  • 9:43 - 9:47
    well-known function that people have
  • 9:44 - 9:49
    agreed upon for all its parameters we'll
  • 9:47 - 9:52
    see that if we tweak the input a little
  • 9:49 - 9:54
    bit like say changed hello to Hello with
  • 9:52 - 9:56
    a capital H now I get a completely
  • 9:54 - 9:58
    different looking output and this also
  • 9:56 - 9:59
    looks like some other kind of random ish
  • 9:58 - 10:00
    number even though it is deterministic
  • 9:59 - 10:08
    and you could reproduce this on your own computer
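The demo typed out; the first digest is the well-known SHA-1 of the string hello, and the second input differs only in capitalization:

```bash
printf 'hello' | sha1sum
# aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d  -   ("-" means the input came from stdin)

printf 'Hello' | sha1sum
# a completely different-looking 40-character digest
```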
  • 10:00 - 10:16
    hash functions have a number of
  • 10:08 - 10:18
    properties that are pretty important the
  • 10:16 - 10:19
    first property that cryptographic hash
  • 10:18 - 10:21
    functions have is that they're
  • 10:19 - 10:22
    non-invertible and what that means is
  • 10:21 - 10:25
    that if you take the output from this
  • 10:22 - 10:28
    function for example that aaf4...434d
  • 10:25 - 10:30
    string shown there
  • 10:28 - 10:33
    from that output it's hard to figure out
  • 10:30 - 10:37
    what the input was that produced that
  • 10:33 - 10:38
    output so you can go one way compute the
  • 10:37 - 10:41
    sha-1 hash easily but you can't go
  • 10:38 - 10:43
    backwards another property that these
  • 10:41 - 10:51
    functions have is that they're collision
  • 10:43 - 10:53
    resistant and what this property means
  • 10:51 - 10:57
    is that it's hard to find two different
  • 10:53 - 11:00
    inputs that produce the same output and
  • 10:57 - 11:02
    so this basically describes what a
  • 11:00 - 11:04
    cryptographic hash function is so any
  • 11:02 - 11:07
    questions about the kind of
  • 11:04 - 11:09
    specification of a cryptographic hash
  • 11:07 - 11:09
    function
  • 11:09 - 11:13
    okay so what are these hash functions
  • 11:12 - 11:16
    actually useful for well we've already
  • 11:13 - 11:19
    seen one application in git for content
  • 11:16 - 11:22
    addressed storage so in Git we
  • 11:19 - 11:24
    want some uniform way of naming
  • 11:22 - 11:26
    different objects that are in the object
  • 11:24 - 11:28
    store and it turns out that Git
  • 11:26 - 11:30
    addresses all of them by their sha-1
  • 11:28 - 11:33
    hash so you have the actual data you
  • 11:30 - 11:35
    want to store and then to name that
  • 11:33 - 11:36
    particular piece of data you just name
  • 11:35 - 11:38
    the sha-1 hash and all of that is stored
  • 11:36 - 11:42
    in the object store in that particular
  • 11:38 - 11:43
    way we see this when looking at many
  • 11:42 - 11:45
    different parts of git for example right
  • 11:43 - 11:48
    here I'm in a Git repository if I do
  • 11:45 - 11:51
    git log it shows me the commits and for
  • 11:48 - 11:53
    example this number up here is the
  • 11:51 - 11:55
    cryptographic hash function sha-1 applied
  • 11:53 - 11:58
    to the commit object that describes this
  • 11:55 - 12:00
    particular commit so does anybody know
  • 11:58 - 12:02
    why git uses a cryptographic hash
  • 12:00 - 12:03
    function here as opposed to so you might
  • 12:02 - 12:05
    have heard in your other computer
  • 12:03 - 12:07
    science classes like say your
  • 12:05 - 12:09
    introductory algorithms class there are
  • 12:07 - 12:11
    things called hash functions without the
  • 12:09 - 12:14
    word cryptographic appended in front of
  • 12:11 - 12:16
    them and they have similar properties
  • 12:14 - 12:19
    that they turn a variable sized input
  • 12:16 - 12:21
    into some fixed size output but they
  • 12:19 - 12:23
    don't quite have these properties where
  • 12:21 - 12:25
    it's hard to find an input that produces
  • 12:23 - 12:27
    a particular output or things like that
  • 12:25 - 12:29
    it's a kind of weaker definition than
  • 12:27 - 12:30
    this so why is it that in Git we care
  • 12:29 - 12:32
    about having a cryptographic hash
  • 12:30 - 12:34
    function as opposed to just a regular
  • 12:32 - 12:36
    old hash function does anybody have any
  • 12:34 - 12:36
    ideas
  • 12:45 - 12:52
    yeah that's that's basically it that we
  • 12:49 - 12:54
    don't want to have kind of conflicts in
  • 12:52 - 12:56
    the output from this hash function like
  • 12:54 - 12:58
    every commit is identified by a hash
  • 12:56 - 13:00
    function every file is identified by the
  • 12:58 - 13:01
    hash of that file if it were ever the
  • 13:00 - 13:03
    case that two different pieces of
  • 13:01 - 13:06
    content in practice produce the same
  • 13:03 - 13:08
    output that is if the function were not
  • 13:06 - 13:10
    collision resistant that could be really
  • 13:08 - 13:12
    problematic right because then you and I
  • 13:10 - 13:14
    we could have two different Git repos that we
  • 13:12 - 13:16
    think are the same we check out the same
  • 13:14 - 13:18
    commit hash and we might end up with
  • 13:16 - 13:21
    different files and this is concerning
  • 13:18 - 13:23
    because git is used to track software a
  • 13:21 - 13:26
    track development of software and it's
  • 13:23 - 13:28
    also kind of involved in making sure
  • 13:26 - 13:29
    that the right people are authoring the
  • 13:28 - 13:31
    software nothing funny has happened in
  • 13:29 - 13:33
    the process for example there all these
  • 13:31 - 13:35
    open source projects like the Linux
  • 13:33 - 13:37
    kernel where development is done using
  • 13:35 - 13:40
    git it would be really bad if some
  • 13:37 - 13:41
    contributor to it could say edit some
  • 13:40 - 13:43
    file and propose some change that looks
  • 13:41 - 13:45
    pretty benign like oh let me go and
  • 13:43 - 13:47
    improve this part of Linux submit that
  • 13:45 - 13:49
    change request to the Linux developers
  • 13:47 - 13:52
    and then in practice actually supply a
  • 13:49 - 13:54
    git repository that has the same commit
  • 13:52 - 13:56
    hash and whatnot but actually the file
  • 13:54 - 13:59
    contents are different there's something
  • 13:56 - 14:01
    malicious so git actually relies on this
  • 13:59 - 14:03
    sha-1 function being a cryptographic
  • 14:01 - 14:10
    hash function in order to achieve security
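A small illustration of this content addressing, using Git's plumbing commands inside any repository (README.md here stands in for whatever file you want to name):

```bash
git log -1 --format=%H     # the SHA-1 name of the latest commit
git cat-file -p HEAD       # the commit object that this hash was computed over
git hash-object README.md  # the blob name a file gets: same contents, same name
```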
  • 14:03 - 14:12
    any questions about that and
  • 14:10 - 14:14
    some other interesting applications of
  • 14:12 - 14:16
    hash functions so as we saw hash
  • 14:14 - 14:18
    functions turn big inputs into small
  • 14:16 - 14:20
    outputs and in a way because the hash
  • 14:18 - 14:22
    function is collision resistant the
  • 14:20 - 14:24
    output can be used to kind of attest to
  • 14:22 - 14:27
    or identify the input and so you can
  • 14:24 - 14:30
    think of a hash as a short summary of a
  • 14:27 - 14:32
    file for example in this directory there are a
  • 14:30 - 14:35
    bunch of files and I can compute the
  • 14:32 - 14:38
    sha-1 sum of some file in this directory
  • 14:35 - 14:41
    and this is the sha-1 algorithm applied
  • 14:38 - 14:42
    to this README.md file and what's
  • 14:41 - 14:44
    interesting is that it is
  • 14:42 - 14:45
    computationally hard or like impossible
  • 14:44 - 14:48
    you can kind of think of it as
  • 14:45 - 14:50
    impossible to find any other file so a
  • 14:48 - 14:53
    different file that has the same hash
  • 14:50 - 14:55
    output and one scenario in which this is
  • 14:53 - 14:56
    useful is when you download files from
  • 14:55 - 14:59
    the internet
  • 14:56 - 15:01
    for example there are lots of Linux
  • 14:59 - 15:04
    distributions that distribute large CD
  • 15:01 - 15:05
    or DVD images from their website like I
  • 15:04 - 15:08
    can go to Debian org and download the
  • 15:05 - 15:10
    latest version of Debian the thing is
  • 15:08 - 15:11
    that hosting those files can be
  • 15:10 - 15:12
    expensive and so a lot of people are
  • 15:11 - 15:14
    nice enough to host mirrors of these
  • 15:12 - 15:18
    files so instead of downloading Debian
  • 15:14 - 15:20
    from Debian org I can go to one of many
  • 15:18 - 15:22
    other sites and download what are
  • 15:20 - 15:24
    supposed to be the same files that are
  • 15:22 - 15:25
    hosted at Debian org but how do I know
  • 15:24 - 15:29
    that I actually got the correct file
  • 15:25 - 15:31
    like what if I set up a malicious mirror
  • 15:29 - 15:33
    and you go to like anish's evil Debian
  • 15:31 - 15:35
    website dot com and then try to download
  • 15:33 - 15:37
    Debian turns out that your Linux
  • 15:35 - 15:39
    installation is backdoored well one
  • 15:37 - 15:41
    thing you could do is download a copy
  • 15:39 - 15:42
    from the original debian.org website
  • 15:41 - 15:43
    and then download my version and compare
  • 15:42 - 15:45
    them but that kind of defeats the
  • 15:43 - 15:46
    purpose right because we want to avoid
  • 15:45 - 15:48
    downloading things from Debian org
  • 15:46 - 15:49
    because hosting these files is expensive
  • 15:48 - 15:50
    and we want all these different people
  • 15:49 - 15:54
    to be able to mirror copies of the files
  • 15:50 - 15:56
    elsewhere so does anybody see how
  • 15:54 - 15:58
    cryptographic hash functions could be
  • 15:56 - 15:59
    useful to solve this problem that I want
  • 15:58 - 16:03
    to download a file from an untrusted
  • 15:59 - 16:05
    source and not from like the trusted
  • 16:03 - 16:06
    source itself but maybe I can get some
  • 16:05 - 16:08
    small piece of information from this
  • 16:06 - 16:10
    trusted source in order to know whether
  • 16:08 - 16:11
    the file I downloaded from the untrusted
  • 16:10 - 16:18
    source is the thing I was supposed to
  • 16:11 - 16:19
    get yes like it's basically just a
  • 16:18 - 16:21
    straightforward application of
  • 16:19 - 16:23
    cryptographic hash functions
  • 16:21 - 16:26
    so what Debian org can do is they can
  • 16:23 - 16:28
    produce their kind of correct ISO file
  • 16:26 - 16:30
    or whatever they want and instead of
  • 16:28 - 16:33
    publishing the file itself on their
  • 16:30 - 16:35
    website they can publish a hash of that
  • 16:33 - 16:37
    file so compared to the file itself
  • 16:35 - 16:39
    which may be many gigabytes this is only
  • 16:37 - 16:41
    like in this particular case 160 bits of
  • 16:39 - 16:44
    data right so very cheap to host and
  • 16:41 - 16:46
    then what I can do is a user is I can
  • 16:44 - 16:48
    download that file from any random
  • 16:46 - 16:50
    website it could be an untrusted website
  • 16:48 - 16:54
    and after I download I just double check
  • 16:50 - 16:56
    the sha-1 hash and if the hash matches
  • 16:54 - 16:57
    then I know that I have the right file
  • 16:56 - 17:00
    because it's computationally infeasible
  • 16:57 - 17:02
    for somebody to give me some different
  • 17:00 - 17:04
    file that happens to have the same hash
  • 17:02 - 17:07
    because hash functions are collision
  • 17:04 - 17:10
    resistant so any questions about that
  • 17:07 - 17:10
    application yeah
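A sketch of that check, with placeholder file names (Debian actually publishes checksum files such as SHA256SUMS next to its images):

```bash
# Hash the file you downloaded from the untrusted mirror...
sha1sum debian-10.3.0-amd64-netinst.iso
# ...and compare the digest against the one published on debian.org, or use a
# checksum file fetched from the trusted site:
sha256sum --check --ignore-missing SHA256SUMS
```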
  • 17:18 - 17:22
    yeah so that's a good question like why
  • 17:20 - 17:24
    do you need different people to host the
  • 17:22 - 17:26
    information like wouldn't it be equally
  • 17:24 - 17:27
    expensive for everybody so the answer is
  • 17:26 - 17:29
    that question is a little bit
  • 17:27 - 17:31
    complicated but like here's that here's
  • 17:29 - 17:33
    a partial answer one thing is that
  • 17:31 - 17:34
    downloading files from a server is
  • 17:33 - 17:36
    affected by how far away the server is
  • 17:34 - 17:39
    from you so for example if the servers
  • 17:36 - 17:41
    in Massachusetts and you're in say China
  • 17:39 - 17:42
    like you have to kind of make a big
  • 17:41 - 17:44
    round trip across the internet and that
  • 17:42 - 17:46
    may be expensive for a number of reasons
  • 17:44 - 17:48
    like the latency is high and the traffic
  • 17:46 - 17:49
    traffic needs to go through kind of lots
  • 17:48 - 17:51
    of different wires to make its way all
  • 17:49 - 17:53
    the way to where you are and so one
  • 17:51 - 17:54
    thing that these websites do is that
  • 17:53 - 17:56
    they distribute their content to servers
  • 17:54 - 17:57
    that are all over the world and then as
  • 17:56 - 17:58
    a user you download from the server
  • 17:57 - 18:01
    that's closest to you like for example
  • 17:58 - 18:02
    mit maintains a Debian package
  • 18:01 - 18:05
    repository and like kind of mirrors all
  • 18:02 - 18:07
    the Debian stuff so if you're a
  • 18:05 - 18:10
    Debian user at MIT you can use the MIT
  • 18:07 - 18:12
    copy of everything and then you can kind
  • 18:10 - 18:14
    of access it over our fast local network
  • 18:12 - 18:16
    and that traffic never needs to go to
  • 18:14 - 18:19
    the outside Internet at all so it's very
  • 18:16 - 18:23
    fast that's a good question any other
  • 18:19 - 18:24
    questions ok and then one final kind of
  • 18:23 - 18:26
    interesting application of hash
  • 18:24 - 18:29
    functions is something called a
  • 18:26 - 18:30
    commitment scheme so I want to play a
  • 18:29 - 18:32
    game and I need a volunteer for this so
  • 18:30 - 18:33
    you don't actually need to get up from
  • 18:32 - 18:35
    your seat or anything I just need you to
  • 18:33 - 18:36
    talk with me so any volunteers raise
  • 18:35 - 18:38
    your hand yeah okay yeah what's your
  • 18:36 - 18:41
    name
  • 18:38 - 18:42
    Abdul Aziz okay great so Abdul Aziz
  • 18:41 - 18:44
    we're going to play a game we're going
  • 18:42 - 18:46
    to play a game where I'm going to flip a
  • 18:44 - 18:48
    coin and then you're gonna call heads or
  • 18:46 - 18:50
    tails and if you call it right you win
  • 18:48 - 18:52
    and if you call it wrong you lose and
  • 18:50 - 18:56
    there are no stakes for this game but
  • 18:52 - 18:58
    just the pride of winning so sadly I
  • 18:56 - 18:59
    checked my wallet and all I have is
  • 18:58 - 19:01
    dollar bills I don't have any coins with
  • 18:59 - 19:02
    me so instead I'm going to just flip the
  • 19:01 - 19:05
    coin in my head all right
  • 19:02 - 19:08
    so okay I flip the coin call heads or
  • 19:05 - 19:14
    tails sorry you lost it was heads I
  • 19:08 - 19:15
    don't I play again yeah I can cheat
  • 19:14 - 19:17
    right I can just see what you say and
  • 19:15 - 19:20
    say the opposite thing so let's try
  • 19:17 - 19:22
    fixing this game how about you call
  • 19:20 - 19:26
    heads or tails after I say what the
  • 19:22 - 19:28
    flip result was okay yeah so if I say oh
  • 19:26 - 19:35
    the result is tails what are you gonna
  • 19:28 - 19:39
    say you'd call tails yeah so is it
  • 19:35 - 19:41
    possible to play this guess what guess
  • 19:39 - 19:43
    what the coin flip result is game in a
  • 19:41 - 19:45
    fair way without having a physical coin
  • 19:43 - 19:46
    that we share like because I can't
  • 19:45 - 19:47
    really manipulate your physical reality
  • 19:46 - 19:49
    if I flip a coin in front of you
  • 19:47 - 19:51
    you'd probably trust that it's okay right
  • 19:49 - 19:52
    so it turns out that hash functions give
  • 19:51 - 19:55
    us a kind of cool way to solve this
  • 19:52 - 19:58
    problem through an idea called a
  • 19:55 - 19:59
    commitment scheme so I can say
  • 19:58 - 20:03
    like here's the construction of the
  • 19:59 - 20:05
    solution I can pick heads or tails and
  • 20:03 - 20:09
    I'm actually going to pick a big random
  • 20:05 - 20:14
    number say like this number here and
  • 20:09 - 20:17
    what I can do is compute the sha-1 sum
  • 20:14 - 20:18
    of this number at this moment you
  • 20:17 - 20:20
    haven't seen this number yet I'm just
  • 20:18 - 20:23
    doing all this in my head and then what
  • 20:20 - 20:25
    I do is I tell you okay I flipped a coin
  • 20:23 - 20:26
    and I'm not going to tell you what the
  • 20:25 - 20:29
    result is just yet because you haven't
  • 20:26 - 20:30
    called heads or tails but I'll tell you
  • 20:29 - 20:32
    what the sha-1 sum of the result
  • 20:30 - 20:34
    is here you go and I tell you this value
  • 20:32 - 20:36
    now after this you can call heads or
  • 20:34 - 20:38
    tails so what do you say like say say
  • 20:36 - 20:40
    heads afterwards what I can do is I can
  • 20:38 - 20:42
    reveal to you what my input to this
  • 20:40 - 20:44
    function was and then you can
  • 20:42 - 20:45
    cross-check this right you can compute
  • 20:44 - 20:47
    the sha-1 sum on the input to verify
  • 20:45 - 20:49
    that the output is what I said it was
  • 20:47 - 20:51
    earlier and then we can have some way of
  • 20:49 - 20:52
    mapping these numbers to heads or tails
  • 20:51 - 20:54
    so I might have agreed upon beforehand
  • 20:52 - 20:57
    that even numbers are heads and odd
  • 20:54 - 20:58
    numbers are tails and so this is a way of
  • 20:57 - 21:01
    fixing that game so we can actually play
  • 20:58 - 21:04
    this game in in our heads right I can
  • 21:01 - 21:06
    pick a value but not reveal that value
  • 21:04 - 21:07
    to you but I can commit to the value so
  • 21:06 - 21:09
    this is a kind of binding commitment
  • 21:07 - 21:11
    scheme that I can't change my mind after
  • 21:09 - 21:14
    I've told you this but it doesn't reveal
  • 21:11 - 21:15
    the original value to you and so this is
  • 21:14 - 21:18
    one other neat application of
  • 21:15 - 21:18
    cryptographic hash functions so any
  • 21:18 - 21:24
    questions about this particular construction
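A minimal sketch of that commitment, assuming a shell with od and sha1sum and the agreed rule that an even number means heads and an odd number means tails:

```bash
n=$(od -An -N8 -tu8 /dev/urandom | tr -d ' ')  # my secret random number
printf '%s' "$n" | sha1sum                     # the commitment I announce

# ...you call heads or tails...

echo "$n"                  # reveal; you re-run sha1sum on it to verify the commitment
```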
  • 21:18 - 21:27
    okay great so moving on to
  • 21:24 - 21:30
    the next topic we're going to talk about
  • 21:27 - 21:35
    key derivation functions
  • 21:30 - 21:35
    [Applause]
  • 21:39 - 21:47
    often abbreviated as KDF so this is a
  • 21:45 - 21:50
    concept that's very similar to hash
  • 21:47 - 21:52
    functions except it has kind of one
  • 21:50 - 21:55
    extra property that it is slow to
  • 21:52 - 21:56
    compute for example there's a hash
  • 21:55 - 22:08
    function or key derivation function
  • 21:56 - 22:12
    known as PBKDF2 password-based
  • 22:08 - 22:14
    key derivation function that has a kind
  • 22:12 - 22:15
    of similar form as these hash functions
  • 22:14 - 22:17
    we were talking about here that they
  • 22:15 - 22:18
    take in some variable length input and
  • 22:17 - 22:19
    produce a fixed length output but
  • 22:18 - 22:20
    they're meant to be used for one
  • 22:19 - 22:22
    particular purpose
  • 22:20 - 22:25
    the purpose is generally to use the
  • 22:22 - 22:27
    fixed length output as a key in another
  • 22:25 - 22:28
    cryptographic algorithm and we'll talk
  • 22:27 - 22:31
    about those algorithms like what you use the
  • 22:28 - 22:33
    output of this thing for in a moment but
  • 22:31 - 22:36
    a one property of these things is that
  • 22:33 - 22:39
    they're slow does anybody have any idea
  • 22:36 - 22:41
    why you'd want an algorithm to be slow
  • 22:39 - 22:43
    like normally we want algorithms to be
  • 22:41 - 22:46
    fast right so why would we want an
  • 22:43 - 22:46
    algorithm to be slow yes
  • 22:54 - 23:00
    yeah that's exactly it so I'll repeat so
  • 22:58 - 23:02
    it goes into the microphone the reason
  • 23:00 - 23:04
    you want these to be slow is when you're
  • 23:02 - 23:06
    actually using it for something like
  • 23:04 - 23:08
    password authentication where you have
  • 23:06 - 23:09
    the hash of a password saved and then
  • 23:08 - 23:10
    somebody inputs the password you want to
  • 23:09 - 23:12
    know if that corresponds to the hash
  • 23:10 - 23:15
    it's ok if it's slow because you're only
  • 23:12 - 23:16
    doing this check kind of once but the
  • 23:15 - 23:17
    other scenario in which you're going to
  • 23:16 - 23:18
    be using this function is when
  • 23:17 - 23:21
    somebody's trying to brute-force a
  • 23:18 - 23:22
    password say a website has their
  • 23:21 - 23:23
    password database stolen and somebody's
  • 23:22 - 23:25
    going through all the accounts and
  • 23:23 - 23:28
    trying to break all the passwords well
  • 23:25 - 23:29
    in that case you want this to be slow
  • 23:28 - 23:30
    because someone's gonna be doing this
  • 23:29 - 23:32
    like millions and millions of times and
  • 23:30 - 23:33
    you can slow down the attacker a lot by
  • 23:32 - 23:35
    making this function slow and so it's
  • 23:33 - 23:36
    fine if this takes you like one second
  • 23:35 - 23:39
    upon logging in to compute this function
  • 23:36 - 23:40
    but when you're brute forcing it we don't
  • 23:39 - 23:42
    go to a thousand guesses a second like
  • 23:40 - 23:46
    in that xkcd comic we can slow it down a little bit
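A rough way to see the slowdown, assuming OpenSSL 1.1.1 or newer (which added the -pbkdf2 and -iter flags); README.md and the passphrase are just example inputs:

```bash
# Each extra iteration makes every single password guess proportionally slower.
time openssl enc -aes-256-cbc -pbkdf2 -iter 1000000 \
    -in README.md -out README.enc.md -k 'correct horse battery staple'
```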
  • 23:42 - 23:48
    so what is the output of key
  • 23:46 - 23:50
    derivation functions actually used for
  • 23:48 - 23:52
    well the next topic we're going to talk
  • 23:50 - 23:54
    about probably like one of the most
  • 23:52 - 23:55
    classic things when you think about
  • 23:54 - 24:00
    cryptography is encryption and
  • 23:55 - 24:17
    decryption the next topic is symmetric
  • 24:00 - 24:18
    key cryptography and like the rest of
  • 24:17 - 24:20
    this lecture we're not going to talk
  • 24:18 - 24:21
    about how you implement these we're
  • 24:20 - 24:24
    going to talk about the API for a
  • 24:21 - 24:25
    symmetric key symmetric key crypto like
  • 24:24 - 24:28
    how it's used
  • 24:25 - 24:31
    so symmetric key cryptosystems have a
  • 24:28 - 24:33
    couple different functions they have a
  • 24:31 - 24:35
    key generation function which is a
  • 24:33 - 24:39
    randomized function that produces a
  • 24:35 - 24:43
    thing we call the key and then they have
  • 24:39 - 24:43
    a pair of functions encrypt and decrypt
  • 24:46 - 24:53
    and encrypt take as input something we
  • 24:49 - 24:55
    refer to as the plaintext and this is
  • 24:53 - 24:58
    just some sequence of bytes some data
  • 24:55 - 24:59
    and it takes in a key so something that
  • 24:58 - 25:03
    came as an output of this key generation
  • 24:59 - 25:03
    function and produces
  • 25:04 - 25:09
    what we call the ciphertext and then
  • 25:07 - 25:15
    decrypt does the opposite of this so it
  • 25:09 - 25:23
    takes the ciphertext along with the key
  • 25:15 - 25:25
    and produces the plaintext and this
  • 25:23 - 25:29
    triple of functions has a couple
  • 25:25 - 25:32
    properties one is that like one thing
  • 25:29 - 25:33
    you might expect is that this thing
  • 25:32 - 25:36
    doesn't really tell you all that much
  • 25:33 - 25:45
    about this input to the encryption so
  • 25:36 - 25:47
    property number one is given the
  • 25:45 - 26:02
    ciphertext you can't figure out the
  • 25:47 - 26:03
    plaintext without the key and the other
  • 26:02 - 26:12
    property is kind of the obvious
  • 26:03 - 26:14
    correctness property that if you take
  • 26:12 - 26:17
    something and you encrypt it some
  • 26:14 - 26:20
    message M with a key K and then you
  • 26:17 - 26:24
    decrypt that ciphertext using the same
  • 26:20 - 26:24
    key that gives you back the same message
  • 26:24 - 26:30
    this is the kind of obvious correctness
  • 26:27 - 26:32
    property so does this description make
  • 26:30 - 26:34
    sense does it fit your kind of intuitive
  • 26:32 - 26:36
    understanding of taking some piece of
  • 26:34 - 26:38
    data and obscuring it so you can't
  • 26:36 - 26:40
    really tell anything about the original
  • 26:38 - 26:43
    input but then taking that obscured
  • 26:40 - 26:45
    result and then passing it through some
  • 26:43 - 26:50
    decryption function given that key to
  • 26:45 - 26:52
    retrieve the original input and this
  • 26:50 - 26:53
    this isn't really a rigorous definition
  • 26:52 - 26:55
    of what it means for something to be
  • 26:53 - 27:02
    secure but it's a good enough intuitive
  • 26:55 - 27:03
    definition that we can work with it so
  • 27:02 - 27:08
    any questions about that description
  • 27:03 - 27:10
    so where can symmetric key cryptography be
  • 27:08 - 27:12
    useful we'll talk about a whole bunch of
  • 27:10 - 27:13
    examples later in this lecture but one
  • 27:12 - 27:15
    example we'll talk about right now is
  • 27:13 - 27:17
    encrypting files for storage in an
  • 27:15 - 27:20
    untrusted cloud service
  • 27:17 - 27:24
    so consider say something like Dropbox
  • 27:20 - 27:26
    or Google Drive or things like that when
  • 27:24 - 27:28
    you upload files there you're trusting
  • 27:26 - 27:30
    the service to not look at your files or
  • 27:28 - 27:32
    do anything malicious with them these
  • 27:30 - 27:34
    services like at least the ones I named
  • 27:32 - 27:36
    are not end-to-end encrypted or anything
  • 27:34 - 27:38
    like that like in theory any employee
  • 27:36 - 27:40
    those companies could look at your files
  • 27:38 - 27:42
    now of course these companies have lots
  • 27:40 - 27:43
    of policies and technical controls in
  • 27:42 - 27:45
    place for making sure that that sort of
  • 27:43 - 27:46
    thing doesn't happen but that doesn't
  • 27:45 - 27:49
    mean that it's not technically possible
  • 27:46 - 27:50
    and so one thing you might want to do if
  • 27:49 - 27:53
    you don't want to trust these cloud
  • 27:50 - 27:54
    services to not peek at your data not do
  • 27:53 - 27:55
    like machine learning over them or do
  • 27:54 - 27:57
    other sorts of things that you wouldn't
  • 27:55 - 27:59
    really want is you can just take your
  • 27:57 - 28:04
    files and encrypt them before uploading
  • 27:59 - 28:06
    them to these these web services so does
  • 28:04 - 28:07
    that idea make sense that I can take my
  • 28:06 - 28:09
    file like say my pictures or whatever
  • 28:07 - 28:10
    pass it through an encryption function
  • 28:09 - 28:11
    and produce the cipher text and then
  • 28:10 - 28:13
    place that cipher text on the web
  • 28:11 - 28:15
    service say for backup purposes or
  • 28:13 - 28:17
    whatever and if I ever need that I can
  • 28:15 - 28:19
    retrieve the cipher text then use my key
  • 28:17 - 28:20
    to decrypt it back into the plaintext
  • 28:19 - 28:22
    and then I can use the result for doing
  • 28:20 - 28:29
    whatever I need to do does that make
  • 28:22 - 28:30
    sense yeah yeah so that's that's a good
  • 28:29 - 28:32
    question the question is couldn't
  • 28:30 - 28:35
    anybody else run it through the same
  • 28:32 - 28:36
    encryption program one detail maybe I
  • 28:35 - 28:38
    should have explained in a little bit
  • 28:36 - 28:46
    more detail is this key generation
  • 28:38 - 28:48
    function is randomized and this key has
  • 28:46 - 28:51
    high entropy so going back to that topic
  • 28:48 - 28:55
    we talked about earlier so like an
  • 28:51 - 28:58
    example is we might have AES-256 this is
  • 28:55 - 29:01
    one particular symmetric cipher and this
  • 28:58 - 29:04
    as the name might indicate has 256 bits
  • 29:01 - 29:06
    of entropy in the key and so that means
  • 29:04 - 29:07
    that as long as the attacker like
  • 29:06 - 29:09
    whoever downloads the cipher text from
  • 29:07 - 29:11
    the web service doesn't know your key
  • 29:09 - 29:11
    unless they have some better attack in
  • 29:11 - 29:13
    place
  • 29:11 - 29:15
    they'll have to try all the different
  • 29:13 - 29:17
    possible keys and if they're two to the
  • 29:15 - 29:20
    256 keys that's too many keys to try in
  • 29:17 - 29:21
    a reasonable amount of time does that
  • 29:20 - 29:26
    answer the question okay any other
  • 29:21 - 29:26
    questions yeah
  • 29:35 - 29:39
    that's an excellent question and that
  • 29:37 - 29:40
    leads into what I was going to talk
  • 29:39 - 29:44
    about next so thanks for that question
  • 29:40 - 29:45
    so as you point out like if I lose my
  • 29:44 - 29:47
    key I'm kind of stuck right
  • 29:45 - 29:48
    I need my key to decrypt that's kind of
  • 29:47 - 29:49
    the point of this thing like if I didn't
  • 29:48 - 29:51
    need my key to decrypt then this
  • 29:49 - 29:53
    wouldn't be a very good crypto system
  • 29:51 - 29:55
    and so I can combine this idea of
  • 29:53 - 29:56
    symmetric key cryptography with the
  • 29:55 - 29:58
    topic we just talked about key
  • 29:56 - 30:00
    derivation functions so instead of
  • 29:58 - 30:01
    having some key that's randomly
  • 30:00 - 30:03
    generated with my key generation
  • 30:01 - 30:04
    function say sampling entropy from
  • 30:03 - 30:11
    somewhere on my machine
  • 30:04 - 30:13
    I can have a passphrase and pass it
  • 30:11 - 30:18
    through my key derivation function box
  • 30:13 - 30:23
    and this gives me my key and then I can
  • 30:18 - 30:29
    take my plaintext and combine it with my
  • 30:23 - 30:35
    key in my encrypt function and this
  • 30:29 - 30:37
    produces my ciphertext and I store this
  • 30:35 - 30:39
    cipher text on the web service but now I
  • 30:37 - 30:42
    don't need to save this key instead I
  • 30:39 - 30:44
    can just remember my passphrase and
  • 30:42 - 30:45
    whenever I need my key I can reconstruct
  • 30:44 - 30:48
    it from the key derivation function
  • 30:45 - 30:48
    question
  • 30:57 - 31:00
    yeah so that's a good question the
  • 30:59 - 31:02
    question is is the key derivation
  • 31:00 - 31:05
    function slow enough to prevent
  • 31:02 - 31:06
    brute-force guessing and the answer is
  • 31:05 - 31:09
    it depends on how long your passphrase
  • 31:06 - 31:11
    is so for example if your passphrase is
  • 31:09 - 31:13
    like the string password it's probably
  • 31:11 - 31:14
    gonna get broken very quickly but as
  • 31:13 - 31:16
    long as there's enough entropy in your
  • 31:14 - 31:18
    passphrase this is good enough so like
  • 31:16 - 31:19
    if I was uploading something to Dropbox
  • 31:18 - 31:22
    and I really want it to stay secret I
  • 31:19 - 31:24
    think like a 64-bit passphrase really a
  • 31:22 - 31:25
    passphrase with 64 bits of entropy it
  • 31:24 - 31:28
    would be more than enough in that
  • 31:25 - 31:30
    scenario for example and just a quick
  • 31:28 - 31:32
    demo of this so there are tools to make
  • 31:30 - 31:34
    this really easy to do this is actually
  • 31:32 - 31:37
    one of the exercises but we can take a
  • 31:34 - 31:40
    tool for example called open SSL and use
  • 31:37 - 31:42
    it to apply a symmetric cipher to some
  • 31:40 - 31:44
    file so I had my readme text here for
  • 31:42 - 31:47
    example README.md it has a bunch of
  • 31:44 - 31:51
    stuff in it and I can do openssl
  • 31:47 - 31:54
    aes-256-cbc this is the name of a particular
  • 31:51 - 31:58
    symmetric cipher and I can say that I
  • 31:54 - 32:02
    want to apply this to README.md and
  • 31:58 - 32:04
    produce README.enc.md let's give it
  • 32:02 - 32:05
    some name and then it's asking you for a
  • 32:04 - 32:06
    password so by default this works in
  • 32:05 - 32:08
    this mode where I provide a passphrase
  • 32:06 - 32:10
    which is run through a KDF to produce a key
  • 32:08 - 32:12
    and that's used for encryption so I'll
  • 32:10 - 32:15
    type in some password type it in again
  • 32:12 - 32:19
    and now I produce this README.enc.md file
  • 32:15 - 32:21
    and if I look at this it kind of looks
  • 32:19 - 32:23
    like garbage and that's at a high level
  • 32:21 - 32:25
    the point of a symmetric cipher it
  • 32:23 - 32:26
    produces some cipher text that should be
  • 32:25 - 32:30
    kind of indistinguishable from random
  • 32:26 - 32:34
    data and when I want to decrypt this I
  • 32:30 - 32:38
    can run a similar command openssl
  • 32:34 - 32:40
    aes-256-cbc -d for decrypt take the
  • 32:38 - 32:44
    input from README.enc.md and I
  • 32:40 - 32:49
    like do like README.decrypted.md
  • 32:44 - 32:54
    as the output and I can
  • 32:49 - 32:56
    compare these two files and the
  • 32:54 - 32:58
    correctness property of symmetric
  • 32:56 - 32:59
    cryptography tells me that this should
  • 32:58 - 33:01
    be identical and this indeed is
  • 32:59 - 33:03
    identical if I look at the return value
  • 33:01 - 33:05
    compare returned 0 so that means they're
  • 33:03 - 33:08
    the same file
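Roughly the commands from the demo written out (exact output file names may differ from the recording; newer OpenSSL versions will suggest adding -pbkdf2 here):

```bash
openssl aes-256-cbc -salt -in README.md -out README.enc.md   # prompts for a passphrase
xxd README.enc.md | head                                     # ciphertext looks like random bytes

openssl aes-256-cbc -d -in README.enc.md -out README.decrypted.md
cmp README.md README.decrypted.md && echo identical          # round trip succeeded
```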
  • 33:05 - 33:08
    [Applause]
  • 33:09 - 33:14
    so any questions about symmetric key
  • 33:12 - 33:14
    cryptography yeah
  • 33:20 - 33:26
    so the this particular command did make
  • 33:24 - 33:29
    a new file so it took as input README.md
  • 33:26 - 33:31
    and produces output this file so that is
  • 33:29 - 33:33
    the encrypted version of the file it
  • 33:31 - 33:36
    left the original untouched but then of
  • 33:33 - 33:48
    course I could delete it if I wanted to
  • 33:36 - 33:49
    yeah that's a good question this is
  • 33:48 - 33:50
    something I wasn't gonna talk about in
  • 33:49 - 33:53
    too much detail the question is I
  • 33:50 - 33:56
    provided the salt argument here and
  • 33:53 - 33:58
    where is that stored so the answer is
  • 33:56 - 34:02
    that that is stored in this output here
  • 33:58 - 34:05
    so this output format stores both the
  • 34:02 - 34:07
    salt and the actual output ciphertext so
  • 34:05 - 34:13
    can be used in the reconstruction and
  • 34:07 - 34:15
    decrypt yeah that's correct it doesn't
  • 34:13 - 34:20
    keep any database or anything this is
  • 34:15 - 34:23
    fully self-contained yeah and as John
  • 34:20 - 34:25
    says the salt is not the secret like the
  • 34:23 - 34:34
    the passphrase is what is the secret
  • 34:25 - 34:37
    thing here okay so let's go back to so
  • 34:34 - 34:40
    the so the question is what is salt the
  • 34:37 - 34:42
    idea of a cryptographic salt is probably
  • 34:40 - 34:48
    best explained in the context of hash
  • 34:42 - 34:50
    functions so one common application of
  • 34:48 - 34:51
    hash functions is to store passwords in
  • 34:50 - 34:54
    a password database if I have a website
  • 34:51 - 34:55
    and I have logins for users like people
  • 34:54 - 34:57
    log in with their username and password
  • 34:55 - 35:00
    I don't actually want to store people's
  • 34:57 - 35:01
    passwords in plain text just like as is
  • 35:00 - 35:06
    does anybody know why I wouldn't want to
  • 35:01 - 35:08
    do that yes
  • 35:06 - 35:10
    exactly what if there was a breach and
  • 35:08 - 35:12
    someone got all your data so it's really
  • 35:10 - 35:14
    bad if you leak all your users passwords
  • 35:12 - 35:15
    it's especially bad because a lot of
  • 35:14 - 35:17
    people reuse their passwords across
  • 35:15 - 35:19
    different sites so you'll see attackers
  • 35:17 - 35:20
    break into one thing like there was big
  • 35:19 - 35:22
    yahoo breach a while ago and they find
  • 35:20 - 35:24
    all these usernames and passwords and
  • 35:22 - 35:26
    then they go and try those same login
  • 35:24 - 35:27
    credentials on Google and on Facebook
  • 35:26 - 35:30
    and on YouTube and whatnot these people
  • 35:27 - 35:33
    reuse passwords so it's bad to store
  • 35:30 - 35:34
    plaintext passwords so one thing you
  • 35:33 - 35:35
    should do is you should store hashed
  • 35:34 - 35:38
    passwords with a hash function or
  • 35:35 - 35:39
    ideally a password hashing function
  • 35:38 - 35:42
    that's intentionally designed to be slow
  • 35:39 - 35:44
    but one thing attackers one thing
  • 35:42 - 35:45
    attacker started doing once they realize
  • 35:44 - 35:47
    that people started storing hashed
  • 35:45 - 35:49
    passwords is they built these things
  • 35:47 - 35:53
    called rainbow tables what people did
  • 35:49 - 35:55
    was they took a way of generating big
  • 35:53 - 35:57
    password lists like the kind of model
  • 35:55 - 35:58
    what passwords might look like say take
  • 35:57 - 36:01
    all the dictionary words take all
  • 35:58 - 36:02
    strings of like length from zero to
  • 36:01 - 36:04
    eight and whatnot
  • 36:02 - 36:06
    take all of those and then hash them and
  • 36:04 - 36:09
    produce a big database mapping hashes
  • 36:06 - 36:11
    back to their pre image and so given the
  • 36:09 - 36:12
    output of a hash function rather than
  • 36:11 - 36:14
    have to like brute force it on the fly
  • 36:12 - 36:15
    you can just go look up in this database
  • 36:14 - 36:17
    Oh what is the input that corresponds to
  • 36:15 - 36:19
    this output and people have built these
  • 36:17 - 36:23
    for reasonably large password databases
  • 36:19 - 36:25
    and so one thing that you can do in
  • 36:23 - 36:28
    reaction to that as a defense is rather
  • 36:25 - 36:31
    than storing in your database as I'll
  • 36:28 - 36:35
    write it rather than storing just the
  • 36:31 - 36:42
    hash of the password what you do is you
  • 36:35 - 36:45
    compute what's called a salt value and
  • 36:42 - 36:47
    what this is is a large random string
  • 36:45 - 36:50
    and then what you do is you store in
  • 36:47 - 36:53
    your password database the salt which is
  • 36:50 - 36:55
    not really a secret like you can store
  • 36:53 - 37:00
    this in your password database along
  • 36:55 - 37:05
    with a hash of the password with the
  • 37:00 - 37:08
    salt appended to it why is this useful
  • 37:05 - 37:10
    well this salt is a random unique value
  • 37:08 - 37:12
    for every user and so if someone has the
  • 37:10 - 37:15
    password say password123 on
  • 37:12 - 37:16
    one web service if you are just storing
  • 37:15 - 37:18
    the hash of the password then the hash
  • 37:16 - 37:19
    would be the same on both Web Services
  • 37:18 - 37:21
    right because this hash
  • 37:19 - 37:23
    function is a deterministic function but
  • 37:21 - 37:26
    now since we're using this randomized
  • 37:23 - 37:28
    salt value we store the hash of the
  • 37:26 - 37:30
    password plus the salt and so even if
  • 37:28 - 37:33
    someone's using the same password on
  • 37:30 - 37:35
    multiple sites this thing looks
  • 37:33 - 37:37
    different in both cases and it makes it
  • 37:35 - 37:41
    so these big databases mapping these
  • 37:37 - 37:42
    short passwords or hash outputs to the
  • 37:41 - 37:45
    short passwords that they came from
  • 37:42 - 37:47
    those are no longer useful when you have
  • 37:45 - 37:49
    salted passwords you kind of need to do
  • 37:47 - 37:51
    the brute-force attack for every user
  • 37:49 - 37:52
    once you find their salt value rather
  • 37:51 - 37:54
    than being able to use this big
  • 37:52 - 37:59
    precomputed database does that answer
  • 37:54 - 38:00
    the question of what a salt is and so
  • 37:59 - 38:03
    that's what that salt argument is
  • 38:00 - 38:03
    related to
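A minimal illustration of the effect of a salt, using plain sha1sum only for demonstration (a real password database should use a deliberately slow function such as bcrypt, scrypt, Argon2, or PBKDF2):

```bash
salt1=$(head -c 8 /dev/urandom | xxd -p)   # per-user random salt
salt2=$(head -c 8 /dev/urandom | xxd -p)
printf 'hunter2%s' "$salt1" | sha1sum      # same password...
printf 'hunter2%s' "$salt2" | sha1sum      # ...two unrelated-looking stored hashes
```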
  • 38:06 - 38:13
    let's see so any questions about
  • 38:08 - 38:17
    anything we talked about so far great
  • 38:13 - 38:20
    okay so the I'm gonna go ahead and erase
  • 38:17 - 38:22
    this and then the last topic we'll talk
  • 38:20 - 38:24
    about is one of the most exciting
  • 38:22 - 38:25
    developments of cryptography happen
  • 38:24 - 38:27
    quite a while ago but it's still a
  • 38:25 - 38:42
    really cool concept something called
  • 38:27 - 38:44
    asymmetric key cryptography and so this
  • 38:42 - 38:46
    is an idea that actually enables a lot
  • 38:44 - 38:49
    of the security and privacy related
  • 38:46 - 38:50
    features of basically anything you use
  • 38:49 - 38:53
    today like when you go and type in
  • 38:50 - 38:57
    like www.google.com cryptography
  • 38:53 - 38:58
    is used as part of what goes on there so
  • 38:57 - 39:00
    this is going to look pretty similar to
  • 38:58 - 39:05
    what we talked about in symmetric key
  • 39:00 - 39:06
    cryptography except with a twist there's
  • 39:05 - 39:08
    a key generation function which
  • 39:06 - 39:11
    similarly is randomized but instead of
  • 39:08 - 39:17
    producing a single key it produces a
  • 39:11 - 39:22
    pair of keys two different things one of
  • 39:17 - 39:25
    which is referred to as a public key and
  • 39:22 - 39:28
    the other is referred to as a private
  • 39:25 - 39:30
    key and then these can be used for
  • 39:28 - 39:32
    encryption and decryption in a manner
  • 39:30 - 39:33
    kind of similar to symmetric key crypto
  • 39:32 - 39:35
    except these different keys have
  • 39:33 - 39:39
    different uses now so we have an
  • 39:35 - 39:41
    encryption function which takes in a
  • 39:39 - 39:47
    plaintext I'll write P here and it
  • 39:41 - 39:49
    takes in the public key and praises the
  • 39:47 - 39:53
    ciphertext and then I have a decryption
  • 39:49 - 39:59
    function which takes in my ciphertext
  • 39:53 - 40:06
    and the private key and gives me back my
  • 39:59 - 40:09
    plaintext and then similarly to those
  • 40:06 - 40:11
    two properties we had over there given
  • 40:09 - 40:14
    just the ciphertext we can't figure out
  • 40:11 - 40:16
    the plaintext unless we have the private
  • 40:14 - 40:17
    key and then we have the obvious
  • 40:16 - 40:19
    correctness property that if we encrypt
  • 40:17 - 40:21
    something with the private key
  • 40:19 - 40:23
    sorry encrypt something with the public
  • 40:21 - 40:25
    key and then take that cypher text and
  • 40:23 - 40:27
    try decrypting it with the corresponding
  • 40:25 - 40:29
    private key that came from this key
  • 40:27 - 40:31
    generation process that outputs these
  • 40:29 - 40:36
    two different things at once then I get
  • 40:31 - 40:38
    the same result back so this is very
  • 40:36 - 40:39
    similar to what's above but there's a
  • 40:38 - 40:41
    twist that we have these two different
  • 40:39 - 40:43
    keys that have different functions and
  • 40:41 - 40:45
    it's really neat that this public key
  • 40:43 - 40:47
    can actually be made as the name
  • 40:45 - 40:49
    indicates public like I could be using a
  • 40:47 - 40:51
    crypto system that works like this post
  • 40:49 - 40:53
    a public key on the internet for anybody
  • 40:51 - 40:55
    to see but keep my private key secret
  • 40:53 - 40:56
    and then I have this interesting
  • 40:55 - 40:58
    property that anybody on the internet
  • 40:56 - 41:01
    can take any piece of content and
  • 40:58 - 41:02
    encrypt it for me using my public key
  • 41:01 - 41:04
    and send it over the Internet
  • 41:02 - 41:06
    to me and then I can decrypt it using my
  • 41:04 - 41:08
    private key and as long as my private
  • 41:06 - 41:10
    key stays secret it doesn't matter
  • 41:08 - 41:12
    if my public key is available to anybody
  • 41:10 - 41:15
    on the Internet so here's where the
  • 41:12 - 41:19
    asymmetry comes from before we were in a
  • 41:15 - 41:21
    scenario where like suppose I was on the
  • 41:19 - 41:21
    internet but you weren't like talking to
  • 41:21 - 41:23
    me face-to-face
  • 41:21 - 41:25
    and you wanted to send me some data over
  • 41:23 - 41:27
    the Internet over some unencrypted
  • 41:25 - 41:28
    Channel where anybody could listen on
  • 41:27 - 41:30
    what you were saying and you wanted to
  • 41:28 - 41:32
    use symmetric key cryptography well we
  • 41:30 - 41:33
    need some way of exchanging a key in
  • 41:32 - 41:35
    advance so that you could encrypt some
  • 41:33 - 41:37
    plaintext with a key and give me that
  • 41:35 - 41:39
    ciphertext over the Internet so that I
  • 41:37 - 41:41
    could then decrypt it with that key in
  • 41:39 - 41:43
    symmetric key crypto if the key is public
  • 41:41 - 41:45
    it's game over like anybody can decrypt
  • 41:43 - 41:47
    your stuff whereas in asymmetric key
  • 41:45 - 41:49
    cryptography I could take my public key
  • 41:47 - 41:51
    and like post it on a bulletin board on
  • 41:49 - 41:53
    the Internet and you can go look at that
  • 41:51 - 41:55
    take some contents and encrypt them for
  • 41:53 - 41:56
    me and then send them over and that
  • 41:55 - 41:57
    would be totally fine
  • 41:56 - 42:00
    you can only decrypt it with the private
  • 41:57 - 42:02
    key and so one analogy that may be
  • 42:00 - 42:06
    helpful is comparing these mathematical
  • 42:02 - 42:07
    ideas to physical locks so you probably
  • 42:06 - 42:09
    have a lock on your door to your house
  • 42:07 - 42:11
    and you can put in a key and like turn
  • 42:09 - 42:12
    the thing in order to lock the door or
  • 42:11 - 42:14
    you can turn it the other way to unlock
  • 42:12 - 42:16
    the door so there's a single key and it
  • 42:14 - 42:17
    can both lock and unlock the door
  • 42:16 - 42:19
    but now consider this alternative
  • 42:17 - 42:21
    construction which you might use if say
  • 42:19 - 42:23
    I want you to be able to send me a
  • 42:21 - 42:25
    message and have it be sent over the
  • 42:23 - 42:27
    internet and you and I don't really need a
  • 42:25 - 42:30
    way to exchange a key with you
  • 42:27 - 42:31
    beforehand I could get a box which you
  • 42:30 - 42:32
    could put a letter inside and you can
  • 42:31 - 42:36
    close the box and I can get one of the
  • 42:32 - 42:38
    padlock things which I can give to you I
  • 42:36 - 42:40
    could like take the padlock and open
  • 42:38 - 42:42
    it and give it to you and you at your
  • 42:40 - 42:43
    own leisure could put your message
  • 42:42 - 42:46
    inside a box and take this padlock which
  • 42:43 - 42:48
    is open and shut it around the box and
  • 42:46 - 42:50
    then send it over to me and then I could
  • 42:48 - 42:52
    put in my key and unlock it so do you
  • 42:50 - 42:54
    see how there is this asymmetry there as
  • 42:52 - 42:56
    opposed to the key that I used to open
  • 42:54 - 42:58
    the door to my house where the same key
  • 42:56 - 43:00
    opens and closes the thing instead I
  • 42:58 - 43:02
    give you this open padlock that you have
  • 43:00 - 43:04
    the ability to close but not open and
  • 43:02 - 43:06
    after you closed it I can use my key
  • 43:04 - 43:07
    which I've kept secret in order to open
  • 43:06 - 43:09
    the thing and retrieve what's inside
  • 43:07 - 43:11
    maybe this analogy is helpful maybe it's
  • 43:09 - 43:14
    not the mathematical construction works
  • 43:11 - 43:17
    just fine whether or not the analogy works for you
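
    to make that interface concrete, here's roughly what key generation, encryption, and
    decryption look like with openssl's RSA tooling -- the file names are made up, and plain
    RSA like this only handles short messages; the hybrid encryption discussion at the end of
    the lecture covers how longer messages are handled in practice:

        openssl genrsa -out private.pem 2048                    # key generation: produces the key pair
        openssl rsa -in private.pem -pubout -out public.pem     # extract the shareable public half

        # anyone who has public.pem can encrypt for me:
        echo 'meet me at 6pm' > plaintext.txt
        openssl pkeyutl -encrypt -pubin -inkey public.pem -in plaintext.txt -out ciphertext.bin

        # but only the holder of private.pem can decrypt:
        openssl pkeyutl -decrypt -inkey private.pem -in ciphertext.bin
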
  • 43:14 - 43:19
    so any questions about asymmetric key
  • 43:17 - 43:21
    encryption and decryption and how it
  • 43:19 - 43:26
    relates to symmetric key crypto how it's
  • 43:21 - 43:28
    a little bit different so before we talk
  • 43:26 - 43:30
    about applications of this idea I'm
  • 43:28 - 43:34
    going to talk about one other set of
  • 43:30 - 43:36
    concepts in asymmetric key cryptography
  • 43:34 - 43:38
    so these crypto systems give you another
  • 43:36 - 43:40
    set of tools which are related to
  • 43:38 - 43:42
    encryption and decryption something
  • 43:40 - 43:44
    called signing and verifying and this is
  • 43:42 - 43:46
    kind of similar to the real world like I
  • 43:44 - 43:48
    can get a document and sign it with my
  • 43:46 - 43:50
    signature except real world signatures
  • 43:48 - 43:52
    are I don't think that hard to forge
  • 43:50 - 43:56
    these are pretty hard to forge and
  • 43:52 - 43:58
    therefore more useful what do
  • 43:56 - 44:01
    signature schemes look like there's a
  • 43:58 - 44:08
    function sign that takes in some message
  • 44:01 - 44:10
    and the private key so notice this this
  • 44:08 - 44:15
    is the private key not the public key
  • 44:10 - 44:18
    and it produces a signature and then
  • 44:15 - 44:24
    there's another function verify which
  • 44:18 - 44:28
    takes in the message the signature and
  • 44:24 - 44:28
    the public key this time
  • 44:30 - 44:36
    and it tells me it returns a boolean
  • 44:34 - 44:41
    whether or not the signature checks out
  • 44:36 - 44:44
    and then this pair of functions has the
  • 44:41 - 44:45
    property that again these are kind of
  • 44:44 - 44:49
    properties that follow the intuition
  • 44:45 - 44:54
    that come from physical signatures that
  • 44:49 - 44:57
    uh without the private key it's hard to
  • 44:54 - 44:59
    produce a signature for any message such
  • 44:57 - 45:00
    that you can give the message and the
  • 44:59 - 45:02
    signature and the public key to the
  • 45:00 - 45:08
    verify function to get it to return true
  • 45:02 - 45:08
    like at a high level it's hard to forge
  • 45:10 - 45:21
    it's hard to forge a signature of course
  • 45:12 - 45:24
    without the private key and then there's
  • 45:21 - 45:26
    the obvious correctness property that if
  • 45:24 - 45:27
    you signed a thing with a public key and
  • 45:26 - 45:29
    then try verifying it with the
  • 45:27 - 45:30
    corresponding sorry if you sign a thing
  • 45:29 - 45:32
    with the private key and try to verify
  • 45:30 - 45:34
    it with the corresponding public key
  • 45:32 - 45:38
    it returns okay that this verification
  • 45:34 - 45:41
    checks out so these are two different
  • 45:38 - 45:44
    kinds of things you can do with
  • 45:41 - 45:46
    asymmetric key cryptosystems
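
    continuing the same openssl sketch with the same made-up key files, signing uses the
    private key and verification uses the public key:

        echo 'release v1.0 is ready' > message.txt
        openssl dgst -sha256 -sign private.pem -out message.sig message.txt      # sign with the private key
        openssl dgst -sha256 -verify public.pem -signature message.sig message.txt
        # prints "Verified OK"; changing even one byte of message.txt or message.sig makes it fail
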
  • 45:44 - 45:48
    an example of an asymmetric key
  • 45:46 - 45:50
    cryptosystem that you might have heard
  • 45:48 - 45:52
    of is something called RSA so RSA was
  • 45:50 - 45:53
    designed by a number of people one of
  • 45:52 - 45:59
    whom is Ron Rivest who's a professor
  • 45:53 - 46:02
    here so there are a couple of interesting
  • 45:59 - 46:03
    applications of asymmetric key crypto
  • 46:02 - 46:05
    actually like tons and tons and tons you
  • 46:03 - 46:07
    could spend like days talking about this
  • 46:05 - 46:09
    but a couple examples are email
  • 46:07 - 46:10
    encryption so we talked a little bit
  • 46:09 - 46:12
    about sending messages what we can do
  • 46:10 - 46:14
    with asymmetric key crypto is that you
  • 46:12 - 46:17
    can have public keys posted online I
  • 46:14 - 46:19
    think some of the instructors have PGP
  • 46:17 - 46:20
    public keys on their website so for
  • 46:19 - 46:22
    example you go to my website or John's
  • 46:20 - 46:25
    website you'll find a public key and
  • 46:22 - 46:27
    then what you can do is you can send us
  • 46:25 - 46:29
    an encrypted email and so even if that
  • 46:27 - 46:30
    message goes through Gmail or whatever
  • 46:29 - 46:32
    other mail service or MIT's
  • 46:30 - 46:34
    mail servers if there happens to be an
  • 46:32 - 46:36
    attacker snooping on the messages they
  • 46:34 - 46:38
    can't make any sense of their contents
  • 46:36 - 46:40
    because they're all encrypted and this
  • 46:38 - 46:42
    is really cool because you can do this
  • 46:40 - 46:43
    without kind of finding us in person and
  • 46:42 - 46:44
    exchanging keys
  • 46:43 - 46:46
    which you might have to do in a
  • 46:44 - 46:48
    symmetric key cryptosystem you can just
  • 46:46 - 46:49
    find our public key which can be posted
  • 46:48 - 46:52
    online without causing any issues and
  • 46:49 - 46:54
    then send us encrypted email
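
    with GnuPG, for example, that flow looks roughly like this -- the key file and the
    email address here are made up:

        gpg --import instructor.pub.asc                     # import the public key you found posted online
        gpg --encrypt --armor --recipient instructor@example.com message.txt
        # produces message.txt.asc, which you can paste into any email; only the
        # holder of the matching private key can read it (gpg --decrypt message.txt.asc)
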
  • 46:52 - 46:56
    another thing asymmetric key crypto is used for
  • 46:54 - 46:58
    is private messaging so raise your hand
  • 46:56 - 47:01
    if you've used anything like Signal or
  • 46:58 - 47:02
    Telegram or I think WhatsApp is in
  • 47:01 - 47:05
    theory end-to-end encrypted so a good
  • 47:02 - 47:07
    number of you these private messaging
  • 47:05 - 47:09
    applications also use asymmetric key
  • 47:07 - 47:12
    crypto to establish private
  • 47:09 - 47:14
    communication channels basically every
  • 47:12 - 47:16
    person has associated with them a key
  • 47:14 - 47:18
    pair and so your device has run this key
  • 47:16 - 47:20
    generation function produced a public
  • 47:18 - 47:22
    key and a private key and automatically
  • 47:20 - 47:24
    posted your public key to the internet
  • 47:22 - 47:26
    so for example if you're using signal
  • 47:24 - 47:27
    your public key is on the signal servers
  • 47:26 - 47:30
    and then when someone wants to contact
  • 47:27 - 47:32
    you their phone can look up your public
  • 47:30 - 47:33
    key and once it's retrieved your
  • 47:32 - 47:35
    public key they can encrypt information
  • 47:33 - 47:37
    for you this is a kind of approximation
  • 47:35 - 47:39
    of how their algorithm works but at a
  • 47:37 - 47:41
    high level that's what's going on
  • 47:39 - 47:43
    another neat application of asymmetric
  • 47:41 - 47:45
    key crypto is we were talking about
  • 47:43 - 47:46
    earlier like making sure you have the
  • 47:45 - 47:47
    right software you downloaded from the
  • 47:46 - 47:49
    internet
  • 47:47 - 47:51
    asymmetric key crypto can be used to
  • 47:49 - 47:52
    sign software releases and this is
  • 47:51 - 47:55
    something that people do for example
  • 47:52 - 47:56
    like Debian packages or whatever things
  • 47:55 - 47:58
    you download from the internet the
  • 47:56 - 48:00
    developer will try to sign their
  • 47:58 - 48:01
    software so that you can make sure that
  • 48:00 - 48:02
    whatever you've downloaded from the
  • 48:01 - 48:05
    internet is actually the right thing
  • 48:02 - 48:07
    that came from the right person we
  • 48:05 - 48:08
    talked about in the git lecture all the
  • 48:07 - 48:10
    interesting things you can do with git
  • 48:08 - 48:15
    one thing we didn't cover was signing
  • 48:10 - 48:18
    related functionality in git so git has
  • 48:15 - 48:20
    commits and you can associate with
  • 48:18 - 48:21
    commits something called tags at a high
  • 48:20 - 48:23
    level you can basically take a git
  • 48:21 - 48:26
    commit and attach a signature to it
  • 48:23 - 48:29
    which binds your public key to this
  • 48:26 - 48:31
    commit and then anybody who has your
  • 48:29 - 48:33
    public key can take the commit and your
  • 48:31 - 48:36
    public key and make sure that there's a
  • 48:33 - 48:41
    legitimate signature on the key or sorry
  • 48:36 - 48:45
    on the commit so let me go to like some
  • 48:41 - 48:47
    random repository that I have I can look
  • 48:45 - 48:52
    at a bunch of tags associated with
  • 48:47 - 48:55
    the repository and if I look at the raw data
  • 48:52 - 48:55
    associated with this tag
  • 48:56 - 49:03
    it has some metadata and then a blob of
  • 48:59 - 49:06
    like ASCII-encoded information and I
  • 49:03 - 49:09
    can use the git tag -v (verify)
  • 49:06 - 49:11
    command to make sure that oh this is a
  • 49:09 - 49:13
    good signature from this person who happens
  • 49:11 - 49:14
    to be me so I sign the software release
  • 49:13 - 49:16
    so that anybody who downloads it from
  • 49:14 - 49:18
    the Internet can make sure that they
  • 49:16 - 49:32
    actually got an authentic copy yes
  • 49:18 - 49:34
    question so the question is what exactly
  • 49:32 - 49:38
    is the verify function doing or what is
  • 49:34 - 49:40
    it checking against like if you want
  • 49:38 - 49:41
    to know mathematically what's going on
  • 49:40 - 49:44
    talk to me
  • 49:41 - 49:46
    after this lecture but from kind of
  • 49:44 - 49:48
    an API perspective what's going on here
  • 49:46 - 49:50
    is that the signature and also the
  • 49:48 - 49:52
    message here are just a blob of bytes
  • 49:50 - 49:56
    and it happens to be the case that these
  • 49:52 - 49:59
    things are designed such that basically
  • 49:56 - 50:01
    if I take for some particular public key
  • 49:59 - 50:03
    like if you take my public key it's
  • 50:01 - 50:06
    impossible for you without knowledge of
  • 50:03 - 50:08
    my private key for any message to find a
  • 50:06 - 50:11
    second argument to this function that
  • 50:08 - 50:13
    makes it return true you can kind of
  • 50:11 - 50:14
    compare it to signing a document like
  • 50:13 - 50:17
    you don't know how to forge my signature
  • 50:14 - 50:19
    I can take any piece of paper and sign
  • 50:17 - 50:21
    it and then anybody who knows what my
  • 50:19 - 50:22
    signature looks like I can show my
  • 50:21 - 50:25
    document to you and you can be like yeah that
  • 50:22 - 50:27
    checks out but nobody without the
  • 50:25 - 50:30
    private key can produce a signature that
  • 50:27 - 50:35
    will make this function return true for
  • 50:30 - 50:36
    any particular message and any related
  • 50:35 - 50:39
    questions or do you want me to explain it
  • 50:36 - 50:39
    any other way or does that make sense
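
    to tie this back to the git demo from a minute ago, the commands involved look roughly
    like this -- the tag name is made up, and signing assumes you already have a GPG key
    configured for git:

        git tag -s v1.0 -m 'release v1.0'    # create a signed tag: attaches a signature made with your private key
        git cat-file -p v1.0                 # show the raw tag object: metadata plus the ASCII-armored signature
        git tag -v v1.0                      # verify the signature against the tagger's public key in your keyring
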
  • 50:50 - 50:54
    so any questions about signing software
  • 50:53 - 50:56
    or any of the other handful of
  • 50:54 - 51:01
    applications we talked about of asymmetric
  • 50:56 - 51:02
    key crypto well so one final thing I
  • 51:01 - 51:05
    want to talk about we're almost out of
  • 51:02 - 51:07
    time is key distribution this is a kind
  • 51:05 - 51:08
    of interesting side effect of asymmetric
  • 51:07 - 51:09
    key cryptography
  • 51:08 - 51:12
    it enables a bunch of interesting
  • 51:09 - 51:13
    functionality like I can post my public
  • 51:12 - 51:15
    key on the internet you can go find it
  • 51:13 - 51:16
    and send me encrypted email but how do
  • 51:15 - 51:18
    you know that the public key you found is
  • 51:16 - 51:19
    actually my public key it seems like
  • 51:18 - 51:23
    there's a bootstrapping problem here
  • 51:19 - 51:25
    right so there are a couple this is like
  • 51:23 - 51:28
    a really interesting and really hard
  • 51:25 - 51:29
    real-world problem and there are a
  • 51:28 - 51:32
    couple different approaches you might
  • 51:29 - 51:34
    take to this problem one is kind of a
  • 51:32 - 51:35
    lame solution but this thing solves a
  • 51:34 - 51:37
    lot of cryptography problems which is to
  • 51:35 - 51:40
    exchange the information out-of-band
  • 51:37 - 51:41
    what that means is if you want to send me
  • 51:40 - 51:44
    encrypted email just talk to me
  • 51:41 - 51:45
    after class I'll give you my public key
  • 51:44 - 51:47
    on a piece of paper and since you were
  • 51:45 - 51:48
    talking to me in person like you know
  • 51:47 - 51:50
    that it's actually my public key not
  • 51:48 - 51:52
    just somebody like hacked my website and
  • 51:50 - 51:53
    stuck some random number on there that
  • 51:52 - 51:54
    solves the problem but it's not the most
  • 51:53 - 51:56
    elegant there are a couple other
  • 51:54 - 51:59
    approaches that different applications
  • 51:56 - 52:01
    use so those of you who use signal have
  • 51:59 - 52:03
    you ever encountered the phrase safety
  • 52:01 - 52:07
    number like verify your safety number
  • 52:03 - 52:09
    with so and so so with signal they have
  • 52:07 - 52:11
    a way of exchanging public keys which is
  • 52:09 - 52:13
    through the signal servers whoever runs
  • 52:11 - 52:14
    the signal service just maintains on
  • 52:13 - 52:16
    their servers basically a mapping from
  • 52:14 - 52:18
    phone numbers to public keys and when I
  • 52:16 - 52:19
    say oh I want to message this person
  • 52:18 - 52:20
    with this number my phone just goes and
  • 52:19 - 52:22
    retrieves their public key from the
  • 52:20 - 52:24
    internet and then encrypts the message
  • 52:22 - 52:27
    for that public key now does anybody see
  • 52:24 - 52:27
    a problem with the setup
  • 52:30 - 52:39
    yeah yeah exactly the Signal servers are a
  • 52:37 - 52:41
    point of failure there because if the
  • 52:39 - 52:43
    signal servers give me the wrong public
  • 52:41 - 52:45
    key like suppose Signal just produces a
  • 52:43 - 52:46
    new key pair and gives me their public
  • 52:45 - 52:48
    key now they can read all my messages
  • 52:46 - 52:50
    and they could even sit in between and
  • 52:48 - 52:52
    transparently decrypt the messages I
  • 52:50 - 52:53
    send them and then re encrypt them and
  • 52:52 - 52:55
    send them on to their final destination
  • 52:53 - 52:58
    like basically I need some way of
  • 52:55 - 53:00
    authenticating the public key I get and
  • 52:58 - 53:01
    so signal has one solution to this which
  • 53:00 - 53:04
    is also just kind of punting the issue
  • 53:01 - 53:05
    to out-of-band key exchange you can meet
  • 53:04 - 53:07
    up with somebody and they have a
  • 53:05 - 53:09
    slightly streamlined flow where they show
  • 53:07 - 53:10
    QR codes on the screen you take one
  • 53:09 - 53:11
    phone and take a picture of the other
  • 53:10 - 53:13
    phone screen and vice versa
  • 53:11 - 53:15
    and now you've exchanged public keys in
  • 53:13 - 53:16
    person and from that point on you've
  • 53:15 - 53:19
    bootstrapped your encrypted end-to-end
  • 53:16 - 53:22
    communication Signal also has the
  • 53:19 - 53:24
    approach of pinning a public
  • 53:22 - 53:26
    key so once you know that a particular
  • 53:24 - 53:28
    phone number has a particular public key
  • 53:26 - 53:30
    your phone remembers that and if that
  • 53:28 - 53:32
    ever changes it'll complain to you
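
    this trust-on-first-use idea also shows up in ssh, which we covered earlier in the
    class -- the host name here is made up:

        ssh user@example.com    # first connection: ssh shows the server's key fingerprint, asks you to
                                # confirm it, and records it in ~/.ssh/known_hosts
        ssh user@example.com    # later connections succeed silently, but if the recorded key ever
                                # changes ssh prints a loud warning and refuses to connect
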
  • 53:30 - 53:35
    and then there are a couple other solutions
  • 53:32 - 53:37
    to this problem PGP a tool that used
  • 53:35 - 53:38
    to be popular a while ago has this idea
  • 53:37 - 53:40
    of web of trust so like I trust people
  • 53:38 - 53:42
    who my friends trust so if like John has
  • 53:40 - 53:44
    done an out-of-band exchange with say my
  • 53:42 - 53:46
    professor then I can probably email my
  • 53:44 - 53:47
    professor because like I know that John
  • 53:46 - 53:49
    trusts my professor and I trust John so
  • 53:47 - 53:50
    you got this chain of trust through
  • 53:49 - 53:52
    there that's one interesting approach
  • 53:50 - 53:54
    and then another model that came out
  • 53:52 - 53:56
    pretty recently is something that a tool
  • 53:54 - 54:00
    called Keybase uses this is a really
  • 53:56 - 54:00
    neat whoops
  • 54:00 - 54:05
    there's a website called keybase.io and
  • 54:04 - 54:07
    they have a really interesting solution
  • 54:05 - 54:09
    to this bootstrapping problem which is
  • 54:07 - 54:11
    social proof so say you probably have
  • 54:09 - 54:14
    your friends on Facebook and on Twitter
  • 54:11 - 54:16
    and whatnot and it's probably pretty
  • 54:14 - 54:17
    hard for an attacker to break into your
  • 54:16 - 54:18
    friend's Facebook account at the same
  • 54:17 - 54:20
    time as their Twitter account at the
  • 54:18 - 54:21
    same time as their Hacker News account
  • 54:20 - 54:23
    and so on and so there's this
  • 54:21 - 54:26
    interesting way of binding public keys
  • 54:23 - 54:28
    to a set of social identities such that
  • 54:26 - 54:30
    you can retrieve a public key once you
  • 54:28 - 54:33
    trust some number of social identities
  • 54:30 - 54:34
    corresponding to your friend we have
  • 54:33 - 54:35
    links to these in the lecture notes if
  • 54:34 - 54:38
    you want to see these things in more
  • 54:35 - 54:41
    detail so that's it for our security and
  • 54:38 - 54:43
    cryptography lecture and tomorrow's
  • 54:41 - 54:43
    lecture will be on a random collection
  • 54:43 - 54:45
    of topics
  • 54:43 - 54:49
    that your instructors find interesting
  • 54:45 - 54:49
    so hopefully see you in lecture tomorrow
  • 54:51 - 54:54
    I'll also be here for a couple of
  • 54:53 - 55:09
    minutes after class if anybody has
  • 54:54 - 55:10
    questions yes okay so John feel free to
  • 55:09 - 55:11
    leave if you have to leave but I think
  • 55:10 - 55:12
    nobody's using the classroom after us
  • 55:11 - 55:15
    I'm going to talk about one other
  • 55:12 - 55:17
    interesting topic so john brought up the
  • 55:15 - 55:19
    fact that asymmetric key cryptography
  • 55:17 - 55:23
    is slow and symmetric key cryptography
  • 55:19 - 55:26
    is fast and so in practice you don't
  • 55:23 - 55:28
    really use just asymmetric key
  • 55:26 - 55:31
    cryptography by itself it's usually used
  • 55:28 - 55:38
    to bootstrap a more sophisticated
  • 55:31 - 55:41
    protocol that you're using one thing you
  • 55:38 - 55:41
    might want to do is use asymmetric key
  • 55:41 - 55:44
    cryptography
  • 55:41 - 55:46
    for sending encrypted email right we
  • 55:44 - 55:48
    talked about that example and the way
  • 55:46 - 55:49
    that works isn't what you might have
  • 55:48 - 55:51
    guessed from our straightforward
  • 55:49 - 55:53
    explanation of asymmetric key crypto
  • 55:51 - 55:55
    like you don't just use that encrypt
  • 55:53 - 55:58
    function up there and call it a day in
  • 55:55 - 56:05
    practice what you do is you use hybrid
  • 55:58 - 56:07
    encryption which is a combination of
  • 56:05 - 56:09
    symmetric key and asymmetric key
  • 56:07 - 56:12
    cryptography
  • 56:09 - 56:14
    what you do is here I'll draw this as a
  • 56:12 - 56:21
    big block diagram you take your message
  • 56:14 - 56:23
    m and then I have my public key that I
  • 56:21 - 56:25
    want to encrypt for but rather than just
  • 56:23 - 56:27
    take these two things and pass it
  • 56:25 - 56:36
    through the encryption up there what I
  • 56:27 - 56:42
    do is I use the symmetric key gen
  • 56:36 - 56:43
    function to produce a symmetric key okay
  • 56:42 - 56:45
    I'm gonna like prepend this with
  • 56:43 - 56:46
    symmetric so we can distinguish it from
  • 56:45 - 56:49
    the public key key generation function
  • 56:46 - 56:52
    and then what I do is I take these two
  • 56:49 - 56:55
    things pass them through my symmetric
  • 56:52 - 56:55
    encryption box
  • 57:02 - 57:14
    this produces the ciphertext and now
  • 57:09 - 57:16
    this by itself to the sender sorry this
  • 57:14 - 57:18
    by itself to the receiver who has the
  • 57:16 - 57:20
    private key corresponding to this public
  • 57:18 - 57:21
    key here this is not really useful right
  • 57:20 - 57:26
    because this is encrypted with a
  • 57:21 - 57:29
    symmetric cipher with this key K that
  • 57:26 - 57:31
    came from this function that I ran on my
  • 57:29 - 57:33
    local machine so I need some way of
  • 57:31 - 57:35
    getting this to the person who actually
  • 57:33 - 57:38
    needs to decrypt the email and so what I
  • 57:35 - 57:40
    do is I take this thing and now this
  • 57:38 - 57:41
    email might have been big and I use
  • 57:40 - 57:43
    symmetric encryption with that because
  • 57:41 - 57:45
    symmetric encryption is fast but this
  • 57:43 - 57:47
    key is small like it might be 256 bits
  • 57:45 - 57:49
    or something so I can take this thing
  • 57:47 - 58:06
    and encrypt it with asymmetric
  • 57:49 - 58:10
    encryption using the public key and this
  • 58:06 - 58:13
    gives me an encrypted key and this thing
  • 58:10 - 58:15
    can be decrypted using the private key
  • 58:13 - 58:18
    corresponding to that public key to
  • 58:15 - 58:22
    reconstruct this so this is on the
  • 58:18 - 58:24
    sender's end now the receiver gets this
  • 58:22 - 58:25
    and this and kind of does these things
  • 58:24 - 58:28
    backwards so you start with the
  • 58:25 - 58:30
    encrypted key and use asymmetric
  • 58:28 - 58:31
    decryption using your public using your
  • 58:30 - 58:34
    private key that corresponds to the
  • 58:31 - 58:35
    posted public key to reconstruct this
  • 58:34 - 58:38
    key that was used for the symmetric
  • 58:35 - 58:40
    encryption box and then use symmetric
  • 58:38 - 58:42
    key decryption using that key that was
  • 58:40 - 58:46
    reconstructed to take this ciphertext
  • 58:42 - 58:48
    and produce the original message so
  • 58:46 - 58:50
    there's a kind of interesting example of
  • 58:48 - 58:57
    how in practice symmetric and asymmetric
  • 58:50 - 58:57
    key cryptography are combined question
  • 59:01 - 59:08
    so the question is will you be using the
  • 59:03 - 59:11
    same symmetric key generators yes okay
  • 59:08 - 59:14
    so you need to kind of agree ahead of
  • 59:11 - 59:16
    time which box you're using here so you
  • 59:14 - 59:21
    might be like oh I'm going to use AES
  • 59:16 - 59:25
    256 GCM up here but this is a well known
  • 59:21 - 59:26
    function and it's public like the
  • 59:25 - 59:28
    attacker is allowed to know all the
  • 59:26 - 59:29
    parameters of this function the key is the
  • 59:28 - 59:34
    only secret thing that the attacker
  • 59:29 - 59:41
    doesn't know any other questions
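
    here's a rough openssl sketch of that hybrid scheme, reusing the made-up key files from
    earlier -- openssl enc doesn't support GCM, so this uses AES-256-CBC for the symmetric
    part, and the random value is used as a passphrase that openssl derives the actual AES
    key from:

        # sender:
        openssl rand -hex 32 > session.key                    # fresh random symmetric key material
        openssl enc -aes-256-cbc -pbkdf2 -in email.txt -out email.enc -pass file:session.key
        openssl pkeyutl -encrypt -pubin -inkey public.pem -in session.key -out session.key.enc
        # send email.enc and session.key.enc

        # receiver:
        openssl pkeyutl -decrypt -inkey private.pem -in session.key.enc -out session.key
        openssl enc -d -aes-256-cbc -pbkdf2 -in email.enc -out email.txt -pass file:session.key
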
  • 59:34 - 59:42
    yeah that's a really good question what
  • 59:41 - 59:46
    kind of data is important enough to
  • 59:42 - 59:48
    encrypt and I think that depends on your
  • 59:46 - 59:49
    threat model like who what kind of
  • 59:48 - 59:53
    attackers are you concerned about what
  • 59:49 - 59:54
    are you trying to protect against so you
  • 59:53 - 59:56
    might have the stance that you just
  • 59:54 - 59:58
    don't really care and that like anything
  • 59:56 - 59:59
    you communicate with anybody is allowed
  • 59:58 - 60:01
    to be public I might be willing to like
  • 59:59 - 60:03
    post all my conversation with everybody
  • 60:01 - 60:06
    for everybody to see publicly on the
  • 60:03 - 60:08
    Internet on the other hand maybe you're
  • 60:06 - 60:11
    doing some like security sensitive work
  • 60:08 - 60:12
    say working under a contract for the US
  • 60:11 - 60:14
    government developing some sensitive
  • 60:12 - 60:15
    military stuff if you're sending that
  • 60:14 - 60:17
    through the open Internet while you're
  • 60:15 - 60:19
    traveling you probably want to be pretty
  • 60:17 - 60:21
    darn sure that no eavesdroppers or
  • 60:19 - 60:22
    anybody else along the way can see what
  • 60:21 - 60:23
    you're sending and that whatever you're
  • 60:22 - 60:25
    sending is in fact going to the right
  • 60:23 - 60:26
    place and that whoever is receiving it
  • 60:25 - 60:30
    can authenticate that it in fact came
  • 60:26 - 60:31
    from you so you might be worried about
  • 60:30 - 60:33
    all different kinds of adversaries
  • 60:31 - 60:34
    depending on your scenario from random
  • 60:33 - 60:37
    script kiddies who are trying to break
  • 60:34 - 60:38
    into websites to nation state level
  • 60:37 - 60:40
    attackers and you'll need different
  • 60:38 - 60:41
    types of techniques for defending
  • 60:40 - 60:48
    against the different categories of
  • 60:41 - 60:48
    attackers any other questions
  • 60:51 - 60:55
    well so hopefully see some of you
  • 60:54 - 60:57
    tomorrow for a random collection of
  • 60:55 - 60:59
    things that John, Jose, and I find
  • 60:57 - 60:59
    interesting