okay so again let's get started with
today's lecture so today we're going to
be talking about security and
cryptography and today's lecture is
gonna be a little bit different than our
treatment of this topic in last year's
class so last year we focused a little
bit more on security and privacy and
from the perspective of a user of a
computer but today we're going to focus
a little bit more on security and
cryptography concepts that are relevant
in understanding some of the tools that
we talked about earlier in this class so
for example we talked about hash
functions or cryptographic hash
functions like sha-1 in the get lecture
or we talked about public keys when we
talked about SSH in the command line
environment in a command line
environment lecture and so today we'll
talk about there's different
cryptographic primitives in more detail
and get an understanding of how they
work and how they're used in these
different tools that we're teaching in
this class this lecture is not a
substitute for a more rigorous class in
security so they're they're a bunch of
really good classes at MIT like six
eight five eight which is on computer
system security or six eight five seven
and six eight seven five which are more
focused on cryptography so don't do
security work without formal training in
security from these classes or elsewhere
and unless you're an expert don't roll
your own crypto don't build your own
crypto implementations or protocols and
the same principle applies to computer
system security this lecture is not
about building your own stuff it's about
understanding what's already out there
and so this lecture will have a very
informal but we think practical
treatment of these basic cryptography
concepts and yeah hopefully it'll help
you understand some of the tools we
talked about earlier in this class any
questions about the plan for today's
lecture great so the first topic for
today is something called entropy
entropy is a measure of randomness and
this is useful for example when trying
to determine the strength of a password
so let's take a look at this comic from
xkcd we're a big fan of xkcd comics so
this comic raise your hand if you've
seen this before know a good number of
you so this comic is complaining about
this common pattern that's been taught
to users of computers that when you
design passwords they should be things
like that T
our zero ub40 orm purse and three string
in the top-left like we should design
passwords that are full of funny
characters and things like that to make
it hard for attackers to guess and yet
turns out that passwords like that are
actually pretty weak and guessable by
computers that can guess postures really
fast and brute-force attacks and on the
other hand passwords which maybe
intuitively don't look as secure like
the one on the bottom left correct horse
battery staple that one turns out to be
way more secure so how do I actually
quantify the security of these different
passwords it's by measuring the amount
of randomness in the password how many
bits of randomness are in there and so
entropy is measured in bits this is like
the same bits from information theory
and we're only going to talk about the
simple case where you're trying to
measure the amount of randomness when
you're choosing from a set of things
uniformly at random so for example when
you're constructing a password that's in
the format of four random words you're
kind of considering all possible
sequences of four random words made from
some dictionary you have you might have
a dictionary would say a hundred
thousand words and you're selecting each
word uniformly at random how many
possibilities are there
well you can go and figure that out in
that example once you know how many
possibilities there the measure of
entropy is log base 2 of the number of
possibilities and as that comic suggests
this is related to how long it'll take
an attacker to try to brute-force
through your different passwords like if
you have a thousand possibilities you're
guessing passwords at a thousand
passwords a second that's not a very
good password so this is a couple of
quick examples a coin flip has two
possibilities and let's assume we have a
fair coin and so a coin flip has log
base 2 of 2 is one bit of entropy and
another thing we might look at is
something like a dice roll so there's
six possibilities and log
two of six is something like 2.6 bits of
entropy
so that's how we quantify the amount of
randomness in something so now going
back to that example in the xkcd comic
when we want to figure out how much
entropy is in a password we have to
consider and if the model for how the
password was generated for example in
the top left you could consider okay we
take one dictionary word make some
substitutions of some of the characters
with numbers that look similar to that
character add one punctuation mark at
the end and add one numeral after that
and we can take that model and then use
common rhetoric to figure out how many
possibilities there are and from that we
can derive how many bits of entropy are
in that password so in that particular
example I don't know exactly what model
they were using for the password but
they calculated their 28 bits of entropy
whereas in the bottom-left example that
correct horse battery staple they assume
that you're working from a dictionary of
about 2,000 words and so when you
combine four of those words together you
get about 44 bits of entropy from that
so it's much more secure than the
example before it so any questions about
this definition of entropy or why it's
useful and so when you're generating
your own passwords keep this in mind you
want a high entropy password and the
exact number you need depends on exactly
what you're trying to protect against
like in general a concept in securities
you have to keep in mind what your
threat model is like what attackers
you're concerned about what kinds of
technique the attackers might be using
for example this comic refers to an
attacker that can guess a thousand
passwords a second this might be
something that's possible for say a web
service that allows people to try to log
in with your email and then random
passwords that the attacker is trying
but this thousand passwords the second
model might not be accurate for other
scenarios for example an offline
password cracking scenario or maybe the
attacker has broken into a website and
downloaded their database and they have
some obfuscated form of your password
and they're trying to figure out what
the password is maybe they can paralyze
this attack and make it go to million
guesses a second and so exactly how much
entropy you need depends on exactly what
you're trying to protect against but
roughly forty bits of entropy might be
good enough for
which is protected by a website and
you're concerned about online password
guesses and then maybe something like 80
bits of entropy might be good if you're
concerned about offline attacks and you
want to be really really secure so
they're rough guidelines you can use and
then how do you actually generate strong
passwords well you have some model for
password for example the for dictionary
works thing and you can actually get a
dictionary and then you can use methods
like dice where so there's a song we
linked to in the lecture notes where you
can actually get physical dye and roll
them and then map dice rolls to
dictionary words in order to eventually
turn that into a password and doing
something like this using some kind of
physical token that you know is random
like a balanced die or a coin that you
know is balanced is a good thing to do
because humans are actually not good at
choosing random numbers right if I just
asked you to name a random number for 1
to 100 chances are that you're probably
not doing so uniformly at random very
well and so that's why it's actually
good to use these physical tokens in
order to produce randomness so entropy
that's our first concept recovering any
questions about that or about this comic
great so getting into slightly more
interesting and complicated topics the
next thing we're going to talk about is
hash functions so hopefully most of you
were here during the get lecture where
we talked about the sha-1 hash function
used in get so now going into that topic
in a little bit more detail hash
functions at a high level are functions
that map a variable amount of data into
a fixed size output so for example the
sha-1 hash functions is one example of a
hash function takes in some input of
some number of bytes and outputs exactly
160 bits of output so that's kind of the
type signature of this particular hash
function and then these functions have
some number of properties that are
useful so at a high level
these can be thought about as hard to
invert functions that have
random-looking outputs we can actually
try this out on some random piece of
data
for example if I enter into my terminal
printf hello this does exactly what you
would expect it does prints the set to
standard out and I can pipe this to the
sha-1 sum command so this is a command
line program that accepts input via
standard in and computes this sha-1
function which takes in some variable
number of bytes from the input and
produces a 160-bit output which in this
particular case is represent or encoded
as a hexadecimal string so it's a length
40 hexadecimal string and you see this
output right here this - just means it
took it it took its input from
standardin so this output just looks
like some random number but one
important thing is that this is a
deterministic number if you try the same
command on your own computer printf
hello sha-1 something you will get the
same number out so sha-1 is some
well-known function that people have
agreed upon for all its parameters we'll
see that if we tweak the input a little
bit like say changed hello to holo with
a capital H now I get a completely
different looking output and this also
looks like some other kind of random ish
number even though it is deterministic
and you could reproduce this on your own
computer hash functions have a number of
properties that are pretty important the
first property that cryptographic hash
functions have is that their
non-invertible and what that means is
that if you take the output from this
function for example that a a f4
ballaugh 3 for D strings shown there
from that output it's hard to figure out
what the input was that produced that
output so you can go one way compute the
sha-1 hash easily but you can't go
backwards another property that these
functions have is that their collision
resistant and what this property means
is that it's hard to find two different
inputs that produce the same output and
so this basically describes what a
cryptographic hash function is so any
questions about the kind of
specification of a cryptographic hash
function
okay so what are these hash functions
actually useful for well we've already
seen one application in git for content
address storage so we want it get we
want some uniform way of naming
different objects that are in the object
store and it turns out that get
addresses all of them by their sha-1
hash so you have the actual data you
want to store and then to name that
particular piece of data you just name
the sha-1 hash and all of that is stored
in the object store in that particular
way we see this when looking at many
different parts of git for example right
here I'm going to get repository if I do
get log it shows me the commits and for
example this number up here is the
cryptographic hash function sha-1 apply
to the commit object that describes this
particular commit so does anybody know
why git uses a cryptographic hash
function here as opposed to so you might
have heard in your other computer
science classes like say your
introductory algorithms class there are
things called hash functions without the
word cryptographic appended in front of
them and they have similar properties
that they turn a variable sized input
into some fixed size output but they
don't quite have these properties where
it's hard to find an input that produces
a particular output or things like that
it's a kind of weaker definition than
this so why is it that in get we care
about having a cryptographic hash
function as opposed to just a regular
old hash function does anybody have any
ideas
yeah that's that's basically it that we
don't want to have kind of conflicts in
the output from this hash function like
every commit is identified by a hash
function every file is identified by the
hash of that file if it were ever the
case that two different pieces of
content in practice produce the same
output that is if the function were not
collision resistant that could be really
problematic right because then you and I
we could have to do to get repos that we
think are the same we check out the same
commit hash and we might end up with
different files and this is concerning
because git is used to track software a
track development of software and it's
also kind of involved in making sure
that the right people are authoring the
software nothing funny has happened in
the process for example there all these
open source projects like the Linux
kernel where development is done using
git it would be really bad if some
contributor to get could say edit some
file and propose some change that looks
pretty benign like oh let me go and
improve this part of Linux submit that
change request to the Linux developers
and then in practice actually supply a
git repository that has the same commit
hash and whatnot but actually the file
contents are different there's something
malicious so git actually relies on this
sha-1 function being a cryptographic
hash function in order to achieve
security any questions about that and
some other interesting applications of
hash functions so as we saw hash
functions turn big inputs into small
outputs and in a way because the hash
function is collision resistant the
output can be used to kind of attest to
or identify the input and so you can
think of a hash as a short summary of a
file for example in this directory of a
bunch of files and I can compute the
sha-1 sum of some file in this directory
and this is the sha-1 algorithm applied
to this readme MD file and what's
interesting is that it is
computationally hard or like impossible
you can kind of think of it as
impossible to find any other file so a
different file that has the same hash
output and one scenario in which this is
useful is when you download files from
the internet
for example there are lots of Linux
distributions that distribute large CD
or DVD images from their website like I
can go to Debian org and download the
latest version of Debian the thing is
that hosting those files can be
expensive and so a lot of people are
nice enough to host mirrors of these
files so instead of downloading Debian
from Debian org I can go to one of many
other sites and download what are
supposed to be the same files that are
hosted at Debian org but how do I know
that I actually got the correct file
like what if I set up a malicious mirror
and you go to like Anisha is evil Debian
website calm and then try to download
Debian turns out that your Linux
installation is backdoored well one
thing you could do is download a copy
from the original double-unit website
and then download my version and compare
them but that kind of defeats the
purpose right because we want to avoid
downloading things from Debian org
because hosting these files is expensive
and we want all these different people
to be able to mirror copies of the files
elsewhere so does anybody see how
cryptographic hash functions could be
useful to solve this problem that I want
to download a file from an untrusted
source but and not from like the trusted
source itself but maybe I can get some
small piece of information from this
trusted source in order to know whether
the file I downloaded from the untrusted
source is the thing I was supposed to
get yes like it's basically just a
straightforward application of
cryptographic hash functions
so what Debian org can do is they can
produce their kind of correct ISO file
or whatever they want and instead of
publishing the file itself on their
website they can publish a hash of that
file so compared to the file itself
which may be many gigabytes this is only
like in this particular case 160 bits of
data right so very cheap to host and
then what I can do is a user is I can
download that file from any random
website it could be an untrusted website
and after I download I just double check
the sha-1 hash and if the hash matches
then I know that I have the right file
because it's computationally infeasible
for somebody to give me some different
file that happens to have the same hash
because hash functions are collision
resistant so any questions about that
application yeah
yeah so that's a good question like why
do you need different people to host the
information like wouldn't it be equally
expensive for everybody so the answer is
that question is a little bit
complicated but like here's that here's
a partial answer one thing is that
downloading files from a server is
affected by how far away the server is
from you so for example if the servers
in Massachusetts and you're in say China
like you have to kind of make a big
round trip across the internet and that
may be expensive for a number of reasons
like the latency is high and the traffic
traffic needs to go through kind of lots
of different wires to make its way all
the way to where you are and so one
thing that these websites do is that
they distribute their content to servers
that are all over the world and then as
a user you download from the server
that's closest to you like for example
mit maintains a Debian package
repository and like kind of mirrors all
the Debbie and stuff so if you're a
Debian user at MIT you can use the MIT
copy of everything and then you can kind
of access it over our fast local network
and that traffic never needs to go to
the outside Internet at all so it's very
fast that's a good question any other
questions ok and then one final kind of
interesting application of hash
functions is something called a
commitment scheme so I want to play a
game and I need a volunteer for this so
you don't actually need to get up from
your seat or anything I was need you to
talk with me so any volunteers raise
your hand yeah okay yeah what's your
name
Abdul Aziz okay great so Abdul Aziz
we're going to play a game we're going
to play a game where I'm going to flip a
coin and then you're gonna call heads or
tails and if you call it right you win
and if you call it wrong you lose and
there are no stakes for this game but
just the pride of winning so sadly I
checked my wallet and all I have is
dollar bills I don't have any coins with
me so instead I'm going to just flip the
coin in my head all right
so okay I flip the coin call heads or
tails sorry you lost it was heads I
don't I play again yeah I can cheat
right I can just see what you say and
say the opposite thing so let's try
fixing this game how about you call
heads or tails after I say what the
flip result was okay yeah so if I say oh
the result is tails what are you gonna
say are you call tails yeah so is it
possible to play this guess what guess
what the coin flip result is game in a
fair way without having a physical coin
that we share like because I can't
really manipulate your physical reality
if I flip a coin in front of you
probably trust that it's okay right
so it turns out that hash functions give
us a kind of cool way to solve this
problem to through a idea called a
commitment scheme so I can say they're
like here's the construction of the
solution I can pick heads or tails and
I'm actually going to pick a big random
number say like this number here and
what I can do is compute the sha-1 sum
of this number at this moment you
haven't seen this number yet I'm just
doing all this in my head and then what
I do is I tell you okay I flipped a coin
and I'm not going to tell you what the
result is just yet because you haven't
called heads or tails but I'll tell you
what the shell wants some of the result
is here you go and I tell you this value
now after this you can call heads or
tails so what do you say like say say
heads afterwards what I can do is I can
reveal to you what my input to this
function was and then you can
cross-check this right you can compute
the sha-1 sum on the input to verify
that the output is what I said it was
earlier and then we can have some way of
mapping these numbers to heads or tails
so I might have agreed upon beforehand
that even numbers are heads and odd
numbers or tails and so this is a way of
fixing that game so we can actually play
this game in in our heads right I can
pick a value but not reveal that value
to you but I can commit to the value so
this is a kind of binding commitment
scheme that I can't change my mind after
I've told you this but it doesn't reveal
the original value to you and so this is
one other neat application of
cryptographic hash functions so any
questions about this particular
construction okay great so moving on to
the next topic we're going to talk about
key derivation functions
[Applause]
often abbreviate it as KDF so this is a
concept that's very similar to hash
functions except it has kind of one
extra property that it is slow to
compute for example there's a hash
function of key derivation function
known as P pbkdf2 pbkdf2 password-based
key derivation function that has a kind
of similar form as these hash functions
we were talking about here that they
take in some variable length input in
pretty so fixed length output but
they're meant to be used for one
particular purpose
the purpose is generally to use the
fixed length output as a key in another
cryptographic algorithm and we'll talk
about those algorithms like what use the
output of this thing for in a moment but
a one property of these things is that
they're slow does anybody have any idea
why you'd want an algorithm to be slow
like normally we want algorithms to be
fast right so why would we want an
algorithm to be slow yes
yeah that's exactly it so I'll repeat so
it goes into the microphone the reason
you want these to be slow is when you're
actually using it for something like
password authentication where you have
the hash of a password saved and then
somebody inputs the password you want to
know if that corresponds to the hash
it's ok if it's slow because you're only
doing this check kind of once but the
other scenario in which you're going to
be using this function is when
somebody's trying to brute-force a
password say a website has their
password database stolen and somebody's
going through all the accounts I'm
trying to break all the passwords well
in that case you want this to be slow
because someone's gonna be doing this
like millions and millions of times and
you can slow down the attacker a lot by
making this function slow and so it's
fine if this takes you like one second
upon logging in to compute this function
but when your brute forcing it we don't
go to a thousand guesses a second like
in that xkcd comic we can slow it down a
little bit so what is the output of key
derivation functions actually used for
well the next topic we're going to talk
about probably like one of the most
classic things when you think about
cryptography is encryption and
decryption the next topic is symmetric
key cryptography and like the rest of
this lecture we're not going to talk
about how you implement these we're
going to talk about the API for a
symmetric key symmetric key crypto like
how it's used
so symmetric key cryptosystems have a
couple different functions they have a
key generation function which is a
randomized function that produces a
thing we call the key and then they have
a pair of functions encrypt and decrypt
and encrypt take as input something we
refer to as the plaintext and this is
just some sequence of bytes some data
and it takes in a key so something that
came as an output of this key generation
function and produces
what we call the ciphertext and then
decrypt does the opposite of this so it
takes the ciphertext along with the key
and produces the plaintext and this
triple of functions has a couple
properties one is that like one one team
you might expect is that this thing
doesn't really tell you all that much
about this input to the encryption so
property number one is given the
ciphertext you can't figure out the
plaintext without the key and the other
property is kind of the obvious
correctness property that if you take
something and you encrypt it some
message M with a key K and then you
decrypt that ciphertext using the same
key that gives you back the same message
this is the kind of obvious correctness
property so does this description make
sense does it fit your kind of intuitive
understanding of taking some piece of
data and obscuring it so you can't
really tell anything about the original
input but then taking that obscured
result and then passing it there's some
decryption function given that key to
retrieve the original input and this
this isn't really a rigorous definition
of what it means for something to be
secure but it's a good enough intuitive
definition that we can work with it so
any questions about that description
there so where can key cryptography be
useful we'll talk about a whole bunch of
examples later in this lecture but one
example we'll talk about right now is
encrypting files for storage and
untrusted cloud service
so consider say something like Dropbox
or Google Drive or things like that when
you upload files there you're trusting
the service to not look at your files or
do anything malicious with them these
services like at least the ones I named
are not intend encrypted or anything
like that like in theory any employee
those companies could look at your files
now of course these companies have lots
of policies and technical controls in
place for making sure that that sort of
thing doesn't happen but that doesn't
mean that it's not technically possible
and so one thing you might want to do if
you don't want to trust these cloud
services to not peek at your data not do
like machine learning over them or do
other sorts of things that you wouldn't
really want is you can just take your
files and encrypt them before uploading
them to these these web services so does
that idea make sense that I can take my
file like Center pictures or whatever
pass it through an encryption function
and peruse the cipher text and then
place that cipher text on the web
service safe for backup purposes or
whatever and if I ever need that I can
retrieve the cipher text then use my key
to decrypt it back into the plaintext
and they can use a result for doing
whatever I need to do does that make
sense yeah yeah so that's that's a good
question the question is couldn't
anybody else run it through the same
encryption program one detail maybe I
should have explained in a little bit
more detail is this key generation
function is randomized and this key has
high entropy so going back to that topic
we talked about earlier so like an
example is we might have aes 256 this is
one particular symmetric cipher and this
as the name might indicate has 256 bits
of entropy in the key and so that means
that as long as the attacker like
whoever downloads the cipher text from
the web service doesn't know your key
unless they have some better attack in
place
they'll have to try all the different
possible keys and if they're two to the
256 keys that's too many keys to try in
a reasonable amount of time does that
answer the question okay any other
questions yeah
that's an excellent question and that
leads into what I was going to talk
about next so thanks for that question
so as you point out like if I lose my
key I'm kind of stuck right
I need my key to decrypt that's kind of
the point of this thing like if I didn't
need my key to decrypt then this
wouldn't be a very good crypto system
and so I can combine this idea of
symmetric key cryptography with the
topic we just talked about key
derivation functions so instead of
having some key that's randomly
generated with my key generation
function say sampling entropy from
somewhere on my machine
I can have a passphrase and pass it
through my key derivation function box
and this gives me my key and then I can
take my plaintext and combine it with my
key in my encrypt function and this
produces my ciphertext and I store this
cipher text on the web service but now I
don't need to save this key instead I
can just remember in my pass phrase and
whenever I need my key I can reconstruct
it from the key derivation function
question
yeah so that's a good question the
question is is the key derivation
function slow enough to prevent
brute-force guessing and the answer is
it depends on how long your passphrase
is so for example if your passphrase is
like the string password is probably
gonna get broken very quickly but as
long as there's enough entropy in your
passphrase this is good enough so like
if I was uploading something to Dropbox
and I really want it to stay secret I
think like a 64-bit passphrase really a
passphrase with 64 bits of entropy it
would be more than enough in that
scenario for example and just a quick
demo of this so there are tools to make
this really easy to do this is actually
one of the exercises but we can take a
tool for example called open SSL and use
it to apply a symmetric cipher to some
file so I had my readme text here for
example readme MD it has a bunch of
stuff in it and I can do open SSL AES
256 cbc this is the name of a particular
symmetric cipher and i can say that i
want to apply this to read me md and
produce readme dot and MD let's give it
some name and then it's asking you for a
password so by default this works in
this mode where I provide a passphrase
is run through a KDF to produce a key
and that's used for encryption so I'll
type in some password type it in again
and now I produce this readme MD file
and if I look at this it kind of looks
like garbage and that's at a high level
the point of a symmetric cipher it
produces some cipher text that should be
kind of indistinguishable from random
data and when I want to decrypt this I
can run a similar command open SSL AES
256 cbc dash D for decrypt take the
input from readme tank done MD and I
like do like readme dot read need
decrypted MD as the output and I can
compare these two files and the
correctness property of symmetric
cryptography tells me that this should
be identical and this indeed is
identical if I look at the return value
compare return 0 so that means that are
the same file
[Applause]
so any questions about symmetric key
cryptography yeah
so the this particular command did make
a new file so it took us input readme MD
and produces output this file so that is
the encrypted version of the file it
left the original untouched but then of
course I could delete it if I wanted to
yeah that's a good question this is
something I wasn't gonna talk about in
too much detail the question is I
provided the salt argument here and
where is that stored so the answer is
that that is stored in this output here
so this output format stores both the
salt and the actual output ciphertext so
can be used in the reconstruction and
decrypt yeah that's correct it doesn't
keep any database or anything this is
fully self-contained yeah and as John
says the salt is not the secret like the
the passphrase is what is the secret
thing here okay so let's go back to so
the so the question is what is salt the
idea of a cryptographic salt is probably
best explained in the context of hash
functions so one common application of
hash functions is to store passwords in
a password database if I have a website
and I have logins for users like people
log in with their username and password
I don't actually want to store people's
passwords in plain text just like as is
does anybody know why I wouldn't want to
do that yes
exactly what if there was a breach and
someone got all your data so it's really
bad if you leak all your users passwords
it's especially bad because a lot of
people reuse their passwords across
different sites so you'll see attackers
break into one thing like there was big
yahoo breach a while ago and they find
all these usernames and passwords and
then they go and try those same login
credentials on Google and on Facebook
and on YouTube and whatnot these people
reuse passwords so it's bad to store
plaintext passwords so one thing you
should do is you should store hashed
passwords with a hash function or
ideally a password hashing function
that's intentionally designed to be slow
but one thing attackers one thing
attacker started doing once they realize
that people started storing hashed
passwords is they built these things
called rainbow tables what people did
was they took a way of generating big
password lists like the kind of model
what passwords might look like say take
all the dictionary words take all
strings of like length from zero to
eight and whatnot
take all of those and then hash them and
produce a big database mapping hashes
back to their pre image and so given the
output of a hash function rather than
have to like brute force said on the fly
you can just go look up in this database
Oh what is the input that corresponds to
this output and people have built these
for reasonably large password databases
and so one thing that you can do in
reaction to that as a defense is rather
than storing in your database as your to
write it rather than storing just the
hash of the password what you do is you
compute what's called a salt value and
what this is is a large random string
and then what you do is you store in
your password database the salt which is
not really a secret like you can store
this in your password database along
with a hash of the password with the
salt appended to it why is this useful
well this salt is a random unique value
for every user and so if someone has the
password safe password one two three on
one web service if you are just storing
the hash of the password then the hash
would be the same on both Web Services
right because this hash
function is a deterministic function but
now since we're using this randomized
salt value we store the hash of the
password plus of salt and so even if
someone's using the same password on
multiple sites this thing looks
different in both cases and it makes it
so these big databases mapping these
short passwords or hash outputs to the
short passwords that they came from
those are no longer useful when you have
salted passwords you kind of need to do
the brute-force attack for every user
once you find their salt value rather
than being able to use this big
precomputed database does that answer
the question of what a salt is and so
that's what that salt argument is
related to
let's see so any questions about
anything we talked about so far great
okay so the I'm gonna go ahead and erase
this and then the last topic we'll talk
about is one of the most exciting
developments of cryptography happen
quite a while ago but it's still a
really cool concept something called a
symmetric key cryptography and so this
is an idea that actually enables a lot
of the security and privacy related
features of basically anything you use
today like we need to go and type in
like www.google.com/mapmaker flog Rafi
is used as part of what goes on there so
this is going to look pretty similar to
what we talked about in symmetric key
cryptography except with a twist there's
a key generation function which
similarly is randomized but instead of
producing a single key it produces a
pair of keys two different things one of
which is referred to as a public key and
the other is referred to as a private
key and then these can be used for
encryption and decryption in a manner
kind of similar to symmetric key crypto
except these different keys have
different uses now so we have an
encryption function which takes in a
plaintext Isles right P here and it
takes in the public key and praises the
ciphertext and then I have a decryption
function which takes in my ciphertext
and the private key and gives me back my
plaintext and then similarly to those
two properties we had over there given
just the ciphertext we can't figure out
the plaintext unless we have the private
key and then we have the obvious
correctness property that if we encrypt
something with the private key
sorry encrypt something with the public
key and then take that cypher text and
try decrypting it with the corresponding
private key that came from this key
generation process that outputs these
two different things at once then I get
the same result back so this is very
similar to what's above but there's a
twist that we have these two different
keys that have different functions and
it's really neat that this public key
can actually be made as the name
indicates public like I could be using a
crypto system that works like this post
a public key on the internet for anybody
to see but keep my private key secret
and then I have this interesting
property that anybody on the internet
can take any piece of content and
encrypt it for me using my public key
and send it over the Internet
to me and then I can decrypt it using my
private key and as long as my private
key C's stays secret it doesn't matter
if my public key is available to anybody
on the Internet so here's where the
asymmetry comes from before we were in a
scenario where like suppose I was on the
internet but you weren't like talking to
me face-to-face
and you wanted to send me some data over
the Internet over some unencrypted
Channel where anybody could listen on
what you were saying and you wanted to
use symmetric key cryptography well we
need some way of exchanging a key in
advance so that you could encrypt some
plaintext with a key and give me that
ciphertext over the Internet so that I
could done decrypt it with that key in
symmetric key crypto if the keys public
it's game over like anybody can decrypt
your stuff whereas an asymmetric key
cryptography I could take my public key
and like post it on a bulletin board on
the Internet and you can go look at that
take some contents and encrypt them for
me and then send them over and that
would be totally fine
you can only decrypt it with the private
key and so one analogy that may be
helpful is comparing these mathematical
ideas to physical locks so you probably
have a lock on your door to your house
and you can put in a key and like turn
the thing in order to lock the door or
you can turn it the other way to unlock
the door so there's a single key and it
can both lock and unlock the door
but now consider this alternative
construction which you might use if say
I want you to be able to send me a
message and have it be sent over the
internet and you I don't really need a
way to exchange a key with you
beforehand I could get a box which you
could put a letter inside and you can
close the box and I can get one of the
padlock things which I can give you by I
could like take those padlock and open
it and give it to you and you at your
own leisure could put your message
inside a box and take this padlock which
is open and shut it around the box and
then send it over to me and then I could
put in my key and unlock it so do you
see how there is this asymmetry there as
opposed to the key that I used to open
the door to my house where the same key
opens and closes the thing instead I
give you this open padlock that you have
the ability to close but not open and
after you closed it I can use my key
which I've kept secret in order to open
the thing and retrieve what's inside
maybe this analogy is helpful maybe it's
not the mathematical construction works
just fine if that works for a year so
any questions about a symmetric key
encryption and decryption and how it
relates to symmetric key crypto how it's
a little bit different so before we talk
about applications of this idea I'm
going to talk about one other set of
concepts in a symmetric key cryptography
so these crypto systems give you another
set of tools which are related to
encryption and decryption something
called signing and verifying and this is
kind of similar to the real world like I
can get a document and sign it with my
signature except real world signatures
are I don't think that hard to forge
these are pretty hard to forge and
therefore more useful what does
signature schemes look like there's a
function sign that takes us some message
and the private key so notice this this
is the private key not the public key
and it produces a signature and then
there's another function verify which
takes in the message the signature and
the public key this time
and it tells me it returns a boolean
whether or not the signature checks out
and then this pair of functions has the
property that again these are kind of
properties that follow the intuition
that come from physical signatures that
uh without the private key it's hard to
produce a signature for any message such
that you can give the message in the
signature and the public key to the
verify function to get it to return true
like at a high level it's hard to Forge
it's hard to forge a signature of course
without the private key and then there's
the obvious correctness property that if
you signed a thing with a public key and
then try verifying it with the
corresponding sorry if you sign a thing
with the private key and try to verify
it with the corresponding public key
it returns okay that this verification
checks out so these are two different
kinds of things you can do with
asymmetric key cryptosystems
an example of an asymmetric key
cryptosystem that you might have heard
of is something called RSA so RSA is
designed by a number of people one of
whom is ron rivest who's a professor
here so there are couple of interesting
applications of asymmetric key crypto
actually like tons and tons and tons of
you spend like days talking about this
but a couple examples are email
encryption so we talked a little bit
about sending messages what we can do
with asymmetric key crypto is that you
can have public keys posted online I
think some of the instructors have PGP
public keys on their website so for
example you go to my website or John's
website you'll find a public key and
then what you can do is you can send us
an encrypted email and so even if that
message goes through Gmail or whatever
other mail service throughout my T's
mail servers if there happens to be an
attacker snooping on the messages they
can't make any sense of their contents
because they're all encrypted and this
is really cool because you can do this
without kind of finding us in person and
exchanging
which you might have to do in a
symmetric key cryptosystem you can just
find our public key which can be posted
online without causing any issues and
then send us encrypted email another
thing asymmetric key crypto is used for
is private messaging so raise your hand
if you've used anything like signal or
telegram or I think what's up is in
theory antenna encrypted so a good
number of you these private messaging
applications also use asymmetric key
crypto to establish private
communication channels basically every
person has associated with them a key
pair and so your device has run this key
generation function produced a public
key and a private key and automatically
posted your public key to the internet
so for example if you're using signal
your public key is on the signal servers
and then when someone wants to contact
you their phone can look up your public
key retreat and once it's retrieve your
public key they can encrypt information
for you this is a kind of approximation
of how their algorithm works but at a
high level that's what's going on
another neat application of asymmetric
key crypto is we were talking about
earlier like making sure you have the
right software we downloaded from the
internet
asymmetric key crypto can be used to
sign software releases and this is
something that people do for example
like Debian packages or whatever things
you download from the internet the
developer will try to sign their
software so that you can make sure that
whatever you've downloaded from the
internet is actually the right thing
that came from the right person we
talked about in the get lecture all the
interesting things you can do with git
one thing we didn't cover was signing
related functionality and get so git has
commits and you can associate with
commits something called tags at a high
level you can basically take a git
commit and attach a signature to it
which binds your public key to this
commit and then anybody who has your
public key can take the commit and your
public key and make sure that there's a
legitimate signature on the key or sorry
on the commit so let me go to like some
random repository that I have I can look
at a bunch of tags associated with
repository if I do look at the raw data
associated with this tag
it has some metadata and then a blob of
like ascii encoded information that i
can use the get tagged - v4 verify
command to make sure that oh this is a
good signature from this person happens
to be me so I sign the software release
so that anybody who downloads it from
the Internet can make sure that they
actually got an authentic copy yes
question so the question is what exactly
is the verify function doing or what is
it checking against the like if you want
to know mathematically what's going on
talk to me
after this lecture but for from kind of
an API perspective what's going on here
is that the signature and also the
message here are just a blob of bytes
and it happens to be the case that these
things are designed such that basically
if I take for some particular public key
like if you take my public key it's
impossible for you without knowledge of
my private key for any message to find a
second argument to this function that
makes it return true you can kind of
compare it to signing a document like
you don't know how to forge my signature
I can take any piece of paper and sign
it and then anybody who knows what my
signature looks like I can show my
document - you can be like yeah that
checks out but nobody without the
private key can produce a signature that
will make this function return true for
any particular message and any related
questions started you want me to explain
any other way or does that make sense
so any questions about signing software
or any of the other handful of
applications talked about of asymmetric
key crypto well so one final thing I
want to talk about we're almost out of
time is key distribution this is a kind
of interesting side effect of asymmetric
key cryptography
it enables a bunch of interesting
functionality like I can post my public
key on the internet you can go find it
and send me encrypted email but how do
you know that the public key found is
actually my public key it seems like
there's a bootstrapping problem here
right so there are a couple this is like
a really interesting and really hard
real-world problem and there are a
couple different approaches you might
take to this problem one is kind of a
lame solution but this thing solves a
lot of cryptography problems this
exchange the information out-of-band
what that means is you want to send me
encrypted email we'll just talk to me
after class I'll give you my public key
on a piece of paper and since you were
talking to me in person like you know
that it's actually my public key not
just somebody like hacked my website and
stuck some random number on there that
solves the problem but it's not the most
elegant there are a couple other
approaches that different applications
use so those of you who use signal have
you ever encountered the phrase safety
number like verify your safety number
with so and so so with signal they have
a way of exchanging public keys which is
through the signal servers whoever runs
the signal service just maintains on
their servers basically a mapping from
phone numbers to public keys and when I
say oh I want to message this person
with this number my phone just goes and
retrieves their public key from the
internet and then encrypts the message
for that public key now does anybody see
a problem with the setup
yeah yeah exactly the signal servers our
point of failure there because if the
signal servers give me the wrong public
key like supposed signal just produces a
new key pair and give me their public
key now they can read all my messages
and they could even sit in between and
transparently decrypt the messages I
send them and then re encrypt them and
send them on to their final destination
like basically I need some way of
authenticating the public key I get and
so signal has one solution to this which
is also just kind of punting the issue
to out-of-band key exchange you can meet
up with somebody and they have a
slightly streamline flow where they show
QR codes on the screen you take one
phone and take a picture of the other
phone screen and vice versa
and now you've exchanged public keys in
person and from that point on you've
bootstrap your encrypted end-to-end
communication it also has an issue of or
it also has approach of pinning a public
key so once you know that a particular
phone number has a particular public key
your phone remembers that and if that
ever changes it'll complain to you and
then there are a couple other solutions
to this problem PGP one pop to let used
to be popular a while ago has this idea
of web of trust so like I trust people
who my friends trust so if like John has
done an out-of-band exchange with say my
professor then I can probably email my
professor because like I know that John
trusts my professor and I trust John so
you got this chain of trust through
there that's one interesting approach
and then another model that's called
pretty recently as something that a tool
called key base uses this is a really
neat whoops
there's website called key based IO and
they have a really interesting solution
to this bootstrapping problem which is
social proof so saying you probably have
your friends on Facebook and on Twitter
and whatnot and it's probably pretty
hard for an attacker to break into your
friend's Facebook account at the same
time as their Twitter account as the
same time as their hacker news account
and so on and so there's this
interesting way of binding public keys
to a set of social identities such that
you can retrieve a public key once you
trust some number of social identities
corresponding to your friend we have
links to these in the lecture notes if
you want to see these things in more
detail so that's it for our security and
cryptography lecture and tomorrow's
lecture will be on a random collection
of top
that your instructors find interesting
so hopefully see you in lecture tomorrow
I'll also be here for a couple of
minutes after class if anybody has
questions yes okay so John feel free to
leave if you have to leave but I think
nobody's using the classroom after us
I'm going to talk about one other
interesting topic so john brought up the
fact that a symmetric key cryptography
is slow and symmetric key cryptography
is fast and so in practice you don't
really use just a symmetric key
cryptography by itself it's usually used
to bootstrap a more sophisticated
protocol that you're using one thing you
might want to do is use a symmetric key
cryptography
for signing encrypted email right we
talked about that example and the way
that works isn't what you might have
guessed from our straightforward
explanation of asymmetric key crypto
like you don't just use that encrypt
function up there and call it a day in
practice what you do is you use hybrid
encryption to use a combination of
symmetric key and asymmetric key
cryptography
what you do is here I'll draw this as a
big block diagram you take your message
m and then I have my public key that I
want to encrypt for but rather than just
take these two things and pass it
through the encryption up there what I
do is I use the symmetric key gen
function to produce a symmetric key okay
I'm gonna like prepend this with
symmetric so we can distinguish it from
the public key key generation function
and then what I do is I take these two
things pass them through my symmetric
encryption box
this produces the ciphertext and now
this by itself to the sender sorry this
by itself to the receiver who has the
private key corresponding to this public
key here this is not really useful right
because this is encrypted with a
symmetric cipher with this key K that
came from this function that I ran on my
local machine so I need some way of
getting this to the person who actually
used to decrypt the email and so what I
do is I take this thing and now this
email might have been big and I use
symmetric encryption with that because
symmetric encryption is fast but this
key is small like it might be 256 bits
or something so I can take this thing
and encrypt it with a symmetric
encryption using the public key and this
gives me an encrypted key and this thing
can be decrypted using the private key
corresponding to that public key to
reconstruct this so this is on the
sender's end now the receiver gets this
and this and kind of does these things
backwards so you start with the
encrypted key and use asymmetric
decryption using your public using your
private key that corresponds to the
posted public key to reconstruct this
key that were used for the symmetric
encryption box and then use symmetric
key decryption using that key that was
reconstructed to take this ciphertext
and produce the original message so
there's a kind of interesting example of
how in practice symmetric and asymmetric
key cryptography is combined question
so the question is will you be using the
same symmetric key generators yes okay
so you need to kind of agree ahead of
time which box you're using here so you
might be like oh I'm going to use AES
256 GC up here but this is a well known
function and it's public like the
attackers allowed to know all the
parameters this function this is the
only secret thing that the attacker
doesn't know the key any other questions
yeah that's a really good question what
kind of data is important enough to
encrypt and I think that depends on your
threat model like who what kind of
attackers are you concerned about what
are you trying to protect against so you
might have the stance that you just
don't really care and that like anything
you communicate with anybody is allowed
to be public I might be willing to like
post all my conversation with everybody
for everybody to see publicly on the
Internet on the other hand maybe you're
doing some like security sensitive works
here working under a contract for the US
government developing some sensitive
military stuff if you're sending that
through the open Internet while you're
traveling you probably want to be pretty
darn sure that no eavesdroppers or
anybody else along the way can see what
you're sending and that whatever you're
sending is in fact going to the right
place and that whoever is receiving it
can authenticate that it in fact came
from you so you might be worried about
all different kinds of adversaries
depending on your scenario from random
script kiddies who are trying to break
into websites to nation state level
attackers and you'll need different
types of techniques for defending
against the different categories of
attackers any other questions
well so hopefully see some of you
tomorrow for a random collection of
things that John Jose and I find
interesting