-
okay so again let's get started with
-
today's lecture so today we're going to
-
be talking about security and
-
cryptography and today's lecture is
-
gonna be a little bit different than our
-
treatment of this topic in last year's
-
class so last year we focused a little
-
bit more on security and privacy and
-
from the perspective of a user of a
-
computer but today we're going to focus
-
a little bit more on security and
-
cryptography concepts that are relevant
-
in understanding some of the tools that
-
we talked about earlier in this class so
-
for example we talked about hash
-
functions or cryptographic hash
-
functions like sha-1 in the get lecture
-
or we talked about public keys when we
-
talked about SSH in the command line
-
environment in a command line
-
environment lecture and so today we'll
-
talk about there's different
-
cryptographic primitives in more detail
-
and get an understanding of how they
-
work and how they're used in these
-
different tools that we're teaching in
-
this class this lecture is not a
-
substitute for a more rigorous class in
-
security so they're they're a bunch of
-
really good classes at MIT like six
-
eight five eight which is on computer
-
system security or six eight five seven
-
and six eight seven five which are more
-
focused on cryptography so don't do
-
security work without formal training in
-
security from these classes or elsewhere
-
and unless you're an expert don't roll
-
your own crypto don't build your own
-
crypto implementations or protocols and
-
the same principle applies to computer
-
system security this lecture is not
-
about building your own stuff it's about
-
understanding what's already out there
-
and so this lecture will have a very
-
informal but we think practical
-
treatment of these basic cryptography
-
concepts and yeah hopefully it'll help
-
you understand some of the tools we
-
talked about earlier in this class any
-
questions about the plan for today's
-
lecture great so the first topic for
-
today is something called entropy
-
entropy is a measure of randomness and
-
this is useful for example when trying
-
to determine the strength of a password
-
so let's take a look at this comic from
-
xkcd we're a big fan of xkcd comics so
-
this comic raise your hand if you've
-
seen this before know a good number of
-
you so this comic is complaining about
-
this common pattern that's been taught
-
to users of computers that when you
-
design passwords they should be things
-
like that T
-
our zero ub40 orm purse and three string
-
in the top-left like we should design
-
passwords that are full of funny
-
characters and things like that to make
-
it hard for attackers to guess and yet
-
turns out that passwords like that are
-
actually pretty weak and guessable by
-
computers that can guess postures really
-
fast and brute-force attacks and on the
-
other hand passwords which maybe
-
intuitively don't look as secure like
-
the one on the bottom left correct horse
-
battery staple that one turns out to be
-
way more secure so how do I actually
-
quantify the security of these different
-
passwords it's by measuring the amount
-
of randomness in the password how many
-
bits of randomness are in there and so
-
entropy is measured in bits this is like
-
the same bits from information theory
-
and we're only going to talk about the
-
simple case where you're trying to
-
measure the amount of randomness when
-
you're choosing from a set of things
-
uniformly at random so for example when
-
you're constructing a password that's in
-
the format of four random words you're
-
kind of considering all possible
-
sequences of four random words made from
-
some dictionary you have you might have
-
a dictionary would say a hundred
-
thousand words and you're selecting each
-
word uniformly at random how many
-
possibilities are there
-
well you can go and figure that out in
-
that example once you know how many
-
possibilities there the measure of
-
entropy is log base 2 of the number of
-
possibilities and as that comic suggests
-
this is related to how long it'll take
-
an attacker to try to brute-force
-
through your different passwords like if
-
you have a thousand possibilities you're
-
guessing passwords at a thousand
-
passwords a second that's not a very
-
good password so this is a couple of
-
quick examples a coin flip has two
-
possibilities and let's assume we have a
-
fair coin and so a coin flip has log
-
base 2 of 2 is one bit of entropy and
-
another thing we might look at is
-
something like a dice roll so there's
-
six possibilities and log
-
two of six is something like 2.6 bits of
-
entropy
-
so that's how we quantify the amount of
-
randomness in something so now going
-
back to that example in the xkcd comic
-
when we want to figure out how much
-
entropy is in a password we have to
-
consider and if the model for how the
-
password was generated for example in
-
the top left you could consider okay we
-
take one dictionary word make some
-
substitutions of some of the characters
-
with numbers that look similar to that
-
character add one punctuation mark at
-
the end and add one numeral after that
-
and we can take that model and then use
-
common rhetoric to figure out how many
-
possibilities there are and from that we
-
can derive how many bits of entropy are
-
in that password so in that particular
-
example I don't know exactly what model
-
they were using for the password but
-
they calculated their 28 bits of entropy
-
whereas in the bottom-left example that
-
correct horse battery staple they assume
-
that you're working from a dictionary of
-
about 2,000 words and so when you
-
combine four of those words together you
-
get about 44 bits of entropy from that
-
so it's much more secure than the
-
example before it so any questions about
-
this definition of entropy or why it's
-
useful and so when you're generating
-
your own passwords keep this in mind you
-
want a high entropy password and the
-
exact number you need depends on exactly
-
what you're trying to protect against
-
like in general a concept in securities
-
you have to keep in mind what your
-
threat model is like what attackers
-
you're concerned about what kinds of
-
technique the attackers might be using
-
for example this comic refers to an
-
attacker that can guess a thousand
-
passwords a second this might be
-
something that's possible for say a web
-
service that allows people to try to log
-
in with your email and then random
-
passwords that the attacker is trying
-
but this thousand passwords the second
-
model might not be accurate for other
-
scenarios for example an offline
-
password cracking scenario or maybe the
-
attacker has broken into a website and
-
downloaded their database and they have
-
some obfuscated form of your password
-
and they're trying to figure out what
-
the password is maybe they can paralyze
-
this attack and make it go to million
-
guesses a second and so exactly how much
-
entropy you need depends on exactly what
-
you're trying to protect against but
-
roughly forty bits of entropy might be
-
good enough for
-
which is protected by a website and
-
you're concerned about online password
-
guesses and then maybe something like 80
-
bits of entropy might be good if you're
-
concerned about offline attacks and you
-
want to be really really secure so
-
they're rough guidelines you can use and
-
then how do you actually generate strong
-
passwords well you have some model for
-
password for example the for dictionary
-
works thing and you can actually get a
-
dictionary and then you can use methods
-
like dice where so there's a song we
-
linked to in the lecture notes where you
-
can actually get physical dye and roll
-
them and then map dice rolls to
-
dictionary words in order to eventually
-
turn that into a password and doing
-
something like this using some kind of
-
physical token that you know is random
-
like a balanced die or a coin that you
-
know is balanced is a good thing to do
-
because humans are actually not good at
-
choosing random numbers right if I just
-
asked you to name a random number for 1
-
to 100 chances are that you're probably
-
not doing so uniformly at random very
-
well and so that's why it's actually
-
good to use these physical tokens in
-
order to produce randomness so entropy
-
that's our first concept recovering any
-
questions about that or about this comic
-
great so getting into slightly more
-
interesting and complicated topics the
-
next thing we're going to talk about is
-
hash functions so hopefully most of you
-
were here during the get lecture where
-
we talked about the sha-1 hash function
-
used in get so now going into that topic
-
in a little bit more detail hash
-
functions at a high level are functions
-
that map a variable amount of data into
-
a fixed size output so for example the
-
sha-1 hash functions is one example of a
-
hash function takes in some input of
-
some number of bytes and outputs exactly
-
160 bits of output so that's kind of the
-
type signature of this particular hash
-
function and then these functions have
-
some number of properties that are
-
useful so at a high level
-
these can be thought about as hard to
-
invert functions that have
-
random-looking outputs we can actually
-
try this out on some random piece of
-
data
-
for example if I enter into my terminal
-
printf hello this does exactly what you
-
would expect it does prints the set to
-
standard out and I can pipe this to the
-
sha-1 sum command so this is a command
-
line program that accepts input via
-
standard in and computes this sha-1
-
function which takes in some variable
-
number of bytes from the input and
-
produces a 160-bit output which in this
-
particular case is represent or encoded
-
as a hexadecimal string so it's a length
-
40 hexadecimal string and you see this
-
output right here this - just means it
-
took it it took its input from
-
standardin so this output just looks
-
like some random number but one
-
important thing is that this is a
-
deterministic number if you try the same
-
command on your own computer printf
-
hello sha-1 something you will get the
-
same number out so sha-1 is some
-
well-known function that people have
-
agreed upon for all its parameters we'll
-
see that if we tweak the input a little
-
bit like say changed hello to holo with
-
a capital H now I get a completely
-
different looking output and this also
-
looks like some other kind of random ish
-
number even though it is deterministic
-
and you could reproduce this on your own
-
computer hash functions have a number of
-
properties that are pretty important the
-
first property that cryptographic hash
-
functions have is that their
-
non-invertible and what that means is
-
that if you take the output from this
-
function for example that a a f4
-
ballaugh 3 for D strings shown there
-
from that output it's hard to figure out
-
what the input was that produced that
-
output so you can go one way compute the
-
sha-1 hash easily but you can't go
-
backwards another property that these
-
functions have is that their collision
-
resistant and what this property means
-
is that it's hard to find two different
-
inputs that produce the same output and
-
so this basically describes what a
-
cryptographic hash function is so any
-
questions about the kind of
-
specification of a cryptographic hash
-
function
-
okay so what are these hash functions
-
actually useful for well we've already
-
seen one application in git for content
-
address storage so we want it get we
-
want some uniform way of naming
-
different objects that are in the object
-
store and it turns out that get
-
addresses all of them by their sha-1
-
hash so you have the actual data you
-
want to store and then to name that
-
particular piece of data you just name
-
the sha-1 hash and all of that is stored
-
in the object store in that particular
-
way we see this when looking at many
-
different parts of git for example right
-
here I'm going to get repository if I do
-
get log it shows me the commits and for
-
example this number up here is the
-
cryptographic hash function sha-1 apply
-
to the commit object that describes this
-
particular commit so does anybody know
-
why git uses a cryptographic hash
-
function here as opposed to so you might
-
have heard in your other computer
-
science classes like say your
-
introductory algorithms class there are
-
things called hash functions without the
-
word cryptographic appended in front of
-
them and they have similar properties
-
that they turn a variable sized input
-
into some fixed size output but they
-
don't quite have these properties where
-
it's hard to find an input that produces
-
a particular output or things like that
-
it's a kind of weaker definition than
-
this so why is it that in get we care
-
about having a cryptographic hash
-
function as opposed to just a regular
-
old hash function does anybody have any
-
ideas
-
yeah that's that's basically it that we
-
don't want to have kind of conflicts in
-
the output from this hash function like
-
every commit is identified by a hash
-
function every file is identified by the
-
hash of that file if it were ever the
-
case that two different pieces of
-
content in practice produce the same
-
output that is if the function were not
-
collision resistant that could be really
-
problematic right because then you and I
-
we could have to do to get repos that we
-
think are the same we check out the same
-
commit hash and we might end up with
-
different files and this is concerning
-
because git is used to track software a
-
track development of software and it's
-
also kind of involved in making sure
-
that the right people are authoring the
-
software nothing funny has happened in
-
the process for example there all these
-
open source projects like the Linux
-
kernel where development is done using
-
git it would be really bad if some
-
contributor to get could say edit some
-
file and propose some change that looks
-
pretty benign like oh let me go and
-
improve this part of Linux submit that
-
change request to the Linux developers
-
and then in practice actually supply a
-
git repository that has the same commit
-
hash and whatnot but actually the file
-
contents are different there's something
-
malicious so git actually relies on this
-
sha-1 function being a cryptographic
-
hash function in order to achieve
-
security any questions about that and
-
some other interesting applications of
-
hash functions so as we saw hash
-
functions turn big inputs into small
-
outputs and in a way because the hash
-
function is collision resistant the
-
output can be used to kind of attest to
-
or identify the input and so you can
-
think of a hash as a short summary of a
-
file for example in this directory of a
-
bunch of files and I can compute the
-
sha-1 sum of some file in this directory
-
and this is the sha-1 algorithm applied
-
to this readme MD file and what's
-
interesting is that it is
-
computationally hard or like impossible
-
you can kind of think of it as
-
impossible to find any other file so a
-
different file that has the same hash
-
output and one scenario in which this is
-
useful is when you download files from
-
the internet
-
for example there are lots of Linux
-
distributions that distribute large CD
-
or DVD images from their website like I
-
can go to Debian org and download the
-
latest version of Debian the thing is
-
that hosting those files can be
-
expensive and so a lot of people are
-
nice enough to host mirrors of these
-
files so instead of downloading Debian
-
from Debian org I can go to one of many
-
other sites and download what are
-
supposed to be the same files that are
-
hosted at Debian org but how do I know
-
that I actually got the correct file
-
like what if I set up a malicious mirror
-
and you go to like Anisha is evil Debian
-
website calm and then try to download
-
Debian turns out that your Linux
-
installation is backdoored well one
-
thing you could do is download a copy
-
from the original double-unit website
-
and then download my version and compare
-
them but that kind of defeats the
-
purpose right because we want to avoid
-
downloading things from Debian org
-
because hosting these files is expensive
-
and we want all these different people
-
to be able to mirror copies of the files
-
elsewhere so does anybody see how
-
cryptographic hash functions could be
-
useful to solve this problem that I want
-
to download a file from an untrusted
-
source but and not from like the trusted
-
source itself but maybe I can get some
-
small piece of information from this
-
trusted source in order to know whether
-
the file I downloaded from the untrusted
-
source is the thing I was supposed to
-
get yes like it's basically just a
-
straightforward application of
-
cryptographic hash functions
-
so what Debian org can do is they can
-
produce their kind of correct ISO file
-
or whatever they want and instead of
-
publishing the file itself on their
-
website they can publish a hash of that
-
file so compared to the file itself
-
which may be many gigabytes this is only
-
like in this particular case 160 bits of
-
data right so very cheap to host and
-
then what I can do is a user is I can
-
download that file from any random
-
website it could be an untrusted website
-
and after I download I just double check
-
the sha-1 hash and if the hash matches
-
then I know that I have the right file
-
because it's computationally infeasible
-
for somebody to give me some different
-
file that happens to have the same hash
-
because hash functions are collision
-
resistant so any questions about that
-
application yeah
-
yeah so that's a good question like why
-
do you need different people to host the
-
information like wouldn't it be equally
-
expensive for everybody so the answer is
-
that question is a little bit
-
complicated but like here's that here's
-
a partial answer one thing is that
-
downloading files from a server is
-
affected by how far away the server is
-
from you so for example if the servers
-
in Massachusetts and you're in say China
-
like you have to kind of make a big
-
round trip across the internet and that
-
may be expensive for a number of reasons
-
like the latency is high and the traffic
-
traffic needs to go through kind of lots
-
of different wires to make its way all
-
the way to where you are and so one
-
thing that these websites do is that
-
they distribute their content to servers
-
that are all over the world and then as
-
a user you download from the server
-
that's closest to you like for example
-
mit maintains a Debian package
-
repository and like kind of mirrors all
-
the Debbie and stuff so if you're a
-
Debian user at MIT you can use the MIT
-
copy of everything and then you can kind
-
of access it over our fast local network
-
and that traffic never needs to go to
-
the outside Internet at all so it's very
-
fast that's a good question any other
-
questions ok and then one final kind of
-
interesting application of hash
-
functions is something called a
-
commitment scheme so I want to play a
-
game and I need a volunteer for this so
-
you don't actually need to get up from
-
your seat or anything I was need you to
-
talk with me so any volunteers raise
-
your hand yeah okay yeah what's your
-
name
-
Abdul Aziz okay great so Abdul Aziz
-
we're going to play a game we're going
-
to play a game where I'm going to flip a
-
coin and then you're gonna call heads or
-
tails and if you call it right you win
-
and if you call it wrong you lose and
-
there are no stakes for this game but
-
just the pride of winning so sadly I
-
checked my wallet and all I have is
-
dollar bills I don't have any coins with
-
me so instead I'm going to just flip the
-
coin in my head all right
-
so okay I flip the coin call heads or
-
tails sorry you lost it was heads I
-
don't I play again yeah I can cheat
-
right I can just see what you say and
-
say the opposite thing so let's try
-
fixing this game how about you call
-
heads or tails after I say what the
-
flip result was okay yeah so if I say oh
-
the result is tails what are you gonna
-
say are you call tails yeah so is it
-
possible to play this guess what guess
-
what the coin flip result is game in a
-
fair way without having a physical coin
-
that we share like because I can't
-
really manipulate your physical reality
-
if I flip a coin in front of you
-
probably trust that it's okay right
-
so it turns out that hash functions give
-
us a kind of cool way to solve this
-
problem to through a idea called a
-
commitment scheme so I can say they're
-
like here's the construction of the
-
solution I can pick heads or tails and
-
I'm actually going to pick a big random
-
number say like this number here and
-
what I can do is compute the sha-1 sum
-
of this number at this moment you
-
haven't seen this number yet I'm just
-
doing all this in my head and then what
-
I do is I tell you okay I flipped a coin
-
and I'm not going to tell you what the
-
result is just yet because you haven't
-
called heads or tails but I'll tell you
-
what the shell wants some of the result
-
is here you go and I tell you this value
-
now after this you can call heads or
-
tails so what do you say like say say
-
heads afterwards what I can do is I can
-
reveal to you what my input to this
-
function was and then you can
-
cross-check this right you can compute
-
the sha-1 sum on the input to verify
-
that the output is what I said it was
-
earlier and then we can have some way of
-
mapping these numbers to heads or tails
-
so I might have agreed upon beforehand
-
that even numbers are heads and odd
-
numbers or tails and so this is a way of
-
fixing that game so we can actually play
-
this game in in our heads right I can
-
pick a value but not reveal that value
-
to you but I can commit to the value so
-
this is a kind of binding commitment
-
scheme that I can't change my mind after
-
I've told you this but it doesn't reveal
-
the original value to you and so this is
-
one other neat application of
-
cryptographic hash functions so any
-
questions about this particular
-
construction okay great so moving on to
-
the next topic we're going to talk about
-
key derivation functions
-
[Applause]
-
often abbreviate it as KDF so this is a
-
concept that's very similar to hash
-
functions except it has kind of one
-
extra property that it is slow to
-
compute for example there's a hash
-
function of key derivation function
-
known as P pbkdf2 pbkdf2 password-based
-
key derivation function that has a kind
-
of similar form as these hash functions
-
we were talking about here that they
-
take in some variable length input in
-
pretty so fixed length output but
-
they're meant to be used for one
-
particular purpose
-
the purpose is generally to use the
-
fixed length output as a key in another
-
cryptographic algorithm and we'll talk
-
about those algorithms like what use the
-
output of this thing for in a moment but
-
a one property of these things is that
-
they're slow does anybody have any idea
-
why you'd want an algorithm to be slow
-
like normally we want algorithms to be
-
fast right so why would we want an
-
algorithm to be slow yes
-
yeah that's exactly it so I'll repeat so
-
it goes into the microphone the reason
-
you want these to be slow is when you're
-
actually using it for something like
-
password authentication where you have
-
the hash of a password saved and then
-
somebody inputs the password you want to
-
know if that corresponds to the hash
-
it's ok if it's slow because you're only
-
doing this check kind of once but the
-
other scenario in which you're going to
-
be using this function is when
-
somebody's trying to brute-force a
-
password say a website has their
-
password database stolen and somebody's
-
going through all the accounts I'm
-
trying to break all the passwords well
-
in that case you want this to be slow
-
because someone's gonna be doing this
-
like millions and millions of times and
-
you can slow down the attacker a lot by
-
making this function slow and so it's
-
fine if this takes you like one second
-
upon logging in to compute this function
-
but when your brute forcing it we don't
-
go to a thousand guesses a second like
-
in that xkcd comic we can slow it down a
-
little bit so what is the output of key
-
derivation functions actually used for
-
well the next topic we're going to talk
-
about probably like one of the most
-
classic things when you think about
-
cryptography is encryption and
-
decryption the next topic is symmetric
-
key cryptography and like the rest of
-
this lecture we're not going to talk
-
about how you implement these we're
-
going to talk about the API for a
-
symmetric key symmetric key crypto like
-
how it's used
-
so symmetric key cryptosystems have a
-
couple different functions they have a
-
key generation function which is a
-
randomized function that produces a
-
thing we call the key and then they have
-
a pair of functions encrypt and decrypt
-
and encrypt take as input something we
-
refer to as the plaintext and this is
-
just some sequence of bytes some data
-
and it takes in a key so something that
-
came as an output of this key generation
-
function and produces
-
what we call the ciphertext and then
-
decrypt does the opposite of this so it
-
takes the ciphertext along with the key
-
and produces the plaintext and this
-
triple of functions has a couple
-
properties one is that like one one team
-
you might expect is that this thing
-
doesn't really tell you all that much
-
about this input to the encryption so
-
property number one is given the
-
ciphertext you can't figure out the
-
plaintext without the key and the other
-
property is kind of the obvious
-
correctness property that if you take
-
something and you encrypt it some
-
message M with a key K and then you
-
decrypt that ciphertext using the same
-
key that gives you back the same message
-
this is the kind of obvious correctness
-
property so does this description make
-
sense does it fit your kind of intuitive
-
understanding of taking some piece of
-
data and obscuring it so you can't
-
really tell anything about the original
-
input but then taking that obscured
-
result and then passing it there's some
-
decryption function given that key to
-
retrieve the original input and this
-
this isn't really a rigorous definition
-
of what it means for something to be
-
secure but it's a good enough intuitive
-
definition that we can work with it so
-
any questions about that description
-
there so where can key cryptography be
-
useful we'll talk about a whole bunch of
-
examples later in this lecture but one
-
example we'll talk about right now is
-
encrypting files for storage and
-
untrusted cloud service
-
so consider say something like Dropbox
-
or Google Drive or things like that when
-
you upload files there you're trusting
-
the service to not look at your files or
-
do anything malicious with them these
-
services like at least the ones I named
-
are not intend encrypted or anything
-
like that like in theory any employee
-
those companies could look at your files
-
now of course these companies have lots
-
of policies and technical controls in
-
place for making sure that that sort of
-
thing doesn't happen but that doesn't
-
mean that it's not technically possible
-
and so one thing you might want to do if
-
you don't want to trust these cloud
-
services to not peek at your data not do
-
like machine learning over them or do
-
other sorts of things that you wouldn't
-
really want is you can just take your
-
files and encrypt them before uploading
-
them to these these web services so does
-
that idea make sense that I can take my
-
file like Center pictures or whatever
-
pass it through an encryption function
-
and peruse the cipher text and then
-
place that cipher text on the web
-
service safe for backup purposes or
-
whatever and if I ever need that I can
-
retrieve the cipher text then use my key
-
to decrypt it back into the plaintext
-
and they can use a result for doing
-
whatever I need to do does that make
-
sense yeah yeah so that's that's a good
-
question the question is couldn't
-
anybody else run it through the same
-
encryption program one detail maybe I
-
should have explained in a little bit
-
more detail is this key generation
-
function is randomized and this key has
-
high entropy so going back to that topic
-
we talked about earlier so like an
-
example is we might have aes 256 this is
-
one particular symmetric cipher and this
-
as the name might indicate has 256 bits
-
of entropy in the key and so that means
-
that as long as the attacker like
-
whoever downloads the cipher text from
-
the web service doesn't know your key
-
unless they have some better attack in
-
place
-
they'll have to try all the different
-
possible keys and if they're two to the
-
256 keys that's too many keys to try in
-
a reasonable amount of time does that
-
answer the question okay any other
-
questions yeah
-
that's an excellent question and that
-
leads into what I was going to talk
-
about next so thanks for that question
-
so as you point out like if I lose my
-
key I'm kind of stuck right
-
I need my key to decrypt that's kind of
-
the point of this thing like if I didn't
-
need my key to decrypt then this
-
wouldn't be a very good crypto system
-
and so I can combine this idea of
-
symmetric key cryptography with the
-
topic we just talked about key
-
derivation functions so instead of
-
having some key that's randomly
-
generated with my key generation
-
function say sampling entropy from
-
somewhere on my machine
-
I can have a passphrase and pass it
-
through my key derivation function box
-
and this gives me my key and then I can
-
take my plaintext and combine it with my
-
key in my encrypt function and this
-
produces my ciphertext and I store this
-
cipher text on the web service but now I
-
don't need to save this key instead I
-
can just remember in my pass phrase and
-
whenever I need my key I can reconstruct
-
it from the key derivation function
-
question
-
yeah so that's a good question the
-
question is is the key derivation
-
function slow enough to prevent
-
brute-force guessing and the answer is
-
it depends on how long your passphrase
-
is so for example if your passphrase is
-
like the string password is probably
-
gonna get broken very quickly but as
-
long as there's enough entropy in your
-
passphrase this is good enough so like
-
if I was uploading something to Dropbox
-
and I really want it to stay secret I
-
think like a 64-bit passphrase really a
-
passphrase with 64 bits of entropy it
-
would be more than enough in that
-
scenario for example and just a quick
-
demo of this so there are tools to make
-
this really easy to do this is actually
-
one of the exercises but we can take a
-
tool for example called open SSL and use
-
it to apply a symmetric cipher to some
-
file so I had my readme text here for
-
example readme MD it has a bunch of
-
stuff in it and I can do open SSL AES
-
256 cbc this is the name of a particular
-
symmetric cipher and i can say that i
-
want to apply this to read me md and
-
produce readme dot and MD let's give it
-
some name and then it's asking you for a
-
password so by default this works in
-
this mode where I provide a passphrase
-
is run through a KDF to produce a key
-
and that's used for encryption so I'll
-
type in some password type it in again
-
and now I produce this readme MD file
-
and if I look at this it kind of looks
-
like garbage and that's at a high level
-
the point of a symmetric cipher it
-
produces some cipher text that should be
-
kind of indistinguishable from random
-
data and when I want to decrypt this I
-
can run a similar command open SSL AES
-
256 cbc dash D for decrypt take the
-
input from readme tank done MD and I
-
like do like readme dot read need
-
decrypted MD as the output and I can
-
compare these two files and the
-
correctness property of symmetric
-
cryptography tells me that this should
-
be identical and this indeed is
-
identical if I look at the return value
-
compare return 0 so that means that are
-
the same file
-
[Applause]
-
so any questions about symmetric key
-
cryptography yeah
-
so the this particular command did make
-
a new file so it took us input readme MD
-
and produces output this file so that is
-
the encrypted version of the file it
-
left the original untouched but then of
-
course I could delete it if I wanted to
-
yeah that's a good question this is
-
something I wasn't gonna talk about in
-
too much detail the question is I
-
provided the salt argument here and
-
where is that stored so the answer is
-
that that is stored in this output here
-
so this output format stores both the
-
salt and the actual output ciphertext so
-
can be used in the reconstruction and
-
decrypt yeah that's correct it doesn't
-
keep any database or anything this is
-
fully self-contained yeah and as John
-
says the salt is not the secret like the
-
the passphrase is what is the secret
-
thing here okay so let's go back to so
-
the so the question is what is salt the
-
idea of a cryptographic salt is probably
-
best explained in the context of hash
-
functions so one common application of
-
hash functions is to store passwords in
-
a password database if I have a website
-
and I have logins for users like people
-
log in with their username and password
-
I don't actually want to store people's
-
passwords in plain text just like as is
-
does anybody know why I wouldn't want to
-
do that yes
-
exactly what if there was a breach and
-
someone got all your data so it's really
-
bad if you leak all your users passwords
-
it's especially bad because a lot of
-
people reuse their passwords across
-
different sites so you'll see attackers
-
break into one thing like there was big
-
yahoo breach a while ago and they find
-
all these usernames and passwords and
-
then they go and try those same login
-
credentials on Google and on Facebook
-
and on YouTube and whatnot these people
-
reuse passwords so it's bad to store
-
plaintext passwords so one thing you
-
should do is you should store hashed
-
passwords with a hash function or
-
ideally a password hashing function
-
that's intentionally designed to be slow
-
but one thing attackers one thing
-
attacker started doing once they realize
-
that people started storing hashed
-
passwords is they built these things
-
called rainbow tables what people did
-
was they took a way of generating big
-
password lists like the kind of model
-
what passwords might look like say take
-
all the dictionary words take all
-
strings of like length from zero to
-
eight and whatnot
-
take all of those and then hash them and
-
produce a big database mapping hashes
-
back to their pre image and so given the
-
output of a hash function rather than
-
have to like brute force said on the fly
-
you can just go look up in this database
-
Oh what is the input that corresponds to
-
this output and people have built these
-
for reasonably large password databases
-
and so one thing that you can do in
-
reaction to that as a defense is rather
-
than storing in your database as your to
-
write it rather than storing just the
-
hash of the password what you do is you
-
compute what's called a salt value and
-
what this is is a large random string
-
and then what you do is you store in
-
your password database the salt which is
-
not really a secret like you can store
-
this in your password database along
-
with a hash of the password with the
-
salt appended to it why is this useful
-
well this salt is a random unique value
-
for every user and so if someone has the
-
password safe password one two three on
-
one web service if you are just storing
-
the hash of the password then the hash
-
would be the same on both Web Services
-
right because this hash
-
function is a deterministic function but
-
now since we're using this randomized
-
salt value we store the hash of the
-
password plus of salt and so even if
-
someone's using the same password on
-
multiple sites this thing looks
-
different in both cases and it makes it
-
so these big databases mapping these
-
short passwords or hash outputs to the
-
short passwords that they came from
-
those are no longer useful when you have
-
salted passwords you kind of need to do
-
the brute-force attack for every user
-
once you find their salt value rather
-
than being able to use this big
-
precomputed database does that answer
-
the question of what a salt is and so
-
that's what that salt argument is
-
related to
-
let's see so any questions about
-
anything we talked about so far great
-
okay so the I'm gonna go ahead and erase
-
this and then the last topic we'll talk
-
about is one of the most exciting
-
developments of cryptography happen
-
quite a while ago but it's still a
-
really cool concept something called a
-
symmetric key cryptography and so this
-
is an idea that actually enables a lot
-
of the security and privacy related
-
features of basically anything you use
-
today like we need to go and type in
-
like www.google.com/mapmaker flog Rafi
-
is used as part of what goes on there so
-
this is going to look pretty similar to
-
what we talked about in symmetric key
-
cryptography except with a twist there's
-
a key generation function which
-
similarly is randomized but instead of
-
producing a single key it produces a
-
pair of keys two different things one of
-
which is referred to as a public key and
-
the other is referred to as a private
-
key and then these can be used for
-
encryption and decryption in a manner
-
kind of similar to symmetric key crypto
-
except these different keys have
-
different uses now so we have an
-
encryption function which takes in a
-
plaintext Isles right P here and it
-
takes in the public key and praises the
-
ciphertext and then I have a decryption
-
function which takes in my ciphertext
-
and the private key and gives me back my
-
plaintext and then similarly to those
-
two properties we had over there given
-
just the ciphertext we can't figure out
-
the plaintext unless we have the private
-
key and then we have the obvious
-
correctness property that if we encrypt
-
something with the private key
-
sorry encrypt something with the public
-
key and then take that cypher text and
-
try decrypting it with the corresponding
-
private key that came from this key
-
generation process that outputs these
-
two different things at once then I get
-
the same result back so this is very
-
similar to what's above but there's a
-
twist that we have these two different
-
keys that have different functions and
-
it's really neat that this public key
-
can actually be made as the name
-
indicates public like I could be using a
-
crypto system that works like this post
-
a public key on the internet for anybody
-
to see but keep my private key secret
-
and then I have this interesting
-
property that anybody on the internet
-
can take any piece of content and
-
encrypt it for me using my public key
-
and send it over the Internet
-
to me and then I can decrypt it using my
-
private key and as long as my private
-
key C's stays secret it doesn't matter
-
if my public key is available to anybody
-
on the Internet so here's where the
-
asymmetry comes from before we were in a
-
scenario where like suppose I was on the
-
internet but you weren't like talking to
-
me face-to-face
-
and you wanted to send me some data over
-
the Internet over some unencrypted
-
Channel where anybody could listen on
-
what you were saying and you wanted to
-
use symmetric key cryptography well we
-
need some way of exchanging a key in
-
advance so that you could encrypt some
-
plaintext with a key and give me that
-
ciphertext over the Internet so that I
-
could done decrypt it with that key in
-
symmetric key crypto if the keys public
-
it's game over like anybody can decrypt
-
your stuff whereas an asymmetric key
-
cryptography I could take my public key
-
and like post it on a bulletin board on
-
the Internet and you can go look at that
-
take some contents and encrypt them for
-
me and then send them over and that
-
would be totally fine
-
you can only decrypt it with the private
-
key and so one analogy that may be
-
helpful is comparing these mathematical
-
ideas to physical locks so you probably
-
have a lock on your door to your house
-
and you can put in a key and like turn
-
the thing in order to lock the door or
-
you can turn it the other way to unlock
-
the door so there's a single key and it
-
can both lock and unlock the door
-
but now consider this alternative
-
construction which you might use if say
-
I want you to be able to send me a
-
message and have it be sent over the
-
internet and you I don't really need a
-
way to exchange a key with you
-
beforehand I could get a box which you
-
could put a letter inside and you can
-
close the box and I can get one of the
-
padlock things which I can give you by I
-
could like take those padlock and open
-
it and give it to you and you at your
-
own leisure could put your message
-
inside a box and take this padlock which
-
is open and shut it around the box and
-
then send it over to me and then I could
-
put in my key and unlock it so do you
-
see how there is this asymmetry there as
-
opposed to the key that I used to open
-
the door to my house where the same key
-
opens and closes the thing instead I
-
give you this open padlock that you have
-
the ability to close but not open and
-
after you closed it I can use my key
-
which I've kept secret in order to open
-
the thing and retrieve what's inside
-
maybe this analogy is helpful maybe it's
-
not the mathematical construction works
-
just fine if that works for a year so
-
any questions about a symmetric key
-
encryption and decryption and how it
-
relates to symmetric key crypto how it's
-
a little bit different so before we talk
-
about applications of this idea I'm
-
going to talk about one other set of
-
concepts in a symmetric key cryptography
-
so these crypto systems give you another
-
set of tools which are related to
-
encryption and decryption something
-
called signing and verifying and this is
-
kind of similar to the real world like I
-
can get a document and sign it with my
-
signature except real world signatures
-
are I don't think that hard to forge
-
these are pretty hard to forge and
-
therefore more useful what does
-
signature schemes look like there's a
-
function sign that takes us some message
-
and the private key so notice this this
-
is the private key not the public key
-
and it produces a signature and then
-
there's another function verify which
-
takes in the message the signature and
-
the public key this time
-
and it tells me it returns a boolean
-
whether or not the signature checks out
-
and then this pair of functions has the
-
property that again these are kind of
-
properties that follow the intuition
-
that come from physical signatures that
-
uh without the private key it's hard to
-
produce a signature for any message such
-
that you can give the message in the
-
signature and the public key to the
-
verify function to get it to return true
-
like at a high level it's hard to Forge
-
it's hard to forge a signature of course
-
without the private key and then there's
-
the obvious correctness property that if
-
you signed a thing with a public key and
-
then try verifying it with the
-
corresponding sorry if you sign a thing
-
with the private key and try to verify
-
it with the corresponding public key
-
it returns okay that this verification
-
checks out so these are two different
-
kinds of things you can do with
-
asymmetric key cryptosystems
-
an example of an asymmetric key
-
cryptosystem that you might have heard
-
of is something called RSA so RSA is
-
designed by a number of people one of
-
whom is ron rivest who's a professor
-
here so there are couple of interesting
-
applications of asymmetric key crypto
-
actually like tons and tons and tons of
-
you spend like days talking about this
-
but a couple examples are email
-
encryption so we talked a little bit
-
about sending messages what we can do
-
with asymmetric key crypto is that you
-
can have public keys posted online I
-
think some of the instructors have PGP
-
public keys on their website so for
-
example you go to my website or John's
-
website you'll find a public key and
-
then what you can do is you can send us
-
an encrypted email and so even if that
-
message goes through Gmail or whatever
-
other mail service throughout my T's
-
mail servers if there happens to be an
-
attacker snooping on the messages they
-
can't make any sense of their contents
-
because they're all encrypted and this
-
is really cool because you can do this
-
without kind of finding us in person and
-
exchanging
-
which you might have to do in a
-
symmetric key cryptosystem you can just
-
find our public key which can be posted
-
online without causing any issues and
-
then send us encrypted email another
-
thing asymmetric key crypto is used for
-
is private messaging so raise your hand
-
if you've used anything like signal or
-
telegram or I think what's up is in
-
theory antenna encrypted so a good
-
number of you these private messaging
-
applications also use asymmetric key
-
crypto to establish private
-
communication channels basically every
-
person has associated with them a key
-
pair and so your device has run this key
-
generation function produced a public
-
key and a private key and automatically
-
posted your public key to the internet
-
so for example if you're using signal
-
your public key is on the signal servers
-
and then when someone wants to contact
-
you their phone can look up your public
-
key retreat and once it's retrieve your
-
public key they can encrypt information
-
for you this is a kind of approximation
-
of how their algorithm works but at a
-
high level that's what's going on
-
another neat application of asymmetric
-
key crypto is we were talking about
-
earlier like making sure you have the
-
right software we downloaded from the
-
internet
-
asymmetric key crypto can be used to
-
sign software releases and this is
-
something that people do for example
-
like Debian packages or whatever things
-
you download from the internet the
-
developer will try to sign their
-
software so that you can make sure that
-
whatever you've downloaded from the
-
internet is actually the right thing
-
that came from the right person we
-
talked about in the get lecture all the
-
interesting things you can do with git
-
one thing we didn't cover was signing
-
related functionality and get so git has
-
commits and you can associate with
-
commits something called tags at a high
-
level you can basically take a git
-
commit and attach a signature to it
-
which binds your public key to this
-
commit and then anybody who has your
-
public key can take the commit and your
-
public key and make sure that there's a
-
legitimate signature on the key or sorry
-
on the commit so let me go to like some
-
random repository that I have I can look
-
at a bunch of tags associated with
-
repository if I do look at the raw data
-
associated with this tag
-
it has some metadata and then a blob of
-
like ascii encoded information that i
-
can use the get tagged - v4 verify
-
command to make sure that oh this is a
-
good signature from this person happens
-
to be me so I sign the software release
-
so that anybody who downloads it from
-
the Internet can make sure that they
-
actually got an authentic copy yes
-
question so the question is what exactly
-
is the verify function doing or what is
-
it checking against the like if you want
-
to know mathematically what's going on
-
talk to me
-
after this lecture but for from kind of
-
an API perspective what's going on here
-
is that the signature and also the
-
message here are just a blob of bytes
-
and it happens to be the case that these
-
things are designed such that basically
-
if I take for some particular public key
-
like if you take my public key it's
-
impossible for you without knowledge of
-
my private key for any message to find a
-
second argument to this function that
-
makes it return true you can kind of
-
compare it to signing a document like
-
you don't know how to forge my signature
-
I can take any piece of paper and sign
-
it and then anybody who knows what my
-
signature looks like I can show my
-
document - you can be like yeah that
-
checks out but nobody without the
-
private key can produce a signature that
-
will make this function return true for
-
any particular message and any related
-
questions started you want me to explain
-
any other way or does that make sense
-
so any questions about signing software
-
or any of the other handful of
-
applications talked about of asymmetric
-
key crypto well so one final thing I
-
want to talk about we're almost out of
-
time is key distribution this is a kind
-
of interesting side effect of asymmetric
-
key cryptography
-
it enables a bunch of interesting
-
functionality like I can post my public
-
key on the internet you can go find it
-
and send me encrypted email but how do
-
you know that the public key found is
-
actually my public key it seems like
-
there's a bootstrapping problem here
-
right so there are a couple this is like
-
a really interesting and really hard
-
real-world problem and there are a
-
couple different approaches you might
-
take to this problem one is kind of a
-
lame solution but this thing solves a
-
lot of cryptography problems this
-
exchange the information out-of-band
-
what that means is you want to send me
-
encrypted email we'll just talk to me
-
after class I'll give you my public key
-
on a piece of paper and since you were
-
talking to me in person like you know
-
that it's actually my public key not
-
just somebody like hacked my website and
-
stuck some random number on there that
-
solves the problem but it's not the most
-
elegant there are a couple other
-
approaches that different applications
-
use so those of you who use signal have
-
you ever encountered the phrase safety
-
number like verify your safety number
-
with so and so so with signal they have
-
a way of exchanging public keys which is
-
through the signal servers whoever runs
-
the signal service just maintains on
-
their servers basically a mapping from
-
phone numbers to public keys and when I
-
say oh I want to message this person
-
with this number my phone just goes and
-
retrieves their public key from the
-
internet and then encrypts the message
-
for that public key now does anybody see
-
a problem with the setup
-
yeah yeah exactly the signal servers our
-
point of failure there because if the
-
signal servers give me the wrong public
-
key like supposed signal just produces a
-
new key pair and give me their public
-
key now they can read all my messages
-
and they could even sit in between and
-
transparently decrypt the messages I
-
send them and then re encrypt them and
-
send them on to their final destination
-
like basically I need some way of
-
authenticating the public key I get and
-
so signal has one solution to this which
-
is also just kind of punting the issue
-
to out-of-band key exchange you can meet
-
up with somebody and they have a
-
slightly streamline flow where they show
-
QR codes on the screen you take one
-
phone and take a picture of the other
-
phone screen and vice versa
-
and now you've exchanged public keys in
-
person and from that point on you've
-
bootstrap your encrypted end-to-end
-
communication it also has an issue of or
-
it also has approach of pinning a public
-
key so once you know that a particular
-
phone number has a particular public key
-
your phone remembers that and if that
-
ever changes it'll complain to you and
-
then there are a couple other solutions
-
to this problem PGP one pop to let used
-
to be popular a while ago has this idea
-
of web of trust so like I trust people
-
who my friends trust so if like John has
-
done an out-of-band exchange with say my
-
professor then I can probably email my
-
professor because like I know that John
-
trusts my professor and I trust John so
-
you got this chain of trust through
-
there that's one interesting approach
-
and then another model that's called
-
pretty recently as something that a tool
-
called key base uses this is a really
-
neat whoops
-
there's website called key based IO and
-
they have a really interesting solution
-
to this bootstrapping problem which is
-
social proof so saying you probably have
-
your friends on Facebook and on Twitter
-
and whatnot and it's probably pretty
-
hard for an attacker to break into your
-
friend's Facebook account at the same
-
time as their Twitter account as the
-
same time as their hacker news account
-
and so on and so there's this
-
interesting way of binding public keys
-
to a set of social identities such that
-
you can retrieve a public key once you
-
trust some number of social identities
-
corresponding to your friend we have
-
links to these in the lecture notes if
-
you want to see these things in more
-
detail so that's it for our security and
-
cryptography lecture and tomorrow's
-
lecture will be on a random collection
-
of top
-
that your instructors find interesting
-
so hopefully see you in lecture tomorrow
-
I'll also be here for a couple of
-
minutes after class if anybody has
-
questions yes okay so John feel free to
-
leave if you have to leave but I think
-
nobody's using the classroom after us
-
I'm going to talk about one other
-
interesting topic so john brought up the
-
fact that a symmetric key cryptography
-
is slow and symmetric key cryptography
-
is fast and so in practice you don't
-
really use just a symmetric key
-
cryptography by itself it's usually used
-
to bootstrap a more sophisticated
-
protocol that you're using one thing you
-
might want to do is use a symmetric key
-
cryptography
-
for signing encrypted email right we
-
talked about that example and the way
-
that works isn't what you might have
-
guessed from our straightforward
-
explanation of asymmetric key crypto
-
like you don't just use that encrypt
-
function up there and call it a day in
-
practice what you do is you use hybrid
-
encryption to use a combination of
-
symmetric key and asymmetric key
-
cryptography
-
what you do is here I'll draw this as a
-
big block diagram you take your message
-
m and then I have my public key that I
-
want to encrypt for but rather than just
-
take these two things and pass it
-
through the encryption up there what I
-
do is I use the symmetric key gen
-
function to produce a symmetric key okay
-
I'm gonna like prepend this with
-
symmetric so we can distinguish it from
-
the public key key generation function
-
and then what I do is I take these two
-
things pass them through my symmetric
-
encryption box
-
this produces the ciphertext and now
-
this by itself to the sender sorry this
-
by itself to the receiver who has the
-
private key corresponding to this public
-
key here this is not really useful right
-
because this is encrypted with a
-
symmetric cipher with this key K that
-
came from this function that I ran on my
-
local machine so I need some way of
-
getting this to the person who actually
-
used to decrypt the email and so what I
-
do is I take this thing and now this
-
email might have been big and I use
-
symmetric encryption with that because
-
symmetric encryption is fast but this
-
key is small like it might be 256 bits
-
or something so I can take this thing
-
and encrypt it with a symmetric
-
encryption using the public key and this
-
gives me an encrypted key and this thing
-
can be decrypted using the private key
-
corresponding to that public key to
-
reconstruct this so this is on the
-
sender's end now the receiver gets this
-
and this and kind of does these things
-
backwards so you start with the
-
encrypted key and use asymmetric
-
decryption using your public using your
-
private key that corresponds to the
-
posted public key to reconstruct this
-
key that were used for the symmetric
-
encryption box and then use symmetric
-
key decryption using that key that was
-
reconstructed to take this ciphertext
-
and produce the original message so
-
there's a kind of interesting example of
-
how in practice symmetric and asymmetric
-
key cryptography is combined question
-
so the question is will you be using the
-
same symmetric key generators yes okay
-
so you need to kind of agree ahead of
-
time which box you're using here so you
-
might be like oh I'm going to use AES
-
256 GC up here but this is a well known
-
function and it's public like the
-
attackers allowed to know all the
-
parameters this function this is the
-
only secret thing that the attacker
-
doesn't know the key any other questions
-
yeah that's a really good question what
-
kind of data is important enough to
-
encrypt and I think that depends on your
-
threat model like who what kind of
-
attackers are you concerned about what
-
are you trying to protect against so you
-
might have the stance that you just
-
don't really care and that like anything
-
you communicate with anybody is allowed
-
to be public I might be willing to like
-
post all my conversation with everybody
-
for everybody to see publicly on the
-
Internet on the other hand maybe you're
-
doing some like security sensitive works
-
here working under a contract for the US
-
government developing some sensitive
-
military stuff if you're sending that
-
through the open Internet while you're
-
traveling you probably want to be pretty
-
darn sure that no eavesdroppers or
-
anybody else along the way can see what
-
you're sending and that whatever you're
-
sending is in fact going to the right
-
place and that whoever is receiving it
-
can authenticate that it in fact came
-
from you so you might be worried about
-
all different kinds of adversaries
-
depending on your scenario from random
-
script kiddies who are trying to break
-
into websites to nation state level
-
attackers and you'll need different
-
types of techniques for defending
-
against the different categories of
-
attackers any other questions
-
well so hopefully see some of you
-
tomorrow for a random collection of
-
things that John Jose and I find
-
interesting