-
Herald: Our next talk is on "the plain simple
reality of entropy", and
-
we all know that you need randomness and entropy
-
if you want to do something like encryption
-
or generate keys.
-
And if you don't want to do it the xkcd way,
-
using only 4 as the random number,
-
you need a cryptographically secure pseudorandom
number generator,
-
and what is this, how it works,
-
and where you can find one,
-
will be the topic of this talk.
-
So I present to you Filippo Valsorda,
-
on "How I learned to stop worrying and love
urandom".
-
applause
-
FV: Hello. Okay, I'm very glad so many people
showed up,
-
even if I essentially gave away the entire
content of the talk
-
in the description.
-
Want me to stop, to leave, something?
-
Okay? No.
-
Anyway, hi! I'm Filippo Valsorda,
-
I work for CloudFlare,
-
I do cryptography and systems engineering,
-
I recently implemented the DNSSEC implementation
-
of the CloudFlare DNS server,
-
and maybe in April 2014, you used my Heartbleed
test.
-
Anyway,
-
applause
-
Well, thank you!
-
Okay. Anyway, I'm here to tell you about random
bytes.
-
So, here are some random bytes.
-
They're pretty good, you can use them.
-
laughter
-
But, if you need more,
-
Amazon sells this excellent book
-
full of a million random numbers.
-
Anyway. More seriously,
-
random numbers are central to a lot of
-
the security protocols and systems of our
modern technology.
-
The most obvious is encryption keys.
-
You obviously want your encryption key to
be random,
-
to be really hard to predict,
-
and you want your encryption key to be different
-
from the person next to you.
-
Unless you're doing key escrow,
-
which, well, we don't point.
-
So, a lot of other different systems
-
use randomness to prevent all kinds of attack.
-
In particular, one amongst many,
-
DNS, using random query IDs
-
to prevent Kaminsky attacks.
-
So, what makes a stream, a source of random
bytes, good?
-
What are we looking for when we look for good
random bytes?
-
First of all, we look for uniform random bytes.
-
Every time we draw a random byte from our
random byte source
-
we want to have the same probability
-
to get all values, from 0 to 255.
-
For example, you don't want your distribution
to look like this.
-
This is RC4.
-
But that's not enough.
-
You also want your random bytes to be completely
unpredictable.
-
And here is where the task actually becomes
difficult.
-
Because if you think about it, we are programming
computers.
-
Computers are very deterministic machines,
-
even if they don't feel like they are.
-
And they're essentially machines built
-
to sequentially execute always the same set
of instructions,
-
which we call code.
-
And when we ask them, at some point,
-
to do something that is completely different
-
every time they do it,
-
and two equal computers should do it differently,
-
we get in trouble.
-
So, where can a computer source this randomness?
-
Where can a computer find unpredictability,
-
if it can't have its own?
-
Obviously, in our messy meat world.
-
In our physical world,
-
where everything is not always happening the
same way.
-
So, user input: every time you type on the
keyboard,
-
you do that with different timings.
-
When you move your mouse around,
-
you do that differently every time.
-
Or, simply, reading from disk.
-
Every time your computer reads something from
disk,
-
it takes a slightly different amount of time.
-
Or interrupt times, I/O, you get the idea.
-
So, all these events are visible to the kernel.
-
The kernel is the component of your system
-
which is controlling all these interactions
-
with the outside world, and can measure them
-
and observe them with the right precision.
-
And each of these events can have
-
a wide or narrow range of possible values,
-
for example, when you read from disk,
-
it might take from 0.17 nanoseconds to 1.3
nanoseconds.
-
I made numbers up.
-
How wide this range is what we call entropy.
-
Essentially it is how many different values,
-
how spread apart the values are,
-
which also means how easy they are to predict.
-
But something they definitely aren't is uniform.
-
Because as I said, for example, reading from
disk
-
might take in a specific range,
-
definitely not from 0 to 255 nanoseconds.
-
That would be...
-
And usually they're not enough to satisfy
-
all our random bytes needs.
-
So, now we have some unpredictability.
-
We have some events that we can see from our
system,
-
and we want to turn that into a stream of
random bytes
-
that we can use to generate SSH keys,
-
and DNS query IDs, etc.
-
Enter a CSPRNG.
-
Cryptographers like their acronyms very long.
-
It's a cryptographically secure pseudorandom
number generator.
-
applause
-
It's not that hard to pronounce!
-
Okay, it is.
-
Anyway, it's nothing else than a cryptographic
tool
-
that takes some input and generates an unlimited,
-
reasonably unlimited,
-
stream of random uniform bytes,
-
which depend on all and only the input you
gave to the CSPRNG.
-
So you can see how we can use this.
-
We have this amount of random events,
-
we feed that into the CSPRNG,
-
and we get out random bytes that we can use
for everything.
-
So, to understand how a CSPRNG works,
-
I decided to simply present you with a very
simple one.
-
One based on hash functions.
-
I assume that everyone in the hall
-
knows essentially what hash functions are.
-
But the properties we care about today of
hash functions are:
-
The fact that the output is uniform.
-
If you take the output of a hash function,
-
all the bits should be indistinguishable from
random,
-
if you don't know the input.
-
It's impossible to reverse a hash function.
-
If I give you the output of a hash function,
-
you should know nothing more than before
-
about what the input of the hash function
is,
-
unless you can specifically figure out the
input
-
and try the hash function.
-
And finally, it takes a limited amount of
input,
-
and makes a fixed amount of output.
-
These are the properties we are going to use
-
to build a CSPRNG out of hash functions.
-
So. This is how it works.
-
We start with a pool.
-
We call "pool" an array of bytes,
-
and we fill it with zeros to start.
-
And every time a new event comes in,
-
for example, you moved the mouse around,
-
we take that event,
-
we serialize it to some binary format,
-
doesn't really matter.
-
For example, mouse is at position 15 to 835.
-
And we hash together,
-
we hash the concatenation of our pool,
-
which for now is just zeros,
-
and the serialization of this event.
-
We hash them together, we get an output,
-
and that's our new value of the pool.
-
And we repeat.
-
Now, instead of zeros, we have the output
from before.
-
Now we have this output, and a new event happens.
-
A disk read happens, and it takes exactly
1.27589 nanoseconds.
-
And we hash together the old contents of the
pool,
-
this information, disk read happened and it
took this amount of time,
-
we hash them together and we get a new value
of the pool.
-
You see where this is going.
-
We keep doing this.
-
Every time a new event comes in,
-
every time the mouse moves,
-
every time a CPU interrupt is raised,
-
every time disk read happens,
-
we call this stirring function
-
to mix this event into this pool.
-
And what do we end up with?
-
We end up with what we call an entropy pool.
-
Now, to figure this value, you need exactly
all the events
-
that lead to this value.
-
If you're an attacker, and you really want
to figure out
-
what my entropy pool is,
-
you don't, you're not supposed to have any
better way
-
to figure it out than to guess all the different
-
hard disk timings and mouse movements that
happened
-
all the way up to now.
-
Okay? So now we have this essentially unpredictable
value,
-
but now we want to generate keys out of it,
-
and we can't just use these few bytes here.
-
So we can use again hash functions.
-
Same hash function.
-
We take the entropy pool, and we hash it with
a counter.
-
You want 5000 random bits? Sure.
-
You hash entropy pool and 0,
-
hash entropy pool and 1, and 2, 3, 4, 5, 6,
7, 8, 9.
-
You get all these outputs, you concatenate
them,
-
and now you have 5000 bits, which are as unpredictable
-
as all the events that were stirred into the
pool.
-
Let's think about it for a second.
-
We said that hash functions are not invertible,
-
so even if you know one of the outputs,
-
you can't get back to the entropy pool.
-
And we said that hash functions have,
-
that with hash functions, all the bits in
input
-
affect all the bits of the output.
-
So even if just the counter changes
-
between one rand and the other,
-
the output is completely unrelated.
-
So, did we get what we want?
-
It's uniform, because we said before,
-
hash functions' outputs are uniform.
-
It's unpredictable, because the only way an
attacker has
-
to figure out what the output will be
-
is imagine or brute-force or observe, I guess,
-
all the hard-disk timings and user inputs,
-
which is impossible for a third party.
-
And it's unlimited, because we can keep
-
incrementing that counter forever.
-
Now, really please don't go implement this
scheme
-
and say "Filippo told me it was okay".
-
No.
-
Also because it's exactly not what this talk
is about.
-
So, if CSPRNGs, if we have this tool
-
to turn some unpredictable events
-
into an unlimited stream of random bytes,
-
which is what we need,
-
and we have all these unpredictable events
-
observed by the kernel,
-
doesn't it make sense to just put a CSPRNG
in the kernel
-
and just have the kernel run the CSPRNG
-
when we need random bytes?
-
It's such a good idea that it's exactly what
Linux did,
-
and all the other operating systems.
-
In Linux, it's called /dev/urandom,
-
and it looks like a file, you read it like
a file,
-
and it's technically a character device
-
and every time you read 100 bytes from it,
-
it runs a CSPRNG, on an entropy pool
-
not different from the one I've presented,
-
and this entropy pool is stirred with all
the events
-
that the kernel saw happen from its privileged
position.
-
Other operating systems have something similar.
-
OS X and BSD have /dev/random
-
which is exactly what /dev/urandom is on Linux,
-
and on Windows you can get something similar
-
with a CryptGenRandom call.
-
One last thing.
-
Putting the CSPRNG in the kernel
-
is not only about convenience,
-
it's also about security.
-
Because, first of all,
-
the kernel is the entity that can observe
-
the unpredictable events.
-
If you take a CSPRNG, which is just code,
-
so you can implement your own,
-
and you implement it in your library,
-
or in your application,
-
now you have the problem of,
-
how do you take the random, the unpredictable
events
-
from the kernel and take them to the application?
-
This is something that you can forget to do,
-
often,
-
or do wrong.
-
And, moreover, the kernel can protect
-
the memory space of the entropy pool
-
much better than applications.
-
For example, applications can fork,
-
there's a whole lot of different things
-
that applications can get wrong.
-
And finally, you have one single centralized
implementation
-
that is reasonably easy to audit.
-
I don't know, was anyone managing Debian servers
in 2008?
-
laughter
-
Just asking. Unrelated. Right.
-
So, yeah. /dev/urandom.
-
So, we have a solution, right?
-
We have a tool to turn unpredictable events
-
into an unlimited uniform stream of random
bytes,
-
we have a source of unpredictable events,
-
solved!
-
What are everybody talking about?
-
Why is there even a need for a talk?
-
Well. Sadly, there's some common misconceptions
in the field,
-
which is also why I'm here to give this talk.
-
One of the most common is fueled by the very
Linux man pages.
-
The recent versions are better but
-
they still give you this impression
-
that if you want real security,
-
you should be using /dev/random,
-
because /dev/urandom is okay, but, hmm, kinda...
-
and, well, we want real security, right?
-
But you might ask yourself, okay,
-
if /dev/urandom is a CSPRNG,
-
and a CSPRNG is all I need,
-
what else can I get,
-
what does /dev/random have more?
-
Well, the idea of this talk is giving you
the knowledge
-
to figure out by yourself whether you need
/dev/random or not.
-
So, first I explained how a CSPRNG works,
-
now I'm going to go a bit into the details
-
of how /dev/urandom and /dev/random work.
-
These are taken directly from the kernel source.
-
Both /dev/urandom and /dev/random are...
-
Yeah. Sorry.
-
Essentially, everything I'm going to say now
-
applies to both /dev/urandom and /dev/random.
-
They both are based on a pool of 4000 bits,
-
not dissimilar from the one of the CSPRNG
we played with before,
-
which is implemented as a series of 32-bits
words, I think.
-
The pool is mixed with all the unpredictable
events,
-
using a CRC-like function.
-
This is not a cryptographically secure hash
function,
-
but this is just about how the unpredictable
events,
-
the interrupts, the disk timings, are stirred
-
into the internal pool.
-
Every time one of these events happens,
-
this very fast function kicks in
-
and stirs the pool with the unpredictable
event.
-
Then extraction, so actual random byte generation,
-
happens with SHA-1.
-
So you want some random bytes from the kernel,
-
what the kernel does is just run SHA-1 on
the pool,
-
give you the output,
-
and also take the output and feed it back
into the pool
-
using that mixing function.
-
This is a big difference, you might have noticed,
-
from our design, which is a counter,
-
because keeping counters, turns out, is still
hard.
-
And they can reset, you can lose count, that's
bad.
-
Also, this has more security properties against
compromise.
-
So what is does is simply that,
-
when it generates output, it also stirs it
back,
-
and if you need more output,
-
SHA-1 again on the new pool
-
gives output and stirs it back into the pool,
-
so that the pool keeps changing.
-
Now, both /dev/urandom and /dev/random
-
do the exact same thing.
-
Same code, same sizes, same entropy sources,
-
literally in the source,
-
random_read is a call to extract_entropy_user,
-
urandom_read is a call to extract_entropy_user.
-
The only difference is
-
I finally get to what's special about /dev/random,
-
is that it tries to do a couple of really
hard and weird things.
-
First, it tries to guess how many bits of
entropy
-
were mixed into the pool after each unpredictable
event.
-
This is already very hard, because, think
about it,
-
a disk read took 1.735 nanoseconds. Great.
-
We don't know how many different values this
might take.
-
We don't know if this is a spinning hard disk,
-
which has timings all over the place,
-
or if it's a SSD which almost always takes
the same time.
-
So we don't know how much predictable this
is.
-
So this is already hard,
-
figuring out how unpredictable the pool is.
-
So it keeps a counter, arbitrary number of
how many bits
-
of entropy, how much unpredictability
-
there is in this pool.
-
And then, when you run the hash function on
the pool,
-
it decreases this count,
-
it reduces this number.
-
And if this number gets too low, it blocks
you.
-
So you're reading from /dev/random,
-
this number dwindles,
-
so now you're still reading from /dev/random
-
but you're blocked
-
until more unpredictable events happen.
-
This is useless in the modern world.
-
Because entropy does not decrease.
-
Entropy does not run out, and everything freezes.
-
Once the pool becomes unpredictable
-
because too many different events contributed
-
to how the entropy pool looks like,
-
it's forever unpredictable,
-
because the attacker doesn't learn anything
from the output.
-
Obviously, unless the CSPRNG is broken
-
and is leaking information about the entropy
pool.
-
However, saying that CSPRNGs are broken
-
is equivalent to saying that a lot of cryptography
constructs are broken.
-
It's saying that stream ciphers are broken,
-
it's saying that CTR mode is broken,
-
it's saying that TLS and PGP are broken,
-
because they're both about reusing the same
key
-
for multiple packets or messages.
-
So if cryptographers didn't know how to build
-
a secure CSPRNG,
-
it would mean that cryptographers weren't
able
-
to build most of the things we're relying
on today.
-
It would mean that cryptography was doomed.
-
Now, I'm not DJB, I can't tell you if cryptography
is doomed.
-
But I can tell you that if cryptography is
doomed,
-
your problem is not your CSPRNG.
-
laughter
-
So, cryptography relies on being able
-
to build secure CSPRNGs.
-
And on the other hand,
-
that makes /dev/random blocking useless, obviously.
-
It can be unacceptable, too, because
-
you get a TLS request, and you're like
-
"I have that HTTP page, but wait a second,
-
I need someone to start typing
-
on the keyboard of the rack to serve it to
you."
-
And it can even be dangerous,
-
because you're essentially giving away information
-
about what other users in the system are doing
-
to other users.
-
On the other hand,
-
/dev/urandom is safe for any cryptography
use
-
you want to use it for.
-
You want to generate long-term keys...
-
My GPG keys are generated from /dev/urandom.
-
And I'm not the only one saying this,
-
BoringSSL, Python, Go, Ruby, use /dev/urandom
-
as the only source, the only CSPRNG.
-
Sandstorm even replaces /dev/random with it.
-
And here is a long list of people
-
saying exactly what I'm here on stage to tell
you.
-
So, I hope that at the end of this, you see
-
that you don't actually need /dev/random,
-
as well as you don't need to keep measuring
-
how much entropy you have in the pool,
-
you don't need to refill the pool
-
with things like haveged or
-
I don't know how to pronounce it.
-
Actually I've even seen people take output
from /dev/urandom
-
and pipe it back as root into /dev/random
-
so that the entropy doesn't run out,
-
which is exactly what the kernel is doing!
-
Which is, obviously, a pretty upvoted answer
on StackOverflow.
-
laughter
-
Anyway. And finally,
-
random number quality does not decrease,
-
there are not like premium-level random numbers
-
and then they kinda rot after you use them
for a while.
-
No, that's not a thing.
-
Okay. So, there is only one small case
-
in which /dev/urandom does not do exactly
-
what we would expect it to do, which is early
at boot.
-
If you think about it,
-
everything we said is about using unpredictable
events
-
to build up unpredictability.
-
As soon as you boot the machine,
-
you don't have observed enough events yet.
-
So this got embedded devices,
-
this got the Raspberry Pi recently,
-
essentially it's a Linux shortcoming,
-
which by now it's too late to fix,
-
which is the fact that /dev/urandom will not
block
-
even at boot, before being initialized.
-
The solution in most cases is just that the
distribution
-
should save the state of the pool at power-off,
-
and reload it at power-on,
-
or block until the pool is initialized.
-
So, your distribution probably solves this
for you anyway.
-
So, to sum up, CSPRNGs are pretty cool, and
they work.
-
You don't need /dev/random.
-
You shouldn't use userspace CSPRNGs
-
because they're very easy to get wrong.
-
And if you need 100 random bytes,
-
read 100 bytes from /dev/urandom.
-
That's it!
-
applause
-
I glossed over a lot of different ways to
do it wrong,
-
so if you have questions about why not this
other thing,
-
please, come forward.
-
Herald: Okay, and because people on the stream
can't be here in person,
-
we will start with questions from the Internet.
-
Q: The first question is: How do you explain,
-
regarding what you explained of /dev/random
versus /dev/urandom,
-
the fact that on the 4.3.3 kernel, /dev/random
output
-
is identical with something from /dev/input
something?
-
Someone claimed that.
-
FV: I'm sorry, you have to repeat. On the
what?
-
Q: On a kernel 4.3.3, someone claims that
sometimes,
-
the output from /dev/random, or /dev/unrandom,
-
is identical to something that comes from
/dev/input,
-
like an input device.
-
FV: I'm not sure I got what system, but...
-
oh my god, what system?
-
Q: Linux, Linux 4.3.3, the guy claims.
-
FV: That sounds like a pretty bad bug, but...
-
I don't know.
-
If that's the case, I'm not aware of it,
-
because I read the kernel source,
-
and it's really a call to extract_entropy_user.
-
File a bug report, maybe?
-
No, I mean, I'm joking.
-
I would be happy to talk about this offline.
-
Herald: Is there another question from the
stream?
-
Q: Yes, I have two more questions.
-
One is: What do you think about hardware entropy
generators,
-
or hardware random generators?
-
FV: Aha! I have a slide for this!
-
laughter
-
So, hardware random number generators, very
quickly.
-
Some CPUs on some platforms have real random
number generators.
-
Essentially, I'm told they use electrical
noises
-
to give you actual randomness.
-
Linux has support for them, and, if they're
loaded,
-
they will immediately be used to refuel this
pool,
-
and they will also be used as the initialization
vectors
-
for the SHA-1 of this extraction.
-
So, if they're turned on, you don't have to
worry about them,
-
and they will make /dev/urandom work even
better.
-
Yep.
-
applause
-
Herald: Okay, quick question from the stream.
-
Q: Yeah, someone wants your opinion about
entropy-gathering daemons
-
like havege daemons.
-
FV: There was probably a time when they had
-
their reason to exist, maybe because Linux
implementations
-
of this entropy gathering was not that good,
-
today they don't really have reason to exist.
-
Herald: Okay, thank you. And microphone 4,
please.
-
Q: Hello. I wanted to ask about the early
boot problem.
-
You say that we should mix, that we should
save the state
-
of /dev/urandom, what happens if a machine
crashes?
-
Wouldn't you restart from an earlier state
of /dev/urandom?
-
FV: Hm, yeah, I think that the correct way
to do this is,
-
as soon as, even before the input is used
to initialize the pool,
-
the one from the last shutdown,
-
it should be deleted from the disk,
-
and the disk flushed.
-
Yes, it's kind of hard, yes.
-
Herald: Okay, and unfortunately we are running
out of time,
-
because we have to clear this room. And you
have a short announcement?
-
FV: Oh, yeah! Tomorrow, at 15.30, I am giving
a quick workshop
-
about how to implement a Vaudenay padding
oracle attack in Hall 14.
-
I think it doesn't take as many people as
are here now...
-
so maybe I shouldn't have said that.
-
Herald: Okay, then! Thanks again, Filippo,
for the talk.
-
applause
-
subtitles created by c3subtitles.de
Join, and help us!