Real-world stream ciphers (20 min)

Edit subtitles

0:00 - 0:04

In this segment, I want to give a few
examples of stream ciphers that are used in practice.
0:04 - 0:07

I'm gonna start with two old
examples that actually are not
0:07 - 0:11

supposed to be used in new systems.
But nevertheless, they're still fairly
0:11 - 0:14

widely used, and so I just want to mention
the names so that you're familiar with
0:14 - 0:19

these concepts. The first stream cipher I
want to talk about is called RC4, designed
0:19 - 0:23

back in 1987. And I'm only gonna give you
the high-level description of it, and then
0:23 - 0:28

we'll talk about some weaknesses of RC4
and leave it at that. So RC4 takes a
0:28 - 0:33

variable size seed, here I just gave
as an example where it would take 128
0:33 - 0:37

bits as the seed size, which would then
be used as the key for the stream cipher.
0:37 - 0:42

The first thing it does, is it expands the
128-bit secret key into 2,048 bits, which
0:42 - 0:46

are gonna be used as the internal state
for the generator. And then, once it's done
0:46 - 0:51

this expansion, it basically executes a
very simple loop, where every iteration of
0:51 - 0:56

this loop outputs one byte of output. So,
essentially, you can run the generator for
0:56 - 1:01

as long as you want, and generate one byte
at a time. Now RC4 is actually, as I said,
1:01 - 1:05

fairly popular. It's used in the HTTPS
protocol quite commonly actually.
1:05 - 1:12

These days, for example, Google uses RC4 in its
HTTPS. It's also used in WEP as we
1:12 - 1:16

discussed in the last segment, but of
course in WEP, it's used incorrectly and
1:16 - 1:19

it's completely insecure the way it's used
inside of WEP. So over the years,
1:19 - 1:24

some weaknesses have been found in RC4, and as a
result, it's recommended that new projects
1:24 - 1:29

actually not use RC4 but rather use a more
modern pseudo-random generator as we'll
1:29 - 1:34

discuss toward the end of the segment. So
let me just mention two of the weaknesses.
1:34 - 1:40

So the first one is, it's kind of bizarre
basically, if you look at the second byte
1:40 - 1:45

of the output of RC4. It turns out the
second byte is slightly biased. If RC4 was
1:45 - 1:50

completely random, the probability that the
second byte happens to be equal to zero
1:50 - 1:55

would be exactly one over 256. There are
256 possible bytes, the probability that
1:55 - 2:00

it's zero should be one over 256. It so
happens that for RC4 the probability is
2:00 - 2:04

actually two over 256, which means that if
you use the RC4 output to encrypt a
2:04 - 2:10

message the second byte is likely to not
be encrypted at all. In other words it'll
2:10 - 2:15

be XOR-ed with zero with twice the
probability that it's supposed to.
2:15 - 2:19

So two over 256, instead of one over 256.
And by the way I should say that there's
2:19 - 2:23

nothing special about the second byte. It
turns out the first and the third bytes
2:23 - 2:28

are also biased. And in fact it's now
recommended that if you're gonna use RC4,
2:28 - 2:33

what you should do is ignore basically the
first 256 bytes of the output and just
2:33 - 2:37

start using the output of the generator
starting from byte 257. The first couple
2:37 - 2:41

of bytes turned out to be biased, so you
just ignore them. The second attack that
2:41 - 2:48

was discovered is that in fact if you look
at a very long output of RC4 it so happens
2:48 - 2:54

that you're more likely to get the
sequence 00. In other words, you're more
2:54 - 2:59

likely to get sixteen bits, two bytes of
zero, zero, than you should. Again, if RC4
2:59 - 3:04

was completely random the probability of
seeing zero, zero would be exactly 1/256
3:04 - 3:09

squared. It turns out RC4 is a little
biased and the bias is 1/256 cubed. It
3:09 - 3:14

turns out this bias actually starts after
several gigabytes of data are produced by
3:14 - 3:19

RC4. But nevertheless, this is something
that can be used to predict the generator
3:19 - 3:23

and definitely it can be used to
distinguish the output of the generator
3:23 - 3:28

from a truly random sequence. Basically the
fact that zero, zero appears more often
3:28 - 3:32

than it should is the distinguisher. And
then in the last segment we talked about
3:32 - 3:36

related-key attacks that were used to
attack WEP, that basically say that
3:36 - 3:41

if one uses keys that are closely related
to one another then it's actually possible
3:41 - 3:46

to recover the root key. So these are the
weaknesses that are known of RC4 and, as a
3:46 - 3:50

result, it's recommended that new systems
actually not use RC4 and instead use a
3:50 - 3:54

modern pseudo-random generator. Okay,
second example I want to give you is a
3:54 - 3:59

badly broken stream cipher that's used for
encrypting DVD movies. When you buy a DVD
3:59 - 4:04

in the store, the actual movie is
encrypted using a stream cipher called the
4:04 - 4:08

content scrambling system, CSS. CSS turns
out to be a badly broken stream cipher,
4:08 - 4:13

and we can very easily break it, and I
want to show you how the attack algorithm
4:13 - 4:17

works. We're doing it so you can see an
example of an attack algorithm, but in
4:17 - 4:21

fact, there are many systems out there
that basically use this attack to decrypt
4:21 - 4:26

encrypted DVDs. So the CSS stream cipher
is based on something that hardware
4:26 - 4:30

designers like. It's designed to be a
hardware stream cipher that is supposed to
4:30 - 4:34

be easy to implement in hardware, and is
based on a mechanism called a linear
4:34 - 4:39

feedback shift register. So a linear
feedback shift register is basically a register
4:39 - 4:44

that consists of cells where
each cell contains one bit. Then basically
4:44 - 4:49

what happens is there are these taps into
certain cells, not all the cells, certain
4:49 - 4:54

positions are called taps. And then these
taps feed into an XOR and then at
4:54 - 4:59

every clock cycle, the shift register
shifts to the left. The last bit falls off
4:59 - 5:04

and then the first bit becomes the result
of this XOR. So you can see that
5:04 - 5:09

this is a very simple mechanism to implement, and in hardware takes very few
5:09 - 5:14

transistors. Just the shift right, the
last bit falls off and the first bit just
5:14 - 5:19

becomes the XOR of the previous
bits. So the seed for this LFSR
5:19 - 5:23

basically, is the initial state of the
LFSR.
5:24 - 5:29

And it is the basis of a number of stream
ciphers. So here are some examples. So, as
5:29 - 5:33

I said, DVD encryption uses two LFSRs.
I'll show you how that works in just a
5:33 - 5:38

second. GSM encryption, these are
algorithms called A51 and A52. And that
5:38 - 5:43

uses three LFSRs. Bluetooth encryption is
an algorithm called, E zero. These are all
5:43 - 5:49

stream ciphers, and that uses four LFSRs.
Turns out all of these are badly broken,
5:49 - 5:53

and actually really should not be trusted
for encrypting traffic, but they're all
5:53 - 5:57

implemented in hardware, so it's a little
difficult now to change what the hardware
5:57 - 6:01

does. But the simplest of these, CSS,
actually has a cute attack on it, so let
6:01 - 6:05

me show you how the attack works. So,
let's describe how CSS actually works. So,
6:05 - 6:11

the key for CSS is five bytes, namely 40
bits, five times eight is 40 bits. The
6:11 - 6:16

reason they had to limit themselves to
only 40 bits is that DVD encryption was
6:16 - 6:20

designed at a time where U.S. Export
regulations only allowed for export of
6:20 - 6:25

crpyto algorithms where the key was
only 40 bits. So the designers of CSS were
6:25 - 6:30

already limited to very, very short keys.
Just 40 bit keys. So, their design works
6:30 - 6:35

as follows. Basically, CSS uses two
LFSR's. One is a 17-bit LFSR. In other words,
6:35 - 6:41

the register contains
17 bits. And the other one is a 25-bit LFSR,
6:41 - 6:47

it's a little bit longer, 25-bit
LFSR. And the way these LFSRs are seeded
6:47 - 6:52

is as follows. So the key for the
encryption, basically looks as follows.
6:52 - 6:58

You start off with a one, and you concatenate to it the first two bytes of
6:58 - 7:03

the key. And that's the initial state of the LFSR.
7:03 - 7:08

And then the second LFSR basically is intitialized the same way.
7:08 - 7:14

One concatenated the last three bytes of
the key. And that's
7:14 - 7:20

loaded into the initial state of the LFSR.
You can see that the first two bytes are
7:20 - 7:25

sixteen bits, plus leading one, that's
seventeen bits overall, whereas the second
7:25 - 7:31

LFSR is 24 bits plus one which is 25 bits.
And you notice we used all five bits of
7:31 - 7:37

the key. So then these LFSRs are basically
run for eight cycles, so they generate
7:37 - 7:42

eight bits of output. And then they go
through this adder that does basically
7:42 - 7:48

addition modulo 256. Yeah so this is an
addition box, modulo 256. There's one more
7:48 - 7:54

technical thing that happens. In fact let's
actually—also added is the carry from the
7:54 - 8:00

previous block. But that is not so
important. That's a detail that's not so
8:00 - 8:05

relevant. OK, so every block, you notice
we're doing addition modulo 256 and
8:05 - 8:10

we're ignoring the carry, but the carry is
basically added as a zero or a one to the
8:10 - 8:15

addition of the next block. Okay? And then
basically this output one byte per round.
8:15 - 8:20

Okay, and then this byte is then of course
used, XOR-ed with the appropriate
8:20 - 8:25

byte of the movie that's being encrypted.
Okay, so it's a very simple stream
8:25 - 8:30

cipher, it takes very little hardware to
implement. It will run fast, even on very
8:30 - 8:36

cheap hardware and it will encrypt movies.
So it turns out this is easy to break
8:36 - 8:41

in time roughly two to the seventeen. Now let me show you how.
8:41 - 8:46

So suppose you
intercept the movies, so here we have an
8:46 - 8:51

encrypted movie that you want to decrypt.
So let's say that this is all encrypted so
8:51 - 8:55

you have no idea what's inside of here.
However, it so happens that just because
8:55 - 9:00

DVD encryption is using MPEG files, it so
happens if you know of a prefix of the
9:00 - 9:04

plaintext, let's just say maybe this is
twenty bytes. Well, we know if you
9:04 - 9:09

XOR these two things together, so in other words, you do the XOR here,
9:09 - 9:14

what you'll get is the initial segment of the PRG. So, you'll get the
9:14 - 9:18

first twenty bytes of the output of CSS,
the output of this PRG. Okay, so now
9:18 - 9:24

here's what we're going to do. So we have
the first twenty bytes of the output. Now
9:24 - 9:31

we do the following. We try all two to the
seventeen possible values of the first
9:31 - 9:37

LFSR. Okay? So two to the seventeen
possible values. So for each value, so for
9:37 - 9:43

each of these two to the seventeen initial
values of the LFSR, we're gonna run the
9:43 - 9:48

LFSR for twenty bytes, okay? So we'll
generate twenty bytes of outputs from this
9:48 - 9:53

first LFSR, assuming—for each one of the
two to the seventeen possible settings.
9:53 - 9:59

Now, remember we have the full output of
the CSS system. So what we can do is we
9:59 - 10:04

can take this output that we have. And
subtract it from the twenty bites that we
10:04 - 10:09

got from the first LFSR, and if in fact our
guess for the initial state of the first
10:09 - 10:14

LFSR is correct, what we should get
is the first twenty-byte output of the
10:14 - 10:19

second LFSR. Right? Because that's by
definition what the output of the CSS
10:19 - 10:25

system is. Now, it turns out that looking
at a 20-byte sequence, it's very easy
10:25 - 10:30

to tell whether this 20-byte sequence
came from a 25-bit LFSR or not. If it
10:30 - 10:34

didn't, then we know that our guess
for the 17-bit LFSR was
10:34 - 10:37

incorrect and then we move on to the next
guess for the 17-bit LFSR and
10:37 - 10:42

the next guess and so on and so forth.
Until eventually we hit the right initial
10:42 - 10:47

state for the 17-bit LFSR, and
then we'll actually get, we'll see that
10:47 - 10:52

the 20 bytes that we get as the
candidate output for the 25-bit LFSR is
10:52 - 10:57

in fact a possible output for a 25-bit LFSR. And then, not only will we have
10:57 - 11:02

learned the correct initial state for the
17-bit LFSR, we will have also
11:02 - 11:08

learned the correct initial state of the
25-bit LFSR. And then we can predict the
11:08 - 11:13

remaining outputs of CSS, and of course,
using that, we can then decrypt the rest of
11:13 - 11:18

the movie. We can actually recover the
remaining plaintext. Okay. This is
11:18 - 11:22

things that we talked about before. So, I
said this a little quick, but hopefully,
11:22 - 11:27

it was clear. We're also going to be doing
a homework exercise on this type of stream
11:27 - 11:31

ciphers and you'll kind of get the point
of how these attack algorithms
11:31 - 11:36

work. And I should mention that there are
many open-source systems now that actually
11:36 - 11:41

use this method to decrypt CSS-encrypted
data. Okay, so now that we've seen two
11:41 - 11:46

weak examples, let's move on to better
examples, and in particular the better
11:46 - 11:49

pseudo-random generators come from what's
called the eStream Project. This is a
11:49 - 11:56

project that concluded in 2008, and they
qualify basically five different stream
11:56 - 12:00

ciphers, but here I want to present just
one. So first of all the parameters for
12:00 - 12:04

these stream ciphers are a little
different than what we're used to. So these
12:04 - 12:08

stream ciphers as normal they have a seed.
But in addition they also have, what's
12:08 - 12:13

called a nonce and we'll see what a
nonce is used for in just a minute. So
12:13 - 12:17

they take two inputs a seed and a nonce.
We'll see what the nonce is used for in
12:17 - 12:21

just a second. And the of course they
produce a very large output, so n here is
12:21 - 12:27

much, much, much bigger than s. Now, when
I say nonce, what I mean is a value that's
12:27 - 12:31

never going to repeat as long as the key
is fixed. And I'll explain that in more
12:31 - 12:35

detail in just a second. But for now, just
think of it as a unique value that never
12:35 - 12:41

repeats as long as the key is the same.
And so of course once you have this PRG,
12:41 - 12:45

you would encrypt, you get a stream cipher
just as before, except now as you see, the
12:45 - 12:50

PRG takes as input both the key and the
nonce. And the property of the nonce is
12:50 - 12:56

that the pair, k comma r, so the key comma
the nonce, is never—never repeats. It's
12:56 - 13:03

never used more than once. So the bottom
line is that you can reuse the key, reuse
13:03 - 13:10

the key, because the nonce makes the
pair unique, because k and r are only
13:10 - 13:16

used once. I'll say they're unique. Okay
so this nonce is kind of a cute trick that
13:16 - 13:22

saves us the trouble of moving to a new
key every time. Okay, so the particular
13:22 - 13:26

example from the eStream that I want to
show you is called Salsa twenty. It's a
13:26 - 13:30

stream cipher that's designed for both
software implementations and hardware
13:30 - 13:33

implementations. It's kind of interesting.
You realize that some stream ciphers are
13:33 - 13:39

designed for software, like RC4.
Everything it does is designed to make
13:39 - 13:43

software implementation run fast, whereas
other stream ciphers are designed for
13:43 - 13:48

hardware, like CSS, using an LFSR that's
particularly designed to make hardware
13:48 - 13:51

implementations very cheap. It's also, the
nice thing about that is that it's
13:51 - 13:55

designed so that it's both easy to
implement it in hardware and its software
13:55 - 14:00

implementation is also very fast. So let
me explain how Salsa works. Well, Salsa
14:00 - 14:05

takes either 128 or 256-bit keys. I'll
only explain the 128-bit version of Salsa.
14:05 - 14:11

So this is the seed. And then it also
takes a nonce, just as before, which
14:11 - 14:15

happens to be 64 bits. And then it'll
generate a large output. Now, how does it
14:15 - 14:21

actually work? Well, the function itself
is defined as follows. Basically, given
14:21 - 14:26

the key and the nonce, it will generate a
very long, well, a long pseudorandom
14:26 - 14:31

sequence, as long as necessary. And it'll do
it by using this function that I'll denote by
14:31 - 14:36

H. This function H takes three inputs.
Basically the key. Well, the seed k,
14:36 - 14:40

the nonce r, and then a counter that
increments from step to step. So it goes
14:40 - 14:45

from zero to one, two, three, four as long as
we need it to be. Okay? So basically,
14:45 - 14:50

by evaluating this H on this k r, but using
this incrementing counter, we can get a
14:50 - 14:55

sequence that's as long as we want. So all
I have to do is describe how this function
14:55 - 14:59

H works. Now, let me do that here for you.
The way it works is as follows. Well, we
14:59 - 15:05

start off by expanding the states into
something quite large which is 64 bytes
15:05 - 15:10

long, and we do that as follows. Basically
we stick a constant at the beginning, so
15:10 - 15:16

there's tao zero, these are four bytes,
it's a four byte constant, so the spec for
15:16 - 15:21

Salsa basically gives you the value for
tao zero. Then we put k in which is
15:21 - 15:25

sixteen bytes. Then we put another
constant. Again, this is four bytes. And
15:25 - 15:31

as I said, the spec basically specifies
what this fixed constant is. Then we put
15:31 - 15:37

the nonce, which is eight bytes. Then we
put the index. This is the counter zero,
15:37 - 15:43

one, two, three, four, which is another
eight bytes. Then we put another constant
15:43 - 15:49

tau two, which is another four bytes.
Then we put the key again, this is another
15:49 - 15:55

sixteen bytes. And then finally we put the
third constant, tau three, which is
15:55 - 16:00

another four bytes. Okay so as I said, if you
sum these up, you see that you get 64
16:00 - 16:05

bytes. So basically we've expanded the key
and the nonce and the counter into 64
16:05 - 16:11

bytes. Basically the key is repeated twice
I guess. And then what we do is we apply a
16:11 - 16:16

function, I'll call this functional little h.
Okay, so we apply this function, little h.
16:16 - 16:22

And this is a function that's one to one
so it maps 64 bytes to 64 bytes. It's a
16:22 - 16:26

completely invertible function, okay? So
this function h is, as I say, it's an
16:26 - 16:30

invertable function. So given the input
you can get the output and given the
16:30 - 16:35

output you can go back to the input. And
it's designed specifically so it's a- easy
16:35 - 16:40

to implement in hardware and b- on an x86,
it's extremely easy to implement because
16:40 - 16:44

x86 has this SSE2 instruction set which
supports all the operations you need to do
16:44 - 16:49

for this function. It's very, very fast.
As a result, Salsa has a very fast stream
16:49 - 16:53

cipher. And then it does this basically
again and again. So it applies this
16:53 - 16:58

function h again and it gets another 64
bytes. And so on and so forth, basically
16:58 - 17:05

it does this ten times. Okay so the whole
thing here, say repeats ten times, so
17:05 - 17:18

basically apply h ten times. And then by
itself, this is actually not quite random.
17:18 - 17:22

It's not gonna look random because like we
said, H is completely invertable. So given
17:22 - 17:26

this final output, it's very easy to
just invert h and then go back to the original
17:26 - 17:32

inputs and then test that the input has
the right structure. So you do one more
17:32 - 17:37

thing, which is to basically XOR the
inputs and the final outputs. Actually,
17:37 - 17:42

sorry. It's not an XOR. It's actually an
addition. So you do an addition word by
17:42 - 17:48

word. So if there are 64 bytes, you do a
word-by-word addition four bytes at a
17:48 - 17:53

time, and finally you get the 64-byte
output, and that's it. That's the whole
17:53 - 17:57

pseudo-random generator. So that, that's
the whole function, little h. And as I
17:57 - 18:02

explained, this whole construction here is
the function big H. And then you evaluate
18:02 - 18:06

big H by incrementing the counter I from
zero, one, two, three onwards. And that
18:06 - 18:10

will give you a pseudo-random sequence
that's as long as you need it to be. And
18:10 - 18:15

basically, there are no signifigant
attacks on this. This has security that's
18:15 - 18:20

very close to two to the 128. We'll see
what that means more precisely later on.
18:20 - 18:25

It's a very fast stream cipher, both in
hardware and in software. And, as far as
18:25 - 18:30

we can tell, it seems to be unpredictable
as required for a stream cipher. So I
18:30 - 18:35

should say that the eStream project
actually has five stream ciphers like
18:35 - 18:39

this. I only chose Salsa 'cause I think
it's the most elegant. But I can give you
18:39 - 18:44

some performance numbers here. So you can
see, these are performance numbers on a
18:44 - 18:49

2.2 gigahertz, you know, x86 type machine.
And you can see that RC4 actually is the
18:49 - 18:53

slowest. Because essentially, well it
doesn't really take advantage of the
18:53 - 18:57

hardware. It only does byte operations.
And so there's a lot of wasted cycles that
18:57 - 19:01

aren't being used. But the E-Stream
candidates, both Salsa and other
19:01 - 19:05

candidate called Sosemanuk. I should say these
are eStream finalists. These are
19:05 - 19:10

actually stream ciphers that are approved
by the eStream project. You can see that
19:10 - 19:14

they have achieved a significant rate.
This is 643 megabytes per second on this
19:14 - 19:18

architecture, more than enough for a movie
and these are actually quite impressive
19:18 - 19:22

rates. And so now you've seen examples of
two old stream ciphers that shouldn't be
19:22 - 19:27

used, including attacks on those stream ciphers.
You've seen what the modern stream ciphers
19:27 - 19:30

look like with this nonce. And you
see the performance numbers for these
19:30 - 19:35

modern stream ciphers so if you happen to
need a stream cipher you could use one of
19:35 - 19:38

the eStream finalists. In particular you
could use something like Salsa.

Title:: Real-world stream ciphers (20 min)
Video Language:: English

	datadevil edited English subtitles for Real-world stream ciphers (20 min)
	datadevil edited English subtitles for Real-world stream ciphers (20 min)
	datadevil edited English subtitles for Real-world stream ciphers (20 min)
	jason.orendorff edited English subtitles for Real-world stream ciphers (20 min)
	jason.orendorff edited English subtitles for Real-world stream ciphers (20 min)
	jason.orendorff edited English subtitles for Real-world stream ciphers (20 min)
	jason.orendorff edited English subtitles for Real-world stream ciphers (20 min)
	jason.orendorff edited English subtitles for Real-world stream ciphers (20 min)

Show all

English subtitles

Revisions

Revision 6

datadevil

Real-world stream ciphers (20 min)

Revisions

Our website uses cookies

Operating cookies (Required)