FactHacks [29c3]

Edit subtitles

0:09 - 0:12

Today we have Daniel J. Bernstein,
0:12 - 0:15

Nadia Heninger,
0:15 - 0:17

and Tanja Lange.
0:17 - 0:20

And they will talk about RSA factorization in the real world.
0:20 - 0:22

Talk is called FactHacks.
0:22 - 0:30

applause
0:30 - 0:32

Nadia: Thank you.
0:32 - 0:37

So we're going to start with a crypto lecture in 5 minutes.
0:37 - 0:41

So, just to begin, since we're talking about RSA.
0:41 - 0:44

Here is a picture of RSA for you.
0:44 - 0:49

RSA is the first published public-key cryptosystem.
0:49 - 0:51

So by a public-key cryptosystem, what I mean is
0:51 - 0:54

a cryptosystem that has two keys intead of just having
0:54 - 0:57

one key. You might think "ok you encrypt a message
0:57 - 0:59

to that key and you decrypt a message to that key"
0:59 - 1:01

That is a symmetric cryptosystem.
1:01 - 1:03

A public-key cryptosystem has two keys.
1:03 - 1:06

One is the public key and one is the private key.
1:06 - 1:08

The public key you publish to the world and
1:08 - 1:10

anyone can use that key to encrypt a message,
1:10 - 1:11

especially to you.
1:11 - 1:14

And only you, who have the private key, can decrypt that.
1:14 - 1:19

So the RSA paper was the first published public-key cryptosystem.
1:19 - 1:21

That was in 1977.
1:21 - 1:23

This revolutionized cryptography and enabled
1:23 - 1:26

the development of the Internet.
1:26 - 1:32

So 35 years later RSA is still the most commonly used public-key cryptosystem.
1:32 - 1:36

If you ever use the Internet, you probably use it every day.
1:36 - 1:42

So here is a sample SSL certificate which uses RSA.
1:42 - 1:45

If you use SSH you probably have an SSH key
1:45 - 1:48

which probably is RSA.
1:48 - 1:52

Before I launch into the math
1:52 - 1:55

I want to explain what we're doing here.
1:55 - 1:57

We wanted to, since you guys are hackers,
1:57 - 2:00

you like code instead of formulas
2:00 - 2:03

so we wanted to give you formulas in code.
2:03 - 2:05

So what we're giving you is working Python code for
2:05 - 2:08

all the math that we're going to do.
2:08 - 2:10

And in order to do that we're going to use
2:10 - 2:13

some software you call Sage.
2:13 - 2:16

Sage is free open-source mathematics software.
2:16 - 2:20

It is available at this URL sagemath.org.
2:20 - 2:22

It's based off of Python
2:22 - 2:24

so it's very simple if you know Python.
2:24 - 2:28

It has a nice sort of interpreter
2:28 - 2:30

that you can use just like Python.
2:30 - 2:32

Here an example, 2*3=6.
2:32 - 2:35

But there are some differences.
2:35 - 2:38

For example the caret instead of in Python it's XOR
2:38 - 2:40

in Sage it's exponentiation.
2:40 - 2:44

So 2^3 is 8 in Sage.
2:44 - 2:47

The other nice thing about Sage is that
2:47 - 2:50

it has lots of useful mathematical libraries.
2:50 - 2:53

For example if I, in the Sage interpreter, say factor(15)
2:53 - 2:56

it helpfully tells me 3*5.
2:56 - 2:57

That's pretty great.
2:57 - 2:59

It also does a lot more advanced things, for example
2:59 - 3:00

it can represent polynomials.
3:00 - 3:03

I can say factor the polynomial x^2-1
3:03 - 3:08

and it helpfully tells me the answer to that.
3:08 - 3:13

So now that we've gone through what is Sage,
3:13 - 3:17

I want to explain how to do RSA.
3:17 - 3:21

Perhaps some of you have seen this before.
3:21 - 3:25

In order to generate an RSA key, the first thing that you do
3:25 - 3:27

is you generate two large random primes.
3:27 - 3:30

So if we want a 1024 bit RSA key
3:30 - 3:35

we generate two primes of size 2^512,
3:35 - 3:39

so 512 bit primes, p and q.
3:39 - 3:42

In order to generate the public key
3:42 - 3:45

I multiply together p and q and you get some N.
3:45 - 3:46

N has 1024 bits.
3:46 - 3:50

And then you have what's known as the public exponent
3:50 - 3:54

which you can choose 3 or 65537 or 35.
3:54 - 3:56

It really doesn't matter what you choose.
3:56 - 3:59

These are some of the most commonly used values.
3:59 - 4:03

Now in order to generate a private key
4:03 - 4:08

you choose d, the decryption exponent,
4:08 - 4:13

to be the modular inverse of e mod (p-1)*(q-1)
4:13 - 4:15

It doesn't matter why so much, for this talk
4:15 - 4:17

but here is the formula
4:17 - 4:20

or here is the command to do this in Sage.
4:20 - 4:23

So now we have a public key and a private key
4:23 - 4:25

that are pairs.
4:25 - 4:28

And in RSA if you want to enrypt a message
4:28 - 4:32

you raise your message to the e'th power mod n
4:32 - 4:35

so you can do that with your public key here.
4:35 - 4:37

And similarly decryption,
4:37 - 4:40

you raise the ciphertext to the d'th power mod n
4:40 - 4:43

and that will give you the message back out again.
4:43 - 4:46

Fairly simple.
4:46 - 4:49

Now I'm lying to you slightly.
4:49 - 4:51

If you read this talk do not
4:51 - 4:53

absolutely do not implement RSA this way.
4:53 - 4:55

You must pad your message.
4:55 - 4:57

This is a public service warning.
4:57 - 5:01

We offered to tell you about factoring.
5:01 - 5:04

What is the relationship between RSA and factoring?
5:04 - 5:09

Now you can see easily from the formula
5:09 - 5:12

from the private key here
5:12 - 5:16

that if you know how to factor N into its prime factors p and q
5:16 - 5:19

then you can just compute d, the decryption exponent
5:19 - 5:21

of your private key.
5:21 - 5:24

Clearly, if you can factor N you can compute the private key.
5:24 - 5:29

But we don't actually know that factoring is the only way to break RSA.
5:29 - 5:34

There might be some way that you can compute messages from ciphertexts
5:34 - 5:37

that doesn't actually reveal the private key.
5:37 - 5:42

And there's no proof either way that this is or is not possible.
5:42 - 5:43

The last thing that I want to mention here
5:43 - 5:45

just to clarify things
5:45 - 5:47

some of you who might have taken
5:47 - 5:52

complexity or algorithms courses or a beginning CS course
5:52 - 5:56

you might have heard of NP-hardness, the P vs. NP problem.
5:56 - 6:01

And I just want to clarify that factoring is not known to be NP-hard.
6:01 - 6:05

Every computer scientist will love it if we could actually generate
6:05 - 6:11

a cryptosystem which is known to be based of average case NP-hardness.
6:11 - 6:14

We don't know how to do that.
6:14 - 6:17

RSA is not doing that.
6:17 - 6:22

In addition the factoring problem is probably not NP-hard.
6:22 - 6:26

The question is: how hard is factoring?
6:26 - 6:27

Since factoring is the best way that we know
6:27 - 6:31

to break the most commonly used public-key cryptosystem on the Internet.
6:31 - 6:33

Well I showed you some examples before
6:33 - 6:37

where Sage had this helpful little factor function.
6:37 - 6:39

So how well does it work?
6:39 - 6:41

How long do people think this takes?
6:41 - 6:46

two 32 bit integers multiplied together.
6:46 - 6:49

I heard it'll go a couple of seconds.
6:49 - 6:50

0.01 seconds.
6:50 - 6:55

This is on this computer.
6:55 - 6:58

How about 64 bit integers, two of them multiplied together?
6:58 - 7:01

So this is a 128 bit RSA modulus.
7:01 - 7:04

Any guesses?
7:04 - 7:05

One minute?
7:05 - 7:08

Nope, 0.1 seconds.
7:08 - 7:15

How about two 96 bit integers multiplied together?
7:15 - 7:16

A couple of minutes?
7:16 - 7:18

4 seconds.
7:18 - 7:22

How about two 128 bit integers multiplied together?
7:22 - 7:28

This is a 256 bit RSA modulus.
7:28 - 7:29

No guesses? Alright.
7:29 - 7:30

500 seconds.
7:30 - 7:33

So at this point it's starting to get a little bit iffy if I'm
7:33 - 7:36

sitting on a plane trying to do this on battery power.
7:36 - 7:39

laughter
7:39 - 7:44

Clearly this is growing faster than linear in the size of the key.
7:44 - 7:46

So what is actually going on?
7:46 - 7:49

Well.
7:49 - 7:51

Dan: Ok.
7:51 - 7:53

Turns out these factoring functions
7:53 - 7:55

they do look like they're getting pretty slow but
7:55 - 7:57

wait a minute.
7:57 - 8:00

Do we actually have to use this factor function
8:00 - 8:01

if it's getting really slow?
8:01 - 8:04

Maybe instead of trying to use Sage's factor function
8:04 - 8:09

maybe we could just guess what the prime was.
8:09 - 8:12

Now as an RSA user you might not think that this is a threat
8:12 - 8:14

having somebody guess your prime
8:14 - 8:17

because there is some way to count the number of primes,
8:17 - 8:19

number of 512 bit primes and there's
8:19 - 8:22

more than 2^500 512 bit primes.
8:22 - 8:24

And if your random number generator is giving you
8:24 - 8:27

any of those 512 bit primes with the same chance
8:27 - 8:31

then any particular prime that the attacker guesses
8:31 - 8:33

and just tries dividing n by that,
8:33 - 8:35

see if it's evenly divisable, well
8:35 - 8:39

any particular prime has only a 2^(-500) chance
8:39 - 8:41

of being guessed by the attacker.
8:41 - 8:43

And even if the attacker tries a huge number of times
8:43 - 8:45

he'll never guess your prime
8:45 - 8:48

except what if your random number generator
8:48 - 8:52

is working really really badly and doesn't give you 2^500 different primes?
8:52 - 8:56

What if it only gives you, say, 2^40 different primes?
8:56 - 8:58

And then suddenly the attacker could, well,
8:58 - 9:01

try figuring out what those primes are.
9:01 - 9:03

So here's how the attacker looks at this.
9:03 - 9:06

The attacker says: ok I'm gonna target your public key.
9:06 - 9:09

I'm gonna take your big N which is your secret...
9:09 - 9:11

sorry, public key is big N,
9:11 - 9:14

your secret key p times your secret key q,
9:14 - 9:15

your private p and q.
9:15 - 9:17

He's trying to figure out the p and q, given N.
9:17 - 9:20

So what he does, he takes a bunch of devices that are out there
9:20 - 9:23

and he looks at how they work
9:23 - 9:25

or downloads a bunch of software which generates keys
9:25 - 9:29

and then using that software, using those devices
9:29 - 9:32

he tries generating a bunch of p's and q's for himself
9:32 - 9:35

using whatever the random number generator is
9:35 - 9:37

in that device or in that software.
9:37 - 9:38

And then that's something which
9:38 - 9:41

maybe the random number generator works really badly
9:41 - 9:43

and maybe you're using that device, too.
9:43 - 9:46

So the attacker tries each of the primes that he see's
9:46 - 9:47

from this device
9:47 - 9:50

and tries seeing does that prime divide your N.
9:50 - 9:51

If you were using that device
9:51 - 9:53

and it doesn't generate very many primes
9:53 - 9:56

then maybe he gets lucky and finds your prime
9:56 - 9:58

because the device has generated the same prime for him
9:58 - 10:00

that it generated for you.
10:00 - 10:03

Now does anybody actually build devices which
10:03 - 10:07

screw up random numbers this badly?
10:07 - 10:09

laughter
10:09 - 10:12

As a matter of fact, yes, bears do shit in the woods.
10:12 - 10:14

So there are some famous examples.
10:14 - 10:17

We don't have a huge number of historical discussions in this talk
10:17 - 10:19

but just a few examples here.
10:19 - 10:22

Goldberg&Wagner totally broke the original Netscape
10:22 - 10:24

generation of public keys.
10:24 - 10:27

There are only something like 2^47 possible keys
10:27 - 10:29

which is a countable numbers.
10:29 - 10:31

So they could figure out all the possible keys
10:31 - 10:34

that Netscape could generate if they you knew when you generated a key
10:34 - 10:36

which was often very predictable.
10:36 - 10:38

Another example: the Debian bug,
10:38 - 10:41

the horrendous failure for SSH and lots of other applications
10:41 - 10:42

from a few years ago.
10:42 - 10:46

Debian and Ubuntu for a year and half were generating only
10:46 - 10:49

well, fewer than a million possible public keys.
10:49 - 10:51

So complete disasters of random number generation.
10:51 - 10:54

But if you want to add to this list,
10:54 - 10:55

if you want to do more and more of these
10:55 - 10:57

find bad random number generators
10:57 - 11:01

you have to say "ok what's the next device out there
11:01 - 11:02

that might have been screwed up."
11:02 - 11:03

"Let's look at it, look at the code."
11:03 - 11:05

"See what kind of primes it generates."
11:05 - 11:07

"Does it do badly?"
11:07 - 11:09

It takes some real work to figure this out.
11:09 - 11:12

Wouldn't it be nice if there was a systematic way
11:12 - 11:15

to just without having to do a lot of work,
11:15 - 11:17

to look at each device, each piece of software,
11:17 - 11:18

wouldn't it be nice to just figure out
11:18 - 11:22

"oh yeah, let's just magically figure out
11:22 - 11:24

if there's any primes out there generated
11:24 - 11:26

from bad random number generators."
11:26 - 11:28

You could imagine here's the easy attack.
11:28 - 11:30

Here's the systematic way to do this.
11:30 - 11:34

You take two users, so you and you.
11:34 - 11:35

And you each have a public key.
11:35 - 11:37

We're gonna grab those public keys
11:37 - 11:42

N1 and N2
11:42 - 11:47

and then hope that this N1 and N2 share a prime.
11:47 - 11:54

N1 is some prime p times some q1 and N2 is some same prime p times a different q2.
11:54 - 11:56

Now this could happen, you could have
11:56 - 11:59

N1 and N2 sharing both primes.
11:59 - 12:02

That would be legitimate that two people who are actually
12:02 - 12:06

the same webserver sharing their keys, sharing their certificates,
12:06 - 12:07

they know they're doing this,
12:07 - 12:09

you can get the same public key from two places.
12:09 - 12:12

Google has their keys copied to tens of thousands of servers,
12:12 - 12:13

that's not a surprise.
12:13 - 12:15

But if there's a different...
12:15 - 12:17

If you have two different public keys
12:17 - 12:18

they shouldn't be sharing primes.
12:18 - 12:20

What if they do?
12:20 - 12:22

Well, here's this magic Euclid's algorithm
12:22 - 12:25

which will just print out the shared prime p.
12:25 - 12:26

This is a very simple algorithm.
12:26 - 12:28

It just takes one of the numbers, say N1
12:28 - 12:32

and let's say N1 is the smaller one and N2 is the bigger one,
12:32 - 12:34

it divides N2 by N1
12:34 - 12:38

and replaces N2 by the remainder and then switches the numbers
12:38 - 12:40

and that's exactly what this code does.
12:40 - 12:43

The N2 % N1 that's again the N2 mod N1
12:43 - 12:46

the least remainder when you divide N2 by N1.
12:46 - 12:49

And then the result once one of the numbers gets down to zero
12:49 - 12:52

the other number is exactly the shared prime.
12:52 - 12:54

So this amazing algorithm,
12:54 - 12:57

here's just a little manual example
12:57 - 13:00

not with big 512, 1024 bit keys
13:00 - 13:04

this is with just tiny little I guess 8 bit primes
13:04 - 13:08

16 bit... well looks more like 13 bit keys.
13:08 - 13:12

So two numbers coming in and one is 4187 and two is 5989.
13:12 - 13:15

You can see if you keep dividing one number by the other,
13:15 - 13:16

replacing it by the remainder,
13:16 - 13:19

do that a few times, you quickly get down to zero.
13:19 - 13:23

And the number right before the zero in the middle of the slide is 53.
13:23 - 13:27

Now 53 is a shared prime between these two small keys.
13:27 - 13:29

If you scale this up,
13:29 - 13:32

if you do this computation for bigger numbers
13:32 - 13:34

then here's another timing example.
13:34 - 13:37

It's still really really fast, in the middle of the slide.
13:37 - 13:40

This is doing 1024 bit keys that share a 512 bit prime.
13:40 - 13:45

You can figure out this 512 bit prime so quickly that the timing
13:45 - 13:49

there is 0.00 seconds, can't even see how fast it is.
13:49 - 13:51

Ok, so that's Euclid's algorithm.
13:51 - 13:54

Now you can actually do this for tons and tons of keys.
13:54 - 13:58

You download, say, millions, tens of millions of keys from the Internet,
13:58 - 14:01

all the different SSL servers, maybe go for some SSH keys,
14:01 - 14:04

you download tons and tons of public keys and then
14:04 - 14:07

you try all the pairs with Euclid's algorithm.
14:07 - 14:09

And it's so fast that you can do this but actually
14:09 - 14:11

you can do better.
14:11 - 14:14

So there's an algorithm: batch GCD algorithm
14:14 - 14:15

which takes a whole bunch of keys
14:15 - 14:20

and figures out which of those keys share primes with the others.
14:20 - 14:23

Much much faster than trying every pair.
14:23 - 14:24

So you don't have to try every pair.
14:24 - 14:27

It's kind of like sorting, it's about that kind of speed
14:27 - 14:29

instead of having to try every pair.
14:29 - 14:31

If you have tens of millions you don't have to do
14:31 - 14:33

tens of millions times tens of millions of pairs.
14:33 - 14:36

You just have to pretty much reap through the whole list once.
14:36 - 14:39

So what does this batch GCD algorithm do?
14:39 - 14:41

It's figuring out for the first N1,
14:41 - 14:44

instead of computing the greatest common divisor,
14:44 - 14:46

the Euclid result for N1 and N2
14:46 - 14:48

and for N1 and N3 and so on,
14:48 - 14:52

it figures out the product of N2, N3, N4 and so on,
14:52 - 14:55

takes the greatest common divisor of that with N1,
14:55 - 14:57

applies Euclid's algorithm to that
14:57 - 15:01

and then does a similar thing for N2 and for N3 and so on.
15:01 - 15:06

Each of those is gonna show you whether N1 or N2 or N3,
15:06 - 15:08

it's gonna show you whether they share some prime
15:08 - 15:10

with any of the other N's.
15:10 - 15:13

If you did exactly the computation
15:13 - 15:15

shown on the slide in the math formulas
15:15 - 15:17

it would still be too slow.
15:17 - 15:20

But the batch GCD algorithm finds redundancies
15:20 - 15:23

in these GCDs, and here's exactly what it does:
15:23 - 15:26

It figures out the product of all of the keys.
15:26 - 15:30

So you take, say, a million keys and you multiply them all together.
15:30 - 15:34

And that's done with the code here where you do...
15:34 - 15:35

maybe I'll just skip to the bottom.
15:35 - 15:37

You see numbers 10, 20, 30, 40.
15:37 - 15:39

You build a product tree.
15:39 - 15:42

So you take 10 times 20, compute 200.
15:42 - 15:45

Take the next pair 30 times 40, compute 1200.
15:45 - 15:48

The take the pair of products 200 times 1200.
15:48 - 15:50

If you have more numbers, you have more levels in the tree.
15:50 - 15:55

And that's exactly what the Sage code does here.
15:55 - 15:59

And then the last step of the algorithm is
15:59 - 16:03

kind of going down the tree, building a remainder tree.
16:03 - 16:07

So what's happening here to figure out the greatest common divisor
16:07 - 16:10

of N1 with the product of all the other keys,
16:10 - 16:13

the idea is to take this product
16:13 - 16:16

which is called big R on this slide,
16:16 - 16:23

take the product of all the keys and then divide that by N1^2
16:23 - 16:26

and then take the remainders, so R mod N1^2,
16:26 - 16:31

divide by N1 and then compute the greatest common divisor of N1 mod N.
16:31 - 16:33

And the amazing effect is that it's exactly the
16:33 - 16:35

greatest common divisor we were looking for.
16:35 - 16:38

And what this little Sage script does is that it takes
16:38 - 16:42

R and divides R by, well a bunch of N's in a really fast way,
16:42 - 16:44

and then at the bottom, once it has figured out
16:44 - 16:47

all mod N1^2, all mod N2^2 and so on,
16:47 - 16:49

divides those by N1 and N2,
16:49 - 16:52

takes the greatest common divisor with N1 and N2
16:52 - 16:55

and that's exactly telling you, the outputs here
16:55 - 16:57

that are different from 1, are exactly telling you
16:57 - 17:00

which N's share a prime with some others.
17:00 - 17:04

This very remarkably short little script does work quite well.
17:04 - 17:07

Here's how fast these are.
17:07 - 17:10

So I'm on some 800 MHz core.
17:10 - 17:13

If you try this for, say, a hundred numbers,
17:13 - 17:17

just hundred random 1024 bit numbers,
17:17 - 17:21

it doesn't matter what the numbers are, they always run at the same speed,
17:21 - 17:23

then that takes 0.05 seconds.
17:23 - 17:27

If you have 10 times as many numbers it takes about 20 times as long.
17:27 - 17:31

And if you have another 10 times as many numbers it takes about 20 times as long.
17:31 - 17:33

And it keeps scaling up like that.
17:33 - 17:36

So you can get to millions or tens of millions of numbers
17:36 - 17:39

and it still finishes in a reasonable amount of time.
17:39 - 17:40

So, amazingly fast algorithm.
17:40 - 17:44

You don't have to scale this up to like 2^50 or 2^100 keys.
17:44 - 17:46

There's only, say, tens of millions of keys out there
17:46 - 17:48

and this runs reasonably fast.
17:48 - 17:51

Ok.
17:51 - 17:52

Can you actually hope for this to work?
17:52 - 17:55

Are random number generators really this bad?
17:55 - 17:56

Well...
17:56 - 18:00

laughs
18:00 - 18:03

There's a 2012 paper from Heninger
18:03 - 18:05

that's the same Heninger we have sitting here
18:05 - 18:06

and Durumeric, Wustrow, Halderman,
18:06 - 18:10

that's got the best paper award at USENIX Security 2012.
18:10 - 18:15

What they did was they tried exactly this on tens of millions of keys out there
18:15 - 18:18

and factored tens of thousands of keys.
18:18 - 18:23

Admittely they used a C version of this instead of this
18:23 - 18:24

Sage script but
18:24 - 18:26

the Sage script will go up to millions.
18:26 - 18:28

It's just when you get beyond that you want to write it in C
18:28 - 18:31

to deal with using a lot of memory.
18:31 - 18:35

But you can use exactly this script for pretty large chunks of keys.
18:35 - 18:38

And so they factored tens of thousands of keys and said
18:38 - 18:42

that these keys are typically not your bank keys.
18:42 - 18:45

They gonna be, say, keys for your home router.
18:45 - 18:48

So the vulnerable devices are not gonna be,
18:48 - 18:50

say, a big server out there.
18:50 - 18:54

The vulnerable devices will be your FRITZ!Box.
18:54 - 18:57

applause
18:57 - 19:00

Thank you, Nadia.
19:00 - 19:07

It's good to read the paper, go to factorable.net
19:07 - 19:09

and you get tons of analyses of why this happened
19:09 - 19:13

and why these devices are generated guessable numbers.
19:13 - 19:15

I mean numbers that are so non-random that they get
19:15 - 19:18

repeated between a lot of keys out there.
19:18 - 19:20

There was at the same time another team that
19:20 - 19:24

like within days announced the same result.
19:24 - 19:27

They independently did the same computation.
19:27 - 19:29

They said "yeah, we've also downloaded
19:29 - 19:31

a bunch of keys and factored as many as we could"
19:31 - 19:33

with basically the same algorithm.
19:33 - 19:35

At the end of it "yeah we factored
19:35 - 19:37

tens of thousands of keys."
19:37 - 19:39

That paper has no analysis.
19:39 - 19:42

They're like "yeah, there's keys"
19:42 - 19:43

"they share primes for some reason"
19:43 - 19:46

"I guess ecommerce is dead"
19:46 - 19:48

"go out, change your bank keys"
19:48 - 19:49

"call the New York Times"
19:49 - 19:51

There's the New York Times article:
19:51 - 19:53

"Flaw Found in an Online Encryption Method"
19:53 - 19:55

I also like the advertising on the side here.
19:55 - 19:57

Like iron key on the top and then:
19:57 - 20:01

"Encrypt, decrypt & access my secret data in real time"
20:01 - 20:03

"try it free for 30 days."
20:03 - 20:05

"I secured my cloud data. Have you?"
20:05 - 20:07

Awesome advertising there.
20:07 - 20:12

Once you found that there's a problem like this
20:12 - 20:15

then of course you start looking around for more and more places
20:15 - 20:17

where you might have some bad randomness.
20:17 - 20:23

For example there is a paper, or at least slides,
20:23 - 20:25

by Chou in Taiwan, saying
20:25 - 20:32

they took two million Taiwan Citizen Digital Certificates
20:32 - 20:34

and did some GCDs
20:34 - 20:37

and found that a 103 of those were
20:37 - 20:38

factorable.
20:38 - 20:40

So anybody who downloaded these certificates
20:40 - 20:43

which apparently are used in Taiwan for like paying your taxes,
20:43 - 20:44

talking to banks I don't know,
20:44 - 20:47

but paying taxes is one that I've checked.
20:47 - 20:50

So a 103 people out there have these Taiwan
20:50 - 20:51

Citizen Digital Certificates where
20:51 - 20:54

you're supposed to have in this database things like
20:54 - 20:57

the names of the people and their ID numbers
20:57 - 21:00

but you aren't supposed to have their secret keys.
21:00 - 21:02

The whole point is those are supposed to be private
21:02 - 21:04

kept on some smartcards that are issued to the citizens
21:04 - 21:05

in Taiwan.
21:05 - 21:09

And the randomness generation is so bad that
21:09 - 21:13

103 of those keys have now been factored.
21:13 - 21:15

The smartcard manufacturer, I also like this quote,
21:15 - 21:19

it's a company based in Munich called
21:19 - 21:22

Giesecke & Devrient and their motto is:
21:22 - 21:25

"Creating Confidence"
21:25 - 21:35

applause
21:35 - 21:38

Tanja: There's gonna be more of Dan, don't worry.
21:38 - 21:40

But we promised a bit more than just factoring
21:40 - 21:43

bad keys or the keys with their primes.
21:43 - 21:47

If you find another number like this one.
21:47 - 21:50

So here's coming some more of math stuff.
21:50 - 21:53

So if you see a number like this and you
21:53 - 21:56

observe this last digit, then yes.
21:56 - 21:59

It's very easy for you to see it's divisable by 5.
21:59 - 22:01

So if you find such a key that is easy to break
22:01 - 22:03

and you are a computer rather than a human being
22:03 - 22:05

you look at lots of those numbers.
22:05 - 22:07

You have a list of small primes stored
22:07 - 22:10

and what you're gonna do is just trial division.
22:10 - 22:12

So if there's any small factor
22:12 - 22:14

it's very easy to find this trial division.
22:14 - 22:17

Or what Dan showed was the remainder trees.
22:17 - 22:20

You can also do this for batch trial division.
22:20 - 22:24

So assuming that you're looking for a prime p somewhere
22:24 - 22:27

then you go linearly all the way up to the p
22:27 - 22:30

and there are about p/log(p) many primes
22:30 - 22:32

before you find this p.
22:32 - 22:34

For each of them you just do exact division.
22:34 - 22:37

If it works, fine you found a factor, otherwise not.
22:37 - 22:39

Now this is not going to work against your normal keys.
22:39 - 22:44

I know that in the example that Dan mentioned by Wagner&Goldberg
22:44 - 22:47

that they found one factor which had a 9 in there
22:47 - 22:50

but usually your keys don't have a 9.
22:50 - 22:55

For slightly larger numbers there is a method due to Pollard.
22:55 - 22:57

So if you see the respectful number here,
22:57 - 23:00

there is no obvious divisability of this one.
23:00 - 23:03

One thing you can try is
23:03 - 23:07

a walk, well we call this a walk.
23:07 - 23:10

You start with some random number smaller than N
23:10 - 23:13

and some offset here.
23:13 - 23:17

I should also say, all those scripts are available at this URL.
23:17 - 23:22

So if you go to facthacks.cr.yp.to then you'll find all the scripts
23:22 - 23:24

if you want to play with it at the same time,
23:24 - 23:27

including this one with some explanations and such.
23:27 - 23:31

So what you do is you start off two different numbers,
23:31 - 23:35

one at each step squares it and adds c.
23:35 - 23:37

And the other one squares it, adds c,
23:37 - 23:40

and squares it again and adds the c.
23:40 - 23:41

Each time computing mod N.
23:41 - 23:44

So you have two sequences running around,
23:44 - 23:46

one is twice the speed as the other.
23:46 - 23:51

At every step you check with the GCD of this number N
23:51 - 23:53

and the difference between the two walks
23:53 - 23:55

hasn't anything interesting now.
23:55 - 23:57

N is an RSA modulus, the only thing interesting
23:57 - 24:00

could either be p or q.
24:00 - 24:03

So for this particular number
24:03 - 24:04

when I run this
24:04 - 24:09

I find 2053 after a few milliseconds.
24:09 - 24:11

So this is again a cooked up example.
24:11 - 24:13

You won't find such a small factor if you're trying
24:13 - 24:16

a general RSA factor, otherwise your numbers
24:16 - 24:18

would be totally busted but
24:18 - 24:21

this shows if you have a small number in there
24:21 - 24:22

it's very easy to find.
24:22 - 24:24

All later on when Dan is talking about the number field sieve
24:24 - 24:26

this is an interesting subroutine.
24:26 - 24:28

So if you ever need to find small numbers
24:28 - 24:31

then, sure trial division is very easy for
24:31 - 24:34

really small numbers taking this long,
24:34 - 24:38

this one squared with p is much faster already.
24:38 - 24:42

Now even faster than that, also due to Pollard,
24:42 - 24:45

called Pollard's p-1 method.
24:45 - 24:48

Here is again an innocent looking number N.
24:48 - 24:50

You see the numbers get larger.
24:50 - 24:52

This is a 256 bit number.
24:52 - 24:57

In order to deal with such a number there is one step
24:57 - 24:59

which is expensive but you would do it only once
24:59 - 25:01

for all different numbers you want to try,
25:01 - 25:05

namely compute a big integer which has lots of little factors.
25:05 - 25:07

So you take the least common multiple
25:07 - 25:10

from 1, 2, 3, 4, 5...
25:10 - 25:12

Ok, once you hit the 4 you have another 2 in there.
25:12 - 25:14

So there's lots and lots of powers of 2,
25:14 - 25:15

lots and lots of powers of 3.
25:15 - 25:19

And then the larger primes only appear like once.
25:19 - 25:23

But you're not going up to 256 or 128 anywhere.
25:23 - 25:27

You're sticking here at 22 bits
25:27 - 25:29

and then use the same function that Nadia explained
25:31 - 25:33

in the RSA computation, namely the modular exponentiation.
25:33 - 25:37

So you take the 2^y,
25:37 - 25:38

this huge number,
25:38 - 25:40

but the whole computation mod N.
25:40 - 25:45

So this whole thing never grows beyond this 256 bit number.
25:45 - 25:48

Or for a real world example it would be a 1000 bit number.
25:48 - 25:53

And then you compute the GCD of this number and N.
25:53 - 25:55

For this particular one, again it's a cooked up example,
25:55 - 25:56

you get a factor.
25:56 - 26:00

Now this factor is actually 120 bits.
26:00 - 26:02

This is not a small factor.
26:02 - 26:06

So if you were using 256 bit numbers
26:06 - 26:08

this is something that could happen to you
26:08 - 26:11

and nevertheless this method would find it.
26:11 - 26:14

So this finds much larger factors than trial division
26:14 - 26:15

or the Rho method in the same amount of time.
26:15 - 26:17

But it doesn't work for all primes.
26:17 - 26:20

It works for primes which are kind of special.
26:20 - 26:24

Now what's special about this prime or the other factor?
26:24 - 26:30

If you take p and subtract -1, hence the name p-1 method,
26:30 - 26:32

and factor that thing,
26:32 - 26:35

that has lots, lots of small numbers.
26:35 - 26:37

So that's something we call smooth.
26:37 - 26:40

So a smooth number doesn't have big factors.
26:40 - 26:44

So a prime is like the least smooth number you can imagine.
26:44 - 26:47

And 2 to the power large is the most smooth number,
26:47 - 26:50

so lots of little factors means smooth.
26:50 - 26:56

This largest factor happens to be 21.7 bits
26:56 - 26:58

which is why I chose the 22 bits here.
26:58 - 27:04

So as soon I run over the largest prime in p-1
27:04 - 27:06

with this exponent y here
27:06 - 27:09

then I do find this factor.
27:09 - 27:11

Now if you thought this was math
27:11 - 27:12

there is even more scary math
27:12 - 27:16

namely there are methods which are generalizing
27:16 - 27:19

this p-1 method using elliptic curves.
27:19 - 27:22

If you listen to the crypto advertisement
27:22 - 27:23

elliptic curves are usually something very good,
27:23 - 27:25

something you should have on your smartcard.
27:25 - 27:27

It's faster than RSA and so on.
27:27 - 27:29

That's the same type of elliptic curve
27:29 - 27:32

but here elliptic curves come in as an attack tool.
27:32 - 27:35

There is a method called the elliptic curve method ECM
27:35 - 27:41

which is a generalization of this p-1 method and does not need anything special,
27:41 - 27:45

I mean avoiding such primes is easy.
27:45 - 27:46

If you look into older standards
27:46 - 27:50

they all warn about "make sure to use strong primes"
27:50 - 27:53

"make sure to check that your p-1 is not smooth"
27:53 - 27:55

And of course you don't want your p-1 to be smooth
27:55 - 27:58

because otherwise this attack works.
27:58 - 28:01

But if I have some elliptic curve
28:01 - 28:03

a different type of prime is weak for that curve
28:03 - 28:06

and I have many, many, many elliptic curves.
28:06 - 28:09

For each of the curves some primes are weak.
28:09 - 28:11

You can't exclude this method.
28:11 - 28:14

So the only thing you can do is just, well,
28:14 - 28:18

choose your prime large enough and hope, well,
28:18 - 28:20

make sure p-1 doesn't get it and hope
28:20 - 28:23

that the next elliptic curves won't get it.
28:23 - 28:25

So elliptic curves are good for attacking.
28:25 - 28:28

It's still not the fastest method out there
28:28 - 28:33

but it's a good method for random factoring.
28:33 - 28:36

It's not the best method for finding RSA factors
28:36 - 28:38

but if you have a number which is most likely
28:38 - 28:45

not too non-smooth then it's a good method.
28:45 - 28:47

Now this is what you do for finding small factors.
28:47 - 28:51

To show some method for big factors,
28:51 - 28:54

now here that is a big number.
28:54 - 28:59

But this is what I would call factorization by inspection.
28:59 - 29:08

What's wrong with this number?
29:08 - 29:10

I mean this number is not small and
29:10 - 29:13

it won't have small factors, I promise you this.
29:13 - 29:20

So it's a decimal representation, so 9 is the digit before 0.
29:20 - 29:23

This looks very close to the power of 10.
29:23 - 29:28

Actually this is very close to 10^340.
29:28 - 29:30

If you take the square root of it
29:30 - 29:34

then you almost exactly hit 10^170.
29:34 - 29:37

This example was cooked up as taking two primes
29:37 - 29:47

namely 10^170-33 and 10^170+63 and multiplying them.
29:47 - 29:49

So if you see lots of 0's and lots of 9's
29:49 - 29:51

then you know that you are very close to a certain power.
29:51 - 29:53

Now in real life this wouldn't be a power of 10,
29:53 - 29:55

it would be a power of 2.
29:55 - 29:56

And there actually are some examples.
29:56 - 29:59

There's a famous story of a letter bomber in Austria
29:59 - 30:03

who wanted his message to be crackable
30:03 - 30:06

and he sent a message, well he was some right-wing asshole
30:06 - 30:08

that was letter-bombing people
30:08 - 30:11

and then he sent an encrypted message to the police.
30:11 - 30:15

And the police tried to factor it and tried to decipher it
30:15 - 30:19

hoping it would be the list of the next victims.
30:19 - 30:21

In the end it was not but
30:21 - 30:24

this person was actually using very close to a power of 2
30:24 - 30:26

in order to have people crack it.
30:26 - 30:28

In the end it was just some propaganda,
30:28 - 30:29

some right-wing stuff as well.
30:29 - 30:33

So there are people using this for breakability.
30:33 - 30:35

I'm not aware of people actually using this as an accident.
30:35 - 30:40

But what can happen to you is not using close to the power of 2
30:40 - 30:46

but saying "ok I learned I shouldn't be close to the power of 2
30:46 - 30:49

so I take some offset and take the next prime."
30:49 - 30:51

Next prime just means, add 1, check is it prime,
30:51 - 30:54

add next one, add 2, add 3
30:54 - 30:57

Just check primes linearly from then on.
30:57 - 30:59

Now if we jump to this where we finally find our prime p
30:59 - 31:01

it's a good prime.
31:01 - 31:04

It's not anywhere close to the power of 2 because my c was large enough.
31:04 - 31:07

Made it, good.
31:07 - 31:09

Now on q.
31:09 - 31:11

So again call the next_prime().
31:11 - 31:14

So I do p+1, p+2 until I find a prime.
31:14 - 31:17

That means that if you take the product of the two
31:17 - 31:19

then this N is very close to a square.
31:19 - 31:23

So this N is cooked up with this method.
31:23 - 31:24

You don't see anything on the end,
31:24 - 31:27

there's nothing wrong there.
31:27 - 31:30

But when you compute the square root of it
31:30 - 31:33

it is almost an integer, it's .99999...
31:33 - 31:35

and then some dirt here.
31:35 - 31:39

So then you'll know that your primes were too small.
31:39 - 31:42

Sorry, not too small, too close to each other.
31:42 - 31:45

So if you ever take an RSA factor modulus
31:45 - 31:47

and you compute the square root
31:47 - 31:48

and you see something like this
31:48 - 31:52

then you know: that user did this method.
31:52 - 31:54

You could just say, I take the number,
31:54 - 31:57

subtract 1, subtract 2, just go off from the square root.
31:57 - 32:02

All you can check how far away is it from being a square.
32:02 - 32:04

So you take the seeding,
32:04 - 32:08

that means the closest integer counting upwards.
32:08 - 32:12

Just round it to this two, here's 57, is there a 56?
32:12 - 32:15

Compute the square of that and subtract N.
32:15 - 32:18

Now in this example that is 4096
32:18 - 32:23

which itself is a square.
32:23 - 32:27

So the difference of N and a^2 is a square.
32:27 - 32:31

Then I take this 64, take a-64, divide N,
32:31 - 32:34

it's an exact division, so I found one of the factors.
32:34 - 32:37

And then there's the other factor.
32:37 - 32:40

It doesn't matter whether it's a power of 2 or a power of 10.
32:40 - 32:43

What matters is that the numbers are too close.
32:43 - 32:45

Now this method actually works surprisingly well
32:45 - 32:47

even if we don't do next_prime().
32:47 - 32:50

So here's another example where I didn't do the next_prime()
32:50 - 32:52

but did something larger
32:52 - 32:55

so this is not really close to a square.
32:55 - 32:58

Some 9's but not too many.
32:58 - 33:00

What we did before we wrote it as a difference of squares
33:00 - 33:05

and then divided N by one of these two factors.
33:05 - 33:10

Now in this case here I would not start as a^2-N
33:10 - 33:12

but I say, well, if a^2-N is not a square
33:12 - 33:16

I try the next one, I do (a+1)^2, (a+2)^2
33:16 - 33:21

each time subtract N and check when this is a square.
33:21 - 33:24

Now for this example I get lucky after 2.
33:24 - 33:26

And this actually took me a long time to cook up this example
33:26 - 33:29

because I was starting at something which was
33:29 - 33:32

half of the bits of p were changed.
33:32 - 33:36

So I was actually starting with a cube quite a distance away.
33:36 - 33:38

Still, most of those were just giving a square.
33:38 - 33:42

They were not giving me anything larger than zero here.
33:42 - 33:46

Now for general numbers if I wouldn't do like next_number()
33:46 - 33:49

but just, well, random prime, another random prime.
33:49 - 33:53

It still works but takes a long time
33:53 - 33:55

because I can always write it this way.
33:55 - 33:57

This is a, this is b.
33:57 - 34:01

But then usually it would take about square root of N
34:01 - 34:04

which is the same size as p until I get there.
34:04 - 34:07

This is a nice method if it looks like lots of
34:07 - 34:13

9999 and otherwise do something better as Dan will tell now.
34:13 - 34:14

Dan: Ok.
34:14 - 34:22

applause
34:22 - 34:24

Alright, so it's bad to have p and q
34:24 - 34:25

too close to each other
34:25 - 34:29

or have a small p or to have p and q non-random.
34:29 - 34:31

So let's do everything right.
34:31 - 34:34

Let's make just independently choose some big p
34:34 - 34:37

between 2^511 and 2^512
34:37 - 34:39

and choose some big q
34:39 - 34:43

without being like the next prime or anywhere in the ballpark of p.
34:43 - 34:48

You could maybe say q has to be between 2^509 and 2^510
34:48 - 34:50

and then it's definitely nowhere near p.
34:50 - 34:53

Now you got this public key and p times q
34:53 - 34:54

which is a 1024 bit RSA key.
34:54 - 34:59

If nothing has gone wrong with your randomness generation
34:59 - 35:00

then what does the attacker do?
35:00 - 35:04

Well, we don't know anything fast.
35:04 - 35:07

So this is gonna be the one part of the talk
35:07 - 35:08

where there's these big powers of 2
35:08 - 35:10

for how fast things are
35:10 - 35:11

because they're not really fast.
35:11 - 35:15

You have to think about how much something like 2^80 is.
35:15 - 35:16

There's all these different methods,
35:16 - 35:18

I'll just skip down to the last two.
35:18 - 35:20

There's the quadratic sieve (QS),
35:20 - 35:22

the number field sieve (NFS).
35:22 - 35:24

These have been around since, well,
35:24 - 35:29

QS since the 80th, NFS since the early 90th.
35:29 - 35:32

The NFS, the general consensus is
35:32 - 35:35

it takes something like 2^80 operations.
35:35 - 35:37

Now here's something fun to try.
35:37 - 35:39

If somebody says 2^80 operations
35:39 - 35:41

and there's some cryptographer talking about the security or something,
35:41 - 35:44

ask them: what do you mean by an operation?
35:44 - 35:46

How fast is this operation?
35:46 - 35:47

And then most of the people saying this
35:47 - 35:49

will go running screaming like:
35:49 - 35:50

I don't know what an operation is,
35:50 - 35:51

don't ask me that question.
35:51 - 35:53

But still people will confidently tell you
35:53 - 35:56

that the NFS takes 2^80 operations
35:56 - 36:00

to break any 1024 bit RSA key
36:00 - 36:02

even if the user hasn't done anything wrong,
36:02 - 36:04

the implementor hasn't done anything wrong.
36:04 - 36:06

And this number, this 2^80
36:06 - 36:08

if you pin down what those operations are,
36:08 - 36:11

this is something which is doable today
36:11 - 36:16

by people with a big botnet or people with a big computer cluster.
36:16 - 36:19

I'll give some details what I mean, the size is there.
36:19 - 36:22

As your computers get cheaper and cheaper,
36:22 - 36:24

somehow chips aren't going up in clockspeed
36:24 - 36:25

but they're still getting cheaper.
36:25 - 36:28

You can get a really cheap ARM which is doing
36:28 - 36:29

the same kind of computations
36:29 - 36:32

that a pretty big CPU was doing several years ago.
36:32 - 36:33

And chips are gonna continue getting cheaper
36:33 - 36:37

for a while. These attacks against RSA 1024
36:37 - 36:40

are going to be common doable for more and more people.
36:40 - 36:43

How do these methods work?
36:43 - 36:47

I'll give one example and then one little Sage script.
36:47 - 36:50

Let's try just a really small number.
36:50 - 36:52

This will be with the quadratic sieve
36:52 - 36:54

which is not as fancy as the number field sieve
36:54 - 36:56

but it will give you some idea of how these methods work.
36:56 - 37:00

The QS starts with that Fermat method you heard about.
37:00 - 37:04

Remember, the idea there was to try to find a square
37:04 - 37:09

as a^2-N, so a was like the square root of N
37:09 - 37:11

or maybe the plus 1, plus 2 and so on
37:11 - 37:13

and you keep searching for a+1 and so on,
37:13 - 37:16

search for numbers that if you square them and subtract N
37:16 - 37:17

then you get a square.
37:17 - 37:21

So let's try that for 50=2759 which is not very big.
37:21 - 37:24

Then you try, 53 was the ceiling of the square root,
37:24 - 37:28

square that, subtract 2759, you get 50.
37:28 - 37:31

Well 25 would be a square, but 50 is not a square
37:31 - 37:33

it's 2 times 25, 2 times 5^2.
37:33 - 37:37

And then you try another number, 54^2-2759
37:37 - 37:40

umm 157, that's too complicated
37:40 - 37:43

I don't remember that being a square so let's just skip it.
37:43 - 37:46

And then you keep going like this, 266, 377,
37:46 - 37:48

490. I remember 49 is a square
37:48 - 37:50

but that doesn't mean that 490 is a square
37:50 - 37:52

cause 10, 2 times 5, that's not a square.
37:52 - 37:56

And 605, you can try...
37:56 - 37:59

We remember how to take out the 5
37:59 - 38:01

and then 60... 605 divided by 5.
38:01 - 38:04

You can figure out it's 5 times 11^2.
38:04 - 38:05

It's almost a square.
38:05 - 38:07

You keep doing this for a while
38:07 - 38:11

and it seems like Fermat's method is actually working pretty badly.
38:11 - 38:14

It does work eventually but it's taking quite a while.
38:14 - 38:17

Here's what the quadratic sieve does.
38:17 - 38:20

From the same computation you look at these
38:20 - 38:22

numbers that were not exactly squares
38:22 - 38:24

50 and 490 and 605
38:24 - 38:27

and you notice if you multiple them together
38:27 - 38:28

50 was 2 times 5^2
38:28 - 38:31

490 is 2 times 5 times 7^2
38:31 - 38:33

605 is 5 times 11^2
38:33 - 38:35

and then you multiply those together
38:35 - 38:39

you get 2^2 and 5^4 and 7^2 and 11^2
38:39 - 38:41

and that's exactly the square of something.
38:41 - 38:45

The square of 2^1 times 5^2 times 7 times 11.
38:45 - 38:48

Then the quadratic sieve is taking the square root of that number
38:48 - 38:53

and subtracting that from the product of the fifty-somethings
38:53 - 38:54

that were on the left side
38:54 - 38:57

and taking the greatest common divisor of that with N
38:57 - 39:00

and amazingly that produces a factor of N.
39:00 - 39:02

And it's not just some random accident that
39:02 - 39:04

if you had tried the greatest common divisor of
39:04 - 39:06

"write down some hocuspocus then"
39:06 - 39:09

"you eventually get 31 coming out"
39:09 - 39:11

"every 31 times you write down a random number"
39:11 - 39:13

"it will have this 31 dividing it"
39:13 - 39:15

No, you can try this for bigger and bigger numbers
39:15 - 39:17

and actually this keeps working.
39:17 - 39:19

As soon as you find a square product
39:19 - 39:23

then half of those square products are gonna factor N.
39:23 - 39:25

So that's how the quadratic sieve works.
39:25 - 39:29

Here's a more systematic script with a much bigger number
39:29 - 39:31

which if it were just hocuspocus this wouldn't work.
39:31 - 39:33

But this does work.
39:33 - 39:37

So there's a number from my computer's random number generator
39:37 - 39:42

31415926... laughter
39:42 - 39:45

You try writing down a bunch of squares...
39:45 - 39:48

maybe I should get my computer's random number generator checked.
39:48 - 39:51

It is this two year old laptop you heard about before
39:51 - 39:52

so I'm not sure it's working properly.
39:52 - 39:56

You take a bunch of a^2-N and then
39:56 - 39:57

let's make a list of those called X.
39:57 - 40:01

You can see the range, there is up to 500.000 numbers.
40:01 - 40:03

It's a pretty big list we're talking about.
40:03 - 40:06

And then search through those elements,
40:06 - 40:09

search through those a^2-N, those differences,
40:09 - 40:10

the elements in this list,
40:10 - 40:13

to see which ones have easy factorizations.
40:13 - 40:15

Now a lot of numbers like that 157
40:15 - 40:17

I know what the factorization of that is
40:17 - 40:20

but there is function which you can find on our website
40:20 - 40:23

easyfactorizations() which looks through the list X
40:23 - 40:25

and if it finds numbers that are easy to factor
40:25 - 40:28

then it quickly writes down the factorizations
40:28 - 40:30

using the small prime techniques you heard about.
40:30 - 40:33

And then makes a list of those easy factorizations,
40:33 - 40:34

puts those into F.
40:34 - 40:38

And then there's a big chunk which I won't try to explain
40:38 - 40:40

which is called linear algebra mod 2.
40:40 - 40:43

And that somehow figures out a square
40:43 - 40:46

that comes from the factorizations,
40:46 - 40:48

somehow looks through this list of factorizations
40:48 - 40:51

and magically applies some linear algebra stuff.
40:51 - 40:53

Well ok, I won't go through that.
40:53 - 40:54

It's something doable
40:54 - 40:59

and this is using some standard Sage functions to do it pretty easily.
40:59 - 41:02

There's all sorts of things that
41:02 - 41:04

in the interest of time I won't get into
41:04 - 41:06

like how this easyfactorizations() works.
41:06 - 41:08

And this is something where people write papers
41:08 - 41:10

and papers and papers talking about trying to make this
41:10 - 41:12

go as quickly as possible.
41:12 - 41:15

I do want to emphasize one fact about these methods:
41:15 - 41:17

the bold-faced **very small memory requirements**.
41:17 - 41:20

You can use the elliptic curve method
41:20 - 41:22

that you heard about before
41:22 - 41:25

you can use that to figure out easy factorizations
41:25 - 41:27

using very little memory.
41:27 - 41:29

That means you can build a chip
41:29 - 41:30

like a FPGA, a special purpose ASIC,
41:30 - 41:33

you can have thousands of little ECM units
41:33 - 41:35

running in parallel, so you can really
41:35 - 41:38

massively parallelize this easyfactorizations() function.
41:38 - 41:40

So if people tell you "oh you need tons of
41:40 - 41:43

memory for sieving to find easy factorizations"
41:43 - 41:46

then you should tell them "no no, that's not true"
41:46 - 41:48

"ECM you can do with very little memory
41:48 - 41:50

running massively parallelizable"
41:50 - 41:53

And then there's other things which,
41:53 - 41:54

I suppose if we had another hour
41:54 - 41:56

then we could get into all these details
41:56 - 41:59

of like how the number field sieve goes beyond this.
41:59 - 42:02

It's a similar kind of method but gets more complicated.
42:02 - 42:05

I do want to emphasize again thing in bold-face here
42:05 - 42:07

which is that if somebody tells you
42:07 - 42:10

"oh 2^80 operations, the attacker wouldn't do
42:10 - 42:14

that because the target isn't worth that much to them"
42:14 - 42:18

Well, **Batch NFS** is a way to take a bunch of N's
42:18 - 42:21

and like the batch techniques before
42:21 - 42:24

there's some magic ways to find redundancies
42:24 - 42:26

between doing factorizations of those N's.
42:26 - 42:27

So somebody tells you:
42:27 - 42:30

"oh 2^80, that isn't worth it for the attacker."
42:30 - 42:33

Well, **Batch NFS** reduces the cost quite a bit
42:33 - 42:35

for factoring each key.
42:35 - 42:37

If the attacker has a lot of keys to target
42:37 - 42:40

they can factor all of them in not much more cost
42:40 - 42:41

than just factoring one.
42:41 - 42:44

So don't believe people who take an economic view
42:44 - 42:47

how much the number field sieve is worth.
42:47 - 42:49

Alright, what does this mean?
42:49 - 42:52

Can the attacker actually do the NFS
42:52 - 42:54

once they've put in all these optimizations?
42:54 - 42:59

Well if you look carefully at the analyses of the NFS
42:59 - 43:02

there are only about 2^70 differences.
43:02 - 43:04

Things like these a^2-N's.
43:04 - 43:09

If you look through 2^70 of those and scan them properly,
43:09 - 43:11

find easy factorizations, do linear algebra,
43:11 - 43:14

you will factor any 1024 bit key.
43:14 - 43:16

And 2^70, how big is that number?
43:16 - 43:21

It's actually small enough that you can do this computation on
43:21 - 43:22

your favorite botnet.
43:22 - 43:26

Say, Conficker is shrinking, still
43:26 - 43:28

and it will probably be wiped out at some point
43:28 - 43:29

but it's an example of what you can do
43:29 - 43:32

using any of these security problems that are deployed
43:32 - 43:34

on enough machines.
43:34 - 43:35

You can break into a bunch of machines,
43:35 - 43:37

millions of machines.
43:37 - 43:40

Conficker estimates vary between 7 million and 12 million machines.
43:40 - 43:44

So use your, say, 2^23, that's 8 million machines.
43:44 - 43:47

Count how many seconds there are in a year
43:47 - 43:49

and then, say, ok that means we have to do
43:49 - 43:52

2^22 differences per second machine.
43:52 - 43:57

And that is slower than state of the art factorization code already runs.
43:57 - 43:59

So it's actually a pretty easy computation.
43:59 - 44:02

You don't even need that many millions of computers to do this.
44:02 - 44:04

If people tell you "wait a minute"
44:04 - 44:06

"you can't just use a bunch of machines"
44:06 - 44:08

"there's all this work for linear algebra"
44:08 - 44:10

Then I'll just skip this.
44:10 - 44:13

Be aware that linear algebra is not as much of a problem
44:13 - 44:15

as some people would say.
44:15 - 44:19

If you use a big botnet there's one little problem.
44:19 - 44:21

For an attacker who breaks into a bunch of machines
44:21 - 44:24

and then starts spinning them up, heating up the CPU,
44:24 - 44:26

the fan starts running, the user starts wondering
44:26 - 44:28

"why is my machine running so slowly?"
44:28 - 44:30

Maybe only use half of the CPU
44:30 - 44:34

but still it has still got a good chance of getting you detected
44:34 - 44:35

and kicked off the machines.
44:35 - 44:37

Users are gonna shut you down if you're doing a
44:37 - 44:39

say, year of computation on your botnet.
44:39 - 44:43

So what do you do instead?
44:43 - 44:45

Well you build a little computer cluster.
44:45 - 44:49

Now yesterday we heard about a big building
44:49 - 44:51

in Bluffdale, Utah, that apparently
44:51 - 44:55

is going to store all of the data collected by
44:55 - 44:57

the NSA for the next hundred years
44:57 - 44:59

or maybe forever.
44:59 - 45:01

But something that I didn't hear emphasized yesterday
45:01 - 45:05

was that this building also has a new power substation
45:05 - 45:08

which is drawing 65 megawatts,
45:08 - 45:10

well generating 65 megawatts.
45:10 - 45:14

Now what are we doing with 65 megawatts?
45:14 - 45:15

Is that the kind of power that you need to
45:15 - 45:17

store a bunch of data on tapes?
45:17 - 45:21

No, that's the kind of power you need to do computations.
45:21 - 45:25

So what is NSA doing with 2^26 watts?
45:25 - 45:27

Or maybe even with more watts.
45:27 - 45:31

You shouldn't actually think that 2^26 is such a big number.
45:31 - 45:34

Here's a little table with, well,
45:34 - 45:37

2^57 admittedly is pushing it a little bit.
45:37 - 45:41

This is the total power that the earth gets from the sun
45:41 - 45:43

of which about half gets to the earth's surface.
45:43 - 45:48

2^44 is the amount the human is using right now.
45:48 - 45:51

Varies a little bit but I mean this is the average power
45:51 - 45:53

including cars and such.
45:53 - 45:56

If you have the botnet that breaks into millions of machines
45:56 - 46:00

and runs in flat-out that's about 2^30 watts.
46:00 - 46:02

And actually if you think about it, wait a minute,
46:02 - 46:03

there's a lot of machines out there.
46:03 - 46:07

This means the 2^26 is not actually that much power.
46:07 - 46:11

It's just one dinky little Bluffdale 65 MW computer center
46:11 - 46:15

and actually if you're a government agency
46:15 - 46:16

you probably have more than one of those.
46:16 - 46:19

As you heard yesterday in the context of data storage
46:19 - 46:21

but again, it's not just data storage.
46:21 - 46:26

You don't make a 65 MW power station to just store data.
46:26 - 46:30

Ok, if you just take standard graphics processing units
46:30 - 46:33

and run 2^26 watts to those that will do about
46:33 - 46:36

2^84 floating point multiplications per year.
46:36 - 46:40

You have to figure out, ok, what are exactly the operations involved here.
46:40 - 46:45

This should be enough to break 1024 bit RSA
46:45 - 46:48

and if NSA is not just buying GPU's off-the-shelves
46:48 - 46:50

but really tuning chips that they build
46:50 - 46:57

using IBM to manufacture their chips at the moment.
46:57 - 47:02

Apparently it wasn't cost-effective for NSA to manufacture their own chips
47:02 - 47:04

so in 2005 they started subcontracting for IBM
47:04 - 47:08

but presumably through that fabrication IBM is building chips
47:08 - 47:11

that NSA wants and those should be about 10 times faster
47:11 - 47:13

than what GPU's can do.
47:13 - 47:18

So it should be possible with this little 65 MW computer cluster
47:18 - 47:21

to factor at least 1, maybe even 10,
47:21 - 47:23

and then with batch NFS maybe even more
47:23 - 47:30

1024 bit RSA keys in a year.
47:30 - 47:38

applause
47:38 - 47:41

Nadia: So to conclude things up here I want to
47:41 - 47:43

explain how you can use
47:43 - 47:46

from the comfort of your very own home
47:46 - 47:52

a distributed, a massive scale distributed cloud computing service
47:52 - 47:57

to calculate private keys on your own.
47:57 - 48:00

So many of you may be familiar with Amazon EC2
48:00 - 48:05

but it turns out Google also has a large amount of computing infrastructure
48:05 - 48:07

and they actually have a very convenient web interface
48:07 - 48:08

to access it.
48:08 - 48:13

laugther
48:13 - 48:22

applause
48:22 - 48:24

It's amazing what you can find on the Internet.
48:24 - 48:26

laughter
48:26 - 48:29

So ok, here's some keys except, you know,
48:29 - 48:31

if you look at these carefully
48:31 - 48:33

not all of these are sort of obviously RSA keys.
48:33 - 48:35

There are some problems here
48:35 - 48:41

like the second to last one on the bottom.
48:41 - 48:42

So...
48:42 - 48:43

laughter
48:43 - 48:50

After doing this search we found this key and
48:50 - 48:55

someone seems to have pasted a private key into pastebin
48:55 - 49:00

and someone came along and interrupted part of it
49:00 - 49:05

with some profanity.
49:05 - 49:07

This is an interesting problem.
49:07 - 49:09

What do we do with this key?
49:09 - 49:12

Clearly this is an interesting key, we must have this key.
49:12 - 49:14

laughter
49:14 - 49:25

applause
49:25 - 49:26

Step 1... yeah?
49:26 - 49:30

Male: Did you see the easter egg in row 1?
49:30 - 49:31

Nadia: Well, we'll discuss this.
49:31 - 49:34

Angel: Questions will be later, after the talk please.
49:34 - 49:39

Nadia: So yeah, we've removed the obvious problems here.
49:39 - 49:42

But there's still an issue which is that
49:42 - 49:44

we're missing hundreds and hundreds of bits of key.
49:44 - 49:46

We can't brute force-this.
49:46 - 49:48

What are we gonna do?
49:48 - 49:51

Well, let's look at what RSA keys actually are.
49:51 - 49:54

I introduced RSA keys and I was, like,
49:54 - 49:57

it's a d and d has like a thosand of bits.
49:57 - 50:01

And if we don't know p and q
50:01 - 50:03

how are we gonna get this d?
50:03 - 50:08

Here is the storage format for an RSA key.
50:08 - 50:11

Public key is the modulus N, the public exponent e.
50:11 - 50:15

Ok, private key has version, N, e,
50:15 - 50:17

d the decryption exponent,
50:17 - 50:20

p, q, d mod (p-1), d mod (q-1),
50:20 - 50:23

the inverse of q mod p,
50:23 - 50:25

this is all useful information if you want to
50:25 - 50:27

speed up your calculation and not have to
50:27 - 50:29

compute everything every time you use your key.
50:29 - 50:32

Okay.
50:32 - 50:36

What did we see here then?
50:36 - 50:41

Here is the key broken up into all of the difference pieces.
50:41 - 50:45

So we have: the red bit is approximately where N is sitting.
50:45 - 50:47

We've got e in orange.
50:47 - 50:51

We've got part of d in yellow.
50:51 - 50:52

We seem to be missing p
50:52 - 50:55

but we've got part of q
50:55 - 50:57

and then here's d mod p-1
50:57 - 51:02

and d mod q-1 and q inverse mod p.
51:02 - 51:08

There's also a little problem before the red here
51:08 - 51:08

laughter
51:08 - 51:11

in addition.
51:11 - 51:14

You might call this the private part of the private key.
51:14 - 51:16

laughter
51:16 - 51:21

applause
51:21 - 51:24

Fortunately this is just sitting in an encoding of the length
51:24 - 51:30

and the version. laughter
51:30 - 51:33

Alright, so what do we do with this information?
51:33 - 51:37

Basically given any part of this private key
51:37 - 51:39

we can easily compute all of the other parts.
51:39 - 51:40

Simple formulas:
51:40 - 51:49

q will be 2^(e times d mod p-1) - 1 and GCD that with N
51:49 - 51:52

then we get q and then
51:52 - 51:54

we can figure out what p was by dividing
51:54 - 51:56

and figure out what d is
51:56 - 51:58

and so on and so forth.
51:58 - 52:02

You can do this even if you have a part of a single value
52:02 - 52:04

from the private key.
52:04 - 52:08

You can still compute the rest of the private key from that.
52:08 - 52:12

We don't have time to get into this but it's online.
52:12 - 52:17

You've seen a little bit of math.
52:17 - 52:18

Single formulas, typed into Sage.
52:18 - 52:23

We can reconstruct the private key here.
52:23 - 52:29

applause
52:29 - 52:33

So, what are the lessons that we can learn
52:33 - 52:36

from everything that we've done today.
52:36 - 52:39

The first one is: stop using 1024 bit RSA
52:39 - 52:41

if you haven't already.
52:41 - 52:43

I have looked at the keys that are out there on the Internet
52:43 - 52:46

and you are still using 1024 bit RSA.
52:46 - 52:48

laughter
52:48 - 52:52

Second: make sure your primes are big enough.
52:52 - 52:54

Don't try to be clever about how you're generating them.
52:54 - 52:58

Make sure your primes are random.
52:58 - 52:59

You guys have some problems
52:59 - 53:01

especially in hardware.
53:01 - 53:04

Also...
53:04 - 53:05

laughter
53:05 - 53:09

"FUCK A DUCK" is not good crypto.
53:09 - 53:12

Pastebin is not a secure cloud store.
53:12 - 53:14

laughter
53:14 - 53:20

applause
53:20 - 53:23

And you probably shouldn't put your private key
53:23 - 53:25

in a secure cloud store anyway.
53:25 - 53:26

laughter
53:26 - 53:31

applause
53:31 - 53:33

And lastly...
53:33 - 53:34

laughter
53:34 - 53:40

applause
53:40 - 53:42

Thank you.
53:42 - 53:44

applause
53:44 - 53:45

Angel: Give them an applause.
53:45 - 54:06

applause
54:06 - 54:11

Angel: So questions will be taken at the numbered microphones.
54:11 - 54:15

So anyone who has a question please line up at a microphone
54:15 - 54:17

and we will start by the signal angel
54:17 - 54:20

who has a question from IRC.
54:20 - 54:24

*Signal Angel*: IRC has a lot of punch lines involving penisses
54:24 - 54:26

and some questions.
54:26 - 54:31

First question is: would you recommend more using
54:31 - 54:38

elliptic curves or RSA with multiple primes...
54:38 - 54:42

using proper primes.
54:42 - 54:47

Dan: There are ways to do elliptic curves wrong,
54:47 - 54:49

there are ways to do RSA wrong.
54:49 - 54:54

In general if you've got a particular performance requirement
54:54 - 54:56

then elliptic curves are gonna meet that
54:56 - 54:58

with a much higher security level
54:58 - 55:00

than RSA is gonna meet that.
55:00 - 55:03

Of course you can do RSA securely,
55:03 - 55:05

you can do elliptic curves securely
55:05 - 55:07

but if you've got some performance limitation
55:07 - 55:09

as many applications do
55:09 - 55:13

then elliptic curves tend to be a better choice.
55:13 - 55:14

Nadia: I also want to clarify
55:14 - 55:18

the other most commonly used public-key cryptosystem
55:18 - 55:20

is DSA, the Digital Signature Algorithm.
55:20 - 55:21

There's also elliptic curve DSA
55:21 - 55:23

which is probably what you're thinking about
55:23 - 55:26

with elliptic curve cryptography.
55:26 - 55:31

DSA is in my opinion actually much worse
55:31 - 55:34

than RSA in terms of randomness failures.
55:34 - 55:36

In the paper that Dan referenced
55:36 - 55:39

we also talk about randomness failures in DSA
55:39 - 55:41

and it's horrifying.
55:41 - 55:45

If you ever ever screw up randomness in a single signature
55:45 - 55:48

your entire public key is lost.
55:48 - 55:51

Angel: We will take the next question from microphone nr 4
55:51 - 55:53

to the right of the seat.
55:53 - 55:54

Jake: Hey guys.
55:54 - 55:56

Sorry I didn't mention the computing power of Bluffdale.
55:56 - 56:02

That said, when we want to transition from 1024 bit RSA
56:02 - 56:05

to something else, what do you think is a good idea?
56:05 - 56:10

So for example, in Tor we do use 1024 bit RSA for TLS.
56:10 - 56:13

laughter Yeah I know, right?
56:13 - 56:16

So the thing is we're working on changing these things
56:16 - 56:17

but what is...
56:17 - 56:20

I mean Zooko has this 100 year crypto project.
56:20 - 56:22

What is the real thing that we should,
56:22 - 56:24

like if we could switch tomorrow to something,
56:24 - 56:27

what's the useful that we can live with for 5 or 10 years?
56:27 - 56:29

Is 2048 going to be useful?
56:29 - 56:31

Is 4096 where we should go tomorrow?
56:31 - 56:33

Cause there is a trade-off between
56:33 - 56:35

performance and forward secrecy
56:35 - 56:38

and there are a lot of things to think about.
56:38 - 56:40

Tanja: If you tell me 5 to 10 years
56:40 - 56:42

I would be still comfortable with elliptic curves.
56:42 - 56:45

If you talk 100 years
56:45 - 56:48

and then there's all these worse quantum computers around
56:48 - 56:51

then I would go for something which is worse in performance
56:51 - 56:53

like code-based cryptography
56:53 - 56:54

or for signatures hashing
56:54 - 56:56

but if you'd say in 5 to 10 years
56:56 - 56:57

I'm very comfortable recommending to you
56:57 - 57:00

elliptic curves with 256 bits.
57:00 - 57:02

Jake: Ok, thanks.
57:02 - 57:05

Angel: Signal angel?
57:05 - 57:07

*Signal Angel*: Yeah, another question from IRC.
57:07 - 57:11

If you avoid all the easy primes
57:11 - 57:14

does this shrink the search space such that
57:14 - 57:18

it becomes easier to crack the remaining keys?
57:18 - 57:20

Or asked another way:
57:20 - 57:25

can you quantify the ratio of easy primes versus hard primes?
57:25 - 57:26

Dan: Yeah, that's a good question.
57:26 - 57:29

The answer is: yes, it can be quantified
57:29 - 57:33

and once you're up to about to the 1024 bit RSA key level
57:33 - 57:35

then there's very very few weak primes.
57:35 - 57:37

So if you have a random number,
57:37 - 57:39

you just generate a random prime
57:39 - 57:40

then your chance of bumping into something weak
57:40 - 57:43

is so small that you just don't have to worry about it.
57:43 - 57:45

It is one of those much less frequent than
57:45 - 57:47

being hit by a lightning kind of events.
57:47 - 57:49

It is something that have been quantified
57:49 - 57:52

and it is an issue for smaller RSA keys
57:52 - 57:55

back when people were using, say, 512 bit RSA
57:55 - 57:57

then it was actually a noticable issue.
57:57 - 57:59

But once you're at 1024 or above then
57:59 - 58:01

the issue is basically gone.
58:01 - 58:04

It's something where you can and you should
58:04 - 58:06

look at exactly what the chance is of bumping into a weak
58:06 - 58:09

but it's not reallistically something
58:09 - 58:13

you're gonna encounter with 1024 bit RSA.
58:13 - 58:17

Angel: We're gonna take the next question from nr 1.
58:17 - 58:19

Just one notice: we don't have a lot of time left
58:19 - 58:24

so even though there is a talk in one hour
58:24 - 58:25

just ask.
58:25 - 58:28

Male: There is a devastating attack that can break AES,
58:28 - 58:32

SHA2, most probably SHA3 and DES.
58:32 - 58:34

It's called a biclique attack.
58:34 - 58:36

Are you concerned that it might break RSA
58:36 - 58:37

with any key size and even ECC
58:37 - 58:40

and even any public-key crypto?
58:40 - 58:42

Dan: No.
58:42 - 58:44

laughter
58:44 - 58:47

Bicliques are something which are certainly interesting
58:47 - 58:50

and I think about it as a way to speed up brute force
58:50 - 58:53

and it speeds up brute force by a noticable factor,
58:53 - 58:55

often makes attacks, say, twice as fast.
58:55 - 58:59

But with public-key crypto we have attacks which are
58:59 - 59:01

much much faster than brute force to begin with
59:01 - 59:04

so that kind of speed-up of brute force won't be competitive
59:04 - 59:08

with things like the number field sieve.
59:08 - 59:13

Angel: Next is from number 3.
59:13 - 59:17

Male: Me not coming from the dark side of mathematics,
59:17 - 59:21

how do I know that my random number generators are fine
59:21 - 59:24

for generating keys?
59:24 - 59:26

laughter
59:26 - 59:31

Nadia: Seed them.
59:31 - 59:37

Basically the things that are out there if you're using
59:37 - 59:41

a standard random number generator like in Linux,
59:41 - 59:44

actually Linux has added patches to the kernel
59:44 - 59:48

to fix some of the problems that we've found.
59:48 - 59:50

If you want to know that you're keys are good:
59:50 - 59:55

if you generate them on a general purpose computer
59:55 - 59:57

after it has been running for a long time
59:57 - 60:00

you're probably fine.
60:00 - 60:05

I would not generate very important or long-term keys
60:05 - 60:10

on low-power hardware devices where you can't tell...
60:10 - 60:12

laughter
60:12 - 60:16

Male: Thank you.
60:16 - 60:18

Angel: Next question will be taken from number 4.
60:18 - 60:19

Male: Hi.
60:19 - 60:24

It's just a remark, not really a question.
60:24 - 60:25

I work in high performance computing.
60:25 - 60:27

I was at the supercomputing conference
60:27 - 60:29

a couple of weeks ago.
60:29 - 60:32

They're presenting large-scale installations
60:32 - 60:36

and dealing with problems of power.
60:36 - 60:39

They want to build petaflops and exaflops systems
60:39 - 60:42

that are in a range of 20 MW.
60:42 - 60:47

So I'm wondering what NSA is doing with 53 megawatts
60:47 - 60:54

in a data center because no storage takes that much capacity.
60:54 - 60:59

I think it's really something we should think about.
60:59 - 61:03

So thanks, nice talk.
61:03 - 61:05

Angel: Next question from...
61:05 - 61:07

does the signal angel have one?
61:07 - 61:09

*Signal angel*: Ok, another question is:
61:09 - 61:12

How big do you estimate the effort to
61:12 - 61:17

exchange keys and certificates involving 1024 bit primes
61:17 - 61:23

let's say worldwide at the moment?
61:23 - 61:25

Dan: I wasn't getting what the question was.
61:25 - 61:26

Could you repeat the question?
61:26 - 61:27

*Signal angel*: I don't really understand it myself.
61:27 - 61:29

laughter
61:29 - 61:33

Angel: Ok, let's switch to 1.
61:33 - 61:37

Male: My question goes back to the idea
61:37 - 61:43

of how can we know how good the private keys are?
61:43 - 61:50

Is there something like a keygen evaluation tool?
61:50 - 61:57

Or can the quality of a key generator be estimated
61:57 - 61:59

from a sample of keys?
61:59 - 62:02

Like a public tool that you can throw keys on
62:02 - 62:06

and it will tell me a bias of my keygen?
62:06 - 62:09

Nadia: If you go to factorable.net
62:09 - 62:13

my co-authors and I have made a key check tool
62:13 - 62:15

which will tell you if your key is in the
62:15 - 62:18

list of keys that we scraped off the Internet
62:18 - 62:19

that we were able to compromise.
62:19 - 62:21

That's a first check.
62:21 - 62:24

It's not a positive verification that your key is good
62:24 - 62:27

but it will tell you if it is very bad.
62:27 - 62:29

laughter
62:29 - 62:37

If you want to check your generator,
62:37 - 62:41

if you're worried about the factorization problems
62:41 - 62:43

we have fast code on the website in C
62:43 - 62:46

that will do the GCD calculation for you
62:46 - 62:50

if you just dump a gigantic collection of keys at it.
62:50 - 62:54

If you might have other problems like you are
62:54 - 62:57

not sure whether it's factorable like those don't come up
62:57 - 63:00

the experiment that you might want to do is
63:00 - 63:04

sort of restart the process in realistic conditions
63:04 - 63:06

and generate a gigantic quantity of keys
63:06 - 63:09

and just check for repeats.
63:09 - 63:12

The sort of obvious stress testings.
63:12 - 63:16

If you have a device you want to boot it up in realistic conditions
63:16 - 63:18

and then check them
63:18 - 63:20

and this is painstaking work
63:20 - 63:22

but I don't think there's any realistic way
63:22 - 63:26

of doing better than that.
63:26 - 63:28

Or inspect the code, the obvious thing.
63:28 - 63:31

Make sure you're seeding anything.
63:31 - 63:33

Male: Ok, thank you.
63:33 - 63:36

Angel: Next question will be from mic number 4.
63:36 - 63:37

Male: Hi.
63:37 - 63:41

If I got your numbers correctly so you said something like
63:41 - 63:45

it would take 8 million machines a year to factor 1024
63:45 - 63:49

so I was wondering what if I would like to factor like
63:49 - 63:52

800 bits or 900 bits which is,
63:52 - 63:55

as I understand, way easier than 1024
63:55 - 63:57

but was never done publicly.
63:57 - 64:00

So are you saying that would take like a day or something?
64:00 - 64:03

Dan: It depends on the size of cluster you have.
64:03 - 64:07

The RSA 768 factorization a couple of years ago
64:07 - 64:11

used something on the scale of a 1500 computers
64:11 - 64:14

for a year.
64:14 - 64:16

And those were normal kind of computers,
64:16 - 64:19

Desktop kind of computers with, I think, typically 2 cores.
64:19 - 64:24

Nowadays, if you have some faster, say, 3 GHz 4-core machines,
64:24 - 64:27

I think those were typically 2 GHZ 2-core, nowadays...
64:27 - 64:31

Tanja's mentioning you of course want to use GPUs
64:31 - 64:32

to speed things up.
64:32 - 64:33

And there's many ways to take advantage of
64:33 - 64:35

computer technology moving forward.
64:35 - 64:40

To get careful estimates: for going from 768 to 800,
64:40 - 64:42

that's a short enough extrapolation
64:42 - 64:44

that people are pretty confident about that.
64:44 - 64:48

Getting to 900 starts getting iffy what exactly the cost would be
64:48 - 64:51

and 1024 there's a fair amount of guess works.
64:51 - 64:54

There is some consensus on rougly what the costs are
64:54 - 64:55

but it's hard to say exactly.
64:55 - 64:58

But again, to give you a scale of it:
64:58 - 65:00

you were saying like 900 or 800.
65:00 - 65:03

Well, 768 RSA challenge was done
65:03 - 65:07

and that one was 1500 machine years.
65:07 - 65:09

Male: Ok, thank you.
65:09 - 65:10

Angel: We'll take four more questions.
65:10 - 65:12

Signal angel first.
65:12 - 65:17

*Signal angel*: It's a clarification of the question I asked before.
65:17 - 65:22

The actual question is: how big will be the effort
65:22 - 65:32

to upgrade existing keys from 1024 bits to 4K or 8K bits?
65:32 - 65:36

Considering that the keys are currently in use
65:36 - 65:38

how much effort worldwide would it be to
65:38 - 65:42

upgrade all that to something more secure?
65:42 - 65:44

Tanja: I mean, for the private user if you're running
65:44 - 65:46

SSH on the laptop you can just generate a new key.
65:46 - 65:48

Doesn't take you much time.
65:48 - 65:50

You will notice maybe some degradation of performance
65:50 - 65:52

if you're running it on a netbook
65:52 - 65:55

but going to 2048 is not a big deal.
65:55 - 65:58

However, there is a recommendation of the
65:58 - 66:02

US government to stop using 1024 bit RSA as of 2010
66:02 - 66:05

and we still see it everywhere.
66:05 - 66:08

Apparently it is bigger effort than just
66:08 - 66:11

everybody spending 10 minutes on the laptop.
66:11 - 66:14

Dan: Maybe part of the problem is things like
66:14 - 66:17

everybody says "yeah, use 2048 or bigger"
66:17 - 66:19

and there's some financial standard
66:19 - 66:21

which says you can use any key size you want
66:21 - 66:24

up to 1984 bits.
66:24 - 66:26

I have no idea how they came up with this
66:26 - 66:27

but then they run into some other standard
66:27 - 66:29

which says use 2048 and they just can't
66:29 - 66:33

implement it on the smartcards which only support up to 1984.
66:33 - 66:36

Now if you get the standards people talking to each other for several years
66:36 - 66:39

then they agree on using 1984.
66:39 - 66:42

It's just about as good as 2048
66:42 - 66:45

but realistically when you have these conflicts
66:45 - 66:47

in standards then it takes a while to work out.
66:47 - 66:49

And when you have a busy server
66:49 - 66:51

that's trying to do 2048 bit RSA
66:51 - 66:53

it doesn't matter what the standard says
66:53 - 66:55

if you got a ton of load you just can't handle that load
66:55 - 66:58

if you're spending a ton of time on cryptography.
66:58 - 67:00

And that again is where we like elliptic curves
67:00 - 67:03

because they give you for whatever your
67:03 - 67:05

performance constraints are much better security.
67:05 - 67:07

So if you're trying a given security level
67:07 - 67:10

the speed is much better.
67:10 - 67:13

Angel: Next question will be from number 2.
67:13 - 67:15

Female: So you've had two developers stand up
67:15 - 67:17

and ask how to verify whether or not their
67:17 - 67:20

key generation is correct and whether or not their
67:20 - 67:21

random numbers are correct.
67:21 - 67:24

And I think it's great that you give coding recommendations
67:24 - 67:27

like seed rand() but coming from the development
67:27 - 67:30

perspective I like to unit test my code
67:30 - 67:32

and specifically I like to write my unit tests
67:32 - 67:33

before I write my code,
67:33 - 67:35

it's called test-driven development.
67:35 - 67:38

So if I'm going about creating an algorithm to
67:38 - 67:40

encrypt something or whatever
67:40 - 67:43

what are the steps that I need to do
67:43 - 67:47

when I'm writing my unit test?
67:47 - 67:52

Has there been some kind of effort in that way
67:52 - 67:55

like what goes into unit test for determining that
67:55 - 67:57

your random number is correct?
67:57 - 67:59

I mean there's algorithms for determining
67:59 - 68:01

whether your random number generator is
68:01 - 68:03

operating correctly, right?
68:03 - 68:05

Dan: This is something where there is,
68:05 - 68:08

if you look at how the random number generators work,
68:08 - 68:10

there was just a NIST workshop
68:10 - 68:12

and there's some NIST standards
68:12 - 68:13

which are telling you
68:13 - 68:16

here's what to do to test your random number generators.
68:16 - 68:18

So you have a hardware random number generator,
68:18 - 68:20

some post-processing, and they say
68:20 - 68:22

"ok here's the standard for how to test those pieces"
68:22 - 68:25

Now that's not the same as testing the cryptography
68:25 - 68:26

that's using those pieces
68:26 - 68:29

and it's very helpful if there's a clear separation there.
68:29 - 68:30

So you have your cryptography
68:30 - 68:33

which is doing deterministic stuff
68:33 - 68:34

and says you have to have some randomness coming
68:34 - 68:38

from a totally separate unit where the only job
68:38 - 68:40

of that separate unit is do the randomness properly.
68:40 - 68:42

And then the NIST test will tell you
68:42 - 68:43

what the randomness does
68:43 - 68:46

and then deterministic cryptographic testing
68:46 - 68:48

will tell you that the other pieces are working correctly
68:48 - 68:50

for all sorts of inputs.
68:50 - 68:54

Where it goes wrong is something like the ECDSA standard
68:54 - 68:55

that Nadia was mentioning before
68:55 - 68:57

and that's one that says that
68:57 - 68:59

you don't just deterministically generate a signature
68:59 - 69:01

you make some new randomness every time you generate
69:01 - 69:04

a signature and that's what goes horribly wrong.
69:04 - 69:06

So that's something where we need new standards
69:06 - 69:08

which say instead of doing what ECDSA does
69:08 - 69:10

we have a clear separation between
69:10 - 69:12

the cryptography always does the same thing
69:12 - 69:15

making it testable, and then randomness is centralized
69:15 - 69:18

in one spot which hopefully the NIST standards
69:18 - 69:20

do a good enough job of testing.
69:20 - 69:24

Female: So there's something that says here's what determines...
69:24 - 69:27

I guess what I'm asking is do you know of any algorithms
69:27 - 69:30

that can be used for determining whether something
69:30 - 69:32

is a good random number and whether your random
69:32 - 69:35

number generator is generating things properly?
69:35 - 69:37

Tanja: Well, there are a bunch of tests for.
69:37 - 69:39

There's hardware random number generators
69:39 - 69:42

you can run them through the diehard tests.
69:42 - 69:45

There's certificates by FIPS.
69:45 - 69:49

In general I would recommend implementing all those steps
69:49 - 69:51

but the smartcards that have been mentioned by Dan earlier
69:51 - 69:53

from the Taiwan citizen card
69:53 - 69:55

those are actually FIPS-certified.
69:55 - 69:59

So even though, these go through what is the industry standard
69:59 - 70:01

of doing random number generation on smartcards
70:01 - 70:04

which everybody thought was good enough,
70:04 - 70:05

apparently they're not.
70:05 - 70:06

So randomness is a total mess.
70:06 - 70:09

I mean after the paper by Nadia and
70:09 - 70:10

also when the paper by the Lausanne team came out,
70:10 - 70:13

there is no some more movement in finding
70:13 - 70:14

better random number generators.
70:14 - 70:19

At this moment, well, hope for the best.
70:19 - 70:21

laughter
70:21 - 70:22

Implement the standards, yes.
70:22 - 70:25

If you have some source of randomness
70:25 - 70:28

try to stretch it with a stream cipher or something like that.
70:28 - 70:31

Nadia: I want to tell a little story.
70:31 - 70:33

After we published the paper
70:33 - 70:36

I heard some very interesting things from some of the vendors.
70:36 - 70:39

I think one of the things that makes randomness
70:39 - 70:42

and cryptography a difficult problem is that
70:42 - 70:45

this kind of issue is something that
70:45 - 70:47

crosses a lot of the abstraction boundaries
70:47 - 70:49

that you're used to in coding.
70:49 - 70:51

You want to have the clean unit test that you know
70:51 - 70:52

that this piece works, and this piece works,
70:52 - 70:54

and this piece works, and this piece works,
70:54 - 70:58

and you put them all together and it will all work flawlessly.
70:58 - 71:00

Somehow this kind of problem is something that
71:00 - 71:03

happens at the boundaries of each unit test.
71:03 - 71:05

We know that the operating system is like
71:05 - 71:08

"ok I'm getting my proper inputs from the hardware
71:08 - 71:11

and I'm correctly generating my randomness from there"
71:11 - 71:13

and then the application says
71:13 - 71:15

"I'm getting my randomness from the operating system,
71:15 - 71:16

it's good randomness"
71:16 - 71:20

and then the application uses the correct crypto algorithms
71:20 - 71:22

to then do good cryptography.
71:22 - 71:24

And at the very beginning of that stage
71:24 - 71:28

there was something broken and it translated all the way
71:28 - 71:30

through all of these pieces of software
71:30 - 71:32

that were working correctly.
71:32 - 71:35

Testing this holistically was the only way
71:35 - 71:37

to find this kind of error.
71:37 - 71:41

One story that I heard from a vendor was that
71:41 - 71:44

they ran into randomness problems.
71:44 - 71:46

They developed some device.
71:46 - 71:47

It was working perfectly
71:47 - 71:51

and on the assembly line they were being turned on
71:51 - 71:55

and run through the checks.
71:55 - 71:58

You boot it up it checks the integrity
71:58 - 72:01

on the assembly line and it was continuing on.
72:01 - 72:06

If you exactly booted them up at the exact same time
72:06 - 72:10

in the exact same conditions
72:10 - 72:13

then all of the inputs to the random number generator
72:13 - 72:16

would be the same and they would generate the same keys
72:16 - 72:18

and then these keys would be,
72:18 - 72:20

you know, they already generated the keys,
72:20 - 72:21

so they wouldn't generate them again after
72:21 - 72:25

they went to the consumer.
72:25 - 72:27

I don't know how to unit test that
72:27 - 72:30

except take the entire device, turn it on,
72:30 - 72:33

and take multiple of them and make sure the devices
72:33 - 72:35

as they're coming out of the production process
72:35 - 72:40

are working correctly.
72:40 - 72:41

Angel: So we will take 2 more questions.
72:41 - 72:47

One from number 4 first and then number 1 at the last.
72:47 - 72:48

Male: How good, do you think,
72:48 - 72:53

hardware-supported random number generators are
72:53 - 72:53

after what you've just said?
72:53 - 72:56

I think they're probably not good anymore?
72:56 - 73:02

Dan: Intel has their RdRand instruction in some of the newer chips
73:02 - 73:05

and they say that they've gone through
73:05 - 73:07

a reasonable number of evaluation steps.
73:07 - 73:09

It sounds like they're trying.
73:09 - 73:12

Whether they're successful is a different question.
73:12 - 73:14

Something I don't like about the API is
73:14 - 73:16

RdRand gives you no way to test the pieces.
73:16 - 73:19

You can only test the post-processed output.
73:19 - 73:22

So nobody else is able to test the actual
73:22 - 73:24

randomness generation part of the hardware.
73:24 - 73:27

You can only get a filtered version of that.
73:27 - 73:29

So Intel and people who are contracted by Intel
73:29 - 73:31

to see what's going on inside
73:31 - 73:33

are the only people who can run these tests.
73:33 - 73:36

It's not open in a way that allows the community
73:36 - 73:37

to see if there's further problems.
73:37 - 73:39

But at least they're trying
73:39 - 73:41

and maybe with enough effort the hardware manufacturers
73:41 - 73:43

will get randomness generation right.
73:43 - 73:48

Of course if you can generate any sort of secret once
73:48 - 73:50

if you have enough of a secret to start with
73:50 - 73:52

then you can use things like AES to
73:52 - 73:54

generate any number of secrets
73:54 - 73:55

and you can put those secrets into
73:55 - 73:57

any number of devices that you want.
73:57 - 74:00

If you use AES in counter mode with
74:00 - 74:02

encrypting 1, encrypting 2, encrypting 3...
74:02 - 74:05

you get totally unpredictable, unrelated outputs
74:05 - 74:07

and then use those, burn those into some hardware.
74:07 - 74:10

But where do you get that initial randomness from?
74:10 - 74:13

Are you going through a trustworthy process there?
74:13 - 74:16

It's a hard problem.
74:16 - 74:18

Angel: Next question from number 1.
74:18 - 74:20

And that's the last question.
74:20 - 74:21

Male: You said something about
74:21 - 74:25

you dropped this group-based cryptography word.
74:25 - 74:28

Most of the stuff I've encountered under that heading
74:28 - 74:30

kind of sounded like snake oil
74:30 - 74:33

just cause they're using non-abelian groups and stuff.
74:33 - 74:36

Is there anything here that isn't?
74:36 - 74:38

Tanja: Ok, I did not say group-based crypto.
74:38 - 74:39

Male: Ok.
74:39 - 74:40

Tanja: I said code-based cryptography.
74:40 - 74:41

Male: Sorry, thanks.
74:41 - 74:43

Tanja: So there is a class of cryptosystems
74:43 - 74:45

which are fine against quantum computers.
74:45 - 74:50

For all crypto it's always up to what we know.
74:50 - 74:52

Next day, there could be somebody with a
74:52 - 74:54

polynomial-time factor algorithm and so on.
74:54 - 74:57

Now there's also a group of people doing
74:57 - 74:59

braid group cryptography, doing non-abelian groups,
74:59 - 75:02

most of these systems have been broken.
75:02 - 75:07

If they weren't broken by classical computers
75:07 - 75:11

maybe they would remain secure against quantum computers.
75:11 - 75:14

It's lots of "would"s there.
75:14 - 75:18

Yes, they might have this on their flags as well
75:18 - 75:21

but I wouldn't count this as a secure system.
75:21 - 75:23

Whereas code-based cryptography or lattice-based cryptography
75:23 - 75:26

are systems which should be fine.
75:26 - 75:29

Dan: Maybe just a URL if you're interested
75:29 - 75:32

in post-quantum cryptography is pqcrypto.org
75:32 - 75:35

and that keeps track of papers on the various types
75:35 - 75:37

of cryptography that should survive for a 100 years
75:37 - 75:41

and in particular against quantum computers.
75:41 - 75:42

Angel: Ladies and gentleman, please give
75:42 - 75:47

Nadia, Tanja and Daniel a big round of applause.
75:47 - 75:48

Thank you so much.
75:48 - 75:57

applause

Title:: FactHacks [29c3]
Description:: FactHacks
RSA factorization in the real world

RSA is the dominant public-key cryptosystem on the Internet. This talk will explain the state of the art in techniques for the attacker to figure out your secret RSA keys.

A typical 1024-bit RSA public key is a product of two secret 512-bit primes. The security of the cryptosystem relies on an attacker being unable to compute the user's secret primes. The attacker can try guessing one of the secret primes and checking whether it divides the user's public key, but this is very unlikely to work: there are more than 2^500 512-bit primes, far beyond the number of atoms in the universe. Fortunately for the attacker, there are much faster ways to figure out the secret primes. Some of the danger is visible in public announcements of factorization records by academic teams; the largest publicly factored RSA key, announced in 2009, has 768 bits. But what does this mean for the security of 1024-bit RSA? There are several different reasons that a real-world attacker doesn't have to play by the rules of an academic challenge. Sometimes users have bad random-number generators; sometimes users generate both primes from a single search; sometimes users choose special primes to try to make RSA run faster; sometimes users leak secret bits through side channels; sometimes the attacker has a botnet, or a 65-megawatt data center in Utah or Tianjin. This talk will assess the real-world threat to RSA-1024, explaining what the best attacks can do and how you can replicate them in your very own home or local GPU farm. Attack algorithms will be presented as Python code snippets and will already be online before the talk. This is a joint presentation by Daniel J. Bernstein, Nadia Heninger, and Tanja Lange, surveying work by many people.

Speaker: djb, Nadia Heninger, Tanja Lange
EventID: 5275
Event: 29th Chaos Communication Congress [29c3] by the Chaos Computer Club [CCC]
Location: Congress Centrum Hamburg (CCH); Am Dammtor; Marseiller Straße; 20355 Hamburg; Germany
Language: english
Begin: 12/28/2012 18:30:00 +01:00
Lizenz: CC-by-nc-sa

more » « less
Video Language:: English
Duration:: 01:16:10

	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]
	mw edited English subtitles for FactHacks [29c3]

Show all

English subtitles

Revisions

Revision 9

mw

FactHacks [29c3]

Revisions

Our website uses cookies

Operating cookies (Required)