34C3 - Defeating (Not)Petya's Cryptography

Edit subtitles

0:00 - 0:15

Music
0:15 - 0:19

Herald: And give a great round of applause
to Sebastian Eschweiler!
0:19 - 0:24

Applause
0:24 - 0:32

Sebastian: So hi everyone. So I want to
talk about how I defeated not (Not)Petya's
0:32 - 0:43

cryptography. Some I'd say that (Not)Petya
would be Ukraine's scourge, and for those
0:43 - 0:48

of you who don't know what the word
scourge means: this guy right here doesn't
0:48 - 0:57

know either. And quick trivia quiz - does
anyone know what this movie is? What's the
0:57 - 1:09

name of the movie? So, in the next scene
Johnny Depp enters. Does ring a bell? Jim,
1:09 - 1:17

a movie by Jim Jarmusch. Soundtrack by
Neil Young? Dead Man, right! Great! It's a
1:17 - 1:24

great movie. So if you want to know what a
scourge is then you could watch the movie.
1:24 - 1:32

So let's begin with my with my talk. So
this is what actually the official Ukraine
1:32 - 1:40

Twitter account tweeted some time ago in
at the end of June 2017. And there was an
1:40 - 1:47

outbreak of a ransomware attack which was
noticed mostly in Ukraine but also all
1:47 - 1:52

over the world. So millions of users and
also companies large companies were
1:52 - 2:03

affected, and the damage went into the
billions of dollars. So, the problem there
2:03 - 2:10

is, I mean, this is this is not the
everyday ransomware outbreak you have
2:10 - 2:17

there. And I want to give you a short
glimpse into the the (Not)Petya universe.
2:17 - 2:24

And also how I could decrypt all these
stuff that, yeah, actually was encrypted
2:24 - 2:31

by the this ransomware outbreak. So first
I want to begin my talk with a
2:31 - 2:36

differentiation. I want to draw a
demarcation line because all this
2:36 - 2:42

(Not)Petya universe - there's much sub-
summarised under under this whole label.
2:42 - 2:50

And I really, I'm just talking about a
small fraction of this whole universe. So
2:50 - 2:55

I will first distinguish between what my
talk will be in what my talk will not be
2:55 - 3:03

about. Next I will describe (Not)Petya's
cryptography, and especially (Not)Petya's
3:03 - 3:08

cryptographic failures. Which I then will
be exploiting in the remainder of the talk
3:08 - 3:22

and see how can the users get their, yeah,
vacation photos back. So what was this
3:22 - 3:35

whole thing? The outbreak started, as I
said, in June 2017, and it started as fake
3:35 - 3:41

update or as malicious update from a
software called Madoc
3:41 - 3:48

- this is a tax software, one of the two
official tax softwares in the Ukraine. So
3:48 - 3:56

almost every company has it installed on
tax accounts on their computers many
3:56 - 4:03

private persons have it installed. It was
pushed and then site loaded this file
4:03 - 4:14

perfc that was then downloaded to the
computers and it comprises several parts.
4:14 - 4:23

And some parts are more interesting than
then as others, so one component ever like
4:23 - 4:29

half an hour time it would start
encrypting the files depending on the
4:29 - 4:37

access level. So, if there wasn't any
access to infect the computer with well
4:37 - 4:44

this MBR infector. then it would just
based on the current user level encrypt
4:44 - 4:53

the files based on that current user. In
lack of a better name I would call this
4:53 - 5:01

"Mischa" component I know it's usually
somewhat different, something different.
5:01 - 5:06

However, this was the best name I could
find there. So it's basically just a
5:06 - 5:13

finite crypter with AES. My talk will not
go about this this part - this file
5:13 - 5:22

infector. The next very interesting
component was the spreader part - it's
5:22 - 5:27

basically based on the
EternalBlue/EternalRomance exploits that
5:27 - 5:34

had been leaked by the Shadow Brokers. My
talk will not go about this as well. This
5:34 - 5:41

is a whole different universe as well.
What my talk will be about is the actual
5:41 - 5:49

(Not)Petya component and this is an MBR
encrypter and I will show you in the next
5:49 - 5:58

slide what it's actually about. So, the
user will see something like this upon
5:58 - 6:06

reboot. So if the access rights are
granted so if there is some local admin
6:06 - 6:12

installed on the computer or the correct
password could be guessed by some attack
6:12 - 6:19

and then this dropper, the perfc that
would infect the system by overwriting the
6:19 - 6:26

Master Boot Record with a custom boot
loader. And it would reboot the system
6:26 - 6:33

after a predefined time - usually being
about 10 minutes. And then the the actual
6:33 - 6:40

(Not)Petya compound would kick into
action. This infected MBR, the bootloader
6:40 - 6:47

shows this decoy CheckDisk screen, and in
the background would find and iterate the
6:47 - 6:50

old files on the file system and
6:50 - 6:56

encrypt all these files.
So the main takeaways of this slide are
6:56 - 7:05

that we are dealing with 16-bit code here.
So we're we're in 16-bit real mode - so
7:05 - 7:11

this means no proper file system it means
no 32-bit or 64-bit code. There are no
7:11 - 7:20

windows APIs-- so debugging all this and
analyzing this is a tedious work. However,
7:20 - 7:26

we have something on the plus side which
is a BIOS, you know, the basic
7:26 - 7:34

input/output system and with that comes a
range of interrupts that are very well
7:34 - 7:43

described and a really nice thing is
having Bochs and being able to debug all
7:43 - 7:50

this in in IDA. So this was a really neat
plugin that had been developed by the
7:50 - 8:01

authors. So let's analyze a bit and check
the cryptography and why implementing
8:01 - 8:10

crypto is really hard. So I will start
this part with describing in short words
8:10 - 8:17

the theory of Salsa20 and then check that
compare that against the (Not)Petya
8:17 - 8:28

implementation of this cipher. So Salsa20
is a stream cipher. So basically you will
8:28 - 8:36

have a plaintext, it's it's about here and
then you will have some kind of random
8:36 - 8:41

number generator or pseudo-random number
generator and then apply some operations
8:41 - 8:49

on the plaintext and out comes the
ciphertext. And what you put in there are
8:49 - 8:55

four different variables, or four
different inputs, because there's the
8:55 - 9:00

constant part which is obviously not
variable. But we'll see about that in a
9:00 - 9:09

bit. So you have these key and the nonce,
and then there's this really nifty thing
9:09 - 9:18

called counter. What's so nifty about that
is if you were choose to stream a Salsa 20
9:18 - 9:25

encrypted stream and you would lose some
frames then you could adjust this counter
9:25 - 9:34

variable which would de-mark the offset of
the current stream in the current stream
9:34 - 9:39

and then could continue needlessly with
with the decryption of the stream. So this
9:39 - 9:48

is a very nice feature. The size of the
counter is 64-bits and the hash size here,
9:48 - 9:58

so what salsa does is create a 64-byte
hash for every different of these inputs
9:58 - 10:06

and with that then apply to the input. If
you want any more details about this salsa
10:06 - 10:13

cipher, the inventor of salsa should be in
this room, and should be in at the
10:13 - 10:17

convention. He is at the convention: I
just rode the train with him, so I guess
10:17 - 10:24

you can ask him the the gory details of
salsa 20. So it's a really nice crypto
10:24 - 10:31

cipher and you should hit him up and ask
him. I shouldn't have said that, right?
10:31 - 10:38

Sorry. So basically what a very important
thing is to note is for every invocation
10:38 - 10:46

of this this or for every instance of this
(Not)Petya encryption and you would have
10:46 - 10:54

these three variables or these three
inputs be constant so (Not)Petya would
10:54 - 11:00

would patch during the infection process
the key, the nonce, and the constants into
11:00 - 11:08

a configuration sector. The constants, not
these are somewhat different. And then the
11:08 - 11:15

counter would only change throughout every
iteration. So the interesting thing or the
11:15 - 11:23

interesting question was first - what is
the length of the key stream? So, the key
11:23 - 11:28

stream, you know the the number of
different outputs that that would come out
11:28 - 11:35

of this hashing function. And I mean for
this implementation for the theory this
11:35 - 11:43

is quite clear it's 64-bits here eight
bytes times the 64 bytes of output so it
11:43 - 11:54

would be about two to the seventy of a
periodicity. One different about the
11:54 - 12:01

actual implementation in (Not)Petya on the
theory would be that this constants thing
12:01 - 12:07

here had been changed to I string a
reading out invalid sect ID. So this would
12:07 - 12:23

break the official implementation. So the
very first failure I saw in (Not)Petya was
12:23 - 12:27

something like this. So I think I can slip
this slide because it's obvious for
12:27 - 12:35

everyone that this is a fail right? So who
sees the fail? So, not many hands? Okay,
12:35 - 12:43

then then I'll explain it. So I was just
kidding , I'm almost not not expecting for
12:43 - 12:49

for you to grasp of that first. So
remember, we're in 16-bit code. We have
12:49 - 12:57

here this shift left operation which would
shift a register by N-bytes. The register
12:57 - 13:04

with here is 16 bits, so it only has 16
digits so to speak and you would shift it
13:04 - 13:12

by 16 - by 10 hex - sixteen. And this
would effectively null the register, and
13:12 - 13:15

even worse is here. So
13:15 - 13:22

you would shift a an 8-bit register for
sixteen. So this is something you you
13:22 - 13:29

wouldn't expect from my proper
cryptographic implementation. And I was
13:29 - 13:33

was really intrigued why that is, because
it wouldn't make any sense in source code
13:33 - 13:40

and and did the (Not)Petya or the Petya
authors really implement that on purpose,
13:40 - 13:48

or what was the gist with it. And I looked
up the (Not)Petya, the salsa 20
13:48 - 13:55

implementation um I just googled it and
funny a nice website that had a
13:55 - 14:05

implementation of salsa 20. And there you
would see this code. So you see here it
14:05 - 14:12

sits in the the endianness conversion. And
you see here these these shifting of bits
14:12 - 14:20

of registers and you see here this this
uint fast 16 or 32 type. So it becomes
14:20 - 14:23

quite clear that this is a broken
implementation right? So everyone can see
14:23 - 14:33

that right? No, not right now because you
need to know some things more about this.
14:33 - 14:39

There are two important facts that make
this implementation broken and the two
14:39 - 14:46

facts are: you need to compile this code
for 16 bits, and you need to look up what
14:46 - 14:52

Visual Studio makes of these these type
definitions here. And when you look that
14:52 - 14:59

up, this is from Visual Studio and it's in
the standard int.h header file. And there
14:59 - 15:04

you see it's interpreted or translated as
unsigned int, so this is the base type.
15:04 - 15:09

And this base type, unsigned int, in
16-bit code is a 16-bit variable or a
15:09 - 15:14

16-bit register. And then everything makes
sense. And this was somewhat of a a
15:14 - 15:22

failure here - the authors didn't really
check if their code was actually working
15:22 - 15:27

against the test vectors. And this guy who
wrote the code here on this salsa 20
15:27 - 15:40

implementation made this mistake also. On
this slide you see two bugs off of the
15:40 - 15:47

(Not)Petya implementation of salsa 20. And
I quickly want to to explain both to you
15:47 - 15:52

because they are of some somewhat in
importance throughout the remainder of the
15:52 - 16:02

talk. So, both revolve around the the
counter variable. Just remember - this
16:02 - 16:07

counter variable is the only dynamic input
the only variable input throughout all
16:07 - 16:10

these salsa 20 invocations.
16:10 - 16:22

And the the first error is: so you read
a sector, a sector number into the memory.
16:22 - 16:29

So a bit about the the low-level aspects
of a hard drive. A hard drive from the
16:29 - 16:35

view of the BIOS would look somewhat like
a bunch of different sectors. So these are
16:35 - 16:43

512 byte chunks of data, and they they
come one after another. So if you were to
16:43 - 16:49

read a sector, you would actually read a
512 bytes about me a byte long portion of
16:49 - 16:56

data. And this is obviously not the offset
in the stream, and this is somewhat of a
16:56 - 17:04

problem there. So, you see here the same
variable is used to decrypt or encrypt the
17:04 - 17:13

data. And, I mean, this this is it doesn't
it isn't really apparent to to the
17:13 - 17:21

implementer of this cipher. However, if
you were to analyze it, it would look
17:21 - 17:28

something like this. So you would have the
key stream of two different sectors, two
17:28 - 17:33

different consecutive sectors here. So it
would start with with FF and then
17:33 - 17:40

continues with D7 and so on. And the next
sector would have almost all of the bytes
17:40 - 17:50

identical. And this is yeah a really big
failure because this really nice salsa 20
17:50 - 17:58

implementation, this really nice salsa
algorithm will then be, from would then be
17:58 - 18:03

converted from a one-time pad to a many
times pad. And this is the first fine I
18:03 - 18:10

wanted to show you in this very few lines
of code. The second part is, the second
18:10 - 18:18

bug is here this large keyword. Remember,
we are in 16-bit code so this large
18:18 - 18:25

keyword here does not push a 64-bit
variable as we would suspect to do it, but
18:25 - 18:31

a 32-bit variable. So only 32 bits of this
this nice counter variable are actually
18:31 - 18:41

used in this case. So, these two failures
are somewhat of a problem for this salsa
18:41 - 19:00

20 implementation. So, in this slide I
took two different hex dumps, and these
19:00 - 19:09

hex times were are within this expand key
function. And they they are well basically
19:09 - 19:17

two snapshots - one right before these
bugs become apparent: so before this this
19:17 - 19:25

endianness conversion, and right after on
the lower half. So you very nicely see the
19:25 - 19:32

different variables being put into all the
different key inputs being put into these
19:32 - 19:34

this memory blocks. So here, it would
19:34 - 19:39

spell out invalid sect ID, you know, the
constants (Not)Petya uses. Here you would
19:39 - 19:46

see the key and here, so it's broken into
two halves. Additionally you would see the
19:46 - 19:54

nonce here. And what really sticks out is
this bunch of zeros here. So this this
19:54 - 20:01

higher part of this a 64-bit variable
isn't even used - is it's not even filled.
20:01 - 20:07

So this is well the first problem here,
and after the endianness conversion you
20:07 - 20:14

see that it's not really an endianness
conversion but it's more of a nulling of
20:14 - 20:23

of bytes. So the result would be that this
initially 64 bit variable would be just
20:23 - 20:33

16-bit in length. And, as I said earlier,
the original salsa implementation would be
20:33 - 20:45

2^70s key lengths. And right now we have
16 bits times 64 bytes in in key lengths
20:45 - 20:51

which would result an in 26 bits in key
lengths. Which would be a measly 4
20:51 - 20:57

megabytes in key lengths. So this is the
this was a very interesting observation I
20:57 - 21:09

made there and this would be possible then
to decrypt together with the many times
21:09 - 21:18

pad properties of this cipher which make
it really easy to break. So to quickly
21:18 - 21:25

summarize this part of the talk - so we
have a very very short key stream of just
21:25 - 21:32

4 megabytes, it's highly repetitive
repetitive. So for each secretary progress
21:32 - 21:42

you would only have a progress of one byte
at a time. So only 26 bits remain of the
21:42 - 21:50

whole stream. And as I said, the many
times pad property's a very nice thing to
21:50 - 22:00

to analyze. I could come around to
implement a small joke here, so this salsa
22:00 - 22:09

implementation I would only call "LOLsa"
from now on. Sorry, is a bad joke sorry!
22:09 - 22:17

So, the the main goal is to derive the
keystream, and as I'm not really a crypto
22:17 - 22:22

expert, the basically the only attack I
know it would be a known plaintext attack
22:22 - 22:28

so this was my goal there because it's so
straightforward to do that. And in the
22:28 - 22:35

remainder of the talk I will tell you how
I did that. So without further ado, let's
22:35 - 22:42

exploit these failures and see how much of
the plaintext we can actually get from a
22:42 - 22:47

(Not)Petya infected drive.
So the modulus operandi
22:47 - 22:53

of (Not)Petya would look
somewhat like that. This, so let's let's
22:53 - 22:58

stop with the with the left hand side of
the of this slide, and concentrate on the
22:58 - 23:04

right hand side. For those of you who you
are not really intimately familiar with
23:04 - 23:11

NTFS - I wasn't before analyzing Petya or
(Not)Petya as well so no worries. It's
23:11 - 23:18

quite simple - so every NTFS partition has
something called an master file table.
23:18 - 23:25

MFT, abbreviated. And it would contain
some metadata about the file. For example,
23:25 - 23:32

the file name, the file size, and if the
file is small enough, it would even fit
23:32 - 23:42

the actual content of the file into this
record. So, as I said, MFT is just list of
23:42 - 23:50

records. And if the file is larger it
would have a pointer, a so called data
23:50 - 23:58

run, which would point to a cluster or
sector on the disk on the partition which
23:58 - 24:06

would then actually be the the payload
data. One of these MFT records is one
24:06 - 24:12

kilobyte in size. So now to the actual
implementation. So let's zoom out a bit
24:12 - 24:19

and see how this LOLsa implementation is
used in (Not)Petya. So it would basically
24:19 - 24:26

just iterate over over all of these MFT
records, and then check if this record
24:26 - 24:33

would put into would point to a file. If
that is the case, it would encrypt the
24:33 - 24:42

first kilobyte of the file, and then would
encrypt the record itself. This
24:42 - 24:48

implementation is good for a bunch of
reasons - it's very fast: You would only
24:48 - 24:52

need to encrypt the first kilobyte, and
this is this first kilobyte contains
24:52 - 25:03

really really important information. For
example, headers, or especially compressed
25:03 - 25:09

files have these really important header
structures there. Additionally, your file
25:09 - 25:14

recovery tools wouldn't be able to work
anymore because most of them rely on this
25:14 - 25:23

very header. And the second thing is this
MFT is can be considered as table of
25:23 - 25:31

contents. So with no metadata, with with
no pointers to these to the files, you
25:31 - 25:38

won't have anything there to work with to
recover from. And this is a I mean from
25:38 - 25:44

from the implementation standpoint it's
very neat, because it's fast and it's a
25:44 - 25:50

somewhat thorough. As the MFT is
25:50 - 25:57

really important, my idea was to to
recover that first and then check what
25:57 - 26:07

what comes out from there and see how I
can can further progress there with the
26:07 - 26:14

decryption of the files. So the metadata
would would be of most importance there.
26:14 - 26:27

I'm a visual person, and here I took two
disk dumps from a from one of my test
26:27 - 26:33

disks. So I infected a clean system with
(Not)Petya and let it encrypt the hard
26:33 - 26:38

drive. And on the left-hand side you see
the plaintext on and on the right-hand
26:38 - 26:45

side you see the encrypted data. So to
just get a better picture about the
26:45 - 26:54

encryption process. On the far left-hand
side fancy PowerPoint altered animation,
26:54 - 27:01

you see some kind of indicator so which
which would tell you at which offset, how
27:01 - 27:08

much of the of the data has actually been
different. And you see the whole disk is
27:08 - 27:20

more or less being encrypted. However, you
see at the far bottom part here it's more
27:20 - 27:27

dark red. And this is actually the MFT, so
the MFT is towards the end of the disk
27:27 - 27:34

sometimes. But this might be a
misconception. So my my idea was something
27:34 - 27:39

like this: we have two input sizes, right?
We have the the encrypted MFT, and we have
27:39 - 27:47

encrypted files. And first i would analyze
the MFT, and then derive the key stream
27:47 - 27:56

from that. After that analysis stage had
been finished I would put the key stream
27:56 - 28:04

back into the this this little box here,
and actually decrypt that. And then out
28:04 - 28:10

comes the decrypted MFT. And with that and
the key stream I would be able to find the
28:10 - 28:17

encrypted files on the disk, and then be
able to decrypt them; then be ready with
28:17 - 28:24

it right? So this was my initial idea
there and so let's start with the
28:24 - 28:35

decryption of the MFT. Known plaintext
attack, right? So an MFT you looks for
28:35 - 28:40

from from the viewpoint of the keystream,
somewhat like this. So you would have here
28:40 - 28:47

the first, the second and so on MFT
records. And on the on the column you will
28:47 - 28:55

have the actual byte that is used as key
to encrypt the key stream. Remember, the
28:55 - 29:04

operation that that encrypts the you know
the the key stream and the the plaintext
29:04 - 29:07

is just a mirror XOR operation.
So you have the the key
29:07 - 29:16

stream and the plaintext, and it's plainly
- so you can switch forth and back between
29:16 - 29:21

plaintext and ciphertext and even the key
stream with a with just applying an XOR
29:21 - 29:30

operation. So what you see here is for the
very first records you only have very few
29:30 - 29:37

key stream bytes or very few sample bytes.
However, as the progress as you make
29:37 - 29:42

progress with the analysis and then you
will have more and more of these sample
29:42 - 29:50

bytes to collect from and this this will
give you more confidence in the in the
29:50 - 29:57

result, and the may be known key stream
then. The question is, does the MFT hold
29:57 - 30:03

enough plaintext to do some some kind of
known plaintext attack? So let's look into
30:03 - 30:13

the specification. The MFT record has
basically two fields. So there is this the
30:13 - 30:20

standard information, which is a well-
defined structure and there's something
30:20 - 30:26

else called attribute list, which is a
quite dynamic structure and this would be
30:26 - 30:33

a somewhat more different, difficult to
clean plan text from. So I concentrate on
30:33 - 30:38

this first part. And the first part
quickly turned out to be quite constant
30:38 - 30:49

for many or most of the MFT records. And
you see it starts with FILE and then as
30:49 - 30:55

some some hex digits after that. And
although on the far bottom part of the
30:55 - 31:02

slide, I added my yeah certainty level. So
the the certainty level would be the
31:02 - 31:09

number of different sample bytes I would
have multiplied by the confidence I would
31:09 - 31:17

have in that sample byte being actually
this this plain text. So you see for the
31:17 - 31:23

very first record, we would have a quite
low, of a low certainty. I mean, it's just
31:23 - 31:31

one byte, right? Oh, the two byte skipping
is I mean quite a forward considering you
31:31 - 31:37

would have usually 512 bytes per sector on
a disk, and the MFT record is one kilobyte
31:37 - 31:48

in size, so the stream would progress two
bytes. And for record 100, so for if for
31:48 - 31:55

the 100th record I would have a certainty
of four. You know, I would just assume
31:55 - 32:03

these eight plaintext bytes here, divided
by 2, with an resulting 4. This wasn't
32:03 - 32:06

really satisfactory.
The problem there was towards
32:06 - 32:13

the end I would have many many unknown
records because I would was concentrating
32:13 - 32:20

on on the very first parts of the header.
So the remainder of the key stream, the
32:20 - 32:26

very end of the key stream wasn't be able
to being analyzed and eventually
32:26 - 32:30

decrypted. So I thought of something
different. And there was something like a
32:30 - 32:40

I would call a byte histogram. So for
every offset the MFT record, i would i
32:40 - 32:49

will then calculate creatin and a
histogram and check how many different
32:49 - 32:54

bytes are there actually for plaintext,
you know, it's a plaintext known plaintext
32:54 - 32:59

attack so I would need some kind of
plaintext and would do that for every
32:59 - 33:08

offset for every record. And so these the
questions there how to get many MFT
33:08 - 33:12

records - it's quite easy if you have some
nice colleagues you just need to hang them
33:12 - 33:17

all of the balcony and shake them a bit
then more or less voluntarily they will
33:17 - 33:25

give you some MFT keys to work with. And
the result of that is quite nice: you, I
33:25 - 33:31

mean for for the very first, there's
nothing much you can do the very first
33:31 - 33:36

record will always have very few sample
bytes. But as the stream progresses and
33:36 - 33:42

you will have a dramatic change there so
from this relatively low certainty of
33:42 - 33:51

four, I could increase that to more than
thirty. So this is somewhat nice, and ever
33:51 - 34:00

bit of doing science, this table drops out
from nowhere. So I compared these two
34:00 - 34:08

attack types. So, let's read that from
from right to left. On the on the far
34:08 - 34:16

right, I have for the first approach about
98% of the MFT records being successfully
34:16 - 34:22

recovered. You know, with the good thing
with science and with all this academic
34:22 - 34:30

approach is you have a ground truth. So I
have a plaintext an unencrypted hard drive
34:30 - 34:37

which were virtually unordered from
something right after infection, and then
34:37 - 34:43

you let execute the the whole infection
and encryption process. And then you can
34:43 - 34:49

differentiate and take you knows several
snapshots throughout the infection, change
34:49 - 34:55

different key values and all the stuff. So
this is a very nice thing about this
34:55 - 35:03

academic approach I was taking the other
So I could I could exactly pinpoint how many
35:03 - 35:14

of these records were perfectly recovered.
So for the byte histogram approach I could
35:14 - 35:18

decrypt almost all of the records, which
is quite nice because then we have a high
35:18 - 35:28

quality MFT to work with. What's also
quite nice is that we have zero wrong key
35:28 - 35:34

stream bytes. What's not so nice, however,
is that I was only able to recover about
35:34 - 35:43

1.3 percent of the overall key stream. And
remember - this this key stream is about 4
35:43 - 35:48

megabytes in length and I was able to
recover only 50 kilobytes of that, so we
35:48 - 35:56

can't recover all of the the files. Or is
that so? This is this was my next question
35:56 - 36:05

there. And so I drew another nice diagram.
This is the key stream; in the this is the
36:05 - 36:16

MFT, sorry, in the key stream. So the the
key stream is only filled and in sample
36:16 - 36:24

bytes at this 2 megabytes mark. And the
question is, are there many files in this
36:24 - 36:29

area being encrypted or is there some
other bug I could exploit. And I checked
36:29 - 36:35

where the files would lie into this the
key stream. So I would check how many
36:35 - 36:41

files are at which location in the key
stream being encrypted. And you see the
36:41 - 36:48

key stream is used all over the place - so
I mean sometimes it's used more, sometimes
36:48 - 36:56

it's used just the less, however, it's
basically used all over the place. And so
36:56 - 37:01

this much of the key stream could in a
perfect scenario be, I mean, perfect
37:01 - 37:09

scenario being a perfect known plaintext
oracle could theoretically be recovered.
37:09 - 37:18

However, this is somewhat of a problem
here and in the next part of the talk I
37:18 - 37:26

will present you how I was able to solve
this problem as well. So remember when the
37:26 - 37:32

file system is actually encrypted by
(Not)Petya, the file system looks somewhat
37:32 - 37:37

like this. So you would have the MFT which
is totally garbled so you won't have any
37:37 - 37:42

of any nice file pointers or data runs
pointing through those files. You won't
37:42 - 37:48

have any nice file names and all the
stuff. But with the first stage being
37:48 - 37:53

finished, the MFT looks really nice - I
mean like almost a hundred percent
37:53 - 37:58

of the records could be
recovered. So it looks
37:58 - 38:04

somewhat like this. So you have a bunch of
metadata you can now use to analyze the
38:04 - 38:09

the remainder of the files and the
remainder of the encryption. So all this
38:09 - 38:17

MFT or almost always actually decrypted.
And for files you would have the very
38:17 - 38:25

first kilobyte being encrypted. The
remainder of the file, the, I mean, most
38:25 - 38:31

files are more than a kilobyte in size,
right? So all the rest is not encrypted.
38:31 - 38:37

So you would have all this data and
metadata to collect information and to to
38:37 - 38:46

use to exploit as-known plaintext as
indicators for known plaintext. So I
38:46 - 38:56

thought of three different approaches to
you know to attack this problem. Basically
38:56 - 39:02

I was thinking about: okay what different
files are there, and I was quickly
39:02 - 39:08

thinking about different file types. And I
mean, there are I mean the file type can
39:08 - 39:13

be easily gleaned from this, right?
Because you would have the file extension
39:13 - 39:19

and it would be basically the file type.
And you would have two different types of
39:19 - 39:23

files - you would have structured files
and unstructured files. So I thought of
39:23 - 39:29

these and you would have something like,
yeah, source code, which I would consider
39:29 - 39:35

as more or less unstructured. And I was
calculating the histogram, so this would
39:35 - 39:40

give me some kind of prevalence towards
different bytes in the key stream. So it
39:40 - 39:47

would be somewhat like a guest plaintext
attack or something like that. The next
39:47 - 39:55

thing for structured files would be to do
the very same approach as with the MFT
39:55 - 40:01

records. I would calculate the histogram
for every offset of the first kilobyte,
40:01 - 40:08

and then quickly see how how many
different bytes are there per offset. And
40:08 - 40:15

the last approach uses somewhat more data.
It uses some of the metadata, and also
40:15 - 40:24

some of the file data. I will go into this
right now. So, what I basically have here
40:24 - 40:29

as I said it's only this this little
portion of the files encrypted. The whole
40:29 - 40:34

remainder of the file is not encrypted,
which is quite nice. And also the file
40:34 - 40:40

name the file sizes not encrypted.
So what I use, what I would do here
40:40 - 40:45

is create a database of
known files of known
40:45 - 40:52

Windows system files, for example. Y'all
might remember these nice background
40:52 - 40:59

images fonts all this stuff - plaintext
is flying around everywhere if you just
40:59 - 41:07

look for it. And I would have basically
three different, yeah, three three
41:07 - 41:17

different distinctors between those to
know which which is the correct plaintext.
41:17 - 41:25

So there is this this key file name, the
file size, and then the hash of this whole
41:25 - 41:32

remainder, of this whole tier. So if all
these 3-tuple would match, then I would
41:32 - 41:39

consider this as a proper plaintext.
However, there are some collisions, so
41:39 - 41:48

this is this is not really something that
is straightforward. So the initial idea of
41:48 - 41:56

having of only needing to analyze the MFT
and then could just being being able to
41:56 - 42:04

straightforward decrypt files needed to be
altered a bit. So I added this database of
42:04 - 42:12

known files there, I added another another
analysis stage in this nice box here, and
42:12 - 42:20

then I would be able to decrypt the files,
eventually. So let's do a bit of science
42:20 - 42:26

and check if this approach would be
worthwhile pursuing on a real-world
42:26 - 42:38

scenario. So let the science cat do its
science, let me have a drink. So whenever
42:38 - 42:47

you did do here is to create a database of
known files - I collected a bunch of
42:47 - 42:55

default Windows installations, which
resulted in about 340,000 unique files in
42:55 - 43:00

it. Then I calculated, you know, the the
different file type histograms I talked
43:00 - 43:09

about, I prepared my test setup, I added
there 1000 different files. And you should
43:09 - 43:16

note that these files were not part of
this known files database - they were
43:16 - 43:21

different from that because otherwise it
would have been easy to do. Then I would
43:21 - 43:28

infect this machine and then let its
encrypt by (Not)Petya and then attempting
43:28 - 43:40

recovery. And these are the results there.
So, I did this with 4 different runs, so I
43:40 - 43:46

tried every approach separately, and then
combined the three approaches. And the
43:46 - 43:52

outcome was something like this: So I
would have only two files by the general
43:52 - 44:00

histogram approach being able to
be to decrypt. At least 8%
44:00 - 44:07

where I will were yet decrypted by the
location histogram approach. And by the
44:07 - 44:12

known files approach over 90% of the files
could be successfully recovered. And even
44:12 - 44:22

better, the combined outcome would be
almost all files being able to decrypt.
44:22 - 44:31

So, so much about academia. So the problem
there is if you apply this to the real
44:31 - 44:37

world, you could get into more trouble
there. And there was lots and lots of more
44:37 - 44:43

to think about so. There were, for
example, non default windows installations
44:43 - 44:48

with the whole history of updates so this
might be really interesting from a
44:48 - 44:56

forensics perspective. Moreover, there's
lots of installed programs to derive known
44:56 - 45:05

plaintext from - for example, dotnet uses
a vast source of known plaintext or JDK
45:05 - 45:12

installations. So, especially the JDK
would would result in about tens of
45:12 - 45:17

thousands of different source code and
HTML files. So this is this really was
45:17 - 45:28

really quite nice. The drawback there was,
so there was so much data to collect, the
45:28 - 45:35

first attempts failed miserably because of
the sheer amount of RAM I was using. And
45:35 - 45:44

this would would result in the admins
constantly giving me more more RAM in my
45:44 - 45:53

VM so I would eventually end up with, I
think, 128 gigabytes of RAM and my my test
45:53 - 46:00

VM. Also you have a much larger master
file table you know because of all these
46:00 - 46:06

files that would need to be stored, so to
put them in in comparison. So this nice
46:06 - 46:14

working prototype, so this nice test
setup, I mean, would have an MFT of about
46:14 - 46:20

26 megabytes. And for real-world scenarios
you would have something like at least 500
46:20 - 46:27

megabytes, and even MFT could be even like
a couple of gigabytes in size. So this
46:27 - 46:33

means much more plaintext. So for through
these really large MFTs you could quickly
46:33 - 46:40

recover the whole key stream, the whole
four megabytes just by looking at the MFT.
46:40 - 46:47

And, in summary, this means that the
decryption of (Not)Petya encrypted drives
46:47 - 46:57

is possible. So, for for the file systems
the drives, I have looked at it, was
46:57 - 47:03

really I mean ever after having all these
these first bugs out of the way,
47:07 - 47:10

I was able to decrypt and
recover all the files there. So
47:10 - 47:18

there's a good chance the vacation photos
can be recovered as well. And this will
47:18 - 47:24

conclude my talk, so quick summary -
(Not)Petya has some severe cryptographic
47:24 - 47:33

failures and flaws, so I would only be
able to call this LOLsa and not salsa
47:33 - 47:41

anymore. It might be possible to further
look into this, this you know this expand
47:41 - 47:49

key and thing. It it has really really a
bunch of more cryptographic failures in
47:49 - 47:56

it. I didn't look into that deeper but I
know some of you guys are professors and
47:56 - 48:01

this might be a nice homework for your
students to look at this (Not)Petya
48:01 - 48:08

implementation and check out some, you
know, more advanced crypto analysis
48:08 - 48:16

methods. And you should note that this
hook this this whole (Not)Petya thing here
48:16 - 48:22

I described the whole cryptographic flaws
there are not just in (Not)Petya. They are
48:22 - 48:28

you see the brackets there: there they are
in every each and every version of Petya.
48:28 - 48:38

So all of the drives that you potentially
have saved can be decrypted. And so my
48:38 - 48:45

last point is if any of you or any of us
falls victim to ransomware you really
48:45 - 48:52

should keep the drives. You keep them
untouched in a locker and wait for a talk
48:52 - 48:59

like this basically. And then hope that
someone comes around and then be able to
48:59 - 49:07

decrypt the drives and then, you will have
your vacation photos back. So that's all
49:07 - 49:09

folks, thank you for your attention.
49:09 - 49:13

Applause
49:13 - 49:16

Herald: Thank you very much Sebastian. Now
49:16 - 49:19

it's time for questions, please queue up
by the microphones.
49:26 - 49:28

Mic 1: Hi, well thank you very much for
49:28 - 49:34

sharing your findings with us. I'm from
Rotterdam the largest harbour in Europe,
49:34 - 49:36

and as you might know, we were struck by
Petya
49:36 - 49:39

Sebastian Yes
Mic 1: Two terminals went down for a
49:39 - 49:44

couple of days, a couple of hundred
billion euros of damage. Your approach is
49:44 - 49:49

quite theoretically, so now to practice -
if you were there in this summer with
49:49 - 49:54

these findings, would you've been able to
help us and decrypt the files and get all
49:54 - 49:58

the companies up and running again? Or is
this too academic?
49:58 - 50:05

Sebastian: No, it's a practical approach I
mean, I work for CrowdStrike and we had
50:05 - 50:14

some occasions where we were able to help
the customers. So it's a very practical
50:14 - 50:21

thing to do so. And I was trying to
insinuate this with, you know, this slide
50:21 - 50:26

here. So I was talking about the real
world in this scenario. So I looked at
50:26 - 50:32

about 50 different encrypted hard drives
with (Not)Petya, and I was able to decrypt
50:32 - 50:39

all of them, or most of them. Mostly the
the ones not being able to decrypt or to
50:39 - 50:45

some, well, let's say, level 8 mistakes.
Mic 1: Awesome. Have you shared your
50:45 - 50:49

findings with nomoreransoms.org?
Sebastian: No, I don't know about this
50:49 - 50:51

site
Mic 1: They provide decryption tools for
50:51 - 50:57

ransomware. Please do, thank you
Herald: Microphone six, please
50:57 - 51:02

Mic 6: Thank you, in the beginning you
mentioned that basically the key was
51:02 - 51:05

shortened to what was a 24-bit, something
like that?
51:05 - 51:10

Sebastian: 26, yes
Mic 6: 26, yeah. From that point, wasn't a
51:10 - 51:14

brute-force attack way faster and way more
reliable?
51:14 - 51:21

Sebastian: Oh no no, don't get me wrong
there. So the keystream length was 2^26,
51:21 - 51:28

so the keystream length was four
megabytes. So you wouldn't be able to
51:28 - 51:35

brute-force that. So sorry, do you get
that? The number of bytes was four
51:35 - 51:39

megabytes, and you couldn't be able to
brute-force that.
51:39 - 51:44

Mic 6: Yeah but you already mention at the
beginning that you basically shortened it
51:44 - 51:50

down to the possibility of at most two to
the power of 26, and it is calculable.
51:50 - 51:57

Sebastian: Yes yes, I totally get the
question. But you're you're missing the
51:57 - 52:03

point there. So it's not the key space,
2^26, but the key length is 2^26, which
52:03 - 52:11

would be something - I I'm not good at
that converting this to decimal. Would be
52:11 - 52:17

something like, let me check, two to
the... I mean, here, math guys, computer
52:17 - 52:22

guys: how many bits is that?
[Inaudible audience answer]
52:22 - 52:26

Sebastian: Say again?
Crowd: A lot?
52:26 - 52:31

Sebastian: A lot! So it's, I mean, it's
it's four megabytes of keylength and you
52:31 - 52:35

couldn't just brute force that because you
you would have each of these
52:35 - 52:44

four megabytes you would have to brute
force. Got that? So the key is not 2^26
52:44 - 52:52

but the key length, the keystream length,
is that long. Got that? So yeah, this is
52:52 - 52:58

the key space would be longer than the
Bible. You know, and you couldn't just
52:58 - 53:03

brute force the Bible, the text of the
Bible. I mean given enough time yes but
53:03 - 53:09

you know we will all know the stories with
in monkeys and typewriters. I mean we can
53:09 - 53:12

talk about that that offline but you're
you're mixing up two different numbers
53:12 - 53:18

there.
Mic 6: Yeah I guess, thank you.
53:18 - 53:20

Herald: Questions from the Internet,
please
53:20 - 53:26

Signal Angel: Does the MFT encryption work
the same for Petya and (Not)Petya?
53:26 - 53:34

Sebastian: Yes, the underlying mechanism
is the same. The cryptography differs in
53:34 - 53:43

such a way that the constants number is
different. So, this little guy here - this
53:43 - 53:50

is different. It would be usually like
expand key something-something. And here
53:50 - 53:57

it's invalid sect ID. So it's somewhat
different, but the the MFT encryption; I
53:57 - 54:03

mean the the bytecode the and the
algorithm is the very same.
54:03 - 54:12

Signal Angel: Thanks
Herald: Any more questions? Then please
54:12 - 54:15

give a great round of applause to
Sebastian
54:15 - 54:18

Sebastian: Thank you
54:18 - 54:24

Applause
54:24 - 54:44

subtitles created by c3subtitles.de
in the year 2018. Join, and help us!

Title:: 34C3 - Defeating (Not)Petya's Cryptography
Description:: more » « less
Video Language:: English
Duration:: 54:44

	C3Subtitles edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	C3Subtitles edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	amrozek edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	amrozek edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	mfhs edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	mfhs edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	Maximilian Marx edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography
	C3Subtitles edited English subtitles for 34C3 - Defeating (Not)Petya's Cryptography

English subtitles

Revisions

Revision 8 Edited

C3Subtitles

34C3 - Defeating (Not)Petya's Cryptography

Revisions

Our website uses cookies

Operating cookies (Required)