WEBVTT
00:00:00.099 --> 00:00:14.800
Music
00:00:14.800 --> 00:00:18.760
Herald: And give a great round of applause
to Sebastian Eschweiler!
00:00:18.760 --> 00:00:24.290
Applause
00:00:24.290 --> 00:00:32.360
Sebastian: So hi everyone. I want to
talk about how I defeated (Not)Petya's
00:00:32.360 --> 00:00:42.800
cryptography. Some would say that (Not)Petya
was Ukraine's scourge, and for those
00:00:42.800 --> 00:00:48.190
of you who don't know what the word
scourge means: this guy right here doesn't
00:00:48.190 --> 00:00:57.249
know either. And quick trivia quiz - does
anyone know what this movie is? What's the
00:00:57.249 --> 00:01:08.940
name of the movie? So, in the next scene
Johnny Depp enters. Does it ring a bell? Jim,
00:01:08.940 --> 00:01:17.140
a movie by Jim Jarmusch. Soundtrack by
Neil Young? Dead Man, right! Great! It's a
00:01:17.140 --> 00:01:24.310
great movie. So if you want to know what a
scourge is then you could watch the movie.
00:01:24.310 --> 00:01:31.640
So let's begin with my talk. So
this is what the official Ukraine
00:01:31.640 --> 00:01:39.930
Twitter account tweeted some time ago,
at the end of June 2017. And there was an
00:01:39.930 --> 00:01:47.210
outbreak of a ransomware attack which was
noticed mostly in Ukraine but also all
00:01:47.210 --> 00:01:51.999
over the world. So millions of users and
also companies large companies were
00:01:51.999 --> 00:02:02.969
affected, and the damage went into the
billions of dollars. So, the problem there
00:02:02.969 --> 00:02:09.889
is, I mean, this is not the
everyday ransomware outbreak you have
00:02:09.889 --> 00:02:17.220
there. And I want to give you a short
glimpse into the (Not)Petya universe.
00:02:17.220 --> 00:02:24.130
And also how I could decrypt all the
stuff that actually was encrypted
00:02:24.130 --> 00:02:31.330
by this ransomware outbreak. So first
I want to begin my talk with a
00:02:31.330 --> 00:02:35.880
differentiation. I want to draw a
demarcation line because all this
00:02:35.880 --> 00:02:41.560
(Not)Petya universe - there's much
subsumed under this whole label.
00:02:41.560 --> 00:02:49.660
And I'm really just talking about a
small fraction of this whole universe. So
00:02:49.660 --> 00:02:55.430
I will first distinguish between what my
talk will be and what it will not be
00:02:55.430 --> 00:03:02.630
about. Next I will describe (Not)Petya's
cryptography, and especially (Not)Petya's
00:03:02.630 --> 00:03:08.310
cryptographic failures. These I will then
be exploiting in the remainder of the talk
00:03:08.310 --> 00:03:22.360
to see how the users can get their
vacation photos back. So what was this
00:03:22.360 --> 00:03:34.700
whole thing? The outbreak started, as I
said, in June 2017, and it started as fake
00:03:34.700 --> 00:03:40.990
update, or rather malicious update, of a
software called M.E.Doc
00:03:40.990 --> 00:03:48.400
- this is a tax software, one of the two
official tax softwares in Ukraine. So
00:03:48.400 --> 00:03:55.730
almost every company has it installed for
tax accounting on their computers; many
00:03:55.730 --> 00:04:02.960
private persons have it installed. It was
pushed and then side-loaded this file,
00:04:02.960 --> 00:04:14.380
perfc that was then downloaded to the
computers and it comprises several parts.
00:04:14.380 --> 00:04:22.870
And some parts are more interesting
than others. So one component, after like
00:04:22.870 --> 00:04:28.880
half an hour's time, would start
encrypting the files depending on the
00:04:28.880 --> 00:04:37.250
access level. So, if there wasn't any
access to infect the computer with well
00:04:37.250 --> 00:04:44.080
this MBR infector, then it would just,
based on the current user level, encrypt
00:04:44.080 --> 00:04:53.480
the files based on that current user. For
lack of a better name I would call this
00:04:53.480 --> 00:05:00.810
"Mischa" component I know it's usually
somewhat different, something different.
00:05:00.810 --> 00:05:06.260
However, this was the best name I could
find there. So it's basically just a
00:05:06.260 --> 00:05:13.280
file crypter with AES. My talk will not
be about this part - this file
00:05:13.280 --> 00:05:22.190
infector. The next very interesting
component was the spreader part - it's
00:05:22.190 --> 00:05:27.389
basically based on the
EternalBlue/EternalRomance exploits that
00:05:27.389 --> 00:05:34.371
had been leaked by the Shadow Brokers. My
talk will not be about this either. This
00:05:34.371 --> 00:05:41.260
is a whole different universe as well.
What my talk will be about is the actual
00:05:41.260 --> 00:05:49.260
(Not)Petya component and this is an MBR
encrypter and I will show you in the next
00:05:49.260 --> 00:05:58.049
slide what it's actually about. So, the
user will see something like this upon
00:05:58.049 --> 00:06:06.240
reboot. So if the access rights are
granted so if there is some local admin
00:06:06.240 --> 00:06:12.230
installed on the computer or the correct
password could be guessed by some attack
00:06:12.230 --> 00:06:18.610
then this dropper, perfc,
would infect the system by overwriting the
00:06:18.610 --> 00:06:26.110
Master Boot Record with a custom boot
loader. And it would reboot the system
00:06:26.110 --> 00:06:32.760
after a predefined time - usually
about 10 minutes. And then the actual
00:06:32.760 --> 00:06:39.500
(Not)Petya component would kick into
action. This infected MBR, the bootloader,
00:06:39.500 --> 00:06:46.760
shows this decoy CheckDisk screen, and in
the background would find and iterate over
00:06:46.760 --> 00:06:49.500
old files on the file system and
00:06:49.500 --> 00:06:56.060
encrypt all these files.
So the main takeaways of this slide are
00:06:56.060 --> 00:07:04.530
that we are dealing with 16-bit code here.
So we're in 16-bit real mode - so
00:07:04.530 --> 00:07:10.740
this means no proper file system it means
no 32-bit or 64-bit code. There are no
00:07:10.740 --> 00:07:20.449
Windows APIs - so debugging all this and
analyzing this is tedious work. However,
00:07:20.449 --> 00:07:26.290
we have something on the plus side which
is a BIOS, you know, the basic
00:07:26.290 --> 00:07:33.910
input/output system and with that comes a
range of interrupts that are very well
00:07:33.910 --> 00:07:43.020
described and a really nice thing is
having Bochs and being able to debug all
00:07:43.020 --> 00:07:50.130
this in IDA. So this is a really neat
plugin that had been developed by the
00:07:50.130 --> 00:08:00.680
authors. So let's analyze a bit and check
the cryptography and why implementing
00:08:00.680 --> 00:08:09.510
crypto is really hard. So I will start
this part with describing in short words
00:08:09.510 --> 00:08:16.510
the theory of Salsa20 and then
compare that against the (Not)Petya
00:08:16.510 --> 00:08:28.000
implementation of this cipher. So Salsa20
is a stream cipher. So basically you will
00:08:28.000 --> 00:08:35.568
have a plaintext - it's about here - and
then you will have some kind of random
00:08:35.568 --> 00:08:41.089
number generator or pseudo-random number
generator and then apply some operations
00:08:41.089 --> 00:08:49.240
on the plaintext and out comes the
ciphertext. And what you put in there are
00:08:49.240 --> 00:08:54.649
four different variables, or four
different inputs, because there's the
00:08:54.649 --> 00:08:59.570
constant part which is obviously not
variable. But we'll see about that in a
00:08:59.570 --> 00:09:09.069
bit. So you have the key and the nonce,
and then there's this really nifty thing
00:09:09.069 --> 00:09:17.610
called counter. What's so nifty about that
is: if you were to stream a Salsa20
00:09:17.610 --> 00:09:24.540
encrypted stream and you lost some
frames, then you could adjust this counter
00:09:24.540 --> 00:09:33.709
variable - which marks the offset
within the current stream -
00:09:33.709 --> 00:09:39.190
and then continue seamlessly with
the decryption of the stream. So this
00:09:39.190 --> 00:09:48.230
is a very nice feature. The size of the
counter is 64-bits and the hash size here,
00:09:48.230 --> 00:09:57.730
so what Salsa does is create a 64-byte
hash for each distinct set of these inputs
00:09:57.730 --> 00:10:06.429
which is then applied to the input. If
you want any more details about this Salsa
00:10:06.429 --> 00:10:13.330
cipher, the inventor of Salsa should be in
this room, or at least at the
00:10:13.330 --> 00:10:17.429
convention. He is at the convention: I
just rode the train with him, so I guess
00:10:17.429 --> 00:10:24.399
you can ask him the gory details of
Salsa20. So it's a really nice crypto
00:10:24.399 --> 00:10:31.070
cipher and you should hit him up and ask
him. I shouldn't have said that, right?
00:10:31.070 --> 00:10:37.940
Sorry. So basically a very important
thing to note is: for every invocation
00:10:37.940 --> 00:10:46.499
of this, or for every instance of this
(Not)Petya encryption, you would have
00:10:46.499 --> 00:10:54.220
these three variables, or these three
inputs, be constant. So (Not)Petya
00:10:54.220 --> 00:11:00.139
would patch during the infection process
the key, the nonce, and the constants into
00:11:00.139 --> 00:11:08.089
a configuration sector. The constants -
these are somewhat different. And then the
00:11:08.089 --> 00:11:15.079
counter would only change throughout every
iteration. So the interesting thing or the
00:11:15.079 --> 00:11:22.540
interesting question was first - what is
the length of the key stream? So, the key
00:11:22.540 --> 00:11:27.839
stream, you know, the number of
different outputs that would come out
00:11:27.839 --> 00:11:34.579
of this hashing function. And, I mean, for
this implementation, for the theory, this
00:11:34.579 --> 00:11:42.629
is quite clear: it's 64 bits here, eight
bytes, times the 64 bytes of output, so it
00:11:42.629 --> 00:11:53.769
would be about two to the seventy in
periodicity. One difference between the
00:11:53.769 --> 00:12:00.619
actual implementation in (Not)Petya and
the theory is that this constants field
00:12:00.619 --> 00:12:07.160
here had been changed to a string reading
"invalid sect id". So this would
00:12:07.160 --> 00:12:23.069
break the official implementation. So the
very first failure I saw in (Not)Petya was
00:12:23.069 --> 00:12:27.439
something like this. So I think I can skip
this slide because it's obvious for
00:12:27.439 --> 00:12:35.240
everyone that this is a fail right? So who
sees the fail? So, not many hands? Okay,
00:12:35.240 --> 00:12:43.189
then I'll explain it. So I was just
kidding - I'm almost not expecting
00:12:43.189 --> 00:12:49.149
you to grasp all of that at first. So
remember, we're in 16-bit code. We have
00:12:49.149 --> 00:12:57.380
here this shift-left operation which would
shift a register by N bits. The register
00:12:57.380 --> 00:13:03.579
width here is 16 bits, so it only has 16
digits, so to speak, and you would shift it
00:13:03.579 --> 00:13:11.879
by 16 - by 10 hex - sixteen. And this
would effectively null the register, and
00:13:11.879 --> 00:13:14.910
even worse is here. So
00:13:14.910 --> 00:13:22.009
you would shift an 8-bit register by
sixteen. So this is something you
00:13:22.009 --> 00:13:28.919
wouldn't expect from a proper
cryptographic implementation. And I was
00:13:28.919 --> 00:13:32.680
really intrigued why that is, because
it wouldn't make any sense in source code.
00:13:32.680 --> 00:13:39.829
And did the (Not)Petya, or the Petya,
authors really implement that on purpose,
00:13:39.829 --> 00:13:47.669
or what was the gist of it? And I looked
up the (Not)Petya, the Salsa20,
00:13:47.669 --> 00:13:54.760
implementation. I just googled it and
found a nice website that had an
00:13:54.760 --> 00:14:04.779
implementation of Salsa20. And there you
would see this code. So you see here: it
00:14:04.779 --> 00:14:12.090
sits in the endianness conversion. And
you see here this shifting of bits
00:14:12.090 --> 00:14:19.810
of registers, and you see here this
uint_fast16 or uint_fast32 type. So it becomes
00:14:19.810 --> 00:14:23.480
quite clear that this is a broken
implementation right? So everyone can see
00:14:23.480 --> 00:14:32.730
that right? No, not right now because you
need to know some things more about this.
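
NOTE
Those two facts can be sketched in a few lines. This is an illustrative
Python model of what a 16-bit build does to the shifts - not the talk's
original C code, and the byte values are made up:

```python
MASK16 = 0xFFFF  # a 16-bit "unsigned int" wraps at 2**16

def load_le32_as_16bit(b0, b1, b2, b3):
    # Intended: assemble a little-endian 32-bit word from four bytes.
    # In a 16-bit build every intermediate is truncated to 16 bits, so
    # the (b2 << 16) and (b3 << 24) contributions vanish entirely -
    # the "endianness conversion" nulls the upper bytes.
    return (b0
            | ((b1 << 8) & MASK16)
            | ((b2 << 16) & MASK16)
            | ((b3 << 24) & MASK16)) & MASK16

print(hex(load_le32_as_16bit(0x11, 0x22, 0x33, 0x44)))  # 0x2211, not 0x44332211
```

The same truncation is why a "64-bit" counter assembled this way keeps
only its low 16 bits.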
00:14:32.730 --> 00:14:38.829
There are two important facts that make
this implementation broken and the two
00:14:38.829 --> 00:14:46.210
facts are: you need to compile this code
for 16 bits, and you need to look up what
00:14:46.210 --> 00:14:52.259
Visual Studio makes of these type
definitions here. And when you look that
00:14:52.259 --> 00:14:58.619
up - this is from Visual Studio, it's in
the stdint.h header file. And there
00:14:58.619 --> 00:15:04.209
you see it's interpreted or translated as
unsigned int, so this is the base type.
00:15:04.209 --> 00:15:08.629
And this base type, unsigned int, in
16-bit code is a 16-bit variable or a
00:15:08.629 --> 00:15:14.459
16-bit register. And then everything makes
sense. And this was somewhat of a
00:15:14.459 --> 00:15:22.000
failure here - the authors didn't really
check if their code was actually working
00:15:22.000 --> 00:15:27.230
against the test vectors. And the guy who
wrote the code here, on this Salsa20
00:15:27.230 --> 00:15:40.179
implementation, made this mistake also. On
this slide you see two bugs of the
00:15:40.179 --> 00:15:47.240
(Not)Petya implementation of Salsa20. And
I quickly want to explain both to you
00:15:47.240 --> 00:15:52.059
because they are of some
importance throughout the remainder of the
00:15:52.059 --> 00:16:02.380
talk. So, both revolve around the
counter variable. Just remember - this
00:16:02.380 --> 00:16:06.950
counter variable is the only dynamic input
the only variable input throughout all
00:16:06.950 --> 00:16:09.920
these salsa 20 invocations.
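
NOTE
For reference, this is how a counter-based stream cipher is meant to
behave: each 64-byte keystream block depends only on (key, nonce, block
index), so you can seek to any offset. An illustrative sketch - it uses
SHAKE-256 as a stand-in PRF, not Salsa20's actual core permutation:

```python
import hashlib

BLOCK = 64  # bytes of keystream per counter value

def ks_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # Stand-in PRF for illustration only.
    return hashlib.shake_256(
        key + nonce + counter.to_bytes(8, "little")).digest(BLOCK)

def crypt(key: bytes, nonce: bytes, data: bytes, offset: int = 0) -> bytes:
    # XOR data with the keystream starting at byte `offset`; since each
    # block depends only on its index, we can start anywhere in the stream.
    out = bytearray()
    for i, b in enumerate(data):
        pos = offset + i
        out.append(b ^ ks_block(key, nonce, pos // BLOCK)[pos % BLOCK])
    return bytes(out)

key, nonce = b"k" * 32, b"n" * 8
msg = b"seekable stream ciphers let you resume mid-stream"
ct = crypt(key, nonce, msg)
# Decrypt only a slice, seeking straight to offset 9:
assert crypt(key, nonce, ct[9:], offset=9) == msg[9:]
```

The two bugs that follow break exactly this contract: the offset fed in
is wrong, and most of the counter is discarded.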
00:16:09.920 --> 00:16:21.689
And the first error is: you read
a sector - a sector number - into memory.
00:16:21.689 --> 00:16:28.670
So, a bit about the low-level aspects
of a hard drive. A hard drive, from the
00:16:28.670 --> 00:16:35.019
view of the BIOS would look somewhat like
a bunch of different sectors. So these are
00:16:35.019 --> 00:16:43.079
512-byte chunks of data, and they
come one after another. So if you were to
00:16:43.079 --> 00:16:48.759
read a sector, you would actually read a
512-byte-long portion of
00:16:48.759 --> 00:16:56.009
data. And this is obviously not the offset
in the stream, and this is somewhat of a
00:16:56.009 --> 00:17:03.790
problem there. So, you see here the same
variable is used to decrypt or encrypt the
00:17:03.790 --> 00:17:13.049
data. And, I mean, this
isn't really apparent to the
00:17:13.049 --> 00:17:21.449
implementer of this cipher. However, if
you were to analyze it, it would look
00:17:21.449 --> 00:17:28.199
something like this. So you would have the
key stream of two different sectors, two
00:17:28.199 --> 00:17:33.080
different consecutive sectors here. So it
would start with FF and then
00:17:33.080 --> 00:17:39.570
continues with D7 and so on. And the next
sector would have almost all of the bytes
00:17:39.570 --> 00:17:49.639
identical. And this is a really big
failure, because this really nice Salsa20
00:17:49.639 --> 00:17:57.770
implementation, this really nice Salsa
algorithm, would then be
00:17:57.770 --> 00:18:03.140
converted from a one-time pad to a
many-time pad. And this is the first find I
00:18:03.140 --> 00:18:10.090
wanted to show you in these very few lines
of code. The second part, the second
00:18:10.090 --> 00:18:17.740
bug is here this large keyword. Remember,
we are in 16-bit code so this large
00:18:17.740 --> 00:18:24.580
keyword here does not push a 64-bit
variable, as we would expect it to, but
00:18:24.580 --> 00:18:31.450
a 32-bit variable. So only 32 bits of
this nice counter variable are actually
00:18:31.450 --> 00:18:40.970
used in this case. So, these two failures
are somewhat of a problem for this salsa
00:18:40.970 --> 00:18:59.990
20 implementation. So, in this slide I
took two different hex dumps, and these
00:18:59.990 --> 00:19:09.399
hex dumps were taken within this expand-key
function. And they are basically
00:19:09.399 --> 00:19:16.970
two snapshots - one right before these
bugs become apparent, so before this
00:19:16.970 --> 00:19:25.480
endianness conversion, and right after, on
the lower half. So you very nicely see the
00:19:25.480 --> 00:19:32.140
different variables - all the
different key inputs - being put into
00:19:32.140 --> 00:19:34.140
this memory block. So here, it would
00:19:34.140 --> 00:19:39.331
spell out invalid sect ID, you know, the
constants (Not)Petya uses. Here you would
00:19:39.331 --> 00:19:46.309
see the key here - it's broken into
two halves. Additionally you would see the
00:19:46.309 --> 00:19:53.710
nonce here. And what really sticks out is
this bunch of zeros here. So the
00:19:53.710 --> 00:20:01.380
higher part of this 64-bit variable
isn't even used - it's not even filled.
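
NOTE
The arithmetic consequence, using the numbers from the talk (a sketch,
assuming a 64-byte keystream block per counter value):

```python
BLOCK_BYTES = 64             # keystream bytes per Salsa20 counter value
EFFECTIVE_COUNTER_BITS = 16  # what survives the broken "conversion"

keystream_len = (1 << EFFECTIVE_COUNTER_BITS) * BLOCK_BYTES
assert keystream_len == 1 << 22            # a 22-bit keystream length
assert keystream_len == 4 * 1024 * 1024    # a measly 4 megabytes
print(keystream_len)
```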
00:20:01.380 --> 00:20:07.170
So this is well the first problem here,
and after the endianness conversion you
00:20:07.170 --> 00:20:13.820
see that it's not really an endianness
conversion but more of a nulling
00:20:13.820 --> 00:20:23.060
of bytes. So the result is that this
initially 64-bit variable is just
00:20:23.060 --> 00:20:32.600
16 bits in length. And, as I said earlier,
the original Salsa implementation would have
00:20:32.600 --> 00:20:44.610
2^70 bytes of key stream. And right now we have
16 bits times 64 bytes of key stream,
00:20:44.610 --> 00:20:51.299
which would result in 22 bits of key
length. Which would be a measly 4
00:20:51.299 --> 00:20:57.289
megabytes of key length. So this
was a very interesting observation I
00:20:57.289 --> 00:21:08.730
made there, and this made it possible
to decrypt, together with the many-time
00:21:08.730 --> 00:21:17.720
pad properties of this cipher, which make
it really easy to break. So, to quickly
00:21:17.720 --> 00:21:24.980
summarize this part of the talk - so we
have a very very short key stream of just
00:21:24.980 --> 00:21:32.009
4 megabytes, and it's highly
repetitive. So for each sector of progress
00:21:32.009 --> 00:21:41.620
you would only have a progress of one byte
at a time. So only 22 bits remain of the
00:21:41.620 --> 00:21:50.230
whole stream. And as I said, the many-
time pad property is a very nice thing
00:21:50.230 --> 00:22:00.190
to analyze. I couldn't get around
making a small joke here, so this Salsa
00:22:00.190 --> 00:22:08.690
implementation I would only call "LOLsa"
from now on. Sorry, is a bad joke sorry!
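
NOTE
The known-plaintext attack on a many-time pad in miniature. A toy
example with made-up byte values, not actual (Not)Petya data; per the
sector bug, sector n+1 here starts one keystream byte after sector n
instead of 512 bytes later:

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Toy keystream and two "sectors" encrypted at overlapping offsets.
keystream = bytes(range(256)) * 4
sector0 = b"FILE0\x03\x00" + b"A" * 9   # starts with a known magic
sector1 = b"secret user data"
ct0 = xor(sector0, keystream[0:])
ct1 = xor(sector1, keystream[1:])

# Known plaintext reveals keystream bytes: ks = ct XOR pt ...
recovered = xor(ct0[:5], b"FILE0")
# ... which, shifted by one, decrypt the start of the *other* sector:
assert xor(ct1[:4], recovered[1:5]) == sector1[:4]
```

Because encryption is a mere XOR, recovering keystream bytes anywhere
immediately decrypts everything else encrypted at those positions.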
00:22:08.690 --> 00:22:16.630
So, the main goal is to derive the
keystream, and as I'm not really a crypto
00:22:16.630 --> 00:22:22.029
expert, basically the only attack I
know would be a known-plaintext attack,
00:22:22.029 --> 00:22:28.470
so this was my goal there, because it's so
straightforward to do. And in the
00:22:28.470 --> 00:22:34.909
remainder of the talk I will tell you how
I did that. So without further ado, let's
00:22:34.909 --> 00:22:41.710
exploit these failures and see how much of
the plaintext we can actually get from a
00:22:41.710 --> 00:22:47.050
(Not)Petya-infected drive.
So the modus operandi
00:22:47.050 --> 00:22:53.299
of (Not)Petya would look
somewhat like this. So let's skip
00:22:53.299 --> 00:22:58.480
the left-hand side of
this slide, and concentrate on the
00:22:58.480 --> 00:23:03.860
right-hand side. For those of you who
are not really intimately familiar with
00:23:03.860 --> 00:23:10.890
NTFS - I wasn't before analyzing Petya and
(Not)Petya either, so no worries. It's
00:23:10.890 --> 00:23:17.659
quite simple - so every NTFS partition has
something called a master file table,
00:23:17.659 --> 00:23:24.940
MFT for short. And it would contain
some metadata about each file. For example,
00:23:24.940 --> 00:23:32.399
the file name, the file size, and if the
file is small enough, it would even fit
00:23:32.399 --> 00:23:42.320
the actual content of the file into this
record. So, as I said, the MFT is just a list of
00:23:42.320 --> 00:23:50.091
records. And if the file is larger it
would have a pointer, a so called data
00:23:50.091 --> 00:23:57.860
run, which would point to a cluster or
sector on the partition which
00:23:57.860 --> 00:24:06.020
would then actually hold the payload
data. One of these MFT records is one
00:24:06.020 --> 00:24:12.330
kilobyte in size. So now to the actual
implementation. So let's zoom out a bit
00:24:12.330 --> 00:24:19.370
and see how this LOLsa implementation is
used in (Not)Petya. So it would basically
00:24:19.370 --> 00:24:26.399
just iterate over all of these MFT
records, and then check if the record
00:24:26.399 --> 00:24:33.230
would point to a file. If
that is the case, it would encrypt the
00:24:33.230 --> 00:24:41.879
first kilobyte of the file, and then would
encrypt the record itself. This
00:24:41.879 --> 00:24:47.769
implementation is good for a bunch of
reasons - it's very fast: You would only
00:24:47.769 --> 00:24:52.259
need to encrypt the first kilobyte, and
this first kilobyte contains
00:24:52.259 --> 00:25:03.440
really really important information. For
example, headers, or especially compressed
00:25:03.440 --> 00:25:09.191
files have these really important header
structures there. Additionally, your file
00:25:09.191 --> 00:25:14.429
recovery tools wouldn't be able to work
anymore because most of them rely on this
00:25:14.429 --> 00:25:23.160
very header. And the second thing is: the
MFT can be considered a table of
00:25:23.160 --> 00:25:31.000
contents. So with no metadata, with
no pointers to the files, you
00:25:31.000 --> 00:25:37.870
won't have anything there to work with, to
recover from. And this is, I mean,
00:25:37.870 --> 00:25:44.320
from the implementation standpoint it's
very neat, because it's fast and it's a
00:25:44.320 --> 00:25:49.729
somewhat thorough. As the MFT is
00:25:49.729 --> 00:25:57.450
really important, my idea was to
recover that first and then check what
00:25:57.450 --> 00:26:07.090
comes out of there, and see how I
can further progress with the
00:26:07.090 --> 00:26:14.180
decryption of the files. So the metadata
would be of most importance there.
00:26:14.180 --> 00:26:27.419
I'm a visual person, and here I took two
disk dumps from one of my test
00:26:27.419 --> 00:26:33.339
disks. So I infected a clean system with
(Not)Petya and let it encrypt the hard
00:26:33.339 --> 00:26:38.270
drive. And on the left-hand side you see
the plaintext, and on the right-hand
00:26:38.270 --> 00:26:45.190
side you see the encrypted data - just to
get a better picture of the
00:26:45.190 --> 00:26:54.220
encryption process. On the far left-hand
side - fancy PowerPoint animation -
00:26:54.220 --> 00:27:01.059
you see some kind of indicator which
would tell you, at which offset, how
00:27:01.059 --> 00:27:08.320
much of the data has actually been
different. And you see the whole disk is
00:27:08.320 --> 00:27:19.600
more or less being encrypted. However, you
see at the far bottom part here it's more
00:27:19.600 --> 00:27:27.199
dark red. And this is actually the MFT, so
the MFT is towards the end of the disk
00:27:27.199 --> 00:27:33.630
sometimes. But this might be a
misconception. So my idea was something
00:27:33.630 --> 00:27:38.779
like this: we have two inputs, right?
We have the encrypted MFT, and we have
00:27:38.779 --> 00:27:47.330
encrypted files. And first I would analyze
the MFT, and then derive the key stream
00:27:47.330 --> 00:27:55.850
from that. After that analysis stage had
been finished I would put the key stream
00:27:55.850 --> 00:28:03.970
back into this little box here,
and actually decrypt that. And then out
00:28:03.970 --> 00:28:09.659
comes the decrypted MFT. And with that and
the key stream I would be able to find the
00:28:09.659 --> 00:28:16.899
encrypted files on the disk, and then be
able to decrypt them; then we'd be done with
00:28:16.899 --> 00:28:24.160
it, right? So this was my initial idea,
and so let's start with the
00:28:24.160 --> 00:28:35.059
decryption of the MFT. Known-plaintext
attack, right? So an MFT looks,
00:28:35.059 --> 00:28:40.320
from the viewpoint of the keystream,
somewhat like this. So you would have here
00:28:40.320 --> 00:28:46.800
the first, the second and so on MFT
records. And on the on the column you will
00:28:46.800 --> 00:28:54.510
have the actual keystream byte that is used
to encrypt the data. Remember, the
00:28:54.510 --> 00:29:03.829
operation that combines, you know,
the key stream and the plaintext
00:29:03.829 --> 00:29:07.350
is just a mere XOR operation.
So you have the key
00:29:07.350 --> 00:29:15.769
stream and the plaintext, and you can
simply switch back and forth between
00:29:15.769 --> 00:29:21.170
plaintext and ciphertext, and even the key
stream, by just applying an XOR
00:29:21.170 --> 00:29:29.850
operation. So what you see here is for the
very first records you only have very few
00:29:29.850 --> 00:29:36.680
key stream bytes or very few sample bytes.
However, as the progress as you make
00:29:36.680 --> 00:29:41.610
progress with the analysis, you
will have more and more of these sample
00:29:41.610 --> 00:29:49.560
bytes to collect from, and this will
give you more confidence in the
00:29:49.560 --> 00:29:56.930
result - in the then-known key stream.
The question is, does the MFT hold
00:29:56.930 --> 00:30:02.889
enough plaintext to do some kind of
known-plaintext attack? So let's look into
00:30:02.889 --> 00:30:13.399
the specification. The MFT record has
basically two fields. So there is this the
00:30:13.399 --> 00:30:19.570
standard information, which is a well-
defined structure and there's something
00:30:19.570 --> 00:30:25.840
else called attribute list, which is a
quite dynamic structure and this would be
00:30:25.840 --> 00:30:32.799
somewhat more difficult to
glean plaintext from. So I concentrated on
00:30:32.799 --> 00:30:38.200
this first part. And the first part
quickly turned out to be quite constant
00:30:38.200 --> 00:30:48.590
for many or most of the MFT records. And
you see it starts with FILE and then as
00:30:48.590 --> 00:30:55.210
some hex digits after that. And
on the far bottom part of the
00:30:55.210 --> 00:31:02.210
slide, I added my certainty level. So
the certainty level would be the
00:31:02.210 --> 00:31:08.749
number of different sample bytes I would
have multiplied by the confidence I would
00:31:08.749 --> 00:31:16.559
have in that sample byte actually being
this plaintext. So you see, for the
00:31:16.559 --> 00:31:22.759
very first record, we would have a
quite low certainty. I mean, it's just
00:31:22.759 --> 00:31:30.580
one byte, right? Oh, the two-byte skipping
is quite straightforward, considering you
00:31:30.580 --> 00:31:37.370
would have usually 512 bytes per sector on
a disk, and the MFT record is one kilobyte
00:31:37.370 --> 00:31:47.919
in size, so the stream would progress two
bytes. And for record 100, for
00:31:47.919 --> 00:31:54.820
the 100th record, I would have a certainty
of four. You know, I would just take
00:31:54.820 --> 00:32:02.730
these eight plaintext bytes here, divided
by 2, resulting in 4. This wasn't
00:32:02.730 --> 00:32:06.320
really satisfactory.
The problem there was that towards
00:32:06.320 --> 00:32:12.720
the end I would have many, many unknown
records, because I was concentrating
00:32:12.720 --> 00:32:19.799
on the very first parts of the header.
So the remainder of the key stream, the
00:32:19.799 --> 00:32:25.799
very end of the key stream, couldn't
be analyzed and eventually
00:32:25.799 --> 00:32:30.159
decrypted. So I thought of something
different. There was something that
00:32:30.159 --> 00:32:39.799
I would call a byte histogram. So for
every offset in the MFT record, I would
00:32:39.799 --> 00:32:48.860
then calculate a
histogram and check how many different
00:32:48.860 --> 00:32:53.549
bytes actually occur there as plaintext -
you know, it's a known-plaintext
00:32:53.549 --> 00:32:58.560
attack, so I would need some kind of
plaintext - and would do that for every
00:32:58.560 --> 00:33:07.509
offset, for every record. And so the
question there is how to get many MFT
00:33:07.509 --> 00:33:12.039
records - it's quite easy if you have some
nice colleagues: you just need to hang them
00:33:12.039 --> 00:33:16.999
off the balcony and shake them a bit,
and then more or less voluntarily they will
00:33:16.999 --> 00:33:25.080
give you some MFTs to work with. And
the result of that is quite nice. I
00:33:25.080 --> 00:33:30.640
mean, for the very first record there's
not much you can do - the very first
00:33:30.640 --> 00:33:35.929
record will always have very few sample
bytes. But as the stream progresses,
00:33:35.929 --> 00:33:41.970
you will have a dramatic change there:
from this relatively low certainty of
00:33:41.970 --> 00:33:51.499
four, I could increase that to more than
thirty. So this is somewhat nice, and after a
00:33:51.499 --> 00:34:00.480
bit of doing science, this table drops out
of nowhere. So I compared these two
00:34:00.480 --> 00:34:08.230
attack types. So, let's read that from
right to left. On the far
00:34:08.230 --> 00:34:16.270
right, I have for the first approach about
98% of the MFT records being successfully
00:34:16.270 --> 00:34:22.070
recovered. You know, the good thing
with science and with all this academic
00:34:22.070 --> 00:34:29.659
approach is you have a ground truth. So I
have a plaintext, an unencrypted hard drive,
00:34:29.659 --> 00:34:36.920
which was virtually snapshotted
right before infection, and then
00:34:36.920 --> 00:34:43.370
you let the whole infection
and encryption process execute. And then you can
00:34:43.370 --> 00:34:49.120
differentiate, and take, you know, several
snapshots throughout the infection, change
00:34:49.120 --> 00:34:54.539
different key values and all the stuff. So
this is a very nice thing about this
00:34:54.539 --> 00:35:02.830
academic approach I was taking there.
So I could exactly pinpoint how many
00:35:02.830 --> 00:35:14.280
of these records were perfectly recovered.
So for the byte histogram approach I could
00:35:14.280 --> 00:35:17.940
decrypt almost all of the records, which
is quite nice because then we have a high
00:35:17.940 --> 00:35:28.250
quality MFT to work with. What's also
quite nice is that we have zero wrong key
00:35:28.250 --> 00:35:34.210
stream bytes. What's not so nice, however,
is that I was only able to recover about
00:35:34.210 --> 00:35:43.130
1.3 percent of the overall key stream. And
remember - this key stream is about 4
00:35:43.130 --> 00:35:47.840
megabytes in length and I was able to
recover only 50 kilobytes of that, so we
00:35:47.840 --> 00:35:55.540
can't recover all of the files. Or is
that so? This was my next question
00:35:55.540 --> 00:36:05.370
there. And so I drew another nice diagram.
This is the key stream - or rather, this is the
00:36:05.370 --> 00:36:15.880
MFT, sorry, in the key stream. So the
key stream is only filled in with sample
00:36:15.880 --> 00:36:24.310
bytes at this 2-megabyte mark. And the
question is, are there many files in this
00:36:24.310 --> 00:36:29.430
area being encrypted, or is there some
other bug I could exploit? And I checked
00:36:29.430 --> 00:36:34.680
where the files would lie in the
key stream. So I would check how many
00:36:34.680 --> 00:36:41.470
files are being encrypted at which location
in the key stream. And you see the
00:36:41.470 --> 00:36:47.550
key stream is used all over the place - so
I mean sometimes it's used more, sometimes
00:36:47.550 --> 00:36:56.110
it's used less - however, it's
basically used all over the place. And so
00:36:56.110 --> 00:37:00.611
this much of the key stream could, in a
perfect scenario - a perfect
00:37:00.611 --> 00:37:09.340
scenario being a perfect known-plaintext
oracle - theoretically be recovered.
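As a rough sketch of what such a known-plaintext oracle means in practice: for a stream cipher, ciphertext = plaintext XOR keystream, so every byte of known plaintext directly yields one keystream byte. The 4 MiB keystream length is from the talk; the sample format and offsets below are illustrative assumptions, not (Not)Petya's exact on-disk layout:

```python
# Sketch: recover keystream bytes from (offset, plaintext, ciphertext) samples.
KEYSTREAM_LEN = 4 * 1024 * 1024  # 4 MiB, as in the talk

def recover_keystream(samples):
    """samples: iterable of (offset, plaintext_bytes, ciphertext_bytes)."""
    keystream = [None] * KEYSTREAM_LEN          # None = still unknown
    for offset, plain, cipher in samples:
        for i, (p, c) in enumerate(zip(plain, cipher)):
            ks = p ^ c                          # c = p XOR ks  =>  ks = p XOR c
            pos = offset + i
            if keystream[pos] is None:
                keystream[pos] = ks
            elif keystream[pos] != ks:
                raise ValueError(f"conflicting keystream byte at {pos}")
    return keystream

# One known-plaintext sample fills in the corresponding keystream region.
plain = b"FILE0"                                # e.g. an MFT record signature
true_ks = bytes([0x41, 0x13, 0x37, 0x00, 0xFF])
cipher = bytes(p ^ k for p, k in zip(plain, true_ks))
ks = recover_keystream([(0x100, plain, cipher)])
assert ks[0x100:0x105] == list(true_ks)
```

Conflicting guesses raise an error here; a real tool would rather count votes per position, which is essentially what the histogram approaches in the talk do.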
00:37:09.340 --> 00:37:17.570
However, this is somewhat of a problem
here and in the next part of the talk I
00:37:17.570 --> 00:37:26.190
will present you how I was able to solve
this problem as well. So remember when the
00:37:26.190 --> 00:37:32.290
file system is actually encrypted by
(Not)Petya, the file system looks somewhat
00:37:32.290 --> 00:37:37.051
like this. So you would have the MFT which
is totally garbled so you won't have any
00:37:37.051 --> 00:37:42.250
nice file pointers or data runs
pointing to those files. You won't
00:37:42.250 --> 00:37:47.520
have any nice file names and all the
stuff. But with the first stage being
00:37:47.520 --> 00:37:53.320
finished, the MFT looks really nice - I
mean like almost a hundred percent
00:37:53.320 --> 00:37:57.530
of the records could be
recovered. So it looks
00:37:57.530 --> 00:38:04.430
somewhat like this. So you have a bunch of
metadata you can now use to analyze the
00:38:04.430 --> 00:38:08.960
the remainder of the files and the
remainder of the encryption. So all this
00:38:08.960 --> 00:38:17.090
MFT is actually decrypted, almost always.
And for files you would have the very
00:38:17.090 --> 00:38:24.830
first kilobyte being encrypted. The
remainder of the file - I mean, most
00:38:24.830 --> 00:38:30.540
files are more than a kilobyte in size,
right? So all the rest is not encrypted.
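In code, the layout just described looks roughly like this: only the first kilobyte of a file is XORed with a slice of the keystream, and the untouched tail is kept as-is. The 1024-byte window matches the talk; the per-file keystream offset parameter is a simplifying assumption for illustration:

```python
ENC_LEN = 1024  # (Not)Petya only encrypts the first kilobyte of each file

def decrypt_file(data: bytes, keystream: bytes, ks_offset: int) -> bytes:
    """XOR the encrypted head back to plaintext; the tail is already clear."""
    head = bytes(c ^ k for c, k in
                 zip(data[:ENC_LEN], keystream[ks_offset:ks_offset + ENC_LEN]))
    return head + data[ENC_LEN:]   # tail was never touched by the malware

# Round trip: encryption is the same XOR, so decrypting twice is the identity.
ks = bytes((i * 7 + 3) % 256 for i in range(4096))
original = bytes(range(256)) * 8               # 2048-byte sample file
encrypted = decrypt_file(original, ks, 512)    # XOR is its own inverse
assert decrypt_file(encrypted, ks, 512) == original
assert encrypted[ENC_LEN:] == original[ENC_LEN:]  # tail stays plaintext
```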
00:38:30.540 --> 00:38:37.280
So you would have all this data and
metadata to collect information from, and to
00:38:37.280 --> 00:38:45.520
exploit as known plaintext, or as
indicators for known plaintext. So I
00:38:45.520 --> 00:38:56.060
thought of three different approaches to,
you know, attack this problem. Basically
00:38:56.060 --> 00:39:01.620
I was thinking about: okay what different
files are there, and I was quickly
00:39:01.620 --> 00:39:08.100
thinking about different file types. And
the file type can
00:39:08.100 --> 00:39:12.890
be easily gleaned from this, right?
Because you would have the file extension
00:39:12.890 --> 00:39:18.540
and it would be basically the file type.
And you would have two different types of
00:39:18.540 --> 00:39:22.990
files - you would have structured files
and unstructured files. So I thought of
00:39:22.990 --> 00:39:28.730
these and you would have something like,
yeah, source code, which I would consider
00:39:28.730 --> 00:39:34.610
as more or less unstructured. And I was
calculating the histogram, so this would
00:39:34.610 --> 00:39:39.980
give me some kind of prevalence towards
different bytes in the key stream. So it
00:39:39.980 --> 00:39:46.600
would be somewhat like a guessed-plaintext
attack or something like that. The next
00:39:46.600 --> 00:39:54.670
thing for structured files would be to do
the very same approach as with the MFT
00:39:54.670 --> 00:40:00.930
records. I would calculate the histogram
for every offset of the first kilobyte,
00:40:00.930 --> 00:40:07.690
and then quickly see how many
different bytes there are per offset. And
00:40:07.690 --> 00:40:14.583
the last approach uses somewhat more data.
It uses some of the metadata, and also
00:40:14.583 --> 00:40:23.870
some of the file data. I will go into this
right now. So, what I basically have here
00:40:23.870 --> 00:40:29.410
as I said, it's only this little
portion of the files that is encrypted. The whole
00:40:29.410 --> 00:40:33.830
remainder of the file is not encrypted,
which is quite nice. And also the file
00:40:33.830 --> 00:40:40.490
name and the file size are not encrypted.
So what I would do here
00:40:40.490 --> 00:40:44.970
is create a database of
known files of known
00:40:44.970 --> 00:40:51.860
Windows system files, for example. Y'all
might remember these nice background
00:40:51.860 --> 00:40:59.310
images, fonts, all this stuff - plaintext
is flying around everywhere if you just
00:40:59.310 --> 00:41:07.220
look for it. And I would have basically
three different, yeah, three
00:41:07.220 --> 00:41:17.300
different discriminators between them to
know which is the correct plaintext.
00:41:17.300 --> 00:41:24.520
So there is this key: the file name, the
file size, and then the hash of the whole
00:41:24.520 --> 00:41:32.180
remainder, of the whole tail. So if this
3-tuple matches, then I would
00:41:32.180 --> 00:41:38.640
consider this as a proper plaintext.
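The matching step can be sketched as a lookup keyed by that 3-tuple of file name, file size, and a hash of the unencrypted remainder. The helper names and the choice of SHA-256 are illustrative assumptions, not the actual implementation from the talk:

```python
import hashlib

ENC_LEN = 1024  # only the first kilobyte of each file is encrypted

def tail_hash(data: bytes) -> str:
    """Hash of the part that (Not)Petya leaves unencrypted."""
    return hashlib.sha256(data[ENC_LEN:]).hexdigest()

def build_database(known_files):
    """known_files: iterable of (name, content) from clean installations."""
    db = {}
    for name, content in known_files:
        # Key on the 3-tuple; store the plaintext head we want to recover.
        db[(name, len(content), tail_hash(content))] = content[:ENC_LEN]
    return db

def lookup_plaintext(db, name, encrypted_content):
    """Return the known first-kilobyte plaintext, or None on no match."""
    key = (name, len(encrypted_content), tail_hash(encrypted_content))
    return db.get(key)

# Usage: the encrypted head differs, but name, size, and tail hash still match.
clean = b"MZ" + b"\x00" * 3000                  # pretend Windows system file
db = build_database([("notepad.exe", clean)])
infected = bytes(b ^ 0x5A for b in clean[:ENC_LEN]) + clean[ENC_LEN:]
assert lookup_plaintext(db, "notepad.exe", infected) == clean[:ENC_LEN]
```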
However, there are some collisions, so
00:41:38.640 --> 00:41:48.090
this is not really
straightforward. So the initial idea of
00:41:48.090 --> 00:41:55.760
only needing to analyze the MFT
and then just being able to
00:41:55.760 --> 00:42:03.920
decrypt files straightforwardly needed to be
altered a bit. So I added this database of
00:42:03.920 --> 00:42:11.920
known files there, I added another
analysis stage in this nice box here, and
00:42:11.920 --> 00:42:19.970
then I would be able to decrypt the files,
eventually. So let's do a bit of science
00:42:19.970 --> 00:42:26.430
and check if this approach would be
worthwhile pursuing on a real-world
00:42:26.430 --> 00:42:37.690
scenario. So let the science cat do its
science, let me have a drink. So what I
00:42:37.690 --> 00:42:46.920
did here is create a database of
known files - I collected a bunch of
00:42:46.920 --> 00:42:55.290
default Windows installations, which
resulted in about 340,000 unique files in
00:42:55.290 --> 00:43:00.090
it. Then I calculated, you know, the the
different file type histograms I talked
00:43:00.090 --> 00:43:08.910
about, I prepared my test setup, and I added
1000 different files there. And you should
00:43:08.910 --> 00:43:15.760
note that these files were not part of
this known files database - they were
00:43:15.760 --> 00:43:20.520
different from that, because otherwise it
would have been too easy. Then I would
00:43:20.520 --> 00:43:28.010
infect this machine, then let it be
encrypted by (Not)Petya, and then attempt
00:43:28.010 --> 00:43:40.170
recovery. And these are the results there.
So, I did this with 4 different runs, so I
00:43:40.170 --> 00:43:46.430
tried every approach separately, and then
combined the three approaches. And the
00:43:46.430 --> 00:43:52.240
outcome was something like this: only two
files could be decrypted by the general
00:43:52.240 --> 00:43:59.950
histogram approach. At least 8%
00:43:59.950 --> 00:44:06.510
were decrypted by the
location histogram approach. And by the
00:44:06.510 --> 00:44:12.480
known files approach over 90% of the files
could be successfully recovered. And even
00:44:12.480 --> 00:44:22.280
better, the combined outcome would be
almost all files being decryptable.
00:44:22.280 --> 00:44:30.530
So, so much about academia. So the problem
there is if you apply this to the real
00:44:30.530 --> 00:44:37.350
world, you could get into more trouble
there. And there was lots and lots more
00:44:37.350 --> 00:44:42.910
to think about. There were, for
example, non-default Windows installations
00:44:42.910 --> 00:44:48.500
with a whole history of updates, so this
might be really interesting from a
00:44:48.500 --> 00:44:56.010
forensics perspective. Moreover, there's
lots of installed programs to derive known
00:44:56.010 --> 00:45:04.610
plaintext from - for example, dotnet is
a vast source of known plaintext, or JDK
00:45:04.610 --> 00:45:11.560
installations. So, especially the JDK
would result in tens of
00:45:11.560 --> 00:45:17.051
thousands of different source code and
HTML files. So this really was
00:45:17.051 --> 00:45:27.940
quite nice. The drawback there was,
so there was so much data to collect, the
00:45:27.940 --> 00:45:35.170
first attempts failed miserably because of
the sheer amount of RAM I was using. And
00:45:35.170 --> 00:45:43.960
this would result in the admins
constantly giving me more RAM in my
00:45:43.960 --> 00:45:52.900
VM, so I would eventually end up with, I
think, 128 gigabytes of RAM in my test
00:45:52.900 --> 00:46:00.080
VM. Also you have a much larger master
file table you know because of all these
00:46:00.080 --> 00:46:06.230
files that would need to be stored, so to
put that in comparison: so this nice
00:46:06.230 --> 00:46:13.530
working prototype, so this nice test
setup, I mean, would have an MFT of about
00:46:13.530 --> 00:46:20.310
26 megabytes. And for real-world scenarios
you would have something like at least 500
00:46:20.310 --> 00:46:26.980
megabytes, and the MFT could even be like
a couple of gigabytes in size. So this
00:46:26.980 --> 00:46:33.060
means much more plaintext. So with
these really large MFTs you could quickly
00:46:33.060 --> 00:46:40.180
recover the whole key stream, the whole
four megabytes just by looking at the MFT.
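The reason a large MFT yields so much keystream is that every 1024-byte MFT record begins with predictable bytes, the ASCII signature "FILE" among them. A toy sketch of that idea - the record-to-keystream mapping and the set of predictable offsets here are simplifying assumptions, not the full NTFS record layout:

```python
RECORD_SIZE = 1024
KEYSTREAM_LEN = 4 * 1024 * 1024
# Bytes predictable at fixed offsets in every NTFS FILE record; a real
# recovery uses many more header fields, the signature alone shows the idea.
KNOWN_BYTES = {0: ord('F'), 1: ord('I'), 2: ord('L'), 3: ord('E')}

def keystream_guesses(encrypted_mft: bytes):
    """Yield (keystream_position, keystream_byte) pairs from the signature.

    Mapping assumption for this sketch: the byte at absolute offset p is
    XORed with keystream[p % KEYSTREAM_LEN].
    """
    for rec in range(0, len(encrypted_mft), RECORD_SIZE):
        for off, plain in KNOWN_BYTES.items():
            p = rec + off
            yield p % KEYSTREAM_LEN, encrypted_mft[p] ^ plain

# Usage: two fake records encrypted with a toy keystream.
ks = bytes((i * 31 + 7) % 256 for i in range(KEYSTREAM_LEN))
mft = bytearray(2 * RECORD_SIZE)
for rec in (0, RECORD_SIZE):
    mft[rec:rec + 4] = b"FILE"
enc = bytes(b ^ ks[i % KEYSTREAM_LEN] for i, b in enumerate(mft))
recovered = dict(keystream_guesses(enc))
assert all(recovered[p] == ks[p] for p in recovered)
```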
00:46:40.180 --> 00:46:46.600
And, in summary, this means that the
decryption of (Not)Petya encrypted drives
00:46:46.600 --> 00:46:57.301
is possible. So, for the file systems,
the drives I have looked at, it was
00:46:57.301 --> 00:47:03.240
really - I mean, after having all
these first bugs out of the way,
00:47:06.891 --> 00:47:10.410
I was able to decrypt and
recover all the files there. So
00:47:10.410 --> 00:47:17.550
there's a good chance the vacation photos
can be recovered as well. And this will
00:47:17.550 --> 00:47:24.010
conclude my talk, so quick summary -
(Not)Petya has some severe cryptographic
00:47:24.010 --> 00:47:33.020
failures and flaws, so I would only be
able to call this LOLsa and not salsa
00:47:33.020 --> 00:47:41.210
anymore. It might be possible to further
look into this, you know, this expand
00:47:41.210 --> 00:47:48.790
key thing. It has really a
bunch more cryptographic failures in
00:47:48.790 --> 00:47:56.310
it. I didn't look into that deeper but I
know some of you guys are professors and
00:47:56.310 --> 00:48:01.140
this might be a nice homework for your
students to look at this (Not)Petya
00:48:01.140 --> 00:48:07.510
implementation and check out some, you
know, more advanced cryptanalysis
00:48:07.510 --> 00:48:15.690
methods. And you should note that this
whole (Not)Petya thing here -
00:48:15.690 --> 00:48:22.050
the cryptographic flaws I described
are not just in (Not)Petya. They are -
00:48:22.050 --> 00:48:28.080
you see the brackets there - they are
in each and every version of Petya.
00:48:28.080 --> 00:48:38.251
So all of the drives that you potentially
have saved can be decrypted. And so my
00:48:38.251 --> 00:48:45.210
last point is if any of you or any of us
falls victim to ransomware you really
00:48:45.210 --> 00:48:51.850
should keep the drives. Keep them
untouched in a locker and wait for a talk
00:48:51.850 --> 00:48:59.090
like this basically. And then hope that
someone comes around who is able to
00:48:59.090 --> 00:49:06.540
decrypt the drives, and then you will have
your vacation photos back. So that's all
00:49:06.540 --> 00:49:09.010
folks, thank you for your attention.
00:49:09.010 --> 00:49:13.210
Applause
00:49:13.210 --> 00:49:16.090
Herald: Thank you very much Sebastian. Now
00:49:16.090 --> 00:49:19.330
it's time for questions, please queue up
by the microphones.
00:49:26.270 --> 00:49:27.910
Mic 1: Hi, well thank you very much for
00:49:27.910 --> 00:49:33.550
sharing your findings with us. I'm from
Rotterdam the largest harbour in Europe,
00:49:33.550 --> 00:49:36.230
and as you might know, we were struck by
Petya
00:49:36.230 --> 00:49:39.420
Sebastian: Yes
Mic 1: Two terminals went down for a
00:49:39.420 --> 00:49:44.260
couple of days, a couple of hundred
billion euros of damage. Your approach is
00:49:44.260 --> 00:49:48.930
quite theoretical, so now to practice -
if you were there in this summer with
00:49:48.930 --> 00:49:53.840
these findings, would you have been able to
help us and decrypt the files and get all
00:49:53.840 --> 00:49:58.000
the companies up and running again? Or is
this too academic?
00:49:58.000 --> 00:50:04.920
Sebastian: No, it's a practical approach I
mean, I work for CrowdStrike and we had
00:50:04.920 --> 00:50:13.640
some occasions where we were able to help
the customers. So it's a very practical
00:50:13.640 --> 00:50:21.030
thing to do. And I was trying to
insinuate this with, you know, this slide
00:50:21.030 --> 00:50:26.350
here. So I was talking about the real
world in this scenario. So I looked at
00:50:26.350 --> 00:50:32.100
about 50 different encrypted hard drives
with (Not)Petya, and I was able to decrypt
00:50:32.100 --> 00:50:38.730
all of them, or most of them. Mostly
the ones not being decryptable were due to
00:50:38.730 --> 00:50:45.010
some, well, let's say, layer 8 mistakes.
Mic 1: Awesome. Have you shared your
00:50:45.010 --> 00:50:48.930
findings with nomoreransoms.org?
Sebastian: No, I don't know about this
00:50:48.930 --> 00:50:50.910
site
Mic 1: They provide decryption tools for
00:50:50.910 --> 00:50:56.860
ransomware. Please do, thank you
Herald: Microphone six, please
00:50:56.860 --> 00:51:01.930
Mic 6: Thank you, in the beginning you
mentioned that basically the key was
00:51:01.930 --> 00:51:05.490
shortened to what was a 24-bit, something
like that?
00:51:05.490 --> 00:51:09.560
Sebastian: 26, yes
Mic 6: 26, yeah. From that point, wouldn't a
00:51:09.560 --> 00:51:13.760
brute-force attack be way faster and more
reliable?
00:51:13.760 --> 00:51:20.820
Sebastian: Oh no no, don't get me wrong
there. So the keystream length was 2^26,
00:51:20.820 --> 00:51:27.660
so the keystream length was four
megabytes. So you wouldn't be able to
00:51:27.660 --> 00:51:34.650
brute-force that. So sorry, do you get
that? The number of bytes was four
00:51:34.650 --> 00:51:39.120
megabytes, and you wouldn't be able to
brute-force that.
00:51:39.120 --> 00:51:43.740
Mic 6: Yeah but you already mention at the
beginning that you basically shortened it
00:51:43.740 --> 00:51:49.940
down to the possibility of at most two to
the power of 26, and it is calculable.
00:51:49.940 --> 00:51:56.870
Sebastian: Yes yes, I totally get the
question. But you're missing the
00:51:56.870 --> 00:52:03.230
point there. So it's not the key space,
2^26, but the key length is 2^26, which
00:52:03.230 --> 00:52:10.660
would be something - I'm not good at
converting this to decimal. Would be
00:52:10.660 --> 00:52:16.930
something like, let me check, two to
the... I mean, here, math guys, computer
00:52:16.930 --> 00:52:21.610
guys: how many bits is that?
[Inaudible audience answer]
00:52:21.610 --> 00:52:25.510
Sebastian: Say again?
Crowd: A lot?
00:52:25.510 --> 00:52:31.370
Sebastian: A lot! So it's, I mean, it's
four megabytes of key length and you
00:52:31.370 --> 00:52:35.220
couldn't just brute force that, because
you would have each of these
00:52:35.220 --> 00:52:43.660
four megabytes you would have to brute
force. Got that? So the key is not 2^26
00:52:43.660 --> 00:52:52.290
but the key length, the keystream length,
is that long. Got that? So yeah,
00:52:52.290 --> 00:52:58.170
the key space would be longer than the
Bible. You know, and you couldn't just
00:52:58.170 --> 00:53:03.050
brute force the Bible, the text of the
Bible. I mean given enough time yes but
00:53:03.050 --> 00:53:08.990
you know, we all know the stories about
monkeys and typewriters. I mean we can
00:53:08.990 --> 00:53:12.500
talk about that offline, but you're
mixing up two different numbers
00:53:12.500 --> 00:53:18.370
there.
Mic 6: Yeah I guess, thank you.
00:53:18.370 --> 00:53:20.350
Herald: Questions from the Internet,
please
00:53:20.350 --> 00:53:25.710
Signal Angel: Does the MFT encryption work
the same for Petya and (Not)Petya?
00:53:25.710 --> 00:53:34.400
Sebastian: Yes, the underlying mechanism
is the same. The cryptography differs in
00:53:34.400 --> 00:53:43.160
such a way that the constant is
different. So, this little guy here - this
00:53:43.160 --> 00:53:50.110
is different. It would be usually like
expand key something-something. And here
00:53:50.110 --> 00:53:56.980
it's invalid sect ID. So it's somewhat
different, but the MFT encryption - I
00:53:56.980 --> 00:54:03.170
mean the bytecode and the
algorithm - is the very same.
00:54:03.170 --> 00:54:12.410
Signal Angel: Thanks
Herald: Any more questions? Then please
00:54:12.410 --> 00:54:14.820
give a great round of applause to
Sebastian
00:54:14.820 --> 00:54:18.320
Sebastian: Thank you
00:54:18.320 --> 00:54:24.130
Applause
00:54:24.130 --> 00:54:44.000
subtitles created by c3subtitles.de
in the year 2018. Join, and help us!