Return to Video

34C3 - Defeating (Not)Petya's Cryptography

  • 0:00 - 0:15
    Music
  • 0:15 - 0:19
    Herald: And give a great round of applause
    to Sebastian Eschweiler!
  • 0:19 - 0:24
    Applause
  • 0:24 - 0:32
    Sebastian: So hi everyone. So I want to
    talk about how I defeated not (Not)Petya's
  • 0:32 - 0:43
    cryptography. Some I'd say that (Not)Petya
    would be Ukraine's scourge, and for those
  • 0:43 - 0:48
    of you who don't know what the word
    scourge means: this guy right here doesn't
  • 0:48 - 0:57
    know either. And quick trivia quiz - does
    anyone know what this movie is? What's the
  • 0:57 - 1:09
    name of the movie? So, in the next scene
    Johnny Depp enters. Does ring a bell? Jim,
  • 1:09 - 1:17
    a movie by Jim Jarmusch. Soundtrack by
    Neil Young? Dead Man, right! Great! It's a
  • 1:17 - 1:24
    great movie. So if you want to know what a
    scourge is then you could watch the movie.
  • 1:24 - 1:32
    So let's begin with my with my talk. So
    this is what actually the official Ukraine
  • 1:32 - 1:40
    Twitter account tweeted some time ago in
    at the end of June 2017. And there was an
  • 1:40 - 1:47
    outbreak of a ransomware attack which was
    noticed mostly in Ukraine but also all
  • 1:47 - 1:52
    over the world. So millions of users and
    also companies large companies were
  • 1:52 - 2:03
    affected, and the damage went into the
    billions of dollars. So, the problem there
  • 2:03 - 2:10
    is, I mean, this is this is not the
    everyday ransomware outbreak you have
  • 2:10 - 2:17
    there. And I want to give you a short
    glimpse into the the (Not)Petya universe.
  • 2:17 - 2:24
    And also how I could decrypt all these
    stuff that, yeah, actually was encrypted
  • 2:24 - 2:31
    by the this ransomware outbreak. So first
    I want to begin my talk with a
  • 2:31 - 2:36
    differentiation. I want to draw a
    demarcation line because all this
  • 2:36 - 2:42
    (Not)Petya universe - there's much sub-
    summarised under under this whole label.
  • 2:42 - 2:50
    And I really, I'm just talking about a
    small fraction of this whole universe. So
  • 2:50 - 2:55
    I will first distinguish between what my
    talk will be in what my talk will not be
  • 2:55 - 3:03
    about. Next I will describe (Not)Petya's
    cryptography, and especially (Not)Petya's
  • 3:03 - 3:08
    cryptographic failures. Which I then will
    be exploiting in the remainder of the talk
  • 3:08 - 3:22
    and see how can the users get their, yeah,
    vacation photos back. So what was this
  • 3:22 - 3:35
    whole thing? The outbreak started, as I
    said, in June 2017, and it started as fake
  • 3:35 - 3:41
    update or as malicious update from a
    software called Madoc
  • 3:41 - 3:48
    - this is a tax software, one of the two
    official tax softwares in the Ukraine. So
  • 3:48 - 3:56
    almost every company has it installed on
    tax accounts on their computers many
  • 3:56 - 4:03
    private persons have it installed. It was
    pushed and then site loaded this file
  • 4:03 - 4:14
    perfc that was then downloaded to the
    computers and it comprises several parts.
  • 4:14 - 4:23
    And some parts are more interesting than
    then as others, so one component ever like
  • 4:23 - 4:29
    half an hour time it would start
    encrypting the files depending on the
  • 4:29 - 4:37
    access level. So, if there wasn't any
    access to infect the computer with well
  • 4:37 - 4:44
    this MBR infector. then it would just
    based on the current user level encrypt
  • 4:44 - 4:53
    the files based on that current user. In
    lack of a better name I would call this
  • 4:53 - 5:01
    "Mischa" component I know it's usually
    somewhat different, something different.
  • 5:01 - 5:06
    However, this was the best name I could
    find there. So it's basically just a
  • 5:06 - 5:13
    finite crypter with AES. My talk will not
    go about this this part - this file
  • 5:13 - 5:22
    infector. The next very interesting
    component was the spreader part - it's
  • 5:22 - 5:27
    basically based on the
    EternalBlue/EternalRomance exploits that
  • 5:27 - 5:34
    had been leaked by the Shadow Brokers. My
    talk will not go about this as well. This
  • 5:34 - 5:41
    is a whole different universe as well.
    What my talk will be about is the actual
  • 5:41 - 5:49
    (Not)Petya component and this is an MBR
    encrypter and I will show you in the next
  • 5:49 - 5:58
    slide what it's actually about. So, the
    user will see something like this upon
  • 5:58 - 6:06
    reboot. So if the access rights are
    granted so if there is some local admin
  • 6:06 - 6:12
    installed on the computer or the correct
    password could be guessed by some attack
  • 6:12 - 6:19
    and then this dropper, the perfc that
    would infect the system by overwriting the
  • 6:19 - 6:26
    Master Boot Record with a custom boot
    loader. And it would reboot the system
  • 6:26 - 6:33
    after a predefined time - usually being
    about 10 minutes. And then the the actual
  • 6:33 - 6:40
    (Not)Petya compound would kick into
    action. This infected MBR, the bootloader
  • 6:40 - 6:47
    shows this decoy CheckDisk screen, and in
    the background would find and iterate the
  • 6:47 - 6:50
    old files on the file system and
  • 6:50 - 6:56
    encrypt all these files.
    So the main takeaways of this slide are
  • 6:56 - 7:05
    that we are dealing with 16-bit code here.
    So we're we're in 16-bit real mode - so
  • 7:05 - 7:11
    this means no proper file system it means
    no 32-bit or 64-bit code. There are no
  • 7:11 - 7:20
    windows APIs-- so debugging all this and
    analyzing this is a tedious work. However,
  • 7:20 - 7:26
    we have something on the plus side which
    is a BIOS, you know, the basic
  • 7:26 - 7:34
    input/output system and with that comes a
    range of interrupts that are very well
  • 7:34 - 7:43
    described and a really nice thing is
    having Bochs and being able to debug all
  • 7:43 - 7:50
    this in in IDA. So this was a really neat
    plugin that had been developed by the
  • 7:50 - 8:01
    authors. So let's analyze a bit and check
    the cryptography and why implementing
  • 8:01 - 8:10
    crypto is really hard. So I will start
    this part with describing in short words
  • 8:10 - 8:17
    the theory of Salsa20 and then check that
    compare that against the (Not)Petya
  • 8:17 - 8:28
    implementation of this cipher. So Salsa20
    is a stream cipher. So basically you will
  • 8:28 - 8:36
    have a plaintext, it's it's about here and
    then you will have some kind of random
  • 8:36 - 8:41
    number generator or pseudo-random number
    generator and then apply some operations
  • 8:41 - 8:49
    on the plaintext and out comes the
    ciphertext. And what you put in there are
  • 8:49 - 8:55
    four different variables, or four
    different inputs, because there's the
  • 8:55 - 9:00
    constant part which is obviously not
    variable. But we'll see about that in a
  • 9:00 - 9:09
    bit. So you have these key and the nonce,
    and then there's this really nifty thing
  • 9:09 - 9:18
    called counter. What's so nifty about that
    is if you were choose to stream a Salsa 20
  • 9:18 - 9:25
    encrypted stream and you would lose some
    frames then you could adjust this counter
  • 9:25 - 9:34
    variable which would de-mark the offset of
    the current stream in the current stream
  • 9:34 - 9:39
    and then could continue needlessly with
    with the decryption of the stream. So this
  • 9:39 - 9:48
    is a very nice feature. The size of the
    counter is 64-bits and the hash size here,
  • 9:48 - 9:58
    so what salsa does is create a 64-byte
    hash for every different of these inputs
  • 9:58 - 10:06
    and with that then apply to the input. If
    you want any more details about this salsa
  • 10:06 - 10:13
    cipher, the inventor of salsa should be in
    this room, and should be in at the
  • 10:13 - 10:17
    convention. He is at the convention: I
    just rode the train with him, so I guess
  • 10:17 - 10:24
    you can ask him the the gory details of
    salsa 20. So it's a really nice crypto
  • 10:24 - 10:31
    cipher and you should hit him up and ask
    him. I shouldn't have said that, right?
  • 10:31 - 10:38
    Sorry. So basically what a very important
    thing is to note is for every invocation
  • 10:38 - 10:46
    of this this or for every instance of this
    (Not)Petya encryption and you would have
  • 10:46 - 10:54
    these three variables or these three
    inputs be constant so (Not)Petya would
  • 10:54 - 11:00
    would patch during the infection process
    the key, the nonce, and the constants into
  • 11:00 - 11:08
    a configuration sector. The constants, not
    these are somewhat different. And then the
  • 11:08 - 11:15
    counter would only change throughout every
    iteration. So the interesting thing or the
  • 11:15 - 11:23
    interesting question was first - what is
    the length of the key stream? So, the key
  • 11:23 - 11:28
    stream, you know the the number of
    different outputs that that would come out
  • 11:28 - 11:35
    of this hashing function. And I mean for
    this implementation for the theory this
  • 11:35 - 11:43
    is quite clear it's 64-bits here eight
    bytes times the 64 bytes of output so it
  • 11:43 - 11:54
    would be about two to the seventy of a
    periodicity. One different about the
  • 11:54 - 12:01
    actual implementation in (Not)Petya on the
    theory would be that this constants thing
  • 12:01 - 12:07
    here had been changed to I string a
    reading out invalid sect ID. So this would
  • 12:07 - 12:23
    break the official implementation. So the
    very first failure I saw in (Not)Petya was
  • 12:23 - 12:27
    something like this. So I think I can slip
    this slide because it's obvious for
  • 12:27 - 12:35
    everyone that this is a fail right? So who
    sees the fail? So, not many hands? Okay,
  • 12:35 - 12:43
    then then I'll explain it. So I was just
    kidding , I'm almost not not expecting for
  • 12:43 - 12:49
    for you to grasp of that first. So
    remember, we're in 16-bit code. We have
  • 12:49 - 12:57
    here this shift left operation which would
    shift a register by N-bytes. The register
  • 12:57 - 13:04
    with here is 16 bits, so it only has 16
    digits so to speak and you would shift it
  • 13:04 - 13:12
    by 16 - by 10 hex - sixteen. And this
    would effectively null the register, and
  • 13:12 - 13:15
    even worse is here. So
  • 13:15 - 13:22
    you would shift a an 8-bit register for
    sixteen. So this is something you you
  • 13:22 - 13:29
    wouldn't expect from my proper
    cryptographic implementation. And I was
  • 13:29 - 13:33
    was really intrigued why that is, because
    it wouldn't make any sense in source code
  • 13:33 - 13:40
    and and did the (Not)Petya or the Petya
    authors really implement that on purpose,
  • 13:40 - 13:48
    or what was the gist with it. And I looked
    up the (Not)Petya, the salsa 20
  • 13:48 - 13:55
    implementation um I just googled it and
    funny a nice website that had a
  • 13:55 - 14:05
    implementation of salsa 20. And there you
    would see this code. So you see here it
  • 14:05 - 14:12
    sits in the the endianness conversion. And
    you see here these these shifting of bits
  • 14:12 - 14:20
    of registers and you see here this this
    uint fast 16 or 32 type. So it becomes
  • 14:20 - 14:23
    quite clear that this is a broken
    implementation right? So everyone can see
  • 14:23 - 14:33
    that right? No, not right now because you
    need to know some things more about this.
  • 14:33 - 14:39
    There are two important facts that make
    this implementation broken and the two
  • 14:39 - 14:46
    facts are: you need to compile this code
    for 16 bits, and you need to look up what
  • 14:46 - 14:52
    Visual Studio makes of these these type
    definitions here. And when you look that
  • 14:52 - 14:59
    up, this is from Visual Studio and it's in
    the standard int.h header file. And there
  • 14:59 - 15:04
    you see it's interpreted or translated as
    unsigned int, so this is the base type.
  • 15:04 - 15:09
    And this base type, unsigned int, in
    16-bit code is a 16-bit variable or a
  • 15:09 - 15:14
    16-bit register. And then everything makes
    sense. And this was somewhat of a a
  • 15:14 - 15:22
    failure here - the authors didn't really
    check if their code was actually working
  • 15:22 - 15:27
    against the test vectors. And this guy who
    wrote the code here on this salsa 20
  • 15:27 - 15:40
    implementation made this mistake also. On
    this slide you see two bugs off of the
  • 15:40 - 15:47
    (Not)Petya implementation of salsa 20. And
    I quickly want to to explain both to you
  • 15:47 - 15:52
    because they are of some somewhat in
    importance throughout the remainder of the
  • 15:52 - 16:02
    talk. So, both revolve around the the
    counter variable. Just remember - this
  • 16:02 - 16:07
    counter variable is the only dynamic input
    the only variable input throughout all
  • 16:07 - 16:10
    these salsa 20 invocations.
  • 16:10 - 16:22
    And the the first error is: so you read
    a sector, a sector number into the memory.
  • 16:22 - 16:29
    So a bit about the the low-level aspects
    of a hard drive. A hard drive from the
  • 16:29 - 16:35
    view of the BIOS would look somewhat like
    a bunch of different sectors. So these are
  • 16:35 - 16:43
    512 byte chunks of data, and they they
    come one after another. So if you were to
  • 16:43 - 16:49
    read a sector, you would actually read a
    512 bytes about me a byte long portion of
  • 16:49 - 16:56
    data. And this is obviously not the offset
    in the stream, and this is somewhat of a
  • 16:56 - 17:04
    problem there. So, you see here the same
    variable is used to decrypt or encrypt the
  • 17:04 - 17:13
    data. And, I mean, this this is it doesn't
    it isn't really apparent to to the
  • 17:13 - 17:21
    implementer of this cipher. However, if
    you were to analyze it, it would look
  • 17:21 - 17:28
    something like this. So you would have the
    key stream of two different sectors, two
  • 17:28 - 17:33
    different consecutive sectors here. So it
    would start with with FF and then
  • 17:33 - 17:40
    continues with D7 and so on. And the next
    sector would have almost all of the bytes
  • 17:40 - 17:50
    identical. And this is yeah a really big
    failure because this really nice salsa 20
  • 17:50 - 17:58
    implementation, this really nice salsa
    algorithm will then be, from would then be
  • 17:58 - 18:03
    converted from a one-time pad to a many
    times pad. And this is the first fine I
  • 18:03 - 18:10
    wanted to show you in this very few lines
    of code. The second part is, the second
  • 18:10 - 18:18
    bug is here this large keyword. Remember,
    we are in 16-bit code so this large
  • 18:18 - 18:25
    keyword here does not push a 64-bit
    variable as we would suspect to do it, but
  • 18:25 - 18:31
    a 32-bit variable. So only 32 bits of this
    this nice counter variable are actually
  • 18:31 - 18:41
    used in this case. So, these two failures
    are somewhat of a problem for this salsa
  • 18:41 - 19:00
    20 implementation. So, in this slide I
    took two different hex dumps, and these
  • 19:00 - 19:09
    hex times were are within this expand key
    function. And they they are well basically
  • 19:09 - 19:17
    two snapshots - one right before these
    bugs become apparent: so before this this
  • 19:17 - 19:25
    endianness conversion, and right after on
    the lower half. So you very nicely see the
  • 19:25 - 19:32
    different variables being put into all the
    different key inputs being put into these
  • 19:32 - 19:34
    this memory blocks. So here, it would
  • 19:34 - 19:39
    spell out invalid sect ID, you know, the
    constants (Not)Petya uses. Here you would
  • 19:39 - 19:46
    see the key and here, so it's broken into
    two halves. Additionally you would see the
  • 19:46 - 19:54
    nonce here. And what really sticks out is
    this bunch of zeros here. So this this
  • 19:54 - 20:01
    higher part of this a 64-bit variable
    isn't even used - is it's not even filled.
  • 20:01 - 20:07
    So this is well the first problem here,
    and after the endianness conversion you
  • 20:07 - 20:14
    see that it's not really an endianness
    conversion but it's more of a nulling of
  • 20:14 - 20:23
    of bytes. So the result would be that this
    initially 64 bit variable would be just
  • 20:23 - 20:33
    16-bit in length. And, as I said earlier,
    the original salsa implementation would be
  • 20:33 - 20:45
    2^70s key lengths. And right now we have
    16 bits times 64 bytes in in key lengths
  • 20:45 - 20:51
    which would result an in 26 bits in key
    lengths. Which would be a measly 4
  • 20:51 - 20:57
    megabytes in key lengths. So this is the
    this was a very interesting observation I
  • 20:57 - 21:09
    made there and this would be possible then
    to decrypt together with the many times
  • 21:09 - 21:18
    pad properties of this cipher which make
    it really easy to break. So to quickly
  • 21:18 - 21:25
    summarize this part of the talk - so we
    have a very very short key stream of just
  • 21:25 - 21:32
    4 megabytes, it's highly repetitive
    repetitive. So for each secretary progress
  • 21:32 - 21:42
    you would only have a progress of one byte
    at a time. So only 26 bits remain of the
  • 21:42 - 21:50
    whole stream. And as I said, the many
    times pad property's a very nice thing to
  • 21:50 - 22:00
    to analyze. I could come around to
    implement a small joke here, so this salsa
  • 22:00 - 22:09
    implementation I would only call "LOLsa"
    from now on. Sorry, is a bad joke sorry!
  • 22:09 - 22:17
    So, the the main goal is to derive the
    keystream, and as I'm not really a crypto
  • 22:17 - 22:22
    expert, the basically the only attack I
    know it would be a known plaintext attack
  • 22:22 - 22:28
    so this was my goal there because it's so
    straightforward to do that. And in the
  • 22:28 - 22:35
    remainder of the talk I will tell you how
    I did that. So without further ado, let's
  • 22:35 - 22:42
    exploit these failures and see how much of
    the plaintext we can actually get from a
  • 22:42 - 22:47
    (Not)Petya infected drive.
    So the modulus operandi
  • 22:47 - 22:53
    of (Not)Petya would look
    somewhat like that. This, so let's let's
  • 22:53 - 22:58
    stop with the with the left hand side of
    the of this slide, and concentrate on the
  • 22:58 - 23:04
    right hand side. For those of you who you
    are not really intimately familiar with
  • 23:04 - 23:11
    NTFS - I wasn't before analyzing Petya or
    (Not)Petya as well so no worries. It's
  • 23:11 - 23:18
    quite simple - so every NTFS partition has
    something called an master file table.
  • 23:18 - 23:25
    MFT, abbreviated. And it would contain
    some metadata about the file. For example,
  • 23:25 - 23:32
    the file name, the file size, and if the
    file is small enough, it would even fit
  • 23:32 - 23:42
    the actual content of the file into this
    record. So, as I said, MFT is just list of
  • 23:42 - 23:50
    records. And if the file is larger it
    would have a pointer, a so called data
  • 23:50 - 23:58
    run, which would point to a cluster or
    sector on the disk on the partition which
  • 23:58 - 24:06
    would then actually be the the payload
    data. One of these MFT records is one
  • 24:06 - 24:12
    kilobyte in size. So now to the actual
    implementation. So let's zoom out a bit
  • 24:12 - 24:19
    and see how this LOLsa implementation is
    used in (Not)Petya. So it would basically
  • 24:19 - 24:26
    just iterate over over all of these MFT
    records, and then check if this record
  • 24:26 - 24:33
    would put into would point to a file. If
    that is the case, it would encrypt the
  • 24:33 - 24:42
    first kilobyte of the file, and then would
    encrypt the record itself. This
  • 24:42 - 24:48
    implementation is good for a bunch of
    reasons - it's very fast: You would only
  • 24:48 - 24:52
    need to encrypt the first kilobyte, and
    this is this first kilobyte contains
  • 24:52 - 25:03
    really really important information. For
    example, headers, or especially compressed
  • 25:03 - 25:09
    files have these really important header
    structures there. Additionally, your file
  • 25:09 - 25:14
    recovery tools wouldn't be able to work
    anymore because most of them rely on this
  • 25:14 - 25:23
    very header. And the second thing is this
    MFT is can be considered as table of
  • 25:23 - 25:31
    contents. So with no metadata, with with
    no pointers to these to the files, you
  • 25:31 - 25:38
    won't have anything there to work with to
    recover from. And this is a I mean from
  • 25:38 - 25:44
    from the implementation standpoint it's
    very neat, because it's fast and it's a
  • 25:44 - 25:50
    somewhat thorough. As the MFT is
  • 25:50 - 25:57
    really important, my idea was to to
    recover that first and then check what
  • 25:57 - 26:07
    what comes out from there and see how I
    can can further progress there with the
  • 26:07 - 26:14
    decryption of the files. So the metadata
    would would be of most importance there.
  • 26:14 - 26:27
    I'm a visual person, and here I took two
    disk dumps from a from one of my test
  • 26:27 - 26:33
    disks. So I infected a clean system with
    (Not)Petya and let it encrypt the hard
  • 26:33 - 26:38
    drive. And on the left-hand side you see
    the plaintext on and on the right-hand
  • 26:38 - 26:45
    side you see the encrypted data. So to
    just get a better picture about the
  • 26:45 - 26:54
    encryption process. On the far left-hand
    side fancy PowerPoint altered animation,
  • 26:54 - 27:01
    you see some kind of indicator so which
    which would tell you at which offset, how
  • 27:01 - 27:08
    much of the of the data has actually been
    different. And you see the whole disk is
  • 27:08 - 27:20
    more or less being encrypted. However, you
    see at the far bottom part here it's more
  • 27:20 - 27:27
    dark red. And this is actually the MFT, so
    the MFT is towards the end of the disk
  • 27:27 - 27:34
    sometimes. But this might be a
    misconception. So my my idea was something
  • 27:34 - 27:39
    like this: we have two input sizes, right?
    We have the the encrypted MFT, and we have
  • 27:39 - 27:47
    encrypted files. And first i would analyze
    the MFT, and then derive the key stream
  • 27:47 - 27:56
    from that. After that analysis stage had
    been finished I would put the key stream
  • 27:56 - 28:04
    back into the this this little box here,
    and actually decrypt that. And then out
  • 28:04 - 28:10
    comes the decrypted MFT. And with that and
    the key stream I would be able to find the
  • 28:10 - 28:17
    encrypted files on the disk, and then be
    able to decrypt them; then be ready with
  • 28:17 - 28:24
    it right? So this was my initial idea
    there and so let's start with the
  • 28:24 - 28:35
    decryption of the MFT. Known plaintext
    attack, right? So an MFT you looks for
  • 28:35 - 28:40
    from from the viewpoint of the keystream,
    somewhat like this. So you would have here
  • 28:40 - 28:47
    the first, the second and so on MFT
    records. And on the on the column you will
  • 28:47 - 28:55
    have the actual byte that is used as key
    to encrypt the key stream. Remember, the
  • 28:55 - 29:04
    operation that that encrypts the you know
    the the key stream and the the plaintext
  • 29:04 - 29:07
    is just a mirror XOR operation.
    So you have the the key
  • 29:07 - 29:16
    stream and the plaintext, and it's plainly
    - so you can switch forth and back between
  • 29:16 - 29:21
    plaintext and ciphertext and even the key
    stream with a with just applying an XOR
  • 29:21 - 29:30
    operation. So what you see here is for the
    very first records you only have very few
  • 29:30 - 29:37
    key stream bytes or very few sample bytes.
    However, as the progress as you make
  • 29:37 - 29:42
    progress with the analysis and then you
    will have more and more of these sample
  • 29:42 - 29:50
    bytes to collect from and this this will
    give you more confidence in the in the
  • 29:50 - 29:57
    result, and the may be known key stream
    then. The question is, does the MFT hold
  • 29:57 - 30:03
    enough plaintext to do some some kind of
    known plaintext attack? So let's look into
  • 30:03 - 30:13
    the specification. The MFT record has
    basically two fields. So there is this the
  • 30:13 - 30:20
    standard information, which is a well-
    defined structure and there's something
  • 30:20 - 30:26
    else called attribute list, which is a
    quite dynamic structure and this would be
  • 30:26 - 30:33
    a somewhat more different, difficult to
    clean plan text from. So I concentrate on
  • 30:33 - 30:38
    this first part. And the first part
    quickly turned out to be quite constant
  • 30:38 - 30:49
    for many or most of the MFT records. And
    you see it starts with FILE and then as
  • 30:49 - 30:55
    some some hex digits after that. And
    although on the far bottom part of the
  • 30:55 - 31:02
    slide, I added my yeah certainty level. So
    the the certainty level would be the
  • 31:02 - 31:09
    number of different sample bytes I would
    have multiplied by the confidence I would
  • 31:09 - 31:17
    have in that sample byte being actually
    this this plain text. So you see for the
  • 31:17 - 31:23
    very first record, we would have a quite
    low, of a low certainty. I mean, it's just
  • 31:23 - 31:31
    one byte, right? Oh, the two byte skipping
    is I mean quite a forward considering you
  • 31:31 - 31:37
    would have usually 512 bytes per sector on
    a disk, and the MFT record is one kilobyte
  • 31:37 - 31:48
    in size, so the stream would progress two
    bytes. And for record 100, so for if for
  • 31:48 - 31:55
    the 100th record I would have a certainty
    of four. You know, I would just assume
  • 31:55 - 32:03
    these eight plaintext bytes here, divided
    by 2, with an resulting 4. This wasn't
  • 32:03 - 32:06
    really satisfactory.
    The problem there was towards
  • 32:06 - 32:13
    the end I would have many many unknown
    records because I would was concentrating
  • 32:13 - 32:20
    on on the very first parts of the header.
    So the remainder of the key stream, the
  • 32:20 - 32:26
    very end of the key stream wasn't be able
    to being analyzed and eventually
  • 32:26 - 32:30
    decrypted. So I thought of something
    different. And there was something like a
  • 32:30 - 32:40
    I would call a byte histogram. So for
    every offset the MFT record, i would i
  • 32:40 - 32:49
    will then calculate creatin and a
    histogram and check how many different
  • 32:49 - 32:54
    bytes are there actually for plaintext,
    you know, it's a plaintext known plaintext
  • 32:54 - 32:59
    attack so I would need some kind of
    plaintext and would do that for every
  • 32:59 - 33:08
    offset for every record. And so these the
    questions there how to get many MFT
  • 33:08 - 33:12
    records - it's quite easy if you have some
    nice colleagues you just need to hang them
  • 33:12 - 33:17
    all of the balcony and shake them a bit
    then more or less voluntarily they will
  • 33:17 - 33:25
    give you some MFT keys to work with. And
    the result of that is quite nice: you, I
  • 33:25 - 33:31
    mean for for the very first, there's
    nothing much you can do the very first
  • 33:31 - 33:36
    record will always have very few sample
    bytes. But as the stream progresses and
  • 33:36 - 33:42
    you will have a dramatic change there so
    from this relatively low certainty of
  • 33:42 - 33:51
    four, I could increase that to more than
    thirty. So this is somewhat nice, and ever
  • 33:51 - 34:00
    bit of doing science, this table drops out
    from nowhere. So I compared these two
  • 34:00 - 34:08
    attack types. So, let's read that from
    from right to left. On the on the far
  • 34:08 - 34:16
    right, I have for the first approach about
    98% of the MFT records being successfully
  • 34:16 - 34:22
    recovered. You know, with the good thing
    with science and with all this academic
  • 34:22 - 34:30
    approach is you have a ground truth. So I
    have a plaintext an unencrypted hard drive
  • 34:30 - 34:37
    which were virtually unordered from
    something right after infection, and then
  • 34:37 - 34:43
    you let execute the the whole infection
    and encryption process. And then you can
  • 34:43 - 34:49
    differentiate and take you knows several
    snapshots throughout the infection, change
  • 34:49 - 34:55
    different key values and all the stuff. So
    this is a very nice thing about this
  • 34:55 - 35:03
    academic approach I was taking the other
    So I could I could exactly pinpoint how many
  • 35:03 - 35:14
    of these records were perfectly recovered.
    So for the byte histogram approach I could
  • 35:14 - 35:18
    decrypt almost all of the records, which
    is quite nice because then we have a high
  • 35:18 - 35:28
    quality MFT to work with. What's also
    quite nice is that we have zero wrong key
  • 35:28 - 35:34
    stream bytes. What's not so nice, however,
    is that I was only able to recover about
  • 35:34 - 35:43
    1.3 percent of the overall key stream. And
    remember - this this key stream is about 4
  • 35:43 - 35:48
    megabytes in length and I was able to
    recover only 50 kilobytes of that, so we
  • 35:48 - 35:56
    can't recover all of the the files. Or is
    that so? This is this was my next question
  • 35:56 - 36:05
    there. And so I drew another nice diagram.
    This is the key stream; in the this is the
  • 36:05 - 36:16
    MFT, sorry, in the key stream. So the the
    key stream is only filled and in sample
  • 36:16 - 36:24
    bytes at this 2 megabytes mark. And the
    question is, are there many files in this
  • 36:24 - 36:29
    area being encrypted or is there some
    other bug I could exploit. And I checked
  • 36:29 - 36:35
    where the files would lie into this the
    key stream. So I would check how many
  • 36:35 - 36:41
    files are at which location in the key
    stream being encrypted. And you see the
  • 36:41 - 36:48
    key stream is used all over the place - so
    I mean sometimes it's used more, sometimes
  • 36:48 - 36:56
    it's used just the less, however, it's
    basically used all over the place. And so
  • 36:56 - 37:01
    this much of the key stream could in a
    perfect scenario be, I mean, perfect
  • 37:01 - 37:09
    scenario being a perfect known plaintext
    oracle could theoretically be recovered.
  • 37:09 - 37:18
    However, this is somewhat of a problem
    here and in the next part of the talk I
  • 37:18 - 37:26
    will present you how I was able to solve
    this problem as well. So remember when the
  • 37:26 - 37:32
    file system is actually encrypted by
    (Not)Petya, the file system looks somewhat
  • 37:32 - 37:37
    like this. So you would have the MFT which
    is totally garbled so you won't have any
  • 37:37 - 37:42
    of any nice file pointers or data runs
    pointing through those files. You won't
  • 37:42 - 37:48
    have any nice file names and all the
    stuff. But with the first stage being
  • 37:48 - 37:53
    finished, the MFT looks really nice - I
    mean like almost a hundred percent
  • 37:53 - 37:58
    of the records could be
    recovered. So it looks
  • 37:58 - 38:04
    somewhat like this. So you have a bunch of
    metadata you can now use to analyze the
  • 38:04 - 38:09
    the remainder of the files and the
    remainder of the encryption. So all this
  • 38:09 - 38:17
    MFT or almost always actually decrypted.
    And for files you would have the very
  • 38:17 - 38:25
    first kilobyte being encrypted. The
    remainder of the file, the, I mean, most
  • 38:25 - 38:31
    files are more than a kilobyte in size,
    right? So all the rest is not encrypted.
  • 38:31 - 38:37
    So you would have all this data and
    metadata to collect information and to to
  • 38:37 - 38:46
    use to exploit as-known plaintext as
    indicators for known plaintext. So I
  • 38:46 - 38:56
    thought of three different approaches to
    you know to attack this problem. Basically
  • 38:56 - 39:02
    I was thinking about: okay what different
    files are there, and I was quickly
  • 39:02 - 39:08
    thinking about different file types. And I
    mean, there are I mean the file type can
  • 39:08 - 39:13
    be easily gleaned from this, right?
    Because you would have the file extension
  • 39:13 - 39:19
    and it would be basically the file type.
    And you would have two different types of
  • 39:19 - 39:23
    files - you would have structured files
    and unstructured files. So I thought of
  • 39:23 - 39:29
    these and you would have something like,
    yeah, source code, which I would consider
  • 39:29 - 39:35
    as more or less unstructured. And I was
    calculating the histogram, so this would
  • 39:35 - 39:40
    give me some kind of prevalence towards
    different bytes in the key stream. So it
  • 39:40 - 39:47
    would be somewhat like a guest plaintext
    attack or something like that. The next
  • 39:47 - 39:55
    thing for structured files would be to do
    the very same approach as with the MFT
  • 39:55 - 40:01
    records. I would calculate the histogram
    for every offset of the first kilobyte,
  • 40:01 - 40:08
    and then quickly see how how many
    different bytes are there per offset. And
  • 40:08 - 40:15
    the last approach uses somewhat more data.
    It uses some of the metadata, and also
  • 40:15 - 40:24
    some of the file data. I will go into this
    right now. So, what I basically have here
  • 40:24 - 40:29
    as I said it's only this this little
    portion of the files encrypted. The whole
  • 40:29 - 40:34
    remainder of the file is not encrypted,
    which is quite nice. And also the file
  • 40:34 - 40:40
    name the file sizes not encrypted.
    So what I use, what I would do here
  • 40:40 - 40:45
    is create a database of
    known files of known
  • 40:45 - 40:52
    Windows system files, for example. Y'all
    might remember these nice background
  • 40:52 - 40:59
    images fonts all this stuff - plaintext
    is flying around everywhere if you just
  • 40:59 - 41:07
    look for it. And I would have basically
    three different, yeah, three three
  • 41:07 - 41:17
    different distinctors between those to
    know which which is the correct plaintext.
  • 41:17 - 41:25
    So there is this this key file name, the
    file size, and then the hash of this whole
  • 41:25 - 41:32
    remainder, of this whole tier. So if all
    these 3-tuple would match, then I would
  • 41:32 - 41:39
    consider this as a proper plaintext.
    However, there are some collisions, so
  • 41:39 - 41:48
    this is this is not really something that
    is straightforward. So the initial idea of
  • 41:48 - 41:56
    having of only needing to analyze the MFT
    and then could just being being able to
  • 41:56 - 42:04
    straightforward decrypt files needed to be
    altered a bit. So I added this database of
  • 42:04 - 42:12
    known files there, I added another another
    analysis stage in this nice box here, and
  • 42:12 - 42:20
    then I would be able to decrypt the files,
    eventually. So let's do a bit of science
  • 42:20 - 42:26
    and check if this approach would be
    worthwhile pursuing on a real-world
  • 42:26 - 42:38
    scenario. So let the science cat do its
    science, let me have a drink. So whenever
  • 42:38 - 42:47
    you did do here is to create a database of
    known files - I collected a bunch of
  • 42:47 - 42:55
    default Windows installations, which
    resulted in about 340,000 unique files in
  • 42:55 - 43:00
    it. Then I calculated, you know, the the
    different file type histograms I talked
  • 43:00 - 43:09
    about, I prepared my test setup, I added
    there 1000 different files. And you should
  • 43:09 - 43:16
    note that these files were not part of
    this known files database - they were
  • 43:16 - 43:21
    different from that because otherwise it
    would have been easy to do. Then I would
  • 43:21 - 43:28
    infect this machine and then let its
    encrypt by (Not)Petya and then attempting
  • 43:28 - 43:40
    recovery. And these are the results there.
    So, I did this with 4 different runs, so I
  • 43:40 - 43:46
    tried every approach separately, and then
    combined the three approaches. And the
  • 43:46 - 43:52
    outcome was something like this: So I
    would have only two files by the general
  • 43:52 - 44:00
    histogram approach being able to
    be to decrypt. At least 8%
  • 44:00 - 44:07
    where I will were yet decrypted by the
    location histogram approach. And by the
  • 44:07 - 44:12
    known files approach over 90% of the files
    could be successfully recovered. And even
  • 44:12 - 44:22
    better, the combined outcome would be
    almost all files being able to decrypt.
  • 44:22 - 44:31
    So, so much about academia. So the problem
    there is if you apply this to the real
  • 44:31 - 44:37
    world, you could get into more trouble
    there. And there was lots and lots of more
  • 44:37 - 44:43
    to think about so. There were, for
    example, non default windows installations
  • 44:43 - 44:48
    with the whole history of updates so this
    might be really interesting from a
  • 44:48 - 44:56
    forensics perspective. Moreover, there's
    lots of installed programs to derive known
  • 44:56 - 45:05
    plaintext from - for example, dotnet uses
    a vast source of known plaintext or JDK
  • 45:05 - 45:12
    installations. So, especially the JDK
    would would result in about tens of
  • 45:12 - 45:17
    thousands of different source code and
    HTML files. So this is this really was
  • 45:17 - 45:28
    really quite nice. The drawback there was,
    so there was so much data to collect, the
  • 45:28 - 45:35
    first attempts failed miserably because of
    the sheer amount of RAM I was using. And
  • 45:35 - 45:44
    this would would result in the admins
    constantly giving me more more RAM in my
  • 45:44 - 45:53
    VM so I would eventually end up with, I
    think, 128 gigabytes of RAM and my my test
  • 45:53 - 46:00
    VM. Also you have a much larger master
    file table you know because of all these
  • 46:00 - 46:06
    files that would need to be stored, so to
    put them in in comparison. So this nice
  • 46:06 - 46:14
    working prototype, so this nice test
    setup, I mean, would have an MFT of about
  • 46:14 - 46:20
    26 megabytes. And for real-world scenarios
    you would have something like at least 500
  • 46:20 - 46:27
    megabytes, and even MFT could be even like
    a couple of gigabytes in size. So this
  • 46:27 - 46:33
    means much more plaintext. So for through
    these really large MFTs you could quickly
  • 46:33 - 46:40
    recover the whole key stream, the whole
    four megabytes just by looking at the MFT.
  • 46:40 - 46:47
    And, in summary, this means that the
    decryption of (Not)Petya encrypted drives
  • 46:47 - 46:57
    is possible. So, for for the file systems
    the drives, I have looked at it, was
  • 46:57 - 47:03
    really I mean ever after having all these
    these first bugs out of the way,
  • 47:07 - 47:10
    I was able to decrypt and
    recover all the files there. So
  • 47:10 - 47:18
    there's a good chance the vacation photos
    can be recovered as well. And this will
  • 47:18 - 47:24
    conclude my talk, so quick summary -
    (Not)Petya has some severe cryptographic
  • 47:24 - 47:33
    failures and flaws, so I would only be
    able to call this LOLsa and not salsa
  • 47:33 - 47:41
    anymore. It might be possible to further
    look into this, this you know this expand
  • 47:41 - 47:49
    key and thing. It it has really really a
    bunch of more cryptographic failures in
  • 47:49 - 47:56
    it. I didn't look into that deeper but I
    know some of you guys are professors and
  • 47:56 - 48:01
    this might be a nice homework for your
    students to look at this (Not)Petya
  • 48:01 - 48:08
    implementation and check out some, you
    know, more advanced crypto analysis
  • 48:08 - 48:16
    methods. And you should note that this
    hook this this whole (Not)Petya thing here
  • 48:16 - 48:22
    I described the whole cryptographic flaws
    there are not just in (Not)Petya. They are
  • 48:22 - 48:28
    you see the brackets there: there they are
    in every each and every version of Petya.
  • 48:28 - 48:38
    So all of the drives that you potentially
    have saved can be decrypted. And so my
  • 48:38 - 48:45
    last point is if any of you or any of us
    falls victim to ransomware you really
  • 48:45 - 48:52
    should keep the drives. You keep them
    untouched in a locker and wait for a talk
  • 48:52 - 48:59
    like this basically. And then hope that
    someone comes around and then be able to
  • 48:59 - 49:07
    decrypt the drives and then, you will have
    your vacation photos back. So that's all
  • 49:07 - 49:09
    folks, thank you for your attention.
  • 49:09 - 49:13
    Applause
  • 49:13 - 49:16
    Herald: Thank you very much Sebastian. Now
  • 49:16 - 49:19
    it's time for questions, please queue up
    by the microphones.
  • 49:26 - 49:28
    Mic 1: Hi, well thank you very much for
  • 49:28 - 49:34
    sharing your findings with us. I'm from
    Rotterdam the largest harbour in Europe,
  • 49:34 - 49:36
    and as you might know, we were struck by
    Petya
  • 49:36 - 49:39
    Sebastian Yes
    Mic 1: Two terminals went down for a
  • 49:39 - 49:44
    couple of days, a couple of hundred
    billion euros of damage. Your approach is
  • 49:44 - 49:49
    quite theoretically, so now to practice -
    if you were there in this summer with
  • 49:49 - 49:54
    these findings, would you've been able to
    help us and decrypt the files and get all
  • 49:54 - 49:58
    the companies up and running again? Or is
    this too academic?
  • 49:58 - 50:05
    Sebastian: No, it's a practical approach I
    mean, I work for CrowdStrike and we had
  • 50:05 - 50:14
    some occasions where we were able to help
    the customers. So it's a very practical
  • 50:14 - 50:21
    thing to do so. And I was trying to
    insinuate this with, you know, this slide
  • 50:21 - 50:26
    here. So I was talking about the real
    world in this scenario. So I looked at
  • 50:26 - 50:32
    about 50 different encrypted hard drives
    with (Not)Petya, and I was able to decrypt
  • 50:32 - 50:39
    all of them, or most of them. Mostly the
    the ones not being able to decrypt or to
  • 50:39 - 50:45
    some, well, let's say, level 8 mistakes.
    Mic 1: Awesome. Have you shared your
  • 50:45 - 50:49
    findings with nomoreransoms.org?
    Sebastian: No, I don't know about this
  • 50:49 - 50:51
    site
    Mic 1: They provide decryption tools for
  • 50:51 - 50:57
    ransomware. Please do, thank you
    Herald: Microphone six, please
  • 50:57 - 51:02
    Mic 6: Thank you, in the beginning you
    mentioned that basically the key was
  • 51:02 - 51:05
    shortened to what was a 24-bit, something
    like that?
  • 51:05 - 51:10
    Sebastian: 26, yes
    Mic 6: 26, yeah. From that point, wasn't a
  • 51:10 - 51:14
    brute-force attack way faster and way more
    reliable?
  • 51:14 - 51:21
    Sebastian: Oh no no, don't get me wrong
    there. So the keystream length was 2^26,
  • 51:21 - 51:28
    so the keystream length was four
    megabytes. So you wouldn't be able to
  • 51:28 - 51:35
    brute-force that. So sorry, do you get
    that? The number of bytes was four
  • 51:35 - 51:39
    megabytes, and you couldn't be able to
    brute-force that.
  • 51:39 - 51:44
    Mic 6: Yeah but you already mention at the
    beginning that you basically shortened it
  • 51:44 - 51:50
    down to the possibility of at most two to
    the power of 26, and it is calculable.
  • 51:50 - 51:57
    Sebastian: Yes yes, I totally get the
    question. But you're you're missing the
  • 51:57 - 52:03
    point there. So it's not the key space,
    2^26, but the key length is 2^26, which
  • 52:03 - 52:11
    would be something - I I'm not good at
    that converting this to decimal. Would be
  • 52:11 - 52:17
    something like, let me check, two to
    the... I mean, here, math guys, computer
  • 52:17 - 52:22
    guys: how many bits is that?
    [Inaudible audience answer]
  • 52:22 - 52:26
    Sebastian: Say again?
    Crowd: A lot?
  • 52:26 - 52:31
    Sebastian: A lot! So it's, I mean, it's
    it's four megabytes of keylength and you
  • 52:31 - 52:35
    couldn't just brute force that because you
    you would have each of these
  • 52:35 - 52:44
    four megabytes you would have to brute
    force. Got that? So the key is not 2^26
  • 52:44 - 52:52
    but the key length, the keystream length,
    is that long. Got that? So yeah, this is
  • 52:52 - 52:58
    the key space would be longer than the
    Bible. You know, and you couldn't just
  • 52:58 - 53:03
    brute force the Bible, the text of the
    Bible. I mean given enough time yes but
  • 53:03 - 53:09
    you know we will all know the stories with
    in monkeys and typewriters. I mean we can
  • 53:09 - 53:12
    talk about that that offline but you're
    you're mixing up two different numbers
  • 53:12 - 53:18
    there.
    Mic 6: Yeah I guess, thank you.
  • 53:18 - 53:20
    Herald: Questions from the Internet,
    please
  • 53:20 - 53:26
    Signal Angel: Does the MFT encryption work
    the same for Petya and (Not)Petya?
  • 53:26 - 53:34
    Sebastian: Yes, the underlying mechanism
    is the same. The cryptography differs in
  • 53:34 - 53:43
    such a way that the constants number is
    different. So, this little guy here - this
  • 53:43 - 53:50
    is different. It would be usually like
    expand key something-something. And here
  • 53:50 - 53:57
    it's invalid sect ID. So it's somewhat
    different, but the the MFT encryption; I
  • 53:57 - 54:03
    mean the the bytecode the and the
    algorithm is the very same.
  • 54:03 - 54:12
    Signal Angel: Thanks
    Herald: Any more questions? Then please
  • 54:12 - 54:15
    give a great round of applause to
    Sebastian
  • 54:15 - 54:18
    Sebastian: Thank you
  • 54:18 - 54:24
    Applause
  • 54:24 - 54:44
    subtitles created by c3subtitles.de
    in the year 2018. Join, and help us!
Title:
34C3 - Defeating (Not)Petya's Cryptography
Description:

more » « less
Video Language:
English
Duration:
54:44

English subtitles

Revisions