35C3 - A deep dive into the world of DOS viruses

  • 0:00 - 0:18
    35C3 Intro music
  • 0:18 - 0:23
    Herald Angel: OK. So this talk is called
    "A deep dive into the world of DOS
  • 0:23 - 0:34
    viruses" and if you happened to be at the
    8C3, that is 27 years ago, you would have
  • 0:34 - 0:39
    seen a very young and awkward, even more
    awkward than I am at the moment, version
  • 0:39 - 0:46
    of myself, speaking on basically the same
    subject. The stage of course was a lot
  • 0:46 - 0:50
    smaller than this, this would have really
    intimidated me back then, but I was
  • 0:50 - 0:55
    talking about a university project that we
    had run for about 3 years at that point,
  • 0:55 - 1:06
    and our possibilities were very limited.
    Meanwhile, 27 years later, our speaker, in
  • 1:06 - 1:13
    between fighting battleships over the
    public BGP network and trying to encode
  • 1:13 - 1:19
    data in dubstep music, was able to
    actually do all of the stuff that we were
  • 1:19 - 1:26
    trying to do, with a lot of effort,
    basically, and I guess 4 hours of CPU time
  • 1:26 - 1:33
    or something like that. Please help me in
    welcoming Ben to our stage, to talk about
  • 1:33 - 1:36
    a bygone era.
    Applause
  • 1:36 - 1:41
    Applause
  • 1:41 - 1:48
    Ben: Thank you. Hi, I'm Ben Cartwright-
    Cox, as the slide suggests. So I have an
  • 1:48 - 1:53
    admission to make: So this is a thing to
    be aware of.
  • 1:53 - 1:57
    Laughter
    Ben: And you know, things also to be aware
  • 1:57 - 2:07
    of. Anyway. So what is DOS? To get
    straight into it. You can do it in a
  • 2:07 - 2:11
    bullet points way. You know, DOS is an
    upgrade from CP/M, another very old legacy
  • 2:11 - 2:15
    system, but another thing to be aware of
    is that DOS covers a wide range of
  • 2:15 - 2:20
    vendors. Might not just be like those old
    IBM PCs. Some of the DOSes had
  • 2:20 - 2:24
    compatibility with each other, meaning
    that some of the DOSes had shared malware
  • 2:24 - 2:31
    with each other. But to be honest, most
    people know DOS as these lovely old beige
  • 2:31 - 2:38
    boxes; the same era gave us our loved
    Model M keyboard. Hated by some, loved by
  • 2:38 - 2:43
    others, for the sound. But, you know, most
    people's knowledge of DOS came from
  • 2:43 - 3:00
    computers, a user interface that looked
    like this. Pretty basic. Okay so this is
  • 3:00 - 3:04
    WordStar, some of you may not know that
    Game of Thrones was written on WordStar.
  • 3:04 - 3:09
    George R. R. Martin is apparently not a
    big fan of modern word processing. He
  • 3:09 - 3:16
    admitted he had some issue with disliking
    how spell checking worked. So just uses,
  • 3:16 - 3:19
    and I also guess it's a good security
    quality, you know, you can't get hacked,
  • 3:19 - 3:25
    if it literally has no Internet access.
    So, also though, for a lot of people this
  • 3:25 - 3:28
    is also their first experience into
    programming. For some of the older
  • 3:28 - 3:36
    crowd. This is also the invention of
    QBasic, which, you know, gave a very basic
  • 3:36 - 3:41
    language to program creatively in DOS. For
    some people this was the gateway drug into
  • 3:41 - 3:47
    programming and perhaps the gateway drug
    into what they started as a career. For
  • 3:47 - 3:53
    other people the experience of DOS was not
    so great. For example, you know, let's
  • 3:53 - 3:58
    just say you were doing some work in an
    infinite loop and at some point stuff like
  • 3:58 - 4:04
    this happens. Unfortunately I don't have
    sound for this one, but you can just, in
  • 4:04 - 4:09
    your head, imagine like our PC speakers
    playing some small techno music, on like,
  • 4:09 - 4:14
    you know, but only one frequency at a
    time. This might get especially incredibly
  • 4:14 - 4:19
    embarrassing, if you are in an office
    environment, just slowly beeping away. You
  • 4:19 - 4:23
    can't exit this. It has to finish fully and
    if you touch the keyboard it reminds you
  • 4:23 - 4:30
    not to touch the keyboard, and continues
    playing this music. So, you know, this would be
  • 4:30 - 4:34
    fun, but this wouldn't be fun, especially
    in an office environment. But, you know,
  • 4:34 - 4:40
    ultimately it's not malicious. And that
    trend continues. This is another good
  • 4:40 - 4:45
    example of a DOS virus. This is ambulance,
    for when you run it, an ambulance just
  • 4:45 - 4:51
    drives past and then your normal program
    just continues running. I think this is
  • 4:51 - 4:57
    amazing, it's an interesting era of
    viruses. It was all, the history of it was
  • 4:57 - 5:01
    collected very well by a website called VX
    heavens, which sort of still lives, but
  • 5:01 - 5:07
    unfortunately, at one point was raided by
    the Ukrainian police, for what is the
  • 5:07 - 5:11
    fantastic wording they used. Basically,
    someone told them they were distributing
  • 5:11 - 5:17
    Malware. Unfortunately not malware that
    operates in this century. But I guess
  • 5:17 - 5:22
    that's good enough for a raid. But luckily
    for the archivists there are archivists of
  • 5:22 - 5:29
    archivists, and so we have a saved capture
    of VX heavens. This is actually an old
  • 5:29 - 5:33
    snapshot, there are way more modern
    snapshots, but thankfully the MS-DOS virus
  • 5:33 - 5:38
    era doesn't move very quickly. So, but the
    interesting thing here is, like, there's
  • 5:38 - 5:44
    66000 items in this tarball and it's 6.6
    gigabytes of code. And these viruses are
  • 5:44 - 5:49
    like super dense. There's not much to
    them, like they are just blobs of machine
  • 5:49 - 5:52
    code. They are not like your Electron app
    these days that ships an entire Chrome
  • 5:52 - 5:57
    browser, and normally an out of date
    Chrome browser, you know, this is just
  • 5:57 - 6:00
    basic, like, you know, how to draw an
    ambulance and, you know, some infection
  • 6:00 - 6:07
    routines. The normal distribution also
    changes with it as well. For example, the
  • 6:07 - 6:11
    normal lifecycle of an MS-DOS virus is,
    you know, you download, or for some other
  • 6:11 - 6:18
    reason run an infected program that
    presumably does nothing; to you it looks
  • 6:18 - 6:22
    like it does nothing, so, you know,
    remains roughly undetected. Then you go
  • 6:22 - 6:28
    and run more files, the DOS virus infects
    more files and at some point you're
  • 6:28 - 6:31
    probably going to give one of those
    executables to some other computer, or some
  • 6:31 - 6:35
    other person, whether it was by giving
    someone or copying a floppy disk of some
  • 6:35 - 6:39
    software, maybe some expensive software,
    so they didn't have to pay for it, or
  • 6:39 - 6:45
    uploading it to a BBS, where it could be
    downloaded by many people. So the
  • 6:45 - 6:50
    distribution mechanism is a far cry from
    the EternalBlues of this era, where, you
  • 6:50 - 6:54
    know, we can have a strain of malware
    spread across the world very brutally,
  • 6:54 - 7:02
    very quickly. So most DOS viruses are
    pretty simple: They start, they say "have
  • 7:02 - 7:07
    my payload conditions been met?" If not,
    then they'll go on and infect; if they are
  • 7:07 - 7:12
    met they'll go and display the payload.
    And the payloads are definitely more,
  • 7:12 - 7:17
    I don't know, nice. You know, you have stuff
    like this, which is pretty and it uses VGA
  • 7:17 - 7:21
    colors and all sorts of pretty nice stuff.
    You get also some very demoscene vibes
  • 7:21 - 7:26
    from this. Another good example is this
    like VGA, like super trippy thing, which
  • 7:26 - 7:30
    is really impressive, 'cause this is
    really small. This is less than 1 kilobyte
  • 7:30 - 7:35
    of code. It's in fact way less than 1
    kilobyte, it's like 64 bytes. Or you just get
  • 7:35 - 7:39
    like interesting screen effects as well.
    For example, it's quick, but like, you can
  • 7:39 - 7:44
    just watch the entire computer just
    dissolve away, which also might be quite
  • 7:44 - 7:48
    worrying, if you weren't expecting that.
    Alternatively, if the payload conditions
  • 7:48 - 7:53
    are not met, then, you know, you hook
    syscalls and you, or alternatively, if you
  • 7:53 - 7:57
    want to be way more aggressive, as a
    malware author, you scan for files on the
  • 7:57 - 8:03
    system to infect proactively. And the way
    you infect DOS programs is pretty simple:
  • 8:03 - 8:07
    Imagine you have like one giant tape of
    all the code you have for the target
  • 8:07 - 8:11
    program. Most of them work like this: They
    replace the first 3 bytes of the program
  • 8:11 - 8:17
    with an x86 jump. They append their malware
    onto the end of the executable, and so the
  • 8:17 - 8:20
    first thing that you do, when you run the
    executable, is it jumps to the end of the
  • 8:20 - 8:25
    file, effectively, runs the malware chunk,
    and then it optionally will return control
  • 8:25 - 8:34
    back to the original program. But there's
    also the thing about hooking syscalls, right?
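
The layout described above (a 3-byte jump at the start, extra code at the end) can be modelled on plain byte strings. This is a minimal sketch, not code from the talk: the function names are hypothetical, and it assumes the jump is the x86 near jump (opcode E9 with a 16-bit displacement). It drops the original first 3 bytes, which a real sample would have to keep somewhere in order to hand control back to the host program.

```python
import struct

NEAR_JMP = 0xE9  # x86 opcode for a near jump with a 16-bit relative displacement


def patch_entry_jump(host: bytes, appended: bytes) -> bytes:
    """Return host with its first 3 bytes replaced by a jump to `appended`."""
    if len(host) < 3:
        raise ValueError("host image must be at least 3 bytes long")
    # The displacement is counted from the byte after the 3-byte jump,
    # and the appended block starts at file offset len(host).
    displacement = len(host) - 3
    jump = bytes([NEAR_JMP]) + struct.pack("<H", displacement)
    return jump + host[3:] + appended


def entry_jump_target(image: bytes):
    """Return the file offset the first instruction jumps to, or None."""
    if len(image) < 3 or image[0] != NEAR_JMP:
        return None
    (displacement,) = struct.unpack_from("<H", image, 1)
    return (3 + displacement) & 0xFFFF


host = bytes([0x90] * 16)   # 16 NOPs standing in for a real program
block = b"\xC3"             # a lone RET standing in for the appended code
image = patch_entry_jump(host, block)
print(entry_jump_target(image))
```

Going the other way, `entry_jump_target` is the kind of check an analyst might run on a goat file to see whether its entry point now leads to the end of the file.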
  • 8:34 - 8:39
    So, you know, MS-DOS is an
    operating system, it does have syscalls,
  • 8:39 - 8:44
    programs can reach out to MS-DOS, to do
    things like file access and stuff, so as
  • 8:44 - 8:49
    you expect, you run a software interrupt
    to get there. Thankfully though, MS-DOS
  • 8:49 - 8:56
    does also allow you to extend MS-DOS by
    adding handlers itself, or even
  • 8:56 - 8:59
    overwriting existing handlers, which is
    very convenient, if you are trying to
  • 8:59 - 9:02
    write drivers, but it's also incredibly
    convenient, if you're trying to write
  • 9:02 - 9:09
    malware. For some of the examples of the
    syscalls, most of them relevant towards
  • 9:09 - 9:16
    DOS virus making. Here's a decent example
    of the things that DOS will provide you. A lot
  • 9:16 - 9:21
    of them are just very useful in general
    for producing functional executables the
  • 9:21 - 9:26
    end users want to use. This is what an
    average program looks like. This is almost
  • 9:26 - 9:29
    the shortest hello world you can make,
    minus the actual hello world string. In
  • 9:29 - 9:35
    fact, the hello world string might be the
    largest part of this binary. It's a pretty
  • 9:35 - 9:40
    simple binary. Here we're moving a
    pointer to the message we just set. We
  • 9:40 - 9:50
    then set the AH register to 9, or hex 9.
    That's the syscall for printing a string,
  • 9:50 - 9:58
    and then we run a software interrupt, 21h,
    which is short for 21 hex, and we continue on.
  • 9:58 - 10:07
    We then set AH again, to 4C, which is
    exit with a return code, and the program
  • 10:07 - 10:12
    will return. So, in the meantime, this is
    roughly the loop that just happened.
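
The five instructions just described can be assembled by hand into a .COM image. The slide itself is not in the transcript, so this is a sketch under assumptions: the helper name `hello_com` is hypothetical, the message is placed directly after the code, and the image is assumed to load at offset 0x100 as .COM files do.

```python
import struct

COM_LOAD_OFFSET = 0x100  # a .COM image is loaded at offset 0x100 of its segment


def hello_com(message: bytes) -> bytes:
    """Assemble the five-instruction program by hand."""
    if b"$" in message:
        raise ValueError("'$' terminates the string for this syscall")
    code_length = 11  # 3 + 2 + 2 + 2 + 2 bytes of instructions
    message_offset = COM_LOAD_OFFSET + code_length
    code = (
        b"\xBA" + struct.pack("<H", message_offset)  # mov dx, message
        + b"\xB4\x09"                                # mov ah, 09h (print string)
        + b"\xCD\x21"                                # int 21h
        + b"\xB4\x4C"                                # mov ah, 4Ch (exit; AL is left as-is)
        + b"\xCD\x21"                                # int 21h
    )
    assert len(code) == code_length
    return code + message + b"$"


program = hello_com(b"Hello, world!")
print(program.hex(" "))
```

With a 13-character message the whole file is 25 bytes, and the string is indeed the largest part of it.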
  • 10:12 - 10:18
    You have your program code, that calls an
    interrupt and that gets passed over to the
  • 10:18 - 10:22
    interrupt handler. In the process of doing
    this, the CPU has quickly looked at the
  • 10:22 - 10:28
    first 100 bytes of memory in the interrupt
    vector table, IVT, as it's abbreviated,
  • 10:28 - 10:32
    and then it's effectively a router. If
    anyone has written like a small piece of
  • 10:32 - 10:36
    code to route HTTP requests, or anything,
    it's basically like that, but in the 80s,
  • 10:36 - 10:41
    with syscalls. So it's just basically
    saying "Compare this, compare that, jump
  • 10:41 - 10:46
    there, jump there." Then the thing gets
    passed to the call handler, it goes and
  • 10:46 - 10:50
    does the syscall, the thing that was
    required. Normally it will leave some
  • 10:50 - 10:55
    registers behind, a state, or results of
    actions it has performed, and it returns
  • 10:55 - 11:00
    control back to the program. So,
    theoretically speaking, if we wanted to go
  • 11:00 - 11:04
    and look at what a program actually does
    we need to set a break point here, because
  • 11:04 - 11:11
    this is the only place that we can be sure
    the location exists, because this is way
  • 11:11 - 11:16
    before the era of ASLR, address space
    randomisation, and this is way, way before
  • 11:16 - 11:20
    the era of kernel space randomisation, in
    fact, MS-DOS has almost no memory
  • 11:20 - 11:25
    protection whatsoever. Once you run a
    program you are basically putting the full
  • 11:25 - 11:29
    control of the system to that program,
    which means you can happily also boot
  • 11:29 - 11:34
    things like Linux directly from a COM
    file, which is handy if you want to
  • 11:34 - 11:44
    upgrade. So, if we look at certain files
    we can go and see what they do. So in this
  • 11:44 - 11:50
    case, here is one example. This is a goat
    file. A goat file is like a sacrificial
  • 11:50 - 11:55
    goat. It is a file that is purely designed
    to be infected. So what you do is you
  • 11:55 - 12:00
    bring a virus into memory in the
    system and then you run a goat file, in
  • 12:00 - 12:04
    the vague hope that the virus will infect
    it, and then you have a nice clean sample
  • 12:04 - 12:08
    of just that virus and not another program
    inside the virus, which makes it way
  • 12:08 - 12:12
    easier to test and reverse engineer. So,
    we can see things are happening here. For
  • 12:12 - 12:17
    example, we can see it opening a file,
    moving like where it's looking into the
  • 12:17 - 12:20
    file, reading some data from the file,
    just 2 bytes, though, and it closes a
  • 12:20 - 12:24
    file. We see the same sort of thing repeat
    itself, except at one point it reads a
  • 12:24 - 12:28
    large amount of data, moves the file
    pointer, writes another large amount of
  • 12:28 - 12:33
    data, does some more stuff, and yeah, we
    pass some filenames, we display a string,
  • 12:33 - 12:39
    which is almost definitely the goat file
    message and yeah, we pretty much exit
  • 12:39 - 12:43
    after that. So, there were a few syscalls
    here that we would really like to know
  • 12:43 - 12:49
    more about. So, for that, it's the open
    files, we'd really like to know what files
  • 12:49 - 12:53
    were being opened. We would also want to
    know what, we'd like to know, what data
  • 12:53 - 12:56
    was being written to the file, rather than
    having to fish it out of the virtual
  • 12:56 - 13:01
    machine later, and we'd also, just out of
    curiosity, really want to know what
  • 13:01 - 13:05
    filenames it was asking MS-DOS to parse.
    Display string is also a nice test to
  • 13:05 - 13:09
    know, whether your code is working. So to
    do this you're gonna have to look a little
  • 13:09 - 13:15
    bit deeper into how the MS-DOS runtime
    and, by proxy, how x86 in 16-bit mode
  • 13:15 - 13:20
    works, or legacy mode, I guess. This is
    basically all the registers you have in
  • 13:20 - 13:26
    16-bit mode, and some nice computations at
    the bottom, to make it easier to read.
  • 13:26 - 13:34
    So, as we mentioned, AH is the one that you
    use to specify, which syscall you want,
  • 13:34 - 13:40
    and you'll notice it's not there. AH is
    actually the upper half of AX. AH is an
  • 13:40 - 13:46
    8-bit register, because sometimes people
    really just wanted only 8 bits. It's very
  • 13:46 - 13:54
    obscure that we were saving that much
    space. And so, this is what a, this is the
  • 13:54 - 13:58
    definition of the syscall of a print
    string. So you have AH needs to be set to
  • 13:58 - 14:03
    9, this is once you, in order to call the
    syscall for printing string, you set AH to
  • 14:03 - 14:09
    9, and then you need to set DS and DX to a
    pointer to a string that ends in a dollar.
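
Read literally, that definition means a tracer has to turn the DS and DX pair into one address and then read until it meets a dollar sign. Here is a minimal sketch of that, assuming the usual real-mode rule that the address is the segment times 16 plus the offset, wrapped to 20 bits. The function names and the 256-byte read limit are illustrative choices, not anything from the talk.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_dollar_string(memory, ds: int, dx: int, limit: int = 256) -> bytes:
    """Read the '$'-terminated string that DS:DX points at."""
    start = linear_address(ds, dx)
    end = memory.find(b"$", start, start + limit)
    if end == -1:
        # No terminator in range: return what we have rather than run on forever.
        return bytes(memory[start:start + limit])
    return bytes(memory[start:end])


memory = bytearray(1 << 20)                  # 1 megabyte of emulated RAM
address = linear_address(0x0100, 0x0020)     # 0x100 * 16 + 0x20 = 0x1020
memory[address:address + 6] = b"GOAT!$"
print(hex(address), read_dollar_string(memory, 0x0100, 0x0020))
```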
  • 14:09 - 14:12
    And that doesn't make a lot of sense, or
    it didn't make a lot of sense to me, when
  • 14:12 - 14:16
    I first read that and so, to do this,
    we need to learn a little bit more about
  • 14:16 - 14:20
    how memory works, on these old CPUs, or
    the CPUs that are probably in your
  • 14:20 - 14:26
    laptops, but running in an older mode. So
    this is effectively what it looks like.
  • 14:26 - 14:32
    They have a 16-bit CPU, 2 to the 16 is 64
    kilobytes, and we have a 20-bit memory
  • 14:32 - 14:36
    addressing space. 2 to 20 is 1 megabyte,
    so if you ever see an MS-DOS machine like
  • 14:36 - 14:40
    limiting at 1 megabyte, or some old
    operating system, saying like the maximum
  • 14:40 - 14:44
    memory you can have is 1 megabyte, it's
    because it's running in 16 bit mode. And
  • 14:44 - 14:50
    the maximum it can physically see is 20
    bits. So the question is: How do we
  • 14:50 - 14:59
    address anything above 64K? If the CPU can
    only fundamentally see 16 bits. So, this
  • 14:59 - 15:02
    is where segment registers come in. We
    have 4 segment registers, actually we
  • 15:02 - 15:06
    might have more, but they're the ones we
    need to care about. There's the code
  • 15:06 - 15:11
    segment, the data segment, the stack
    segment and the extra segment, in case you
  • 15:11 - 15:15
    need just another one. So anyway, with
    that in mind, let's have a quick crash
  • 15:15 - 15:21
    course on segment registers. So, imagine
    if you have a very long piece of memory,
  • 15:21 - 15:30
    and we can only see 16 bits at a time. So,
    however, we can move the sliding window
  • 15:30 - 15:36
    around in the memory, to go and see, like,
    to move our view of where it is. So, we
  • 15:36 - 15:42
    can do this and put data around the
    system, and we can use the final pointer
  • 15:42 - 15:49
    to specify, how far into the memory
    segment we should go. So the DS and DX
  • 15:49 - 15:55
    really just means a multiplier. So, where
    the data segment is 100, you need to just
  • 15:55 - 16:01
    move 100 times 16 to get to the correct
    place in memory, and then DX is the
  • 16:01 - 16:09
    offset. This continues on, so, where we
    have a 16 bit cpu, we have a bunch of
  • 16:09 - 16:13
    general use registers or general purpose
    registers. They're quite useful for
  • 16:13 - 16:17
    ensuring, you don't need to touch RAM too
    often. x86 actually has a fairly small
  • 16:17 - 16:25
    amount of general purpose registers. Some
    architectures have way more. I think more
  • 16:25 - 16:32
    modern chips like GPUs have hundreds, well
    hundreds, maybe thousands. However, this
  • 16:32 - 16:35
    doesn't really change over time in x86
    because we have to force backwards
  • 16:35 - 16:38
    compatibility. So, really what actually
    ends up happening, when we move up the
  • 16:38 - 16:43
    bittage, is that the same registers just
    get wider, and we add some more ones for
  • 16:43 - 16:45
    the programmers, that want them, and the
    exact same thing happened to 64 bit: The
  • 16:45 - 16:53
    registers just got wider. So thinking
    about it, we have a lot of malware now,
  • 16:53 - 16:58
    what if we want to know everything that's
    happened in this entire archive. So we
  • 16:58 - 17:01
    kind of want to trace all of these
    automatically, but we might not know what
  • 17:01 - 17:04
    we're looking for, so let's go through the
    checklist of what we need to do, to trace
  • 17:04 - 17:09
    all of this malware. We need to break
    point on the syscall handler. When we get
  • 17:09 - 17:13
    that breakpoint, we need to save all the
    registers, so we know which syscall was
  • 17:13 - 17:20
    run and potentially what data is being
    given to the syscall. Ideally, we're going
  • 17:20 - 17:25
    to save one hundred bytes from that data
    pointer, not especially because we need
  • 17:25 - 17:28
    it, but it's quite handy in a lot of
    registers in a lot of syscalls. It's for
  • 17:28 - 17:34
    example what you use to get the open file
    path, when you're opening files. We should
  • 17:34 - 17:38
    also, probably, record the screen for
    quick analysis, rather than just staring
  • 17:38 - 17:44
    at HTML tables, and so we can do that, we
    burn a lot of CPU time and probably cause
  • 17:44 - 17:51
    some minor amounts of environmental
    damage. And we get nothing. We just run a
  • 17:51 - 17:55
    bunch of stuff and most of them don't
    return anything. At best they return a
  • 17:55 - 18:03
    goat file string. They just do nothing.
    So, if we look deeper into the reason why,
  • 18:03 - 18:05
    it's sort of a smoking gun here, so we can
    see the syscalls that run on this file
  • 18:05 - 18:10
    that does nothing, and the smoking gun
    here is the date. So it's asking for the
  • 18:10 - 18:15
    date from the system, and this sort of
    flags out the first issue, is that a lot
  • 18:15 - 18:19
    of MS-DOS viruses don't really have a lot
    to go on, because they have no internet
  • 18:19 - 18:24
    connection, and there's not really any
    other state they can decide to activate on.
  • 18:24 - 18:29
    So the date syscall is pretty simple.
    The get date and get time just return all
  • 18:29 - 18:34
    of their values as registers. And, you
    know, some using the 8-bit halves, to save
  • 18:34 - 18:45
    space. So, a naive way of doing this, is
    what we do, is we would run the sample,
  • 18:45 - 18:50
    we'd wait for the syscall for date or
    time, we would just fiddle the values,
  • 18:50 - 18:53
    'cause in this case we're using a debugger,
    so we can automatically change, what the
  • 18:53 - 18:57
    state registers are, and we can then
    observe to see, if any of the syscalls
  • 18:57 - 19:00
    that the program ran changed, which is a
    pretty good indication that you've hit
  • 19:00 - 19:04
    some behavior that is different. And then,
    you know, we can say "Hooray, we found a
  • 19:04 - 19:08
    new test case!" The downside is: running
    every one of these samples takes 15
  • 19:08 - 19:14
    seconds of CPU-time because MS-DOS, well,
    15 seconds of wall-time, which,
  • 19:14 - 19:18
    when you are emulating MS-DOS is 15
    seconds of CPU-time because of the fact
  • 19:18 - 19:21
    that MS-DOS doesn't have power saving
    mode, so when it's not doing anything, it
  • 19:21 - 19:27
    just goes into a busy loop which makes it
    very hard to optimize. Or we could take a
  • 19:27 - 19:33
    cleverer look. So when we think about it,
    we are in the interrupt handler where all
  • 19:33 - 19:37
    we ever see is the insides of the
    interrupt handler because we don't know
  • 19:37 - 19:41
    where the program code is. The interrupt
    handler is the only place that we know is
  • 19:41 - 19:45
    consistent because MS-DOS could
    potentially load the code for the malware
  • 19:45 - 19:51
    or the program anywhere. But we want to
    know where the code is. It would be really
  • 19:51 - 19:54
    handy to know what the code is that we'd
    be about to run. So for this we need to
  • 19:54 - 19:59
    look towards the stack. Just like the DS and
    DX registers, the stack is located by a
  • 19:59 - 20:03
    stack segment and a stack pointer.
    Luckily, the first two values on the stack are the
  • 20:03 - 20:07
    instruction pointer and the code segment of the
    interrupted program, so we can use that to grab
  • 20:07 - 20:11
    exactly where the code is that will be run
    afterwards. So we just need to add a few
  • 20:11 - 20:14
    things to our checklist. We need to grab 4
    bytes from the stack pointer and then
  • 20:14 - 20:18
    using that, we can calculate the
    destination that the syscall will return
  • 20:18 - 20:23
    to. And if we look at some of them - we
    can look at an example here - well, this
  • 20:23 - 20:27
    is what a piece of what one of the calls
    returns to us. So we see we're running a compare
  • 20:27 - 20:37
    on DL against the HEX of 0x1E. And then
    if that comparison is equal it will
  • 20:37 - 20:43
    jump to one memory address. And if not it
    will jump to another. So if we look back
  • 20:43 - 20:53
    at the definition of those syscalls we can
    see that DL is the day. So with this we
  • 20:53 - 21:01
    can conclude that, if 0x1E is 30 and DL
    is the day this malware effectively is
  • 21:01 - 21:07
    saying if the day of month is 30 we need
    to go down a different path. If we run
  • 21:07 - 21:12
    these all over time across the whole
    dataset what we see is roughly this as a
  • 21:12 - 21:22
    polydome bar chart. We see out of the 17,500
    samples we have around 4,700 of them
  • 21:22 - 21:24
    checked for the date and time and these
    are the ones that are really tricky
  • 21:24 - 21:28
    because they're really hard to activate.
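
The two steps described above, finding where the syscall returns to and spotting a compare on the day register, can be sketched in a few lines. This assumes the breakpoint sits at the very start of the interrupt handler, so the 4 bytes at the stack pointer are the return offset followed by the return segment, both little-endian. The function names and the sample bytes are made up for illustration; `80 FA nn` is the x86 encoding of a compare of DL with an 8-bit constant.

```python
import struct


def return_address(stack_top: bytes):
    """Decode the 4 bytes at SS:SP on entry to an interrupt handler."""
    ip, cs = struct.unpack("<HH", stack_top[:4])
    return cs, ip, ((cs << 4) + ip) & 0xFFFFF


def day_compared(code: bytes):
    """Return N if `code` starts with `cmp dl, N` (bytes 80 FA N), else None."""
    if len(code) >= 3 and code[0] == 0x80 and code[1] == 0xFA:
        return code[2]
    return None


# Made-up stack contents: return offset 0x0123 in segment 0x1000.
cs, ip, linear = return_address(bytes.fromhex("2301 0010"))
print(hex(cs), hex(ip), hex(linear))

# Made-up code at the return address: cmp dl, 0x1E followed by a jump-if-equal.
print(day_compared(bytes.fromhex("80 fa 1e 74 05")))
```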
    They're also the most interesting though, because
  • 21:28 - 21:34
    those are the ones trying to hide. So, with
    that in mind, we need to, we have the code
  • 21:34 - 21:38
    segment that we're about to run, when we
    return and we can't really brute force
  • 21:38 - 21:44
    because it takes a lot of CPU-time and we
    can't brute force it inside a 'real' or
  • 21:44 - 21:47
    emulated machine but we can brute force it
    in a significantly more interesting way.
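
One way to picture brute forcing outside the virtual machine is to take a date check that has already been decoded and sweep calendar dates through it, grouping the dates by which branch they take. In this sketch the predicate `day_is_30` is a hypothetical stand-in for the day-of-month compare seen earlier, and the one-year range is only an illustrative assumption.

```python
from datetime import date, timedelta


def sweep_dates(takes_branch, first: date, last: date):
    """Group every date in [first, last] by the outcome of `takes_branch`."""
    outcomes = {}
    current = first
    while current <= last:
        outcomes.setdefault(takes_branch(current), []).append(current)
        current += timedelta(days=1)
    return outcomes


# Hypothetical stand-in for `cmp dl, 0x1E` followed by a jump-if-equal.
def day_is_30(today: date) -> bool:
    return today.day == 0x1E


paths = sweep_dates(day_is_30, date(1980, 1, 1), date(1980, 12, 31))
print(len(paths[True]), len(paths[False]))
```

Each distinct outcome corresponds to one code path worth replaying in the full emulator, so only one representative date per group needs the expensive 15-second run.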
  • 21:47 - 21:54
    We need to build something: we need to
    build the world's worst x86 emulator so
  • 21:54 - 22:02
    dubbed BenX86, it's 16-bit only. Any
    attempt to access memory effectively ends
  • 22:02 - 22:06
    the simulation. It's got a fake stack if
    you try and push something onto the stack
  • 22:06 - 22:10
    it says sure, fine if you try and pop it
    it's like oh actually I never held any of
  • 22:10 - 22:14
    that data anyway so we are ending the
    simulation. 80 opcodes, most of them are
  • 22:14 - 22:19
    jumps, because that's the primary
    purpose: compares and jumps. The
  • 22:19 - 22:24
    difference is it logs every opcode and every
    address that it went through, and it can be
  • 22:24 - 22:29
    run with just a small x86 code segment and
    a register snapshot. This means that we
  • 22:29 - 22:35
    can test all dates from 1980 to 2005 in
    roughly 100 milliseconds and most
  • 22:35 - 22:41
    programs ended up having just 3 different
    code paths on average so that yields us
  • 22:41 - 22:48
    with 17,000 virus samples and about 10,000
    of samples that had date variations as in:
  • 22:48 - 22:54
    Once you exploit the complexity. So I'm
    going to now use my final remaining time
  • 22:54 - 23:00
    to go through some of my favorites. So
    this is an example of a virus that just
  • 23:00 - 23:04
    doesn't do anything on the 1st of 1980.
    However if you'd happen to be running this
  • 23:04 - 23:08
    on New Year's Day you would get this.
    Laughter
  • 23:08 - 23:11
    No matter what you do, every program you can't
  • 23:11 - 23:15
    exit out of this, your machine is hung. This
    might be great, right? You might be like:
  • 23:15 - 23:19
    'Oh cool, I don't need to do work anymore
    because my computer will literally not let me'
  • 23:19 - 23:21
    This also might be terrible, because
    you might need to do some work on New
  • 23:21 - 23:28
    Year's day. Here's another example. This
    does nothing as well just another innocent
  • 23:28 - 23:34
    .com file. Of course, a reminder: these
    pieces of malware will be wrapped around
  • 23:34 - 23:38
    something else. Almost anything could be
    infected in here. In this case though
  • 23:38 - 23:47
    this binary is nice and shaved down.
    However instead we get this, which I think
  • 23:47 - 23:54
    is super interesting and is basically the
    author is aware - they're telling you they
  • 23:54 - 23:57
    are actually like self disclosing in
    saying the previous year I've infected
  • 23:57 - 24:05
    your computer. And for some reason it's
    being nice. They're just saying. Actually
  • 24:05 - 24:12
    you have been infected. And as a - I guess a
    pity - I'm just going to remove myself now.
  • 24:12 - 24:17
    I don't really. For some reason it's also
    encouraging you to buy McAfee. This is
  • 24:17 - 24:26
    back in the day when John McAfee himself
    actually wrote McAfee. Interesting times.
  • 24:26 - 24:33
    Definitely interesting times. Here is
    another example. This one I found
  • 24:33 - 24:41
    particularly obscure. On the 8th of
    November 1980 or any year I think actually
  • 24:41 - 24:51
    it turns all zeroes on the system into
    tiny little glyphs that say "hate" if
  • 24:51 - 24:55
    anyone understands this I'd really like to
    know like I've been thinking about this a
  • 24:55 - 25:02
    lot. What does it mean? Is it an artistic
    statement? Is it. I wish I knew.
  • 25:02 - 25:06
    Someone in the audience: it says MATE
    Ben: There could be a CCC variant that says
  • 25:06 - 25:13
    MATE. Another good one in that it's the
    last thing I ever want to see any program
  • 25:13 - 25:20
    tell me is this one here where you run it
    and it says "error eating drive C:". I
  • 25:20 - 25:25
    never ever want an error in any program
    unexpectedly just says 'Sorry almost I
  • 25:25 - 25:30
    failed to remove your root file system,
    don't know why, could you like change your
  • 25:30 - 25:36
    settings so I can remove it?' Cheers. And
    finally this is one of my absolute
  • 25:36 - 25:41
    favorites in that it's just brilliant in
    that it also stops you from running the
  • 25:41 - 25:46
    program you want to run it exits
    prematurely. This is the virus version of
  • 25:46 - 25:51
    the Navy SEAL copy pasta. Says "I am an
    assassin. I want to and I shall kill you."
  • 25:51 - 26:00
    "I also hate Aladdin and I also will kill
    it. I will eliminate you with ...". You know where
  • 26:00 - 26:05
    this is going. It says fear
    the virus that is more powerful than God.
  • 26:05 - 26:11
    It only activates on one day though, so
    it's fine. Thank you for your time. I know
  • 26:11 - 26:15
    it's late and I will happily take any
    questions or corrections if you know this
  • 26:15 - 26:27
    topic better than me.
    Applause
  • 26:27 - 26:33
    Herald: This totally brings tears to my
    eyes with nostalgia. So if there are any
  • 26:33 - 26:38
    questions, we have microphones distributed around
    the room, there is like 1, 2, 3, 4 and
  • 26:38 - 26:43
    one in the back. We also have questions
    perhaps from the internet if you want to
  • 26:43 - 26:48
    ask a question come up to the microphone
    ask the question just as a reminder a
  • 26:48 - 26:54
    question is one or two sentences with a
    question mark behind it and not a life
  • 26:54 - 27:01
    story attached. So let's see what we have.
    I'm going to start with microphone number
  • 27:01 - 27:04
    1 just because I can see it easiest, let's
    go for it.
  • 27:04 - 27:10
    Microphone 1: Hi Ben, thanks for the talk.
    Really interesting. My question would be
  • 27:10 - 27:16
    did you do any analysis on what ratio of
    the viruses was more artistic
  • 27:16 - 27:21
    and which one actually did damage.
    Ben: So most of them surprisingly don't do
  • 27:21 - 27:26
    damage. I actually really struggled to
    find a date varying sample that
  • 27:26 - 27:30
    specifically activated on a certain day
    and decided to delete every file. There
  • 27:30 - 27:35
    are some very good ones. Some of them
    are like virus scanning utilities that just
  • 27:35 - 27:38
    don't do anything on certain dates, and on
    one day, while they're telling you all
  • 27:38 - 27:41
    the files they are scanning, they are actually
    telling you all the files they're
  • 27:41 - 27:46
    deleting. So that's particularly cruel but
    it's actually surprisingly hard to find a
  • 27:46 - 27:50
    virus sample that actually was brutally
    malicious. There were some that would just,
  • 27:50 - 27:54
    you know, infect binaries, but it's very hard
    to find one that I think was brutally
  • 27:54 - 27:58
    malicious, which is a far cry from the days,
    well, from the days that we live in right
  • 27:58 - 28:04
    now, where we're taking down hospitals with
    Windows bugs.
  • 28:04 - 28:09
    Herald: As everybody is leaving the room,
    please do it quietly. I see a question at
  • 28:09 - 28:12
    (microphone) 3, on that side.
    Microphone 3: Yes. Since a lot of
  • 28:12 - 28:20
    industrial control systems still run DOS,
    what's the threat from DOS malware that
  • 28:20 - 28:27
    might be written today?
    Ben: It's probably unlikely that an
  • 28:27 - 28:31
    industrial control system that's running
    DOS would come into contact with DOS malware.
  • 28:31 - 28:36
    The only way I can think of is if a vendor,
    or a factory or supplier or
  • 28:36 - 28:41
    whatever it was, was basically downloading
    warez onto industrial control
  • 28:41 - 28:47
    boxes. I wouldn't be surprised but it
    would be pretty irresponsible. But it
  • 28:47 - 28:53
    would be quite surprising to find MS-DOS
    malware today on industrial controllers
  • 28:53 - 28:57
    that was installed recently and not just a
    lingering infection from the last 20
  • 28:57 - 29:00
    years.
    Herald: Microphone 2
  • 29:00 - 29:05
    Microphone 2: Did you find any conditions
    that weren't date based?
    Ben: Some of them do
  • 29:05 - 29:10
    attempt to, some of them try and circumvent
    the date recognition. Unfortunately it's
  • 29:10 - 29:13
    very hard to brute force those. Some of
    them install themselves as what's called a
  • 29:13 - 29:20
    TSR, or Terminate and Stay Resident, which
    basically means that they will exit out,
  • 29:20 - 29:24
    run in the background and continuously ask
    the actual system timer what time it is.
  • 29:24 - 29:28
    It's a bit of a more risky strategy
    because the system timer might not exist
  • 29:28 - 29:32
    which would be unfortunate for the virus.
    So definitely there are viruses that have
  • 29:32 - 29:38
    way more complicated execution conditions.
    I observed one sample that only activated
  • 29:38 - 29:44
    after I believe it was something silly
    like 100 keypresses, which is very hard to
  • 29:44 - 29:50
    automatically test. Those sorts of viruses
    require static analysis and statically
  • 29:50 - 29:54
    analyzing 17,000 samples is a time-
    consuming task.
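A minimal sketch of the date brute-forcing idea described in this answer, not the speaker's actual tooling. It assumes a hypothetical hook, `run_sample`, that would run a sample in an isolated emulator with the clock set to a given day and return some fingerprint of what it did. Here `fake_sample` stands in for that hook so the sketch runs on its own.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Callable, Iterator


def days_of_year(year: int) -> Iterator[date]:
    """Yield every calendar day of the given year."""
    day = date(year, 1, 1)
    while day.year == year:
        yield day
        day += timedelta(days=1)


def find_trigger_dates(run_sample: Callable[[date], str], year: int) -> list[date]:
    """Return the days on which the observed behaviour differs from the
    most common (baseline) behaviour.

    run_sample is a hypothetical hook (an assumption of this sketch): it
    should execute the sample with the emulated clock set to the given
    day and return a fingerprint, for example a hash of the final screen.
    """
    results = {day: run_sample(day) for day in days_of_year(year)}
    baseline, _ = Counter(results.values()).most_common(1)[0]
    return [day for day, seen in results.items() if seen != baseline]


def fake_sample(today: date) -> str:
    """Stand-in for an emulator run: 'activates' only on 13 May."""
    return "payload" if (today.month, today.day) == (5, 13) else "quiet"


print(find_trigger_dates(fake_sample, 1992))
```

A trigger that counts keypresses would return the same fingerprint on every day, so a date sweep like this finds nothing. That is why such samples need static analysis instead.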
  • 29:54 - 30:02
    Herald: So we have a question from the Internet.
    Signal Angel: Do you have the source? What
  • 30:02 - 30:08
    is the source of the malware that you
    analyzed here, is it published somewhere?
  • 30:08 - 30:13
    Ben: You can still find dumps of VX
    Heavens, and more modern dumps of VX
  • 30:13 - 30:18
    Heavens, on popular torrent websites.
    But I'm sure there are also copies
  • 30:18 - 30:21
    floating about on non-popular torrent
    websites.
  • 30:21 - 30:25
    Laughter
    Herald: Over to microphone 1.
  • 30:25 - 30:32
    Microphone 1: Hi Ben. I'm Jope. Thank you
    for your talk. I was wondering: did you
  • 30:32 - 30:37
    learn anything from your studies of these
    viruses that should be taught in modern
  • 30:37 - 30:43
    day computer science classes, like a more
    efficient sorting algorithm or some hidden
  • 30:43 - 30:47
    gem that actually should be part of
    computing these days?
  • 30:47 - 30:54
    Ben: My primary takeaway was x86 was a
    mistake.
  • 30:54 - 31:01
    Laughter & applause
    Herald: So I'm not seeing any more
  • 31:01 - 31:04
    questions. Oh no, there is. OK, one more
    question from the internet.
  • 31:04 - 31:11
    Signal Angel: Have you found malware
    samples that did like try to detect dummy
  • 31:11 - 31:15
    binaries or whatever, to avoid easy
    analysis?
  • 31:15 - 31:20
    Ben: Oh actually, that's a really good question.
    So it is... it's complicated.
  • 31:20 - 31:25
    So some viruses would... so, maybe let's be
  • 31:25 - 31:30
    dangerous, let's try and go backwards on my
    home-written presentation software. So
  • 31:30 - 31:41
    [humming] Too many slides. I have
    regrets. Yes. OK. Here we are. This slide.
  • 31:41 - 31:45
    OK. So you know here I'm saying that the
    malware infection goes to the end. Well
  • 31:45 - 31:50
    some samples are really cool. They don't
    change the size of the file. They just
  • 31:50 - 31:55
    find areas in the files that are full of
    null bytes and just say: this is probably
  • 31:55 - 32:00
    fine, I'm just going to put myself here.
    Which may have unintended consequences. It
  • 32:00 - 32:05
    may mean, if a program has like a statically
    typed, statically defined byte array of
  • 32:05 - 32:10
    like a certain size and the program is
    relying on it being zeros when it accesses
  • 32:10 - 32:14
    it for the first time, it may get very
    surprised to find some malware code in
  • 32:14 - 32:20
    there. But generally speaking as far as
    I'm aware, this deployment
  • 32:20 - 32:26
    procedure works pretty well and actually
    is very good at avoiding antivirus of the
  • 32:26 - 32:30
    era, which would just be checking like
    common system files and their size. And, you
  • 32:30 - 32:35
    know, if the size of COMMAND.COM increases,
    then that's clearly bad news.
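A minimal sketch of why a size-only check misses this kind of in-place overwrite while a content hash does not. The byte layout and the inert marker string are made up for illustration. SHA-256 is a modern stand-in and not what scanners of the era used.

```python
import hashlib


def longest_null_run(data: bytes) -> tuple[int, int]:
    """Return (offset, length) of the longest run of zero bytes."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, value in enumerate(data):
        if value == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def size_check_passes(original: bytes, current: bytes) -> bool:
    """The era-style check described above: only compare file sizes."""
    return len(original) == len(current)


def hash_check_passes(original: bytes, current: bytes) -> bool:
    """Compare content hashes instead of sizes."""
    return hashlib.sha256(original).digest() == hashlib.sha256(current).digest()


# Made-up file image: some bytes, a zero-filled buffer, some more bytes.
original = b"\x90" * 16 + b"\x00" * 32 + b"\xc3" * 8
marker = b"INERT-MARKER"  # harmless placeholder, not real code

offset, length = longest_null_run(original)
assert len(marker) <= length
# Overwrite part of the zero-filled area without changing the file size.
modified = original[:offset] + marker + original[offset + len(marker):]

print(size_check_passes(original, modified))  # True: the size is unchanged
print(hash_check_passes(original, modified))  # False: the content changed
```

The same `longest_null_run` helper also hints at the risk mentioned above: a program that expects that buffer to be all zeros would now find other bytes in it.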
  • 32:35 - 32:38
    Herald: We have a question on microphone
    1.
  • 32:38 - 32:46
    Microphone 1: Are there any viruses that
    try to eliminate or manipulate virus
  • 32:46 - 32:49
    scanners of the day?
    Ben: Oh yeah. So a lot of the samples will
  • 32:49 - 32:53
    actively go and look for files of other
    anti-viruses.
  • 32:53 - 32:57
    But I am generally under the impression
    that it's kind of hard to find them. There
  • 32:57 - 33:02
    weren't actually that many antivirus
    products back in the day.
  • 33:02 - 33:06
    I feel like, it was a bit of a niche thing to
    be running. Microsoft did for a while ship
  • 33:06 - 33:14
    their own antivirus with MS-DOS. So I
    guess you know what's new is old. So there
  • 33:14 - 33:18
    were antiviruses out there. I don't think
    many of them were very effective.
  • 33:18 - 33:27
    Herald: Any more questions? There, where?
    Oh right. Another one from the Internet.
  • 33:27 - 33:32
    It's interesting that the internet is
    querying MS-DOS all the time. Go ahead.
  • 33:32 - 33:38
    Signal Angel: Did you do the diagrams by
    hand or do you have a tool?
  • 33:38 - 33:43
    Ben: So many hours. No. So there's a
    couple of good tools to do it.
  • 33:43 - 33:46
    asciiflow.org, I think, is a fantastic
    tool. I would highly recommend it. I think
  • 33:46 - 33:53
    it's not maintained very well, though.
    Herald: Microphone 1.
  • 33:53 - 33:56
    Microphone 1: Are you publishing the tools
    you wrote?
  • 33:56 - 34:02
    Ben: I will be publishing the tools at
    some point when they are less... when they
  • 34:02 - 34:08
    are less ugly. I will be publishing all of
    the automatic malware runs and the GIFs
  • 34:08 - 34:13
    generated by them so that people can
    easily search Google for the virus names
  • 34:13 - 34:17
    and get like actual real time versions.
    The hardest thing that I found when
  • 34:17 - 34:22
    looking at virus names was literally just
    finding any information about them and one
  • 34:22 - 34:25
    of the things I really wish existed at the
    time of writing this talk, was being able
  • 34:25 - 34:30
    to just query a name and be like oh yeah
    this virus it looks like it does this.
  • 34:30 - 34:33
    Herald: Since I saw microphone 1 first,
    let's go with that.
  • 34:33 - 34:40
    Microphone 1: Did you find any viruses
    that had signage in them, not signage of
  • 34:40 - 34:44
    today but the name of the author? Like he
    was very proud of what he wrote.
  • 34:44 - 34:47
    Ben: Yeah, there are some notable
    examples. Quite a few of them will try and
  • 34:47 - 34:53
    name - so DOS-viruses do like have
    [incomprehensible] sample names in the same way
  • 34:53 - 34:57
    that we'd still today give viruses names.
    A lot of the time you will just encode a
  • 34:57 - 35:01
    string that you want the virus to be
    named, you know, somewhere in the file
  • 35:01 - 35:04
    just a random string doing nothing. It's
    like oh, ok, they clearly wanted the virus
  • 35:04 - 35:11
    to be called Tempest. So that does happen.
    One of the favorite examples is the Brain
  • 35:11 - 35:17
    malware, which literally encodes an address
    and phone number of the author, I believe
  • 35:17 - 35:23
    in Pakistan, and there's a fantastic mini
    documentary by F-Secure where they go and
  • 35:23 - 35:26
    visit the people who wrote it. It's a
    super interesting watch and I would really
  • 35:26 - 35:30
    recommend it.
    Herald: Indeed it is. Microphone 2?
  • 35:30 - 35:36
    Microphone 2: Did you have any chance to
    look at any kind of viruses that did not
  • 35:36 - 35:42
    modify the files themselves? For example,
    one of the largest virus infections at the time was a
  • 35:42 - 35:46
    virus called [incomprehensible] which modified
    the master boot record.
  • 35:46 - 35:51
    Ben: Yes, Master boot record, I did
    consider. It was more of a time problem
  • 35:51 - 35:55
    that I had in getting to the point where
    you could brute force time and date
  • 35:55 - 36:01
    combinations and looking for master boot
    record changes. It was really hard. I am
  • 36:01 - 36:07
    super interested in reviewing what were effectively
    the rootkits of the era. But yes, that's
  • 36:07 - 36:10
    definitely something I will look into in
    the future.
  • 36:10 - 36:14
    Herald: And we have yet another question
    from the Internet.
  • 36:14 - 36:17
    Signal Angel: And it's even from the same
    guy.
  • 36:17 - 36:23
    Ben: Oh damn.
    Signal Angel: Is the BenX86 software open-
  • 36:23 - 36:26
    source, or can it be found on the web
    somewhere?
  • 36:26 - 36:30
    Ben: It probably will be. I wouldn't
    expect it to work in, well, in any use-case
  • 36:30 - 36:36
    though. It's effectively designed to like
    not work correctly, right? Like what
  • 36:36 - 36:41
    was the spec? It basically like fails at
    every single thing awkward. I just went
  • 36:41 - 36:47
    like oh that's fine. We're probably far
    enough down there anyway. Are we? Be aware
  • 36:47 - 36:51
    this is the feature list.
    Herald: So is that a follow up question
  • 36:51 - 36:57
    from the internet?
    Signal Angel: No, it's a new one. I don't
  • 36:57 - 37:03
    know how serious it is, but would it be
    possible or a good idea to use machine
  • 37:03 - 37:10
    learning to create new DOS malware from
    the existing samples?
  • 37:10 - 37:17
    Laughter & applause
    Ben: It would not be a good idea. But I
  • 37:17 - 37:24
    like how you think.
    Herald: Actually I saw somebody trying to
  • 37:24 - 37:28
    use NLP to generate viruses but ok that's
    enough for now.
  • 37:28 - 37:32
    Ben: You could probably do Markov chains
    with x86 to be honest. Please don't do
  • 37:32 - 37:35
    that, please!
    Herald: Don't try this at home.
  • 37:35 - 37:37
    Ben: I have seen things I've seen. Just
    please don't do that.
  • 37:37 - 37:43
    Herald: So I think we've run out of
    questions. Going once, going twice. Let's
  • 37:43 - 37:50
    thank Ben for this marvelous retrospective
    talk.
    Big applause
  • 37:50 - 37:59
    35C3 postroll music
  • 37:59 - 38:12
    subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!
Title:
35C3 - A deep dive into the world of DOS viruses
Video Language:
English
Duration:
38:13