0:00:00.000,0:00:17.860
35C3 Intro music
0:00:17.860,0:00:23.065
Herald Angel: OK. So this talk is called[br]"A deep dive into the world of DOS
0:00:23.065,0:00:33.500
viruses" and if you happened to be at the[br]8C3, that is 27 years ago, you would have
0:00:33.500,0:00:38.599
seen a very young and awkward, even more[br]awkward than I am at the moment, version
0:00:38.599,0:00:46.120
of myself, speaking on basically the same[br]subject. The stage of course was a lot
0:00:46.120,0:00:50.491
smaller than this, this would have really[br]intimidated me back then, but I was
0:00:50.491,0:00:55.160
talking about a university project that we[br]had run for about 3 years at that point,
0:00:55.160,0:01:05.500
and our possibilities were very limited.[br]Meanwhile, 27 years later, our speaker, in
0:01:05.500,0:01:13.040
between fighting battleships over the[br]public BGP network and trying to encode
0:01:13.040,0:01:18.690
data in dubstep music, was able to[br]actually do all of the stuff that we were
0:01:18.690,0:01:25.650
trying to do, with a lot of effort,[br]basically, and I guess 4 hours of CPU time
0:01:25.650,0:01:32.610
or something like that. Please help me in[br]welcoming Ben to our stage, to talk about
0:01:32.610,0:01:35.820
a bygone era.[br]Applause
0:01:35.820,0:01:40.920
Applause
0:01:40.920,0:01:48.340
Ben: Thank you. Hi, I'm Ben Cartwright-[br]Cox, as the slide suggests. So I have an
0:01:48.340,0:01:53.100
admission to make: So this is a thing to[br]be aware of.
0:01:53.100,0:01:56.970
Laughter[br]Ben: And you know, things also to be aware
0:01:56.970,0:02:07.110
of. Anyway. So what is DOS? To get[br]straight into it. You can do it in a
0:02:07.110,0:02:10.947
bullet points way. You know, DOS is an[br]upgrade from CP/M, another very old legacy
0:02:10.947,0:02:14.819
system, but another thing to be aware of[br]is that DOS covers a wide range of
0:02:14.819,0:02:19.950
vendors. Might not just be like those old[br]IBM PCs. Some of the DOSes had
0:02:19.950,0:02:23.950
compatibility with each other, meaning[br]that some of the DOSes had shared malware
0:02:23.950,0:02:31.390
with each other. But to be honest, most[br]people know DOS as these lovely old beige
0:02:31.390,0:02:37.709
boxes; the same era gave us our loved[br]Model M keyboard. Hated by some, loved by
0:02:37.709,0:02:42.840
others, for the sound. But, you know, most[br]people's knowledge of DOS came from
0:02:42.840,0:02:59.599
computers, a user interface that looked[br]like this. Pretty basic. Okay so this is
0:02:59.599,0:03:04.340
Wordstar, some of you may not know that[br]Game of Thrones was written on Wordstar.
0:03:04.340,0:03:09.281
George R. R. Martin is apparently not a[br]big fan of modern word processing. He
0:03:09.281,0:03:16.340
admitted he disliked[br]how spell checking worked. So he just uses it,
0:03:16.340,0:03:18.700
and I also guess it's a good security[br]quality, you know, you can't get hacked,
0:03:18.700,0:03:24.680
if it literally has no Internet access.[br]So, also though, for a lot of people this
0:03:24.680,0:03:28.310
is also their first experience of[br]programming, for some of the older
0:03:28.310,0:03:36.500
crowd. This is also the invention of[br]QBasic, which, you know, gave a very basic
0:03:36.500,0:03:40.940
language to program creatively in DOS. For[br]some people this was the gateway drug into
0:03:40.940,0:03:47.160
programming and perhaps the gateway drug[br]into what they started as a career. For
0:03:47.160,0:03:52.800
other people the experience of DOS was not[br]so great. For example, you know, let's
0:03:52.800,0:03:57.640
just say you were doing some work in an[br]infinite loop and at some point stuff like
0:03:57.640,0:04:04.001
this happens. Unfortunately I don't have[br]sound for this one, but you can just, in
0:04:04.001,0:04:09.200
your head, imagine like our PC speakers[br]playing some small techno music, on like,
0:04:09.200,0:04:14.310
you know, but only one frequency at a[br]time. This might get especially incredibly
0:04:14.310,0:04:18.589
embarrassing, if you are in an office[br]environment, just slowly beeping away. You
0:04:18.589,0:04:22.770
can't exit this. It has to finish fully and[br]if you touch the keyboard it reminds you
0:04:22.770,0:04:30.069
not to touch the keyboard, and continues[br]playing this music. So, you know, this would be
0:04:30.069,0:04:34.319
fun, but this wouldn't be fun, especially[br]in an office environment. But, you know,
0:04:34.319,0:04:40.339
ultimately it's not malicious. And that[br]trend continues. This is another good
0:04:40.339,0:04:45.240
example of a DOS virus. This is Ambulance:[br]when you run it, an ambulance just
0:04:45.240,0:04:50.589
drives past and then your normal program[br]just continues running. I think this is
0:04:50.589,0:04:56.729
amazing, it's an interesting era of[br]viruses. It was all, the history of it was
0:04:56.729,0:05:01.270
collected very well by a website called VX[br]Heavens, which sort of still lives, but
0:05:01.270,0:05:06.629
unfortunately, at one point was raided by[br]the Ukrainian police, for what is the
0:05:06.629,0:05:11.469
fantastic wording they used. Basically,[br]someone told them they were distributing
0:05:11.469,0:05:16.770
malware. Unfortunately not malware that
0:05:16.770,0:05:21.710
that's good enough for a raid. But luckily[br]for the archivists there are archivists of
0:05:21.710,0:05:28.809
archivists, and so we have a saved capture[br]of VX Heavens. This is actually an old
0:05:28.809,0:05:32.770
snapshot, there are way more modern[br]snapshots, but thankfully the MS-DOS virus
0:05:32.770,0:05:38.189
era doesn't move very quickly. So, but the[br]interesting thing here is, like, there's
0:05:38.189,0:05:44.349
66000 items in this tarball and it's 6.6[br]gigabytes of code. And these viruses are
0:05:44.349,0:05:48.580
like super dense. There's not much to[br]them, like they are just blobs of machine
0:05:48.580,0:05:51.520
code. They are not like your Electron app[br]these days that ships an entire Chrome
0:05:51.520,0:05:57.219
browser, and normally an out of date[br]Chrome browser, you know, this is just
0:05:57.219,0:06:00.429
basic, like, you know, how to draw an[br]ambulance and, you know, some infection
0:06:00.429,0:06:06.629
routines. The normal distribution also[br]changes with it as well. For example, the
0:06:06.629,0:06:11.059
normal lifecycle of an MS-DOS virus is,[br]you know, you download, or for some other
0:06:11.059,0:06:17.560
reason run an infected program that[br]presumably does nothing; to you it looks
0:06:17.560,0:06:22.129
like it does nothing, so, you know,[br]remains roughly undetected. Then you go
0:06:22.129,0:06:27.830
and run more files, the DOS virus infects[br]more files and at some point you're
0:06:27.830,0:06:31.069
probably going to give one of those[br]executables to some other computer, or some
0:06:31.069,0:06:35.409
other person, whether it was by giving[br]someone or copying a floppy disk of some
0:06:35.409,0:06:38.880
software, maybe some expensive software,[br]so they didn't have to pay for it, or
0:06:38.880,0:06:44.900
uploading it to a BBS, where it could be[br]downloaded by many people. So the
0:06:44.900,0:06:49.689
distribution mechanism is a far cry from[br]the EternalBlues of this era, where, you
0:06:49.689,0:06:54.449
know, we can have a strain of malware[br]spread across the world very brutally,
0:06:54.449,0:07:01.709
very quickly. So most DOS viruses are[br]pretty simple: They start, they say "have
0:07:01.709,0:07:06.839
my payload conditions been met?" If not,[br]then they'll carry on; if they are
0:07:06.839,0:07:11.799
met they'll go and display the payload.[br]And the payloads are definitely more,
0:07:11.799,0:07:16.949
I don't know, nice. You know, you have stuff[br]like this, which is pretty and it uses VGA
0:07:16.949,0:07:20.580
colors and all sorts of pretty nice stuff.[br]You get also some very demoscene vibes
0:07:20.580,0:07:26.270
from this. Another good example is this[br]like VGA, like super trippy thing, which
0:07:26.270,0:07:29.909
is really impressive, 'cause this is[br]really small. This is less than 1 kilobyte
0:07:29.909,0:07:34.870
of code. It's in fact way less than 1[br]kilobyte, it's like 64 bytes. Or you just get
0:07:34.870,0:07:38.591
like interesting screen effects as well.[br]For example, it's quick, but like, you can
0:07:38.591,0:07:43.580
just watch the entire computer just[br]dissolve away, which also might be quite
0:07:43.580,0:07:47.929
worrying, if you weren't expecting that.[br]Alternatively, if the payload conditions
0:07:47.929,0:07:52.860
are not met, then, you know, you hook[br]syscalls and you, or alternatively, if you
0:07:52.860,0:07:56.870
want to be way more aggressive, as a[br]malware author, you scan for files on the
0:07:56.870,0:08:02.649
system to infect proactively. And the way[br]you infect DOS programs is pretty simple:
0:08:02.649,0:08:07.219
Imagine you have like one giant tape of[br]all the code you have for the target
0:08:07.219,0:08:11.499
program. Most of them work like this: They[br]replace the first 3 bytes of the program
0:08:11.499,0:08:16.909
with an x86 jump. They append their malware[br]onto the end of the executable, and so the
0:08:16.909,0:08:19.779
first thing that you do, when you run the[br]executable, is it jumps to the end of the
0:08:19.779,0:08:25.489
file, effectively, runs the malware chunk,[br]and then it optionally will return control
0:08:25.489,0:08:33.800
back to the original program. But there's[br]also the thing about hooking syscalls, right?
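The layout just described (a 3-byte jump at the start of the file that lands in code appended at the end) can be checked from the analysis side. The following is a minimal sketch, not anything shown in the talk: it assumes a .COM image, which DOS loads at offset 0x100, and handles only the 3-byte near jump (opcode 0xE9 followed by a signed 16-bit displacement). The function name and the toy image are hypothetical.

```python
import struct

COM_LOAD_OFFSET = 0x100  # DOS loads a .COM image at offset 0x100 of its segment


def entry_jump_target(image: bytes):
    """Return the file offset that the first instruction jumps to, or None.

    Only the 3-byte near jump (0xE9 + signed 16-bit displacement) is decoded.
    """
    if len(image) < 3 or image[0] != 0xE9:
        return None
    (disp,) = struct.unpack_from("<h", image, 1)
    # The displacement is relative to the instruction after the jump,
    # and the instruction pointer wraps at 16 bits.
    target_ip = (COM_LOAD_OFFSET + 3 + disp) & 0xFFFF
    return target_ip - COM_LOAD_OFFSET


# Toy image: jump, 13 bytes of filler standing in for the host, then a tail.
toy = b"\xE9" + struct.pack("<h", 13) + b"\x90" * 13 + b"\xCC" * 8
target = entry_jump_target(toy)
lands_in_file = target is not None and 0 <= target < len(toy)
```

A jump that lands close to the end of the file is only a hint: plenty of clean programs also begin with a jump.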
0:08:33.800,0:08:39.219
So, you know, MS-DOS is an[br]operating system, it does have syscalls,
0:08:39.219,0:08:43.779
programs can reach out to MS-DOS, to do[br]things like file access and stuff, so as
0:08:43.779,0:08:48.990
you expect, you run a software interrupt[br]to get there. Thankfully though, MS-DOS
0:08:48.990,0:08:55.829
does also allow you to extend MS-DOS by[br]adding handlers yourself, or even
0:08:55.829,0:08:59.029
overwriting existing handlers, which is[br]very convenient, if you are trying to
0:08:59.029,0:09:02.160
write drivers, but it's also incredibly[br]convenient, if you're trying to write
0:09:02.160,0:09:09.410
malware. For some of the examples of the[br]syscalls, most of them relevant towards
0:09:09.410,0:09:15.530
DOS virus making. Here's a decent example[br]of the things that DOS will provide you. A lot
0:09:15.530,0:09:21.180
of them are just very useful in general[br]for producing functional executables that
0:09:21.180,0:09:25.660
end users want to use. This is what an[br]average program looks like. This is almost
0:09:25.660,0:09:29.269
the shortest hello world you can make,[br]minus the actual hello world string. In
0:09:29.269,0:09:34.870
fact, the hello world string might be the[br]largest part of this binary. It's a pretty
0:09:34.870,0:09:40.480
simple binary. Here we're moving a[br]pointer to the message we just set. We
0:09:40.480,0:09:50.410
then set the AH register to 9, or hex 9.[br]That's the syscall for printing a string,
0:09:50.410,0:09:58.300
and then we run a software interrupt, 21h,[br]which is short for 21 hex, and we continue on.
0:09:58.300,0:10:06.589
We then set AH again, to 4C, which is[br]exit with a return code, and the program
0:10:06.589,0:10:12.439
will return. So, in the meantime, this is[br]roughly the loop that just happened.
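The hello world program walked through above can be written out byte by byte. This is a sketch under stated assumptions rather than the slide's actual listing: it assumes a .COM file (loaded at offset 0x100) and that the pointer to the message goes into DX, as the print-string call expects. The helper name `build_hello_com` is made up for illustration.

```python
import struct

ORG = 0x100  # .COM programs start executing at offset 0x100


def build_hello_com(message: bytes = b"Hello, world!$") -> bytes:
    """Assemble a tiny DOS .COM image that prints a '$'-terminated string."""
    code_len = 3 + 2 + 2 + 2 + 2          # size of the five instructions below
    msg_offset = ORG + code_len           # the message sits right after the code
    code = (
        b"\xBA" + struct.pack("<H", msg_offset)  # mov dx, msg
        + b"\xB4\x09"                            # mov ah, 09h (print string)
        + b"\xCD\x21"                            # int 21h
        + b"\xB4\x4C"                            # mov ah, 4Ch (exit with return code)
        + b"\xCD\x21"                            # int 21h
    )
    return code + message


image = build_hello_com()
```

The code is 11 bytes and the string is 14, which matches the remark that the message can be the largest part of the binary.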
0:10:12.439,0:10:18.470
You have your program code, that calls an[br]interrupt and that gets passed over to the
0:10:18.470,0:10:22.189
interrupt handler. In the process of doing[br]this, the CPU has quickly looked at the
0:10:22.189,0:10:28.430
first 100 bytes of memory in the interrupt[br]vector table, IVT, as it's abbreviated,
0:10:28.430,0:10:32.300
and then it's effectively a router. If[br]anyone has written like a small piece of
0:10:32.300,0:10:36.149
code to route HTTP requests, or anything,[br]it's basically like that, but in the 80s,
0:10:36.149,0:10:41.029
with syscalls. So it's just basically[br]saying "Compare this, compare that, jump
0:10:41.029,0:10:46.240
there, jump there." Then the thing gets[br]passed to the call handler, it goes and
0:10:46.240,0:10:49.740
does the syscall, the thing that was[br]required. Normally it will leave some
0:10:49.740,0:10:55.130
registers behind, a state, or results of[br]actions it has performed, and it returns
0:10:55.130,0:10:59.519
control back to the program. So,[br]theoretically speaking, if we wanted to go
0:10:59.519,0:11:04.199
and look at what a program actually does[br]we need to set a breakpoint here, because
0:11:04.199,0:11:11.030
this is the only place that we can be sure[br]the location exists, because this is way
0:11:11.030,0:11:15.760
before the era of ASLR, address space[br]layout randomisation, and this is way, way before
0:11:15.760,0:11:19.819
the era of kernel space randomisation, in[br]fact, MS-DOS has almost no memory
0:11:19.819,0:11:24.610
protection whatsoever. Once you run a[br]program you are basically putting the full
0:11:24.610,0:11:29.430
control of the system to that program,[br]which means you can happily also boot
0:11:29.430,0:11:33.870
things like Linux directly from a COM[br]file, which is handy if you want to
0:11:33.870,0:11:43.860
upgrade. So, if we look at certain files[br]we can go and see what they do. So in this
0:11:43.860,0:11:50.110
case, here is one example. This is a goat[br]file. A goat file is like a sacrificial
0:11:50.110,0:11:54.699
goat. It is a file that is purely designed[br]to be infected. So what you do is you
0:11:54.699,0:11:59.790
bring a virus into memory in the[br]system and then you run a goat file, in
0:11:59.790,0:12:03.879
the vague hope that the virus will infect[br]it, and then you have a nice clean sample
0:12:03.879,0:12:08.450
of just that virus and not another program[br]inside the virus, which makes it way
0:12:08.450,0:12:12.079
easier to test and reverse engineer. So,[br]we can see things are happening here. For
0:12:12.079,0:12:16.600
example, we can see it opening a file,[br]moving like where it's looking into the
0:12:16.600,0:12:19.770
file, reading some data from the file,[br]just 2 bytes, though, and it closes the
0:12:19.770,0:12:23.839
file. We see the same sort of thing repeat[br]itself, except at one point it reads a
0:12:23.839,0:12:27.529
large amount of data, moves the file[br]pointer, writes another large amount of
0:12:27.529,0:12:32.769
data, does some more stuff, and yeah, we[br]parse some filenames, we display a string,
0:12:32.769,0:12:39.230
which is almost definitely the goat file[br]message and yeah, we pretty much exit
0:12:39.230,0:12:42.860
after that. So, there were a few syscalls[br]here that we would really like to know
0:12:42.860,0:12:48.790
more about. So, for that, it's the open[br]files, we'd really like to know what files
0:12:48.790,0:12:52.870
were being opened. We would also like to[br]know what data
0:12:52.870,0:12:55.950
was being written to the file, rather than[br]having to fish it out of the virtual
0:12:55.950,0:13:00.550
machine later, and we'd also, just out of[br]curiosity, really want to know what
0:13:00.550,0:13:05.420
filenames it was asking MS-DOS to parse.[br]Display string is also a nice test to
0:13:05.420,0:13:08.519
know, whether your code is working. So to[br]do this you're gonna have to look a little
0:13:08.519,0:13:14.529
bit deeper into how the MS-DOS runtime[br]and, by proxy, how x86 in 16-bit mode
0:13:14.529,0:13:20.250
works, or legacy mode, I guess. This is[br]basically all the registers you have in
0:13:20.250,0:13:26.120
16-bit mode, and some nice computations at[br]the bottom, to make it easier to read.
0:13:26.120,0:13:33.550
So, as we mentioned, AH is the one that you[br]use to specify, which syscall you want,
0:13:33.550,0:13:40.339
and you'll notice it's not there. AH is[br]actually the upper half of AX. AH is an
0:13:40.339,0:13:46.320
8-bit register, because sometimes people[br]really just wanted only 8 bits. It's very
0:13:46.320,0:13:53.579
obscure that we were saving that much[br]space. And so, this is the
0:13:53.579,0:13:57.660
definition of the syscall for printing a[br]string. AH needs to be set to
0:13:57.660,0:14:02.839
9: in order to call the[br]syscall for printing a string, you set AH to
0:14:02.839,0:14:09.070
9, and then you need to set DS and DX to a[br]pointer to a string that ends in a dollar.
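To make the DS and DX pointer concrete, here is a small sketch of how a tracer might pull that string out of a memory dump. It assumes the real-mode rule the talk explains shortly afterwards (segment times 16, plus the offset) and a flat 1 MB memory image. The function names and the 100-byte read limit are illustrative choices, not the speaker's code.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_dollar_string(memory: bytes, ds: int, dx: int, limit: int = 100) -> bytes:
    """Read the '$'-terminated string that DS:DX points at (at most `limit` bytes)."""
    start = linear_address(ds, dx)
    chunk = memory[start:start + limit]
    end = chunk.find(b"$")
    return chunk if end == -1 else chunk[:end]


# Toy memory image with a message placed at 0100:0010.
ram = bytearray(0x100000)
ram[0x1010:0x1017] = b"Hello!$"
text = read_dollar_string(bytes(ram), ds=0x0100, dx=0x0010)
```

The dollar sign plays the role a NUL byte plays in C strings, so a string printed this way cannot itself contain one.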
0:14:09.070,0:14:11.890
And that doesn't make a lot of sense, or[br]it didn't make a lot of sense to me, when
0:14:11.890,0:14:15.579
I first read that and so, to do this,[br]we need to learn a little bit more about
0:14:15.579,0:14:19.730
how memory works, on these old CPUs, or[br]the CPUs that are probably in your
0:14:19.730,0:14:25.720
laptops, but running in an older mode. So[br]this is effectively what it looks like.
0:14:25.720,0:14:31.839
We have a 16-bit CPU, 2 to the 16 is 64[br]kilobytes, and we have a 20-bit memory
0:14:31.839,0:14:36.350
addressing space. 2 to 20 is 1 megabyte,[br]so if you ever see an MS-DOS machine like
0:14:36.350,0:14:39.519
limiting at 1 megabyte, or some old[br]operating system, saying like the maximum
0:14:39.519,0:14:43.980
memory you can have is 1 megabyte, it's[br]because it's running in 16-bit mode. And
0:14:43.980,0:14:50.249
the maximum it can physically see is 20[br]bits. So the question is: How do we
0:14:50.249,0:14:58.580
address anything above 64K? If the CPU can[br]only fundamentally see 16 bits. So, this
0:14:58.580,0:15:02.399
is where segment registers come in. We[br]have 4 segment registers, actually we
0:15:02.399,0:15:05.899
might have more, but these are the ones we[br]need to care about. There's the code
0:15:05.899,0:15:10.819
segment, the data segment, the stack[br]segment and the extra segment, in case you
0:15:10.819,0:15:15.420
need just another one. So anyway, with[br]that in mind, let's have a quick crash
0:15:15.420,0:15:21.419
course on segment registers. So, imagine[br]if you have a very long piece of memory,
0:15:21.419,0:15:30.430
and we can only see 16 bits at a time. So,[br]however, we can move the sliding window
0:15:30.430,0:15:36.180
around in the memory, to go and see, like,[br]to move our view of where it is. So, we
0:15:36.180,0:15:42.410
can do this and put data around the[br]system, and we can use the final pointer
0:15:42.410,0:15:48.589
to specify how far into the memory[br]segment we should go. So the DS and DX
0:15:48.589,0:15:55.360
really just means a multiplier. So, where[br]the data segment is 100, you need to just
0:15:55.360,0:16:01.350
move 100 times 16 to get to the correct[br]place in memory, and then DX is the
0:16:01.350,0:16:09.170
offset. This continues on, so, where we[br]have a 16-bit CPU, we have a bunch of
0:16:09.170,0:16:13.220
general use registers or general purpose[br]registers. They're quite useful for
0:16:13.220,0:16:17.379
ensuring, you don't need to touch RAM too[br]often. x86 actually has a fairly small
0:16:17.379,0:16:25.240
number of general purpose registers. Some[br]architectures have way more. I think more
0:16:25.240,0:16:32.139
modern chips like GPUs have hundreds, well[br]hundreds, maybe thousands. However, this
0:16:32.139,0:16:34.699
doesn't really change over time in x86[br]because we have to force backwards
0:16:34.699,0:16:38.139
compatibility. So, really what actually[br]ends up happening, when we move up the
0:16:38.139,0:16:42.709
bittage, is that the same registers just[br]get wider, and we add some more ones for
0:16:42.709,0:16:45.499
the programmers that want them, and the[br]exact same thing happened to 64-bit: The
0:16:45.499,0:16:52.970
registers just got wider. So thinking[br]about it, we have a lot of malware now,
0:16:52.970,0:16:58.319
what if we want to know everything that's[br]happened in this entire archive. So we
0:16:58.319,0:17:01.420
kind of want to trace all of these[br]automatically, but we might not know what
0:17:01.420,0:17:04.480
we're looking for, so let's go through the[br]checklist of what we need to do, to trace
0:17:04.480,0:17:09.335
all of this malware. We need to set a[br]breakpoint on the syscall handler. When we get
0:17:09.335,0:17:13.260
that breakpoint, we need to save all the[br]registers, so we know which syscall was
0:17:13.260,0:17:19.880
run and potentially what data is being[br]given to the syscall. Ideally, we're going
0:17:19.880,0:17:25.130
to save one hundred bytes from that data[br]pointer, not especially because we need
0:17:25.130,0:17:28.149
it, but it's quite handy for[br]a lot of syscalls. It's for
0:17:28.149,0:17:34.429
example what you use to get the open file[br]path, when you're opening files. We should
0:17:34.429,0:17:37.649
also, probably, record the screen for[br]quick analysis, rather than just staring
0:17:37.649,0:17:43.870
at HTML tables, and so we can do that, we[br]burn a lot of CPU time and probably cause
0:17:43.870,0:17:51.120
some minor amounts of environmental[br]damage. And we get nothing. We just run a
0:17:51.120,0:17:55.080
bunch of stuff and most of them don't[br]return anything. At best they return a
0:17:55.080,0:18:02.770
goat file string. They just do nothing.[br]So, if we look deeper into the reason why,
0:18:02.770,0:18:05.490
it's sort of a smoking gun here, so we can[br]see the syscalls that run on this file
0:18:05.490,0:18:09.840
that does nothing, and the smoking gun[br]here is the date. So it's asking for the
0:18:09.840,0:18:15.190
date from the system, and this sort of[br]flags out the first issue, is that a lot
0:18:15.190,0:18:18.750
of MS-DOS viruses don't really have a lot[br]to go on, because they have no internet
0:18:18.750,0:18:24.180
connection, and there's not really any[br]other state they can decide to activate on.
0:18:24.180,0:18:28.600
So the date syscall is pretty simple.[br]The get date and get time just return all
0:18:28.600,0:18:34.360
of their values as registers. And, you[br]know, some using the 8-bit halves, to save
0:18:34.360,0:18:44.970
space. So, a naive way of doing this[br]is that we would run the sample,
0:18:44.970,0:18:50.030
we'd wait for the syscall for date or[br]time, we would just fiddle the values,
0:18:50.030,0:18:53.240
'cause in this case we're using a debugger,[br]so we can automatically change, what the
0:18:53.240,0:18:56.760
state registers are, and we can then[br]observe to see, if any of the syscalls
0:18:56.760,0:18:59.580
that the program ran changed, which is a[br]pretty good indication that you've hit
0:18:59.580,0:19:04.330
some behavior that is different. And then,[br]you know, we can say "Hooray, we found a
0:19:04.330,0:19:08.330
new test case!" The downside is: running[br]every one of these samples takes 15
0:19:08.330,0:19:13.940
seconds of CPU-time because MS-DOS, well,[br]15 seconds of wall-time, which,
0:19:13.940,0:19:18.080
when you are emulating MS-DOS is 15[br]seconds of CPU-time because of the fact
0:19:18.080,0:19:20.610
that MS-DOS doesn't have power saving[br]mode, so when it's not doing anything, it
0:19:20.610,0:19:27.120
just goes into a busy loop which makes it[br]very hard to optimize. Or we could take a
0:19:27.120,0:19:33.350
cleverer look. So when we think about it,[br]we are in the interrupt handler where all
0:19:33.350,0:19:36.830
we ever see is the insides of the[br]interrupt handler because we don't know
0:19:36.830,0:19:40.990
where the program code is. The interrupt[br]handler is the only place that we know is
0:19:40.990,0:19:45.450
consistent because MS-DOS could[br]potentially load the code for the malware
0:19:45.450,0:19:50.610
or the program anywhere. But we want to[br]know where the code is. It would be really
0:19:50.610,0:19:54.250
handy to know what the code is that we'd[br]be about to run. So for this we need to
0:19:54.250,0:19:59.190
look towards the stack. Just like the DS and[br]DX registers, the stack is located by a
0:19:59.190,0:20:02.970
stack segment and a stack pointer.[br]Luckily, the first two values on the stack are the
0:20:02.970,0:20:07.130
instruction pointer and the[br]code segment, so we can use that to grab
0:20:07.130,0:20:10.779
exactly what code will be run[br]afterwards. So we just need to add a few
0:20:10.779,0:20:14.440
things to our checklist. We need to grab 4[br]bytes from the stack pointer and then
0:20:14.440,0:20:18.370
using that, we can calculate the[br]destination that the syscall will return
0:20:18.370,0:20:22.549
to. And if we look at some of them - we[br]can look at an example here - well, this
0:20:22.549,0:20:27.243
is a piece of what one of the calls[br]returns to. So we see we're running a compare
0:20:27.243,0:20:36.640
on DL against hex 0x1E. And then,[br]if that comparison is equal, it will
0:20:36.640,0:20:43.171
jump to one memory address. And if not, it[br]will jump to another. So if we look back
0:20:43.171,0:20:52.560
at the definition of those syscalls we can[br]see that DL is the day. So with this we
0:20:52.560,0:21:01.150
can conclude that, if 0x1E is 30 and DL[br]is the day, this malware effectively is
0:21:01.150,0:21:07.120
saying if the day of month is 30 we need[br]to go down a different path. If we run
0:21:07.120,0:21:11.950
these all over time across the whole[br]dataset what we see is roughly this as a
0:21:11.950,0:21:21.740
polydome bar chart. We see out of the 17,500[br]samples we have around 4,700 of them
0:21:21.740,0:21:24.330
checked for the date and time and these[br]are the ones that are really tricky
0:21:24.330,0:21:27.590
because they're really hard to activate.[br]They're also the most interesting though, because
0:21:27.590,0:21:33.900
those are the ones trying to hide. So, with[br]that in mind, we need to, we have the code
0:21:33.900,0:21:38.100
segment that we're about to run, when we[br]return and we can't really brute force
0:21:38.100,0:21:43.730
because it takes a lot of CPU-time and we[br]can't brute force it inside a 'real' or
0:21:43.730,0:21:47.419
emulated machine but we can brute force it[br]in a significantly more interesting way.
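As a rough sketch of what brute forcing outside the machine can look like: take the 4 bytes saved from the stack, work out where the syscall returns to, then evaluate the compare-and-jump found there for every possible day instead of re-running the sample. This assumes the stack holds the return offset first and the code segment second (the order the CPU pushes them for a software interrupt) and handles only the one instruction pair from the example, `cmp dl, imm8` followed by `je`. All function names are hypothetical, not the speaker's tooling.

```python
import struct


def return_address(stack_top: bytes) -> int:
    """Linear address the interrupt returns to, from the 4 bytes at SS:SP."""
    ip, cs = struct.unpack("<HH", stack_top[:4])  # offset first, then segment
    return ((cs << 4) + ip) & 0xFFFFF


def taken_branch(code: bytes, dl: int):
    """Evaluate `cmp dl, imm8` + `je rel8` for one DL value, or None if unsupported."""
    if len(code) < 5 or code[0:2] != b"\x80\xFA" or code[3] != 0x74:
        return None
    return "jump" if (dl & 0xFF) == code[2] else "fallthrough"


def days_that_jump(code: bytes) -> list:
    """Try every day of the month and keep the ones that take the jump."""
    return [day for day in range(1, 32) if taken_branch(code, day) == "jump"]


# The example from the talk: compare DL (the day) against 0x1E, then jump if equal.
snippet = bytes([0x80, 0xFA, 0x1E, 0x74, 0x05])
trigger_days = days_that_jump(snippet)
resume_at = return_address(bytes([0x34, 0x12, 0x00, 0x10]))
```

Checking 31 values this way costs microseconds, compared with roughly 15 seconds for each full run of a sample in the emulated machine.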
0:21:47.419,0:21:53.960
We need to build something: we need to[br]build the world's worst x86 emulator so
0:21:53.960,0:22:02.019
dubbed BenX86, it's 16-bit only. Any[br]attempt to access memory effectively ends
0:22:02.019,0:22:06.029
the simulation. It's got a fake stack: if[br]you try and push something onto the stack
0:22:02.019,0:22:06.029
it says "sure, fine"; if you try and pop it,[br]it's like "oh, actually I never held any of
0:22:06.029,0:22:09.640
that data anyway, so we are ending the[br]simulation". 80 opcodes, most of them are
0:22:13.690,0:22:18.900
jumps, because that's the primary[br]purpose: compares and jumps. The
0:22:18.900,0:22:23.630
difference is it logs every opcode and every[br]address that it went through, and it can be
0:22:23.630,0:22:29.210
run with just a small x86 code segment and[br]a register snapshot. This means that we
0:22:29.210,0:22:34.909
can test all dates from 1980 to 2005 in[br]roughly 100 milliseconds, and most
0:22:34.909,0:22:40.860
programs ended up having just 3 different[br]code paths on average so that yields us
0:22:40.860,0:22:48.019
with 17,000 virus samples and about 10,000[br]samples that had date variations as in:
0:22:48.019,0:22:53.539
Once you exploit the complexity. So I'm[br]going to now use my final remaining time
0:22:53.539,0:22:59.769
to go through some of my favorites. So[br]this is an example of a virus that just
0:22:59.769,0:23:04.440
doesn't do anything on the 1st of 1980.[br]However if you'd happen to be running this
0:23:04.440,0:23:08.477
on New Year's Day you would get this. [br]Laughter
0:23:08.477,0:23:10.610
No matter what you do, in every program, you can't
0:23:10.610,0:23:14.940
exit out of this, your machine is hung. This[br]might be great, right? You might be like:
0:23:14.940,0:23:19.040
'Oh cool, I don't need to do work anymore[br]because my computer will literally not let me'
0:23:19.040,0:23:21.049
This also might be terrible, because[br]you might need to do some work on New
0:23:21.049,0:23:28.100
Year's day. Here's another example. This[br]does nothing as well just another innocent
0:23:28.100,0:23:33.600
.com file. Of course, as a reminder, these[br]pieces of malware will be wrapped around
0:23:33.600,0:23:37.620
something else. Almost anything could be[br]infected in here. In this case though
0:23:37.620,0:23:46.880
this binary is nice and shaved down.[br]However, instead we get this, which I think
0:23:46.880,0:23:53.564
is super interesting and is basically the[br]author is aware - they're telling you they
0:23:53.564,0:23:57.110
are actually like self disclosing in[br]saying the previous year I've infected
0:23:57.110,0:24:04.800
your computer. And for some reason it's[br]being nice. They're just saying. Actually
0:24:04.800,0:24:11.580
you have been infected. And as a - I guess a[br]pity - I'm just going to remove myself now.
0:24:11.580,0:24:17.120
I don't really. For some reason it's also[br]encouraging you to buy McAfee. This is
0:24:17.120,0:24:26.179
back in the day when John McAfee himself[br]actually wrote McAfee. Interesting times.
0:24:26.179,0:24:33.059
Definitely interesting times. Here is[br]another example. This one I found
0:24:33.059,0:24:41.450
particularly obscure. On the 8th of[br]November 1980 or any year I think actually
0:24:41.450,0:24:51.110
it turns all zeroes on the system into[br]tiny little glyphs that say "hate" if
0:24:51.110,0:24:54.760
anyone understands this I'd really like to[br]know like I've been thinking about this a
0:24:54.760,0:25:01.950
lot. What does it mean? Is it an artistic[br]statement? Is it. I wish I knew.
0:25:01.950,0:25:05.669
Someone in the audience: It says MATE[br]Ben: There could be a CCC variant that says
0:25:05.669,0:25:12.630
MATE. Another good one in that it's the[br]last thing I ever want to see any program
0:25:12.630,0:25:19.669
tell me is this one here where you run it[br]and it says "error eating drive C:". I
0:25:19.669,0:25:25.070
never ever want an error in any program[br]that unexpectedly just says 'Sorry, I
0:25:25.070,0:25:30.159
failed to remove your root file system,[br]don't know why, could you like change your
0:25:30.159,0:25:35.940
settings so I can remove it?' Cheers. And[br]finally this is one of my absolute
0:25:35.940,0:25:41.420
favorites in that it's just brilliant in[br]that it also stops you from running the
0:25:41.420,0:25:46.490
program you want to run: it exits[br]prematurely. This is the virus version of
0:25:46.490,0:25:50.607
the Navy SEAL copypasta. It says "I am an[br]assassin. I want to and I shall kill you."
0:25:50.607,0:25:59.809
"I also hate Aladdin and I also will kill[br]it. I will eliminate you with ...". You know where
0:25:59.809,0:26:04.880
this is going. It says fear[br]the virus that is more powerful than God.
0:26:04.880,0:26:10.830
It only activates on one day though, so[br]it's fine. Thank you for your time. I know
0:26:10.830,0:26:15.480
it's late and I will happily take any[br]questions or corrections if you know this
0:26:15.480,0:26:27.029
topic better than me.[br]Applause
0:26:27.029,0:26:33.410
Herald: This totally brings tears to my[br]eyes with nostalgia. So if there are any
0:26:33.410,0:26:37.970
questions, we have microphones distributed around[br]the room, there is like 1, 2, 3, 4 and
0:26:37.970,0:26:42.630
one in the back. We also have questions[br]perhaps from the internet. If you want to
0:26:42.630,0:26:47.980
ask a question, come up to the microphone and[br]ask the question. Just as a reminder, a
0:26:47.980,0:26:53.789
question is one or two sentences with a[br]question mark behind it and not a life
0:26:53.789,0:27:00.840
story attached. So let's see what we have.[br]I'm going to start with microphone number
0:27:00.840,0:27:04.470
1 just because I can see it easiest, let's[br]go for it.
0:27:04.470,0:27:09.559
Microphone 1: Hi Ben, thanks for the talk.[br]Really interesting. My question would be
0:27:09.559,0:27:16.297
did you do any analysis on what ratio of[br]the viruses was more artistic
0:27:16.297,0:27:20.690
and which ones actually did damage?[br]Ben: So most of them surprisingly don't do
0:27:20.690,0:27:26.450
damage. I actually really struggled to[br]find a date-varying sample that
0:27:26.450,0:27:30.140
specifically activated on a certain day[br]and decided to delete every file. There
0:27:30.140,0:27:35.259
are some very good ones. Some of them[br]are like virus scanning utilities that just
0:27:35.259,0:27:37.990
don't do anything on certain dates and on[br]one day, like, while they're telling you all
0:27:37.990,0:27:41.120
the files they are scanning, they're actually[br]telling you all the files they're
0:27:41.120,0:27:46.120
deleting. So that's particularly cruel but[br]it's actually surprisingly hard to find a
0:27:46.120,0:27:50.480
virus sample that actually was brutally[br]malicious. There were some that would just,
0:27:50.480,0:27:53.910
you know, infect binaries, but it's very hard[br]to find one that I think was brutally
0:27:53.910,0:27:58.100
malicious, which is a far cry from, well,[br]the days that we live in right
0:27:58.100,0:28:03.549
now, where we're taking down hospitals with[br]Windows bugs.
0:28:03.549,0:28:09.210
Herald: As everybody is leaving the room,[br]please do it quietly. I see a question at
0:28:09.210,0:28:12.200
(microphone) 3, on that side.[br]Microphone 3: Yes. Since a lot of
0:28:12.200,0:28:19.970
industrial control systems still run DOS,[br]what's the threat from DOS malware that
0:28:19.970,0:28:27.150
might be written today?[br]Ben: It's probably unlikely that an
0:28:27.150,0:28:31.009
industrial control system that's running[br]DOS would come into contact with DOS malware.
0:28:31.009,0:28:36.010
The only way I can think of is if a vendor,[br]or a factory or supplier or
0:28:36.010,0:28:41.049
whatever it was, was basically downloading[br]wares onto industrial control
0:28:41.049,0:28:47.419
boxes. I wouldn't be surprised but it[br]would be pretty irresponsible. But it
0:28:47.419,0:28:52.510
would be quite surprising to find MS-DOS[br]malware today on industrial controllers
0:28:52.510,0:28:57.110
that was installed recently and not just a[br]lingering infection from the last 20
0:28:57.110,0:29:00.029
years.[br]Herald: Microphone 2
0:29:00.029,0:29:05.000
Microphone 2: Did you find any conditions that weren't date-based?[br]Ben: Some of them do
0:29:05.000,0:29:09.610
attempt to... some of them try and circumvent[br]the date recognition. Unfortunately it's
0:29:09.610,0:29:12.809
very hard to brute force those. Some of[br]them install themselves as what's called a
0:29:12.809,0:29:19.710
TSR, or Terminate and Stay Resident, which[br]basically means that they will exit out,
0:29:19.710,0:29:23.750
run in the background and continuously ask[br]the actual system timer what time it is.
0:29:23.750,0:29:27.639
It's a bit of a more risky strategy[br]because the system timer might not exist
0:29:27.639,0:29:31.650
which would be unfortunate for the virus.[br]So definitely there are viruses that have
0:29:31.650,0:29:38.340
way more complicated execution conditions.[br]I observed one sample that only activated
0:29:38.340,0:29:43.850
after I believe it was something silly[br]like 100 keypresses which is very hard to
0:29:43.850,0:29:49.770
automatically test. Those sorts of viruses[br]require static analysis and statically
0:29:49.770,0:29:54.480
analyzing 17,000 samples is a time-[br]consuming task.
0:29:54.480,0:30:02.009
Herald: So we have a question from the Internet.[br]Signal Angel: Do you have the source? What
0:30:02.009,0:30:07.990
is the source of the malware that you[br]analyzed here, is it published somewhere?
0:30:07.990,0:30:13.400
Ben: You can still find dumps of VX[br]Heavens, and more modern dumps of VX
0:30:13.400,0:30:17.990
Heavens on popular torrent websites.[br]But I'm sure there are also copies
0:30:17.990,0:30:21.399
floating about on non-popular torrent[br]websites.
0:30:21.399,0:30:24.810
Laughter[br]Herald: Over to microphone 1.
0:30:24.810,0:30:32.240
Microphone 1: Hi Ben. I'm Jope. Thank you[br]for your talk. I was wondering: did you
0:30:32.240,0:30:36.639
learn anything from your studies of these[br]viruses that should be taught in modern
0:30:36.639,0:30:42.820
day computer science classes, like a more[br]efficient sorting algorithm or some hidden
0:30:42.820,0:30:47.080
gem that actually should be part of[br]computing these days?
0:30:47.080,0:30:53.570
Ben: My primary takeaway was x86 was a[br]mistake.
0:30:53.570,0:31:01.320
Laughter & applause[br]Herald: So I'm not seeing any more
0:31:01.320,0:31:04.480
questions. Oh no there is. OK one more[br]question from the internet.
0:31:04.480,0:31:11.389
Signal angel: Have you found malware[br]samples that did like try to detect dummy
0:31:11.389,0:31:14.617
binaries or whatever, to avoid easy[br]analysis?
0:31:14.617,0:31:20.007
Ben: Oh, actually, that's a really good question.[br]So it is... it's complicated.
0:31:20.007,0:31:24.580
So some viruses would so, maybe let's be
0:31:25.027,0:31:29.770
dangerous, let's try and go backwards on my[br]home-written presentation software. So
0:31:29.770,0:31:41.160
humming Too many slides. I have[br]regrets. Yes. OK. Here we are. This slide.
0:31:41.160,0:31:45.450
OK. So you know here I'm saying that the[br]malware infection goes to the end. Well
0:31:45.450,0:31:49.850
some samples are really cool. They don't[br]change the size of the file. They just
0:31:49.850,0:31:54.590
find areas in the files that are full of[br]null bytes and just say this is probably
0:31:54.590,0:32:00.230
fine. I'm just going to put myself here[br]which may have unintended consequences. It
0:32:00.230,0:32:04.960
may mean, if a program has like a statically[br]typed, statically defined byte array of
0:32:04.960,0:32:10.039
like a certain size and the program is[br]relying on it being zeros when it accesses
0:32:10.039,0:32:14.440
it for the first time it may get very[br]surprised to find some malware code in
0:32:14.440,0:32:20.159
there. But generally speaking as far as[br]I'm aware, this deployment
0:32:20.159,0:32:26.220
procedure works pretty well and actually[br]is very good at avoiding antivirus of the
0:32:26.220,0:32:30.390
era, which would just be checking like[br]common system files and their size. And you
0:32:30.390,0:32:35.059
know, if the size of COMMAND.COM increases,[br]then that's clearly bad news.
0:32:35.059,0:32:38.450
Herald: We have a question on microphone[br]1.
0:32:38.450,0:32:45.620
Microphone 1: Are there any viruses that[br]try to eliminate or manipulate virus
0:32:45.620,0:32:48.970
scanners of the day?[br]Ben: Oh yeah. So a lot of the samples will
0:32:48.970,0:32:52.960
actively go and look for files of other[br]anti-viruses.
0:32:52.960,0:32:57.159
But I am generally under the impression[br]that it's kind of hard to find them. There
0:32:57.159,0:33:01.750
weren't actually that many antivirus[br]products back in the day.
0:33:01.750,0:33:06.410
I feel like, it was a bit of a niche thing to[br]be running. Microsoft did for a while ship
0:33:06.410,0:33:14.330
their own antivirus with MS-DOS. So I[br]guess you know what's new is old. So there
0:33:14.330,0:33:17.860
were antiviruses out there. I don't think[br]many of them were very effective.
0:33:17.860,0:33:27.260
Herald: Any more questions? There, where?[br]Oh right. Another one from the Internet.
0:33:27.260,0:33:32.049
It's interesting that the internet is[br]querying MS-DOS all the time. Go ahead.
0:33:32.049,0:33:38.000
Signal angel: Did you do the diagrams by[br]hand or do you have a tool?
0:33:38.000,0:33:42.559
Ben: So many hours. No. So there's a[br]couple of good tools to do it.
0:33:42.559,0:33:46.429
asciiflow.org, I think, is a fantastic[br]tool. I would highly recommend it. I think
0:33:46.429,0:33:52.779
it's not maintained very well, though.[br]Herald: Microphone 1.
0:33:52.779,0:33:55.519
Microphone 1: Are you publishing the tools[br]you wrote?
0:33:55.519,0:34:02.429
Ben: I will be publishing the tools at[br]some point when they are less... when they
0:34:02.429,0:34:08.320
are less ugly. I will be publishing all of[br]the automatic malware runs and the gifs
0:34:08.320,0:34:12.929
generated by them so that people can[br]easily search Google for the virus names
0:34:12.929,0:34:16.890
and get like actual real-time versions.[br]The hardest thing that I've found when
0:34:16.890,0:34:21.710
looking at virus names was literally just[br]finding any information about them and one
0:34:21.710,0:34:25.220
of the things I really wish existed at the[br]time of writing this talk, was being able
0:34:25.220,0:34:29.580
to just query a name and be like oh yeah[br]this virus it looks like it does this.
0:34:29.580,0:34:33.420
Herald: Since I saw microphone 1 first,[br]let's go with that.
0:34:33.420,0:34:40.260
Microphone 1: Did you find any viruses[br]that had signage in them not signage of
0:34:40.260,0:34:43.520
today but the name of the author. Like he[br]was very proud of what he wrote.
0:34:43.520,0:34:47.450
Ben: Yeah, there are some notable[br]examples. Quite a few of them will try and
0:34:47.450,0:34:52.870
name - so DOS-viruses do like have[br][incomprehensible] sample names in the same way
0:34:52.870,0:34:57.470
that we'd still today give viruses names.[br]A lot of the time you will just encode a
0:34:57.470,0:35:01.131
string that you want the virus to be[br]named, you know, somewhere in the file
0:35:01.131,0:35:04.472
just a random string doing nothing. It's[br]like oh, ok, they clearly wanted the virus
0:35:04.472,0:35:11.430
to be called Tempest. So that does happen.[br]One of the favorite examples is the Brain
0:35:11.430,0:35:16.750
malware, which literally encodes an address[br]and phone number of the author, I believe
0:35:16.750,0:35:22.720
in Pakistan, and there's a fantastic mini[br]documentary by F-Secure where they go and
0:35:22.720,0:35:25.850
visit the people who wrote it. It's a[br]super interesting watch and I would really
0:35:25.850,0:35:29.990
recommend it.[br]Herald: Indeed it is. Microphone 2?
0:35:29.990,0:35:36.260
Microphone 2: Did you have any chance to[br]look at any kind of viruses that did not
0:35:36.260,0:35:42.330
modify the files themselves. For example[br]one of the largest virus infections at the time was a
0:35:42.330,0:35:46.080
virus called [incomprehensible] which modified[br]the master boot record.
0:35:46.080,0:35:51.060
Ben: Yes, Master boot record, I did[br]consider. It was more of a time problem
0:35:51.060,0:35:55.320
that I had in getting to the point where[br]you could brute force time and date
0:35:55.320,0:36:01.020
combinations and looking for master boot[br]record changes. It was really hard. I am
0:36:01.020,0:36:06.610
super interested in reviewing, effectively,[br]the rootkits of the era. But yes that's
0:36:06.610,0:36:10.220
definitely something I will look into in[br]the future.
0:36:10.220,0:36:14.410
Herald: And we have yet another question[br]from the Internet.
0:36:14.410,0:36:17.400
Signal angel: And it's even from the same[br]guy.
0:36:17.400,0:36:22.830
Ben: Oh damn.[br]Signal angel: Is the BenX86 software open-
0:36:22.830,0:36:25.530
source, or can it be found on the web[br]somewhere?
0:36:25.530,0:36:29.870
Ben: It probably will be. I wouldn't[br]expect it to work in, well, in any use-case
0:36:29.870,0:36:36.360
though. It's effectively designed to like[br]not work correctly, right? Like what
0:36:36.360,0:36:40.880
was the spec? It basically like fails at[br]every single thing awkward. I just went
0:36:40.880,0:36:46.660
like oh that's fine. We're probably far[br]enough down there anyway. Are we? Be aware
0:36:46.660,0:36:50.740
this is the feature list.[br]Herald: So is that a follow up question
0:36:50.740,0:36:57.010
from the internet?[br]Signal angel: No it's a new one. I don't
0:36:57.010,0:37:02.660
know how serious it is but would it be[br]possible or a good idea to use machine
0:37:02.660,0:37:09.500
learning to create new DOS malware from[br]the existing samples?
0:37:09.500,0:37:17.021
Laughter & applause[br]Ben: It would not be a good idea. But I
0:37:17.021,0:37:24.230
like how you think.[br]Herald: Actually I saw somebody trying to
0:37:24.230,0:37:27.640
use NLP to generate viruses but ok that's[br]enough for now.
0:37:27.640,0:37:32.400
Ben: You could probably do Markov chains[br]with x86 to be honest. Please don't do
0:37:32.400,0:37:34.530
that, please![br]Herald: Don't try this at home.
0:37:34.530,0:37:37.480
Ben: I have seen things I've seen. Just[br]please don't do that.
0:37:37.480,0:37:43.461
Herald: So I think we've run out of[br]questions. Going once, going twice. Let's
0:37:43.461,0:37:49.520
thank Ben for this marvelous retrospective[br]talk.[br]Big applause
0:37:49.520,0:37:58.785
35C3 postroll music
0:37:58.785,0:38:12.000
subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!