WEBVTT 00:00:00.000 --> 00:00:17.860 35C3 Intro music 00:00:17.860 --> 00:00:23.065 Herald Angel: OK. So this talk is called "A deep dive into the world of DOS 00:00:23.065 --> 00:00:33.500 viruses" and if you happened to be at the 8C3, that is 27 years ago, you would have 00:00:33.500 --> 00:00:38.599 seen a very young and awkward, even more awkward than I am at the moment, version 00:00:38.599 --> 00:00:46.120 of myself, speaking on basically the same subject. The stage of course was a lot 00:00:46.120 --> 00:00:50.491 smaller than this, this would have really intimidated me back then, but I was 00:00:50.491 --> 00:00:55.160 talking about a university project that we had run for about 3 years at that point, 00:00:55.160 --> 00:01:05.500 and our possibilities were very limited. Meanwhile, 27 years later, our speaker, in 00:01:05.500 --> 00:01:13.040 between fighting battleships over the public BGP network and trying to encode 00:01:13.040 --> 00:01:18.690 data in dubstep music, was able to actually do all of the stuff that we were 00:01:18.690 --> 00:01:25.650 trying to do, with a lot of effort, basically, and I guess 4 hours of CPU time 00:01:25.650 --> 00:01:32.610 or something like that. Please help me in welcoming Ben to our stage, to talk about 00:01:32.610 --> 00:01:35.820 a bygone era. Applause 00:01:35.820 --> 00:01:40.920 Applause 00:01:40.920 --> 00:01:48.340 Ben: Thank you. Hi, I'm Ben Cartwright-Cox, as the slide suggests. So I have an 00:01:48.340 --> 00:01:53.100 admission to make: So this is a thing to be aware of. 00:01:53.100 --> 00:01:56.970 Laughter Ben: And you know, things also to be aware 00:01:56.970 --> 00:02:07.110 of. Anyway. So what is DOS? To get straight into it. You can do it in a 00:02:07.110 --> 00:02:10.947 bullet points way. You know, DOS is an upgrade from CP/M, another very old legacy 00:02:10.947 --> 00:02:14.819 system, but another thing to be aware of is that DOS covers a wide range of 00:02:14.819 --> 00:02:19.950 vendors. 
Might not just be like those old IBM PCs. Some of the DOSes had 00:02:19.950 --> 00:02:23.950 compatibility with each other, meaning that some of the DOSes had shared malware 00:02:23.950 --> 00:02:31.390 with each other. But to be honest, most people know DOS as these lovely old beige 00:02:31.390 --> 00:02:37.709 boxes; the same era gave us our beloved Model M keyboard. Hated by some, loved by 00:02:37.709 --> 00:02:42.840 others, for the sound. But, you know, most people's knowledge of DOS came from 00:02:42.840 --> 00:02:59.599 computers, a user interface that looked like this. Pretty basic. Okay so this is 00:02:59.599 --> 00:03:04.340 Wordstar, some of you may not know that Game of Thrones was written on Wordstar. 00:03:04.340 --> 00:03:09.281 George R. R. Martin is apparently not a big fan of modern word processing. He 00:03:09.281 --> 00:03:16.340 admitted he had some issue with how spell checking worked, so he just uses that, 00:03:16.340 --> 00:03:18.700 and I also guess it's a good security quality, you know, you can't get hacked, 00:03:18.700 --> 00:03:24.680 if it literally has no Internet access. So, also though, for a lot of people this 00:03:24.680 --> 00:03:28.310 is also their first experience of programming, for some of the older 00:03:28.310 --> 00:03:36.500 crowd. This is also the invention of QBasic, which, you know, gave a very basic 00:03:36.500 --> 00:03:40.940 language to program creatively in DOS. For some people this was the gateway drug into 00:03:40.940 --> 00:03:47.160 programming and perhaps the gateway drug into what they started as a career. For 00:03:47.160 --> 00:03:52.800 other people the experience of DOS was not so great. For example, you know, let's 00:03:52.800 --> 00:03:57.640 just say you were doing some work in an infinite loop and at some point stuff like 00:03:57.640 --> 00:04:04.001 this happens. 
Unfortunately I don't have sound for this one, but you can just, in 00:04:04.001 --> 00:04:09.200 your head, imagine like our PC speakers playing some small techno music, on like, 00:04:09.200 --> 00:04:14.310 you know, but only one frequency at a time. This might get especially incredibly 00:04:14.310 --> 00:04:18.589 embarrassing, if you are in an office environment, just slowly beeping away. You 00:04:18.589 --> 00:04:22.770 can't exit this. It has to finish fully and if you touch the keyboard it reminds you 00:04:22.770 --> 00:04:30.069 not to touch the keyboard, and continues playing this music. So, you know, this would be 00:04:30.069 --> 00:04:34.319 fun, but this wouldn't be fun, especially in an office environment. But, you know, 00:04:34.319 --> 00:04:40.339 ultimately it's not malicious. And that trend continues. This is another good 00:04:40.339 --> 00:04:45.240 example of a DOS virus. This is ambulance, for when you run it, an ambulance just 00:04:45.240 --> 00:04:50.589 drives past and then your normal program just continues running. I think this is 00:04:50.589 --> 00:04:56.729 amazing, it's an interesting era of viruses. It was all, the history of it was 00:04:56.729 --> 00:05:01.270 collected very well by a website called VX heavens, which sort of still lives, but 00:05:01.270 --> 00:05:06.629 unfortunately, at one point was raided by the Ukrainian police, for what is the 00:05:06.629 --> 00:05:11.469 fantastic wording they used. Basically, someone told them they were distributing 00:05:11.469 --> 00:05:16.770 Malware. Unfortunately not malware that operates in this century. But I guess 00:05:16.770 --> 00:05:21.710 that's good enough for a raid. But luckily for the archivists there are archivists of 00:05:21.710 --> 00:05:28.809 archivists, and so we have a saved capture of VX heavens. 
This is actually an old 00:05:28.809 --> 00:05:32.770 snapshot, there are way more modern snapshots, but thankfully the MS DOS virus 00:05:32.770 --> 00:05:38.189 era doesn't move very quickly. So, but the interesting thing here is, like, there's 00:05:38.189 --> 00:05:44.349 66000 items in this tarball and it's 6.6 gigabytes of code. And these viruses are 00:05:44.349 --> 00:05:48.580 like super dense. There's not much to them, like they are just blobs of machine 00:05:48.580 --> 00:05:51.520 code. They are not like your Electron app these days that ships an entire Chrome 00:05:51.520 --> 00:05:57.219 browser, and normally an out of date Chrome browser, you know, this is just 00:05:57.219 --> 00:06:00.429 basic, like, you know, how to draw an ambulance and, you know, some infection 00:06:00.429 --> 00:06:06.629 routines. The normal distribution also changes with it as well. For example, the 00:06:06.629 --> 00:06:11.059 normal lifecycle of an MS DOS virus is, you know, you download, or for some other 00:06:11.059 --> 00:06:17.560 reason run an infected program that presumably does nothing; to you it looks 00:06:17.560 --> 00:06:22.129 like it does nothing, so, you know, remains roughly undetected. Then you go 00:06:22.129 --> 00:06:27.830 and run more files, the DOS virus infects more files and at some point you're 00:06:27.830 --> 00:06:31.069 probably going to give one of those executables to some other computer, or some 00:06:31.069 --> 00:06:35.409 other person, whether it was by giving someone or copying a floppy disk of some 00:06:35.409 --> 00:06:38.880 software, maybe some expensive software, so they didn't have to pay for it, or 00:06:38.880 --> 00:06:44.900 uploading it to a BBS, where it could be downloaded by many people. 
So the 00:06:44.900 --> 00:06:49.689 distribution mechanism is a far cry from the EternalBlues of this era, where, you 00:06:49.689 --> 00:06:54.449 know, we can have a strain of malware spread across the world very brutally, 00:06:54.449 --> 00:07:01.709 very quickly. So most DOS viruses are pretty simple: They start, they say "have 00:07:01.709 --> 00:07:06.839 my payload conditions been met?" If not, then they'll go on; if they are 00:07:06.839 --> 00:07:11.799 met they'll go and display the payload. And the payloads are definitely more, 00:07:11.799 --> 00:07:16.949 I don't know, nice. You know, you have stuff like this, which is pretty and it uses VGA 00:07:16.949 --> 00:07:20.580 colors and all sorts of pretty nice stuff. You get also some very demoscene vibes 00:07:20.580 --> 00:07:26.270 from this. Another good example is this like VGA, like super trippy thing, which 00:07:26.270 --> 00:07:29.909 is really impressive, 'cause this is really small. This is less than 1 kilobyte 00:07:29.909 --> 00:07:34.870 of code. It's in fact way less than 1 kilobyte, it's like 64k. Or you just get 00:07:34.870 --> 00:07:38.591 like interesting screen effects as well. For example, it's quick, but like, you can 00:07:38.591 --> 00:07:43.580 just watch the entire computer just dissolve away, which also might be quite 00:07:43.580 --> 00:07:47.929 worrying, if you weren't expecting that. Alternatively, if the payload conditions 00:07:47.929 --> 00:07:52.860 are not met, then, you know, you hook syscalls and you, or alternatively, if you 00:07:52.860 --> 00:07:56.870 want to be way more aggressive, as a malware author, you scan for files on the 00:07:56.870 --> 00:08:02.649 system to infect proactively. And the way you infect DOS programs is pretty simple: 00:08:02.649 --> 00:08:07.219 Imagining you have like one giant tape of all the code you have for the target 00:08:07.219 --> 00:08:11.499 program. 
Most of them work like this: They replace the first 3 bytes of the program 00:08:11.499 --> 00:08:16.909 with an x86 jump. They append their malware onto the end of the executable, and so the 00:08:16.909 --> 00:08:19.779 first thing that you do, when you run the executable, is it jumps to the end of the 00:08:19.779 --> 00:08:25.489 file, effectively, runs the malware chunk, and then it optionally will return control 00:08:25.489 --> 00:08:33.800 back to the original program. But there's also the thing about hooking syscalls, right? 00:08:33.800 --> 00:08:39.219 So, you know, MS-DOS is an operating system, it does have syscalls, 00:08:39.219 --> 00:08:43.779 programs can reach out to MS-DOS, to do things like file access and stuff, so as 00:08:43.779 --> 00:08:48.990 you expect, you run a software interrupt to get there. Thankfully though, MS-DOS 00:08:48.990 --> 00:08:55.829 does also allow you to extend MS-DOS by adding handlers itself, or even 00:08:55.829 --> 00:08:59.029 overwriting existing handlers, which is very convenient, if you are trying to 00:08:59.029 --> 00:09:02.160 write drivers, but it's also incredibly convenient, if you're trying to write 00:09:02.160 --> 00:09:09.410 malware. For some of the examples of the syscalls, most of them relevant towards 00:09:09.410 --> 00:09:15.530 DOS virus making. Here's a decent example of the things that DOS will provide you. A lot 00:09:15.530 --> 00:09:21.180 of them are just very useful in general for producing functional executables the 00:09:21.180 --> 00:09:25.660 end users want to use. This is what an average program looks like. This is almost 00:09:25.660 --> 00:09:29.269 the shortest hello world you can make, minus the actual hello world string. In 00:09:29.269 --> 00:09:34.870 fact, the hello world string might be the largest part of this binary. It's a pretty 00:09:34.870 --> 00:09:40.480 simple binary. Here we're moving a pointer to the message we just set. 
We 00:09:40.480 --> 00:09:50.410 then set the AH register to 9, or hex 9. That's the syscall for printing a string, 00:09:50.410 --> 00:09:58.300 and then we run a software interrupt, 21h, which is short for 21 hex, and we continue on. 00:09:58.300 --> 00:10:06.589 We then set AH again, to 4C, which is exit with a return code, and the program 00:10:06.589 --> 00:10:12.439 will return. So, in the meantime, this is roughly the loop that just happened. 00:10:12.439 --> 00:10:18.470 You have your program code, that calls an interrupt and that gets passed over to the 00:10:18.470 --> 00:10:22.189 interrupt handler. In the process of doing this, the CPU has quickly looked at the 00:10:22.189 --> 00:10:28.430 first 100 bytes of memory in the interrupt vector table, IVT, as it's abbreviated, 00:10:28.430 --> 00:10:32.300 and then it's effectively a router. If anyone has written like a small piece of 00:10:32.300 --> 00:10:36.149 code to route HTTP requests, or anything, it's basically like that, but in the 80s, 00:10:36.149 --> 00:10:41.029 with syscalls. So it's just basically saying "Compare this, compare that, jump 00:10:41.029 --> 00:10:46.240 there, jump there." Then the thing gets passed to the call handler, it goes and 00:10:46.240 --> 00:10:49.740 does the syscall, the thing that was required. Normally it will leave some 00:10:49.740 --> 00:10:55.130 registers behind, a state, or results of actions it has performed, and it returns 00:10:55.130 --> 00:10:59.519 control back to the program. 
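The routing step described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the talk: it assumes the standard real-mode layout, where the interrupt vector table starts at address 0 and each vector is four bytes (a 16-bit offset followed by a 16-bit segment), and the handler location `0019:40EB` is a made-up value for the example.

```python
import struct

IVT_ENTRY_SIZE = 4  # assumed layout: 16-bit offset, then 16-bit segment


def read_vector(memory, interrupt):
    """Return (segment, offset) stored in the IVT for one interrupt number."""
    offset, segment = struct.unpack_from("<HH", memory, interrupt * IVT_ENTRY_SIZE)
    return segment, offset


def linear(segment, offset):
    """Combine segment and offset into a 20-bit real-mode address."""
    return ((segment << 4) + offset) & 0xFFFFF


# Fake first kilobyte of memory with a hypothetical INT 21h handler at 0019:40EB.
memory = bytearray(1024)
struct.pack_into("<HH", memory, 0x21 * IVT_ENTRY_SIZE, 0x40EB, 0x0019)

segment, offset = read_vector(memory, 0x21)
handler_address = linear(segment, offset)
print(f"INT 21h -> {segment:04X}:{offset:04X} (linear {handler_address:05X})")
```

The address computed this way is where a breakpoint for tracing DOS syscalls would go, since every `int 21h` ends up there regardless of where the calling program was loaded.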
So, theoretically speaking, if we wanted to go 00:10:59.519 --> 00:11:04.199 and look at what a program actually does we need to set a break point here, because 00:11:04.199 --> 00:11:11.030 this is the only place that we can be sure the location exists, because this is way 00:11:11.030 --> 00:11:15.760 before the era of ASLR, address space randomisation, and this is way, way before 00:11:15.760 --> 00:11:19.819 the era of kernel space randomisation, in fact, MS DOS has almost no memory 00:11:19.819 --> 00:11:24.610 protection whatsoever. Once you run a program you are basically putting the full 00:11:24.610 --> 00:11:29.430 control of the system to that program, which means you can happily also boot 00:11:29.430 --> 00:11:33.870 things like Linux directly from a COM file, which is handy if you want to 00:11:33.870 --> 00:11:43.860 upgrade. So, if we look at certain files we can go and see what they do. So in this 00:11:43.860 --> 00:11:50.110 case, here is one example. This is a goat file. A goat file is like a sacrificial 00:11:50.110 --> 00:11:54.699 goat. It is a file that is purely designed to be infected. So what you do is you 00:11:54.699 --> 00:11:59.790 bring a virus into memory in the system and then you run a goat file, in 00:11:59.790 --> 00:12:03.879 the vague hope that the virus will infect it, and then you have a nice clean sample 00:12:03.879 --> 00:12:08.450 of just that virus and not another program inside the virus, which makes it way 00:12:08.450 --> 00:12:12.079 easier to test and reverse engineer. So, we can see things are happening here. For 00:12:12.079 --> 00:12:16.600 example, we can see it opening a file, moving like where it's looking into the 00:12:16.600 --> 00:12:19.770 file, reading some data from the file, just 2 bytes, though, and it closes a 00:12:19.770 --> 00:12:23.839 file. 
We see the same sort of thing repeat itself, except at one point it reads a 00:12:23.839 --> 00:12:27.529 large amount of data, moves the file pointer, writes another large amount of 00:12:27.529 --> 00:12:32.769 data, does some more stuff, and yeah, we pass some filenames, we display a string, 00:12:32.769 --> 00:12:39.230 which is almost definitely the goat file message and yeah, we pretty much exit 00:12:39.230 --> 00:12:42.860 after that. So, there were a few syscalls here that we would really like to know 00:12:42.860 --> 00:12:48.790 more about. So, for that, it's the open files, we'd really like to know what files 00:12:48.790 --> 00:12:52.870 were being opened. We would also want to know what, we'd like to know, what data 00:12:52.870 --> 00:12:55.950 was being written to the file, rather than having to fish it out of the virtual 00:12:55.950 --> 00:13:00.550 machine later, and we'd also, just out of curiosity, really want to know what 00:13:00.550 --> 00:13:05.420 filenames it was asking MS-DOS to parse. Display string is also a nice test to 00:13:05.420 --> 00:13:08.519 know, whether your code is working. So to do this you're gonna have to look a little 00:13:08.519 --> 00:13:14.529 bit deeper into how the MS-DOS runtime and, by proxy, how x86 in 16-bit mode 00:13:14.529 --> 00:13:20.250 works, or legacy mode, I guess. This is basically all the registers you have in 00:13:20.250 --> 00:13:26.120 16-bit mode, and some nice computations at the bottom, to make it easier to read. 00:13:26.120 --> 00:13:33.550 So, as we mentioned, AH is the one that you use to specify, which syscall you want, 00:13:33.550 --> 00:13:40.339 and you'll notice it's not there. AH is actually the upper half of AX. AH is a 00:13:40.339 --> 00:13:46.320 8-bit register, because sometimes people really just wanted only 8 bits. It's very 00:13:46.320 --> 00:13:53.579 obscure that we were saving that much space. 
And so, this is what a, this is the 00:13:53.579 --> 00:13:57.660 definition of the syscall of a print string. So you have AH needs to be set to 00:13:57.660 --> 00:14:02.839 9, this is once you, in order to call the syscall for printing string, you set AH to 00:14:02.839 --> 00:14:09.070 9, and then you need to set DS and DX to a pointer to a string that ends in a dollar. 00:14:09.070 --> 00:14:11.890 And that doesn't make a lot of sense, or it didn't make a lot of sense to me, when 00:14:11.890 --> 00:14:15.579 I first read that and so, to do this, we need to learn a little bit more about 00:14:15.579 --> 00:14:19.730 how memory works, on these old CPUs, or the CPUs that are probably in your 00:14:19.730 --> 00:14:25.720 laptops, but running in an older mode. So this is effectively what it looks like. 00:14:25.720 --> 00:14:31.839 They have a 16-bit CPU, 2 to the 16 is 64 kilobytes, and we have a 20-bit memory 00:14:31.839 --> 00:14:36.350 addressing space. 2 to 20 is 1 megabyte, so if you ever see an MS-DOS machine like 00:14:36.350 --> 00:14:39.519 limiting at 1 megabyte, or some old operating system, saying like the maximum 00:14:39.519 --> 00:14:43.980 memory you can have is 1 megabyte, it's because it's running in 16 bit mode. And 00:14:43.980 --> 00:14:50.249 the maximum it can physically see is 20 bits. So the question is: How do we 00:14:50.249 --> 00:14:58.580 address anything above 64K? If the CPU can only fundamentally see 16 bits. So, this 00:14:58.580 --> 00:15:02.399 is where segment registers come in. We have 4 segment registers, actually we 00:15:02.399 --> 00:15:05.899 might have more, but they're the ones we need to care about. There's the code 00:15:05.899 --> 00:15:10.819 segment, the data segment, the stack segment and the extra segment, in case you 00:15:10.819 --> 00:15:15.420 need just another one. So anyway, with that in mind, let's have a quick crash 00:15:15.420 --> 00:15:21.419 course on segment registers. 
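As a rough companion to the numbers above, here is a minimal Python sketch of how a 16-bit segment and a 16-bit offset reach a 20-bit address, and how a DS:DX pointer to a dollar-terminated string could be read out of a memory dump. It assumes the usual real-mode rule (segment times 16, plus offset, wrapping at 1 megabyte as on the original 8086); the helper names and the sample DS and DX values are invented for the example.

```python
def linear_address(segment, offset):
    """Real-mode address: segment * 16 + offset, kept to 20 bits (assumed 8086 wrap)."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_dollar_string(memory, ds, dx, limit=256):
    """Read the '$'-terminated string that DS:DX points at (AH=09h style)."""
    start = linear_address(ds, dx)
    end = memory.index(b"$", start, start + limit)  # raises ValueError if no '$'
    return memory[start:end].decode("ascii", errors="replace")


# One megabyte of fake memory: 2**20 bytes is everything 20 bits can address.
memory = bytearray(1 << 20)
message = b"Hello, world!$"
ds, dx = 0x0100, 0x0020             # hypothetical register values
start = linear_address(ds, dx)      # 0x0100 * 16 + 0x0020
memory[start:start + len(message)] = message

text = read_dollar_string(memory, ds, dx)
print(hex(start), text)
```

A tracer that saves a chunk of memory from DS:DX at each syscall can run the same calculation to recover printed strings or file paths.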
So, imagine if you have a very long piece of memory, 00:15:21.419 --> 00:15:30.430 and we can only see 16 bits at a time. So, however, we can move the sliding window 00:15:30.430 --> 00:15:36.180 around in the memory, to go and see, like, to move our view of where it is. So, we 00:15:36.180 --> 00:15:42.410 can do this and put data around the system, and we can use the final pointer 00:15:42.410 --> 00:15:48.589 to specify, how far in to the memory segment we should go. So the DS and DX 00:15:48.589 --> 00:15:55.360 really just means a multiplier. So, where the data segment is 100, you need to just 00:15:55.360 --> 00:16:01.350 move 100 times 16 to get to the correct place in memory, and then DX is the 00:16:01.350 --> 00:16:09.170 offset. This continues on, so, where we have a 16 bit cpu, we have a bunch of 00:16:09.170 --> 00:16:13.220 general use registers or general purpose registers. They're quite useful for 00:16:13.220 --> 00:16:17.379 ensuring, you don't need to touch RAM too often. x86 actually has a fairly small 00:16:17.379 --> 00:16:25.240 amount of general purpose registers. Some architectures have way more. I think more 00:16:25.240 --> 00:16:32.139 modern chips like GPUs have hundreds, well hundreds, maybe thousands. However, this 00:16:32.139 --> 00:16:34.699 doesn't really change over time in x86 because we have to force backwards 00:16:34.699 --> 00:16:38.139 compatibility. So, really what actually ends up happening, when we move up the 00:16:38.139 --> 00:16:42.709 bittage, is that the same registers just get wider, and we add some more ones for 00:16:42.709 --> 00:16:45.499 the programmers, that want them, and the exact same thing happened to 64 bit: The 00:16:45.499 --> 00:16:52.970 registers just got wider. So thinking about it, we have a lot of malware now, 00:16:52.970 --> 00:16:58.319 what if we want to know everything that's happened in this entire archive. 
So we 00:16:58.319 --> 00:17:01.420 kind of want to trace all of these automatically, but we might not know what 00:17:01.420 --> 00:17:04.480 we're looking for, so let's go through the checklist of what we need to do, to trace 00:17:04.480 --> 00:17:09.335 all of this malware. We need to break point on the syscall handler. When we get 00:17:09.335 --> 00:17:13.260 that breakpoint, we need to save all the registers, so we know which syscall was 00:17:13.260 --> 00:17:19.880 run and potentially what data is being given to the syscall. Ideally, we're going 00:17:19.880 --> 00:17:25.130 to save one hundred bytes from that data pointer, not especially because we need 00:17:25.130 --> 00:17:28.149 it, but it's quite handy in a lot of registers in a lot of syscalls. It's for 00:17:28.149 --> 00:17:34.429 example what you use to get the open file path, when you're opening files. We should 00:17:34.429 --> 00:17:37.649 also, probably, record the screen for quick analysis, rather than just staring 00:17:37.649 --> 00:17:43.870 at HTML tables, and so we can do that, we burn a lot of CPU time and probably cause 00:17:43.870 --> 00:17:51.120 some minor amounts of environmental damage. And we get nothing. We just run a 00:17:51.120 --> 00:17:55.080 bunch of stuff and most of them don't return anything. At best they return a 00:17:55.080 --> 00:18:02.770 goat file string. They just do nothing. So, if we look deeper into the reason why, 00:18:02.770 --> 00:18:05.490 it's sort of a smoking gun here, so we can see the syscalls that run on this file 00:18:05.490 --> 00:18:09.840 that does nothing, and the smoking gun here is the date. 
So it's asking for the 00:18:09.840 --> 00:18:15.190 date from the system, and this sort of flags out the first issue, is that a lot 00:18:15.190 --> 00:18:18.750 of MS-DOS viruses don't really have a lot to go on, because they have no internet 00:18:18.750 --> 00:18:24.180 connection, and there's not really any other state they can decide to activate on. 00:18:24.180 --> 00:18:28.600 So the date syscall is pretty simple. The get date and get time just return all 00:18:28.600 --> 00:18:34.360 of their values as registers. And, you know, some using the 8-bit halves, to save 00:18:34.360 --> 00:18:44.970 space. So, a naive way of doing this, is what we do, is we would run the sample, 00:18:44.970 --> 00:18:50.030 we'd wait for the syscall for date or time, we would just fiddle the values, 00:18:50.030 --> 00:18:53.240 'cause in this case we're using a debugger, so we can automatically change, what the 00:18:53.240 --> 00:18:56.760 state registers are, and we can then observe to see, if any of the syscalls 00:18:56.760 --> 00:18:59.580 that the program ran changed, which is a pretty good indication that you've hit 00:18:59.580 --> 00:19:04.330 some behavior that is different. And then, you know, we can say "Hooray, we found a 00:19:04.330 --> 00:19:08.330 new test case!" The downside is: running every one of these samples takes 15 00:19:08.330 --> 00:19:13.940 seconds of CPU-time because MS-DOS, well, 15 seconds of wall-time, which, 00:19:13.940 --> 00:19:18.080 when you are emulating MS-DOS is 15 seconds of CPU-time because of the fact 00:19:18.080 --> 00:19:20.610 that MS-DOS doesn't have power saving mode, so when it's not doing anything, it 00:19:20.610 --> 00:19:27.120 just goes into a busy loop which makes it very hard to optimize. Or we could take a 00:19:27.120 --> 00:19:33.350 cleverer look. 
So when we think about it, we are in the interrupt handler where all 00:19:33.350 --> 00:19:36.830 we ever see is the insides of the interrupt handler because we don't know 00:19:36.830 --> 00:19:40.990 where the program code is. The interrupt handler is the only place that we know is 00:19:40.990 --> 00:19:45.450 consistent because MS-DOS could potentially load the code for the malware 00:19:45.450 --> 00:19:50.610 or the program anywhere. But we want to know where the code is. It would be really 00:19:50.610 --> 00:19:54.250 handy to know what the code is that we'd be about to run. So for this we need to 00:19:54.250 --> 00:19:59.190 look towards the stack. Just like the DS and DX registers, the stack is located by a 00:19:59.190 --> 00:20:02.970 stack segment and a stack pointer. Luckily, the first two values is the 00:20:02.970 --> 00:20:07.130 interrupt, the interrupt pointer in the stack segment so we can use that to grab 00:20:07.130 --> 00:20:10.779 exactly where, what the code will be run afterwards. So we just need to add a few 00:20:10.779 --> 00:20:14.440 things to our checklist. We need to grab 4 bytes from the stack pointer and then 00:20:14.440 --> 00:20:18.370 using that, we can calculate the destination that the syscall will return 00:20:18.370 --> 00:20:22.549 to. And if we look at some of them - we can look at an example here - well, this 00:20:22.549 --> 00:20:27.243 is what a piece of what one of the calls returns to us. So we see we're running a compare 00:20:27.243 --> 00:20:36.640 on DL against the hex value 0x1E. And then if that comparison is equal it will 00:20:36.640 --> 00:20:43.171 jump to one memory address. And if not it will jump to another. So if we look back 00:20:43.171 --> 00:20:52.560 at the definition of those syscalls we can see that DL is the day. 
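The stack trick can be sketched as follows. This is an illustrative Python snippet, not the speaker's tooling: it assumes the standard behaviour of `int` in real mode, which pushes FLAGS, then CS, then IP, so the four bytes at SS:SP on entry to the handler are the return IP followed by the return CS. The sample bytes and the helper names are made up, and the byte pattern `80 FA` is the x86 encoding of `cmp dl, imm8`.

```python
import struct

CMP_DL_IMM8 = b"\x80\xfa"  # x86 encoding prefix of "cmp dl, imm8"


def return_target(stack_top):
    """Decode the 4 bytes at SS:SP into (cs, ip, linear return address)."""
    ip, cs = struct.unpack_from("<HH", stack_top, 0)
    return cs, ip, ((cs << 4) + ip) & 0xFFFFF


def dl_compare_value(code):
    """If the code starts with 'cmp dl, imm8', return the immediate, else None."""
    if len(code) >= 3 and code[:2] == CMP_DL_IMM8:
        return code[2]
    return None


# Hypothetical snapshot: the syscall will return to 1234:010C ...
cs, ip, address = return_target(bytes.fromhex("0c013412"))
# ... where the sample compares DL with 0x1E and then does a conditional jump.
day = dl_compare_value(bytes.fromhex("80fa1e7403"))
print(f"returns to {cs:04X}:{ip:04X} (linear {address:05X}), compares DL with {day}")
```

With the return address in hand, the bytes that follow the syscall can be pulled out of emulator memory and inspected without ever knowing where DOS loaded the program.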
So with this we 00:20:52.560 --> 00:21:01.150 can conclude that if 0x1E is 30 and DL is the day this malware effectively is 00:21:01.150 --> 00:21:07.120 saying if the day of month is 30 we need to go down a different path. If we run 00:21:07.120 --> 00:21:11.950 these all over time across the whole dataset what we see is roughly this as a 00:21:11.950 --> 00:21:21.740 polydome bar chart. We see out of the 17,500 samples we have around 4,700 of them 00:21:21.740 --> 00:21:24.330 checked for the date and time and these are the ones that are really tricky 00:21:24.330 --> 00:21:27.590 because they're really hard to activate. They're also the most interesting though, because 00:21:27.590 --> 00:21:33.900 those are the ones trying to hide. So, with that in mind, we need to, we have the code 00:21:33.900 --> 00:21:38.100 segment that we're about to run, when we return and we can't really brute force 00:21:38.100 --> 00:21:43.730 because it takes a little CPU-time and we can't brute force it inside a 'real' or 00:21:43.730 --> 00:21:47.419 emulated machine but we can brute force it in a significantly more interesting way. 00:21:47.419 --> 00:21:53.960 We need to build something: we need to build the world's worst x86 emulator so 00:21:53.960 --> 00:22:02.019 dubbed BenX86, it's 16-bit only. Any attempt to access memory effectively ends 00:22:02.019 --> 00:22:06.029 the simulation. It's got a fake stack if you try and push something onto the stack 00:22:06.029 --> 00:22:09.640 it says sure, fine if you try and pop it it's like oh actually I never held any of 00:22:09.640 --> 00:22:13.690 that data anyway so we are ending the simulation. 80 opcodes, most of them are 00:22:13.690 --> 00:22:18.900 jumps. Because that's the primary purposes, comparing and jumps. The 00:22:18.900 --> 00:22:23.630 difference is it logs every opcode and every address that it went through and it can be 00:22:23.630 --> 00:22:29.210 run with just a small x86 code segment and a register snapshot. 
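To make the idea concrete, here is a toy Python sketch in the spirit of that emulator. It is not BenX86 and does not decode real x86: the "program" is a hand-written list of instruction tuples invented for this example, and only compares and jumps are understood. What it does share with the description is the approach: start from a register snapshot, log every instruction visited, end the simulation on anything unsupported, and brute-force the date register to see which values change the path.

```python
from collections import defaultdict


def run(program, registers, max_steps=1000):
    """Run the toy program; return the tuple of instruction indexes visited."""
    regs = dict(registers)
    zero_flag = False
    pc = 0
    path = []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        path.append(pc)  # log every step taken
        op, *args = program[pc]
        if op == "cmp":
            reg, imm = args
            zero_flag = regs[reg] == imm
            pc += 1
        elif op == "je":
            pc = args[0] if zero_flag else pc + 1
        elif op == "jne":
            pc = pc + 1 if zero_flag else args[0]
        elif op == "jmp":
            pc = args[0]
        else:
            break  # memory access, interrupts, anything else ends the simulation
    return tuple(path)


# Hypothetical code found at a syscall return address.
program = [
    ("cmp", "dl", 0x1E),  # 0: is the day of month 30?
    ("je", 3),            # 1: yes, go to the payload
    ("halt",),            # 2: no, carry on as normal
    ("halt",),            # 3: payload
]

# Brute-force every day of the month and group the days by execution path.
paths = defaultdict(list)
for day in range(1, 32):
    paths[run(program, {"dl": day})].append(day)

for path, days in paths.items():
    print(path, days)
```

Each distinct path is a candidate test case: one representative date per path is then enough to rerun the sample in the full emulator.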
This means that we 00:22:29.210 --> 00:22:34.909 can test all dates from 1980 to 2005 in roughly 100 milliseconds, and most 00:22:34.909 --> 00:22:40.860 programs ended up having just 3 different code paths on average so that yields us 00:22:40.860 --> 00:22:48.019 with 17,000 virus samples and about 10,000 of samples that had date variations as in: 00:22:48.019 --> 00:22:53.539 Once you exploit the complexity. So I'm going to now use my final remaining time 00:22:53.539 --> 00:22:59.769 to go through some of my favorites. So this is an example of a virus that just 00:22:59.769 --> 00:23:04.440 doesn't do anything on the 1st of January 1980. However if you'd happen to be running this 00:23:04.440 --> 00:23:08.477 on New Year's Day you would get this. Laughter 00:23:08.477 --> 00:23:10.610 No matter what you do, every program you can't 00:23:10.610 --> 00:23:14.940 exit out of this, your machine is hung. This might be great, right? You might be like: 00:23:14.940 --> 00:23:19.040 'Oh cool, I don't need to do work anymore because my computer will literally not let me' 00:23:19.040 --> 00:23:21.049 This also might be terrible, because you might need to do some work on New 00:23:21.049 --> 00:23:28.100 Year's day. Here's another example. This does nothing as well, just another innocent 00:23:28.100 --> 00:23:33.600 .com file. Of course, as a reminder, these pieces of malware will be wrapped around 00:23:33.600 --> 00:23:37.620 something else. Almost anything could be infected in here. In this case though 00:23:37.620 --> 00:23:46.880 this binary is nice and shaved down. However instead we get this, which I think 00:23:46.880 --> 00:23:53.564 is super interesting and is basically the author is aware - they're telling you they 00:23:53.564 --> 00:23:57.110 are actually like self disclosing in saying the previous year I've infected 00:23:57.110 --> 00:24:04.800 your computer. And for some reason it's being nice. They're just saying. 
Actually 00:24:04.800 --> 00:24:11.580 you have been infected. And as a - I guess a pity - I'm just going to remove myself now. 00:24:11.580 --> 00:24:17.120 I don't really. For some reason it's also encouraging you to buy McAfee. This is 00:24:17.120 --> 00:24:26.179 back in the day when John McAfee himself actually wrote McAfee. Interesting times. 00:24:26.179 --> 00:24:33.059 Definitely interesting times. Here is another example. This one I found 00:24:33.059 --> 00:24:41.450 particularly obscure. On the 8th of November 1980 or any year I think actually 00:24:41.450 --> 00:24:51.110 it turns all zeroes on the system into tiny little glyphs that say "hate". If 00:24:51.110 --> 00:24:54.760 anyone understands this I'd really like to know, like I've been thinking about this a 00:24:54.760 --> 00:25:01.950 lot. What does it mean? Is it an artistic statement? Is it. I wish I knew. 00:25:01.950 --> 00:25:05.669 Someone in the audience: it says MATE Ben: There could be a CCC variant that says 00:25:05.669 --> 00:25:12.630 MATE. Another good one in that it's the last thing I ever want to see any program 00:25:12.630 --> 00:25:19.669 tell me is this one here where you run it and it says "error eating drive C:". I 00:25:19.669 --> 00:25:25.070 never ever want an error in any program that unexpectedly just says 'Sorry, I almost 00:25:25.070 --> 00:25:30.159 failed to remove your root file system, don't know why, could you like change your 00:25:30.159 --> 00:25:35.940 settings so I can remove it?' Cheers. And finally this is one of my absolute 00:25:35.940 --> 00:25:41.420 favorites in that it's just brilliant in that it also stops you from running the 00:25:41.420 --> 00:25:46.490 program you want to run it exits prematurely. This is the virus version of 00:25:46.490 --> 00:25:50.607 the Navy SEAL copy pasta. Says "I am an assassin. I want to and I shall kill you." 00:25:50.607 --> 00:25:59.809 "I also hate Aladdin and I also will kill it. I will eliminate you with ...". 
You know where 00:25:59.809 --> 00:26:04.880 this is going. It says fear the virus that is more powerful than God. 00:26:04.880 --> 00:26:10.830 It only activates on one day though, so it's fine. Thank you for your time. I know 00:26:10.830 --> 00:26:15.480 it's late and I will happily take any questions or corrections if you know this 00:26:15.480 --> 00:26:27.029 topic better than me. applause 00:26:27.029 --> 00:26:33.410 Herald: This totally brings tears to my eyes with nostalgia. So if there is any 00:26:33.410 --> 00:26:37.970 questions, we have microphones distributed around the room, there is like 1,2, 3, 4 and 00:26:37.970 --> 00:26:42.630 one in the back. We also have questions perhaps from the internet if you want to 00:26:42.630 --> 00:26:47.980 ask a question come up to the microphone ask the question just as a reminder a 00:26:47.980 --> 00:26:53.789 question is one or two sentences with a question mark behind it and not a life 00:26:53.789 --> 00:27:00.840 story attached. So let's see what we have. I'm going to start with microphone number 00:27:00.840 --> 00:27:04.470 1 just because I can see it easiest, let's go for it. 00:27:04.470 --> 00:27:09.559 Microphone 1: Hi Ben, thanks for the talk. Really interesting. My question would be 00:27:09.559 --> 00:27:16.297 did you do any analysis on what ratio of the viruses was more artistic 00:27:16.297 --> 00:27:20.690 and which one actually did damage. Ben: So most of them surprisingly don't do 00:27:20.690 --> 00:27:26.450 damage. I actually really struggled to find a date varying sample that 00:27:26.450 --> 00:27:30.140 specifically activated on a certain day and decided to delete every file. 
There 00:27:30.140 --> 00:27:35.259 are some very good ones; some of them are like virus scanning utilities that just 00:27:35.259 --> 00:27:37.990 don't do anything on certain dates, and on one day, like, while they're telling you all 00:27:37.990 --> 00:27:41.120 the files they are scanning, they're actually telling you all the files they're 00:27:41.120 --> 00:27:46.120 deleting. So that's particularly cruel, but it's actually surprisingly hard to find a 00:27:46.120 --> 00:27:50.480 virus sample that actually was brutally malicious. There were some that would just, 00:27:50.480 --> 00:27:53.910 you know, infect binaries, but it's very hard to find one that I think was brutally 00:27:53.910 --> 00:27:58.100 malicious, which is a far cry from the days, well, from the days that we live in right 00:27:58.100 --> 00:28:03.549 now, where we're taking down hospitals with Windows bugs. 00:28:03.549 --> 00:28:09.210 Herald: As everybody is leaving the room, please do it quietly. I see a question at 00:28:09.210 --> 00:28:12.200 (microphone) 3, on that side. Microphone 3: Yes. Since a lot of 00:28:12.200 --> 00:28:19.970 industrial control systems still run DOS, what's the threat from DOS malware that 00:28:19.970 --> 00:28:27.150 might be written today? Ben: It's probably unlikely that an 00:28:27.150 --> 00:28:31.009 industrial control system that's running DOS would come into contact with DOS malware. 00:28:31.009 --> 00:28:36.010 The only way I can think of is if one vendor, or a factory or supplier or 00:28:36.010 --> 00:28:41.049 whatever it was, was basically downloading warez onto industrial control 00:28:41.049 --> 00:28:47.419 boxes. I wouldn't be surprised, but it would be pretty irresponsible. But it 00:28:47.419 --> 00:28:52.510 would be quite surprising to find MS-DOS malware today on industrial controllers 00:28:52.510 --> 00:28:57.110 that was installed recently and not just a lingering infection from the last 20 00:28:57.110 --> 00:29:00.029 years. 
Herald: Microphone 2. 00:29:00.029 --> 00:29:05.000 Microphone 2: Did you find any conditions that weren't date based? Ben: Some of them do 00:29:05.000 --> 00:29:09.610 attempt to... some of them try and circumvent the date recognition. Unfortunately it's 00:29:09.610 --> 00:29:12.809 very hard to brute force those. Some of them install themselves as what's called 00:29:12.809 --> 00:29:19.710 a TSR, or Terminate and Stay Resident, which basically means that they will exit out, 00:29:19.710 --> 00:29:23.750 run in the background and continuously ask the actual system time what time it is. 00:29:23.750 --> 00:29:27.639 It's a bit of a more risky strategy because the system timer might not exist, 00:29:27.639 --> 00:29:31.650 which would be unfortunate for the virus. So definitely there are viruses that have 00:29:31.650 --> 00:29:38.340 way more complicated execution conditions. I observed one sample that only activated 00:29:38.340 --> 00:29:43.850 after, I believe it was something silly like 100 keypresses, which is very hard to 00:29:43.850 --> 00:29:49.770 automatically test. Those sorts of viruses require static analysis, and statically 00:29:49.770 --> 00:29:54.480 analyzing 17,000 samples is a time-consuming task. 00:29:54.480 --> 00:30:02.009 Herald: So we have a question from the Internet. Signal angel: Do you have the source? What 00:30:02.009 --> 00:30:07.990 is the source of the malware that you analyzed here, is it published somewhere? 00:30:07.990 --> 00:30:13.400 Ben: You can still find dumps of VX Heavens, and more modern dumps of VX 00:30:13.400 --> 00:30:17.990 Heavens, on popular torrent websites. But I'm sure there are also copies 00:30:17.990 --> 00:30:21.399 floating about on non-popular torrent websites. 00:30:21.399 --> 00:30:24.810 Laughter Herald: Over to microphone 1. 00:30:24.810 --> 00:30:32.240 Microphone 1: Hi Ben. I'm Jope. Thank you for your talk. 
I was wondering: did you 00:30:32.240 --> 00:30:36.639 learn anything from your studies of these viruses that should be taught in modern 00:30:36.639 --> 00:30:42.820 day computer science classes, like a more efficient sorting algorithm or some hidden 00:30:42.820 --> 00:30:47.080 gem that actually should be part of computing these days? 00:30:47.080 --> 00:30:53.570 Ben: My primary takeaway was x86 was a mistake. 00:30:53.570 --> 00:31:01.320 Laughter & applause Herald: So I'm not seeing any more 00:31:01.320 --> 00:31:04.480 questions. Oh no, there is. OK, one more question from the internet. 00:31:04.480 --> 00:31:11.389 Signal angel: Have you found malware samples that did, like, try to detect dummy 00:31:11.389 --> 00:31:14.617 binaries or whatever, to avoid easy analysis? 00:31:14.617 --> 00:31:20.007 Ben: Oh actually, that's a really good question. So it is... it's complicated. 00:31:20.007 --> 00:31:24.580 So some viruses would... so, maybe let's be 00:31:25.027 --> 00:31:29.770 dangerous, let's try and go backwards on my home-written presentation software. So 00:31:29.770 --> 00:31:41.160 Humming Too many slides. I have regrets. Yes. OK. Here we are. This slide. 00:31:41.160 --> 00:31:45.450 OK. So you know, here I'm saying that the malware infection goes to the end. Well, 00:31:45.450 --> 00:31:49.850 some samples are really cool. They don't change the size of the file. They just 00:31:49.850 --> 00:31:54.590 find areas in the files that are full of null bytes and just say 'this is probably 00:31:54.590 --> 00:32:00.230 fine, I'm just going to put myself here', which may have unintended consequences. It 00:32:00.230 --> 00:32:04.960 may mean that if a program has, like, a statically typed, statically defined byte array of 00:32:04.960 --> 00:32:10.039 like a certain size, and the program is relying on it being zeros when it accesses 00:32:10.039 --> 00:32:14.440 it for the first time, it may get very surprised to find some malware code in 00:32:14.440 --> 00:32:20.159 there. 
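To make the "areas full of null bytes" idea concrete from the analysis side, here is a minimal sketch in Python. It is not taken from the talk's tooling: the function name `find_null_runs`, the 64-byte threshold and the synthetic `sample` buffer are all assumptions chosen for illustration. It locates long zero-filled runs in a file image, and then shows that overwriting bytes inside such a run leaves the length unchanged while a content hash still changes.

```python
import hashlib
import re


def find_null_runs(data: bytes, min_len: int = 64) -> list[tuple[int, int]]:
    """Return (offset, length) for every run of at least min_len zero bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]


# Synthetic stand-in for a program image (hypothetical data, not a real sample):
# some code bytes, a zero-filled buffer, more code, and a short zero run.
sample = b"\x90" * 10 + b"\x00" * 100 + b"\xc3" * 5 + b"\x00" * 8

runs = find_null_runs(sample)
print(runs)

# Overwriting part of the zero-filled region keeps the size identical,
# so a size-only check sees nothing, while a content hash does change.
patched = bytearray(sample)
patched[10:14] = b"\xde\xad\xbe\xef"
same_size = len(patched) == len(sample)
same_hash = hashlib.sha256(bytes(patched)).digest() == hashlib.sha256(sample).digest()
print(same_size, same_hash)
```

The threshold is a judgment call: a small value reports ordinary padding, while a large one only reports big static buffers.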
But generally speaking, as far as I'm aware, this deployment 00:32:20.159 --> 00:32:26.220 procedure works pretty well and actually is very good at avoiding antivirus of the 00:32:26.220 --> 00:32:30.390 era, which would just be checking, like, common system files and their size. And you 00:32:30.390 --> 00:32:35.059 know, if the size of COMMAND.COM increases, then that's clearly bad news. 00:32:35.059 --> 00:32:38.450 Herald: We have a question on microphone 1. 00:32:38.450 --> 00:32:45.620 Microphone 1: Are there any viruses that try to eliminate or manipulate virus 00:32:45.620 --> 00:32:48.970 scanners of the day? Ben: Oh yeah. So a lot of the samples will 00:32:48.970 --> 00:32:52.960 actively go and look for files of other anti-viruses. 00:32:52.960 --> 00:32:57.159 But I am generally under the impression that it's kind of hard to find them. There 00:32:57.159 --> 00:33:01.750 weren't actually that many antivirus products back in the day. 00:33:01.750 --> 00:33:06.410 I feel like it was a bit of a niche thing to be running. Microsoft did for a while ship 00:33:06.410 --> 00:33:14.330 their own antivirus with MS-DOS. So I guess, you know, what's new is old. So there 00:33:14.330 --> 00:33:17.860 were antiviruses out there. I don't think many of them were very effective. 00:33:17.860 --> 00:33:27.260 Herald: Any more questions? There, where? Oh right. Another one from the Internet. 00:33:27.260 --> 00:33:32.049 It's interesting that the internet is querying MS-DOS all the time. Go ahead. 00:33:32.049 --> 00:33:38.000 Signal angel: Did you do the diagrams by hand or do you have a tool? 00:33:38.000 --> 00:33:42.559 Ben: So many hours. No. So there's a couple of good tools to do it. 00:33:42.559 --> 00:33:46.429 asciiflow.org, I think, is a fantastic tool. I would highly recommend it. I think 00:33:46.429 --> 00:33:52.779 it's not maintained very well, though. Herald: Microphone 1. 00:33:52.779 --> 00:33:55.519 Microphone 1: Are you publishing the tools you wrote? 
00:33:55.519 --> 00:34:02.429 Ben: I will be publishing the tools at some point when they are less... when they 00:34:02.429 --> 00:34:08.320 are less ugly. I will be publishing all of the automatic malware runs and the GIFs 00:34:08.320 --> 00:34:12.929 generated by them so that people can easily search Google for the virus names 00:34:12.929 --> 00:34:16.890 and get, like, actual real time versions. The hardest thing that I found when 00:34:16.890 --> 00:34:21.710 looking at virus names was literally just finding any information about them, and one 00:34:21.710 --> 00:34:25.220 of the things I really wish existed at the time of writing this talk was being able 00:34:25.220 --> 00:34:29.580 to just query a name and be like 'oh yeah, this virus, it looks like it does this'. 00:34:29.580 --> 00:34:33.420 Herald: Since I saw microphone 1 first, let's go with that. 00:34:33.420 --> 00:34:40.260 Microphone 1: Did you find any viruses that had signage in them, not signage of 00:34:40.260 --> 00:34:43.520 today but the name of the author? Like he was very proud of what he wrote. 00:34:43.520 --> 00:34:47.450 Ben: Yeah, there are some notable examples. Quite a few of them will try and 00:34:47.450 --> 00:34:52.870 name - so DOS viruses do like have [incomprehensible] sample names in the same way 00:34:52.870 --> 00:34:57.470 that we'd still today give viruses names. A lot of the time you will just encode a 00:34:57.470 --> 00:35:01.131 string that you want the virus to be named, you know, somewhere in the file, 00:35:01.131 --> 00:35:04.472 just a random string doing nothing. It's like 'oh, ok, they clearly wanted the virus 00:35:04.472 --> 00:35:11.430 to be called Tempest'. So that does happen. One of the favorite examples is the Brain 00:35:11.430 --> 00:35:16.750 malware, which literally encodes an address and phone number of the author. 
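Names embedded as plain strings like this can be surfaced with a simple printable-string scan, in the spirit of the Unix `strings` utility. The following is a minimal Python sketch under stated assumptions: the function name `printable_strings`, the minimum length of 4 and the `blob` bytes are made up for illustration and are not from any real sample or from the speaker's tools.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least min_len printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Hypothetical bytes: a few non-printable opcodes around an embedded name.
blob = b"\xeb\x10\x90Tempest\x00\xcd\x21\x00"

found = printable_strings(blob)
print(found)
```

A scan like this only finds strings stored in the clear; a sample that encodes or encrypts its text would need to be unpacked first.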
I believe 00:35:16.750 --> 00:35:22.720 in Pakistan, and there's a fantastic mini documentary by F-Secure where they go and 00:35:22.720 --> 00:35:25.850 visit the people who wrote it. It's a super interesting watch and I would really 00:35:25.850 --> 00:35:29.990 recommend it. Herald: Indeed it is. Microphone 2? 00:35:29.990 --> 00:35:36.260 Microphone 2: Did you have any chance to look at any kind of viruses that did not 00:35:36.260 --> 00:35:42.330 modify the files themselves? For example, one of the largest virus infections at the time was a 00:35:42.330 --> 00:35:46.080 virus called [incomprehensible] which modified the master boot record. 00:35:46.080 --> 00:35:51.060 Ben: Yes, master boot record, I did consider. It was more of a time problem 00:35:51.060 --> 00:35:55.320 that I had in getting to the point where you could brute force time and date 00:35:55.320 --> 00:36:01.020 combinations and looking for master boot record changes. It was really hard. I am 00:36:01.020 --> 00:36:06.610 super interested in reviewing what are effectively the rootkits of the era. But yes, that's 00:36:06.610 --> 00:36:10.220 definitely something I will look into in the future. 00:36:10.220 --> 00:36:14.410 Herald: And we have yet another question from the Internet. 00:36:14.410 --> 00:36:17.400 Signal angel: And it's even from the same guy. 00:36:17.400 --> 00:36:22.830 Ben: Oh damn. Signal angel: Is the BenX86 software open- 00:36:22.830 --> 00:36:25.530 source or can it be found on the web somewhere? 00:36:25.530 --> 00:36:29.870 Ben: It probably will be. I wouldn't expect it to work in, well, in any use-case 00:36:29.870 --> 00:36:36.360 though. It's effectively designed to, like, not work correctly, right? Like, what 00:36:36.360 --> 00:36:40.880 was the spec? It basically like fails at every single thing awkward. I just went 00:36:40.880 --> 00:36:46.660 like 'oh, that's fine'. We're probably far enough down there anyway. Are we? 
Be aware, 00:36:46.660 --> 00:36:50.740 this is the feature list. Herald: So is that a follow-up question 00:36:50.740 --> 00:36:57.010 from the internet? Signal angel: No, it's a new one. I don't 00:36:57.010 --> 00:37:02.660 know how serious it is, but would it be possible or a good idea to use machine 00:37:02.660 --> 00:37:09.500 learning to create new DOS malware from the existing samples? 00:37:09.500 --> 00:37:17.021 Laughter & applause Ben: It would not be a good idea. But I 00:37:17.021 --> 00:37:24.230 like how you think. Herald: Actually I saw somebody trying to 00:37:24.230 --> 00:37:27.640 use NLP to generate viruses, but ok, that's enough for now. 00:37:27.640 --> 00:37:32.400 Ben: You could probably do Markov chains with x86, to be honest. Please don't do 00:37:32.400 --> 00:37:34.530 that, please! Herald: Don't try this at home. 00:37:34.530 --> 00:37:37.480 Ben: I have seen things. I've seen... Just please don't do that. 00:37:37.480 --> 00:37:43.461 Herald: So I think we've run out of questions. Going once, going twice. Let's 00:37:43.461 --> 00:37:49.520 thank Ben for this marvelous retrospective talk. Big applause 00:37:49.520 --> 00:37:58.785 35C3 postroll music 00:37:58.785 --> 00:38:12.000 subtitles created by c3subtitles.de in the year 2020. Join, and help us!