WEBVTT 00:00:00.110 --> 00:00:14.320 music 00:00:14.320 --> 00:00:18.810 applause 00:00:18.810 --> 00:00:23.560 Raichoo: Yeah sorry about that - beamers or projectors, I don't like them. They 00:00:23.560 --> 00:00:27.210 don't like me either. So this is a little heads up - this is going to be the only 00:00:27.210 --> 00:00:32.049 slide I'm going to show you today so, "slide", because I think doing stuff like 00:00:32.049 --> 00:00:35.940 that in a terminal might be a little bit more interesting for you. But sadly 00:00:35.940 --> 00:00:40.020 something is getting cut off so I we have to improvise a little bit. But anyway, so 00:00:40.020 --> 00:00:43.960 today I will be able to talk about two of my favorite things right now which are 00:00:43.960 --> 00:00:47.820 FreeBSD and DTrace. But this talk has been capped down to 30 minutes so we'll be 00:00:47.820 --> 00:00:53.190 focusing a little more on the DTrace part. So there will be a little bit less BSD 00:00:53.190 --> 00:00:57.560 than I anticipated. And also adjusted everything a little bit to fit better into 00:00:57.560 --> 00:01:03.130 the resilience track so hopefully you'll enjoy that. So before we begin, who here 00:01:03.130 --> 00:01:08.640 is actually using DTrace? Okay more than I expected but still not as many as I 00:01:08.640 --> 00:01:12.610 would like to see. So hopefully after this talk you will think, "oh, this is a really 00:01:12.610 --> 00:01:17.170 awesome tool, I gotta learn it." Because I totally love it - it changed the way I do 00:01:17.170 --> 00:01:22.110 a lot of stuff. So for those of you who do not know what DTrace is, first, let me 00:01:22.110 --> 00:01:27.260 fill you in on this stuff. So it's open source, it originated on Solaris, and been 00:01:27.260 --> 00:01:31.640 developed currently on illumos which is a fork from OpenSolaris. 
It has been ported 00:01:31.640 --> 00:01:37.930 to FreeBSD, NetBSD, OS X, and there's also a port for Linux called DTrace 00:01:37.930 --> 00:01:43.689 for Linux. I think it's done by a person called Paul Fox. It's been ported to QNX 00:01:43.689 --> 00:01:49.810 and the OpenBSD folks are currently doing some work to get a technology like 00:01:49.810 --> 00:01:54.040 DTrace on their system. And I think there's a port for Windows? I don't know 00:01:54.040 --> 00:01:57.869 if this is actually true, but it's kind of cool because then that means it's 00:01:57.869 --> 00:02:04.650 basically everywhere. So, most of you would probably know static tools like 00:02:04.650 --> 00:02:09.470 strace. We have a very similar tool on FreeBSD that is called truss, and what 00:02:09.470 --> 00:02:14.500 truss and strace are doing is - you can attach them to a process and look at the 00:02:14.500 --> 00:02:18.650 system calls that this process is emitting. So in case something is going 00:02:18.650 --> 00:02:23.319 wrong you can, well, look inside of the program, which can be kind of useful when 00:02:23.319 --> 00:02:28.870 you're trying to find a problem. It's kind of handy but it's also pretty 00:02:28.870 --> 00:02:32.890 limited. Because first of all it really really slows down the process that you're 00:02:32.890 --> 00:02:37.250 currently looking at. So if you want to debug a performance issue, you're pretty 00:02:37.250 --> 00:02:42.170 much out of luck there. And also it's kind of narrowed down - you can just look 00:02:42.170 --> 00:02:47.940 at one process. Which is also a bad thing because the systems that we currently 00:02:47.940 --> 00:02:52.660 have - all these systems are very complex: we have a lot of layers. You have 00:02:52.660 --> 00:02:56.300 virtual file systems, you have virtual memory, you have network, you have 00:02:56.300 --> 00:03:00.500 databases, processes communicating with each other.
And in case you are using a 00:03:00.500 --> 00:03:04.710 high-level programming language, you might also have a runtime system. So it's a 00:03:04.710 --> 00:03:09.519 little operating system on top of your operating system. So when something goes 00:03:09.519 --> 00:03:15.000 wrong in a system that has such large complexity, something happens that we call 00:03:15.000 --> 00:03:19.850 the blame game. And in the blame game, it's never your fault, it's always someone 00:03:19.850 --> 00:03:25.710 else's. So what we want to be able to do is we want to look at the system as a 00:03:25.710 --> 00:03:30.349 whole, so we can correlate all the data and come up with some meaningful answers 00:03:30.349 --> 00:03:34.506 when something is really going wrong in there. And also, we don't want to 00:03:34.506 --> 00:03:39.260 switch out all the processes for debug processes to make that happen, 00:03:39.260 --> 00:03:44.969 because every problem happens in production. It never 00:03:44.969 --> 00:03:48.470 happens on the development box. So like, switching out all the processes - that's 00:03:48.470 --> 00:03:55.030 totally out of the picture. So to do that in an arbitrary way, to instrument 00:03:55.030 --> 00:03:59.910 the system in an arbitrary way, we sort of need a programming language. So, we 00:03:59.910 --> 00:04:03.640 need to describe - when that happens, please submit data so I can see what's 00:04:03.640 --> 00:04:09.489 going on. So this kind of implies a programming language. And DTrace comes 00:04:09.489 --> 00:04:13.670 with such a programming language - it's a little bit reminiscent of awk crossed with 00:04:13.670 --> 00:04:18.798 C. It's pretty simple to learn - you can pick it up in 20 00:04:18.798 --> 00:04:25.199 minutes and you can start churning out your first DTrace scripts. So like awk, if 00:04:25.199 --> 00:04:30.559 you know awk, awk can be used to analyze large bodies of text.
Dtrace is pretty 00:04:30.559 --> 00:04:34.749 much the same, but for system behavior - so a little bit mind boggling, but 00:04:34.749 --> 00:04:40.069 probably I can show you what I mean by that. And also, as a bonus we don't want 00:04:40.069 --> 00:04:43.860 to slow down the system, so we want to be able to do things like performance 00:04:43.860 --> 00:04:52.300 debugging, performance tests like that. So I've prepared this little demo here, and. 00:04:52.300 --> 00:04:58.780 So since we had some issues here probably this is not -- I have to play around a 00:04:58.780 --> 00:05:04.249 little bit. So what I'm going to do is I'm going to look at a very very naive way 00:05:04.249 --> 00:05:18.009 to -- excuse me for a second -- very naive way to -- give me a second -- so very 00:05:18.009 --> 00:05:21.960 naive way to authenticate a user. And there's a lot of stuff wrong with this 00:05:21.960 --> 00:05:26.030 code, but like what we're going to do is we're going to take a user string as 00:05:26.030 --> 00:05:32.740 input, and then we're going to just compare it to another, to a secret. So I 00:05:32.740 --> 00:05:36.420 know, the the secret in here is like in plain text I know this is a problem, but 00:05:36.420 --> 00:05:41.639 this is a little bit artificial. But I just want to get my point across. So from 00:05:41.639 --> 00:05:47.159 an algorithmic perspective, this check function is correct: so we take a string 00:05:47.159 --> 00:05:52.449 we take another string and we compare them. So everything's fine and easy. So if 00:05:52.449 --> 00:05:58.599 you look at the way string compare works and what it does, it's essentially 00:05:58.599 --> 00:06:04.449 taking these two strings and it's comparing every character bit by bit. So 00:06:04.449 --> 00:06:10.729 when it finds the first pair of characters that do not match up, it's going to stop. 
00:06:10.729 --> 00:06:17.879 So we can we can conclude something about from that - so if it takes very short if 00:06:17.879 --> 00:06:23.399 if this function this check function takes a very short amount of time, then, what 00:06:23.399 --> 00:06:29.129 will happen is it will terminate earlier. And if our password guess is better, it 00:06:29.129 --> 00:06:34.479 will take well, it will take longer. And if we can measure that we can basically 00:06:34.479 --> 00:06:40.809 extract information from that running algorithm. So I wrote a little driver 00:06:40.809 --> 00:06:47.449 program in Haskell that basically just iterates over an alphabet and just feeds 00:06:47.449 --> 00:06:53.379 this one letter into that program, And I'm going to use DTrace to get some 00:06:53.379 --> 00:06:59.020 timing information. So let me start the driver. So this is now just running in the 00:06:59.020 --> 00:07:04.919 background. And you cannot see what I'm typing there, but don't worry - these 00:07:04.919 --> 00:07:12.240 scripts will all be; I will push them on my github. So DTrace now produces this 00:07:12.240 --> 00:07:17.240 nice little distribution. So if you if you were if you were able to see the entire 00:07:17.240 --> 00:07:22.949 alphabet, you would see that everything except "D" behaves differently. So if you 00:07:22.949 --> 00:07:29.399 squint a little, what you see there is DTrace the D letter takes a couple of 00:07:29.399 --> 00:07:32.949 nanoseconds longer. This is the precision that I'm measuring here - ten to minus 00:07:32.949 --> 00:07:39.219 nine seconds. Like really precise. And D takes longer than everything else, so it's 00:07:39.219 --> 00:07:43.929 a little bit cut off there, but trust me. I know it sound like Donald Trump I'm 00:07:43.929 --> 00:07:52.759 saying that. So yeah, and from that let's just enter a letter. And now the password 00:07:52.759 --> 00:07:56.799 and now the script clears everything and it's going to guess the next letter. 
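Editor's sketch: the letter-by-letter guessing described here can be simulated without a stopwatch. This Python sketch is not the speaker's Haskell driver or DTrace script; it assumes a hypothetical secret and uses the number of characters compared as a stand-in for the nanosecond timings DTrace measured in the demo.

```python
import string

SECRET = "DTRACE"  # hypothetical secret, chosen only for this sketch


def check(guess, secret=SECRET):
    """Early-exit comparison modelled on C strings (NUL-terminated).

    Returns (matched, steps), where steps is the number of character
    pairs compared before the function stopped.
    """
    steps = 0
    for a, b in zip(guess + "\0", secret + "\0"):
        steps += 1
        if a != b:
            return False, steps  # stop at the first mismatching pair
    return True, steps


def recover(alphabet=string.ascii_uppercase, max_len=32):
    """Recover the secret one letter at a time from the step counts."""
    known = ""
    for _ in range(max_len):
        # The candidate that keeps the comparison running longest wins.
        scores = {c: check(known + c)[1] for c in alphabet}
        known += max(scores, key=scores.get)
        if check(known)[0]:
            break
    return known


if __name__ == "__main__":
    print(recover())
```

In the real attack the step count is not visible; the measured duration of the check function plays that role, which is why the demo needs nanosecond-resolution timing.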
So 00:07:56.799 --> 00:08:02.020 sadly this is cut off, because you would see that this distribution radically 00:08:02.020 --> 00:08:08.830 changed. It looks completely different, and so we can play that game a little bit. 00:08:08.830 --> 00:08:13.419 So let's just roll with that. And like every three seconds the script is 00:08:13.419 --> 00:08:19.159 going to recompute looking at the new distribution. And you can probably see 00:08:19.159 --> 00:08:26.849 where this is going. So here you can see, okay, and now it just - it just takes 00:08:26.849 --> 00:08:34.559 about like three seconds for me to guess the next letter. So, and this is not a 00:08:34.559 --> 00:08:39.809 problem that is only of something that happens when you do string 00:08:39.809 --> 00:08:44.139 compares. This can happen with basically everything - so it's especially 00:08:44.139 --> 00:08:48.029 in things like cryptographic stuff where you don't want to have some information 00:08:48.029 --> 00:08:56.620 leaked out. So this is what we call a timing side channel attack. So I could 00:08:56.620 --> 00:09:02.959 essentially use DTrace to analyze the real binary. So I didn't change the 00:09:02.959 --> 00:09:07.040 binary - I didn't have some some debug code there. This is like the actual binary 00:09:07.040 --> 00:09:12.500 that I would put into production. So what's important about out that, is to 00:09:12.500 --> 00:09:16.500 take the actual binary, is some of these these timing side channels might be 00:09:16.500 --> 00:09:21.620 introduced by a compiler optimization. And when you insert debug code into that code, 00:09:21.620 --> 00:09:26.920 then it might actually go away. So, you want to look at the real code that you're 00:09:26.920 --> 00:09:34.420 putting into production. Let me show you the script that I came up with to write 00:09:34.420 --> 00:09:40.779 that. So there are three interesting things in this script. 
So and and don't 00:09:40.779 --> 00:09:44.180 worry - this is the more complicated example, I just want to like 00:09:44.180 --> 00:09:48.839 inspire your ideas. Because the things that you can do with DTrace that's pretty 00:09:48.839 --> 00:09:54.600 much - the sky's the limit. You can come up with the weirdest ideas, and so 00:09:54.600 --> 00:09:59.420 this is more complicated example. I'm going to show you simpler ones. So to 00:09:59.420 --> 00:10:04.440 demonstrate how we got here. So there are three interesting things in this code. The 00:10:04.440 --> 00:10:09.509 first one is something that we call a probe. So a probe is a point of 00:10:09.509 --> 00:10:15.019 instrumentation in the system. So whenever a certain event happens in the system this 00:10:15.019 --> 00:10:21.269 probe is going to fire. And in this case, the begin probe like marks the state 00:10:21.269 --> 00:10:27.379 the moment when the script starts. So the second interesting thing is this clause. 00:10:27.379 --> 00:10:31.680 So this clause is basically what this probe is going to execute - what's going 00:10:31.680 --> 00:10:37.780 to be executed once that probe fires. So it's a little block of code. 00:10:37.780 --> 00:10:42.370 And this probe is a little bit more interesting, because it tells us 00:10:42.370 --> 00:10:48.270 something about the structure of how such a probe looks like. Because every 00:10:48.270 --> 00:10:54.100 probe is uniquely identified by a four tuple. So it's like four components that 00:10:54.100 --> 00:10:59.079 uniquely identify a probe. And the first one is called the first part of this 00:10:59.079 --> 00:11:03.269 tuple is called the provider, and I'm going to talk about providers in a couple 00:11:03.269 --> 00:11:07.160 of seconds and what they are. The second one is called the module. Third one is 00:11:07.160 --> 00:11:13.449 called the function. And the last one is called the name. 
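Editor's sketch: the four-tuple can be pictured as a small record. This Python sketch assumes the usual colon-separated notation for probe descriptions, in which an empty field matches anything and a description with fewer than four fields is read from right to left; the type and function names are made up for illustration.

```python
from typing import NamedTuple


class ProbeDesc(NamedTuple):
    """The four components that identify a probe."""
    provider: str
    module: str
    function: str
    name: str


def parse_probe(desc):
    """Split 'provider:module:function:name' into its four fields.

    An empty field stands for "match anything". Shorter descriptions
    are padded on the left, so a lone 'BEGIN' is taken as the name.
    """
    parts = desc.split(":")
    if len(parts) > 4:
        raise ValueError("too many fields in probe description: " + desc)
    parts = [""] * (4 - len(parts)) + parts
    return ProbeDesc(*parts)


if __name__ == "__main__":
    print(parse_probe("syscall:freebsd:mmap:entry"))
    print(parse_probe("BEGIN"))
```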
So these four pieces of 00:11:13.449 --> 00:11:21.079 data, like, they identify a probe uniquely. So the third thing that is 00:11:21.079 --> 00:11:25.440 interesting here is, sadly something that I don't have time to talk about today, 00:11:25.440 --> 00:11:31.139 this is called an aggregation. And this single line that you see here is 00:11:31.139 --> 00:11:35.889 essentially responsible for accumulating all this data to print out this 00:11:35.889 --> 00:11:39.949 distribution stuff - to generate this distribution. So this is built 00:11:39.949 --> 00:11:44.629 into DTrace. You don't have to do that yourself. As it, when you look at this 00:11:44.629 --> 00:11:50.189 script, it's like 42 lines of code. And I came up with the first prototype 00:11:50.189 --> 00:11:55.279 after five minutes. So it's not a lot of stuff to do to get something out of 00:11:55.279 --> 00:12:00.360 that. So it's very useful to have things - if you use DTrace you 00:12:00.360 --> 00:12:05.060 will use this a lot for performance debugging so it's kind of neat that we 00:12:05.060 --> 00:12:11.410 have that. So yeah, let's talk a little bit about providers, and this will 00:12:11.410 --> 00:12:18.300 probably also will be cut off. So I'm going to cheat a little bit here - I'm 00:12:18.300 --> 00:12:27.649 just going to double that. So let's talk about providers -- oh that's handy -- 00:12:27.649 --> 00:12:32.339 so I got 27 providers here and the number of providers vary from operating system to 00:12:32.339 --> 00:12:38.339 operating system. But these are the ones that I can see right now. There are 00:12:38.339 --> 00:12:44.499 other providers that can be come into existence when you demand them. So I have 00:12:44.499 --> 00:12:49.370 these 27 providers, and we're going to look at the syscall provider and the FBT 00:12:49.370 --> 00:12:55.129 provider first. So, every provider knows how to instrument a specific part of the 00:12:55.129 --> 00:13:01.410 system. 
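Editor's sketch: the aggregation mentioned above does the bucketing inside DTrace. The talk does not show which aggregating function the script uses, so this Python sketch assumes a power-of-two distribution in the style of DTrace's quantize(), where each sample is counted in the bucket of the largest power of two not exceeding it. The sample values are invented.

```python
from collections import Counter


def bucket(value):
    """Largest power of two that is <= value (0 stays in bucket 0)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative samples")
    if value == 0:
        return 0
    return 1 << (value.bit_length() - 1)


def quantize(samples):
    """Count how many samples fall into each power-of-two bucket."""
    return Counter(bucket(v) for v in samples)


def render(hist, width=40):
    """Draw a simple text histogram, one row per bucket."""
    peak = max(hist.values())
    rows = []
    for value in sorted(hist):
        bar = "@" * max(1, hist[value] * width // peak)
        rows.append(f"{value:>10} |{bar} {hist[value]}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Invented nanosecond timings for a handful of check() calls.
    timings = [900, 1000, 1100, 1500, 2100]
    print(render(quantize(timings)))
```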
So the syscall provider knows how to instrument the syscall table. That's not 00:13:01.410 --> 00:13:08.699 very surprising. So if you can look at the syscall provider and here you can see 00:13:08.699 --> 00:13:16.720 essentially every system call entry and return that FreeBSD offers. So 00:13:16.720 --> 00:13:20.120 here you can see this four tuple, like, the provider syscall, FreeBSD is the 00:13:20.120 --> 00:13:28.189 module, and so on. So these are all the system calls that I have in my system. And 00:13:28.189 --> 00:13:32.910 the other provider that I want to look at is the so called FBT provider, and that is 00:13:32.910 --> 00:13:38.810 pretty astonishing. The FBT provider, FBT stands for "function boundary tracer" and 00:13:38.810 --> 00:13:45.160 what it allows us to do, it allows us to trace every single function in the kernel. 00:13:45.160 --> 00:13:50.850 So I can look at the entire kernel at functions, as they are being called. So to 00:13:50.850 --> 00:13:57.660 illustrate that I wrote a little, very simple DTrace script and this is probably, 00:13:57.660 --> 00:14:01.399 look at the upper half please, so this is probably one of the first DTrace scripts 00:14:01.399 --> 00:14:05.529 that you will come up with, it's a fairly simple example, so let's break it 00:14:05.529 --> 00:14:09.680 down. So I'm going to instrument the mmap system call. For those of you who do not 00:14:09.680 --> 00:14:13.720 know what the mmap system call is, what you can do with it is you can so you can 00:14:13.720 --> 00:14:20.970 take a file and map that into the address space of your process, so very dumbed down 00:14:20.970 --> 00:14:27.449 version. 
So whenever we enter the mmap system call we are going to set the 00:14:27.449 --> 00:14:32.810 variable "follow" to one, and what this "self->" means: this is essentially a 00:14:32.810 --> 00:14:37.970 thread local variable and we're going to associate that variable with the thread 00:14:37.970 --> 00:14:45.230 that we're currently inspecting. Then I'm going to do something that sounds pretty 00:14:45.230 --> 00:14:49.149 scary: I'm going to instrument the entire kernel. Every function entry and 00:14:49.149 --> 00:14:53.009 every function return, I'm going to instrument that and say "please emit data 00:14:53.009 --> 00:14:57.189 when you do that". And this is what we call a predicate, so this is where the 00:14:57.189 --> 00:15:02.009 awkiness of the DTrace programming language comes in. So this is a predicate 00:15:02.009 --> 00:15:07.059 and whenever that evaluates to true then the probe is going to fire, so in 00:15:07.059 --> 00:15:11.139 this case when we are in the thread that we're currently tracing we're going to 00:15:11.139 --> 00:15:16.329 emit data. And this is just an empty clause, we just want to know "hey we got 00:15:16.329 --> 00:15:23.480 here". So when we exit the mmap system call and the predicate is set we're 00:15:23.480 --> 00:15:27.660 going to set the variable "follow" to zero, because every uninitialized variable 00:15:27.660 --> 00:15:33.860 in DTrace is set to zero, so this pretty much amounts to deallocating that variable 00:15:33.860 --> 00:15:41.279 and then we're going to exit cleanly. So let me run that. So it takes a couple of 00:15:41.279 --> 00:15:48.480 seconds and boom. So you saw a little pause here, that was when the DTrace guard 00:15:48.480 --> 00:15:55.009 reverted the driver, the kernel. So now you can see every function call that 00:15:55.009 --> 00:15:59.480 happens inside the mmap system call.
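Editor's sketch: the interplay of the thread-local flag and the predicate can be simulated in a few lines. This is Python rather than the D script from the demo (which is not reproduced in the transcript), and the event stream and the function name foo are hypothetical.

```python
from collections import defaultdict

MMAP_ENTRY = "syscall::mmap:entry"
MMAP_RETURN = "syscall::mmap:return"


def trace(events):
    """Replay (thread_id, probe) events the way the script is described.

    A per-thread 'follow' flag starts at zero, is set on mmap entry,
    gates the function-boundary probes like a predicate, and is cleared
    on mmap return, after which tracing stops.
    """
    follow = defaultdict(int)  # unset thread-local variables read as 0
    emitted = []
    for tid, probe in events:
        if probe == MMAP_ENTRY:
            follow[tid] = 1
        elif probe.startswith("fbt:") and follow[tid]:
            emitted.append((tid, probe))  # predicate true: probe fires
        elif probe == MMAP_RETURN and follow[tid]:
            follow[tid] = 0  # back to zero, i.e. deallocated
            break            # the script exits here
    return emitted


if __name__ == "__main__":
    events = [
        (1, "fbt::foo:entry"),      # before mmap: ignored
        (1, MMAP_ENTRY),
        (2, "fbt::foo:entry"),      # other thread: ignored
        (1, "fbt::foo:entry"),
        (1, "fbt::foo:return"),
        (1, MMAP_RETURN),
    ]
    for tid, probe in trace(events):
        print(tid, probe)
```

Only the kernel functions called by the thread that is inside mmap are reported, which is what keeps the output readable even though every function in the kernel is instrumented.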
And this is a little bit hard on the eyes, so 00:15:59.480 --> 00:16:08.379 let me pass this flag here and now you can have nice to read indentation. So 00:16:08.379 --> 00:16:12.629 now you might say "I don't like that. You are injecting code into the kernel. That 00:16:12.629 --> 00:16:17.880 is, that sounds dangerous" and yeah, but let me show you something that I find 00:16:17.880 --> 00:16:23.980 really interesting. So I'm not going too much into depth here, but this 00:16:23.980 --> 00:16:28.750 is a byte code, so every DTrace script gets compiled to bytecode and this 00:16:28.750 --> 00:16:34.499 bytecode gets sent to the kernel and in the kernel you have a virtual machine that 00:16:34.499 --> 00:16:39.059 interprets that bytecode. So in case you write a script that for some reason might 00:16:39.059 --> 00:16:44.550 go rogue on your kernel, it like allocates too much memory, takes too much time, this 00:16:44.550 --> 00:16:49.279 virtual machine can just say "okay, stop it" and just going to revert all the 00:16:49.279 --> 00:16:53.890 changes that happened to your kernel, and that's kinda handy. And it's not a new 00:16:53.890 --> 00:17:01.199 idea, so if you're using TCP dump it's basically the same approach. They also 00:17:01.199 --> 00:17:04.832 have this kind of bytecode, so that's just a little excursion here. This is called 00:17:04.832 --> 00:17:13.250 BPF, Berkeley Packet Filter, so it's not an entirely new idea. So everything I 00:17:13.250 --> 00:17:19.470 showed you until now was "hey, I can look when function calls happen". that's not 00:17:19.470 --> 00:17:22.519 very much information, so we're going to increase the amount of information that we 00:17:22.519 --> 00:17:35.080 get out of the system with every example. So let me look at the actual kernel. So I 00:17:35.080 --> 00:17:39.980 had to restart my machine, so my setup is basically gone now. So let's look at this 00:17:39.980 --> 00:17:45.309 VM fault function. 
So this is, this is the source code of the operating system that 00:17:45.309 --> 00:17:52.900 I'm running right now. This is FreeBSD current 12 and the VM fault function; 00:17:52.900 --> 00:17:57.539 remember the mmap system call that I told you? So the mmap system call 00:17:57.539 --> 00:18:03.899 I told you can bring, like map a file into your address space. And it doesn't 00:18:03.899 --> 00:18:10.320 necessarily have to load the entire file, so whenever we are touching a page in the 00:18:10.320 --> 00:18:15.780 system, like a memory page, this machine is four kilobytes and it's no super pages 00:18:15.780 --> 00:18:21.429 here, so whenever it touches a piece of memory that you didn't bring into memory 00:18:21.429 --> 00:18:25.309 yet, we're generating something that's called a page fault, and then this 00:18:25.309 --> 00:18:31.180 function gets called. So here let's look at the arguments, and I'm going to skip 00:18:31.180 --> 00:18:36.990 the zeroeth argument, to look at the first argument. So this is the address that 00:18:36.990 --> 00:18:44.160 provoked that page fault, this is the type and these are the flags and I'm going 00:18:44.160 --> 00:18:48.780 to show you something to make that a little bit more readable. So what about 00:18:48.780 --> 00:18:58.960 this one? So you see it's a pointer and this is a big structure, so we want 00:18:58.960 --> 00:19:09.961 to be able to look at that structure. And just probably should do this here, so 00:19:09.961 --> 00:19:17.090 let's look at this VM fault script here. So this is, make this a little bit more, 00:19:17.090 --> 00:19:20.950 so this is, don't pay too much attention to this code, this this is basically just 00:19:20.950 --> 00:19:26.049 boilerplate to make make stuff readable and this is where the actual action is 00:19:26.049 --> 00:19:31.690 happening. 
So what I'm doing there is I'm instrumenting the VM 00:19:31.690 --> 00:19:36.350 fault function and whenever we enter it then we're going to use some information 00:19:36.350 --> 00:19:40.720 that DTrace gives us for free. So this is execname, this is the name of the 00:19:40.720 --> 00:19:45.909 currently running executable that provoked the page fault, this is the process ID and 00:19:45.909 --> 00:19:53.250 here we have a bunch of argument variables. So these arg1, arg2, arg3, 00:19:53.250 --> 00:19:57.964 they are essentially just integers, so nothing too fancy there. But we 00:19:57.964 --> 00:20:02.380 wanna be able to look at that struct. And here I'm going to use this 00:20:02.380 --> 00:20:08.140 args array, and this args array is kind of special, because it has typing 00:20:08.140 --> 00:20:15.870 information about the arguments. So when you run that, so you're dereferencing that 00:20:15.870 --> 00:20:26.570 pointer there with the star, excuse me, and let's just run that and maybe, that's 00:20:26.570 --> 00:20:32.899 a start yeah. So this is an in-kernel data structure that we can now look 00:20:32.899 --> 00:20:40.010 at. So DTrace enabled us to look at in- memory data structures as the system runs. 00:20:40.010 --> 00:20:44.330 And this is really really powerful. In the DTrace script I could use all 00:20:44.330 --> 00:20:50.490 these fields like I can manipulate this args array, this value in there, 00:20:50.490 --> 00:20:57.010 just like every other variable; I can pretty much work like I was in C. So 00:20:57.010 --> 00:21:02.659 how is it doing that? There is something that's called CTF, that's not capture the 00:21:02.659 --> 00:21:10.120 flag, this is the Compact C Type Format, so you can look that up, 00:21:10.120 --> 00:21:14.320 there is a man page in FreeBSD, and there's a little segment in the kernel 00:21:14.320 --> 00:21:19.190 binary, where all this typing information is stored.
I don't know how that compares 00:21:19.190 --> 00:21:24.320 to modern DWARF but yeah this is what DTrace is working with. So now you might 00:21:24.320 --> 00:21:28.549 ask yourself "Why on earth would I do that? Why on earth would I look at virtual 00:21:28.549 --> 00:21:33.590 memory, because, yeah, um, this stuff is safe isn't it? I mean there's no bugs in 00:21:33.590 --> 00:21:42.820 there." Except when they are. Anyone remembers remembers "Dirty COW"? So this 00:21:42.820 --> 00:21:48.510 was a very nasty vulnerability in the Linux kernel and that that was a problem 00:21:48.510 --> 00:21:52.399 in the virtual memory management. So it allowed you to write to a file that you 00:21:52.399 --> 00:21:56.679 didn't own as a regular user. So you could essentially just write to a binary that 00:21:56.679 --> 00:22:01.789 had "set UID" set. Very unpleasant, but I'm not going to bash the Linux folks 00:22:01.789 --> 00:22:08.030 here, this is just, I just want to show you these things are hard. And the first 00:22:08.030 --> 00:22:15.440 fix for this problem was in 2005 and then it came back in 2016. So now that's fixed 00:22:15.440 --> 00:22:21.080 and then it came back with "Huge Dirty COW" in 2017, so this is, I mean this 00:22:21.080 --> 00:22:27.580 was there for way over a decade. These things are hard to debug. And this 00:22:27.580 --> 00:22:33.110 is what I like about these systems, so not having, not having tools like DTrace to 00:22:33.110 --> 00:22:37.640 figure out what's going on inside of the system somehow, to me, amounts to security 00:22:37.640 --> 00:22:42.360 by obscurity. And I've heard that some people who are developing exploits for 00:22:42.360 --> 00:22:46.100 systems that have DTrace they say "Oh, I really like developing exploits on these 00:22:46.100 --> 00:22:53.230 systems, because the tooling is so great!" 
Yeah, but, to be honest this is cool, 00:22:53.230 --> 00:22:58.899 because an exploit is a proof of concept and coming up with these exploits quickly 00:22:58.899 --> 00:23:03.440 is very useful, because you know what's going on you can show "Hey, this is going 00:23:03.440 --> 00:23:07.279 wrong". I had situations, where people were telling me "Oh, this 00:23:07.279 --> 00:23:11.020 is not a problem with our program, this is this weird operating system that you're 00:23:11.020 --> 00:23:18.100 using. Like Solaris, weird operating system." And, yeah, and then I churned out 00:23:18.100 --> 00:23:22.059 some DTrace scripts and "No, it's actually your problem". "Oh, now I can see 00:23:22.059 --> 00:23:31.419 that on my Linux box!" Magic. So, everything I showed you until now was 00:23:31.419 --> 00:23:38.179 very, very much related to function calls and we want to have a little bit more 00:23:38.179 --> 00:23:44.720 semantics here, because you might want to write a script that inspects protocols, 00:23:44.720 --> 00:23:48.760 stuff like TCP, UDP stuff like that. So, you don't want to know which function 00:23:48.760 --> 00:23:54.320 inside of the kernel is responsible for handling your TCP/IP stuff, so DTrace 00:23:54.320 --> 00:24:00.549 comes with something that's called static providers and I'm just going to show the 00:24:00.549 --> 00:24:04.769 apropos here. So every static provider has a man page which is 00:24:04.769 --> 00:24:10.950 kind of handy - documentation whoo - and you can see there is an I/O provider if 00:24:10.950 --> 00:24:17.539 you are interested in looking at disk I/O, IP for looking at IPv4 and IPv6, 00:24:17.539 --> 00:24:23.570 TCP... This one is pretty cool, it's about scheduling behavior. So, "what does my 00:24:23.570 --> 00:24:29.010 scheduler do?"
And if you look at that, you can see some interesting stuff like lend 00:24:29.010 --> 00:24:33.150 priority; if you ever saw things like priority inversion, stuff like that, now 00:24:33.150 --> 00:24:36.970 you can see that happen. I'm a nerd, I find this interesting for some reason, I 00:24:36.970 --> 00:24:43.230 don't know. And it's also pretty interesting to figure out what's going on, 00:24:43.230 --> 00:24:48.279 "why is this getting de-scheduled all the time?" So, some interesting things going 00:24:48.279 --> 00:24:55.809 on there. So, I'm running a little bit short on time here, but I just quickly 00:24:55.809 --> 00:24:59.340 want to show you something - this is all kernel stuff right now - can we do that 00:24:59.340 --> 00:25:05.380 with userspace? Of course. So, there was one provider that didn't show up when I 00:25:05.380 --> 00:25:09.590 had my provider listing, but was in the DTrace script where I did this timing 00:25:09.590 --> 00:25:16.230 attack stuff. And that's called the PID provider. And the PID provider generates 00:25:16.230 --> 00:25:21.080 probes on demand, because a process might have a lot of probes and you will shortly 00:25:21.080 --> 00:25:25.190 see why and this is why I'm going to use a very small program which is called "true", 00:25:25.190 --> 00:25:31.560 and true just exits with exit code zero. So, nothing too exciting going on here, 00:25:31.560 --> 00:25:37.810 and this dollar target gets substituted in, we get the process ID there. And this 00:25:37.810 --> 00:25:44.640 is everything that happens when I'm executing this program. You see this is a 00:25:44.640 --> 00:25:48.679 little bit more fine-grained than the FBT provider, because now we can trace every 00:25:48.679 --> 00:25:53.520 single instruction inside of that function, which is kind of handy. It's a 00:25:53.520 --> 00:25:58.090 scriptable debugger. So, these numbers are the instruction offsets inside of that 00:25:58.090 --> 00:26:03.360 function.
We can also look at - so this is everything in the true segment - we can 00:26:03.360 --> 00:26:09.899 also look at libraries that got linked in and there's a lot of stuff happening in 00:26:09.899 --> 00:26:15.780 libc for example when you run true. So, one last thing that I wanted to show 00:26:15.780 --> 00:26:22.340 you because it consumed a week of my life: I'm using a lot of Haskell and the Mac OS 00:26:22.340 --> 00:26:29.419 people, they also have DTrace and they have GHC Haskell DTrace support - so the 00:26:29.419 --> 00:26:38.380 Glasgow Haskell compiler - and glorious... they have probes to analyze what's going 00:26:38.380 --> 00:26:41.620 on inside of the runtime system. So, I thought "I want to have that, I have 00:26:41.620 --> 00:26:47.019 DTrace, why doesn't it work on FreeBSD?" So, after a week of fighting with make 00:26:47.019 --> 00:26:55.100 files and linkers, that works: If you check out the recent GHC repository and 00:26:55.100 --> 00:27:00.260 build it on FreeBSD, you get all the nice stuff that I'm going to show you now. So, 00:27:00.260 --> 00:27:05.909 this is a very boring program - it just starts 32 green threads and schedules them 00:27:05.909 --> 00:27:10.470 all over the place - and now I can do something like this: phone rings I can 00:27:10.470 --> 00:27:13.934 ring a telephone. laughter 00:27:13.934 --> 00:27:18.750 No, that would be interesting... So, you can also use 00:27:18.750 --> 00:27:26.970 wildcards - and not as name of the probe - and this is what's going on inside, like 00:27:26.970 --> 00:27:31.580 GC garbage collection and all this stuff. Now you can look at this and write useful 00:27:31.580 --> 00:27:37.509 DTrace scripts that also take my runtime system into account. 
So, stuff like that 00:27:37.509 --> 00:27:41.810 exists for I think Python - I'm not entirely sure because I don't use it - 00:27:41.810 --> 00:27:49.120 nodejs same, Postgres - I used it but not with DTrace right now - and what I find 00:27:49.120 --> 00:27:55.210 interesting: Firefox. When you run JavaScript in your Firefox, it actually 00:27:55.210 --> 00:27:59.360 has a provider, so you can trace JavaScript running in your browser with 00:27:59.360 --> 00:28:05.130 DTrace, so after everything I just showed you, there might be some stuff going on 00:28:05.130 --> 00:28:10.700 there. So yeah, this is basically everything I wanted to show you and I 00:28:10.700 --> 00:28:13.759 think I'm going to wrap up, because otherwise we're not going to have a lot of 00:28:13.759 --> 00:28:19.001 time for questions and maybe you have some. So yeah, thanks. 00:28:19.001 --> 00:28:29.610 applause Herald: Thank you very much Raichoo. We 00:28:29.610 --> 00:28:34.257 are actually over time already, but we have two more minutes because we started 00:28:34.257 --> 00:28:38.817 three minutes late, so if there are any really quick questions, possibly from the 00:28:38.817 --> 00:28:43.030 internet... There is one, the signal angel says, let's hear it. 00:28:43.030 --> 00:28:48.013 Question: Yeah, hi, okay. So, the question is, "which changes are actually necessary 00:28:48.013 --> 00:28:51.809 to do in the kernel of an operating system to support DTrace?" 00:28:51.809 --> 00:28:56.370 Answer: That's a lot of work. So, it's not something you do in a weekend. This 00:28:56.370 --> 00:29:03.062 is... So, the person who started the work on FreeBSD has sadly passed away now, but 00:29:03.062 --> 00:29:09.559 I think they took a couple of years to have everything in place, so you have to 00:29:09.559 --> 00:29:13.730 have stuff like the CTF thing that I showed you, which is what OpenBSD is 00:29:13.730 --> 00:29:19.890 currently working on.
And then you need all those magic gizmos, like kernel 00:29:19.890 --> 00:29:25.660 modules and stuff like that. So, it takes a lot of time, but it's been ported to 00:29:25.660 --> 00:29:30.889 most operating systems that are available and in use right now. So yeah, hope this 00:29:30.889 --> 00:29:34.239 answers the question. Herald: Excellent and there are no more 00:29:34.239 --> 00:29:38.839 questions here in the room. I will thank Raichoo and you can find him outside of 00:29:38.839 --> 00:29:46.590 the room and also on Twitter at "raichoo" if you have any further questions. 00:29:46.590 --> 00:29:51.405 postroll music 00:29:51.405 --> 00:30:08.000 subtitles created by c3subtitles.de in the year 2020. Join, and help us!