WEBVTT
00:00:00.110 --> 00:00:14.320
music
00:00:14.320 --> 00:00:18.810
applause
00:00:18.810 --> 00:00:23.560
Raichoo: Yeah sorry about that - beamers
or projectors, I don't like them. They
00:00:23.560 --> 00:00:27.210
don't like me either. So this is a little
heads up - this is going to be the only
00:00:27.210 --> 00:00:32.049
slide I'm going to show you today so,
"slide", because I think doing stuff like
00:00:32.049 --> 00:00:35.940
that in a terminal might be a little bit
more interesting for you. But sadly
00:00:35.940 --> 00:00:40.020
something is getting cut off so we have
to improvise a little bit. But anyway, so
00:00:40.020 --> 00:00:43.960
today I will be able to talk about two of
my favorite things right now which are
00:00:43.960 --> 00:00:47.820
FreeBSD and DTrace. But this talk has been
capped down to 30 minutes so we'll be
00:00:47.820 --> 00:00:53.190
focusing a little more on the DTrace part.
So there will be a little bit less BSD
00:00:53.190 --> 00:00:57.560
than I anticipated. And I also adjusted
everything a little bit to fit better into
00:00:57.560 --> 00:01:03.130
the resilience track so hopefully you'll
enjoy that. So before we begin, who here
00:01:03.130 --> 00:01:08.640
is actually using DTrace? Okay more than
I expected but still not as many as I
00:01:08.640 --> 00:01:12.610
would like to see. So hopefully after this
talk you will think, "oh, this is a really
00:01:12.610 --> 00:01:17.170
awesome tool, I gotta learn it." Because I
totally love it - it changed the way I do
00:01:17.170 --> 00:01:22.110
a lot of stuff. So for those of you who do
not know what DTrace is, first, let me
00:01:22.110 --> 00:01:27.260
fill you in on this stuff. So it's open
source, it originated on Solaris, and is
00:01:27.260 --> 00:01:31.640
currently developed on illumos, which is a
fork of OpenSolaris. It has been ported
00:01:31.640 --> 00:01:37.930
to FreeBSD, NetBSD, OS X, there's also a
port for Linux called DTrace
00:01:37.930 --> 00:01:43.689
for Linux. I think it's done by a person
called Paul Fox. It's been ported to QNX
00:01:43.689 --> 00:01:49.810
and the OpenBSD folks are currently doing
some work to get a technology like
00:01:49.810 --> 00:01:54.040
DTrace on their system. And I think
there's a port for Windows? I don't know
00:01:54.040 --> 00:01:57.869
if this is actually true, but if it is, it's
kind of cool because then that means it's
00:01:57.869 --> 00:02:04.650
basically everywhere. So, most of you
would probably know static tools like
00:02:04.650 --> 00:02:09.470
strace. We have a very similar tool on
FreeBSD that is called truss, and what
00:02:09.470 --> 00:02:14.500
truss and strace are doing is - you can
attach them to a process and look at the
00:02:14.500 --> 00:02:18.650
system calls that this process is
emitting. So in case something is going
00:02:18.650 --> 00:02:23.319
wrong you can well look inside of the
program, which can be kind of useful when
00:02:23.319 --> 00:02:28.870
you're trying to find a problem. It's
kind of handy but it's also pretty
00:02:28.870 --> 00:02:32.890
limited. Because first of all it really
really slows down the process that you're
00:02:32.890 --> 00:02:37.250
currently looking at. So if you want to
debug a performance issue, you're pretty
00:02:37.250 --> 00:02:42.170
much out of luck there. And also it's kind
of like, narrow down - you can just look
00:02:42.170 --> 00:02:47.940
at one process. Which is also a bad
thing because the system that we currently
00:02:47.940 --> 00:02:52.660
have - all these systems are very
complex: we have a lot of layers. You have
00:02:52.660 --> 00:02:56.300
virtual file systems, you have virtual
memory, you have network, you have
00:02:56.300 --> 00:03:00.500
databases, processes communicating with
each other. And in case you are using a
00:03:00.500 --> 00:03:04.710
high-level programming language, you might
also have a runtime system. So it's a
00:03:04.710 --> 00:03:09.519
little operating system on top of your
operating system. So when something goes
00:03:09.519 --> 00:03:15.000
wrong in a system that has such large
complexity, something happens that we call
00:03:15.000 --> 00:03:19.850
the blame game. And the blame game - it's
never your fault, it's always someone
00:03:19.850 --> 00:03:25.710
else's. So what we want to be able to do
is we want to look at the system as a
00:03:25.710 --> 00:03:30.349
whole, so we can correlate all the data
and come up with some meaningful answers
00:03:30.349 --> 00:03:34.506
when something is really going wrong in
there. And also, we don't want to
00:03:34.506 --> 00:03:39.260
switch out all the processes for
debug processes to make that happen,
00:03:39.260 --> 00:03:44.969
because as these things are all -- every
problem happens in production. It never
00:03:44.969 --> 00:03:48.470
happens on the development box. So like,
switching out all the processes - that's
00:03:48.470 --> 00:03:55.030
totally out of the picture. So to do that
in an arbitrary way, to like, instrument
00:03:55.030 --> 00:03:59.910
the system in an arbitrary way, we sort of
need like a programming language. So, we
00:03:59.910 --> 00:04:03.640
need to describe - when that happens,
please submit data so I can see what's
00:04:03.640 --> 00:04:09.489
going on. So this kind of implies a
programming language. And DTrace comes
00:04:09.489 --> 00:04:13.670
with such a programming language - it's a
little bit reminiscent of awk cross with
00:04:13.670 --> 00:04:18.798
C? It's pretty simple to learn - you can
pick it up in 20
00:04:18.798 --> 00:04:25.199
minutes and you can start churning out
your first DTrace scripts. So like awk, if
00:04:25.199 --> 00:04:30.559
you know awk, awk can be used to analyze
large bodies of text. DTrace is pretty
00:04:30.559 --> 00:04:34.749
much the same, but for system behavior -
so a little bit mind boggling, but
00:04:34.749 --> 00:04:40.069
probably I can show you what I mean by
that. And also, as a bonus we don't want
00:04:40.069 --> 00:04:43.860
to slow down the system, so we want to be
able to do things like performance
00:04:43.860 --> 00:04:52.300
debugging, performance tests like that. So
I've prepared this little demo here, and.
00:04:52.300 --> 00:04:58.780
So since we had some issues here probably
this is not -- I have to play around a
00:04:58.780 --> 00:05:04.249
little bit. So what I'm going to do is
I'm going to look at a very very naive way
00:05:04.249 --> 00:05:18.009
to -- excuse me for a second -- very naive
way to -- give me a second -- so very
00:05:18.009 --> 00:05:21.960
naive way to authenticate a user. And
there's a lot of stuff wrong with this
00:05:21.960 --> 00:05:26.030
code, but like what we're going to do is
we're going to take a user string as
00:05:26.030 --> 00:05:32.740
input, and then we're going to just
compare it to another, to a secret. So I
00:05:32.740 --> 00:05:36.420
know, the secret in here is in
plain text I know this is a problem, but
00:05:36.420 --> 00:05:41.639
this is a little bit artificial. But I
just want to get my point across. So from
00:05:41.639 --> 00:05:47.159
an algorithmic perspective, this check
function is correct: so we take a string
00:05:47.159 --> 00:05:52.449
we take another string and we compare
them. So everything's fine and easy. So if
00:05:52.449 --> 00:05:58.599
you look at the way string compare works
and what it does, it's essentially
00:05:58.599 --> 00:06:04.449
taking these two strings and it's
comparing them character by character. So
00:06:04.449 --> 00:06:10.729
when it finds the first pair of characters
that do not match up, it's going to stop.
00:06:10.729 --> 00:06:17.879
So we can conclude something
from that - if
00:06:17.879 --> 00:06:23.399
this check function takes
a very short amount of time, then the
00:06:23.399 --> 00:06:29.129
comparison terminated early.
And if our password guess is better, it
00:06:29.129 --> 00:06:34.479
will take, well, longer. And
if we can measure that we can basically
00:06:34.479 --> 00:06:40.809
extract information from that running
algorithm. So I wrote a little driver
00:06:40.809 --> 00:06:47.449
program in Haskell that basically just
iterates over an alphabet and just feeds
00:06:47.449 --> 00:06:53.379
this one letter into that program,
and I'm going to use DTrace to get some
00:06:53.379 --> 00:06:59.020
timing information. So let me start the
driver. So this is now just running in the
00:06:59.020 --> 00:07:04.919
background. And you cannot see what I'm
typing there, but don't worry - these
00:07:04.919 --> 00:07:12.240
scripts will all be -- I will push them to
my github. So DTrace now produces this
00:07:12.240 --> 00:07:17.240
nice little distribution. So if you
were able to see the entire
00:07:17.240 --> 00:07:22.949
alphabet, you would see that everything
except "D" behaves the same. So if you
00:07:22.949 --> 00:07:29.399
squint a little, what you see there is
that the letter D takes a couple of
00:07:29.399 --> 00:07:32.949
nanoseconds longer. This is the precision
that I'm measuring here - ten to minus
00:07:32.949 --> 00:07:39.219
nine seconds. Like really precise. And D
takes longer than everything else, so it's
00:07:39.219 --> 00:07:43.929
a little bit cut off there, but trust me.
I know I sound like Donald Trump
00:07:43.929 --> 00:07:52.759
saying that. So yeah, and from that let's
just enter a letter. And
00:07:52.759 --> 00:07:56.799
now the script clears everything and
it's going to guess the next letter. So
00:07:56.799 --> 00:08:02.020
sadly this is cut off, because you would
see that this distribution radically
00:08:02.020 --> 00:08:08.830
changed. It looks completely different,
and so we can play that game a little bit.
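
NOTE
Editor's sketch, not the speaker's script (which is not reproduced in this transcript): one way the per-guess timing distribution described above could be collected. It assumes a long-running target process whose comparison function is named check and takes the guess string as its first argument; the function name, the argument layout and the file name timing.d are assumptions made for illustration.
```dtrace
/* Sketch only. Assumes the target has a function named check()
 * whose first argument (arg0) points to the guessed string. */
pid$target::check:entry
{
    self->guess = arg0;       /* user-space address of the guess */
    self->start = timestamp;  /* nanoseconds */
}
pid$target::check:return
/self->start/
{
    /* One power-of-two histogram of elapsed nanoseconds per guess. */
    @elapsed[copyinstr(self->guess)] = quantize(timestamp - self->start);
    self->guess = 0;
    self->start = 0;
}
/* Every three seconds: print the distributions, then start over. */
tick-3s
{
    printa(@elapsed);
    trunc(@elapsed);
}
```
A possible invocation would be: dtrace -s timing.d -p PID, where PID is the process to inspect. The guess whose histogram sits furthest to the right is the candidate for the next correct letter.
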
00:08:08.830 --> 00:08:13.419
So let's just roll with that.
And like every three seconds the script is
00:08:13.419 --> 00:08:19.159
going to recompute looking at the new
distribution. And you can probably see
00:08:19.159 --> 00:08:26.849
where this is going. So here you can see,
okay, and now it just - it just takes
00:08:26.849 --> 00:08:34.559
about like three seconds for me to guess
the next letter. So, and this is not a
00:08:34.559 --> 00:08:39.809
problem that only
happens when you do string
00:08:39.809 --> 00:08:44.139
compares. This can happen with
basically everything - especially
00:08:44.139 --> 00:08:48.029
in things like cryptographic stuff where
you don't want to have some information
00:08:48.029 --> 00:08:56.620
leaked out. So this is what we call a
timing side channel attack. So I could
00:08:56.620 --> 00:09:02.959
essentially use DTrace to analyze
the real binary. So I didn't change the
00:09:02.959 --> 00:09:07.040
binary - I didn't add any debug
code there. This is like the actual binary
00:09:07.040 --> 00:09:12.500
that I would put into production. So
what's important about that, about
00:09:12.500 --> 00:09:16.500
taking the actual binary, is that some of
these timing side channels might be
00:09:16.500 --> 00:09:21.620
introduced by a compiler optimization. And
when you insert debug code into that code,
00:09:21.620 --> 00:09:26.920
then it might actually go away. So, you
want to look at the real code that you're
00:09:26.920 --> 00:09:34.420
putting into production. Let me show you
the script that I came up with to do
00:09:34.420 --> 00:09:40.779
that. So there are three interesting
things in this script. So, and don't
00:09:40.779 --> 00:09:44.180
worry - this is the more
complicated example, I just want to like
00:09:44.180 --> 00:09:48.839
inspire your ideas. Because the things
that you can do with DTrace that's pretty
00:09:48.839 --> 00:09:54.600
much - the sky's the limit. You can
come up with the weirdest ideas, and so
00:09:54.600 --> 00:09:59.420
this is a more complicated example. I'm
going to show you simpler ones. So to
00:09:59.420 --> 00:10:04.440
demonstrate how we got here. So there are
three interesting things in this code. The
00:10:04.440 --> 00:10:09.509
first one is something that we call a
probe. So a probe is a point of
00:10:09.509 --> 00:10:15.019
instrumentation in the system. So whenever
a certain event happens in the system this
00:10:15.019 --> 00:10:21.269
probe is going to fire. And in this case,
the BEGIN probe marks
00:10:21.269 --> 00:10:27.379
the moment when the script starts. So the
second interesting thing is this clause.
00:10:27.379 --> 00:10:31.680
So this clause is basically what this
probe is going to execute - what's going
00:10:31.680 --> 00:10:37.780
to be executed once that probe fires. So
it's a little block of code.
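
NOTE
Editor's sketch of the two ideas just described, a probe and the clause attached to it. The messages and the file name begin.d are made up for illustration; BEGIN and END are standard DTrace probes.
```dtrace
/* BEGIN fires once, when tracing starts.
 * The block in braces is the clause that runs when the probe fires. */
BEGIN
{
    printf("tracing started\n");
}
/* END fires once, when tracing stops (for example after Ctrl-C). */
END
{
    printf("tracing stopped\n");
}
```
Running it with dtrace -s begin.d should print the first message immediately and the second one when the script is interrupted.
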
00:10:37.780 --> 00:10:42.370
And this probe is a little bit more
interesting, because it tells us
00:10:42.370 --> 00:10:48.270
something about the structure of such
a probe. Because every
00:10:48.270 --> 00:10:54.100
probe is uniquely identified by a four
tuple. So it's like four components that
00:10:54.100 --> 00:10:59.079
uniquely identify a probe. And the first
one, the first part of this
00:10:59.079 --> 00:11:03.269
tuple is called the provider, and I'm
going to talk about providers in a couple
00:11:03.269 --> 00:11:07.160
of seconds and what they are. The second
one is called the module. Third one is
00:11:07.160 --> 00:11:13.449
called the function. And the last one is
called the name. So these four pieces of
00:11:13.449 --> 00:11:21.079
data, like, they identify a probe
uniquely. So the third thing that is
00:11:21.079 --> 00:11:25.440
interesting here is, sadly something that
I don't have time to talk about today,
00:11:25.440 --> 00:11:31.139
this is called an aggregation. And this
single line that you see here is
00:11:31.139 --> 00:11:35.889
essentially responsible for accumulating
all this data to print out this
00:11:35.889 --> 00:11:39.949
distribution stuff - to generate this
distribution. So this is built
00:11:39.949 --> 00:11:44.629
into DTrace. You don't have to do that
yourself. When you look at this
00:11:44.629 --> 00:11:50.189
script, it's like 42 lines of code.
And I came up with the first prototype
00:11:50.189 --> 00:11:55.279
after five minutes. So it's not a lot
of stuff to do to get something out of
00:11:55.279 --> 00:12:00.360
that. So it's very useful to have things -
if you use DTrace you
00:12:00.360 --> 00:12:05.060
will use this a lot for performance
debugging so it's kind of neat that we
00:12:05.060 --> 00:12:11.410
have that. So yeah, let's talk a little
bit about providers, and this will
00:12:11.410 --> 00:12:18.300
probably also be cut off. So I'm
going to cheat a little bit here - I'm
00:12:18.300 --> 00:12:27.649
just going to double that. So let's talk
about providers -- oh that's handy --
00:12:27.649 --> 00:12:32.339
so I got 27 providers here and the number
of providers varies from operating system to
00:12:32.339 --> 00:12:38.339
operating system. But these are the
ones that I can see right now. There are
00:12:38.339 --> 00:12:44.499
other providers that can come into
existence when you demand them. So I have
00:12:44.499 --> 00:12:49.370
these 27 providers, and we're going to
look at the syscall provider and the FBT
00:12:49.370 --> 00:12:55.129
provider first. So, every provider knows
how to instrument a specific part of the
00:12:55.129 --> 00:13:01.410
system. So the syscall provider knows how
to instrument the syscall table. That's not
00:13:01.410 --> 00:13:08.699
very surprising. So we can look at the
syscall provider and here you can see
00:13:08.699 --> 00:13:16.720
essentially every system call entry and
return that FreeBSD offers. So
00:13:16.720 --> 00:13:20.120
here you can see this four tuple, like,
the provider syscall, FreeBSD is the
00:13:20.120 --> 00:13:28.189
module, and so on. So these are all the
system calls that I have in my system. And
00:13:28.189 --> 00:13:32.910
the other provider that I want to look at
is the so called FBT provider, and that is
00:13:32.910 --> 00:13:38.810
pretty astonishing. The FBT provider, FBT
stands for "function boundary tracer" and
00:13:38.810 --> 00:13:45.160
what it allows us to do, it allows us to
trace every single function in the kernel.
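
NOTE
Editor's sketch of how the four-tuple appears in a script, using the syscall provider and the freebsd module named above. Counting calls per executable is only an illustration, not something shown in the talk.
```dtrace
/* Probe description: provider:module:function:name.
 * An empty field matches anything, so this matches the entry
 * probe of every system call. */
syscall:freebsd::entry
{
    /* Count system calls per executable and per system call name. */
    @calls[execname, probefunc] = count();
}
```
The aggregation is printed automatically when the script exits. The probes a provider offers can be listed with dtrace -l -P syscall.
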
00:13:45.160 --> 00:13:50.850
So I can look at the entire kernel at
functions, as they are being called. So to
00:13:50.850 --> 00:13:57.660
illustrate that I wrote a little, very
simple DTrace script and this is probably,
00:13:57.660 --> 00:14:01.399
look at the upper half please, so this is
probably one of the first DTrace scripts
00:14:01.399 --> 00:14:05.529
that you will come up with, it's a
fairly simple example, so let's break it
00:14:05.529 --> 00:14:09.680
down. So I'm going to instrument the mmap
system call. For those of you who do not
00:14:09.680 --> 00:14:13.720
know what the mmap system call is, what
you can do with it is you can
00:14:13.720 --> 00:14:20.970
take a file and map that into the address
space of your process - a very dumbed-down
00:14:20.970 --> 00:14:27.449
version. So whenever we enter the mmap
system call we are going to set the
00:14:27.449 --> 00:14:32.810
variable "follow" to one, and what this
"self->" means: this is essentially a
00:14:32.810 --> 00:14:37.970
thread local variable and we're going to
associate that variable with the thread
00:14:37.970 --> 00:14:45.230
that we're currently inspecting. Then I'm
going to do something that sounds
00:14:45.230 --> 00:14:49.149
scary but I'm going to instrument the
entire kernel. Every function entry and
00:14:49.149 --> 00:14:53.009
every function return, I'm going to
instrument that and say "please emit data
00:14:53.009 --> 00:14:57.189
when you do that". And this is what we
call a predicate, so this is where the
00:14:57.189 --> 00:15:02.009
awkiness of the DTrace programming
language comes in. So this is a predicate
00:15:02.009 --> 00:15:07.059
and whenever that evaluates to true
then the probe is going to fire, so in
00:15:07.059 --> 00:15:11.139
this case when we are in the thread that
we're currently tracing we're going to
00:15:11.139 --> 00:15:16.329
emit data. And this is just an empty
clause, we just want to know "hey we got
00:15:16.329 --> 00:15:23.480
here". So when we exit the mmap
system call and the predicate is set we're
00:15:23.480 --> 00:15:27.660
going to set the variable "follow" to
zero, because every uninitialized variable
00:15:27.660 --> 00:15:33.860
in DTrace is set to zero, so this pretty
much amounts to deallocating that variable
00:15:33.860 --> 00:15:41.279
and then we're going to exit cleanly. So
let me run that. So it takes a couple of
00:15:41.279 --> 00:15:48.480
seconds and boom. So you saw a little
pause here, that was when DTrace
00:15:48.480 --> 00:15:55.009
instrumented the kernel. So now
you can see every function call that
00:15:55.009 --> 00:15:59.480
happens inside the mmap system call. And
this is a little bit hard on the eyes, so
00:15:59.480 --> 00:16:08.379
let me pass this flag here and now you can
have nice to read indentation. So
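
NOTE
Editor's reconstruction of the script described above, based only on the spoken description; the speaker's actual file is not part of this transcript. The file name follow.d is hypothetical, and -F (flowindent) is an assumption about which flag was passed for the indented output.
```dtrace
/* Start following the current thread when it enters mmap. */
syscall::mmap:entry
{
    self->follow = 1;   /* thread-local variable */
}
/* Every kernel function entry and return, but only for a thread
 * that is being followed. The empty clause just records the firing. */
fbt:::
/self->follow/
{
}
/* Stop following when mmap returns, then exit. */
syscall::mmap:return
/self->follow/
{
    self->follow = 0;   /* assigning zero frees the variable */
    exit(0);
}
```
It could be run as dtrace -s follow.d for the flat listing, or as dtrace -F -s follow.d for output indented by call depth.
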
00:16:08.379 --> 00:16:12.629
now you might say "I don't like that. You
are injecting code into the kernel. That
00:16:12.629 --> 00:16:17.880
is, that sounds dangerous" and yeah, but
let me show you something that I find
00:16:17.880 --> 00:16:23.980
really interesting. So I'm not
going too much into depth here, but this
00:16:23.980 --> 00:16:28.750
is bytecode, so every DTrace script
gets compiled to bytecode and this
00:16:28.750 --> 00:16:34.499
bytecode gets sent to the kernel and in
the kernel you have a virtual machine that
00:16:34.499 --> 00:16:39.059
interprets that bytecode. So in case you
write a script that for some reason might
00:16:39.059 --> 00:16:44.550
go rogue on your kernel, it like allocates
too much memory, takes too much time, this
00:16:44.550 --> 00:16:49.279
virtual machine can just say "okay, stop
it" and it's just going to revert all the
00:16:49.279 --> 00:16:53.890
changes that happened to your kernel, and
that's kinda handy. And it's not a new
00:16:53.890 --> 00:17:01.199
idea, so if you're using tcpdump it's
basically the same approach. They also
00:17:01.199 --> 00:17:04.832
have this kind of bytecode, so that's just
a little excursion here. This is called
00:17:04.832 --> 00:17:13.250
BPF, Berkeley Packet Filter, so it's not
an entirely new idea. So everything I
00:17:13.250 --> 00:17:19.470
showed you until now was "hey, I can look
when function calls happen". That's not
00:17:19.470 --> 00:17:22.519
very much information, so we're going to
increase the amount of information that we
00:17:22.519 --> 00:17:35.080
get out of the system with every example.
So let me look at the actual kernel. So I
00:17:35.080 --> 00:17:39.980
had to restart my machine, so my setup is
basically gone now. So let's look at this
00:17:39.980 --> 00:17:45.309
vm_fault function. So this is the
source code of the operating system that
00:17:45.309 --> 00:17:52.900
I'm running right now. This is FreeBSD
12-CURRENT, and the vm_fault function;
00:17:52.900 --> 00:17:57.539
remember the mmap system call that I told
you about? So the mmap system call,
00:17:57.539 --> 00:18:03.899
as I told you, can map a file
into your address space. And it doesn't
00:18:03.899 --> 00:18:10.320
necessarily have to load the entire file,
so whenever we are touching a page in the
00:18:10.320 --> 00:18:15.780
system, like a memory page, which on this machine
is four kilobytes, with no superpages
00:18:15.780 --> 00:18:21.429
here, so whenever it touches a piece of
memory that you didn't bring into memory
00:18:21.429 --> 00:18:25.309
yet, we're generating something that's
called a page fault, and then this
00:18:25.309 --> 00:18:31.180
function gets called. So here let's look
at the arguments, and I'm going to skip
00:18:31.180 --> 00:18:36.990
the zeroeth argument, to look at the first
argument. So this is the address that
00:18:36.990 --> 00:18:44.160
provoked that page fault, this is the
type and these are the flags and I'm going
00:18:44.160 --> 00:18:48.780
to show you something to make that a
little bit more readable. So what about
00:18:48.780 --> 00:18:58.960
this one? So you see it's a pointer and
this is a big structure, so we want
00:18:58.960 --> 00:19:09.961
to be able to look at that structure. And
just probably should do this here, so
00:19:09.961 --> 00:19:17.090
let's look at this vm_fault script here.
So this is, make this a little bit more,
00:19:17.090 --> 00:19:20.950
so this is, don't pay too much attention
to this code, this is basically just
00:19:20.950 --> 00:19:26.049
boilerplate to make stuff readable
and this is where the actual action is
00:19:26.049 --> 00:19:31.690
happening. So this is, so what I'm doing
there is I'm instrumenting the vm_fault
00:19:31.690 --> 00:19:36.350
function and whenever we enter it
then we're going to use some information
00:19:36.350 --> 00:19:40.720
that DTrace gives us for free. So this is
execname, this is the name of the
00:19:40.720 --> 00:19:45.909
currently running executable that provoked
the page fault, this is the process ID and
00:19:45.909 --> 00:19:53.250
here we have a bunch of argument
variables. So these arg1, arg2, arg3,
00:19:53.250 --> 00:19:57.964
are essentially just integers, so
nothing too fancy there. But we wanna
00:19:57.964 --> 00:20:02.380
look, wanna be able to look at that
struct. And here I'm going to use this
00:20:02.380 --> 00:20:08.140
args array, and this args array
is kind of special, because it has typing
00:20:08.140 --> 00:20:15.870
information about the arguments. So when
you run that, so you're dereferencing that
00:20:15.870 --> 00:20:26.570
pointer there with the star, excuse me,
and let's just run that and maybe, that's
00:20:26.570 --> 00:20:32.899
a start yeah. So this is an in-kernel
data structure that we can now look
00:20:32.899 --> 00:20:40.010
at. So DTrace enabled us to look at in-
memory data structures as the system runs.
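
NOTE
Editor's sketch of the kind of script described above, without the formatting boilerplate the speaker mentions. It assumes, following the description in the talk, that argument 0 of vm_fault is a pointer to a structure and argument 1 is the faulting address. The exit after the first fault is an addition to keep the output short.
```dtrace
fbt:kernel:vm_fault:entry
{
    /* execname and pid are built-in variables; arg1 is an untyped integer. */
    printf("%s (pid %d) faulted at 0x%x\n", execname, pid, arg1);
    /* args[] carries type information, so the pointer can be
     * dereferenced and the whole structure printed field by field. */
    print(*args[0]);
    exit(0);
}
```
Individual fields can be used in the same way as in C, through the members of args[0].
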
00:20:40.010 --> 00:20:44.330
And this is really really powerful.
In the DTrace script I could use all
00:20:44.330 --> 00:20:50.490
these fields like I can manipulate this
args array, this value in there, just like
00:20:50.490 --> 00:20:57.010
every other variable; I
can pretty much work like I was in C. So
00:20:57.010 --> 00:21:02.659
how is it doing that? There is something
that's called CTF, that's not capture the
00:21:02.659 --> 00:21:10.120
flag, this is the Compact C
Type Format, and
00:21:10.120 --> 00:21:14.320
there is a man page in FreeBSD, and
there's a little segment in the kernel
00:21:14.320 --> 00:21:19.190
binary, where all this typing information
is stored. I don't know how that compares
00:21:19.190 --> 00:21:24.320
to modern DWARF but yeah this is what
DTrace is working with. So now you might
00:21:24.320 --> 00:21:28.549
ask yourself "Why on earth would I do
that? Why on earth would I look at virtual
00:21:28.549 --> 00:21:33.590
memory, because, yeah, um, this stuff is
safe isn't it? I mean there's no bugs in
00:21:33.590 --> 00:21:42.820
there." Except when they are. Anyone
remember "Dirty COW"? So this
00:21:42.820 --> 00:21:48.510
was a very nasty vulnerability in the
Linux kernel and that was a problem
00:21:48.510 --> 00:21:52.399
in the virtual memory management. So it
allowed you to write to a file that you
00:21:52.399 --> 00:21:56.679
didn't own as a regular user. So you could
essentially just write to a binary that
00:21:56.679 --> 00:22:01.789
had the setuid bit set. Very unpleasant, but
I'm not going to bash the Linux folks
00:22:01.789 --> 00:22:08.030
here, this is just, I just want to show
you these things are hard. And the first
00:22:08.030 --> 00:22:15.440
fix for this problem was in 2005 and then
it came back in 2016. So now that's fixed
00:22:15.440 --> 00:22:21.080
and then it came back with "Huge Dirty
COW" in 2017, so this is, I mean this
00:22:21.080 --> 00:22:27.580
was there for way over a decade.
These things are hard to debug. And this
00:22:27.580 --> 00:22:33.110
is what I like about these systems, so not
having, not having tools like DTrace to
00:22:33.110 --> 00:22:37.640
figure out what's going on inside of the
system somehow, to me, amounts to security
00:22:37.640 --> 00:22:42.360
by obscurity. And I've heard that some
people who are developing exploits for
00:22:42.360 --> 00:22:46.100
systems that have DTrace they say "Oh, I
really like developing exploits on these
00:22:46.100 --> 00:22:53.230
systems, because the tooling is so great!"
Yeah, but, to be honest this is cool,
00:22:53.230 --> 00:22:58.899
because an exploit is a proof of concept
and coming up with these exploits quickly
00:22:58.899 --> 00:23:03.440
is very usable, because you know what's
going on you can show "Hey, this is going
00:23:03.440 --> 00:23:07.279
wrong". I had situations, where
people were telling me "Oh, this is
00:23:07.279 --> 00:23:11.020
is not a problem with our program, this is
this weird operating system that you're
00:23:11.020 --> 00:23:18.100
using. Like Solaris, weird operating
system." And, yeah, and then I churned out
00:23:18.100 --> 00:23:22.059
some DTrace scripts and "No, it's
actually your problem". "Oh, now I can see
00:23:22.059 --> 00:23:31.419
that on my Linux box!" Magic. So,
everything I showed you until now was
00:23:31.419 --> 00:23:38.179
very, very much related to function calls
and we want to have a little bit more
00:23:38.179 --> 00:23:44.720
semantics here, because you might want to
write a script that inspects protocols,
00:23:44.720 --> 00:23:48.760
stuff like TCP, UDP stuff like that. So,
you don't want to know which function
00:23:48.760 --> 00:23:54.320
inside of the kernel is responsible for
handling your TCP/IP stuff, so DTrace
00:23:54.320 --> 00:24:00.549
comes with something that's called static
providers and I'm just going to show the
00:24:00.549 --> 00:24:04.769
apropos here. So these are, so every
static provider has a man page which is
00:24:04.769 --> 00:24:10.950
kind of handy - documentation whoo - and
you can see there is an I/O provider if
00:24:10.950 --> 00:24:17.539
you are interested in looking at disk
I/O, IP for looking at IPv4 and IPv6,
00:24:17.539 --> 00:24:23.570
TCP... This one is pretty cool, it's about
scheduling behavior. So, "what does my
00:24:23.570 --> 00:24:29.010
scheduler do?" And if you look at that, you
can see some interesting stuff like lend
00:24:29.010 --> 00:24:33.150
priority if you ever saw things like
priority inversion, stuff like that, now
00:24:33.150 --> 00:24:36.970
you can see that happen. I'm a nerd, I
find this interesting for some reason, I
00:24:36.970 --> 00:24:43.230
don't know. And it's also pretty
interesting to figure out what's going on,
00:24:43.230 --> 00:24:48.279
"why is this getting de-scheduled all the
time?" So, some interesting things going
00:24:48.279 --> 00:24:55.809
on there. So, I'm running a little bit
short on time here, but I just quickly
00:24:55.809 --> 00:24:59.340
want to show you something - this is all
kernel stuff right now - can we do that
00:24:59.340 --> 00:25:05.380
with userspace? Of course. So, there was
one provider that didn't show up when I
00:25:05.380 --> 00:25:09.590
had my provider listing, but was in the
DTrace script where I did this timing
00:25:09.590 --> 00:25:16.230
attack stuff. And that's called the PID
provider. And the PID provider generates
00:25:16.230 --> 00:25:21.080
probes on demand, because a process might
have a lot of probes and you will shortly
00:25:21.080 --> 00:25:25.190
see why and this is why I'm going to use a
very small program which is called "true",
00:25:25.190 --> 00:25:31.560
and true just exits with exit code zero.
So, nothing too exciting going on here,
00:25:31.560 --> 00:25:37.810
and this dollar target gets substituted
in, we get the process ID there. And this
00:25:37.810 --> 00:25:44.640
is everything that happens when I'm
executing this program you see this is a
00:25:44.640 --> 00:25:48.679
little bit more fine-grained than the FBT
provider, because now we can trace every
00:25:48.679 --> 00:25:53.520
single instruction inside of that
function, which is kind of handy. It's a
00:25:53.520 --> 00:25:58.090
scriptable debugger. So, these numbers are
the instruction offsets inside of that
00:25:58.090 --> 00:26:03.360
function. We can also look at - so this is
everything in the true segment - we can
00:26:03.360 --> 00:26:09.899
also look at libraries that got linked in
and there's a lot of stuff happening in
00:26:09.899 --> 00:26:15.780
libc for example when you run true.
So, one last thing that I wanted to show
00:26:15.780 --> 00:26:22.340
you because it consumed a week of my life:
I'm using a lot of Haskell and the Mac OS
00:26:22.340 --> 00:26:29.419
people, they also have DTrace and they
have GHC Haskell DTrace support - so the
00:26:29.419 --> 00:26:38.380
Glasgow Haskell compiler - and glorious...
they have probes to analyze what's going
00:26:38.380 --> 00:26:41.620
on inside of the runtime system. So, I
thought "I want to have that, I have
00:26:41.620 --> 00:26:47.019
DTrace, why doesn't it work on FreeBSD?"
So, after a week of fighting with make
00:26:47.019 --> 00:26:55.100
files and linkers, that works: If you
check out the recent GHC repository and
00:26:55.100 --> 00:27:00.260
build it on FreeBSD, you get all the nice
stuff that I'm going to show you now. So,
00:27:00.260 --> 00:27:05.909
this is a very boring program - it just
starts 32 green threads and schedules them
00:27:05.909 --> 00:27:10.470
all over the place - and now I can do
something like this: phone rings I can
00:27:10.470 --> 00:27:13.934
ring a telephone.
laughter
00:27:13.934 --> 00:27:18.750
No, that would be
interesting... So, you can also use
00:27:18.750 --> 00:27:26.970
wildcards - and not as name of the probe -
and this is what's going on inside, like
00:27:26.970 --> 00:27:31.580
GC garbage collection and all this stuff.
Now you can look at this and write useful
00:27:31.580 --> 00:27:37.509
DTrace scripts that also take my runtime
system into account. So, stuff like that
00:27:37.509 --> 00:27:41.810
exists for I think Python - I'm not
entirely sure because I don't use it -
00:27:41.810 --> 00:27:49.120
nodejs same, Postgres - I used it but not
with DTrace right now - and what I find
00:27:49.120 --> 00:27:55.210
interesting: Firefox. When you run
JavaScript in your Firefox, it actually
00:27:55.210 --> 00:27:59.360
has a provider, so you can trace
JavaScript running in your browser with
00:27:59.360 --> 00:28:05.130
DTrace, so after everything I just showed
you, there might be some stuff going on
00:28:05.130 --> 00:28:10.700
there. So yeah, this is basically
everything I wanted to show you and I
00:28:10.700 --> 00:28:13.759
think I'm going to wrap up, because
otherwise we're not going to have a lot of
00:28:13.759 --> 00:28:19.001
time for questions and maybe you have
some. So yeah, thanks.
00:28:19.001 --> 00:28:29.610
applause
Herald: Thank you very much Raichoo. We
00:28:29.610 --> 00:28:34.257
are actually over time already, but we
have two more minutes because we started
00:28:34.257 --> 00:28:38.817
three minutes late, so if there are any
really quick questions, possibly from the
00:28:38.817 --> 00:28:43.030
internet... There is one, the signal angel
says, let's hear it.
00:28:43.030 --> 00:28:48.013
Question: Yeah, hi, okay. So, the question
is, "which changes are actually necessary
00:28:48.013 --> 00:28:51.809
to do in the kernel of an operating system
to support DTrace?"
00:28:51.809 --> 00:28:56.370
Answer: That's a lot of work. So, it's not
something like you do in a weekend. This
00:28:56.370 --> 00:29:03.062
is... So, the person who started the work
on FreeBSD has sadly passed away now, but
00:29:03.062 --> 00:29:09.559
I think they took a couple of years to
have everything in place, so you have to
00:29:09.559 --> 00:29:13.730
have stuff like the CTF thing that I
showed you, which is what OpenBSD is
00:29:13.730 --> 00:29:19.890
currently working on. And then you need
all those magic gizmos, like kernel
00:29:19.890 --> 00:29:25.660
modules and stuff like that. So, it takes
a lot of time, but it's been ported to
00:29:25.660 --> 00:29:30.889
most operating systems that are available
and in use right now. So yeah, hope this
00:29:30.889 --> 00:29:34.239
answers the question.
Herald: Excellent and there are no more
00:29:34.239 --> 00:29:38.839
questions here in the room. I will thank
Raichoo and you can find him outside of
00:29:38.839 --> 00:29:46.590
the room and also on Twitter at "raichoo"
if you have any further questions.
00:29:46.590 --> 00:29:51.405
postroll music
00:29:51.405 --> 00:30:08.000
subtitles created by c3subtitles.de
in the year 2020. Join, and help us!