0:00:00.000,0:00:19.500
36C3 preroll music
0:00:19.500,0:00:26.220
Herald: So, our next talk is practical[br]cache attacks from the network. And the
0:00:26.220,0:00:33.960
speaker, Michael Kurth, is the person who[br]discovered the attack it’s the first
0:00:33.960,0:00:42.640
attack of its type. So he’s the first[br]author of the paper. And this talk is
0:00:42.640,0:00:47.470
going to be amazing! We’ve also been[br]promised a lot of bad cat puns, so I’m
0:00:47.470,0:00:52.750
going to hold you to that. A round of[br]applause for Michael Kurth!
0:00:52.750,0:00:58.690
applause
0:00:58.690,0:01:03.800
Michael: Hey everyone and thank you so[br]much for making it to my talk tonight. My
0:01:03.800,0:01:08.780
name is Michael and I want to share with[br]you the research that I was able to
0:01:08.780,0:01:15.659
conduct at the amazing VUSec group during[br]my master’s thesis. Briefly about myself: I
0:01:15.659,0:01:20.260
pursued my master’s degree in Computer[br]Science at ETH Zürich and could do my
0:01:20.260,0:01:27.869
Master’s thesis in Amsterdam. Nowadays, I[br]work as a security analyst at infoGuard.
0:01:27.869,0:01:33.450
So what you see here are the people that[br]actually made this research possible.
0:01:33.450,0:01:37.869
These are my supervisors and research[br]colleagues who supported me all the way
0:01:37.869,0:01:43.500
along and put so much time and effort in[br]the research. So these are the true
0:01:43.500,0:01:50.990
rockstars behind this research. So, but[br]let’s start with cache attacks. So, cache
0:01:50.990,0:01:56.850
attacks are previously known to be local[br]code execution attacks. So, for example,
0:01:56.850,0:02:03.679
in a cloud setting here on the left-hand[br]side, we have two VMs that basically share
0:02:03.679,0:02:10.270
the hardware. So they’re time-sharing the[br]CPU and the cache and therefore an
0:02:10.270,0:02:18.120
attacker that controls VM2 can actually[br]attack VM1 via a cache attack. Similarly,
0:02:18.120,0:02:23.100
JavaScript. So, a malicious JavaScript[br]gets served to your browser which then
0:02:23.100,0:02:28.030
executes it and because you share the[br]resource on your computer, it can also
0:02:28.030,0:02:33.330
attack other processes. Well, this[br]JavaScript thing gives you the feeling of
0:02:33.330,0:02:39.340
a remoteness, right? But still, it[br]requires this JavaScript to be executed on
0:02:39.340,0:02:46.170
your machine to be actually effective. So[br]we wanted to really push this further and
0:02:46.170,0:02:54.060
have a true network cache attack. We have[br]this basic setting where a client does SSH
0:02:54.060,0:03:00.790
to a server and we have a third machine[br]that is controlled by the attacker. And as I
0:03:00.790,0:03:08.360
will show you today, we can break the[br]confidentiality of this SSH session from
0:03:08.360,0:03:13.269
the third machine without any malicious[br]software running either on the client or
0:03:13.269,0:03:20.540
the server. Furthermore, the CPU on the[br]server is not even involved in any of
0:03:20.540,0:03:25.390
these cache attacks. So it’s just there[br]and not even noticing that we actually
0:03:25.390,0:03:34.689
leak secrets. So, let’s look a bit more[br]closely. So, we have this nice cat doing
0:03:34.689,0:03:41.409
an SSH session to the server and every time[br]the cat presses a key, one packet gets
0:03:41.409,0:03:49.700
sent to the server. So this is always true[br]for interactive SSH sessions. Because, as
0:03:49.700,0:03:56.530
it’s said in the name, it gives you this[br]feeling of interactiveness. When we look a
0:03:56.530,0:04:01.459
bit more under the hood what’s happening[br]on the server, we see that these packets
0:04:01.459,0:04:06.950
are actually activating the Last Level[br]Cache. More on that later in the
0:04:06.950,0:04:13.349
talk. Now, the attacker at the same time[br]launches a remote cache attack on the Last
0:04:13.349,0:04:19.340
Level Cache by just sending network[br]packets. And by this, we can actually leak
0:04:19.340,0:04:28.020
arrival times of individual SSH packets.[br]Now, you might ask yourself: “How would
0:04:28.020,0:04:36.800
arrival times of SSH packets break the[br]confidentiality of my SSH session?” Well,
0:04:36.800,0:04:43.210
humans have distinct typing patterns. And[br]here we see an example of a user typing
0:04:43.210,0:04:50.460
the word “because”. And you see that[br]typing e right after b is faster than for
0:04:50.460,0:04:56.870
example c after e. And this can be[br]generalised. And we can use this to launch
0:04:56.870,0:05:03.960
a statistical analysis. So here on the[br]orange dots, if we’re able to reconstruct
0:05:03.960,0:05:10.530
these arrival times correctly—and what[br]correctly means: we can reconstruct the
0:05:10.530,0:05:16.270
exact times of when the user was typing—,[br]we can then launch this statistical
0:05:16.270,0:05:22.690
analysis on the inter-arrival timings. And[br]therefore, we can leak what you were
0:05:22.690,0:05:29.809
typing in your private SSH session. Sounds[br]very scary and futuristic, but I will
0:05:29.809,0:05:36.580
demystify this during my talk. So,[br]alright! There is something I want to
0:05:36.580,0:05:42.730
bring up right here at the beginning: as[br]per tradition, and for ease of writing, you
0:05:42.730,0:05:48.180
give a name to your paper. And if you’re[br]following InfoSec twitter closely, you
0:05:48.180,0:05:53.930
probably already know what I’m talking[br]about. Because in our case, we named our
0:05:53.930,0:06:00.740
paper NetCAT. Well, of course, it was a[br]pun. In our case, NetCAT stands for
0:06:00.740,0:06:08.560
“Network Cache Attack,” and as it is with[br]humour, it can backfire sometimes. And in
0:06:08.560,0:06:17.830
our case, it backfired massively. And with[br]that we caused like a small twitter drama
0:06:17.830,0:06:24.400
this September. One of the most-liked[br]tweets about this research was the one
0:06:24.400,0:06:32.889
from Jake. These talks are great, because[br]you can put the face to such tweets and
0:06:32.889,0:06:42.599
yes: I’m this idiot. So let’s fix this![br]Intel acknowledged us with a bounty and
0:06:42.599,0:06:48.720
also a CVE number, so from now on, we[br]can just refer to it by the CVE number. Or
0:06:48.720,0:06:54.479
if that is inconvenient to you, during[br]that twitter drama, somebody sent us like
0:06:54.479,0:06:59.800
a nice little alternative name and also[br]including a logo which actually I quite
0:06:59.800,0:07:09.240
like. It’s called NeoCAT. Anyway, lessons[br]learned on that whole naming thing. And
0:07:09.240,0:07:15.250
so, let’s move on. Let’s get back to the[br]actual interesting bits and pieces of our
0:07:15.250,0:07:22.460
research! So, a quick outline: I’m firstly[br]going to talk about the background, so
0:07:22.460,0:07:28.240
general cache attacks. Then DDIO and RDMA[br]which are the key technologies that we
0:07:28.240,0:07:34.330
were abusing for our remote cache attack.[br]Then about the attack itself, how we
0:07:34.330,0:07:42.190
reverse-engineered DDIO, the End-to-End[br]attack, and, of course, a small demo. So,
0:07:42.190,0:07:47.050
cache attacks are all about observing a[br]microarchitectural state which should be
0:07:47.050,0:07:53.160
hidden from software. And we do this by[br]leveraging shared resources to leak
0:07:53.160,0:07:59.759
information. An analogy here is: Safe[br]cracking with a stethoscope, where the
0:07:59.759,0:08:06.300
shared resource is actually air that just[br]transmits the sound noises from the lock
0:08:06.300,0:08:11.990
on different inputs that you’re doing. And[br]it actually works quite similarly in
0:08:11.990,0:08:21.949
computers. But here, it’s just the cache.[br]So, caches solve the problem that latency
0:08:21.949,0:08:28.389
of loads from memory is really bad,[br]right? And loads make up roughly a quarter of
0:08:28.389,0:08:34.320
all instructions. And with caches, we can[br]reuse specific data and also use spatial
0:08:34.320,0:08:41.980
locality in programs. Modern CPUs usually[br]have this 3-layer cache hierarchy: L1,
0:08:41.980,0:08:47.041
which is split between data and[br]instruction cache. L2, and then L3, which
0:08:47.041,0:08:54.290
is shared amongst the cores. If data that[br]you access is already in the cache, that
0:08:54.290,0:08:58.780
results in a cache hit. And if it has to[br]be fetched from main memory, that’s
0:08:58.780,0:09:06.290
considered a cache miss. So, how do we[br]actually know now if an access hits or
0:09:06.290,0:09:11.549
misses? Because we cannot actually read[br]data directly from the caches. We can do
0:09:11.549,0:09:15.700
this, for example, with prime and probe.[br]It’s a well-known technique that we
0:09:15.700,0:09:20.980
actually also used in the network setting.[br]So I want to quickly go through what’s
0:09:20.980,0:09:26.430
actually happening. So the first step of[br]prime+probe is that the attacker brings the
0:09:26.430,0:09:33.860
cache to a known state. Basically priming[br]the cache. So it fills it with its own
0:09:33.860,0:09:42.310
data and then the attacker waits for the[br]victim to access the cache. The last step is then
0:09:42.310,0:09:49.040
probing which is basically doing priming[br]again, but this time just timing the
0:09:49.040,0:09:56.260
access times. So, fast accesses (cache[br]hits) mean that the cache was not touched
0:09:56.260,0:10:02.750
in between. And from cache misses, we now[br]know that the victim
0:10:02.750,0:10:10.270
actually accessed one of the cache lines[br]in the time between prime and probe. So
0:10:10.270,0:10:15.750
what can we do with these cache hits and[br]misses now? Well: We can analyse them! And
0:10:15.750,0:10:21.410
this timing information tells us a lot[br]about the behaviour of programs and users.
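The prime+probe loop just described can be sketched as a toy simulation. This is purely illustrative Python with an invented single-set cache model (`CacheSet`, oldest-first eviction), not real hardware probing:

```python
# Illustrative prime+probe simulation on an invented cache-set model.
# Real attacks time memory accesses; here we just query the model.

class CacheSet:
    """A W-way cache set: holds at most W tags, evicting the oldest."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = []

    def access(self, tag):
        if tag in self.lines:
            return True                  # cache hit
        self.lines.append(tag)           # fill on miss
        if len(self.lines) > self.ways:
            self.lines.pop(0)            # evict the oldest line
        return False                     # cache miss

def prime_probe(cache, victim_accesses):
    attacker = [f"attacker{i}" for i in range(cache.ways)]
    for tag in attacker:                 # 1. prime: fill the set
        cache.access(tag)
    for tag in victim_accesses:          # 2. wait: the victim runs
        cache.access(tag)
    # 3. probe: any miss means the victim touched this set in between
    misses = sum(not cache.access(tag) for tag in attacker)
    return misses > 0

quiet = CacheSet(ways=2)
print(prime_probe(quiet, victim_accesses=[]))          # False: no activity
busy = CacheSet(ways=2)
print(prime_probe(busy, victim_accesses=["victim"]))   # True: eviction seen
```

In step 3, a real attacker measures probe latency instead: fast probes (hits) mean the set was untouched, slow probes (misses) reveal that the victim evicted an attacker line.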
0:10:21.410,0:10:28.519
And based on cache hits and misses alone,[br]we can—or researchers were able to—leak
0:10:28.519,0:10:35.829
crypto keys, guess visited websites, or[br]leak memory content. That’s with SPECTRE
0:10:35.829,0:10:42.260
and MELTDOWN. So let’s see how we can[br]actually launch such an attack over the
0:10:42.260,0:10:50.550
network! So, one of the key technologies[br]is DDIO. But first, I want to talk about DMA,
0:10:50.550,0:10:55.420
because it’s like the predecessor to it.[br]So DMA is basically a technology that
0:10:55.420,0:11:02.010
allows your PCIe device, for example the[br]network card, to interact directly with
0:11:02.010,0:11:08.519
main memory on its own, without involving[br]the CPU. So for example if a packet is
0:11:08.519,0:11:14.339
received, the PCIe device then just puts[br]it in main memory and then, when the
0:11:14.339,0:11:19.110
program or the application wants to work[br]on that data, then it can fetch it from main
0:11:19.110,0:11:27.089
memory. Now with DDIO, this is a bit[br]different. With DDIO, the PCIe device can
0:11:27.089,0:11:33.110
directly put data into the Last Level[br]Cache. And that’s great, because now the
0:11:33.110,0:11:38.620
application, when working on the data,[br]just doesn’t have to go through the costly
0:11:38.620,0:11:43.910
main-memory walk and can just directly[br]work on the data from—or fetch it from—the
0:11:43.910,0:11:52.010
Last Level Cache. So DDIO stands for “Data[br]Direct I/O Technology,” and it’s enabled
0:11:52.010,0:11:58.560
on all Intel server-grade processors since[br]2012. It’s enabled by default and
0:11:58.560,0:12:04.069
transparent to drivers and operating[br]systems. So I guess, most people didn’t
0:12:04.069,0:12:09.279
even notice that something changed under[br]the hood. And it changed things quite
0:12:09.279,0:12:17.100
drastically. But why is DDIO actually[br]needed? Well: It’s for performance
0:12:17.100,0:12:23.489
reasons. So here we have a nice study from[br]Intel, which shows on the bottom,
0:12:23.489,0:12:29.090
different numbers of NICs. So we have a[br]setting with 2 NICs, 4 NICs, 6, and 8
0:12:29.090,0:12:35.750
NICs. And you have the throughput for it.[br]And as you can see with the dark blue,
0:12:35.750,0:12:42.850
that without DDIO, it basically stops[br]scaling after having 4 NICs. With the
0:12:42.850,0:12:47.890
light-blue you then see that it still[br]scales up when you add more network cards
0:12:47.890,0:12:56.770
to it. So DDIO is specifically built to[br]scale network applications. The other
0:12:56.770,0:13:02.250
technology that we were abusing is RDMA.[br]It stands for “Remote Direct Memory
0:13:02.250,0:13:08.750
Access,” and it basically offloads[br]transport-layer tasks to silicon. It’s
0:13:08.750,0:13:15.390
basically a kernel bypass. And there’s also[br]no CPU involvement, so applications can
0:13:15.390,0:13:23.520
access remote memory without consuming any[br]CPU time on the remote server. So I
0:13:23.520,0:13:28.329
brought here a little illustration to[br]showcase RDMA. So on the left we
0:13:28.329,0:13:34.230
have the initiator and on the right we[br]have the target server. A memory region
0:13:34.230,0:13:39.670
gets allocated on startup of the server[br]and from now on, applications can perform
0:13:39.670,0:13:44.490
data transfer without the involvement of[br]the network software stack. So you bypass
0:13:44.490,0:13:52.779
the TCP/IP stack completely. With one-[br]sided RDMA operations you even allow the
0:13:52.779,0:13:59.740
initiator to read and write to arbitrary[br]offsets within that allocated space on the
0:13:59.740,0:14:06.880
target. I quote here a statement of the[br]market leader of one of these high
0:14:06.880,0:14:12.900
performance NICs: “Moreover, the caches[br]of the remote CPU will not be filled with
0:14:12.900,0:14:20.639
the accessed memory content.” Well, that’s[br]not true anymore with DDIO and that’s
0:14:20.639,0:14:28.540
exactly what we attacked. So you might[br]ask yourself, “where is this RDMA used,”
0:14:28.540,0:14:33.749
right? And I can tell you that RDMA is one[br]of these technologies that you don’t hear
0:14:33.749,0:14:38.780
about often but are actually extensively[br]used in the backends of the big data centres and
0:14:38.780,0:14:45.509
cloud infrastructures. So you can get your[br]own RDMA-enabled infrastructures from
0:14:45.509,0:14:52.550
public clouds like Azure, Oracle Cloud,[br]Huawei, or AliBaba. Also file protocols
0:14:52.550,0:14:59.230
like SMB and NFS can support[br]RDMA. And other applications include High
0:14:59.230,0:15:07.320
Performance Computing, Big Data, Machine[br]Learning, Data Centres, Clouds, and so on.
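One-sided RDMA semantics, as described above, can be sketched with a toy model. The class names and the in-process `bytearray` are invented for illustration; real RDMA goes through a verbs library and the NIC, with no per-operation code running on the target's CPU:

```python
# Toy model of one-sided RDMA semantics (all names invented): the
# initiator reads and writes arbitrary offsets in a buffer the target
# registered once at startup; the target CPU is not involved per operation.

class RegisteredRegion:
    """Memory region the target allocates and registers at startup."""
    def __init__(self, size):
        self.buf = bytearray(size)

class Initiator:
    """Issues one-sided operations; in real RDMA, the remote NIC serves
    these directly against the registered region."""
    def __init__(self, region):
        self.region = region  # stands in for the remote address/key pair

    def rdma_write(self, offset, data):
        self.region.buf[offset:offset + len(data)] = data

    def rdma_read(self, offset, length):
        return bytes(self.region.buf[offset:offset + length])

region = RegisteredRegion(size=4096)   # allocated once on the target
init = Initiator(region)
init.rdma_write(128, b"hello")         # write to an arbitrary offset
print(init.rdma_read(128, 5))          # b'hello'
```

The point of the sketch is only the access pattern: read and write to attacker-chosen offsets, which is exactly the primitive the cache attack needs.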
0:15:07.320,0:15:12.810
But let’s get a bit into detail about the[br]research and how we abused the 2
0:15:12.810,0:15:19.339
technologies. So we know now that we have[br]a Shared Resource exposed to the network
0:15:19.339,0:15:26.291
via DDIO and RDMA gives us the necessary[br]Read and Write primitives to launch such a
0:15:26.291,0:15:34.310
cache attack over the network. But first,[br]we needed to clarify some things. Of
0:15:34.310,0:15:39.320
course, we did many experiments and[br]extensively tested the DDIO port to
0:15:39.320,0:15:44.630
understand the inner workings. But here, I[br]brought with me like 2 major questions
0:15:44.630,0:15:50.420
which we had to answer. So first of all[br]is, of course, can we distinguish a cache
0:15:50.420,0:15:57.860
hit or miss over the network? After all, we still[br]have network latency and packet queueing
0:15:57.860,0:16:04.020
and so on. So would it be possible to[br]actually get the timing right? Which is an
0:16:04.020,0:16:09.040
absolute must for launching a side-[br]channel. Well, the second question is
0:16:09.040,0:16:14.240
then: Can we actually access the full Last[br]Level Cache? This would correspond more to
0:16:14.240,0:16:20.589
the attack surface that we actually have[br]for the attack. So the first question, we can
0:16:20.589,0:16:26.640
answer with this very simple experiment:[br]So we have on the left, a very small code
0:16:26.640,0:16:33.180
snippet. We have a timed RDMA read to a[br]certain offset. Then we write to that
0:16:33.180,0:16:41.850
offset and we read again from the offset.[br]So what you can see is that, when doing
0:16:41.850,0:16:46.040
this like 50 000 times over multiple[br]different offsets, you can clearly
0:16:46.040,0:16:52.000
distinguish the two distributions. So the[br]blue one corresponds to data that was
0:16:52.000,0:16:58.149
fetched from main memory and the orange one[br]to the data that was fetched from the Last
0:16:58.149,0:17:03.250
Level Cache over the network. You can also[br]see the effects of the network. For
0:17:03.250,0:17:09.820
example, you can see the long tails which[br]correspond to some packets that were
0:17:09.820,0:17:16.430
slowed down in the network or were queued.[br]So on a sidenote here for all the side-
0:17:16.430,0:17:23.280
channel experts: We really need that write,[br]because with DDIO, reads do not
0:17:23.280,0:17:30.290
allocate anything in the Last Level Cache.[br]So basically, this is the building block
0:17:30.290,0:17:36.030
to launch a prime and probe attack over[br]the network. However, we still need to
0:17:36.030,0:17:40.500
have a target that we can actually[br]profile. So let’s see what kind of an
0:17:40.500,0:17:46.350
attack surface we actually have. Which[br]brings us to the question: Can we access
0:17:46.350,0:17:51.470
the full Last Level Cache? And[br]unfortunately, this is not the case. So
0:17:51.470,0:17:58.930
DDIO has this allocation limitation of two[br]ways. Here in the example out of 20 ways.
0:17:58.930,0:18:08.080
So roughly 10%. They’re not dedicated ways,[br]so the CPU still uses them. But we would
0:18:08.080,0:18:16.610
only have access to like 10% of the cache[br]activity of the CPU in the Last Level Cache.
0:18:16.610,0:18:22.560
So that did not work so well for a[br]first attack. But the good news is that
0:18:22.560,0:18:31.760
other PCIe devices—let’s say a second[br]network card—will also use the same two
0:18:31.760,0:18:38.780
cache ways. And with that, we have 100%[br]visibility of what other PCIe devices are
0:18:38.780,0:18:48.690
doing in the cache. So let’s look at the[br]end-to-end attack! So as I told you
0:18:48.690,0:18:54.050
before, we have this basic setup of a[br]client and a server. And we have the
0:18:54.050,0:19:01.470
machine that is controlled by us, the[br]attackers. So the client just sends this
0:19:01.470,0:19:06.770
packet over a normal Ethernet NIC and[br]there is a second NIC attached to the
0:19:06.770,0:19:15.410
server which allows the attacker to launch[br]RDMA operations. So we also know now that
0:19:15.410,0:19:19.960
all the packets that… or all the[br]keystrokes that the user is typing are
0:19:19.960,0:19:25.540
sent in individual packets which activate[br]the Last Level Cache through
0:19:25.540,0:19:33.750
DDIO. But how can we actually now get[br]these arrival times of packets? Because
0:19:33.750,0:19:39.420
that’s what we are interested in! So now[br]we have to look a bit more closely at how
0:19:39.420,0:19:46.830
the arrival of network packets actually[br]works. So the IP stack has a ring buffer
0:19:46.830,0:19:52.960
which is basically there to have an[br]asynchronous operation between the
0:19:52.960,0:20:01.720
hardware—so the NIC—and the CPU. So if a[br]packet arrives, it gets allocated in
0:20:01.720,0:20:07.530
the first ring buffer position. On the[br]right-hand side you see the view of the
0:20:07.530,0:20:13.700
attacker which can just profile the cache[br]activity. And we see that the cache line
0:20:13.700,0:20:18.930
at position 1 lights up. So we see an[br]activity there. Could also be on cache
0:20:18.930,0:20:24.750
line 2, that’s … we don’t know on which[br]cache line this will actually pop up. But
0:20:24.750,0:20:29.200
what is important is: What happens with[br]the second packet? Because the second
0:20:29.200,0:20:35.380
packet will also light up a cache line,[br]but this time a different one. And it’s actually
0:20:35.380,0:20:41.760
the next cache line after the previous[br]packet. And if we do this for 3 and 4
0:20:41.760,0:20:51.310
packets, we can see that we suddenly have[br]this nice staircase pattern. So now we
0:20:51.310,0:20:56.940
have a predictable pattern that we can[br]exploit to learn when packets
0:20:56.940,0:21:04.290
were received. And this is just because[br]the ring buffer is allocated in a way that
0:21:04.290,0:21:10.300
it doesn’t evict itself, right? If[br]packet 2 arrives, it doesn’t
0:21:10.300,0:21:16.660
evict the cache content of packet 1.[br]Which is great for us as attackers,
0:21:16.660,0:21:22.260
because we can profile it well. Well,[br]let’s look at a real-life example. So
0:21:22.260,0:21:28.010
this is the cache activity when the server[br]receives constant pings. You can see this
0:21:28.010,0:21:34.750
nice staircase pattern and you can also[br]see that the ring buffer reuses locations
0:21:34.750,0:21:40.650
as it is a circular buffer. Here, it is[br]important to know that the ring buffer
0:21:40.650,0:21:48.940
doesn’t hold the data content, just the[br]descriptor to the data. So this is reused.
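The staircase property just shown suggests a simple way to recover packet arrivals from noisy cache observations: accept an activation only if it lands on the next expected ring slot. This is a hedged sketch with an invented ring size and invented timings, not the paper's actual extractor:

```python
# Sketch: exploit the ring-buffer staircase. Each new packet activates
# the next descriptor slot (mod ring size), so an activation on the
# expected next slot is counted as a packet arrival; anything else is
# treated as probing noise. All numbers are invented for illustration.

RING_SIZE = 8  # illustrative; real NIC rings are much larger

def extract_arrivals(observations, ring_size=RING_SIZE):
    """observations: list of (timestamp, activated_slot) pairs from
    cache probing. Returns timestamps judged to be packet arrivals."""
    arrivals = []
    expected = None
    for t, slot in observations:
        if expected is None or slot == expected:
            arrivals.append(t)                 # staircase continues
            expected = (slot + 1) % ring_size  # next packet, next slot
        # out-of-order slots are ignored as noise
    return arrivals

obs = [(0.00, 3), (0.01, 5),             # slot 5 breaks the staircase
       (0.12, 4), (0.25, 5), (0.31, 2),  # slot 2 is out of order
       (0.40, 6)]
print(extract_arrivals(obs))             # [0.0, 0.12, 0.25, 0.4]
```

The design choice mirrors the talk's observation: because the ring never evicts its own earlier entries, consecutive packets produce strictly consecutive slot activations, which makes arrivals separable from probe noise.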
0:21:48.940,0:21:55.520
Unfortunately when the user types over[br]SSH, the pattern is not as nice as this
0:21:55.520,0:22:00.000
one here. Otherwise, we would already[br]have a done deal and could just work on
0:22:00.000,0:22:05.780
this. Because when a user types, you will[br]have more delays between packets.
0:22:05.780,0:22:11.470
Also, generally you don’t know when the[br]user is typing, so you have to profile all
0:22:11.470,0:22:16.060
the time to get the timings right.[br]Therefore, we needed to build a bit more
0:22:16.060,0:22:23.880
of a sophisticated pipeline. So it[br]basically is a 2-stage pipeline which
0:22:23.880,0:22:31.520
consists of an online tracker that is just[br]looking at a bunch of cache lines that
0:22:31.520,0:22:37.990
it’s observing all the time. And when it[br]sees that certain cache lines were
0:22:37.990,0:22:44.300
activated, it moves that window forward to[br]the next position where it expects an
0:22:44.300,0:22:50.260
activation. The reason is that we need a[br]speed advantage. So we need
0:22:50.260,0:22:57.090
to profile much faster than the network[br]packets of the SSH session are arriving.
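The online tracker's window logic can be sketched roughly as follows. The window size, ring size, and the exact sliding policy are assumptions for illustration, not the paper's implementation:

```python
# Sketch of the online-tracker idea: instead of probing the whole ring,
# watch a small window of slots and slide it forward whenever an
# activation is seen, so probing stays faster than packets arrive.
# Ring size, window size, and slot numbers are invented.

RING = 16
WINDOW = 4

def track(activations, ring=RING, window=WINDOW):
    """activations: slots that light up over time. Returns the window
    start positions the tracker occupied, showing how it follows them."""
    start = 0
    history = [start]
    for slot in activations:
        in_window = (slot - start) % ring < window
        if in_window:
            # slide the window to begin just after the activated slot,
            # where the next packet's descriptor is expected
            start = (slot + 1) % ring
            history.append(start)
        # activations outside the window are dismissed as noise
    return history

print(track([1, 2, 9, 3, 4]))  # [0, 2, 3, 4, 5] -- slot 9 was outside
```

Keeping the probed set small is what buys the speed advantage mentioned above: fewer timed accesses per round means more rounds per second than the SSH packet rate.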
0:22:57.090,0:23:00.710
And what you can see here on the left-[br]hand side is a visual output of what the
0:23:00.710,0:23:07.260
online tracker does. So it just profiles[br]this window which you can see in red. And
0:23:07.260,0:23:15.030
if you look very closely, you can also see[br]more lighting up in the middle, which
0:23:15.030,0:23:19.690
corresponds to arrived network packets.[br]You can also see that there is plenty of
0:23:19.690,0:23:27.280
noise involved, so we’re not[br]able to just directly get the packet
0:23:27.280,0:23:35.250
arrival times from it. That’s why we need[br]a second stage. The Offline Extractor. And
0:23:35.250,0:23:40.590
the offline extractor is in charge of[br]computing the most likely occurrences of
0:23:40.590,0:23:46.010
client SSH network packets. It uses the[br]information from the online tracker and
0:23:46.010,0:23:52.451
the predictable pattern of the ring buffer[br]to do so. And then, it outputs the inter-
0:23:52.451,0:23:59.380
packet arrival times for different words[br]as shown here on the right. Great. So, now
0:23:59.380,0:24:04.900
we’re again at the point where we have[br]just packet arrival times but no words,
0:24:04.900,0:24:10.040
which we need for breaking the[br]confidentiality of your private SSH
0:24:10.040,0:24:19.260
session. So, as I told you before, users[br]or generally humans have distinctive
0:24:19.260,0:24:27.330
typing patterns. And with that, we were[br]able to launch a statistical attack. More
0:24:27.330,0:24:33.060
closely, we just do machine[br]learning of the mapping between user typing
0:24:33.060,0:24:39.340
behaviour and actual words. So that in the[br]end, we can output the words that you
0:24:39.340,0:24:48.090
were typing in your SSH session. So we[br]used 20 subjects that were typing free and
0:24:48.090,0:24:55.830
transcribed text which resulted in a total[br]of 4 574 unique words. And each
0:24:55.830,0:25:01.230
is represented as a point in a multi-[br]dimensional space. And we used really
0:25:01.230,0:25:06.431
simple machine learning techniques like[br]the k-nearest neighbours algorithm, which
0:25:06.431,0:25:11.960
is basically categorising the measurements[br]by Euclidean distance to other
0:25:11.960,0:25:17.550
words. The reason why we just used like a[br]very basic machine learning algorithm is
0:25:17.550,0:25:21.330
that we just wanted to prove that the[br]signal that we were extracting from the
0:25:21.330,0:25:26.590
remote cache is actually strong enough to[br]launch such an attack. So we didn’t want
0:25:26.590,0:25:32.910
to improve in general, like, this kind of[br]mapping between users and their typing
0:25:32.910,0:25:40.050
behaviour. So let’s look how this worked[br]out! So, firstly, on the left-hand side,
0:25:40.050,0:25:47.090
you see we used our classifier on raw[br]keyboard data. That means that we just used
0:25:47.090,0:25:52.880
the signal that was emitted during the[br]typing. So when they were typing on their
0:25:52.880,0:25:58.900
local keyboard. Which gives us perfectly[br]precise timing data. And we can see that
0:25:58.900,0:26:02.450
this is already quite challenging to[br]mount. So we have an accuracy of
0:26:02.450,0:26:09.500
roughly 35%. But looking at the top 10[br]accuracy which is basically: the attacker
0:26:09.500,0:26:15.580
can guess 10 words, and if the correct[br]word was among these 10 words, then that’s
0:26:15.580,0:26:22.930
considered to be accurate. And with the[br]top 10 guesses, we have an accuracy of
0:26:22.930,0:26:30.750
58%. That’s just on the raw keyboard data.[br]And then we used the same data and also
0:26:30.750,0:26:35.730
the same classifier on the remote signal.[br]And of course, this is less precise
0:26:35.730,0:26:43.840
because we have noise factors and we could[br]even add spurious keystrokes or miss real ones. And
0:26:43.840,0:26:54.610
the accuracy is roughly 11% less and the[br]top 10 accuracy is roughly 60%. So as we
0:26:54.610,0:27:00.851
used a very basic machine learning[br]algorithm, many subjects, and a relatively
0:27:00.851,0:27:07.600
large word corpus, we believe that we can[br]showcase that the signal is strong enough
0:27:07.600,0:27:15.470
to launch such attacks. So of course, now[br]we want to see this whole thing working,
0:27:15.470,0:27:21.030
right? As I’m a bit nervous here on stage,[br]I’m not going to do a live demo because it
0:27:21.030,0:27:27.630
would involve me doing some typing which[br]probably would confuse myself and of
0:27:27.630,0:27:34.060
course also the machine-learning model.[br]Therefore, I brought a video with me. So
0:27:34.060,0:27:39.890
here on the right-hand side, you see the[br]victim. So it will shortly begin with
0:27:39.890,0:27:45.480
doing an SSH session. And then on the[br]left-hand side, you see the attacker. So
0:27:45.480,0:27:51.260
mainly on the bottom you see this online[br]tracker and on top you see the extractor
0:27:51.260,0:27:58.080
and hopefully the predicted words. So now[br]the victim starts this SSH session to
0:27:58.080,0:28:04.720
the server called “father.” And the[br]attacker, which is on the machine “son,”
0:28:04.720,0:28:10.590
launches now this attack. So you saw we[br]profiled the ring buffer location and now
0:28:10.590,0:28:19.790
the victim starts to type. And as this[br]pipeline takes a bit to process these words
0:28:19.790,0:28:24.350
and to predict the right thing, you will[br]shortly see, like slowly, the words
0:28:24.350,0:28:41.600
popping up in the correct—hopefully the[br]correct—order. And as you can see, we can
0:28:41.600,0:28:48.010
correctly guess the right words over the[br]network by just sending network packets to
0:28:48.010,0:28:53.620
the same server. And with that, getting[br]out the crucial information of when such
0:28:53.620,0:29:05.450
SSH packets arrived.[br]applause
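The classification step behind the demo can be sketched as a toy nearest-neighbour matcher on inter-keystroke delays. All words, profiles, and timings here are invented; the actual work used k-nearest neighbours over 4,574 unique words from 20 subjects:

```python
# Toy version of the final step: words are points in a space of
# inter-keystroke delays, matched by nearest neighbour in Euclidean
# distance. All timing values below are invented for illustration.
import math

def inter_arrival(times):
    """Keystroke timestamps -> delays between successive keys."""
    return [b - a for a, b in zip(times, times[1:])]

# Tiny invented "training" corpus: average delay profile per word
profiles = {
    "because": [0.08, 0.14, 0.11, 0.13, 0.09, 0.12],
    "network": [0.12, 0.10, 0.15, 0.09, 0.13, 0.10],
}

def classify(times):
    """1-nearest-neighbour match of a measured delay vector."""
    v = inter_arrival(times)
    return min(profiles, key=lambda w: math.dist(v, profiles[w]))

# A measurement whose delays resemble the "because" profile
measured = [0.00, 0.09, 0.22, 0.33, 0.47, 0.55, 0.68]
print(classify(measured))  # because
```

A real classifier also has to cope with inserted or missing keystrokes from the noisy remote signal, which is why the talk reports top-10 accuracy alongside exact accuracy.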
0:29:05.450,0:29:10.330
So now you might ask yourself: How do you[br]mitigate against these things? Well,
0:29:10.330,0:29:16.860
luckily it affects just server-grade processors,[br]and not clients and so on. But then, from
0:29:16.860,0:29:22.960
our viewpoint, the only true mitigation at[br]the moment is to either disable DDIO or
0:29:22.960,0:29:30.260
don’t use RDMA. Both come with quite a[br]performance impact. So with DDIO, we’re talking
0:29:30.260,0:29:37.130
roughly about 10-18% less performance,[br]depending, of course, on your application.
0:29:37.130,0:29:42.640
And if you decide just not to use RDMA,[br]you probably have to rewrite your whole
0:29:42.640,0:29:50.500
application. So, Intel on their publication[br]on Disclosure Day sounded a bit different
0:29:50.500,0:30:00.430
though. But read it for yourself! I[br]mean, the meaning of “untrusted network” can,
0:30:00.430,0:30:10.250
I guess, be quite debatable. And yeah. But[br]it is what it is. So I’m very proud that
0:30:10.250,0:30:17.420
we got accepted at Security and Privacy[br]2020. Also, Intel acknowledged our
0:30:17.420,0:30:22.540
findings, public disclosure was in[br]September, and we also got a bug bounty
0:30:22.540,0:30:26.950
payment.[br]someone cheering in crowd
0:30:26.950,0:30:29.640
laughs[br]Increased peripheral performance has
0:30:29.640,0:30:36.550
forced Intel to place the Last Level Cache[br]on the fast I/O path in its processors.
0:30:36.550,0:30:43.250
And by this, it exposed even more shared[br]microarchitectural components which we
0:30:43.250,0:30:51.631
know by now have a direct security impact.[br]Our research is the first DDIO side-
0:30:51.631,0:30:55.730
channel vulnerability but we still believe[br]that we just scratched the surface with
0:30:55.730,0:31:03.320
it. Remember: there are more PCIe devices[br]attached to these servers! So there could be
0:31:03.320,0:31:10.900
storage devices—so you could profile cache[br]activity of storage devices and so on!
0:31:10.900,0:31:20.419
There are even such things as GPUDirect,[br]which gives you access to the GPU’s cache.
0:31:20.419,0:31:25.740
But that’s a whole other story. So, yeah.[br]I think there’s much more to discover on
0:31:25.740,0:31:33.090
that side, so stay tuned! All that is[br]left to say is a massive “thank you” to
0:31:33.090,0:31:38.480
you and, of course, to all the volunteers[br]here at the conference. Thank you!
0:31:38.480,0:31:46.970
applause
0:31:46.970,0:31:52.740
Herald: Thank you, Michael! We have time[br]for questions. So you can line up behind
0:31:52.740,0:31:58.220
the microphones. And I can see someone at[br]microphone 7!
0:31:58.220,0:32:02.720
Question: So, thank you for your talk! I[br]had a question about—when I’m working on a
0:32:02.720,0:32:08.920
remote machine using SSH, I’m usually not[br]typing nice words like you’ve shown, but
0:32:08.920,0:32:13.750
usually it’s weird bash things like dollar[br]signs, and dashes, and I don’t know. Have
0:32:13.750,0:32:18.120
you looked into that as well?[br]Michael: Well, I think … I mean, of
0:32:18.120,0:32:22.230
course: What we would’ve wanted to[br]showcase is that we could leak passwords,
0:32:22.230,0:32:27.720
right? If you would do “sudo” or[br]whatsoever. The thing with passwords is
0:32:27.720,0:32:35.620
that it’s kind of its own dynamic. So you[br]type key… passwords differently than you
0:32:35.620,0:32:40.470
type normal keywords. And then it gets a[br]bit difficult because when you want to do
0:32:40.470,0:32:45.870
a large study of how users would type[br]passwords, you either ask them for their
0:32:45.870,0:32:51.030
real password—which is not so ethical[br]anymore—or you train them different
0:32:51.030,0:32:57.600
passwords. And that’s also difficult[br]because they might adopt a different style
0:32:57.600,0:33:03.180
of how they type these passwords than if[br]it were the real password. And of course,
0:33:03.180,0:33:09.580
the same would go for command line in[br]general and we just didn’t have, like, the
0:33:09.580,0:33:13.050
word corpus for it to launch such an[br]attack.
0:33:13.050,0:33:18.880
Herald: Thank you! Microphone 1![br]Q: Hi. Thanks for your talk! I’d like to
0:33:18.880,0:33:27.180
ask: the original SSH timing attack[br]paper, that’s from 2001?
0:33:27.180,0:33:31.270
Michael: Yeah, exactly. Exactly![br]Q: And do you have some idea why there are
0:33:31.270,0:33:37.650
no circumventions on the side of SSH[br]clients to add some padding or some random
0:33:37.650,0:33:41.980
delays or something like that? Do you have[br]some idea why there’s nothing happening
0:33:41.980,0:33:46.260
there? Is it some technical reason or[br]what’s the deal?
0:33:46.260,0:33:52.752
Michael: So, we were also afraid that[br]between 2001 and nowadays they added
0:33:52.752,0:33:59.360
some kind of a delay or batching or[br]whatsoever. I’m not sure if it’s just a
0:33:59.360,0:34:04.580
tradeoff between the interactiveness of[br]your SSH session or if there’s, like, a
0:34:04.580,0:34:09.450
true reason behind it. But what I do know[br]is that it’s oftentimes quite difficult to
0:34:09.450,0:34:15.649
add, like, these artificial packets in-[br]between. Because if it’s, like, not random
0:34:15.649,0:34:21.389
at all, you could even filter out, like,[br]additional packets that just get inserted
0:34:21.389,0:34:27.289
by the SSH. But other than that, I’m not[br]familiar with anything, why they didn’t
0:34:27.289,0:34:34.770
adapt, or why this wasn’t on their radar.[br]Herald: Thank you! Microphone 4.
0:34:34.770,0:34:42.389
Q: How much do you rely on the skill of[br]the typers? So I think of a user that has
0:34:42.389,0:34:49.220
to search each letter on the keyboard or[br]someone that is distracted while typing,
0:34:49.220,0:34:56.520
so not having a real pattern[br]behind the typing.
0:34:56.520,0:35:01.900
Michael: Oh, we’re actually absolutely[br]relying on the pattern being reproducible. As
0:35:01.900,0:35:06.640
I said: We’re just using this very simple[br]machine learning algorithm that just looks
0:35:06.640,0:35:11.820
at the Euclidean distance of previous[br]words that you were typing and a new word
0:35:11.820,0:35:17.260
or the new arrival times that we were[br]observing. And so if that is completely
0:35:17.260,0:35:24.440
different, then the accuracy would drop.[br]Herald: Thank you! Microphone 8!
0:35:24.440,0:35:29.120
Q: As a follow-up to what was said before.[br]Wouldn’t this make it a targeted attack
0:35:29.120,0:35:33.220
since you would need to train the machine-[br]learning algorithm exactly for the person
0:35:33.220,0:35:40.340
that you want to extract the data from?[br]Michael: So, yeah. Our goal of the
0:35:40.340,0:35:47.410
research was not, like, to do next-level,[br]let’s say machine-learning type of
0:35:47.410,0:35:53.510
recognition of your typing behaviours. So[br]we actually used the information which
0:35:53.510,0:36:01.310
user was typing, so as to profile that[br]correctly. But still, I think you could
0:36:01.310,0:36:06.540
maybe generalize. So there is other[br]research showing that you can categorize
0:36:06.540,0:36:12.740
users into different types of typers and, if I[br]remember correctly, they found that you
0:36:12.740,0:36:20.260
can categorize each person into, like, 7[br]different typing, let’s say, categories.
0:36:20.260,0:36:26.800
And I also know that some kind of online[br]trackers are using your typing behaviour
0:36:26.800,0:36:34.530
to re-identify you. So just to, like,[br]serve you personalized ads, and so on. But
0:36:34.530,0:36:41.400
still, I mean—we didn’t, like, want to go[br]into that depth of improving the state of
0:36:41.400,0:36:45.550
this whole thing.[br]Herald: Thank you! And we’ll take a
0:36:45.550,0:36:49.470
question from the Internet next![br]Signal angel: Did you ever try this with a
0:36:49.470,0:36:56.240
high-latency network like the Internet?[br]Michael: So of course, we rely on a—let’s
0:36:56.240,0:37:02.740
say—a constant latency. Because otherwise[br]it would basically screw up our timing
0:37:02.740,0:37:09.290
attack. So as we’re talking with RDMA,[br]which is usually in datacenters, we also
0:37:09.290,0:37:15.940
tested it in datacenter kind of[br]topologies. It would make it, I guess,
0:37:15.940,0:37:20.620
quite hard, which means that you would[br]have to do a lot of repetition which is
0:37:20.620,0:37:25.510
actually bad because you cannot tell the[br]users “please retype what you just did
0:37:25.510,0:37:32.730
because I have to profile it again,”[br]right? So yeah, the answer is: No.
0:37:32.730,0:37:39.520
Herald: Thank you! Mic 1, please.[br]Q: If the victim pastes something into the
0:37:39.520,0:37:44.760
SSH session. Would you be able to carry[br]out the attacks successfully?
0:37:44.760,0:37:51.200
Michael: No. This is … so if you paste[br]stuff, this is just sent out as a batch
0:37:51.200,0:37:54.310
when you hit enter.[br]Q: OK, thanks!
0:37:54.310,0:37:59.920
Herald: Thank you! The angels tell me[br]there is a person behind mic 6 whom I’m
0:37:59.920,0:38:03.020
completely unable to see[br]because of all the lights.
0:38:03.020,0:38:08.410
Q: So as far as I understood, the attacker[br]can only see that some packet arrived on
0:38:08.410,0:38:13.490
their NIC. So if there’s a second SSH[br]session running simultaneously on the
0:38:13.490,0:38:18.210
machine under attack, would this[br]already interfere with this attack?
0:38:18.210,0:38:23.910
Michael: Yeah, absolutely! So even[br]distinguishing SSH packets from normal
0:38:23.910,0:38:31.840
network packets is challenging. So we use[br]kind of a heuristic here because the thing
0:38:31.840,0:38:37.505
with SSH is that it always sends two[br]packets right after one another. So not only 1, just
0:38:37.505,0:38:43.800
2. But I omitted this part for[br]simplicity in this talk. But we also rely
0:38:43.800,0:38:48.990
on these kind of heuristics to even filter[br]out SSH packets. And if you would have a
0:38:48.990,0:38:54.850
second SSH session, I can imagine that[br]this would completely… so we cannot
0:38:54.850,0:39:05.140
distinguish which SSH session it was.[br]Herald: Thank you. Mic 7 again!
0:39:05.140,0:39:11.760
Q: You always said you were using two[br]connectors, like—what was it called? NICs?
0:39:11.760,0:39:15.970
Michael: Yes, exactly.[br]Q: Does it have to be two different ones? Can
0:39:15.970,0:39:21.210
it be the same? Or how does it work?[br]Michael: So in our setting we used one NIC
0:39:21.210,0:39:27.461
that has the capability of doing RDMA. So[br]in our case, this was a fabric, namely
0:39:27.461,0:39:31.950
InfiniBand. And the other was just like a[br]normal Ethernet connection.
0:39:31.950,0:39:36.910
Q: But could it be the same or could it be[br]both over InfiniBand, for example?
0:39:36.910,0:39:43.400
Michael: Yes, I mean … the thing with[br]InfiniBand: It doesn’t use the ring buffer
0:39:43.400,0:39:49.720
so we would have to come up with a[br]different kind of tracking ability to get
0:39:49.720,0:39:54.020
this. Which could even get a bit more[br]complicated because it does this kernel
0:39:54.020,0:39:58.730
bypass. But if there’s a predictable[br]pattern, we could potentially also do
0:39:58.730,0:40:03.730
this.[br]Herald: Thank you. Mic 1?
0:40:03.730,0:40:08.840
Q: Hello again! I would like to ask, I[br]know it was not the main focus of your
0:40:08.840,0:40:13.710
study, but do you have some estimation how[br]practical this can be, this timing attack?
0:40:13.710,0:40:20.050
Like, if you do, like, real-world[br]simulation, not the, like, prepared one?
0:40:20.050,0:40:23.190
How big a problem can it really be?[br]What would you think, like, what’s
0:40:23.190,0:40:27.170
the state of the art in this field? How[br]do you assess the risk?
0:40:27.170,0:40:30.300
Michael: You’re just referring to the[br]typing attack, right?
0:40:30.300,0:40:34.330
Q: Timing attack. SSH timing. Not[br]necessarily the cache version.
0:40:34.330,0:40:40.500
Michael: So, the original research has[br]been out there since 2001. And
0:40:40.500,0:40:45.900
since then, many researchers have shown[br]that it’s possible to launch such typing
0:40:45.900,0:40:52.180
attacks over different scenarios, for[br]example JavaScript is another one. It’s
0:40:52.180,0:40:56.820
always a bit difficult to judge because[br]most of the researchers are using different
0:40:56.820,0:41:03.340
datasets, so it’s difficult to compare. But[br]I think in general, I mean, we have used,
0:41:03.340,0:41:09.400
like, quite a large word corpus and it[br]still worked. Not super-precisely, but it
0:41:09.400,0:41:15.910
still worked. So yeah, I do believe it’s[br]possible. But to even make it a real-world
0:41:15.910,0:41:21.210
attack where an attacker wants to have[br]high accuracy, he probably would need a
0:41:21.210,0:41:25.950
lot of data and even, like, more[br]sophisticated techniques. Which there are.
0:41:25.950,0:41:29.970
So there are a couple of other machine-[br]learning techniques that you could use
0:41:29.970,0:41:34.180
which have their pros and cons.[br]Q: Thanks.
0:41:34.180,0:41:39.750
Herald: Thank you! Ladies and[br]Gentlemen—the man who named an attack
0:41:39.750,0:41:44.737
NetCAT: Michael Kurth! Give him[br]a round of applause, please!
0:41:44.737,0:41:58.042
applause[br]Michael: Thanks a lot!
0:41:58.042,0:42:01.400
36C3 postscroll music
0:42:01.400,0:42:16.000
Subtitles created by c3subtitles.de[br]in the year 2020. Join, and help us!