1
00:00:00,000 --> 00:00:19,500
36C3 preroll music
2
00:00:19,500 --> 00:00:26,220
Herald: So, our next talk is practical
cache attacks from the network. And the
3
00:00:26,220 --> 00:00:33,960
speaker, Michael Kurth, is the person who
discovered the attack it’s the first
4
00:00:33,960 --> 00:00:42,640
attack of its type. So he’s the first
author of the paper. And this talk is
5
00:00:42,640 --> 00:00:47,470
going to be amazing! We’ve also been
promised a lot of bad cat puns, so I’m
6
00:00:47,470 --> 00:00:52,750
going to hold you to that. A round of
applause for Michael Kurth!
7
00:00:52,750 --> 00:00:58,690
applause
8
00:00:58,690 --> 00:01:03,800
Michael: Hey everyone and thank you so
much for making it to my talk tonight. My
9
00:01:03,800 --> 00:01:08,780
name is Michael and I want to share with
you the research that I was able to
10
00:01:08,780 --> 00:01:15,659
conduct at the amazing VUSec group during
my master’s thesis. Briefly about myself: I
11
00:01:15,659 --> 00:01:20,260
pursued my master’s degree in Computer
Science at ETH Zürich and could do my
12
00:01:20,260 --> 00:01:27,869
Master’s thesis in Amsterdam. Nowadays, I
work as a security analyst at infoGuard.
13
00:01:27,869 --> 00:01:33,450
So what you see here are the people that
actually made this research possible.
14
00:01:33,450 --> 00:01:37,869
These are my supervisors and research
colleagues who supported me all the way
15
00:01:37,869 --> 00:01:43,500
along and put so much time and effort in
the research. So these are the true
16
00:01:43,500 --> 00:01:50,990
rockstars behind this research. But
let’s start with cache attacks. So, cache
17
00:01:50,990 --> 00:01:56,850
attacks are previously known to be local
code execution attacks. So, for example,
18
00:01:56,850 --> 00:02:03,679
in a cloud setting here on the left-hand
side, we have two VMs that basically share
19
00:02:03,679 --> 00:02:10,270
the hardware. So they’re time-sharing the
CPU and the cache and therefore an
20
00:02:10,270 --> 00:02:18,120
attacker that controls VM2 can actually
attack VM1 via cache attack. Similarly,
21
00:02:18,120 --> 00:02:23,100
JavaScript. So, a malicious JavaScript
gets served to your browser which then
22
00:02:23,100 --> 00:02:28,030
executes it and because you share the
resource on your computer, it can also
23
00:02:28,030 --> 00:02:33,330
attack other processes. Well, this
JavaScript thing gives you the feeling of
24
00:02:33,330 --> 00:02:39,340
remoteness, right? But still, it
requires this JavaScript to be executed on
25
00:02:39,340 --> 00:02:46,170
your machine to be actually effective. So
we wanted to really push this further and
26
00:02:46,170 --> 00:02:54,060
have a true network cache attack. We have
this basic setting where a client does SSH
27
00:02:54,060 --> 00:03:00,790
to a server and we have a third machine
that is controlled by the attacker. And as I
28
00:03:00,790 --> 00:03:08,360
will show you today, we can break the
confidentiality of this SSH session from
29
00:03:08,360 --> 00:03:13,269
the third machine without any malicious
software running either on the client or
30
00:03:13,269 --> 00:03:20,540
the server. Furthermore, the CPU on the
server is not even involved in any of
31
00:03:20,540 --> 00:03:25,390
these cache attacks. So it’s just there
and doesn’t even notice that we actually
32
00:03:25,390 --> 00:03:34,689
leak secrets. So, let’s look a bit more
closely. So, we have this nice cat doing
33
00:03:34,689 --> 00:03:41,409
an SSH session to the server and every time
the cat presses a key, one packet gets
34
00:03:41,409 --> 00:03:49,700
sent to the server. This is always true
for interactive SSH sessions. Because, as
35
00:03:49,700 --> 00:03:56,530
it’s said in the name, it gives you this
feeling of interactiveness. When we look a
36
00:03:56,530 --> 00:04:01,459
bit more under the hood what’s happening
on the server, we see that these packets
37
00:04:01,459 --> 00:04:06,950
are actually activating the Last Level
Cache. More on that later in the
38
00:04:06,950 --> 00:04:13,349
talk. Now, the attacker at the same time
launches a remote cache attack on the Last
39
00:04:13,349 --> 00:04:19,340
Level Cache by just sending network
packets. And by this, we can actually leak
40
00:04:19,340 --> 00:04:28,020
arrival times of individual SSH packets.
Now, you might ask yourself: “How would
41
00:04:28,020 --> 00:04:36,800
arrival times of SSH packets break the
confidentiality of my SSH session?” Well,
42
00:04:36,800 --> 00:04:43,210
humans have distinct typing patterns. And
here we see an example of a user typing
43
00:04:43,210 --> 00:04:50,460
the word “because”. And you see that
typing e right after b is faster than for
44
00:04:50,460 --> 00:04:56,870
example c after e. And this can be
generalised. And we can use this to launch
45
00:04:56,870 --> 00:05:03,960
a statistical analysis. So here on the
orange dots, if we’re able to reconstruct
46
00:05:03,960 --> 00:05:10,530
these arrival times correctly—and what
correctly means: we can reconstruct the
47
00:05:10,530 --> 00:05:16,270
exact times of when the user was typing—,
we can then launch this statistical
48
00:05:16,270 --> 00:05:22,690
analysis on the inter-arrival timings. And
therefore, we can leak what you were
49
00:05:22,690 --> 00:05:29,809
typing in your private SSH session. Sounds
very scary and futuristic, but I will
50
00:05:29,809 --> 00:05:36,580
demistify this during my talk. So,
alright! There is something I want to
51
00:05:36,580 --> 00:05:42,730
bring up right here at the beginning: As
per tradition and the ease of writing, you
52
00:05:42,730 --> 00:05:48,180
give a name to your paper. And if you’re
following InfoSec Twitter closely, you
53
00:05:48,180 --> 00:05:53,930
probably already know what I’m talking
about. Because in our case, we named our
54
00:05:53,930 --> 00:06:00,740
paper NetCAT. Well, of course, it was a
pun. In our case, NetCAT stands for
55
00:06:00,740 --> 00:06:08,560
“Network Cache Attack,” and as it is with
humour, it can backfire sometimes. And in
56
00:06:08,560 --> 00:06:17,830
our case, it backfired massively. And with
that we caused a small Twitter drama
57
00:06:17,830 --> 00:06:24,400
this September. One of the most-liked
tweets about this research was the one
58
00:06:24,400 --> 00:06:32,889
from Jake. These talks are great, because
you can put a face to such tweets and
59
00:06:32,889 --> 00:06:42,599
yes: I’m this idiot. So let’s fix this!
Intel acknowledged us with a bounty and
60
00:06:42,599 --> 00:06:48,720
also a CVE number, so nowadays we
can just refer to it by the CVE number. Or
61
00:06:48,720 --> 00:06:54,479
if that is inconvenient to you, during
that Twitter drama, somebody sent us
62
00:06:54,479 --> 00:06:59,800
a nice little alternative name and also
including a logo, which I actually quite
63
00:06:59,800 --> 00:07:09,240
like. It’s called NeoCAT. Anyway, lessons
learned on that whole naming thing. And
64
00:07:09,240 --> 00:07:15,250
so, let’s move on. Let’s get back to the
actual interesting bits and pieces of our
65
00:07:15,250 --> 00:07:22,460
research! So, a quick outline: I’m firstly
going to talk about the background, so
66
00:07:22,460 --> 00:07:28,240
general cache attacks. Then DDIO and RDMA
which are the key technologies that we
67
00:07:28,240 --> 00:07:34,330
were abusing for our remote cache attack.
Then about the attack itself, how we
68
00:07:34,330 --> 00:07:42,190
reverse-engineered DDIO, the End-to-End
attack, and, of course, a small demo. So,
69
00:07:42,190 --> 00:07:47,050
cache attacks are all about observing a
microarchitectural state which should be
70
00:07:47,050 --> 00:07:53,160
hidden from software. And we do this by
leveraging shared resources to leak
71
00:07:53,160 --> 00:07:59,759
information. An analogy here is: Safe
cracking with a stethoscope, where the
72
00:07:59,759 --> 00:08:06,300
shared resource is actually air that just
transmits the sound noises from the lock
73
00:08:06,300 --> 00:08:11,990
on different inputs that you’re doing. And
actually works quite similarly in
74
00:08:11,990 --> 00:08:21,949
computers. But here, it’s just the cache.
So, caches solve the problem that latency
75
00:08:21,949 --> 00:08:28,389
of loads from memory is really bad,
right? And loads make up roughly a quarter of
76
00:08:28,389 --> 00:08:34,320
all instructions. And with caches, we can
reuse specific data and also use spatial
77
00:08:34,320 --> 00:08:41,980
locality in programs. Modern CPUs have
usually this 3-layer cache hierarchy: L1,
78
00:08:41,980 --> 00:08:47,041
which is split between data and
instruction cache. L2, and then L3, which
79
00:08:47,041 --> 00:08:54,290
is shared amongst the cores. If data that
you access is already in the cache, that
80
00:08:54,290 --> 00:08:58,780
results in a cache hit. And if it has to
be fetched from main memory, that’s
81
00:08:58,780 --> 00:09:06,290
considered a cache miss. So, how do we
actually know now if an access hits or
82
00:09:06,290 --> 00:09:11,549
misses? Because we cannot actually read
data directly from the caches. We can do
83
00:09:11,549 --> 00:09:15,700
this, for example, with prime and probe.
It’s a well-known technique that we
84
00:09:15,700 --> 00:09:20,980
actually also used in the network setting.
So I want to quickly go through what’s
85
00:09:20,980 --> 00:09:26,430
actually happening. So the first step of
prime+probe is that the attacker brings the
86
00:09:26,430 --> 00:09:33,860
cache to a known state. Basically priming
the cache. So it fills it with its own
87
00:09:33,860 --> 00:09:42,310
data and then the attacker waits until the
victim accesses it. The last step is then
88
00:09:42,310 --> 00:09:49,040
probing which is basically doing priming
again, but this time just timing the
89
00:09:49,040 --> 00:09:56,260
access times. So, fast accesses (cache hits)
mean that the cache was not touched
90
00:09:56,260 --> 00:10:02,750
in-between. And cache misses mean that
we now know that the victim
91
00:10:02,750 --> 00:10:10,270
actually accessed one of the cache lines
in the time between prime and probe. So
92
00:10:10,270 --> 00:10:15,750
what can we do with these cache hits and
misses now? Well: We can analyse them! And
93
00:10:15,750 --> 00:10:21,410
this timing information tells us a lot
about the behaviour of programs and users.
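The prime+probe steps just described can be sketched as a toy model. The threshold, the timing values, and the function name below are illustrative assumptions, not the actual attack code:

```python
# Toy model of prime+probe: after priming, probe each cache line of an
# eviction set and time the accesses. Slow probes (cache misses) reveal
# which lines the victim touched in between. The threshold and timings
# are made up for illustration, not measured values.

HIT_THRESHOLD_NS = 100  # hypothetical cutoff between an LLC hit and a miss

def victim_activity(probe_times_ns):
    """Return indices of cache lines the victim evicted (slow probes)."""
    return [i for i, t in enumerate(probe_times_ns) if t >= HIT_THRESHOLD_NS]

# Simulated probe timings for an 8-line eviction set:
times = [80, 85, 240, 78, 90, 82, 250, 88]
print(victim_activity(times))  # -> [2, 6]: the victim touched lines 2 and 6
```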
94
00:10:21,410 --> 00:10:28,519
And based on cache hits and misses alone,
we can—or researchers were able to—leak
95
00:10:28,519 --> 00:10:35,829
crypto keys, guess visited websites, or
leak memory content. That’s with SPECTRE
96
00:10:35,829 --> 00:10:42,260
and MELTDOWN. So let’s see how we can
actually launch such an attack over the
97
00:10:42,260 --> 00:10:50,550
network! So, one of the key technologies
is DDIO. But first, I want to talk about DMA,
98
00:10:50,550 --> 00:10:55,420
because it’s like the predecessor to it.
So DMA is basically a technology that
99
00:10:55,420 --> 00:11:02,010
allows your PCIe device, for example the
network card, to interact directly on
100
00:11:02,010 --> 00:11:08,519
itself with main memory without
interrupting the CPU. So for example if a packet is
101
00:11:08,519 --> 00:11:14,339
received, the PCIe device then just puts
it in main memory and then, when the
102
00:11:14,339 --> 00:11:19,110
program or the application wants to work
on that data, then it can fetch from main
103
00:11:19,110 --> 00:11:27,089
memory. Now with DDIO, this is a bit
different. With DDIO, the PCIe device can
104
00:11:27,089 --> 00:11:33,110
directly put data into the Last Level
Cache. And that’s great, because now the
105
00:11:33,110 --> 00:11:38,620
application, when working on the data,
just doesn’t have to go through the costly
106
00:11:38,620 --> 00:11:43,910
main-memory walk and can just directly
work on the data from—or fetch it from—the
107
00:11:43,910 --> 00:11:52,010
Last Level Cache. So DDIO stands for “Data
Direct I/O Technology,” and it’s enabled
108
00:11:52,010 --> 00:11:58,560
on all Intel server-grade processors since
2012. It’s enabled by default and
109
00:11:58,560 --> 00:12:04,069
transparent to drivers and operating
systems. So I guess, most people didn’t
110
00:12:04,069 --> 00:12:09,279
even notice that something changed under
the hood. And it changed things quite
111
00:12:09,279 --> 00:12:17,100
drastically. But why is DDIO actually
needed? Well: It’s for performance
112
00:12:17,100 --> 00:12:23,489
reasons. So here we have a nice study from
Intel, which shows on the bottom,
113
00:12:23,489 --> 00:12:29,090
different numbers of NICs. So we have a
setting with 2 NICs, 4 NICs, 6, and 8
114
00:12:29,090 --> 00:12:35,750
NICs. And you have the throughput for it.
And as you can see with the dark blue,
115
00:12:35,750 --> 00:12:42,850
that without DDIO, it basically stops
scaling after having 4 NICs. With the
116
00:12:42,850 --> 00:12:47,890
light-blue you then see that it still
scales up when you add more network cards
117
00:12:47,890 --> 00:12:56,770
to it. So DDIO is specifically built to
scale network applications. The other
118
00:12:56,770 --> 00:13:02,250
technology that we were abusing is RDMA.
It stands for “Remote Direct Memory
119
00:13:02,250 --> 00:13:08,750
Access,” and it basically offloads
transport-layer tasks to silicon. It’s
120
00:13:08,750 --> 00:13:15,390
basically a kernel bypass. And it’s also
no CPU involvement, so application can
121
00:13:15,390 --> 00:13:23,520
access remote memory without consuming any
CPU time on the remote server. So I
122
00:13:23,520 --> 00:13:28,329
brought here a little illustration to
showcase you the RDMA. So on the left we
123
00:13:28,329 --> 00:13:34,230
have the initiator and on the right we
have the target server. A memory region
124
00:13:34,230 --> 00:13:39,670
gets allocated on startup of the server
and from now on, applications can perform
125
00:13:39,670 --> 00:13:44,490
data transfer without the involvement of
the network software stack. So you bypass
126
00:13:44,490 --> 00:13:52,779
the TCP/IP stack completely. With one-
sided RDMA operations you even allow the
127
00:13:52,779 --> 00:13:59,740
initiator to read and write to arbitrary
offsets within that allocated space on the
128
00:13:59,740 --> 00:14:06,880
target. I quote here a statement of the
market leader of one of these high
129
00:14:06,880 --> 00:14:12,900
performance NICs: “Moreover, the caches
of the remote CPU will not be filled with
130
00:14:12,900 --> 00:14:20,639
the accessed memory content.” Well, that’s
not true anymore with DDIO and that’s
131
00:14:20,639 --> 00:14:28,540
exactly what we attacked on. So you might
ask yourself, “where is this RDMA used,”
132
00:14:28,540 --> 00:14:33,749
right? And I can tell you that RDMA is one
of these technologies that you don’t hear
133
00:14:33,749 --> 00:14:38,780
about often but that are extensively used in
the backends of the big data centres and
134
00:14:38,780 --> 00:14:45,509
cloud infrastructures. So you can get your
own RDMA-enabled infrastructures from
135
00:14:45,509 --> 00:14:52,550
public clouds like Azure, Oracle Cloud,
Huawei, or AliBaba. Also file protocols
136
00:14:52,550 --> 00:14:59,230
like SMB and NFS can support
RDMA. And other applications are High
137
00:14:59,230 --> 00:15:07,320
Performance Computing, Big Data, Machine
Learning, Data Centres, Clouds, and so on.
138
00:15:07,320 --> 00:15:12,810
But let’s get a bit into detail about the
research and how we abused the two
139
00:15:12,810 --> 00:15:19,339
technologies. So we know now that we have
a Shared Resource exposed to the network
140
00:15:19,339 --> 00:15:26,291
via DDIO, and RDMA gives us the necessary
Read and Write primitives to launch such a
141
00:15:26,291 --> 00:15:34,310
cache attack over the network. But first,
we needed to clarify some things. Of
142
00:15:34,310 --> 00:15:39,320
course, we did many experiments and
extensively tested the DDIO port to
143
00:15:39,320 --> 00:15:44,630
understand the inner workings. But here, I
brought with me two major questions
144
00:15:44,630 --> 00:15:50,420
which we had to answer. So first of all
is, of course, can we distinguish a cache
145
00:15:50,420 --> 00:15:57,860
hit or miss over the network? Because we still
have network latency and packet queueing
146
00:15:57,860 --> 00:16:04,020
and so on. So would it be possible to
actually get the timing right? Which is an
147
00:16:04,020 --> 00:16:09,040
absolute must for launching a side-
channel. Well, the second question is
148
00:16:09,040 --> 00:16:14,240
then: Can we actually access the full Last
Level Cache? This would correspond more to
149
00:16:14,240 --> 00:16:20,589
the attack surface that we actually have
for the attack. So the first question we can
150
00:16:20,589 --> 00:16:26,640
answer with this very simple experiment:
So we have on the left, a very small code
151
00:16:26,640 --> 00:16:33,180
snippet. We have a timed RDMA read to a
certain offset. Then we write to that
152
00:16:33,180 --> 00:16:41,850
offset and we read again from the offset.
So what you can see is that, when doing
153
00:16:41,850 --> 00:16:46,040
this like 50 000 times over multiple
different offsets, you can clearly
154
00:16:46,040 --> 00:16:52,000
distinguish the two distributions. So the
blue one corresponds to data that was
155
00:16:52,000 --> 00:16:58,149
fetched from main memory and the orange one
to the data that was fetched from the Last
156
00:16:58,149 --> 00:17:03,250
Level Cache over the network. You can also
see the effects of the network. For
157
00:17:03,250 --> 00:17:09,820
example, you can see the long tails which
correspond to some packets that were
158
00:17:09,820 --> 00:17:16,430
slowed down in the network or were queued.
So on a sidenote here for all the side-
159
00:17:16,430 --> 00:17:23,280
channel experts: We really need that write,
because with DDIO, reads do not
160
00:17:23,280 --> 00:17:30,290
allocate anything in the Last Level Cache.
So basically, this is the building block
161
00:17:30,290 --> 00:17:36,030
to launch a prime and probe attack over
the network. However, we still need to
162
00:17:36,030 --> 00:17:40,500
have a target that we can actually
profile. So let’s see what kind of an
163
00:17:40,500 --> 00:17:46,350
attack surface we actually have. Which
brings us to the question: Can we access
164
00:17:46,350 --> 00:17:51,470
the full Last Level Cache? And
unfortunately, this is not the case. So
165
00:17:51,470 --> 00:17:58,930
DDIO has this allocation limitation of two
ways. Here in the example out of 20 ways.
166
00:17:58,930 --> 00:18:08,080
So roughly 10%. It’s not a dedicated way,
so the CPU still uses it. But we would
167
00:18:08,080 --> 00:18:16,610
only have access to 10% of the cache
activity of the CPU in the Last Level Cache.
168
00:18:16,610 --> 00:18:22,560
So that did not work so well for a
first attack. But the good news is that
169
00:18:22,560 --> 00:18:31,760
other PCIe devices—let’s say a second
network card—will also use the same two
170
00:18:31,760 --> 00:18:38,780
cache ways. And with that, we have 100%
visibility of what other PCIe devices are
171
00:18:38,780 --> 00:18:48,690
doing in the cache. So let’s look at the
end-to-end attack! So as I told you
172
00:18:48,690 --> 00:18:54,050
before, we have this basic setup of a
client and a server. And we have the
173
00:18:54,050 --> 00:19:01,470
machine that is controlled by us, the
attackers. So the client just sends this
174
00:19:01,470 --> 00:19:06,770
packet over a normal Ethernet NIC and
there is a second NIC attached to the
175
00:19:06,770 --> 00:19:15,410
server which allows the attacker to launch
RDMA operations. So we also know now that
176
00:19:15,410 --> 00:19:19,960
all the keystrokes that
the user is typing are
177
00:19:19,960 --> 00:19:25,540
sent in individual packets which
activate the Last Level Cache through
178
00:19:25,540 --> 00:19:33,750
DDIO. But how can we actually now get
these arrival times of packets? Because
179
00:19:33,750 --> 00:19:39,420
that’s what we are interested in! So now
we have to look a bit more closely to how
180
00:19:39,420 --> 00:19:46,830
the arrival of network packets actually
works. So the IP stack has a ring buffer
181
00:19:46,830 --> 00:19:52,960
which is basically there to have an
asynchronous operation between the
182
00:19:52,960 --> 00:20:01,720
hardware—so the NIC—and the CPU. So if a
packet arrives, it will be allocated in
183
00:20:01,720 --> 00:20:07,530
the first ring buffer position. On the
right-hand side you see the view of the
184
00:20:07,530 --> 00:20:13,700
attacker which can just profile the cache
activity. And we see that the cache line
185
00:20:13,700 --> 00:20:18,930
at position 1 lights up. So we see an
activity there. Could also be on cache
186
00:20:18,930 --> 00:20:24,750
line 2, that’s … we don’t know on which
cache line this will actually pop up. But
187
00:20:24,750 --> 00:20:29,200
what is important is: What happens with
the second packet? Because the second
188
00:20:29,200 --> 00:20:35,380
packet will also light up a cache line,
but this time a different one. And it’s actually
189
00:20:35,380 --> 00:20:41,760
the next cache line after the previous
packet. And if we do this for 3 and 4
190
00:20:41,760 --> 00:20:51,310
packets, we can see that we suddenly have
this nice staircase pattern. So now we
191
00:20:51,310 --> 00:20:56,940
have a predictable pattern that we can
exploit to get information when packets
192
00:20:56,940 --> 00:21:04,290
were received. And this is just because
the ring buffer is allocated in a way that
193
00:21:04,290 --> 00:21:10,300
it doesn’t evict itself, right? If packet
2 arrives, it doesn’t
194
00:21:10,300 --> 00:21:16,660
evict the cache content of packet 1.
Which is great for us as an attacker,
195
00:21:16,660 --> 00:21:22,260
because we can profile it well. Well,
let’s look at a real-life example. So
196
00:21:22,260 --> 00:21:28,010
this is the cache activity when the server
receives constant pings. You can see this
197
00:21:28,010 --> 00:21:34,750
nice staircase pattern and you can also
see that the ring buffer reuses locations
198
00:21:34,750 --> 00:21:40,650
as it is a circular buffer. Here, it is
important to know that the ring buffer
199
00:21:40,650 --> 00:21:48,940
doesn’t hold the data content, just the
descriptor to the data. So this is reused.
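The predictable "staircase" can be sketched as a toy model. The ring size and names here are hypothetical (real NIC ring sizes are driver-configured):

```python
# Toy model of the ring-buffer staircase: each received packet activates
# the next descriptor slot, wrapping around when the ring is full.
RING_SIZE = 128  # hypothetical; real ring sizes depend on the NIC driver

def next_slot(slot):
    return (slot + 1) % RING_SIZE

slot, observed = 0, []
for _ in range(5):          # five packets arrive
    observed.append(slot)   # the attacker sees this position light up
    slot = next_slot(slot)

print(observed)  # -> [0, 1, 2, 3, 4]: the staircase the attacker profiles
```

Because arrival `n + 1` always lands in the slot after arrival `n`, the attacker only needs to watch a small, moving window of cache lines rather than the whole cache.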
200
00:21:48,940 --> 00:21:55,520
Unfortunately when the user types over
SSH, the pattern is not as nice as this
201
00:21:55,520 --> 00:22:00,000
one here. Because then we would already
have a done deal and just could work on
202
00:22:00,000 --> 00:22:05,780
this. Because when a user types, you will
have more delays between packets.
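The statistical analysis mentioned earlier works on inter-packet times. A minimal sketch of turning recovered arrival times into the intervals the classifier uses; the timestamps and helper name are made up for illustration:

```python
# Turn recovered packet arrival times (one packet per keystroke) into
# inter-arrival intervals, the feature vector for the word classifier.
# The millisecond timestamps below are illustrative, not measured data.

def inter_arrival(times_ms):
    return [b - a for a, b in zip(times_ms, times_ms[1:])]

# Hypothetical arrival times for a seven-keystroke word:
arrivals = [0, 95, 310, 490, 640, 720, 910]
print(inter_arrival(arrivals))  # -> [95, 215, 180, 150, 80, 190]
```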
203
00:22:05,780 --> 00:22:11,470
Generally, you also don’t know when the
user is typing, so you have to profile all
204
00:22:11,470 --> 00:22:16,060
the time to get the timings right.
Therefore, we needed to build a bit more
205
00:22:16,060 --> 00:22:23,880
of a sophisticated pipeline. So it
basically is a 2-stage pipeline which
206
00:22:23,880 --> 00:22:31,520
consists of an online tracker that is just
looking at a bunch of cache lines that
207
00:22:31,520 --> 00:22:37,990
it’s observing all the time. And when it
sees that certain cache lines were
208
00:22:37,990 --> 00:22:44,300
activated, it moves that window forward to
the next position where it believes an
209
00:22:44,300 --> 00:22:50,260
activation will occur. The reason is that
this gives us a speed advantage. So we need
210
00:22:50,260 --> 00:22:57,090
to profile much faster than the network
packets of the SSH session are arriving.
211
00:22:57,090 --> 00:23:00,710
And what you can see here on the left-
hand side is a visual output of what the
212
00:23:00,710 --> 00:23:07,260
online tracker does. So it just profiles
this window which you can see in red. And
213
00:23:07,260 --> 00:23:15,030
if you look very closely, you can also see
more lit-up activity in the middle which
214
00:23:15,030 --> 00:23:19,690
corresponds to arrived network packets.
You can also see that there is plenty of
215
00:23:19,690 --> 00:23:27,280
noise involved, so therefore we’re not
able to directly get the packet
216
00:23:27,280 --> 00:23:35,250
arrival times from it. That’s why we need
a second stage. The Offline Extractor. And
217
00:23:35,250 --> 00:23:40,590
the offline extractor is in charge of
computing the most likely occurrence of
218
00:23:40,590 --> 00:23:46,010
a client SSH network packet. It uses the
information from the online tracker and
219
00:23:46,010 --> 00:23:52,451
the predictable pattern of the ring buffer
to do so. And then, it outputs the inter-
220
00:23:52,451 --> 00:23:59,380
packet arrival times for different words
as shown here on the right. Great. So, now
221
00:23:59,380 --> 00:24:04,900
we’re again at the point where we have
just packet arrival times but no words,
222
00:24:04,900 --> 00:24:10,040
which we need for breaking the
confidentiality of your private SSH
223
00:24:10,040 --> 00:24:19,260
session. So, as I told you before, users
or generally humans have distinctive
224
00:24:19,260 --> 00:24:27,330
typing patterns. And with that, we were
able to launch a statistical attack. More
225
00:24:27,330 --> 00:24:33,060
closely, we use machine learning to learn
a mapping between user typing
226
00:24:33,060 --> 00:24:39,340
behaviour and actual words. So that in the
end, we can output the words that you
227
00:24:39,340 --> 00:24:48,090
were typing in your SSH session. So we
used 20 subjects that were typing free and
228
00:24:48,090 --> 00:24:55,830
transcribed text which resulted in a total
of 4 574 unique words, each
229
00:24:55,830 --> 00:25:01,230
represented as a point in a multi-
dimensional space. And we used really
230
00:25:01,230 --> 00:25:06,431
simple machine learning techniques like
the k-nearest neighbours algorithm which
231
00:25:06,431 --> 00:25:11,960
is basically categorising the measurements
by Euclidean distance to other
232
00:25:11,960 --> 00:25:17,550
words. The reason why we used just a
very basic machine learning algorithm is
233
00:25:17,550 --> 00:25:21,330
that we just wanted to prove that the
signal that we were extracting from the
234
00:25:21,330 --> 00:25:26,590
remote cache is actually strong enough to
launch such an attack. So we didn’t want
235
00:25:26,590 --> 00:25:32,910
to improve, in general, this kind of
mapping between users and their typing
236
00:25:32,910 --> 00:25:40,050
behaviour. So let’s look at how this worked
out! So, firstly, on the left-hand side,
237
00:25:40,050 --> 00:25:47,090
you see we used our classifier on raw
keyboard data. That means we just used
238
00:25:47,090 --> 00:25:52,880
the signal that was emitted during the
typing. So when they were typing on their
239
00:25:52,880 --> 00:25:58,900
local keyboard. Which gives us perfect and
precise timing data. And we can see that
240
00:25:58,900 --> 00:26:02,450
this is already quite challenging to
mount. So we have an accuracy of
241
00:26:02,450 --> 00:26:09,500
roughly 35%. But looking at the top 10
accuracy which is basically: the attacker
242
00:26:09,500 --> 00:26:15,580
can guess 10 words, and if the correct
word was among these 10 words, then that’s
243
00:26:15,580 --> 00:26:22,930
considered to be accurate. And with the
top 10 guesses, we have an accuracy of
244
00:26:22,930 --> 00:26:30,750
58%. That’s just on the raw keyboard data.
And then we used the same data and also
245
00:26:30,750 --> 00:26:35,730
the same classifier on the remote signal.
And of course, this is less precise
246
00:26:35,730 --> 00:26:43,840
because we have noise factors and we could
even add or miss out on keystrokes. And
247
00:26:43,840 --> 00:26:54,610
the accuracy is roughly 11% less and the
top 10 accuracy is roughly 60%. So as we
248
00:26:54,610 --> 00:27:00,851
used a very basic machine learning
algorithm, many subjects, and a relatively
249
00:27:00,851 --> 00:27:07,600
large word corpus, we believe that we can
showcase that the signal is strong enough
250
00:27:07,600 --> 00:27:15,470
to launch such attacks. So of course, now
we want to see this whole thing working,
251
00:27:15,470 --> 00:27:21,030
right? As I’m a bit nervous here on stage,
I’m not going to do a live demo because it
252
00:27:21,030 --> 00:27:27,630
would involve me doing some typing which
probably would confuse myself and of
253
00:27:27,630 --> 00:27:34,060
course also the machine-learning model.
Therefore, I brought a video with me. So
254
00:27:34,060 --> 00:27:39,890
here on the right-hand side, you see the
victim. So it will shortly begin with
255
00:27:39,890 --> 00:27:45,480
doing an SSH session. And then on the
left-hand side, you see the attacker. So
256
00:27:45,480 --> 00:27:51,260
mainly on the bottom you see this online
tracker and on top you see the extractor
257
00:27:51,260 --> 00:27:58,080
and hopefully the predicted words. So now
the victim starts this SSH session to
258
00:27:58,080 --> 00:28:04,720
the server called “father.” And the
attacker, which is on the machine “son,”
259
00:28:04,720 --> 00:28:10,590
launches now this attack. So you saw we
profiled the ring buffer location and now
260
00:28:10,590 --> 00:28:19,790
the victim starts to type. And as this
pipeline takes a bit to process these words
261
00:28:19,790 --> 00:28:24,350
and to predict the right thing, you will
shortly see, like slowly, the words
262
00:28:24,350 --> 00:28:41,600
popping up in the correct—hopefully the
correct—order. And as you can see, we can
263
00:28:41,600 --> 00:28:48,010
correctly guess the right words over the
network by just sending network packets to
264
00:28:48,010 --> 00:28:53,620
the same server. And with that, getting
out the crucial information of when such
265
00:28:53,620 --> 00:29:05,450
SSH packets arrived.
applause
266
00:29:05,450 --> 00:29:10,330
So now you might ask yourself: How do you
mitigate against these things? Well,
267
00:29:10,330 --> 00:29:16,860
luckily it affects only server-grade
processors, so no clients. But then, from
268
00:29:16,860 --> 00:29:22,960
our viewpoint, the only true mitigation at
the moment is to either disable DDIO or
269
00:29:22,960 --> 00:29:30,260
don’t use RDMA. Both come with quite a
performance impact. With DDIO, you’re talking
270
00:29:30,260 --> 00:29:37,130
roughly about 10-18% less performance,
depending, of course, on your application.
271
00:29:37,130 --> 00:29:42,640
And if you decide not to use RDMA,
you probably have to rewrite your whole
272
00:29:42,640 --> 00:29:50,500
application. So, Intel on their publication
on Disclosure Day sounded a bit different
273
00:29:50,500 --> 00:30:00,430
though. But read it for yourself! I
mean, the meaning of “untrusted network” can,
274
00:30:00,430 --> 00:30:10,250
I guess, be quite debatable. And yeah. But
it is what it is. So I’m very proud that
275
00:30:10,250 --> 00:30:17,420
we got accepted at Security and Privacy
2020. Also, Intel acknowledged our
276
00:30:17,420 --> 00:30:22,540
findings, public disclosure was in
September, and we also got a bug bounty
277
00:30:22,540 --> 00:30:26,950
payment.
someone cheering in crowd
278
00:30:26,950 --> 00:30:29,640
laughs
Increased peripheral performance has
279
00:30:29,640 --> 00:30:36,550
forced Intel to place the Last Level Cache
on the fast I/O path in its processors.
280
00:30:36,550 --> 00:30:43,250
And by this, it exposed even more shared
microarchitectural components which we
281
00:30:43,250 --> 00:30:51,631
know by now have a direct security impact.
Our research is the first DDIO side-
282
00:30:51,631 --> 00:30:55,730
channel vulnerability but we still believe
that we just scratched the surface with
283
00:30:55,730 --> 00:31:03,320
it. Remember: there are more PCIe devices
attached to them! So there could be
284
00:31:03,320 --> 00:31:10,900
storage devices—so you could profile cache
activity of storage devices and so on!
285
00:31:10,900 --> 00:31:20,419
There are even such things as GPUDirect,
which gives you access to the GPU’s cache.
286
00:31:20,419 --> 00:31:25,740
But that’s a whole other story. So, yeah.
I think there’s much more to discover on
287
00:31:25,740 --> 00:31:33,090
that side, so stay tuned! All that is
left to say is a massive “thank you” to
288
00:31:33,090 --> 00:31:38,480
you and, of course, to all the volunteers
here at the conference. Thank you!
289
00:31:38,480 --> 00:31:46,970
applause
290
00:31:46,970 --> 00:31:52,740
Herald: Thank you, Michael! We have time
for questions. So you can line up behind
291
00:31:52,740 --> 00:31:58,220
the microphones. And I can see someone at
microphone 7!
292
00:31:58,220 --> 00:32:02,720
Question: So, thank you for your talk! I
had a question about—when I’m working on a
293
00:32:02,720 --> 00:32:08,920
remote machine using SSH, I’m usually not
typing nice words like you’ve shown, but
294
00:32:08,920 --> 00:32:13,750
usually it’s weird bash things like dollar
signs, and dashes, and I don’t know. Have
295
00:32:13,750 --> 00:32:18,120
you looked into that as well?
Michael: Well, I think … I mean, of
296
00:32:18,120 --> 00:32:22,230
course: What we would’ve wanted to
showcase is that we could leak passwords,
297
00:32:22,230 --> 00:32:27,720
right? If you would do “sudo” or
whatsoever. The thing with passwords is
298
00:32:27,720 --> 00:32:35,620
that it’s kind of its own dynamic. So you
type key… passwords differently than you
299
00:32:35,620 --> 00:32:40,470
type normal keywords. And then it gets a
bit difficult because when you want to do
300
00:32:40,470 --> 00:32:45,870
a large study of how users would type
passwords, you either ask them for their
301
00:32:45,870 --> 00:32:51,030
real password—which is not so ethical
anymore—or you train them on different
302
00:32:51,030 --> 00:32:57,600
passwords. And that’s also difficult
because they might adopt a different style
303
00:32:57,600 --> 00:33:03,180
of how they type these passwords than if
it were the real password. And of course,
304
00:33:03,180 --> 00:33:09,580
the same would go for command line in
general and we just didn’t have, like, the
305
00:33:09,580 --> 00:33:13,050
word corpus for it to launch such an
attack.
306
00:33:13,050 --> 00:33:18,880
Herald: Thank you! Microphone 1!
Q: Hi. Thanks for your talk! I’d like to
307
00:33:18,880 --> 00:33:27,180
ask: the original SSH timing attack
paper, that’s from 2001, right?
308
00:33:27,180 --> 00:33:31,270
Michael: Yeah, exactly. Exactly!
Q: And do you have some idea why there are
309
00:33:31,270 --> 00:33:37,650
no countermeasures on the side of SSH
clients to add some padding or some random
310
00:33:37,650 --> 00:33:41,980
delays or something like that? Do you have
some idea why there’s nothing happening
311
00:33:41,980 --> 00:33:46,260
there? Is it some technical reason or
what’s the deal?
312
00:33:46,260 --> 00:33:52,752
Michael: So, we also were afraid that
between 2001 and nowadays, that they added
313
00:33:52,752 --> 00:33:59,360
some kind of a delay or batching or
whatsoever. I’m not sure if it’s just a
314
00:33:59,360 --> 00:34:04,580
tradeoff between the interactiveness of
your SSH session or if there’s, like, a
315
00:34:04,580 --> 00:34:09,450
true reason behind it. But what I do know
is that it’s oftentimes quite difficult to
316
00:34:09,450 --> 00:34:15,649
add, like, these artificial packets
in-between. Because if it’s, like, not random
317
00:34:15,649 --> 00:34:21,389
at all, you could even filter out, like,
additional packets that just get inserted
318
00:34:21,389 --> 00:34:27,289
by the SSH. But other than that, I’m not
familiar with anything, why they didn’t
319
00:34:27,289 --> 00:34:34,770
adapt, or why this wasn’t on their radar.
Herald: Thank you! Microphone 4.
320
00:34:34,770 --> 00:34:42,389
Q: How much do you rely on the skill of
the typers? So I think of a user that has
321
00:34:42,389 --> 00:34:49,220
to search each letter on the keyboard or
someone that is distracted while typing,
322
00:34:49,220 --> 00:34:56,520
so not having a real pattern
behind the typing.
323
00:34:56,520 --> 00:35:01,900
Michael: Oh, we’re actually absolutely
relying on the pattern being reproducible. As
324
00:35:01,900 --> 00:35:06,640
I said: We’re just using this very simple
machine learning algorithm that just looks
325
00:35:06,640 --> 00:35:11,820
at the Euclidean distance of previous
words that you were typing and a new word
326
00:35:11,820 --> 00:35:17,260
or the new arrival times that we were
observing. And so if that is completely
327
00:35:17,260 --> 00:35:24,440
different, then the accuracy would drop.
Herald: Thank you! Microphone 8!
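[Editor’s note: the nearest-neighbor matching Michael describes above can be sketched roughly as follows. This is a minimal illustration, not the actual attack code; the words, timing vectors, and data layout are all invented for the example.]

```python
import math

# Purely illustrative training data: per candidate word, previously observed
# inter-keystroke arrival-time vectors (seconds between consecutive packets).
# These words and numbers are made up for the sketch.
training = {
    "password": [[0.12, 0.25, 0.18, 0.22, 0.15, 0.20, 0.17]],
    "hello":    [[0.10, 0.30, 0.08, 0.28]],
}

def classify(observed):
    """Pick the word whose stored timing vector is closest (by Euclidean
    distance) to the newly observed inter-arrival times."""
    best_word, best_dist = None, float("inf")
    for word, samples in training.items():
        for sample in samples:
            if len(sample) != len(observed):
                continue  # only compare words with the same number of key transitions
            d = math.dist(sample, observed)
            if d < best_dist:
                best_word, best_dist = word, d
    return best_word

print(classify([0.11, 0.29, 0.09, 0.27]))  # → hello
```

As Michael notes, when the observed timings differ a lot from the stored samples, the nearest match is essentially arbitrary and accuracy drops.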
328
00:35:24,440 --> 00:35:29,120
Q: As a follow-up to what was said before.
Wouldn’t this make it a targeted attack
329
00:35:29,120 --> 00:35:33,220
since you would need to train the machine-
learning algorithm exactly for the person
330
00:35:33,220 --> 00:35:40,340
that you want to extract the data from?
Michael: So, yeah. Our goal of the
331
00:35:40,340 --> 00:35:47,410
research was not, like, to do next-level,
let’s say machine-learning type of
332
00:35:47,410 --> 00:35:53,510
recognition of your typing behaviours. So
we actually used the information which
333
00:35:53,510 --> 00:36:01,310
user was typing in order to profile that
correctly. But still I think you could
334
00:36:01,310 --> 00:36:06,540
maybe generalize. So there is other
research showing that you can categorize
335
00:36:06,540 --> 00:36:12,740
users into different types of typists and if I
remember correctly, they found that you
336
00:36:12,740 --> 00:36:20,260
can categorize each person into, like, 7
different typing, let’s say, categories.
337
00:36:20,260 --> 00:36:26,800
And I also know that some kind of online
trackers are using your typing behaviour
338
00:36:26,800 --> 00:36:34,530
to re-identify you. So just to, like,
serve you personalized ads, and so on. But
339
00:36:34,530 --> 00:36:41,400
still, I mean—we didn’t, like, want to go
into that depth of improving the state of
340
00:36:41,400 --> 00:36:45,550
this whole thing.
Herald: Thank you! And we’ll take a
341
00:36:45,550 --> 00:36:49,470
question from the Internet next!
Signal angel: Did you ever try this with a
342
00:36:49,470 --> 00:36:56,240
high-latency network like the Internet?
Michael: So of course, we rely on a—let’s
343
00:36:56,240 --> 00:37:02,740
say—a constant latency. Because otherwise
it would basically screw up our timing
344
00:37:02,740 --> 00:37:09,290
attack. So as we’re talking with RDMA,
which is usually in datacenters, we also
345
00:37:09,290 --> 00:37:15,940
tested it in datacenter kind of
topologies. It would make it, I guess,
346
00:37:15,940 --> 00:37:20,620
quite hard, which means that you would
have to do a lot of repetition which is
347
00:37:20,620 --> 00:37:25,510
actually bad because you cannot tell the
users “please retype what you just did
348
00:37:25,510 --> 00:37:32,730
because I have to profile it again,”
right? So yeah, the answer is: No.
349
00:37:32,730 --> 00:37:39,520
Herald: Thank you! Mic 1, please.
Q: If the victim pastes something into the
350
00:37:39,520 --> 00:37:44,760
SSH session. Would you be able to carry
out the attacks successfully?
351
00:37:44,760 --> 00:37:51,200
Michael: No. This is … so if you paste
stuff, this is just sent out as a batch
352
00:37:51,200 --> 00:37:54,310
when you enter.
Q: OK, thanks!
353
00:37:54,310 --> 00:37:59,920
Herald: Thank you! The angels tell me
there is a person behind mic 6 whom I’m
354
00:37:59,920 --> 00:38:03,020
completely unable to see
because of all the lights.
355
00:38:03,020 --> 00:38:08,410
Q: So as far as I understood, the attacker
can only see that some packet arrived on
356
00:38:08,410 --> 00:38:13,490
their NIC. So if there’s a second SSH
session running simultaneously on the
357
00:38:13,490 --> 00:38:18,210
machine under attack, would this
already interfere with this attack?
358
00:38:18,210 --> 00:38:23,910
Michael: Yeah, absolutely! So even
distinguishing SSH packets from normal
359
00:38:23,910 --> 00:38:31,840
network packets is challenging. So we use
kind of a heuristic here because the thing
360
00:38:31,840 --> 00:38:37,505
with SSH is that it always sends two
packets right after. So not only 1, just
361
00:38:37,505 --> 00:38:43,800
2. But I omitted this part for the
simplicity of this talk. But we also rely
362
00:38:43,800 --> 00:38:48,990
on these kind of heuristics to even filter
out SSH packets. And if you would have a
363
00:38:48,990 --> 00:38:54,850
second SSH session, I can imagine that
this would completely… so we cannot
364
00:38:54,850 --> 00:39:05,140
distinguish which SSH session it was.
Herald: Thank you. Mic 7 again!
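[Editor’s note: the two-packets-per-keystroke heuristic mentioned above could be sketched like this. The 5 ms pairing window and the example timestamps are assumptions made for illustration, not values from the talk.]

```python
# Heuristic from the talk: each SSH keystroke produces two packets in quick
# succession. Pair arrivals closer than PAIR_WINDOW and keep one timestamp
# per pair; lone packets are treated as non-SSH traffic and dropped.
PAIR_WINDOW = 0.005  # 5 ms -- an assumed threshold, not a measured one

def keystroke_times(arrivals):
    """Collapse paired packet arrival times into per-keystroke timestamps."""
    keystrokes = []
    i = 0
    while i < len(arrivals):
        if i + 1 < len(arrivals) and arrivals[i + 1] - arrivals[i] < PAIR_WINDOW:
            keystrokes.append(arrivals[i])
            i += 2  # consume both packets of the SSH pair
        else:
            i += 1  # unpaired packet: filtered out by the heuristic
    return keystrokes

print(keystroke_times([1.000, 1.001, 1.250, 1.2505, 1.600]))  # → [1.0, 1.25]
```

As the answer above points out, a second simultaneous SSH session defeats this filter, since both sessions produce the same pair signature and cannot be told apart.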
365
00:39:05,140 --> 00:39:11,760
Q: You always said you were using two
connectors, like—what was it called? NICs?
366
00:39:11,760 --> 00:39:15,970
Michael: Yes, exactly.
Q: Does it have to be two different ones? Can
367
00:39:15,970 --> 00:39:21,210
it be the same? Or how does it work?
Michael: So in our setting we used one NIC
368
00:39:21,210 --> 00:39:27,461
that has the capability of doing RDMA. So
in our case, this was a fabric, namely
369
00:39:27,461 --> 00:39:31,950
InfiniBand. And the other was just like a
normal Ethernet connection.
370
00:39:31,950 --> 00:39:36,910
Q: But could it be the same or could it be
both over InfiniBand, for example?
371
00:39:36,910 --> 00:39:43,400
Michael: Yes, I mean … the thing with
InfiniBand: It doesn’t use the ring buffer
372
00:39:43,400 --> 00:39:49,720
so we would have to come up with a
different kind of tracking ability to get
373
00:39:49,720 --> 00:39:54,020
this. Which could even get a bit more
complicated because it does this kernel
374
00:39:54,020 --> 00:39:58,730
bypass. But if there’s a predictable
pattern, we could potentially also do
375
00:39:58,730 --> 00:40:03,730
this.
Herald: Thank you. Mic 1?
376
00:40:03,730 --> 00:40:08,840
Q: Hello again! I would like to ask, I
know it was not the main focus of your
377
00:40:08,840 --> 00:40:13,710
study, but do you have some estimation how
practical this can be, this timing attack?
378
00:40:13,710 --> 00:40:20,050
Like, if you do, like, real-world
simulation, not the, like, prepared one?
379
00:40:20,050 --> 00:40:23,190
How big a problem can it really be?
What would you think, like, what’s
380
00:40:23,190 --> 00:40:27,170
the state-of-the-art in this field? How
do you feel the risk?
381
00:40:27,170 --> 00:40:30,300
Michael: You’re just referring to the
typing attack, right?
382
00:40:30,300 --> 00:40:34,330
Q: Timing attack. SSH timing. Not
necessarily the cache version.
383
00:40:34,330 --> 00:40:40,500
Michael: So, the original research that
was conducted is out there since 2001. And
384
00:40:40,500 --> 00:40:45,900
since then, many researchers have shown
that it’s possible to launch such typing
385
00:40:45,900 --> 00:40:52,180
attacks over different scenarios, for
example JavaScript is another one. It’s
386
00:40:52,180 --> 00:40:56,820
always a bit difficult to judge because
most of the researchers are using different
387
00:40:56,820 --> 00:41:03,340
datasets so it’s difficult to compare. But
I think in general, I mean, we have used,
388
00:41:03,340 --> 00:41:09,400
like, quite a large word corpus and it
still worked. Not super-precisely, but it
389
00:41:09,400 --> 00:41:15,910
still worked. So yeah, I do believe it’s
possible. But to even make it a real-world
390
00:41:15,910 --> 00:41:21,210
attack where an attacker wants to have
high accuracy, he probably would need a
391
00:41:21,210 --> 00:41:25,950
lot of data and even, like, more
sophisticated techniques. Which there are.
392
00:41:25,950 --> 00:41:29,970
So there are a couple of other machine-
learning techniques that you could use
393
00:41:29,970 --> 00:41:34,180
which have their pros and cons.
Q: Thanks.
394
00:41:34,180 --> 00:41:39,750
Herald: Thank you! Ladies and
Gentlemen—the man who named an attack
395
00:41:39,750 --> 00:41:44,737
netCAT: Michael Kurth! Give him
a round of applause, please!
396
00:41:44,737 --> 00:41:58,042
applause
Michael: Thanks a lot!
397
00:41:58,042 --> 00:42:01,400
36C3 postscroll music
398
00:42:01,400 --> 00:42:16,000
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!