rc3 preroll music
Herald: In the world of bad puns, everyone
knows and loves the famous line from the
cinematic masterpiece, where the IT
security specialists ask the CPU architect
"Warum leakt hier Strom?" or in English,
"why is power leaking here?". In this talk
our four speakers demonstrate how they can
attack modern processors purely in
software, relying on techniques
from classical power side channel attacks.
They'll explain how to use the
unprivileged access to energy monitoring
features in modern Intel and AMD CPUs.
Please welcome with a round of digital
applause. Moritz Lipp, Michael Schwarz,
Daniel Gruss and Andreas Kogler.
Moritz: Warum leakt hier Strom?
laugh track
Andreas: Und warum wendest du
kein Masking an? [And why aren't
you applying masking?]
laugh track
Daniel: But to understand how we got here,
we have to go back to San Diego in May
2017.
A: This is a great, Moritz, this is
a great talk title. We have to use this.
laugh track
M: Yeah, but actually, before we can
do a talk, we should do some interesting
research that we can present, right?
laugh track
A: Of course. Of course. But we have
to remember this talk title, it's great.
laugh track
M: Yes.
music
Michael: Hey Moritz. Today I have found
something really cool.
Moritz: OK, what is it?
Michael: Our computers, they give
us the current energy consumption in
microjoules and you can access that
from userspace.
laugh track
Moritz: What? Are you for real?
Michael: That, that basically means we
could mount something like software based
power side channels.
Moritz: Nice. We should try that out.
Michael: Yes, I already did, because I
thought you might not believe me.
Moritz: OK.
Michael: So this is one of the experiments
I did. Here you can already see that. I
measured the power consumption using that
interface.
Moritz: yeah
Michael: First while doing nothing, idling
around sleeping
Moritz: like always
Michael: and then I increased the CPU
load, I just did an endless loop which
accessed a bit of memory. It's nothing
interesting but you can already see the
difference for that. So you can see that
there's a difference in doing nothing and
doing a lot. That's pretty nice.
Moritz: We should take a closer look
at that, I think.
Michael: Definitely.
music
Moritz: sings You can create
my power trace
Andreas: Oh, this is great. We already
have a song for this paper now. Okay.
Well, this is a great song that we can use
for the paper...
music
Michael: Powertrace,
like power analysis attacks?
Moritz: Yeah, but that would be
an attack with physical access.
Daniel: Software-only would be great
Michael: Yes, I told you already,
I found one can measure energy
consumption in microjoules
Moritz: Like attacking all server,
desktop and laptop CPUs
Daniel: Ideally with unprivileged access
Michael: Imagine if you could
distinguish different instructions
or even observe the Hamming weights of
operands and memory loads
Daniel: Control flow monitoring
Moritz: In physical attacks they often go
for cryptographic keys.
That would be great.
Attacking AES-NI and RSA
Daniel: There's just one problem:
there is no such channel
Michael: As I said,
don't you listen, Daniel?
It's like always, there is this RAPL
register. This interface is already there
and you can measure power consumption
Daniel: Yes, but only on a
very coarse granularity
Moritz: But first, we need to get a bit
more understanding of the CPU power
management. The thermal design power, the
TDP, is the power consumption under the
maximum theoretical load of the processor.
And you probably know that number from the
CPU specification. And this gives
integrators a target to find the proper
thermal solution when you integrate a CPU in
a computer so that it doesn't run too hot.
But for short periods of time, the CPU can
consume more power than that. And this we
can see in this graphic. So here, for this
period tau, the power consumption is much
higher than for the rest of the time.
Because usually a CPU is not instantly hot
and thermal properties propagate over a
bit of time. So on the other hand, you
should also be able to save power. And you
can do this in different ways. For
instance, you could just shut down
resources completely that you do not need
at the moment, or you can reduce the
voltage of the processor or those
components and then it also consumes less
power. And on top of that, you could also
reduce the frequency of the processor and
then it also consumes less power. And you
need this for different scenarios. For
instance, with your laptop, you need to
budget the power consumption because you
want to have a long run time. And you also
know these options that you can change,
like the performance level if it should
run on high performance or to save
battery. And you need this in different
scenarios.
Michael: Yes, Moritz, that's exactly what
I showed you before. Do you remember? I
showed you this Intel Running Average
Power Limit, RAPL for short, that provides
exactly that functionality. So with this
Intel RAPL, you have the power limiting
features so you can do exactly what you
just described, reduce the power usage for
your system or for parts of your system.
And additionally, you also have the energy
readings. So you know exactly how much
power is currently used on a system which
helps you do exactly the things you just
mentioned before, like getting a better
power performance balance. So this is
already there.
Moritz: Because the CPU needs to know in a
way how much power it consumes, right?
Michael: Exactly and the scheduler also
uses that feature to ensure that you get a
better battery runtime on your laptop, for
example. And because this is an important
feature you can directly get that from the
operating system as well. On Linux, you
can even get that as an unprivileged
application. There's the powercap
framework that you can directly access in
this pseudo file system where you get the
current power readings, you can directly
see how much power your CPU currently
consumes.
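A minimal sketch of what reading this interface can look like from an unprivileged Python process. The file path below is an assumption: it is a common location for the package-0 counter in the powercap pseudo file system, but the exact name differs between machines, and the function names are made up for this illustration.

```python
import time
from pathlib import Path

# Assumed location of the package-0 energy counter in the powercap
# pseudo file system; the exact path can differ between systems.
RAPL_ENERGY_FILE = Path("/sys/class/powercap/intel-rapl:0/energy_uj")


def read_energy_uj(path=RAPL_ENERGY_FILE):
    """Return the cumulative energy counter in microjoules."""
    return int(Path(path).read_text().strip())


def average_power_watts(path=RAPL_ENERGY_FILE, interval_s=0.1):
    """Estimate average power by sampling the counter twice."""
    before = read_energy_uj(path)
    time.sleep(interval_s)
    after = read_energy_uj(path)
    # Counter wrap-around is ignored in this sketch.
    return (after - before) / 1e6 / interval_s


if __name__ == "__main__":
    try:
        print(f"{average_power_watts():.2f} W")
    except OSError as err:
        # Missing file (no RAPL support) or no permission to read it.
        print(f"could not read the assumed RAPL counter: {err}")
```

On kernels that carry the fix discussed at the end of the talk, this file is readable only by root, so an unprivileged run is expected to end in the error branch.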
Moritz: How convenient!
Michael: On macOS and on Windows you have
a similar thing, but for that you first
need to install a driver because usually
you don't need that as a userspace
application. But some drivers might want
to have that and some drivers even expose
that to you and you can use that. So there
are some drivers that are even
preinstalled on some of the motherboards
that expose that information to
applications as well on Windows.
Moritz: Interesting, but what can we do
with this? So I ran some experiments
because I wanted to know how good this
energy consumption monitoring works. And
in a first run we tried to distinguish
instructions from each other. So we
implemented a small program just running
the same instructions all the time, and we
measured its power consumption. And as we
can see easily in this plot, different
instructions need a different amount of
power. So we can distinguish instructions
from each other. In addition, what I
tried, I changed the operands that
different instructions used. For instance,
for a multiplication, you can multiply
different numbers with each other. And
also here we see, depending on the bits
that are set in the operand a different
power consumption of the same instruction,
but just depending on the operand so we
can also distinguish them from each other.
This could also come in handy later on.
But I also tried to load data with an
instruction and I wanted to know if I
could see differences in the power
consumption, depending on the data that
has been loaded by the processor. And as
you can see in this plot, the more bits
that are set in the data that is loaded,
the more power the CPU consumes. But let's
be honest here, to record these
measurements, it took more than 23 days,
so it took quite some time to get to this
granularity to see those differences, but
in other cases, if you just...
Michael: still a fascinating result.
Moritz: Yes, it's a very interesting
result. And in other cases, Michael, you
only want to know if one operand or one
value is a zero or if it's not a zero. And
to come to this result, you don't need
that many measurements. In the last
experiment that we did, we wanted to
know if we would see a difference in the
energy consumption depending on where data
has been loaded from. For instance, as
we've also seen at CCC in many different
talks over the past years, there are
cache attacks. And here in this
experiment, we were also able to see a
difference in the power consumption if
a value has been loaded from the cache
or if it has to be loaded from the main
memory, because, of course, then DRAM is
activated and it consumes more power. But
these results are very nice.
Michael: Yes, these are really fascinating
results. So we should actually exploit
them and build attacks from that. I mean,
it's fascinating to see that all these
measurements are possible, but we also
want to do something security related.
Moritz: Do you have any idea what we
could do?
Michael: Yes, I have an idea. I already
showed you something before. If you
remember from the office, this one
measurement. And I extended that
measurement.
Moritz: Yes.
Michael: Into a covert channel. So a
covert channel is a communication channel
between two parties that are usually not
allowed to communicate with each other. So
there might be different reasons for that.
Maybe there's no interface, maybe there's a
policy or a firewall or something that
prevents them from communicating. And
still, in this scenario, I want to
communicate. So for that, I'm using
exactly these power side channels and all
this analysis you have done to actually
communicate. And that is very simple to
do, actually. I have two processes, a
sender and a receiver, and the sender
tries to send single bits, zeros and ones.
To send a one bit, I do something that
uses a lot of energy, like accessing main
memory. And if I want to send a zero bit,
then I don't do anything. And now as a
receiver, I just have to measure the power
consumption and I see if the power
consumption has a spike. Then I know the
sender is sending a one. If there's
nothing, the sender is apparently sending a
zero, and from that I can get the
information the sender wants to send me.
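The receiver side described here can be sketched as a simple threshold decoder. This is only an illustration: it assumes the receiver already has one energy reading per bit slot, the sample values are made up, and the midpoint threshold is an assumption rather than what the speakers used.

```python
def decode_bits(energy_per_slot, threshold=None):
    """Map one energy reading per bit slot to a list of 0/1 bits."""
    if threshold is None:
        # Assumption: the midpoint between the lowest and highest
        # reading separates idle slots from busy slots.
        threshold = (min(energy_per_slot) + max(energy_per_slot)) / 2
    return [1 if e > threshold else 0 for e in energy_per_slot]


# Made-up readings in microjoules, one per bit slot.
samples = [910, 2350, 2290, 880, 2410, 905, 930, 2380]
print(decode_bits(samples))  # prints [0, 1, 1, 0, 1, 0, 0, 1]
```

A real receiver would additionally have to agree with the sender on slot length and synchronization, which this sketch leaves out.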
Moritz: But did you try that out?
laugh track
Michael: Yes, I also tried that and we can
see that here in this graph. So this is
the energy measurement.
Moritz: That's a very clean signal.
Michael: Yes, it's the energy measurement
on the receiver side. And we see exactly
what I told you before. If there are one
bits, then the energy consumption is
higher. If there are zero bits, it's
lower. And from that we can deduce the
information that I wanted to send on the
sender side. Pretty neat, huh?
Moritz: Yeah, but this is just from one
process to another process. Actually, I
took your idea and used this in a
hypervisor scenario where we attack the
Xen hypervisor. So it's not limited to two
processes. I installed the Xen hypervisor
with two virtual machines. And what Xen
does is it also exposes those RAPL
registers to the virtual machine. So now
as a virtual machine, I can have direct
access to that and then I can establish a
covert channel between two virtual
machines in the cloud.
Michael: That's even better.
Moritz: And this is really working, as you
can see here. I mean, here I'm just
sending ones and zeros, but the signal is
pretty clear.
Michael: That's nice.
Moritz: But is there more that we can do?
Michael: Yes. I mean, covert channels are
great to demonstrate something, that it
actually works, across VM, really great. I
like that. That gives you a different
threat model here, but still they are a
bit boring. So I decided to have something
more interesting as another example of
what we can do. I always like to break
kernel address space layout randomization,
KASLR. With this kernel address space
layout randomization, the kernel is mapped
to different virtual locations every time
I boot my computer to make it difficult to
actually exploit something in the kernel
because it's not predictable where the
kernel is located. And I again use the
energy consumption to figure out where
this kernel is located. So how does that
work? In this address space I have the
kernel which is actually mapped using
physical pages and I have a lot of nothing
where no physical page is mapped. And if I
try to access these addresses, I can't, of
course, because I don't have the
privileges for that. But I will still see
differences when doing that because the
CPU has to do different things depending
on whether there's actually a page or not,
whether this page can be cached, this
translation, or whether this translation
is always invalid because there's nothing
there and it can't be cached. We can see
that here in an illustration, if you're
wondering how that really works. So it
turns out the kernel can only be mapped to
a limited number of places because it has
to be aligned by two megabytes, so I only
need to check the spots there where the
kernel could be located. And for all these
places in the address space, I just try to
access it and measure how much energy that
consumes. And if there's nothing mapped,
it consumes quite a lot of energy because
the CPU has to figure out that there's
nothing mapped. It goes through the page
tables, the page table walk, and at the
end figures out, oh, there's nothing here,
so I can't do anything, and aborts that.
And that uses quite some energy. But if
there's actually the kernel here, then
this translation is valid. It works. There
is something there. It will likely be
already in the translation caches in the
TLB, so the CPU has less work. It just
needs to check the cache, sees: "Oh it's
there. I know that. But wait a moment, you
can't access it" and can immediately abort
and that uses less energy. So just from
the energy consumption, I can see if
there's something mapped and with that see
where the kernel is actually mapped.
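The search loop described above can be sketched as follows. The energy measurement itself is replaced by a stand-in function, because probing kernel addresses is not something plain Python can do; the address range, the fake energy values and all function names are assumptions for illustration only.

```python
KERNEL_ALIGN = 2 * 1024 * 1024  # 2 MiB alignment, as stated in the talk


def candidate_addresses(start, end, align=KERNEL_ALIGN):
    """All aligned addresses in [start, end) where the kernel could begin."""
    first = (start + align - 1) // align * align
    return range(first, end, align)


def find_mapped(candidates, measure_energy):
    """Return the candidate whose probe consumed the least energy."""
    return min(candidates, key=measure_energy)


# Stand-in for the real measurement: a mapped address "costs" less
# energy than an unmapped one. The numbers are made up.
SECRET_BASE = 0xFFFFFFFF9A000000


def fake_measure(addr):
    return 50 if addr == SECRET_BASE else 80


# Assumed search range for the kernel text region.
candidates = candidate_addresses(0xFFFFFFFF80000000, 0xFFFFFFFFC0000000)
print(hex(find_mapped(candidates, fake_measure)))
```

Because only aligned candidates need to be probed, the assumed 1 GiB range contains just 512 candidates, which fits the observation that the demo runs quickly.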
Moritz: And this is really working? Did
you try it out or is this just some
theoretical thing?
Michael: You're always so skeptical. Of
course I tried that and I brought the demo
with me. So here you can see the demo
running. This is on a real system. And you
see it's super fast measuring the energy
consumption going over the address space
and finding the kernel.
applause
Moritz: But these attacks are boring,
Michael. We want to attack something real,
we want to be like real attackers, we want
to attack crypto, we want to get keys.
Michael: Crypto is complicated. That's …
laugh track
Moritz: No, no, no, just listen. So, for
instance, with RSA, this is a widely used
public-key cryptosystem. This is really
easy because to encrypt some data, you
have a public key. To decrypt the data you
have a private key. And if we get the
private key: profit, easy as that. What do
you say?
Michael: Yeah, I know how that works. So
the theory is easy, that I have a public
key and I have a private key. But then
the complicated part starts where you
really have to understand the crypto to
actually attack it. And that's really
complicated. And I don't really want to do
that. Maybe we can find a student who tries
that but I'm out of here. laughter
Andreas: Hi guys, I'm a student and I want
a master thesis.
Moritz: This is perfect. Your name is
Andreas, right?
Andreas: Yeah, sure, I'm Andreas.
laughter
M: OK, I don't know if you have heard
the last bits, but we want to attack some
crypto with power side channel attacks.
A: OK
Moritz: And for instance, with RSA, we
have the private key and the public key.
Here we have M the message and C the
ciphertext and d the private exponent. And
of course, it's a computer. It consists of
ones and zeros. And depending on the key
bit if it's a one, for the computation of
the algorithm, we do a square and a
multiply operation. And if it's zero, we
just do the square operation and we do
this for the entire private key.
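As a toy illustration of why the operation sequence matters, here is a textbook left-to-right square-and-multiply that also logs which operations it performs. It is not the code of any real library, and the tiny numbers are chosen only for readability.

```python
def square_and_multiply(c, d, n):
    """Compute c**d mod n, logging the operation sequence."""
    result = 1
    ops = []
    for bit in bin(d)[2:]:  # most significant bit first
        result = (result * result) % n
        ops.append("S")
        if bit == "1":
            result = (result * c) % n
            ops.append("M")
    return result, "".join(ops)


def bits_from_ops(ops):
    """Recover the exponent bits from an observed S/M sequence."""
    bits = []
    i = 0
    while i < len(ops):
        # Every key bit starts with a square; a following multiply
        # means the bit was one.
        if i + 1 < len(ops) and ops[i + 1] == "M":
            bits.append(1)
            i += 2
        else:
            bits.append(0)
            i += 1
    return bits


m, ops = square_and_multiply(c=42, d=0b1011, n=3233)
print(ops)                 # prints SMSSMSM
print(bits_from_ops(ops))  # prints [1, 0, 1, 1]
```

An attacker who can tell squares from multiplies therefore reads the private exponent directly off the sequence.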
A: Now OK, sounds easy enough.
M: Yes. And if we can observe that we
can extract the key. Sounds good. But I
did some experiments and it didn't work
out as well as I had expected. So
we need to get a bit more control and
maybe a better threat model for how to do
that. And that's where Intel SGX comes into play.
And this is an instruction set extension
and it provides you with integrity and
confidentiality of code and data even in
untrusted environments. So with Intel SGX,
you can run programs using protected areas
of memory. And even in the case where the
operating system is compromised and cannot
be trusted at all.
A: So basically we have full
access to all operating system features to
attack the enclave.
M: Yes, exactly
A: OK, that sounds quite powerful
M: But there's still one issue. It's
still just executing a program. So we have
more power, but we need to make use of
that. And there is this paper called
SGX-Step, which gives you more control of
enclaves, and Jo Van Bulck, the author, maybe
has time to explain this a bit to us so
maybe we can give him a call.
A: Sounds great.
ringing sound
M: Hi Jo, this is Moritz. I've seen
the paper of yours, this SGX-Step paper.
It might be the thing that we need, but
can you explain a bit what it is about?
Jo: Yes, surely Moritz, so SGX-Step I
think in one sentence it's an enclave
execution control framework. What I mean
with that is that it allows you to
precisely control the execution of the
enclave so that you can interleave it with
attacker code, as the name implies, you
would do one step of the enclave, one step
of the attacker again one step of the
enclave, one step of the attacker, etc.
M: That's perfect.
J: That's the high level.
Moritz: Can you expand it a bit on the
technical point of view? How do you do
that?
J: Yes, I'm very excited about the
technical details, Moritz. So let me walk
you through. The first thing you should
know about SGX-Step: it's completely open
source and we build it on top of stock
Linux environments.
M: Nice
J: So what you should start with always
is to load a malicious kernel driver. And
this is called the /dev/sgx-step driver.
And from that moment on we kind of export
all of the powers of the Linux kernel into
the userspace. And the second component of
SGX-step that's important is this small
library operating system that we wrote.
It's called libsgxstep and it sits
as a library in the
userspace application. And libsgxstep
allows you to do a number of cool things.
I think the most important thing being
that you have direct access to the APIC,
the x86 high-resolution timing device. So that
sounds interesting for you, right Moritz?
M: Yeah, but what do you
do with the timer?
J: Well, what you can do with the timer
is essentially you can arm it just before
you enter the enclave. And what would
happen then is, let's have a look. You arm
the timer, you start executing the
enclave, then after a while an interrupt
fires and you exit the enclave again.
M: Hmm, so it's like a debugger like
GDB, but for enclaves?
J: Yes, it's a... it's exactly that
Moritz. It's like an attacker controlled
debugger without using any of the debug
features, just using the raw x86
primitives and operating system files. And
just as in a debugger, it allows you to do
single stepping. So every instruction will
be executed one at a time. At most one at
a time I should say.
M: But what happens if I, like,
configure the timer a bit lower? Does it
then like start executing an instruction?
J: That's a very good question. And
configuring the timer is the tricky thing
about SGX-step. So it will indeed happen
sometimes what we call a zero step event.
So you will fire the timer before the
enclave even had time to execute an
instruction. And those are a kind of event
that you can also detect with SGX-step.
There is a trick to detect whether you had
a single step or a zero step.
M: Jo, this is perfect. This is
exactly what we are looking for. Thank you
so much for explaining that.
J: I'm very happy to hear that.
M: I'm looking forward to trying it out
now.
J: Go.
M: See you hopefully soon.
J: Bye bye.
M: Bye!
M: So SGX-step to sum it up,
it's an open source Linux kernel
framework, and it allows us to configure
the APIC timer interrupts so that we can
interrupt the enclave execution to single
and zero step it. And this is perfect
because now we can combine it with the
power measurements of Intel RAPL, and this
gives us the possibility to measure the
energy consumption of single instructions.
Can you try it out Andi?
A: OK, let me dig deeper into that.
We have this really slow RAPL interface
here and if you want to visualize it, we
could imagine that it's like we have slots
where we can fill the slots with
instructions and the RAPL interface gives
us the average power consumption over the
slots. So in the default case, when we
execute our target instruction, we have
basically one slot filled with the target
instruction and the remaining slots filled
with other instructions we don't know. So
basically noise. The best case for us
would be if we repeat the target
instruction indefinitely and fill every
slot with the target instruction.
M: This is exactly what I did
in the experiments in the beginning.
A: Yeah, exactly. That's the reason
why we got such good measurements there.
Another trick would be if we only used the
target instruction in one slot and fill
the remaining slots with instructions
whose energy consumption we already
know. Then we could
do tricks to calculate the energy
consumption of the target instruction.
With SGX-Step we can now use a hybrid
solution here, where we use the
zero-stepping mechanism of SGX-Step to reissue this
instruction and we can fill multiple slots
with the same target instruction. The only
drawback here is that we have a noise
overhead of SGX-step itself, but this is
probably the best solution we can go with.
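The slot picture can be turned into a small piece of arithmetic. This is a sketch of the idea only: it assumes an idealized reading that is the plain sum over all slots, with no noise, and the function name and numbers are made up.

```python
def target_energy(total_energy, slots, filler_energy, repeats=1):
    """Estimate the energy of one target instruction.

    total_energy  : reading covering all slots (arbitrary units)
    slots         : number of slots covered by one reading
    filler_energy : known energy of each non-target slot
    repeats       : how many slots hold the target instruction
    """
    known_part = (slots - repeats) * filler_energy
    return (total_energy - known_part) / repeats


# One target slot among nine known filler slots.
print(target_energy(32.0, slots=10, filler_energy=3.0))              # prints 5.0
# Every slot filled with the target instruction.
print(target_energy(50.0, slots=10, filler_energy=3.0, repeats=10))  # prints 5.0
```

The more slots the target instruction occupies, the smaller the share of the reading that has to be explained by other instructions, which is the motivation for reissuing it via zero stepping.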
M: This sounds pretty good, so we
should actually try that out. So we
implement a toy cipher, which imitates
square and multiply basically. So we can
leave out all the rest, the overhead of a
library that would be used otherwise. And
we then just single step every instruction
and measure its energy consumption and
then we could plot this. Can you do that?
A: I already got some results here
for us. Basically here we use, as you
explained, a toy example for square and
multiply. And in both cases the square and
the multiply, they execute exactly six
instructions. And so basically we have a
period of six here. And if you look at the
results of the measurement here, we can
see that we have patterns that repeat with
a period of six and we can see that these
different patterns correspond to either a
square or a multiply instruction here.
M: Nice, perfect, but this is just a
toy cipher, right? laughter
A: Yeah.
M: Can we do like real crypto?
laughter
A: We can try. So the plan now is
that we want to attack a real RSA
implementation and the real implementation
is not like a toy square and multiply
algorithm. The real implementation needs
to handle these huge numbers. So basically
there's much more code involved and it's
not feasible to single step every
instruction there. So we need a more
clever approach here. If we observe the
square multiply part here, we see that the
square and the multiply function uses the
AVX optimized memset function. So the
energy consumption should also be more if
we execute an AVX instruction because AVX
instructions use much larger registers. So
basically we should be able to observe
that.
M: Interesting.
A: The only drawback here is that we
cannot use the same approach as with the
toy cipher because the square has a
different number of instructions than the
square and multiply function. So we need
to do a trick here. So to understand what
we did here, our target is that we
reconstruct a key bit. And if the key bit
is one we execute a square and multiply.
If the key bit is zero, we execute a
square. So to visualize how we execute
zero and single stepping, we have to dig
into the assembler a bit. So to test for
the key bit, we execute like a test
instruction and then we execute a
conditional jump. And if we execute the
square and multiply we have for instance,
K instructions. And if we execute the
square we have for instance L
instructions. So we can see that these two
numbers do not add up. They are different.
So we cannot simply measure each Kth
instruction and get the key out. So we
need to do something different here. We
can number the instructions after the jump
instruction and then use single stepping
to step to the Nth instruction
after the jump instruction. And on the
left side, for a key bit of one, we then hit
exactly the AVX instruction there, used in
the AVX memset. And if we then use our
measurement framework to measure exactly
the Nth instruction after the jump, we
observe on the one hand a high energy
consumption and on the other hand, we
observe low energy consumption if the
branch was not taken or a zero.
M: It's very clever.
A: So if you measured both
instructions here, we can then combine
these energy measurements and then use a
simple threshold to reconstruct the key
bit in the beginning. And then we do this
iteratively for each key bit.
M: This sounds pretty promising, but
did you try it out?
laughter
A: Sure. Here are the results of that.
And we can clearly see that we have
different energy consumption or in this
case voltage
applause
based on if the
AVX instruction is executed or if the
instruction at the same offset in the
other branch is executed.
M: How fast does this work, does this
take like 5 days?
A: Not quite that long. We have one
problem here that the time per key bit
increases the further or later the key bit
is in the key. So basically the first key
bit we can reconstruct very fast, but for
the last key bit, we need to single-step
much further in the code to actually reach
it. And this adds up. So basically the
time increases linearly with the position
of the key bit. But for our key here, our test key
with 512 bits, it takes us about 3.5
hours to reconstruct a complete key. Note
here that we spent like 52 minutes
only to find the target instruction. So
basically, if we could optimize that, the
attack would be much faster. In addition,
we had to record like 3 samples per key
bit. But with the implementation, it
should be possible to actually do that
with 1 sample. And since we then only need
one sample per key bit, we actually can do
it with a single trace attack. But we did
not try that out, unfortunately.
Moritz: quite fast.
Michael: So while all this sounded quite
easy and straightforward in hindsight,
this was actually a really long process.
Starting at the beginning of 2017 when we
discovered this interface, the RAPL
interface. Then we had to come up with a
title for this talk, of course, laughter
and some lyrics for a song. We had the
first toy attack on RSA at the end of
2017. It took us until 2018 to finally get
a KASLR break that was working and only in
2019, by the end of 2019. After Andreas
did his master's thesis on that, we were
able to produce a full attack on RSA. And
this is also the time when we submitted
that as a paper to a conference and
disclosed that to the CPU vendors so that
they can fix that. And this is also the
start of the embargo. This embargo for
this vulnerability lasted almost one year.
So from November 2019 to November 2020. It
was just a few weeks ago that this embargo
ended here.
Moritz: But there's one thing missing. We
really wanted to do crypto attacks, but
not only with SGX-step as a compromised
operating system, but also from userspace.
But as we've seen, it's so difficult to
measure parts of the code without having
SGX-step. But what we can do is we can
measure the power consumption of the
overall execution of an algorithm and
there correlation power analysis comes in
handy. And there what we do is we build a
power consumption model of our device. As
we've heard earlier, the Hamming Weight is
the number of bits that is set in an
operand or in the data. And we assume that
if a bit is set, the computer takes more
power to process it. In addition, what you
can use as a different model is the
Hamming distance. So from one operation to
the other, how many bits change? And then
we assume the more bits change, the more
power is consumed. And we really want to
try that out. So what we are targeting now
is AES-NI, a side-channel-resistant
instruction set extension from Intel. And we target it
in a scenario where we can trigger the
encryption and decryption of many, many
blocks over a long time so that the
execution time is longer than the RAPL
update rate, so that we can really see the
power consumption in our measurement. And
this is used, for instance, in disk
encryption or decryption or if you seal or
unseal the SGX enclave state. And we can
now do that and record power measurements
in different scenarios, right?
Andreas: Sure, we can try that. So in our
experiment, we recorded two million traces
over 26 hours for the SGX environment. But we
also tried to reconstruct it without SGX
where we used the encryption inside a
kernel module. And there we recorded
4 million traces in 50 hours. And to
understand the attack here, we have to
look at this animation. So basically we
have our computer where a secret key is
stored somewhere internally. Then we have this
key to encrypt some messages and we also
have the power consumption here. And what
we now did is we recorded the encrypted
message and the power consumption it took
to encrypt this message for many messages.
And then we use a model of the CPU here to
predict the energy consumption, to
reconstruct the key. The key is usually
split up into parts, where each of the
parts can have a value between 0 and 255.
So to reconstruct the key here, we simply
use our measurements in the model and we
try out one of the key parts and estimate
the energy consumption for the key part.
And then we store the correlation between
the recorded measurements and the prediction.
And we do this for each of the possible
key values. And once we have found the key
value with the highest correlation, we know
that this key value corresponds to the key
part of the key. And we then simply repeat
the process for each of the parts of the
key until we get the final key.
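The per-byte search described above can be sketched on simulated data. Everything here is an assumption made for illustration: the "device" leaks the Hamming weight of message byte XOR key byte plus Gaussian noise, which is far simpler than real AES and is not the model used in the actual attack, and all names are made up.

```python
import random


def hamming_weight(x):
    """Number of bits set in x."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def simulate_traces(key_byte, n, rng, noise=1.0):
    """Simulated device: energy = HW(message ^ key) + noise."""
    messages = [rng.randrange(256) for _ in range(n)]
    energy = [hamming_weight(m ^ key_byte) + rng.gauss(0, noise)
              for m in messages]
    return messages, energy


def recover_key_byte(messages, energy):
    """Try all 256 values and keep the best-correlating guess."""
    def score(guess):
        prediction = [hamming_weight(m ^ guess) for m in messages]
        return pearson(prediction, energy)
    return max(range(256), key=score)


rng = random.Random(1)
messages, energy = simulate_traces(key_byte=0x3C, n=1000, rng=rng)
print(hex(recover_key_byte(messages, energy)))
```

Repeating `recover_key_byte` for each key part mirrors the byte-by-byte procedure from the talk; with real RAPL data the noise is much larger, which is why millions of traces were needed.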
M: And we actually tried that out. So
here in our demo video, you see on the
left where we test all the combinations
and see what is the most likely key
candidate at the moment, while for a
single key byte on the right, you see
every possible value and the correlation.
So in the beginning, with not that many
traces processed, it's not very clear
which key candidate is the right one,
because there's so much measurement noise
introduced by measuring over the overall
execution time. But over time, this signal
gets more stable and we see on the right
with the peak getting more and more
distance from the other candidates that
this is our correct key byte. And we do
this, as Andreas said, for every possible
key byte with every possible value. So in
the end, we end up with the correct key.
applause
A: OK, but this seems like it's only
Intel CPUs. Does this also affect others?
M: Yes. So actually, we also checked
other CPU vendors to see if they have
similar interfaces. And for instance, AMD
is affected as well. But we never really
heard back from them after our disclosure.
And their patch, which tries to solve the
problem in the driver, is similar to the
one that Intel has.
A: You're right Moritz, it actually
works. So I tried the same code on AMD.
The one you showed before was
distinguishing operands, and that also
works on AMD. That's pretty nice. It's not
an Intel only issue. It also affects at
least AMD as well.
M: Yes, but actually there are many
other vendors as well that provide
interfaces, even some of them unprivileged
to user space where you could probably
mount similar attacks. For instance,
Nvidia, IBM, or Marvell and Ampere.
A: So this is really an industry
wide problem here. And we've also seen
that from the media coverage. So not only
German news reported on that, like Heise
or Golem, but it also went more
international with ZDNET, Ars Technica,
CSO, Tech Radar, Computer Weekly and many,
many others that wrote about this new type
of vulnerability that affects many
computers out there. And I guess if it
affects many computers, we should do
something against that.
M: Yes, you're right. We cannot only
have an attack and no mitigation against
it. This would not be right. And indeed,
it's quite easy to fix that because we
said in the beginning, you have
unprivileged access to those registers. So
we just restrict the access. And we are
done, and this is exactly a one line patch
for the Linux kernel. But as we've seen,
the threat model of Intel SGX
allows a compromised operating system. So
this one line patch does not help there
because if I'm the operating system, I can do
whatever I want to. We need more and more
complex mitigations. And in this case,
microcode updates are necessary. And what
Intel does is to fall back to the model of
the energy consumption. So they have an
internal model. How much energy is
consumed by an executed instruction and
use that instead of the real measurement.
And this does not allow to distinguish
data and operands from each other again.
So if your implementation is implemented
correctly, if you use constant time, then
you are mitigated and protected against
these attacks. And as we see here in the
plot, we tried the mitigation out. So on
the left, we were able to see differences
depending on the Hamming weight of the
operands. And on the right with the
mitigation in place, it just does not work
anymore and you cannot see any
differences. applause
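The effect of the mitigation can be sketched with a toy comparison. This is an illustration under stated assumptions, not Intel's actual model: the base cost, the per-bit cost and the noise level are hypothetical numbers, and the "modeled" reading is simplified to a fixed cost per instruction that ignores the operand.

```python
import random

BASE_COST = 100.0  # hypothetical energy per instruction, arbitrary units
PER_BIT = 0.5      # hypothetical extra energy per set operand bit


def hamming_weight(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


def measured_energy(operand: int, rng: random.Random) -> float:
    """Real measurement: depends on the operand's Hamming weight."""
    return BASE_COST + PER_BIT * hamming_weight(operand) + rng.gauss(0.0, 1.0)


def modeled_energy(operand: int, rng: random.Random) -> float:
    """Model-based value: the same cost regardless of the data."""
    return BASE_COST


def mean_gap(energy_fn, n: int, rng: random.Random) -> float:
    """Mean energy difference between operand 0xFF and operand 0x00."""
    low = sum(energy_fn(0x00, rng) for _ in range(n)) / n
    high = sum(energy_fn(0xFF, rng) for _ in range(n)) / n
    return high - low


rng = random.Random(1)
gap_measured = mean_gap(measured_energy, 2000, rng)
gap_modeled = mean_gap(modeled_energy, 2000, rng)
print(f"gap with real measurement: {gap_measured:.2f}")
print(f"gap with modeled energy:   {gap_modeled:.2f}")
```

With the real measurement the gap settles near eight times the per-bit cost, while the modeled value leaves nothing to distinguish.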
Andreas: Nice. So you really
can't read her power trace any more.
Music: Pokerface by Lady Gaga
sings
I wanna probe 'em like in 1943
touch 'em, measure wattage
correlate and get the key
I probe it
Oscilloscopes are not the same
without a probe
And babe, if it's remote if it's not code,
it cannot run
I'll let him plot, let's see what he's got
I'll let him plot, let's see what he's got
Can't read my, can't read my
No he can't read my power trace
She's got the countermeasure
Can't read my, can't read my
No he can't read my power trace
She's got the countermeasure
P-p-p-power trace, p-p-power trace
P-p-p-power trace, p-p-power trace
P-p-p-power trace, p-p-power trace
P-p-p-power trace, p-p-power trace
applause
Moritz: With all those nasty songs, we
wrote them down in a scientific paper and
the PLATYPUS paper has been accepted
recently at a conference. And we also want
to thank all the other coauthors who
are not in this talk, like David Oswald,
Catherine Easdon and Claudio Canella. To
sum it up, what we have seen is that with
power side channel attacks, you can even
exploit them from software. So there is no
need to attach an oscilloscope on modern
Intel CPUs.
Michael: And what we've also seen is
that since the SGX threat model allows for
much more capable attackers, mitigating
power side channel attacks on SGX
enclaves is much more work than simple
software patches.
Andreas: Yes, and that concludes
our talk on PLATYPUS. Thank you all for
listening.
Applause and Music
Herald: Thank you very much for your,
excuse me, nerdy talk, and thank you, Moritz,
Michael, Daniel and Andreas. We head over
to our Q&A session and the first question
would be, how does it come that you have
this, let's say, through-the-back-door
attack against the CPU? As
you mentioned, you attack RSA through a
power driver. Could you tell me a
little bit more about that?
Moritz: Yes. So the basic idea of
attacking cryptographic algorithms with
power side channel attacks is not very new.
This was like one of the first things
researchers have shown, but most of the
time for like smaller devices, like smart
cards, like your bank card, for instance.
And for those attacks, you usually had
like an oscilloscope that you needed to
attach to the device to do the attack. But
with modern processors, they have
basically an oscilloscope built into the
processor, which you can read out as the
operating system. And in our case, there
are like drivers that expose this
interface, also to userspace. So from
there as an unprivileged attacker, you can
then try to exploit that. And yeah
basically the best thing that we wanted to
achieve with those attacks is to attack
cryptographic algorithms and not to
transmit some data between two processes.
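How a user space program reads such a driver interface can be sketched as follows. This is a minimal example assuming the Linux powercap sysfs layout, where the zone `intel-rapl:0` exposes `energy_uj` and `max_energy_range_uj`; the helper names are made up for illustration, and the wraparound handling is an approximation. On a patched kernel an unprivileged read of `energy_uj` is expected to fail, which the sketch reports as "no access".

```python
from pathlib import Path
from typing import Callable, Optional

# Package-0 zone of the Linux powercap framework; present only on
# systems where a RAPL powercap driver is loaded.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_counter(path: Path) -> Optional[int]:
    """Return the integer stored in a sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def energy_delta_uj(before: int, after: int, max_range: int) -> int:
    """Energy between two readings, tolerating one counter wraparound."""
    if after >= before:
        return after - before
    return (max_range - before) + after


def measure_uj(workload: Callable[[], None],
               zone: Path = RAPL_ZONE) -> Optional[int]:
    """Microjoules consumed while workload() runs, or None without access."""
    max_range = read_counter(zone / "max_energy_range_uj")
    before = read_counter(zone / "energy_uj")
    if max_range is None or before is None:
        return None
    workload()
    after = read_counter(zone / "energy_uj")
    if after is None:
        return None
    return energy_delta_uj(before, after, max_range)


result = measure_uj(lambda: sum(i * i for i in range(1_000_000)))
print("no access to the energy counter" if result is None else f"{result} uJ")
```

A single reading like this is far too coarse for an attack on its own; the talk's attacks rely on many repeated measurements and statistics on top of it.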
Herald: Cool, thank you. Our next
question, you mentioned a little bit about
ARM sorry, AMD, how about ARM? So not x86
architecture?
Moritz: So there are many other vendors
that have similar interfaces, some of them
also provide drivers that expose them
directly to userspace, but we hardly had
any access to those devices, so we could
not really fully evaluate if these attacks
are also possible on them. But in the
paper, we have an appendix where we
describe them in a bit more detail so you
can try it out on your own and let us know
if it works.
Herald: Cool. Thank you. So please, fellow
hackers, try it out on your system at
home. Now, our next question is related to
that. Is there a survey which hardware has
the RAPL or similar weaknesses? Intel,
AMD, ARM even.
Moritz: I don't know if anyone else wants
to answer that, I can also take the
question. So the RAPL interface itself
comes from Intel, but a similar interface
is also implemented for AMD, and they also
use basically the same name. They have
a... For now, it's implemented in two ways
for the Linux kernel: in the RAPL
driver, but also in a separate driver
called the AMD energy driver, which has
been included for a few months in the
upstream kernel. And for other vendors it
works a bit differently. So some of them
just give you similar measurements, but
not in a way that is tightly related to
the RAPL interface. They measure over a
period of time and give you the average.
Herald: OK, and..
Michael: Maybe to add one point here: On
Intel, basically the high resolution
sensors are included since the Skylake
micro architecture. So something around
2015.
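On Intel, the raw counter behind the RAPL interface is expressed in hardware units that software has to scale. The following sketch decodes two raw register values; it assumes the documented Intel layout, where bits 12:8 of `MSR_RAPL_POWER_UNIT` (0x606) give the energy unit as 1/2^n joules and the low 32 bits of `MSR_PKG_ENERGY_STATUS` (0x611) hold the counter. The example values are illustrative, and actually reading the registers is not shown.

```python
MSR_RAPL_POWER_UNIT = 0x606    # Intel: unit definitions
MSR_PKG_ENERGY_STATUS = 0x611  # Intel: package energy counter


def energy_unit_joules(power_unit_raw: int) -> float:
    """Joules per counter tick, from bits 12:8 of MSR_RAPL_POWER_UNIT."""
    energy_status_unit = (power_unit_raw >> 8) & 0x1F
    return 1.0 / (1 << energy_status_unit)


def counter_to_joules(energy_status_raw: int, power_unit_raw: int) -> float:
    """Scale the 32-bit energy counter to joules."""
    ticks = energy_status_raw & 0xFFFFFFFF
    return ticks * energy_unit_joules(power_unit_raw)


EXAMPLE_POWER_UNIT = 0x000A0E03  # illustrative raw value, energy unit field = 14
EXAMPLE_COUNTER = 16384          # illustrative raw counter value

print(energy_unit_joules(EXAMPLE_POWER_UNIT))                  # about 61 microjoules
print(counter_to_joules(EXAMPLE_COUNTER, EXAMPLE_POWER_UNIT))
```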
Herald: I see. We have another related
question to AMD. So did AMD issue any
microcode update for the secure encrypted
virtual machines case apart from
restricting access to the MSRs?
Moritz: Not as far as we know. But from
our knowledge, to attack AMD CPUs, we need
to wait for a new generation so that we
can do similar attacks under a similar
threat model as we can on Intel.
Herald: Cool, thank you. So another I
think this is also related to it, you
mentioned your Xen example where you
attack through a hypervisor. Does it work
on other hypervisors like KVM or Hyper-V as
well?
Moritz: So for KVM, I don't think so. For
Windows I also don't know. I don't think
they expose those MSRs directly to the
virtual machines. So the issue here is
really that we have access to those MSRs
in the virtual machine where we should not
have access to them.
Herald: OK, we have another question from,
I think, the hardware section of our
remote Congress. Someone wonders if the
same could be achieved with external power
measurement.
Moritz: You mean if you could attach
actually an oscilloscope or a different
probe to the CPU? Yes, you can do that.
And it has already been demonstrated in
the past.
Michael: But it turned out with external
tools, it takes even longer than with
software. You have more issues finding the
right spot for measuring. And in one
paper, it took 14 days of collecting
traces which are harder to probe, which is
much longer than in software. But it can
be done.
Herald: And there's another follow up
question, how external is external? Where
do you measure power consumptions of an
x86 server?
Moritz: OK, you would need to get physical
access to the data center, I guess. And if
this is in your threat model, you probably
have different things to worry about.
Michael: Yeah, you still need to find the
right spot on your mainboard.
Herald: OK, so is there, let's say,
documentation on where to find that right
spot?
Moritz: I think one can take a look at
other research papers where they attached
a probe, I think there are experts out
there, but I don't know.
Herald: OK, thank you. The next question,
why is the power information exported in
such detail to the kernel or userspace
software? Why isn't it only available to
the firmware or filtered to return an
average, for example, one second power
trace?
Moritz: Good question. We did not
implement that. I think the reason is...
Andi?
Andreas: The one second power trace would
make the attack only slower because you
can still do exactly what we did with
single stepping here, because RAPL is
already very slow and we need a mechanism
to replay instructions to get a good
reading of the energy consumption of the
instructions. So if you only increase the
update interval there, the attacks would still
be possible, but only take longer to
record the traces there. So you have to...
Yeah. So you have to find a tradeoff
between your countermeasures there.
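The tradeoff can be made concrete with simple arithmetic. This is a rough sketch under two assumptions that are not from the talk: each trace needs at least one counter update, and the number of traces stays the same when the interval grows. The trace count is a made-up figure.

```python
def recording_time_s(n_traces: int, update_interval_s: float) -> float:
    """Lower bound on recording time if each trace waits for one update."""
    return n_traces * update_interval_s


N_TRACES = 1_000_000  # hypothetical number of traces needed

for label, interval_s in (("1 ms", 1e-3), ("1 s", 1.0)):
    hours = recording_time_s(N_TRACES, interval_s) / 3600
    print(f"update every {label}: about {hours:.1f} hours")
```

A coarser interval multiplies the recording time by the same factor, so it slows the attack down without removing the leak.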
Herald: Okay, so let's say with an
average, your resolution is lower, but
still it just takes more time to record
it. And still it does work, right?
Moritz: Yes. And the other thing is that
one needs to keep in mind those drivers
are not written with security in mind, but
for performance, so that this can be used
by other tools that, like, give you the best
performance of your CPU. And in that case,
it just has not been masked and you get
the value directly as the operating system
sees it.
Herald: Crazy. Our second to last
question, how long is the update interval
for this measurement? I heard something
about...
Andreas: For the fastest register we
observed, it's like 10 microseconds, for
the slowest one... So there are different
domains where you measure only parts of
the CPU and for the whole package, this
includes all the cores and the memory
controller, it takes around one
millisecond there. So this is already very
slow, if you compare it to the frequency
where CPUs are currently running at.
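To put those intervals next to the CPU clock, the following sketch counts clock cycles per counter update. The 3 GHz clock is an assumed, typical figure, not one from the talk.

```python
ASSUMED_CLOCK_MHZ = 3000  # hypothetical 3 GHz core clock


def cycles_per_update(interval_us: int,
                      clock_mhz: int = ASSUMED_CLOCK_MHZ) -> int:
    """Clock cycles that pass between two counter updates."""
    # 1 MHz corresponds to one cycle per microsecond.
    return interval_us * clock_mhz


print(cycles_per_update(10))    # fastest register, about 10 microseconds
print(cycles_per_update(1000))  # whole package, about 1 millisecond
```

Under that assumption, tens of thousands to millions of cycles pass per reading, which is why instructions have to be replayed many times to get a usable measurement.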
Herald: Crazy. In this case, are there any
other questions from the interwebs, from
Twitter, from our IRC channel? Because
otherwise we would head over to more,
let's say, personal interview. Let's give
them a try.
In this case, no more
questions. So, again, thank
you, Moritz, Michael, Daniel and Andreas,
for this really interesting talk and
for this Q&A session. The Internet tells
me no questions. We head over to our
personal interview. I asked you earlier
before our talk. So with all these, let's
say, research things going on in the
Corona time. So what's your personal
experience? What changed in your work life
balance in the last one year?
Moritz: I think the biggest change is that
most of the coffee breaks you do alone
instead of with the colleagues.
Herald: So how do you meet in
your, let's say, lunch break? Do you have
a lunch break breakout session in
Jitsi as well?
Moritz: Yeah, we started with Jitsi, but
used different systems along the way.
And now it's like a fixed coffee meeting
at 2:00 p.m. every day and we try to meet
everyone or have individual meetings, of
course.
Herald: And does this work? But so is
everyone on time. So sharp 12?
Moritz: No, but I think no one really
cares.
Herald: So it's just for socializing?
Moritz: Yes. But we also discuss work
related issues also in separate meetings.
And yeah, I think time is different, but
you get used to it. But let's hope it's
over soon.
Herald: What about the others, Michael?
Michael: Yes, I'm in the same coffee
breaks as Moritz. Sometimes every day,
depends on the workload, so I feel quite
lucky that we can still work full time and
get our work done. And I don't have to
fear that we lose our jobs in the
short term. So I think that takes a lot of
pressure off. But, yeah, I mean, it's
different. I'm also missing the
conferences, so I used to travel around a
lot before Corona times and this year is
basically nothing. So you really miss the
social interactions and conferences,
meeting other researchers, exchanging
ideas, having that online is different and
just not the same, but still it works. So
I can still do a lot of research. The
positive thing, you have less
interruptions than when you're in the
office. So that's a positive thing. But
yeah, I also hope that it's over soon.
Daniel: But then again, on the other side,
you have way more conference calls because
instead of writing emails, people ask for
conference calls all the time.
Michael: Yes, you are in meetings all the
time.
Herald: Yeah, Daniel, you mentioned earlier
your, let's say, flight plan of the last
year. And as far as I understood it, you
like to be in personal contact with your
colleagues, also from others or from
foreign countries. How does this work? So
let's say topic exchange between different
organizations, between different
countries?
Daniel: Yeah, it's more difficult. So in
2018, I had these 54 talks outside of Graz
in 52 weeks and this year I had a single
talk outside of, outside of Graz where I
was in person of course. Of course more
Online. Um yeah. So it's, it's difficult
to engage with people from other places,
but it works of course in teams that you,
that you already have established in the
past, for instance. So you can continue in
teams that you've already built there. But
also in some cases it works to start new
collaborations. But it's of course more
difficult than if you can just meet people
in person like we did for this paper
actually, David Oswald, one of the
coauthors, we met with him in person and
talked with him about the paper in person.
Herald: Andreas, what's your, let's say,
Corona year?
Andreas: Yeah, since I'm one of the
persons who was interrupting Michael all
the time, I am missing the office because
of what looks like the unscheduled flow:
you're sitting in an office and
suddenly you have like a question or idea,
and you don't have to write it.
You can just ask it on the fly. So I'm a
bit missing that side. On the other side,
I gained a lot of time since I don't have
to travel to work there. And often I got a
bit better in writing stuff I want to
know, asking questions much
faster, like losing the clover and that
stuff. And so I think it's both positive
and negative. And I only joined since I
think August, when I finished my master's
thesis and in the first half of the year,
I worked at a software company where the
first lockdown was also handled very well.
So we had like a smooth transition. So I'm
kind of used to home office, but I miss
interacting with people.
Herald: I think that's the main thing 2020
brings us: more remote work. Which is
basically a good thing to work more from
home, but we have some minutes left. And
please excuse me myself. Did your mate
consumption increase or decrease?
Moritz: I think it's hard to say for
coffee because I used to drink more coffee
in the office than at home. Yeah, but
now I see it when we go grocery shopping.
laughs It's hard to say.
Michael: I think it decreased for me
because now if I'm tired, I can simply
take a nap, that's easier.
Herald: And just turn your instant
messaging off.
Michael: Yeah.
Herald: So our time is over. Thank you
again for the brilliant, for the amazing
work, for this attack against CPUs, for
the great puns you brought, for the nice
interview and have a nice remote Congress
3.
postroll music
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!