OK everyone, please now join me in welcoming Eric, who is a PhD student at the VU Amsterdam, and he will talk about ASLR. Please give him a warm round of applause.

Hello. Like Herold said, I'm Eric, a PhD student at the VU Amsterdam, in the VUSec group. I will be presenting work that we have done in the group. Most of the work I'm presenting was done by Ben and Kaveh, and by Stephen, who showed that the attack I'm presenting is applicable to all 22 CPU microarchitectures that he tested. I try to sneak this slide into all my talks, but this time it is especially apt, because this talk is about finding them.

So this talk is about attacking ASLR, which is short for Address Space Layout Randomization. It's an exploit mitigation technique which, as far as deployment is concerned, has been one of the success stories since it was introduced: it has been widely adopted and it makes exploitation somewhat more difficult. The way ASLR does this is by changing the location of code and data, usually every time the process runs, so that an attacker cannot rely on certain addresses being the same all the time. On modern 64-bit architectures the address space is usually 48 bits, which means you can address about 256 terabytes of memory. Of course you cannot read or write everywhere, because your computer probably doesn't have that much memory, so in reality only a very small portion of that address space is allocated to a process. That makes it quite easy to change the location of that memory. This makes life for an exploit writer a tiny bit more difficult, because it's very useful to know the location of data: for example, if you want to overwrite a return address on the stack, it's nice to know where you can jump to, and if you don't know, you may jump into nowhere and the program crashes.

However, not much is needed to bypass this mitigation. You just need to leak the location of the memory. So I really like this backronym. You can try to reuse the bug that you want to exploit to first leak information and then exploit. Or, if that is not possible, you can find another bug which allows you to leak this location. Or maybe you don't have to. This presentation is about an attack which uses a side channel, usable from JavaScript, in the hardware itself to discover information about the locations of data or code in memory.

The modern CPU architecture is a wondrous abstraction layer. Even if you as a programmer write machine code, there is a lot of stuff you don't have to worry about, especially the stuff that makes your programs fast. Memory access is very slow compared to the CPU on modern computers; that's why there is a cache mechanism built in. Other things are also abstracted away. For example, if your program does a memory access, the data is written to the cache, but where is it written? Your program gives a virtual address to the CPU, and the CPU needs to translate that to a physical address, which is done by a component called the memory management unit (MMU). The MMU has a small cache of mappings from virtual memory to physical memory, but if an address is not in that cache, it has to do a page table walk. The page table walk is what we are going to attack. We will measure the effect that the page table walk has on the L3 cache, the last and biggest cache on the CPU, to find out what happens during the walk.
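To make the page table walk a bit more concrete, here is a small sketch of how the MMU splits a 48-bit virtual address on x86-64 with four-level paging. Each of the four 9-bit indices selects an 8-byte entry in a 4 KiB page table, so the walk touches four page-table cache lines whose positions depend on the virtual address; that dependence is the signal we will look for in the L3 cache. The function name and the example address are just for illustration.

```javascript
// Sketch: how the MMU splits a 48-bit virtual address during a page table walk
// on x86-64 with 4-level paging. Each index selects one 8-byte entry in a
// 4 KiB page table.
function pageTableIndices(virtualAddress) {        // virtualAddress as a BigInt
  const offset = virtualAddress & 0xfffn;          // bits 11..0:  offset within the 4 KiB page
  const pt     = (virtualAddress >> 12n) & 0x1ffn; // bits 20..12: page table index
  const pd     = (virtualAddress >> 21n) & 0x1ffn; // bits 29..21: page directory index
  const pdpt   = (virtualAddress >> 30n) & 0x1ffn; // bits 38..30: page directory pointer index
  const pml4   = (virtualAddress >> 39n) & 0x1ffn; // bits 47..39: top-level (PML4) index
  return { pml4, pdpt, pd, pt, offset };
}

// Example: the four page-table indices for an (illustrative) virtual address.
console.log(pageTableIndices(0x7fffdeadb000n));
```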
We are talking about doing a timing attack from JavaScript to measure whether memory gets accessed, which means that we need a pretty good timer to be able to do this. Luckily for us, the browser standards committees have come up with an API to do just that: you can take a timestamp, do an operation, take another timestamp, and you get a very crisp time measurement. That lasted until someone published a paper which showed basically that you can do a last-level cache attack on the CPU from the browser and discover something. So the browser makers made the time measurements much coarser: every microsecond or so you get a little bump, and then for one microsecond nothing changes.

But all is not lost for the attacker, because you can turn the coarse-grained timer into a fine-grained timer. What you can do, for example, is wait for this bump to happen, then quickly do an operation, and then start a counter; the longer the operation takes, the smaller the counter is when the next jump happens. In Chrome they chose to vary the length of the interval at which this happens, but you can still do multiple measurements, take the average, and get a good measurement.

However, we can do better. The browser makers decided to make this a bit more difficult, but when the browser standards committee takes, it also gives. They decided to implement an object called the SharedArrayBuffer, which allows multiple threads, which are called web workers in JavaScript, to work on a single piece of memory. They decided to enable this by default, which actually happened after we published the attack. They have basically given up on preventing nanosecond-scale time measurements in JavaScript. The SharedArrayBuffer can be used for other things, but I won't talk about that today.

So how can we measure time using shared memory? Well, it's quite simple. One thread is used for doing the time measurement, and the other thread does the operation. The timer thread waits until the thread that does the operation sets a variable and starts the operation; the timer thread sees that the shared buffer has changed and starts counting, and when the operation is done, the second thread changes the buffer again and the timer thread stops. This gives a very crisp measurement.

So now we have a nanosecond-scale timer and we can do side-channel attacks from JavaScript. We will be doing a timing attack on the last-level cache. When the CPU accesses memory, everything happens at the granularity of a cache line, which is 64 bytes. Within, for example, the level 3 cache, a certain physical address maps onto a certain cache set, and this cache set can, for example on a four-core desktop Intel machine, contain 16 different cache lines. I'll talk about a modern Intel machine, but the concept translates to other microarchitectures as well.
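As a rough illustration of that shared-memory timer, here is a minimal sketch of the start/stop protocol described above, using Atomics on a SharedArrayBuffer. The worker file name, the layout of the shared array, and the timeOperation helper are all made up for this example; a real attack would refine this considerably.

```javascript
// timer-worker.js (hypothetical file name)
// The timer thread: waits for the start signal, counts until the stop signal,
// then publishes the count. A larger count means the operation took longer.
onmessage = (e) => {
  const shared = new Int32Array(e.data);            // shared[0] = start/stop flag, shared[1] = result
  for (;;) {
    while (Atomics.load(shared, 0) !== 1) {}        // wait for the main thread to start an operation
    let count = 0;
    while (Atomics.load(shared, 0) === 1) count++;  // count while the operation is running
    Atomics.store(shared, 1, count);                // publish the measurement
  }
};

// Main thread: signals the timer worker around the operation it wants to time.
const sab = new SharedArrayBuffer(8);
const shared = new Int32Array(sab);
const timerWorker = new Worker('timer-worker.js');
timerWorker.postMessage(sab);

function timeOperation(op) {
  Atomics.store(shared, 1, -1);                 // mark the result slot as "not ready"
  Atomics.store(shared, 0, 1);                  // tell the timer thread to start counting
  op();                                         // e.g. read from a memory location
  Atomics.store(shared, 0, 0);                  // tell the timer thread to stop
  while (Atomics.load(shared, 1) === -1) {}     // wait for the timer thread to publish its count
  return Atomics.load(shared, 1);
}
```

The counter units are arbitrary: they only need to be fine-grained enough to tell a cached access apart from an uncached one, and the two threads need to run on different cores so the counting loop keeps running while the operation executes.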
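And to make the cache-set mapping concrete, here is a simplified sketch of how a physical address selects a cache set, assuming 64-byte lines and a direct set-index function. The set count used here is only an example value, and real Intel last-level caches additionally hash higher address bits to pick a cache slice (one per core), so this is an approximation of the actual mapping.

```javascript
// Sketch: which cache set a physical address maps to, in a simplified model
// with 64-byte lines and the set chosen directly by the bits above the offset.
const LINE_SIZE = 64;      // bytes per cache line
const NUM_SETS  = 2048;    // example value; depends on cache size and associativity

function cacheSet(physicalAddress) {
  const lineNumber = Math.floor(physicalAddress / LINE_SIZE); // drop the 6 line-offset bits
  return lineNumber % NUM_SETS;                               // low bits of the line number pick the set
}

// Two addresses NUM_SETS * LINE_SIZE bytes apart land in the same set and
// therefore compete for the same (e.g. 16) ways of that set.
console.log(cacheSet(0x12340), cacheSet(0x12340 + NUM_SETS * LINE_SIZE));
```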