<i>32C3 preroll music</i>

Herald: Okay, welcome to our
last talk in this hall today!

It’s about Console Hacking and I guess
that’s the reason why you are here.

Console hacking has a long
tradition at our great conference

and we have seen lots of funny things.
People doing stuff with Xboxes,

Playstations and everything.

Okay. Today we got a team which
deals with the Nintendo DS,

so give a warm applause
for plutoo, derrek and smea!

<i>applause</i>

smea: Hi! I’m smea,
this is plutoo, this is derrek,

and today we are going to talk to you
about our work on the Nintendo 3DS.

So, the way this talk is going to be
structured, is we are just going to

go over all the hardware,
organisation, software, like…

Just give you a basic overview
about how the system works.

And after that we are going to go into

basically every layer of
security the system has,

and break every one of them.

<i>laughter</i>

<i>applause</i>

Okay. So, as you probably know,
the 3DS, the original Nintendo 3DS

was released in 2011. It’s a system
that is kind of underpowered.

It’s got, like… It’s got an
ARM11 dual core CPU,

268MHz, it’s got a nice
proprietary GPU, a bit of RAM,

you know, the usual. It’s also backwards
compatible with the DS games,

which is nice. Then the new 3DS
was released in 2014 and 2015,

there was like different regions. And it
was basically just the same console,

just some improvements in the
hardware. You’ve got a better CPU,

it has got more cores. It’s faster, it has
got more RAM. Basically everywhere.

So, it is just the same thing,
it runs the same software, exactly.

It has got some exclusive
software, but not much.

So, in terms of a hardware overview, this
is what we are going to talk about

looks like; in general. So you
got the top part right here,

which is what we are
going to go into first.

This is like the ARM11 part.

Basically, you’ve got the ARM11,
which is the main CPU. It runs

the main operating system. It has
2 cores as I just said, or 4 cores.

So, it runs the main operating
system, it runs the games,

it runs all the applications.
Basically, it’s just –

if you’re doing something on the 3DS
that you can… you can see it happening,

it’s happening on that CPU. It has got
access to all of the main memory.

So that includes FCRAM,

which is 128MB or 256MB,

depending on which model it is.
And FCRAM is actually divided

into 3 separate regions. So you
first got the Application Region,

which contains the currently
running game or application.

The System Region, which contains applets,
which are basically tiny applications,

which run in the background.
So, that includes the home menu,

which is actually always running
in background, and the web browser,

which you can actually run at
the same time as your game, so

it has to run there. And then you got the
Base Region, which is more interesting.

It contains all the system modules
of the operating system,

as well as some kernel data,
such as handle tables

and MMU tables. So it is kind of sensitive
stuff. And then we got AXI WRAM,

which is tiny and contains all
the kernel code, and, well,

most of the kernel structures as well.
So it’s also an interesting target.

Then we’ve got the lower part, which
is the ARM9 part of the hardware.

So the ARM9 is basically a separate, well…

it’s an entirely separate CPU,
which has access to…

well… So it runs basically the
same microkernel as the ARM11.

It’s mostly the same code,
it has just got fewer features.

Mostly it runs a single process,
which is called ‘Process9’,

which does everything the ARM9 does.
Beyond that the role of the ARM9 is

to broker access to hardware that
might be sensitive in terms of security.

So one of the things it does is it
brokers access to all storage media,

so that includes the permanent
storage as well as the SD card.

And then it does all sorts of crypto
stuff, which is really important,

and does that by using hardware, actually.
So there is this hardware key scrambler,

which is used to.. to store
secrets in hardware basically.

The idea is, you feed
it two separate keys,

and it is going to generate a
normal key and feed that directly

into the hardware implementation
of the AES algorithm.

So that way, we never
actually see the final keys.
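
A toy model of the write-only key slot idea described here, as a rough sketch only. The talk does not give the real scrambling function or how the AES engine is driven, so the hash-based scrambler, the XOR keystream standing in for AES, the class name `ToyKeySlot` and the key values are all assumptions made for illustration.

```python
import hashlib


class ToyKeySlot:
    """Write-only key slot: keys go in, only processed data comes out."""

    def __init__(self):
        self.__normal_key = None  # name-mangled; no getter is offered

    def load(self, key_x: bytes, key_y: bytes) -> None:
        # Stand-in scrambler (assumption): the real function is not in the talk.
        self.__normal_key = hashlib.sha256(key_x + key_y).digest()

    def process(self, data: bytes) -> bytes:
        # Stand-in for the AES engine (assumption): XOR with a derived keystream.
        if self.__normal_key is None:
            raise RuntimeError("no keys loaded")
        stream = b""
        counter = 0
        while len(stream) < len(data):
            block = self.__normal_key + counter.to_bytes(4, "big")
            stream += hashlib.sha256(block).digest()
            counter += 1
        return bytes(a ^ b for a, b in zip(data, stream))


slot = ToyKeySlot()
slot.load(b"hypothetical-key-x", b"hypothetical-key-y")
ciphertext = slot.process(b"savegame data")
```

Software can ask the slot to transform data, but it has no interface for reading the derived key back, which is the property that makes this annoying for an attacker.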

So that’s something that
is kind of annoying.

And then beyond that what you can see is:
the ARM9 has access to all of main memory

without much of, well, without any
restrictions. But it has also got

its own internal memory which the
ARM11 does not have access to.

So the ARM9 internal memory is
where the ARM9 stores all its code,

all of its data; and this way we
can’t actually take over the ARM9

just from the ARM11 without some kind of
exploit. So it’s basically a security CPU.

So this leads us to having
4 layers of security.

Basically, you’re first going to have
the ARM11 userland, which is what…

well, like your games, your applications,
whatever. On top of that,

you’re going to have, well, below
that, I guess, the ARM11 kernel.

So that is going to have
full privileges on the ARM11.

And then you’re going to have
ARM9 userland, which is ‘Process9’.

Beyond that you’ll have ARM9
kernel mode. So that’s in theory.

In practice, the microkernel
has a system call,

which we call… syscall…
we call it ‘svc backdoor’.

Because essentially you feed it a
function pointer and it just executes

that function in kernel mode.
So you don’t even need an exploit

if you have access to that syscall.
Of course, on the ARM11

no application or title or anything
ever has access to that,

but on the ARM9 ‘Process9’ actually
has access to it. Which means,

that from here we actually…
well, userland and kernel mode

are basically the same thing.
When you got userland on the ARM9,

you got kernel mode.
So that’s nice.
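
The effect of that syscall can be sketched with a toy CPU model. Everything here (`ToyCPU`, `allowed_svcs`, the string mode flag) is made up for illustration; it only mirrors the behaviour described above: a process that is allowed to call the backdoor gets its function run in kernel mode.

```python
class ToyCPU:
    """Toy model: one process context with a privilege mode."""

    def __init__(self, allowed_svcs=()):
        self.mode = "user"
        self.allowed_svcs = set(allowed_svcs)

    def svc_backdoor(self, func):
        # The kernel checks whether this process may use the syscall at all.
        if "backdoor" not in self.allowed_svcs:
            raise PermissionError("svc not allowed for this process")
        previous = self.mode
        self.mode = "kernel"          # run the caller's function with full privileges
        try:
            return func(self)
        finally:
            self.mode = previous      # drop back when the function returns


def payload(cpu):
    # Whatever the caller wants to do; here it only reports the current mode.
    return cpu.mode


arm11_app = ToyCPU()                          # no access to the backdoor
process9 = ToyCPU(allowed_svcs={"backdoor"})  # has access, as described in the talk
result = process9.svc_backdoor(payload)
```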

Beyond that, in terms of
cryptography on the system,

basically, they went all out. So, anything
that can be signed, is signed.

So, that includes the firmware,
that includes every application.

Signatures are checked not only
at install time but also at runtime,

so that’s something to keep in mind.

Same thing: anything that can
be encrypted is encrypted.

And anything that can be made, well,
console-specific through cryptography

or authentication, such as
internal permanent storage

or the data that is stored on
the SD card, or savegames,

or extra data for games, this
is all made console-specific.

And gamecard-specific in
regards of savegames.

So, that’s kind of annoying as well. And,
of course, all this is handled by the ARM9

using the hardware… the crypto
hardware, so we got to get through that

if we want to do interesting things.

So, first we are going to go through the
first layer, which is the ARM11 userland.

Basically, getting a full
hold onto the system.

So, we first need to find
some kind of entry point.

There are problems… well,
there are challenges there.

One of the challenges is
that the system implements

strict Data Execution Prevention. So,
existing pages will never be read…

well, will never be read-write-executable.
It’s all only going to be read-only,

or read-writable or read-executable.
There’s no way from a standard application

to reprotect or map new pages
that are read-write-executable.

Because all of the system
calls are locked out, except for

higher privileged system
modules. Another thing is

that there is no ASLR, so that is not
a challenge, that’s actually kind of nice.

The nice thing here is that we… well,
that makes savegame vulnerabilities

totally fair game because, well, we don’t
need an actual scripting environment

or any kind of exotic
vulnerability to exploit this.

As long as we can get past
DEP somehow. And then,

of course, the fact that all
savegames are both encrypted

and made specific either to the
gamecard or the game console,

in the case of eShop games, is really
annoying for savegame vulnerabilities

because basically you can’t use those
as an initial entry point in most cases,

because, well, you can’t generate
the right, well, AES MAC,

or just… you don’t know the right
cryptography. So, that’s annoying.

Thankfully, the 3DS runs Webkit…

<i>laughter</i>

So, that’s nice.
Can always use that.

<i>applause</i>

So, Webkit is used in a number of places,
obviously it’s used in the main web browser,

which you can access from the home menu.
It’s also used in the Youtube application,

which is available free on the eShop
and doesn’t use any kind of

client side authentication for the server,
so you can just redirect traffic through,

like a DNS server for example. Miiverse
applet, other stuff, that also uses it.

Slightly more secure, but might be
usable at some point, I don’t know.

Anywho, the important part here,
is that it’s not only using webkit,

it is using a very old version of webkit.
Basically, they do cherrypick

some patches into the version
of webkit they use, but only

after we exploit those on release, so it
comes a little too late, most of the time.

So yeah, this has been used by multiple
people, most notably yellows8,

but it has proven to be a very
efficient, reliable entry point.

Beyond that, we got Cubic Ninja as initial
entry point. Cubic Ninja is a game

that was released in 2011 on Nintendo
3DS. It is nice, because it actually

allows users to share levels
that they make themselves

through QR codes; and also it is
really bad at parsing those levels.

So what you can do, is just, well,
manufacture your own QR code

that is going to crash the game
and give you access. So these are

nice initial entry points. So, once we’ve
got this, what we have to remember is

that we might be able to crash the game
and may be able to control registers,

but we don’t actually have our code
running because of that. So,

the obvious solution to
this is to use ROP.

For those of you, who are
not familiar with ROP:

You build your own fake stack
that lets you return into

code snippets that are located right
before return instructions. That way…

so this is an example. You can just

jump to this kind of instruction,
so ‘pop {r0, pc}’ and then

this is going to let you load your own
register value and then it is going to

jump to the next instruction that you give
it. So, this is a way of executing code

without actually executing code,
which is widely used; so this is like

the obvious thing to do. Of course,
ROP is annoying. It is very limiting.

It can be enough to actually execute
an exploit to get higher privileges,

but overall it is just annoying and very
limiting for homebrew, for example.
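
To make the ROP idea concrete, here is a small simulation of a fake stack driving two gadgets. It is a sketch, not real ARM code: the gadget addresses, the `log_r0_then_return` gadget and the halt-at-zero convention are assumptions; only the `pop {r0, pc}` behaviour comes from the talk.

```python
def pop_r0_pc(regs, pop, trace):
    # Models `pop {r0, pc}`: load r0, then the next "return address", from the stack.
    regs["r0"] = pop()
    regs["pc"] = pop()


def log_r0_then_return(regs, pop, trace):
    # Hypothetical function tail: does something with r0, then returns via the stack.
    trace.append(regs["r0"])
    regs["pc"] = pop()


POP_R0_PC = 0x1000   # hypothetical gadget addresses
LOG_R0 = 0x2000
GADGETS = {POP_R0_PC: pop_r0_pc, LOG_R0: log_r0_then_return}


def run_rop(stack, entry):
    regs = {"r0": 0, "pc": entry}
    sp = 0
    trace = []

    def pop():
        nonlocal sp
        value = stack[sp]
        sp += 1
        return value

    while regs["pc"] != 0:            # address 0 ends the toy chain
        GADGETS[regs["pc"]](regs, pop, trace)
    return trace


# The fake stack: arguments interleaved with the addresses to "return" into.
fake_stack = [0x41, LOG_R0, POP_R0_PC, 0x42, LOG_R0, 0]
trace = run_rop(fake_stack, entry=POP_R0_PC)
```

No new code is ever written to memory here; the attacker only controls the stack contents, and existing snippets do the work.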

And of course, as I mentioned earlier, we
don’t have access to any of the system calls

that would let us map
read-writable-executable pages.

Also, the system does support dynamically
linked libraries, so that might be a way,

but these are signed and checked in
places that we can’t access at this point.

So, what we’re going to look
at next is the GPU to see

if we can use that to bypass that.
What you can see here is that

the GPU has access not only to
video RAM, but also to FCRAM,

which is, if you recall it, main
memory. So, if you look at this,

with all the different memory regions,

we have got the Application Region
here, which is entirely contained within

what the GPU can access within FCRAM.
Of course, the GPU can not actually access

all of that FCRAM, so that is kind
of limiting. What we can see here,

is that, of course, application code is
within range of the GPU’s level of access.

The reason the GPU has access to
FCRAM and Video RAM, through DMA,

by the way, is, so that it can access
information such as textures,

vertex buffers, this sort of thing.

So, it’s actually kind of important. And
the reason it can write to it is because

it has to render its data somewhere.
The point is, that we can use this

to render data into main memory.

And main memory contains application
code. And since the physical layout is

actually completely deterministic, and
even if it wasn’t, we could just use the

read capabilities of the GPU to
search for what we are looking for.

Well, we can use this to overwrite our
current application’s text section

and we get code execution
that way, in spite of DEP.
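
A minimal sketch of why this works: the CPU's page permissions are enforced on CPU accesses, while the DMA path in this toy model only checks a physical range. The class `ToyMemory`, the page size, the cutoff position and the payload bytes are assumptions for illustration.

```python
PAGE = 0x1000


class ToyMemory:
    def __init__(self, pages, gpu_cutoff):
        self.data = bytearray(pages * PAGE)
        self.perms = ["rw"] * pages      # per-page CPU permissions
        self.gpu_cutoff = gpu_cutoff     # GPU reaches physical addresses below this

    def cpu_write(self, addr, payload):
        # CPU writes go through the MMU, so page permissions apply.
        first = addr // PAGE
        last = (addr + len(payload) - 1) // PAGE
        for page in range(first, last + 1):
            if "w" not in self.perms[page]:
                raise PermissionError("CPU write to a non-writable page")
        self.data[addr:addr + len(payload)] = payload

    def gpu_dma_write(self, addr, payload):
        # DMA works on physical memory and ignores the CPU page permissions.
        if addr + len(payload) > self.gpu_cutoff:
            raise PermissionError("address beyond the GPU cutoff")
        self.data[addr:addr + len(payload)] = payload


mem = ToyMemory(pages=4, gpu_cutoff=3 * PAGE)
mem.perms[0] = "rx"                      # application text: read/execute only
payload = b"\xde\xad\xbe\xef"            # stands in for our own code

try:
    mem.cpu_write(0, payload)
    cpu_blocked = False
except PermissionError:
    cpu_blocked = True

mem.gpu_dma_write(0, payload)            # same bytes, written through the GPU path
```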

Yeah, so this is where
we get code execution…

<i>applause</i>

We execute our own,
unsigned code, which is very…

<i>applause</i>

It’s great, but we are still confined
within the application sandbox.

So, we bypassed DEP,
we are inside the sandbox.

This means we can only access
our current application’s savedata,

so if we want to install some kind of
secondary exploit, this is too limiting.

We can only access certain services and
system calls, which is also limiting

and frustrating. And we can’t alter
memory layout, so we can’t allocate

more executable pages,
as I mentioned earlier.

So, we are still kind
of limited at this point.

So, what we are going to do, is look at
what else the GPU can access.

And what you can see is that, of course, there
is this entirely separate memory region

the GPU can modify.

So it can access most of the System
Region. And the System Region contains

a few things. It contains the home menu, as
I mentioned, because that is an applet.

It contains the internet browser, and it
contains actually a single system module,

which is called ‘NS’, which we think stands
for ‘Nintendo Shell’, we don’t really know.

So, let’s look at this. First we got
NS code well beyond the GPU cutoff.

We got menu code, which is
also well beyond GPU cutoff.

But we got the menu’s heap, right here,
well, actually there is separate heaps,

these are well within the
GPU’s range, so that’s good.

NS unfortunately is still well beyond the
cutoff. All of its data, all of its code.

So we apparently can’t get to that.

So, then the idea is, to just,
well, okay, so actually…

What’s interesting here, is that
the cutoff is right before the end of

the System Region, which as we just
saw, has some interesting things, but

also excludes all of Base Region,
which also has very interesting things.

So, it seems likely that Nintendo knew
about the capabilities of GPU DMA,

like the theoretical capabilities, but
they didn’t do anything about it.

So, it seems that they probably didn’t
realize what we could do with it,

which is a lot.

So, basically, we got menu heaps. So
what we do, is… we have a heap, and

this is all C++ code. We are just
going to find objects inside the heap

and overwrite it. So it’s pretty simple.
Just find an object, that is going to be

triggered through some kind of synchronisation
mechanism. In this case, it’s gonna be

just ‘Return to Menu’. And we
create some kind of fake vtable

and get it to run our own
stack pivot. And then we get…

we get ROP execution under
Home menu, which is cool.
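
The vtable trick can be sketched with a word-addressed toy memory. All addresses, the `functions` table and the two handler names are hypothetical; the sketch only shows how replacing an object's vtable pointer redirects the next virtual call.

```python
mem = {}                                   # toy memory: address -> word

# "Code" that exists in the target process, keyed by address (hypothetical).
functions = {
    0x100: lambda: "return to menu",       # the legitimate handler
    0x200: lambda: "stack pivot",          # a gadget the attacker wants to reach
}

REAL_VTABLE = 0x1000
mem[REAL_VTABLE] = 0x100                   # slot 0 -> legitimate handler

OBJ = 0x2000                               # a C++ object in the menu heap
mem[OBJ] = REAL_VTABLE                     # its first word is the vtable pointer


def virtual_call(obj, slot=0):
    vtable = mem[obj]                      # read the vtable pointer from the object
    target = mem[vtable + slot]            # read the function pointer from the vtable
    return functions[target]()


before = virtual_call(OBJ)

# What the GPU write does: place a fake vtable in reachable memory
# and point the object at it.
FAKE_VTABLE = 0x3000
mem[FAKE_VTABLE] = 0x200
mem[OBJ] = FAKE_VTABLE

after = virtual_call(OBJ)
```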

We still don’t have code execution
in the Home menu, but that’s okay.

So, we can do a bunch of stuff from ROP.

We can access a new system
service, which is called ‘ns:s’,

which is very helpful, because it can
kill any arbitrary process, as well as

create new ones. Also it gives us access
to SD card, which most applications

actually don’t have. And it lets us
decrypt/dump any title on the system.

So any game, even if it uses new
cryptography that Nintendo introduced,

we can actually dump that, because
for some reason, well, Home menu

apparently needs access to
that. And then we can also

access and overwrite all that extra data
used by any application, which is great.

So we use this as a base
for running homebrew.

Our homebrew launcher is
essentially just a service

that runs in the background under Home
menu process. It is written in ROP,

which is kind of disgusting, but it works.
<i>laughter</i>

The ‘Service’ handles running homebrew,
so the process is very simple. You just

kill off the current application, you
spawn a new one, and then you take it over

using the GPU DMA access.
And then, what we do is

we send all of these new capabilities that
we got through handles to the new process

and that gives us some
higher privilege homebrew.

It also handles events, such as Home
button, Power button, all that good stuff.

Which is nice, because we can actually
run code under any arbitrary application

or game, so we can actually modify
these games. We can run ROM hacks.

So there has been a bunch of translations
that can be run through this, for games

that haven’t come out outside
of Japan, so that’s pretty nice.

It’s the same principle, you just
launch the app, you take it over,

you pass the code, and then
you jump to it, essentially.

All within the confines of
userland, which is nice.

So, the other thing is, we can actually
access any game or application’s data

because we can run code under
it. So, these things include

savegame data for any game. So we
can actually install more convenient

secondary entry points, which do not
rely on the browser, which can be

patched any moment, or on some old game.

So, some examples include ‘Menuhax’
by yellows8, which exploits

faulty theme handling code, which
was introduced in firmware 9.0.

Which is really nice, because this way,
you can actually just run homebrew

right as Home menu is opened,
so right on boot time,

which is great. Then you got other games.
Of course you got a Zelda game

that’s vulnerable.
<i>audience chuckles</i>

This time it wasn’t the
horse’s name, but pretty similar.

And then you got other games. We
got tons of entry points at this point.

We’re really, literally drowning
in them. So, this is nice.

But we forgot about ‘Nintendo Shell’,
right? It’s a very attractive target,

for a couple of reasons. For one thing,
it has access to the ‘am:u’ service,

which can be used to
downgrade any system title.

It’s not actually designed to downgrade
titles, the thing is, you can both

install and uninstall titles.
So, what happens is,

if you uninstall a title, and
then install an older version

of that title, you actually
bypass the version check.

So, you can just do that to
downgrade any system title

and bring back old exploits,
if that is necessary.
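
The downgrade logic can be sketched as follows. `TitleManager` and the title ID are hypothetical stand-ins, not the real service interface; the point is only that a version check against the currently installed copy has nothing to compare against once that copy is gone.

```python
class TitleManager:
    """Toy title database with a version check on install."""

    def __init__(self):
        self.installed = {}                # title id -> installed version

    def install(self, title_id, version):
        current = self.installed.get(title_id)
        # The check only applies if some version is currently installed.
        if current is not None and version < current:
            raise ValueError("refusing to install an older version")
        self.installed[title_id] = version

    def uninstall(self, title_id):
        self.installed.pop(title_id, None)


am = TitleManager()
TITLE = "hypothetical-system-title"
am.install(TITLE, 5)

try:
    am.install(TITLE, 3)                   # direct downgrade is rejected
    direct_downgrade_worked = True
except ValueError:
    direct_downgrade_worked = False

am.uninstall(TITLE)                        # remove the title first...
am.install(TITLE, 3)                       # ...then the old version installs fine
```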

Assuming you have
access to the service.

And of course it’s in a region
that we can partially modify,

so it’s an interesting target.

Unfortunately, we can’t actually
access its data right now.

But maybe we can actually move
it to somewhere, where we can.

The idea is, if you were to kill NS, and
then allocate something in its place,

then run NS again, you can
move it below the cutoff.

<i>laughter</i>

<i>applause</i>

Thanks. But unfortunately
it’s not that simple. That can’t work.

The reason being, that we actually need
NS to be running to launch NS again.

So that kind of sucks.

But… well, no.
Actually we also can’t run

a second instance of NS at the same time,

so we can’t do that either.

But interestingly…
Well, the 3DS has an interesting feature,

which is called ‘Safe Mode’. Basically
it’s a second firmware, which is

an old version of the
regular one, and that

creates a bunch of
copies of system titles.

Most of them, anyways. So that gives
it a different ID. So, the idea is,

that if it has got a different ID, we
might be able to run it at the same time,

because, well, PM might fail
to notice that. Of course it doesn’t.

It actually does notice that. So we can’t
run the Safe Mode version of a title

at the same time as the regular
version of the title. But,

for some reason, in the case of NS – you
might not be able to see this very well,

but we’ve got NS’s regular title right
here, and then we got Safe Mode NS

right here. And for some reason
they created a new 3DS version

of the Safe Mode version of NS,
though there is no new 3DS version

of the original NS. So that
creates a separate title ID

which we can run at the same time
as regular NS. So then, the exploit

becomes very simple. You keep NS running,
just allocate enough data, that it will be

below the cutoff; and then you
just run new 3DS Safe Mode NS.

And then it’s within range of the GPU
and you can take it over and have

access to everything. So, this is nice.

It’s more of an oversight than
a proper exploit, but whatever.

So this gives us access to a
bunch of system calls. Mostly

service handling system calls,
so we can post our own service,

which can be useful for other
exploits that I won’t get into, for

impersonating other services
to other system modules.

And then we got access to all of
these services, which is great.

So we can downgrade
system titles arbitrarily.

And this runs in background, which
can always be helpful for homebrew.

The only problem is at this point,
it’s still new 3DS only, because

it relies on this new 3DS title. But
there are actually ways around that.

This was just to show that we can actually
get fairly high levels of privilege,

even still just always staying
in userland on the ARM11.

And there are other, similar attacks to
that. If you’re interested you can look up

‘rohax’, which is a similar
attack in the system module.

So, now derrek is going to talk to you
about exploiting the ARM11 kernel.

derrek?
<i>applause</i>

derrek: So, hi everyone!

First, I will give you some
very short inside view

of the kernel, and then I will
explain how you can exploit

the latest version of the ARM11 kernel.

So,

this is actually Nintendo’s very
first gaming console kernel.

Like on any other older console,

there was no kernel. All games
were just running on bare metal.

Like there was a kernel for the Wii,

like a very small microkernel
running on the security processor,

but that wasn’t written by Nintendo.

So it’s their very first
gaming console kernel.

That kernel is made to be thread safe,

so it can run on multiple cores

at the same time and there are like

130 system calls available.

So that’s quite a lot, in my opinion.

But usually, if you have gained execution

in ARM11 userland, you
only have access to, like,

around 50 system calls.

And there’s a reason for that, but I’m
going to explain that in a second.

So, internally, the kernel
works with C++ objects.

So here are some examples
for system calls. So, we have

‘CreateSemaphore’, for
example. That will just create

a semaphore object in the kernel

and it will return a
handle to the userland.

And when you want to do any operations

on that semaphore, you
have to pass that handle

to the kernel, and it will look up
this handle in a handle table

to find the original C++ object.
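
A rough sketch of that handle mechanism: userland only ever holds an integer, and the kernel resolves it through a table. `ToyKernel`, the method names and the semaphore fields are assumptions for illustration, not the real kernel interface.

```python
import itertools


class Semaphore:
    def __init__(self, initial, maximum):
        self.count = initial
        self.maximum = maximum


class ToyKernel:
    def __init__(self):
        self._handles = {}                     # handle -> kernel object
        self._next_handle = itertools.count(1)

    def create_semaphore(self, initial, maximum):
        handle = next(self._next_handle)
        self._handles[handle] = Semaphore(initial, maximum)
        return handle                          # userland only ever sees this number

    def release_semaphore(self, handle, amount):
        sem = self._lookup(handle, Semaphore)  # translate handle back to the object
        if sem.count + amount > sem.maximum:
            raise ValueError("semaphore count would exceed its maximum")
        sem.count += amount
        return sem.count

    def _lookup(self, handle, kind):
        obj = self._handles.get(handle)
        if not isinstance(obj, kind):
            raise KeyError("invalid handle")
        return obj


kernel = ToyKernel()
handle = kernel.create_semaphore(initial=0, maximum=2)
count = kernel.release_semaphore(handle, 1)
```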

Also there are 2 different
kinds of memory allocators.

So, we have a memory allocator
for the main memory, which is

the FCRAM. And there is also a Slab Heap,

where all the C++ objects are stored in.

And this Slab Heap is located in AXI WRAM,

which is the ARM11 memory,

where all the kernel code and data is in.

Also, there’s an IPC system.

IPC is ‘inter process communication’.

And it basically allows you
to talk to other processes

like services,

e.g. the GSP service or FS.

So, let’s look at the security.

So, the kernel is really small.
There are only like 200KB of code,

which is pure ARM code. And
there are only like 1000 functions.

So, they try to keep
the code size very low

and that makes it harder to find bugs.

The code size is really small, and

you don’t have really much to choose from

what to exploit. Also there are no
symbols included in the kernel.

Like when you run strings on it, it will
just give you some names of C++ objects,

but there are no function
names or something like that.

As we have seen earlier
it’s physically isolated

in its own memory. Which turned out
- of course - to be a good idea.

Otherwise it would have been
overwritable by the GPU eventually.

And all objects have a reference counting.

So that’s similar to the
C++ shared pointer

where every object has a small field

like a counter field and everytime
the kernel wants to use an object

this counter gets increased.
And everytime the…

like when the reference is no longer
needed it will decrease the counter

and when the counter reaches Zero it
will automatically delete that object

from the Slab Heap. So they are basically
trying to prevent use-after-frees.
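
The reference counting scheme can be sketched like this. The class names and the starting count of 1 are assumptions; the talk only states that the counter goes up on use, down on release, and that the object is removed from the Slab Heap at zero.

```python
class SlabHeap:
    def __init__(self):
        self.live = set()                  # names of objects currently allocated


class KObject:
    def __init__(self, heap, name):
        self.heap = heap
        self.name = name
        self.refcount = 1                  # assumption: the creator holds one reference
        heap.live.add(name)

    def acquire(self):
        self.refcount += 1                 # someone starts using the object

    def release(self):
        self.refcount -= 1                 # a reference is no longer needed
        if self.refcount == 0:
            self.heap.live.discard(self.name)   # freed only when nobody uses it


heap = SlabHeap()
sem = KObject(heap, "semaphore")
sem.acquire()                              # a second user appears
sem.release()                              # first release: object must survive
alive_after_first_release = "semaphore" in heap.live
sem.release()                              # last release: object is deleted
alive_after_last_release = "semaphore" in heap.live
```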

Also I’m not sure if that’s
a security measure

but there are more than 100
panic calls in the kernel

and that’s every 10th function

- on average. And they have
the syscall access restriction.

So you - as I said - you only have
access to like 50 system calls.

All the interesting ones are disabled.

E.g. you can’t map executable pages.
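
One common way to implement such a restriction is a per-process bitmask that the kernel checks before dispatching. This is only a sketch of that idea: the talk does not say how the check is stored, and the syscall numbers below are made up.

```python
# Hypothetical syscall numbers, for illustration only.
SVC_CONTROL_MEMORY = 1
SVC_CREATE_SEMAPHORE = 2
SVC_MAP_EXECUTABLE = 100
SVC_BACKDOOR = 123


def make_mask(allowed):
    """Build a bitmask with one bit set per permitted syscall number."""
    mask = 0
    for svc_id in allowed:
        mask |= 1 << svc_id
    return mask


def dispatch(mask, svc_id):
    # The kernel consults the calling process's mask before running the handler.
    if not (mask >> svc_id) & 1:
        raise PermissionError(f"svc {svc_id} is not allowed for this process")
    return f"svc {svc_id} executed"


app_mask = make_mask([SVC_CONTROL_MEMORY, SVC_CREATE_SEMAPHORE])
ok = dispatch(app_mask, SVC_CONTROL_MEMORY)
```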

On the other hand there
is no ASLR. But at least

they’re trying to change the
memory mapping every time

during a larger kernel update.

Also there’s no stack protection. And
the Userland is always mapped.

So once you’ve got control
over the program counter

you can just jump to

Userland pages that are
marked as executable.

So you don’t have to do ROP in the kernel.

It’s pretty nice.

But they tried to have
an execution prevention

in the kernel that is: they’re
marking executable kernel pages

– that is the code – they’re
marking them as executable

in their Page Table. So let’s take a look.

The highlighted parts in orange
are the kernel code sections.

And as you can see like when
looking at the first highlighted line

it says ‘virtual address #FFF00’ etc.

is mapped to the physical
address 1FF80000.

And it is marked as executable
and you only have access to it

in Kernel Mode, of course,
and only Read access. Right?

So this is correct.

But when you look at the second
line of that Page Table dump

you will notice that
there is another section

which covers the entire AXI WRAM

and it’s mapped as Read-Write.

So it doesn’t really make sense. Yeah.

So basically it’s completely useless.
We have Read-Write access to it.

So, to summarize everything,

there’s actually no exploitation
protection. Once we found

an exploitable bug it’s
pretty likely that we gain

code execution in kernel mode.

So, let’s find that bug.

And I started at looking at the SVC table.

So this is kind of the interface
between kernel land and userland.

And this shows all system calls

that are available in the kernel. So
you have like normal system calls.

For memory management you can
map read- and writable pages;

you can mirror pages and do
other memory management stuff.

And there’s also some
configuration for threads like

you can choose which
core should be used for

executing the thread and all that stuff.

You have a really large range
of synchronization objects

like kernel mutexes and
all that stuff. And of course

you have IPC requesting, so you can

send messages to services. And
there’s a more advanced section

like this is used by services mostly,

because they have to
respond to your IPC requests.

And there’s also Kernel DMA,
cache control, some things.

And they have a set of debug system calls.

It’s just basic debugging.
You can set breakpoints,

read and write process memory.
But you don’t have access to them.

Like on retail it’s not actually used.

And so one last section
is the Privileged section.

And here are all the
interesting system calls

that allow you to create processes and

map executable memory and all that stuff.

Unfortunately, we can’t use the Advanced,
Debug and Privileged system calls.

I mean that would require
exploiting some service.

And that’s just more work for us.

So this leaves us with
the normal system calls.

But IPC sounds really interesting.

But unfortunately it’s full of panics.

Also there’s not much to attack at
synchronization object system calls.

So you only have like this
more interesting system call

for local memory management. And in
theory there’s a lot that you can mess up.

Right? There’s a lot that can possibly
go wrong. And also we have

unchecked DMA access!
Like through the GPU.

So maybe we can do
something useful with that.

Okay, so let’s have a look
at the memory allocator.

There are 2 types of memory allocators.

First is the regular one. And it’s
just for mapping normal heap

like for malloc in C, e.g. And you
have the linear memory allocator

that is used for GPU textures, like

when memory has to be
physically contiguous

you use the linear memory allocator.

And there’s the FCRAM memory
layout that we saw earlier.

You have these 3 regions
and every region has

its own set of free pages.

So how are they keeping track of them?

So you have a region descriptor
which tells us the dimensions like:

where does it start, the region,
and its size. And you get also

a pointer to the first
free piece of memory

in that region. And each
free piece of memory

which we call a Memchunk
has a Memchunk header

right at the beginning. And
it basically tells the kernel

how large that Memchunk
is. And it’s also linked

in a Doubly Linked List. So you
have a next and previous pointer

pointing to the next and
previous Memchunk headers.

It kind of looks like that.
So you have the red parts

which are the free Memchunks
and the green parts are memory

that is already allocated. So

allocation is pretty straightforward.
It’s not really complicated.

So the first thing that the
allocator function does:

it loads the next free pointer
from the region descriptor.

And for regular memory it
just goes through the list

following the pointers
and it sums up their size

until the requested size is reached.
For linear memory it would just

look for a suitable memory chunk to make
sure that the memory is really contiguous.

So when it found enough memory
it sets the next pointer

of the very last Memchunk
to Zero. It will then

update the list and also
the next free pointer

for the region descriptor
and finally it will return

a pointer to the first
Memchunk. So,

let’s look at this from
a security perspective.

And there’s a problem. They
basically have kernel structures

inside the FCRAM!
And that is a problem

because we have DMA access
to it through the GPU.

And there was an attack by yellows8

that is called ‘memchunkhax’.
And what he did

is basically: he overwrote
memchunk headers

with the GPU DMA
flaw. And then

he gained an arbitrary kernel write

when it’s deallocating memory. So because

next/prev pointers have been modified.
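
An unlink-style sketch of how modified next/prev pointers turn into a write. The exact list operation in the real kernel is not described in the talk, so the field offsets, the `unlink` routine, the target address and the value are all assumptions; this is the classic pattern rather than a reconstruction of memchunkhax itself.

```python
SIZE, NEXT, PREV = 0, 4, 8                 # assumed field offsets inside a header


def write_header(mem, addr, size, nxt, prv):
    mem[addr + SIZE] = size
    mem[addr + NEXT] = nxt
    mem[addr + PREV] = prv


def unlink(mem, chunk):
    """Remove a chunk from the doubly linked list without validating pointers."""
    nxt = mem[chunk + NEXT]
    prv = mem[chunk + PREV]
    if prv:
        mem[prv + NEXT] = nxt              # previous->next = next
    if nxt:
        mem[nxt + PREV] = prv              # next->prev = previous


A, B, C = 0x1000, 0x2000, 0x3000           # three free chunks in FCRAM
mem = {}
write_header(mem, A, 1, B, 0)
write_header(mem, B, 1, C, A)
write_header(mem, C, 1, 0, B)

TARGET = 0xFFF00000                        # hypothetical kernel address to overwrite
VALUE = 0x4000                             # hypothetical value to write there

# The attacker rewrites B's header (in the talk: through GPU DMA).
mem[B + NEXT] = TARGET - PREV
mem[B + PREV] = VALUE

unlink(mem, B)                             # the kernel now writes VALUE to TARGET
# Side effect of the pattern: mem[VALUE + NEXT] is written as well.
```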

So, unfortunately, this
was fixed by Nintendo

in system update 9.3 last year,

like 1 year ago. And the new kernel will
now verify every memchunk header

during allocation. Like its size
and also next/prev pointers.

So, in theory, everything has been fixed.
Invalid pointers or invalid sizes

will just result in a
kernel panic. In theory.

So when you look at the system
call for ControlMemory…

we have access to it. It’s one
of the normal system calls.

It does basic stuff. You
can map/free RW pages,

but not executable of course. And it
takes an address and size as argument.

And also an operation code
which tells the kernel what to do:

to map or free pages, whatever.

So first it does some basic
checks on the address

and eventually it will
call a very large function.

And I just call that function
kern::controlmemory.

So what does kern::controlmemory do:
it calls the allocator function

and it will just return a
memchunk header pointer

– as we have seen earlier. Then it goes
through all of the allocated memchunks

and it’s mapping them to user space.

And it’s also updating some block
information for KProcess object.

So there’s a problem. There’s
obviously a race condition.

Like we can overwrite memchunk
headers after they have been allocated.

So we could try using the GPU
but it’s really slow, actually,

because we would have to ask
the GSP service to read memory

and we have to go through this
very large IPC kernel code.

And that would be probably too
slow. Allocation is really fast.

Let’s dig a little bit deeper.

I tried to reconstruct
the source code in C.

So this is the first step.
It tries to allocate memory.

For this example, it will just
allocate regular memory.

So when it found a memchunk,

which means that
enough memory is available,

it will then execute this
really interesting do-while loop.

I know, it’s a lot of code. I’m not
sure that you can actually read it.

So let’s go quickly through this code.

The pages read from the Memchunk header.
It gets converted to a physical address.

And that physical address
gets mapped to userland

by mem_map function. And then
it will go to the next memchunk.

Here. And it will also update
the userland virtual address.

And then it will clear that memory. So

what’s wrong here?

The problem is they’re mapping
the memchunk into userland.

And after it has been mapped
they’re accessing it again.

And what they access is the next pointer.

So we can just overwrite it.

When we have 2 threads running we can

– from another CPU core –
try to overwrite that pointer.
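
[Editor's sketch] The pattern described here is a time-of-check/time-of-use race: the header is made user-writable and only then read again for its next pointer. The toy model below is not the reconstructed kernel code from the slides; the addresses, the dictionary layout and the `on_mapped` hook (which stands in for a thread on the other CPU core) are all assumptions made for illustration.

```python
# Toy model of the race; every name and address here is hypothetical.
PAGE = 0x1000

def map_chunks(heap, first, user_va, on_mapped=None):
    """Walk a memchunk list and return {user virtual address: chunk address}."""
    mapped = {}
    chunk = first
    while chunk is not None:
        mapped[user_va] = chunk          # chunk, header included, is now user-writable
        if on_mapped is not None:
            on_mapped(heap, chunk)       # stands in for the thread on the other core
        user_va += heap[chunk]["pages"] * PAGE
        chunk = heap[chunk]["next"]      # header is read again AFTER it was mapped
    return mapped

def make_heap():
    return {
        0x1000: {"pages": 1, "next": 0x2000},
        0x2000: {"pages": 1, "next": None},
        0xF000: {"pages": 1, "next": None},   # fake header planted in kernel memory
    }

def attacker(heap, chunk):
    if chunk == 0x1000:
        heap[chunk]["next"] = 0xF000     # redirect the list to the fake header

honest = map_chunks(make_heap(), 0x1000, 0x08000000)
raced = map_chunks(make_heap(), 0x1000, 0x08000000, on_mapped=attacker)
print(sorted(hex(c) for c in honest.values()))
print(sorted(hex(c) for c in raced.values()))
```

In the model the overwrite always lands in time; on real hardware that is the hard part, which is what the timing discussion that follows is about.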

So our goal would be to map
kernel pages to userspace.

But there are some problems. It
requires really, really perfect timing.

There’s only a very small
time frame to do the overwrite.

Also, we need a Memchunk header
structure at the next pointer address…

…to do this. To make sure
we get a perfect timing

I came up with a kernel
address arbiter oracle.

It is actually used for thread
synchronization, we don’t care about it.

But it tries to read from address and
returns an error when the address is

not accessible by userland. So
we can use that system call

to make sure that the memory
has been mapped to userland.

And once it has been mapped
we’re trying to overwrite it.

So one last problem: we have to
inject a memchunk header

into the kernel. I did this
by using the Slab Heap.

We can just create some KObjects
and set their member variables

to create a faked memchunk header.

So this is the Slab Heap.
We’ve got C++ objects,

vtable pointer and some attributes.

So the Slab Heap is basically just
a really large area of C++ objects.

And what I did was
I changed the attributes

and used them as Memchunk
header. And I am redirecting

the next-pointer to that
object and it will map

multiple C++ objects to userland.
And that’s really nice because

we have vtable pointers, so
we can just overwrite them.

And that means that
we gain code execution.

So, as a summary, we set
up some kernel objects,

change their attributes, request
memory from the kernel;

and once it becomes available
we patch the next-pointer,

overwrite that mapped
SlabHeap pages and

then we call a system call
which closes the handle

for the kernel objects that
we created in step one.

So it will eventually call
some vtable function

and it will just jump to our
modified vtable function.

And we got ARM11
Level0 Code Execution!!

<i>applause, motivated by smea</i>

So, now plutoo will tell us
what nice things you can do

once you gained ARM11
Code execution.

plutoo: Hey guys! Okay, so… the ARM9.

Let’s go.

The ARM9 is actually also used
for executing old DS games.

So what they do is, they actually,
you could say, reused the ARM9

which is their backwards compatibility
processor. They use it

as a security processor
when executing 3DS code.

And like smea said it’s running
a stripped-down version

of the ARM11 kernel. It basically
only does threading, synchronization,

things like that. And there’s
no MMU. There’s an MPU,

8 regions you can configure.

You could do no-execute
within those regions etc. but

the granularity is not very
nice. And they only have 8.

So they basically ran out of space.
And .data+stack is executable

as long as you can jump to
it. And .text is writable

so that’s bad. Basically whenever you can

write code into arbitrary memory
you can just overwrite code.

These features – you don’t want
them on a security processor.

<i>laughter</i>

So let’s go. So it turns out that

there have been lots of exploits over
the years and most of them are fixed.

And most of them used the
normal command interface.

But in this case we’re taking
a different approach. So

on the 3DS the memory-mapped
I/O is split up into 3 regions.

There’s the ARM9-only I/O: it does crypto,

it does DMA engine,

things like that. Then there’s
the Shared I/O region.

And then, finally, there’s the
ARM11 I/O region which contains

the GPU, video decoder.

Thanks to derrek and smea
we have full ARM11 control.

We execute in kernel mode.

So the question is: can we use
the shared I/O region, somehow,

to own the ARM9? So it turns out

the interface for reading old
DS cartridges is actually

in the shared I/O region.

We’re not sure why this is, but

they have it there for some
reason. And it’s only the ARM9

which is actually using this region.
But ARM11 still has access to it.

So when you insert the cartridge
it starts by reading the banner.

And it does this by writing this
magic value to CTRL register.

And basically it just asks
for 0x200 [hex] bytes.

And then there’s this loop.

And this Assembler code
is on the right side.

You can see it basically waits
for some bits to clear / to set

and then they read 4 bytes and
then they wait for another bit.

And there’s no range check on the
buffer. But it’s always 0x200 bytes,

so it should be fine.

What if we overwrite the
CTRL register from ARM11

asking for 0x4000 bytes?

Boom!

We have a nice buffer overrun.
It’s in the .bss segment but…

it’s still nice. And we can control the data.
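
[Editor's sketch] A minimal model of the overrun just described, assuming a flat `bytearray` for ARM9 memory. The buffer address, the `requested_len` parameter (standing in for the length field of the CTRL register) and the neighbouring `SAFE` marker are hypothetical; only the 0x200 and 0x4000 sizes come from the talk.

```python
BANNER_LEN = 0x200                 # size the ARM9 code expects
BUF = 0x1000                       # hypothetical buffer address in ARM9 memory
NEIGHBOUR = BUF + BANNER_LEN       # whatever happens to live right after the buffer

def read_from_cartridge(mem, cart_data, requested_len):
    """Copy 4 bytes at a time until the transfer is done. Like the loop in the
    talk, it never compares the write position with the buffer size."""
    for off in range(0, requested_len, 4):
        mem[BUF + off:BUF + off + 4] = cart_data[off:off + 4]

def run(requested_len):
    mem = bytearray(0x8000)
    mem[NEIGHBOUR:NEIGHBOUR + 4] = b"SAFE"
    cart = b"\x41" * 0x4000        # data supplied by the (attacker-built) cartridge
    read_from_cartridge(mem, cart, requested_len)
    return bytes(mem[NEIGHBOUR:NEIGHBOUR + 4])

print(run(0x200))    # length the ARM9 asked for
print(run(0x4000))   # length after the ARM11 rewrote the register
```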

So the data actually comes
from the cartridge.

We need to make our
own DS cartridge. So,

there’s this old device, called the
PassMe. It’s for the original DS,

where you basically plug
an old DS cartridge in

and it basically modifies
the header as it’s read. So,

these are available online for 5 bucks.

And then you add an FPGA.

I implemented this and it
works, but it’s very gimmicky.

I don’t recommend it.

And here’s my soldering,
it’s not very nice.

This gives us ARM9 code execution
and this works on latest firmware.

But we want something better.
Let’s look at the chain of trust.

The chain of trust: the idea is of course,
you verify all the code that is running.

But you’re basically verifying
everything at load time.

The 3DS has the simplest
chain of trust you can have.

There’s the Boot ROM at
the start. And then it loads

the firmware binary from
NAND and it jumps to it.

On the new 3DS they were a bit clever.

They added an extra crypto
layer on the ARM9 portion.

But it’s actually part
of the firmware binary.

We call this ‘ARM9 loader’.

So the theory that Nintendo had was:

“Let’s add another layer of
crypto, so we change the keys,

we introduce new keys,
and they can’t break it”.

And they don’t have any worked-out
place to put those keys.

So they placed them in NAND!

But they’re encrypted with
the per-Console key that’s

based on a hash of the OTP
that’s unique for each Console.

And then OTP access is
disabled early in the Boot.

So later on you can’t dump the OTP
and you can’t figure out the keys.

This looks safe, in theory.
So here’s the implementation.

So they calculate some hash of the OTP.
They read the key-sector from NAND.

And they decrypt the key.
And they put it in a keyslot.

It’s basically an isolated memory area.

And then they generate
a bunch of sub keys and

they verify that the key they loaded
from NAND is the correct one.

So even if we were to switch the key
they would detect that and just panic.

And then they decrypt the ARM9 binary
and they jump to the entry point.

But… they forgot to clear the 0x11 key!

So we can just get code execution
later on. And we can just regenerate

all those keys! So this
implementation is useless.

Okay.
<i>laughs</i>

<i>applause</i>

And they fixed this because they have
more than 1 key hidden in the NAND.

So they took their next key.

It’s basically the same idea: you
calculate the same hash, you read

the key sector from NAND, you generate
all the previous keys for compatibility,

and then you decrypt a
new key, we call it Key#2.

And then you decrypt ARM9
binary using the second key.

You clear the keyslot, and
you jump to entry point.

But they forgot to verify the second key!
<i>audience laughs</i>

This is epic fail!
<i>applause</i>

So let’s exploit this. ‘ARM9LOADERHAX’.

We can change the second key. ARM9
loader will just decrypt the binary

to garbage and jump to it.

If you look at the encoding
of an ARM Branch instruction:

the probability is pretty high that
there will just be a Branch instruction.

And just any random data will eventually…
like if you try enough keys,

it will eventually become a Branch
instruction to some memory.

So if we try a lot of keys, eventually
we will find some garbage

that is useful.
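
[Editor's sketch] To put a rough number on "pretty high": in ARM state, a word decodes as B or BL when bits 27..25 are 0b101 and the condition field is not 0b1111, which is 15/128 of all 32-bit values (about 12%). The helper names below are made up for illustration, and the sketch ignores whether a conditional branch would actually be taken and whether the target lands in the payload.

```python
def is_arm_branch(word):
    """True if a 32-bit ARM-state word decodes as B or BL with an ordinary
    condition code (bits 27..25 == 0b101 and cond != 0b1111)."""
    return (word >> 25) & 0b111 == 0b101 and (word >> 28) != 0b1111

def branch_target(addr, word):
    """Target of a B/BL at address addr: addr + 8 + sign_extend(imm24) * 4."""
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:
        imm24 -= 0x1000000
    return (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF

# Decoding only depends on the top 7 bits, so counting over the upper
# 16 bits is enough to get the fraction of branch encodings.
branches = sum(is_arm_branch(hi << 16) for hi in range(1 << 16))
print(branches / (1 << 16))
```

Each candidate key gives a fresh pseudo-random first word, so trying many keys is a way of sampling this distribution until the branch points somewhere useful.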

This is the NAND, the Flash
memory of an unmodified 3DS

– a new 3DS. So there’s a small key
section, marked in teal, like, blue.

And it contains those keys
that we’re talking about.

And then there are 2 firmware partitions.

One is used for backup, in
case one gets corrupted;

so it doesn’t brick the device, whatever.

We installed our custom key.

And we installed the largest
firm binary we have

in the firm0 partition. And we keep
the one with the vulnerability

in the firm1 partition. And
then we put our code payload

on top of the firmware0 binary.

And then we reboot.
And so what will happen?

The Bootrom is executed.

It will load the first firmware partition.

And it has our code in the end,
but it doesn’t know about it.

And then it decrypts it.
And, you see, it looks okay.

There’s the ARM9 loader stub in the front;
and then comes the encrypted binary.

And then, finally,
there’s our payload.

But Bootrom checks the
hash, right? And it fails.

So it thinks the partition got corrupted.

So it will load the smaller one on top.
You see we have our payload in memory,

at Boot. And then it decrypts firmware1

which is smaller and it still has ARM9
loader and another encrypted ARM9 binary.

And then it jumps to ARM9 loader
because the hash checks out.

And then the ARM9 loader will
decrypt our corrupted key

from NAND and it will
decrypt this one to garbage

and it will jump to it. And
hopefully it jumps to our code.
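
[Editor's sketch] The memory-layout trick can be modelled in a few lines: a failed load of the larger firm0 leaves its tail in memory when the smaller firm1 is loaded over it. This is a toy, not the Boot ROM's logic; the plain SHA-256 comparison, the image contents and the function name `boot` are assumptions, and encryption is left out entirely.

```python
import hashlib

def boot(firm0, firm1, expected0, expected1):
    """Toy boot ROM: try firm0, fall back to firm1 if the hash is wrong.
    The load buffer is not cleared in between. Returns (memory, image_len)."""
    mem = bytearray(max(len(firm0), len(firm1)))
    for image, expected in ((firm0, expected0), (firm1, expected1)):
        mem[:len(image)] = image
        if hashlib.sha256(image).digest() == expected:
            return mem, len(image)
    raise RuntimeError("no valid firmware")

PAYLOAD = b"PAYLOAD!"
firm1 = b"LOADER" + b"\x11" * 32                 # small image, left untouched
firm0_original = b"LOADER" + b"\x22" * 64        # larger image as shipped
firm0 = firm0_original[:-len(PAYLOAD)] + PAYLOAD  # payload placed at the end

mem, used = boot(firm0, firm1,
                 hashlib.sha256(firm0_original).digest(),
                 hashlib.sha256(firm1).digest())
print(used, bytes(mem[-len(PAYLOAD):]))
```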

So this gives us ARM9 code
execution from cold Boot.

Early, very early. So it turns out we
can actually use this to get some keys

that are later not available
because they clear those…

they use a certain memory area for seeding
encryption engine to generate keys

and the memory is later cleared.
So you can’t regenerate the keys.

But with this we can actually
get those 2 keys.

They’re called the firmware 6.x save-key

and firmware 7.x NCCH-key.

That’s a bonus.

We talked a bit about the AES engine.
It’s used everywhere for the crypto

and it’s used for everything, basically.

It supports all the usual
block cipher modes.

It has 2 security features: it has
write-only keys. Which is really useful.

Like you write a key and then
you can never ever read it back.

This means that they can
fill in the keys by the Bootrom

and we can’t dump them later.

So they can keep the keys secret.

Even if we hacked the ARM9, even if we get
code execution we’ll never get the keys.

And then there’s the key scrambler.
Which is that the key is actually

– it’s an optional thing –
where the actual key is hidden,

calculated by a hardware
function, that is never…

that we don’t know about. So the key
is actually never exposed to the CPU

– the actual key. So we just feed it 2
values, 2 keys and then it generates

a new key based on that. And
we don’t know what that key is.

So this creates a situation similar to
the isolated SPUs on the PS3

where you can ask it to decrypt
stuff, but you don’t get the keys.

And if you don’t get the keys,
then… we want the keys!!

We want to decrypt things on
our PC because we’re lazy.

So there’re 2 keys –
KeyX, KeyY we call them.

They’re 128bits and the
normal key is derived

as a function of those 2;
and that function is unknown.

It’s implemented in hardware, in silicon.

So even if we know X and Y we
can’t figure out the normal key

and we can’t decrypt things
without asking the 3DS first.

But we can poke this hardware engine.

The first thing you notice when you
do this is that if you set the N-th bit

of the X key and the N+2 bit in
the Y key you get the same result.

And in general, you find that
the function that we’re looking for

is actually just a function
of one variable where it’s

the XOR between the X rotated by 2…

so this is rotation, not shift,
and XOR-ed with Y.
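
[Editor's sketch] The observation can be checked directly: if the hardware function only sees (KeyX rotated left by 2) XOR KeyY, then flipping bit N of KeyX and flipping bit N+2 of KeyY feed it the same value. The function names and the sample key values below are illustrative, not taken from the talk.

```python
MASK = (1 << 128) - 1

def rol128(v, r):
    """Rotate a 128-bit value left by r bits."""
    r %= 128
    return ((v << r) | (v >> (128 - r))) & MASK

def scrambler_input(key_x, key_y):
    """The single variable the unknown hardware function appears to depend on."""
    return rol128(key_x, 2) ^ key_y

key_x = 0x00112233445566778899AABBCCDDEEFF   # arbitrary sample values
key_y = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0

same = all(
    scrambler_input(key_x ^ (1 << n), key_y)
    == scrambler_input(key_x, key_y ^ (1 << ((n + 2) % 128)))
    for n in range(128)
)
print(same)
```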

But we still don’t know the key.
But we want to know keys. So…

So step back a little bit.

The keyscrambler is used for Mii QR-codes.

It’s used for everything, right? So it’s
used for network protocol, called UDS,

and it’s used for Download Play – which
is when you download games over WiFi,

temporary games. But the
Wii U also supports all of this.

But it doesn’t have the
key scrambler in hardware.

So the Wii U must be using normal keys.

<i>applause</i>
<i>screamed from audience: WHAT?</i>

<i>applause</i>

So we make a table of the shared keys and

these are the 3 keys that
are shared with the Wii U.

This shows where the KeyX
and KeyY on the 3DS

are set. And 2 of them
have KeyY set by firmware.

So we can’t read the keys set by the
Bootrom because it’s locked away

and we don’t have it. But can
we still figure out G? Let’s see.

So I give a shoutout to shuffle2 and
to fail0verflow who hacked the Wii U

and they helped us… or shuffle
helped us extract the Wii U keys.

So thank you! Now we have KeyY and
we know the normal key from the Wii U.

However, KeyX is still unknown.

And if G(t) is ‘bad’ then a
small change in the KeyY

will only lead to a small
change in the normal key.

It’s bad! So let’s look at the data.

So when we flip one bit in the
KeyY we can brute-force all keys

similar to the normal key which
is just within a couple of bit flips

and we find that it always
results in the normal key

with bits flipped at
position either 87 or 88,

sometimes 89, but never 86.

So this reminds me of an adder
where you had a carry bit

being propagated to upper
bits, but never to lower ones.

So let’s guess that this is
an adder and let’s try:

it’s an adder with a rotation so
we guess that G(t) = (t+C)

– some constant C, we don’t know it –
and rotated to the left by 87.
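
[Editor's sketch] Under this guess the whole scrambler is normal key = (((KeyX rol 2) XOR KeyY) + C) rol 87, all on 128-bit values. The code below uses a placeholder for C (the real constant is not given in the talk) and shows why a one-bit change in KeyY flips bit 87 of the normal key plus a run of bits directly above it, as far as the carry travels.

```python
MASK = (1 << 128) - 1
HYPOTHETICAL_C = 0x0123456789ABCDEF0123456789ABCDEF  # placeholder, NOT the hardware constant

def rol128(v, r):
    """Rotate a 128-bit value left by r bits."""
    r %= 128
    return ((v << r) | (v >> (128 - r))) & MASK

def scramble(key_x, key_y, c=HYPOTHETICAL_C):
    """Guessed scrambler: G(t) = (t + C) rol 87 with t = (KeyX rol 2) ^ KeyY."""
    t = rol128(key_x, 2) ^ key_y
    return rol128((t + c) & MASK, 87)

def flipped_positions(a, b):
    """Bit positions at which two 128-bit values differ."""
    diff = a ^ b
    return [i for i in range(128) if (diff >> i) & 1]

key_x = 0x00112233445566778899AABBCCDDEEFF   # arbitrary sample values
key_y = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
n0 = scramble(key_x, key_y)
n1 = scramble(key_x, key_y ^ 1)              # flip bit 0 of KeyY
print(flipped_positions(n0, n1))
```

Bit 86 of the normal key corresponds to the top bit of the sum, so in this model it would only flip if a carry ran through the entire 128-bit word.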

And then we plug it in to our original
formula and we don’t know KeyX, remember,

because it’s set by Bootrom,
we don’t have it.

We don’t know the constant C because
it’s in silicon, it’s in hardware.

But if we look at the formula,
and we consider the inequality,

where we basically rotate right by 87

– we’re basically undoing
the outer rotation.

And then we plug in our formula
our guess. And then we get this.

And then we subtract C from
both sides. We end up with this.

And this is basically… we’re XOR-ing
2 different keys with the same X value

rotated to the left by 2.

Well, if you stare at
this for a bit you’ll see that

if y0 and y1 – which are 2 different
KeyY’s – are equal except for

at one bit position then
the XOR is smallest

for the one which shares
the same bit value

at the position that the
2 Y’s are differing at.

It’s actually pretty simple
but it sounds difficult.

XOR is Zero if they’re the same
input and One if they’re different.

If they’re the same it’s
Zero and it’s smaller.

So we actually look
bit-by-bit on this. And

we repeat this 128 times. And we
recover all 128 bits of the KeyX.

And when we have the KeyX we can
calculate the silicon constant C.
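
[Editor's sketch] The bit-by-bit recovery can be written down under the guessed formula. Several things here are assumptions: `normal_key_for` is a stand-in oracle for "brute-force the neighbouring normal key against the hardware", the secret values are made up, and the comparison ignores wraparound of the addition past 2**128 (the demo values are chosen so that it never happens).

```python
MASK = (1 << 128) - 1

def rol128(v, r):
    r %= 128
    return ((v << r) | (v >> (128 - r))) & MASK

def ror128(v, r):
    return rol128(v, 128 - (r % 128))

def scramble(key_x, key_y, c):
    """Guessed scrambler: (((KeyX rol 2) ^ KeyY) + C) rol 87."""
    return rol128(((rol128(key_x, 2) ^ key_y) + c) & MASK, 87)

def recover_key_x(normal_key_for, y0):
    """Recover KeyX one bit at a time by comparing un-rotated normal keys."""
    rot_x = 0
    a0 = ror128(normal_key_for(y0), 87)                  # t0 + C
    for k in range(128):
        a1 = ror128(normal_key_for(y0 ^ (1 << k)), 87)   # t1 + C
        y_bit = (y0 >> k) & 1
        # The smaller sum has a 0 at bit k of t, i.e. its KeyY bit matches rol(X, 2).
        x_bit = y_bit if a0 < a1 else y_bit ^ 1
        rot_x |= x_bit << k
    return ror128(rot_x, 2)

def recover_c(key_x, y0, n0):
    """With KeyX known, C is the un-rotated normal key minus t."""
    return (ror128(n0, 87) - (rol128(key_x, 2) ^ y0)) & MASK

# Made-up secrets for the demo; not real 3DS values.
SECRET_X = 0x00112233445566778899AABBCCDDEEFF
SECRET_C = 0x123456789ABCDEF0FEDCBA9876543210
Y0 = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0

def oracle(y):
    return scramble(SECRET_X, y, SECRET_C)

found_x = recover_key_x(oracle, Y0)
found_c = recover_c(found_x, Y0, oracle(Y0))
print(hex(found_x), hex(found_c))
```

A real attack would also have to deal with samples where the addition wraps around, which this sketch deliberately leaves out.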

So the end result is: the key
scrambler is figured out

and we have also the secret Bootrom
KeyX for a couple of keyslots, as a bonus.

<i>applause, motivated by smea</i>

I didn’t put the constants in
the slides because I want this to be

an exercise for the listener.

When the new 3DS was released
they rushed it, we think,

because they left some interesting
commands in the ps:ps service. And

it included an early version of the NFC
crypto used for the Amiibo figurines.

This implementation, the first
one, uses a normal key. And the…

the newer one changed it to KeyY.

So they accidentally gave us one of
these pairs in the firmware images.

We don’t need to use the Wii U at all.

So anyone who can decrypt
3DS firmware binaries

can perform this attack
to get the constants.

So anyone out there: Good luck!

And now: back to smea, for a summary.

<i>applause</i>

smea: Right, I’m just gonna conclude
really quickly. So, some take-aways of

what we talked about
today: first thing is:

it’s all pretty obvious lessons,
but – you know – bear with me.

Giving access to physical memory to
any application, through GPU or whatever,

is dangerous. You should always be
careful about that. Even if you think

you’ve protected stuff, there’s probably
gonna be stuff that you forgot. So just,

like “you don’t do it or do it right”.

Other thing is: Shared I/O is
dangerous. If you don’t know

what can actually control the I/O, then,
well, again, you should be very careful.

Also, only checking your data
before decryption is dangerous,

and - both that and not checking the key
when you know that it could possibly

be modified by an attacker
is a bad idea. And finally,

secrets in hardware are great
unless you give them away, so…

don’t do that! <i>laughs</i>
<i>audience laughs</i>

Beyond that we just wanted to talk about
the state of Homebrew really quickly.

You might recall the
Wii U talk around here

2 years ago, where fail0verflow said
that they didn’t think necessarily

there was much of a future for console
Homebrew. And there’s definitely

an argument for that with
the rise of phones, mostly.

Anyone can make an app, can make
a game for any number of devices

and sell it to millions of people.
But you know, we disagree.

<i>cheers and applause</i>

It’s been a year since we started
releasing 3DS homebrew. And

– this is supposed to be moving,
but… let’s imagine it’s moving.

Well, there is, in there, like a bunch of
3DS Homebrew. It’s been awesome!

We’ve been working on this really hard.
A lot of people have been joining us.

It’s a great community effort. And
basically what I want to say is

we want more developers.
So if you’d like to join us

there is a very… well it’s not
very mature, but it’s maturing,

our SDK. And you know what:
reverse-engineering hardware is fun.

When we don’t have any documentation,
reverse-engineering software is fun.

We can always use more reverse-engineers
and just people who want to make cool shit,

so… Yeah, oh… right! Just one more thing.

Lately there has been a wave
of patches by Nintendo,

of known exploits, which
has been really annoying.

So for our Browser Hacks, well,
yellows8’s Browser Hacks,

menu hacks, stuff like that…
Yellows8’s been working pretty hard,

so he actually brought back browser
hacks, it should have been released

about 10 minutes ago.
<i>laughter, applause</i>

But we also had ironhax for an
eShop game, a free eShop game,

so you could just download it. That was
patched. The thing is, there’s actually

a way to download the old version from
the eShop application with some patches.

So we’re also releasing that right now!
So basically if you can get Homebrew

and get on to the eShop
with a modified patch.

That should also be released in about…
well, whenever this is done.

So get it as soon as possible,
this is a free game, it will get you

Homebrew forever. So just do that.
And also, yellows8 just released

a new version of menuhax which
works on latest firmware version.

This was also patched like a couple of
weeks or months ago. So, this is all out

right now. If you have a 3DS, get it.
If you have friends who have 3DS’s,

well, tell them and tell them to get it.
Because it might not last super long.

Yeah, so we would like to thank yellows8
who unfortunately cannot be here tonight

but has been super helpful, has been
doing a ton of work on the 3DS.

And honestly, a ton of this could
not have been done without him.

And thanks to everyone on the
#3DSDEV Homebrew channel,

everyone who is attending tonight.
Thanks for this.

And if you have any questions,
I don’t think we have a lot of time,

but we’ll accommodate. Thanks!
<i>applause</i>

Herald: Thank you for your patience, if
you got questions, please come upfront

to these guys, because we have no more
time for structured Q&A. Thank you!

<i>postroll music</i>

<i>Subtitles created by c3subtitles.de
in the year 2016. Join and help us!</i>