32C3 preroll music Herald: Okay, welcome to our last talk in this hall today! It's about console hacking, and I guess that's the reason why you are here. Console hacking has a long tradition at our great conference, and we have seen lots of funny things: people doing stuff with Xboxes, PlayStations and everything. Okay. Today we have a team that deals with the Nintendo 3DS, so give a warm applause for plutoo, derrek and smea! applause smea: Hi! I'm smea, this is plutoo, this is derrek, and today we are going to talk to you about our work on the Nintendo 3DS. The way this talk is structured: first we are going to go over the hardware, the organisation and the software, just to give you a basic overview of how the system works. After that we are going to go through basically every layer of security the system has, and break every one of them. laughter applause Okay. As you probably know, the original Nintendo 3DS was released in 2011. It's a somewhat underpowered system: it's got an ARM11 dual-core CPU at 268 MHz, a nice proprietary GPU, a bit of RAM, you know, the usual. It's also backwards compatible with DS games, which is nice. Then the new 3DS was released in 2014 and 2015, depending on the region. It's basically the same console with some hardware improvements: a better CPU with more cores, it's faster, and it has more RAM. So it's the same thing, and it runs exactly the same software. It has some exclusive software, but not much. In terms of hardware, this is what we are going to talk about, in general. You've got the top part right here, which is what we are going to go into first: the ARM11 part. The ARM11 is the main CPU, and it runs the main operating system. It has 2 cores, as I just said, or 4 cores on the new 3DS.
It runs the main operating system, the games and all the applications. Basically, if you're doing something on the 3DS and you can see it happening, it's happening on that CPU. It has access to all of the main memory. That includes FCRAM, which is 128MB or 256MB depending on the model. FCRAM is divided into 3 separate regions. First you've got the Application region, which contains the currently running game or application. Then the System region, which contains applets, which are basically tiny applications that run in the background. That includes the Home menu, which is always running in the background, and the web browser, which you can run at the same time as your game, so it has to live there. And then you've got the Base region, which is more interesting. It contains all the system modules of the operating system, as well as some kernel data, such as handle tables and MMU tables. So it's kind of sensitive stuff. And then we've got the WRAM, which is tiny and contains all of the kernel code and most of the kernel structures as well. So it's also an interesting target. Then we've got the lower part, which is the ARM9 part of the hardware. The ARM9 is an entirely separate CPU. It runs basically the same microkernel as the ARM11; it's mostly the same code, it just has fewer features. It mostly runs a single process, called 'Process9', which does everything the ARM9 does. The role of the ARM9 is to broker access to hardware that is sensitive in terms of security. One of the things it does is broker access to all storage media, which includes the internal permanent storage as well as the SD card. It also does all sorts of crypto work, which is really important, and it does that using dedicated hardware. There is a hardware key scrambler, which is used
to store secrets in hardware, basically. The idea is that you feed it two separate keys, and it generates a normal key and feeds that directly into the hardware implementation of the AES algorithm. That way, we never actually see the final keys, which is kind of annoying. Beyond that, what you can see is that the ARM9 has access to all of main memory without any restrictions. But it also has its own internal memory, which the ARM11 does not have access to. The ARM9 internal memory is where the ARM9 stores all of its code and data, so we can't take over the ARM9 from the ARM11 without some kind of exploit. So it's basically a security CPU. This gives us 4 layers of security. First you have ARM11 userland: your games, your applications, whatever. Below that, you have the ARM11 kernel, which has full privileges on the ARM11. Then you have ARM9 userland, which is 'Process9'. And beyond that you have ARM9 kernel mode. That's the theory. In practice, the microkernel has a system call that we call 'svc backdoor', because you feed it a function pointer and it just executes that function in kernel mode. So you don't even need an exploit if you have access to that syscall. Of course, on the ARM11 no application or title ever has access to it, but on the ARM9, 'Process9' does. Which means that on the ARM9, userland and kernel mode are basically the same thing: once you've got userland on the ARM9, you've got kernel mode. So that's nice. In terms of cryptography, they basically went all out. Anything that can be signed is signed. That includes the firmware and every application.
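Why 'svc backdoor' collapses two layers into one can be shown with a toy C model. This is only a sketch: the privilege flag, the access boolean and all function names are hypothetical and do not correspond to the real kernel interface. A plain flag stands in for the processor mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the CPU privilege level: a plain flag stands in for the real processor mode. */
static bool g_in_kernel_mode = false;

typedef int (*kernel_func_t)(void);

/* Hypothetical model of "svc backdoor": if the caller is allowed to use it,
   the supplied function simply runs with kernel privileges. */
static int svc_backdoor(bool caller_has_access, kernel_func_t func) {
    if (!caller_has_access) {
        return -1;                 /* e.g. any ARM11 title: the syscall is not granted */
    }
    bool saved = g_in_kernel_mode;
    g_in_kernel_mode = true;       /* "enter" kernel mode */
    int result = func();           /* run the caller's code with full privileges */
    g_in_kernel_mode = saved;      /* return to the caller's mode */
    return result;
}

/* Example payload: reports which mode it observed while running. */
static int report_mode(void) {
    return g_in_kernel_mode ? 1 : 0;
}
```

In this model, Process9 is a caller with access, so any code running inside Process9 can hand an arbitrary function to the backdoor and have it executed in kernel mode.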
Signatures are checked not only at install time but also at runtime, so that's something to keep in mind. Same thing: anything that can be encrypted is encrypted. And anything that can be made console-specific through cryptography or authentication is made console-specific: the internal permanent storage, data stored on the SD card, savegames, extra data for games. Savegames are also made gamecard-specific. So that's kind of annoying as well. And of course, all of this is handled by the ARM9 using the crypto hardware, so we have to get through that if we want to do interesting things. So first we are going to go through the first layer, which is ARM11 userland: basically, getting a foothold on the system. First we need to find some kind of entry point, and there are challenges there. One of them is that the system implements strict Data Execution Prevention. Pages are never read-write-executable; they are only ever read-only, read-write or read-execute. There's no way for a standard application to reprotect existing pages or map new ones as executable, because the system calls that would allow it are locked out for everything except higher-privileged system modules. On the other hand, there is no ASLR, so that's not a challenge; that's actually kind of nice. It makes savegame vulnerabilities totally fair game, because we don't need a scripting environment or any kind of exotic vulnerability to exploit them, as long as we can get past DEP somehow.
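The DEP rules just described can be illustrated with a hypothetical permission check in C. The talk does not show the kernel's actual logic, so the flag values and the function name are made up. The assumption that privileged modules may still create read-execute mappings follows from the statement that the relevant system calls are reserved for them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    PERM_R = 1u << 0,
    PERM_W = 1u << 1,
    PERM_X = 1u << 2
};

/* Hypothetical policy check modeled on the rules described in the talk:
   - no mapping may ever be writable and executable at the same time;
   - only higher-privileged system modules may create executable mappings. */
static bool mapping_allowed(uint32_t perms, bool is_privileged_module) {
    if ((perms & PERM_W) && (perms & PERM_X)) {
        return false;                      /* never read-write-execute */
    }
    if ((perms & PERM_X) && !is_privileged_module) {
        return false;                      /* applications cannot add code pages */
    }
    return true;
}
```

Under these rules an application can only ever obtain read-write data pages, which is why the rest of this section looks for a way to write into pages that are *already* executable.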
And then, of course, the fact that all savegames are both encrypted and tied either to the gamecard or, in the case of eShop games, to the console is really annoying for savegame vulnerabilities. In most cases you can't use them as an initial entry point, because you can't generate the right AES MAC without knowing the right keys. So that's annoying. Thankfully, the 3DS runs WebKit… laughter So that's nice. We can always use that. applause WebKit is used in a number of places. Obviously it's used in the main web browser, which you can access from the Home menu. It's also used in the YouTube application, which is available for free on the eShop and doesn't do any kind of client-side authentication of the server, so you can just redirect its traffic, through a DNS server, for example. The Miiverse applet and other things use it too; those are slightly more secure, but might be usable at some point, I don't know. Anyway, the important part is that it's not only using WebKit, it's using a very old version of WebKit. They do cherry-pick some patches into their version, but usually only after exploits for those bugs have been released, so the fixes come a little too late most of the time. This has been used by multiple people, most notably yellows8, and it has proven to be a very reliable entry point. Beyond that, we've got Cubic Ninja as an initial entry point. Cubic Ninja is a game that was released in 2011 on the Nintendo 3DS. It's nice because it allows users to share levels they make themselves through QR codes, and it is also really bad at parsing those levels. So you can just craft your own QR code that crashes the game and gives you control. So these are nice initial entry points.
Once we've got this, we have to remember that crashing the game and controlling registers doesn't mean we actually have our own code running. The obvious solution to this is ROP. For those of you who are not familiar with ROP: you build your own fake stack that lets you return into code snippets located right before return instructions. Here's an example: you jump to an instruction like 'pop {r0, pc}', which loads your own value into r0 and then jumps to the next address you put on the stack. It's a way of running arbitrary logic without injecting any code, and it is widely used, so it's the obvious thing to do. Of course, ROP is annoying and very limiting. It can be enough to execute an exploit to get higher privileges, but overall it's very limiting for things like homebrew. And as I mentioned earlier, we don't have access to any of the system calls that would let us map executable pages. The system does support dynamically linked libraries, so that might be a way in, but they are signed and checked in places we can't access at this point. So what we're going to look at next is the GPU, to see if we can use that to get around this. What you can see here is that the GPU has access not only to video RAM, but also to FCRAM, which, if you recall, is main memory. If you look at the different memory regions, the Application region is entirely contained within the part of FCRAM the GPU can access. The GPU can't actually access all of FCRAM, so that is somewhat limiting. But what we can see is that application code is within the GPU's range.
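To make the ROP idea concrete, the following toy interpreter in C treats a few hypothetical gadget addresses as instructions and walks a fake stack the way the hijacked return would. Nothing here runs real ARM code. `pop {r0, pc}` is modeled as "load r0 from the stack, then load the next gadget address into pc", which matches ARM's rule that the lowest-numbered register is loaded from the lowest address.

```c
#include <assert.h>
#include <stdint.h>

/* Toy CPU state: two registers, a program counter and a stack pointer into a fake stack. */
typedef struct {
    uint32_t r0, r1, pc;
    const uint32_t *sp;
} cpu_t;

/* Hypothetical gadget addresses inside existing, already-executable code. */
enum {
    G_POP_R0_PC  = 0x100100, /* pop {r0, pc}             */
    G_POP_R1_PC  = 0x100200, /* pop {r1, pc}             */
    G_ADD_POP_PC = 0x100300, /* add r0, r0, r1; pop {pc} */
    G_STOP       = 0x100400  /* end of the chain (models the final target) */
};

/* Runs the chain: every gadget ends by popping the next "return address" into pc. */
static uint32_t run_rop_chain(const uint32_t *fake_stack) {
    cpu_t cpu = {0, 0, 0, fake_stack};
    cpu.pc = *cpu.sp++;                 /* the hijacked return pops the first gadget */
    for (;;) {
        switch (cpu.pc) {
        case G_POP_R0_PC:  cpu.r0 = *cpu.sp++; cpu.pc = *cpu.sp++; break;
        case G_POP_R1_PC:  cpu.r1 = *cpu.sp++; cpu.pc = *cpu.sp++; break;
        case G_ADD_POP_PC: cpu.r0 += cpu.r1;   cpu.pc = *cpu.sp++; break;
        case G_STOP:       return cpu.r0;
        default:           return 0xDEADDEAD;  /* jumped somewhere unexpected: crash */
        }
    }
}
```

The chain `{G_POP_R0_PC, 40, G_POP_R1_PC, 2, G_ADD_POP_PC, G_STOP}` computes 40 + 2 and returns 42: the "program" is nothing but data on the stack, which is exactly why ROP survives DEP, and also why it is so tedious to write by hand.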
The GPU has access to FCRAM and video RAM, through DMA by the way, so that it can read things like textures and vertex buffers, so that access is actually important. And it can write there because it has to render its output somewhere. The point is that we can use this to render data into main memory, and main memory contains application code. The physical layout is completely deterministic, and even if it weren't, we could use the GPU's read capability to search for what we're looking for. So we can use this to overwrite our current application's .text section, and we get code execution that way, in spite of DEP. Yeah, so this is where we get code execution… applause We execute our own unsigned code, which is very… applause It's great, but we are still confined within the application sandbox. We bypassed DEP, but we are inside the sandbox. This means we can only access our current application's save data, so if we want to install some kind of secondary exploit, this is too limiting. We can only access certain services and system calls, which is also limiting and frustrating. And, as I mentioned earlier, we can't alter the memory layout, so we can't allocate more executable pages. So we are still kind of limited at this point. What we are going to do is look at what else the GPU can access. And what you can see is that there is an entirely separate memory region the GPU can modify: it can access most of the System region. The System region contains a few things. It contains the Home menu, as I mentioned, because that is an applet. It contains the internet browser, and it also contains a single system module, called 'NS', which we think stands for 'Nintendo Shell', though we don't really know. So let's look at this. First, we've got NS code, well beyond the GPU cutoff. We've got menu code, which is also well beyond the GPU cutoff.
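The GPU trick can be sketched in C against a plain byte array that stands in for FCRAM. The sketch assumes, as the talk describes, that the DMA engine only cares whether the target lies below the GPU's physical cutoff and ignores CPU page permissions. The function names, the signature bytes and the cutoff value are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Naive search over a memory dump: returns the offset of pat in mem, or -1.
   Models using the GPU's read access to locate known code bytes. */
static long find_pattern(const uint8_t *mem, size_t mem_len,
                         const uint8_t *pat, size_t pat_len) {
    if (pat_len == 0 || pat_len > mem_len) return -1;
    for (size_t i = 0; i + pat_len <= mem_len; i++) {
        if (memcmp(mem + i, pat, pat_len) == 0) return (long)i;
    }
    return -1;
}

/* Hypothetical GPU DMA copy: only the physical cutoff is checked; the CPU's
   page permissions play no role, so read-execute .text pages are writable here. */
static int gpu_dma_write(uint8_t *phys, size_t gpu_cutoff, size_t dst,
                         const uint8_t *src, size_t len) {
    if (dst > gpu_cutoff || len > gpu_cutoff - dst) return -1;  /* beyond the GPU's reach */
    memcpy(phys + dst, src, len);
    return 0;
}
```

Typical use: search the dump for a few bytes known to be in the application's .text, then DMA a payload over that offset. Writes past the cutoff fail, which is the same limitation that keeps NS and the Base region out of reach.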
But we've got the menu's heap, right here; actually there are separate heaps, and these are well within the GPU's range, so that's good. NS, unfortunately, is still well beyond the cutoff: all of its data, all of its code. So apparently we can't get to that. What's interesting here is that the cutoff is right before the end of the System region, which, as we just saw, has some interesting things, and it also excludes all of the Base region, which has very interesting things. So it seems likely that Nintendo knew about the theoretical capabilities of GPU DMA but didn't do anything more about it; they probably didn't realize what we could do with it, which is a lot. So, we've got the menu heaps. This is all C++ code, so we just find objects inside the heap and overwrite them. It's pretty simple. Find an object that gets used when some event is triggered, in this case just 'Return to Menu', create a fake vtable for it, and get it to run our own stack pivot. Then we get ROP execution under the Home menu, which is cool. We still don't have code execution in the Home menu, but that's okay; we can do a bunch of stuff from ROP. We can access a new system service, called 'ns:s', which is very helpful because it can kill any arbitrary process as well as create new ones. It also gives us access to the SD card, which most applications don't have. And it lets us decrypt and dump any title on the system: any game, even one that uses new cryptography Nintendo introduced, because for some reason the Home menu apparently needs access to that. We can also access and overwrite the extra data used by any application, which is great. So we use this as a base for running homebrew.
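The Home menu heap attack boils down to replacing an object's vtable pointer. A C model of a C++ object with one virtual method shows the mechanics. The struct layout, the event dispatcher and the stand-in for the stack pivot are hypothetical, and in the usage below a plain assignment takes the place of the GPU DMA write.

```c
#include <assert.h>
#include <stdint.h>

/* C model of a C++ object with a virtual method: the first field points at a table of function pointers. */
typedef struct object object_t;
typedef struct { int (*on_event)(object_t *self); } vtable_t;
struct object { const vtable_t *vtable; int state; };

static int legit_on_event(object_t *self) { return self->state; }
static const vtable_t legit_vtable = { legit_on_event };

/* Stand-in for the attacker's stack pivot; in the real attack this would start the ROP chain. */
static int attacker_pivot(object_t *self) { (void)self; return 0x1337; }

/* Models the event (e.g. "Return to Menu") that makes the program call a virtual method. */
static int dispatch_event(object_t *obj) { return obj->vtable->on_event(obj); }
```

Once `obj.vtable` points at a fake table whose entry is `attacker_pivot`, the next event dispatch runs attacker-chosen code instead of the legitimate handler; no code page was modified.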
Our homebrew launcher is essentially a service that runs in the background in the Home menu process. It is written in ROP, which is kind of disgusting, but it works. laughter The service handles running homebrew, and the process is very simple: you kill off the current application, you spawn a new one, and then you take it over using GPU DMA. Then we pass all of the new capabilities we got, in the form of handles, to the new process, and that gives us homebrew with higher privileges. The service also handles events such as the Home button, the Power button, all that good stuff. This is nice because we can run code under any arbitrary application or game, so we can actually modify games. We can run ROM hacks: there have been a bunch of translations run this way, for games that haven't come out outside of Japan, so that's pretty nice. It's the same principle: you launch the app, take it over, copy your code in, and jump to it. All within the confines of userland, which is nice. The other thing is that we can access any game's or application's data, because we can run code under it. That includes savegame data for any game, so we can install more convenient secondary entry points that don't rely on the browser, which can be patched at any moment, or on some old game. Examples include 'Menuhax' by yellows8, which exploits faulty theme-handling code introduced in firmware 9.0. That one is really nice, because you can run homebrew right as the Home menu is opened, so right at boot time. Then you've got other games. Of course you've got a Zelda game that's vulnerable. audience chuckles This time it wasn't the horse's name, but pretty similar. And then you've got other games. We've got tons of entry points at this point; we're literally drowning in them. So this is nice.
But we forgot about 'Nintendo Shell', right? It's a very attractive target, for a couple of reasons. For one thing, it has access to the 'am:u' service, which can be used to downgrade any system title. It's not actually designed to downgrade titles; the thing is, you can both install and uninstall titles. If you uninstall a title and then install an older version of it, you bypass the version check. So you can do that to downgrade any system title and bring back old exploits, if necessary, assuming you have access to the service. And of course NS is in a region we can partially modify, so it's an interesting target. Unfortunately, we can't actually access its data right now. But maybe we can move it somewhere we can. The idea is: if you kill NS, allocate something in its place, and then run NS again, you can move it below the cutoff. laughter applause Thanks. But unfortunately it's not that simple. That can't work, because we need NS to be running in order to launch NS again. So that kind of sucks. We also can't run a second instance of NS at the same time, so we can't do that either. But interestingly, the 3DS has a feature called 'Safe Mode'. It's basically a second firmware, an old version of the regular one, and it comes with its own copies of most system titles. Those copies have different title IDs. So the idea is: if a copy has a different ID, we might be able to run it at the same time as the original, because PM might fail to notice. Of course it doesn't fail; it does notice. So we can't run the Safe Mode version of a title at the same time as the regular version. But for some reason, in the case of NS (you might not be able to see this very well), we've got the regular NS title right here, and then we've got Safe Mode NS right here.
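The downgrade trick works because the version check only compares against what is currently installed. This hypothetical title database in C shows that logic. The data layout, the return codes and the function names are assumptions, not the real 'am:u' interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TITLES 8

typedef struct {
    uint64_t title_id;
    uint16_t version;
    bool installed;
} title_t;

static title_t g_titles[MAX_TITLES];   /* hypothetical installed-title database */

static title_t *find_installed(uint64_t id) {
    for (size_t i = 0; i < MAX_TITLES; i++) {
        if (g_titles[i].installed && g_titles[i].title_id == id) return &g_titles[i];
    }
    return NULL;
}

/* The only anti-downgrade check: compare against the version installed right now. */
static int install_title(uint64_t id, uint16_t version) {
    title_t *t = find_installed(id);
    if (t != NULL) {
        if (version < t->version) return -1;        /* downgrade refused */
    } else {
        for (size_t i = 0; i < MAX_TITLES && t == NULL; i++) {
            if (!g_titles[i].installed) t = &g_titles[i];
        }
        if (t == NULL) return -2;                   /* database full */
    }
    t->title_id = id;
    t->version = version;
    t->installed = true;
    return 0;
}

/* Uninstalling forgets the old version entirely, so nothing is left to compare against. */
static int uninstall_title(uint64_t id) {
    title_t *t = find_installed(id);
    if (t == NULL) return -1;
    t->installed = false;
    return 0;
}
```

Installing version 5 over version 10 is refused, but uninstall followed by install of version 5 succeeds. A robust design would keep a persistent minimum version per title that survives uninstallation.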
And for some reason they created a new 3DS version of Safe Mode NS, even though there is no new 3DS version of the regular NS. That creates a separate title ID, which we can run at the same time as regular NS. So the exploit becomes very simple: you keep NS running, allocate enough memory that the new instance will be placed below the cutoff, and then run new 3DS Safe Mode NS. Now it's within range of the GPU, and you can take it over and get access to everything. So this is nice. It's more of an oversight than a proper exploit, but whatever. This gives us access to a bunch of system calls, mostly service-handling system calls, so we can register our own service, which can be useful for other exploits I won't get into, such as impersonating services to other system modules. And we get access to all of these services, which is great: we can downgrade system titles arbitrarily. And it runs in the background, which can always be helpful for homebrew. The only problem at this point is that it's still new 3DS only, because it relies on this new 3DS title. But there are ways around that. This was just to show that we can get fairly high levels of privilege while always staying in userland on the ARM11. And there are other, similar attacks. If you're interested, you can look up 'rohax', which is a similar attack on a system module. So now derrek is going to talk to you about exploiting the ARM11 kernel. derrek? applause derrek: So, hi everyone! First I will give you a very short inside view of the kernel, and then I will explain how you can exploit the latest version of the ARM11 kernel. This is actually Nintendo's very first gaming console kernel. On their older consoles there was no kernel; all games just ran on bare metal. There was a kernel for the Wii, a very small microkernel running on the security processor, but that wasn't written by Nintendo.
So it's their very first gaming console kernel. The kernel is designed to be thread-safe, so it can run on multiple cores at the same time, and there are about 130 system calls available. That's quite a lot, in my opinion. But usually, if you have gained code execution in ARM11 userland, you only have access to around 50 system calls. There's a reason for that, which I'll explain in a second. Internally, the kernel works with C++ objects. Here are some examples of system calls. We have 'CreateSemaphore', for example. It creates a semaphore object in the kernel and returns a handle to userland. When you want to do any operation on that semaphore, you pass that handle to the kernel, and it looks up the handle in a handle table to find the original C++ object. There are also 2 different kinds of memory allocators. We have a memory allocator for main memory, which is the FCRAM. And there is also the Slab Heap, where all the C++ objects are stored. The Slab Heap is located in the AXI WRAM, which is the ARM11 memory where all the kernel code and data lives. There's also an IPC system. IPC is 'inter-process communication', and it basically allows you to talk to other processes such as services, e.g. the GSP service or FS. So, let's look at the security. The kernel is really small: only about 200KB of pure ARM code, and only about 1000 functions. They try to keep the code size very low, and that makes it harder to find bugs; there isn't much to choose from when looking for something to exploit. There are also no symbols included in the kernel. When you run strings on it, it just gives you the names of some C++ objects, but there are no function names or anything like that. As we have seen earlier, it's physically isolated in its own memory, which turned out, of course, to be a good idea.
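The handle mechanism described above can be modeled in a few lines of C. Everything here is hypothetical: the encoding "handle = slot index + 1", the table size, the type tag and the function names are illustration choices, not the 3DS kernel's actual handle format. The point is that userland only ever holds an integer, and the kernel resolves it back to the object on every call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t handle_t;              /* 0 is used as "invalid handle" here */

enum kobj_type { KOBJ_FREE = 0, KOBJ_SEMAPHORE = 1 };

/* Hypothetical kernel semaphore object; the type tag lets lookups reject the wrong kind of object. */
typedef struct {
    enum kobj_type type;
    int32_t count;
    int32_t max_count;
} ksemaphore_t;

#define HANDLE_TABLE_SIZE 16u
#define SEMA_POOL_SIZE 4u

static void *g_handle_table[HANDLE_TABLE_SIZE];   /* per-process in a real kernel */
static ksemaphore_t g_sema_pool[SEMA_POOL_SIZE];  /* stand-in for the Slab Heap */

static handle_t handle_insert(void *obj) {
    for (uint32_t i = 0; i < HANDLE_TABLE_SIZE; i++) {
        if (g_handle_table[i] == NULL) {
            g_handle_table[i] = obj;
            return i + 1;                 /* handle = slot index + 1 */
        }
    }
    return 0;
}

static void *handle_lookup(handle_t h) {
    if (h == 0 || h > HANDLE_TABLE_SIZE) return NULL;
    return g_handle_table[h - 1];
}

/* CreateSemaphore-like call: allocate an object, give only a handle back to "userland". */
static handle_t create_semaphore(int32_t initial, int32_t max_count) {
    for (uint32_t i = 0; i < SEMA_POOL_SIZE; i++) {
        ksemaphore_t *s = &g_sema_pool[i];
        if (s->type == KOBJ_FREE) {
            handle_t h = handle_insert(s);
            if (h == 0) return 0;
            s->type = KOBJ_SEMAPHORE;
            s->count = initial;
            s->max_count = max_count;
            return h;
        }
    }
    return 0;
}

/* ReleaseSemaphore-like call: the kernel resolves the handle back to the object. */
static int release_semaphore(handle_t h, int32_t n) {
    ksemaphore_t *s = handle_lookup(h);
    if (s == NULL || s->type != KOBJ_SEMAPHORE) return -1;   /* bad handle */
    if (n < 0 || s->count > s->max_count - n) return -2;     /* would exceed max_count */
    s->count += n;
    return 0;
}
```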
Otherwise, it would eventually have been overwritable by the GPU. All objects are reference-counted. That's similar to a C++ shared pointer: every object has a small counter field, and every time the kernel wants to use an object, the counter is incremented. When the reference is no longer needed, the counter is decremented, and when it reaches zero the object is automatically deleted from the Slab Heap. So they are basically trying to prevent use-after-free bugs. Also, I'm not sure if it counts as a security measure, but there are more than 100 panic calls in the kernel; that's one in every 10 functions, on average. And they have the syscall access restriction: as I said, you only have access to about 50 system calls, and all the interesting ones are disabled. For example, you can't map executable pages. On the other hand, there is no ASLR, but at least they change the memory mapping with every larger kernel update. There's also no stack protection. And userland is always mapped, so once you've got control over the program counter, you can just jump to userland pages that are marked as executable. So you don't have to do ROP in the kernel, which is pretty nice. They did try to have execution prevention in the kernel: they mark the kernel code pages as executable in their page table. So let's take a look. The parts highlighted in orange are the kernel code sections. As you can see in the first highlighted line, it says 'virtual address #FFF00' etc. is mapped to the physical address 1FF80000. It is marked as executable, you only have access to it in kernel mode, of course, and it's read-only. Right? So this is correct. But when you look at the second line of that page table dump, you will notice another section, which covers the entire AXI WRAM, and it's mapped as read-write.
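Why the second mapping defeats the first can be shown by computing the effective permissions of a physical address as the union over every mapping that reaches it. This is a simplified model: the kernel code size and the virtual address of the AXI WRAM alias are hypothetical values, and only the physical address 1FF80000 comes from the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { P_R = 1, P_W = 2, P_X = 4 };

/* One page-table entry range: virtual base, physical base, size, permissions. */
typedef struct { uint32_t va, pa, size, perms; } mapping_t;

/* Union of all permissions through which physical address pa can be reached.
   If a page ends up with both P_W and P_X, execute protection is meaningless. */
static uint32_t effective_perms(const mapping_t *maps, size_t n, uint32_t pa) {
    uint32_t perms = 0;
    for (size_t i = 0; i < n; i++) {
        if (pa >= maps[i].pa && pa - maps[i].pa < maps[i].size) {
            perms |= maps[i].perms;
        }
    }
    return perms;
}
```

With only the kernel code mapping, a code address is read-execute. Add the read-write alias of the whole AXI WRAM and the same physical byte becomes writable as well as executable.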
So it doesn't really make sense. Yeah. So basically, it's completely useless: we have read-write access to the kernel code through that second mapping. So, to summarize: there's effectively no exploitation protection. Once we find an exploitable bug, it's pretty likely that we gain code execution in kernel mode. So, let's find that bug. I started by looking at the SVC table, which is the interface between kernel land and userland. It shows all the system calls that are available in the kernel. You have normal system calls: for memory management, you can map readable and writable pages, mirror pages and do other memory management stuff. There's also configuration for threads; for example, you can choose which core should be used to execute a thread. You have a really large range of synchronization objects, like kernel mutexes and so on. And of course you have IPC requests, so you can send messages to services. Then there's a more advanced section, which is mostly used by services, because they have to respond to your IPC requests. There's also kernel DMA, cache control, and some other things. And they have a set of debug system calls for basic debugging: you can set breakpoints and read and write process memory. But you don't have access to them; on retail units they're not actually used. The last section is the privileged section, with all the interesting system calls that allow you to create processes, map executable memory and all that stuff. Unfortunately, we can't use the advanced, debug and privileged system calls. That would require exploiting some service first, and that's just more work for us. That leaves us with the normal system calls. IPC sounds really interesting, but unfortunately it's full of panics. There's also not much to attack in the synchronization object system calls. So the interesting system call that's left is the one for memory management.
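The syscall access restriction that splits this table into usable and unusable sections can be pictured as a per-process bitmask consulted before dispatching through the SVC table. The talk does not say how the restriction is stored, so the bitmask, the 128-entry table size, the return codes and the IDs used below are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SVCS 128u   /* assumed table size; the talk only says "about 130" */

typedef int32_t (*svc_handler_t)(uint32_t arg);

/* One bit per system call: set if the process is allowed to call it. */
typedef struct {
    uint32_t allowed[NUM_SVCS / 32];
} svc_mask_t;

static void svc_mask_allow(svc_mask_t *mask, uint32_t id) {
    if (id < NUM_SVCS) mask->allowed[id / 32] |= 1u << (id % 32);
}

/* Hypothetical dispatcher: look the number up in the SVC table, but refuse
   calls the process was not granted. */
static int32_t svc_dispatch(const svc_mask_t *mask, const svc_handler_t *table,
                            uint32_t id, uint32_t arg) {
    if (id >= NUM_SVCS || table[id] == NULL) return -1;                /* no such syscall */
    if (((mask->allowed[id / 32] >> (id % 32)) & 1u) == 0) return -2;  /* not permitted */
    return table[id](arg);
}

/* Example handler used for demonstration only. */
static int32_t svc_echo(uint32_t arg) { return (int32_t)arg; }
```

A handler can exist in the table and still be unreachable: only the bits granted to the process decide what userland can call.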
And in theory there's a lot you can mess up there. Right? A lot can possibly go wrong. And we also have unchecked DMA access, through the GPU, so maybe we can do something useful with that. Okay, so let's have a look at the memory allocator. There are 2 types of memory allocators. The first is the regular one, which is for mapping normal heap memory, e.g. for malloc in C. And there's the linear memory allocator, which is used for GPU textures: when memory has to be physically contiguous, you use the linear memory allocator. Then there's the FCRAM memory layout we saw earlier. You have these 3 regions, and every region has its own set of free pages. So how does the kernel keep track of them? There is a region descriptor, which gives the dimensions of the region: where it starts and its size. It also holds a pointer to the first free piece of memory in that region. Each free piece of memory, which we call a Memchunk, has a Memchunk header right at its beginning, which tells the kernel how large that Memchunk is. The Memchunks are also linked in a doubly linked list, so each header has next and previous pointers pointing to the next and previous Memchunk headers. It looks like this: the red parts are the free Memchunks and the green parts are memory that is already allocated. Allocation is pretty straightforward. First, the allocator function loads the next-free pointer from the region descriptor. For regular memory, it goes through the list following the pointers and sums up the chunk sizes until the requested size is reached. For linear memory, it looks for a single suitable Memchunk, to make sure the memory is really contiguous. When it has found enough memory, it sets the next pointer of the very last Memchunk to zero.
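The regular allocation path can be sketched in C with the headers stored in-band, at the start of each free chunk, as described. This is a simplified model, not the kernel's code. The talk does not say what happens when the last chunk is larger than needed, so splitting it and leaving the remainder on the free list is an assumption, as are all names and the page size. The linear path is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000
#define NUM_PAGES 16

/* Header stored at the beginning of every free chunk. */
typedef struct memchunk_hdr {
    uint32_t num_pages;            /* size of this free chunk, in pages */
    struct memchunk_hdr *next;
    struct memchunk_hdr *prev;
} memchunk_hdr_t;

/* A page of "FCRAM": when free, its first bytes hold the header. */
typedef union { memchunk_hdr_t hdr; uint8_t bytes[PAGE_SIZE]; } page_t;

typedef struct {
    page_t *base;                  /* where the region starts */
    uint32_t num_pages;            /* region size */
    memchunk_hdr_t *first_free;    /* the "next free" pointer */
} region_desc_t;

/* Regular allocation: walk the free list, summing chunk sizes until the
   request is covered, then cut the list after the last chunk used. */
static memchunk_hdr_t *alloc_regular(region_desc_t *rd, uint32_t want) {
    memchunk_hdr_t *first = rd->first_free, *cur = first;
    uint32_t got = 0;
    if (want == 0) return NULL;
    while (cur != NULL && got + cur->num_pages < want) {
        got += cur->num_pages;
        cur = cur->next;
    }
    if (cur == NULL) return NULL;              /* not enough free memory */

    uint32_t need = want - got;                /* pages still needed from cur */
    memchunk_hdr_t *rest = cur->next;
    if (cur->num_pages > need) {               /* split (assumption, see lead-in) */
        memchunk_hdr_t *tail = &((page_t *)cur + need)->hdr;
        tail->num_pages = cur->num_pages - need;
        tail->next = cur->next;
        if (tail->next != NULL) tail->next->prev = tail;
        cur->num_pages = need;
        rest = tail;
    }
    cur->next = NULL;                          /* terminate the allocated list */
    rd->first_free = rest;                     /* update the region descriptor */
    if (rest != NULL) rest->prev = NULL;
    return first;                              /* caller walks first->next... */
}
```

Note that the returned list still lives inside the allocated pages themselves. That in-band placement is what the following attacks exploit.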
It then updates the list and the next-free pointer in the region descriptor, and finally returns a pointer to the first Memchunk. So, let's look at this from a security perspective. There's a problem: they keep kernel structures inside FCRAM! And that is a problem because we have DMA access to it through the GPU. There was an attack by yellows8 called 'memchunkhax'. What he did, basically, was overwrite Memchunk headers using the GPU DMA flaw, which gave him an arbitrary kernel write when memory was deallocated, because the next/prev pointers had been modified. Unfortunately, this was fixed by Nintendo in system update 9.3, about a year ago. The new kernel now verifies every Memchunk header during allocation: its size and also its next/prev pointers. Invalid pointers or invalid sizes simply result in a kernel panic. So, in theory, everything has been fixed. In theory. Now look at the system call ControlMemory. We have access to it; it's one of the normal system calls. It does basic stuff: you can map and free read-write pages, but not executable ones, of course. It takes an address and a size as arguments, plus an operation code that tells the kernel what to do: map pages, free pages, and so on. First it does some basic checks on the address, and eventually it calls a very large function, which I'll just call kern::controlmemory. So what does kern::controlmemory do? It calls the allocator function, which returns a Memchunk header pointer, as we have seen earlier. Then it goes through all of the allocated Memchunks and maps them into user space. It also updates some block information in the KProcess object. And there's a problem: there's obviously a race condition. We can overwrite Memchunk headers after they have been allocated.
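The memchunkhax primitive and the 9.3 fix can both be shown on a doubly linked free list in C. The unchecked unlink below is the generic pattern, in which pointer updates go through attacker-controlled next/prev values; it stands in for the kernel's free-list bookkeeping rather than reproducing it. The talk only says that the new kernel checks the size and the next/prev pointers, so the specific checks here (range test plus the classic "neighbour points back at me" test) are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct memchunk_hdr {
    uint32_t num_pages;
    struct memchunk_hdr *next;
    struct memchunk_hdr *prev;
} memchunk_hdr_t;

/* Unchecked unlink: if next/prev were overwritten (e.g. via GPU DMA),
   both stores land at attacker-chosen addresses with attacker-chosen values. */
static void unlink_unchecked(memchunk_hdr_t *c) {
    if (c->prev != NULL) c->prev->next = c->next;
    if (c->next != NULL) c->next->prev = c->prev;
}

static bool in_region(const memchunk_hdr_t *p, uintptr_t start, uintptr_t end) {
    uintptr_t a = (uintptr_t)p;
    return a >= start && a < end && end - a >= sizeof *p;
}

/* Hypothetical header validation: sane size, neighbours inside the region,
   and each neighbour pointing back at this chunk. Failing this would panic. */
static bool header_is_valid(const memchunk_hdr_t *c, uintptr_t start, uintptr_t end,
                            uint32_t max_pages) {
    if (c->num_pages == 0 || c->num_pages > max_pages) return false;
    if (c->next != NULL && (!in_region(c->next, start, end) || c->next->prev != c)) return false;
    if (c->prev != NULL && (!in_region(c->prev, start, end) || c->prev->next != c)) return false;
    return true;
}
```

Checks like these only help if the header cannot change between validation and use, which is exactly the gap the race condition targets.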
We could try using the GPU, but it's actually really slow, because we would have to ask the GSP service to perform the DMA, which goes through the very large IPC kernel code path. That would probably be too slow; allocation is really fast. So let's dig a little deeper. I tried to reconstruct the source code in C. This is the first step: it tries to allocate memory. In this example, it allocates regular memory. When it gets a Memchunk back, which means enough memory is available, it executes this really interesting do-while loop. I know, it's a lot of code; I'm not sure you can actually read it. So let's go quickly through it. The page count is read from the Memchunk header. The chunk's address gets converted to a physical address, and that physical address gets mapped to userland by the mem_map function. Then it moves on to the next Memchunk, here, updates the userland virtual address, and clears the memory. So what's wrong here? The problem is that they map the Memchunk into userland, and after it has been mapped, they access it again. What they access is the next pointer, so we can just overwrite it. With 2 threads running, we can try to overwrite that pointer from another CPU core. Our goal would be to map kernel pages to userspace. But there are some problems. It requires really, really perfect timing; there's only a very small time frame to do the overwrite. We also need a Memchunk header structure at the address the next pointer points to. To get the timing right, I came up with a kernel address arbiter oracle. The address arbiter is actually used for thread synchronization, which we don't care about here. But it reads from the given address and returns an error when the address is not accessible from userland. So we can use that system call to detect when the memory has been mapped to userland, and as soon as it has been mapped, we try to overwrite it.
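The shape of the bug can be sketched in C. This is not derrek's reconstruction, only a simplified model of the loop as described: map a chunk, then read its next pointer from memory that is already user-writable. The real attack uses a second thread that spins on the address arbiter oracle until the page becomes accessible; here that thread is abstracted into a callback invoked right after each mapping, so the timing problem disappears. The clearing step and real page-table work are left out, and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

typedef struct memchunk_hdr {
    uint32_t num_pages;
    struct memchunk_hdr *next;
    struct memchunk_hdr *prev;
} memchunk_hdr_t;

/* Called right after a chunk becomes visible to userland; stands in for the
   attacker's thread on the other CPU core. */
typedef void (*other_core_t)(memchunk_hdr_t *just_mapped);

/* Record of what got mapped, instead of real page tables. */
static memchunk_hdr_t *g_mapped[8];
static size_t g_num_mapped;

static void mem_map_user(uint32_t user_va, memchunk_hdr_t *chunk) {
    (void)user_va;
    if (g_num_mapped < 8) g_mapped[g_num_mapped++] = chunk;
}

/* Simplified mapping loop: the header is read AFTER the chunk was mapped. */
static void map_chunks(memchunk_hdr_t *chunk, uint32_t user_va, other_core_t other_core) {
    do {
        uint32_t pages = chunk->num_pages;
        mem_map_user(user_va, chunk);                /* chunk is now user-writable */
        if (other_core != NULL) other_core(chunk);   /* the race window */
        chunk = chunk->next;                         /* BUG: trusts user-writable memory */
        user_va += pages * PAGE_SIZE;
    } while (chunk != NULL);
}

/* Fake header the attacker wants mapped next (in the talk: built from Slab Heap objects). */
static memchunk_hdr_t g_fake_header = { 1, NULL, NULL };

static void redirect_next(memchunk_hdr_t *just_mapped) {
    if (just_mapped != &g_fake_header) just_mapped->next = &g_fake_header;
}
```

Without interference the loop maps the chunks the allocator returned. With the redirect, the second "chunk" is whatever the attacker pointed at, which is how kernel memory ends up mapped into userland.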
So, one last problem: we have to inject a fake memchunk header into the kernel. I did this using the Slab Heap. We can just create some KObjects and set their member variables so that they form a fake memchunk header. So this is the Slab Heap: we’ve got C++ objects, each with a vtable pointer and some attributes. The Slab Heap is basically just a really large area of C++ objects. What I did was change the attributes and use them as a memchunk header. Then I redirect the next pointer to that object, and the kernel maps multiple C++ objects to userland. That’s really nice, because they contain vtable pointers, so we can just overwrite them, and that means we gain code execution. So, as a summary: we set up some kernel objects and change their attributes; we request memory from the kernel; once it becomes available, we patch the next pointer; we overwrite the mapped Slab Heap pages; and then we call a system call which closes the handles for the kernel objects we created in step one. That eventually calls a vtable function, and it will just jump to our modified vtable function. And we got ARM11 Level0 Code Execution!! applause, motivated by smea So, now plutoo will tell us what nice things you can do once you have gained ARM11 code execution. plutoo: Hey guys! Okay, so… the ARM9. Let’s go. The ARM9 is also used for executing old DS games. So what they did is, you could say, reuse the ARM9, which is their backwards-compatibility processor, as a security processor when executing 3DS code. And like smea said, it’s running a stripped-down version of the ARM11 kernel. It basically only does threading, synchronization, things like that. And there’s no MMU. There’s an MPU with 8 regions you can configure. You can do no-execute within those regions etc., but the granularity is not very nice, and there are only 8, so they basically ran out of regions. So .data and the stack are executable, as long as you can jump to them. And .text is writable, so that’s bad.
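The Slab Heap step of the ARM11 exploit relies on two layouts overlapping: a kernel object's attribute words get reinterpreted as a memchunk header. The sketch below uses Python's `struct` module with hypothetical layouts (four 32-bit little-endian words per object, three per header) and assumes the redirected `next` pointer lands just past the object's vtable pointer; the real KObject and header layouts and offsets are not given in the talk, and all addresses are made up.

```python
import struct

# Hypothetical layouts, for illustration only.
SLAB_OBJ = struct.Struct("<4I")      # vtable pointer + three attribute words
MEMCHUNK_HDR = struct.Struct("<3I")  # num_pages, next, prev


def build_slab_object(vtable: int, num_pages: int, nxt: int, prev: int) -> bytearray:
    """Pick attribute values (set through normal syscalls) that double as a header."""
    return bytearray(SLAB_OBJ.pack(vtable, num_pages, nxt, prev))


obj = build_slab_object(vtable=0xFFF2_1000, num_pages=1, nxt=0, prev=0)

# Following the redirected next pointer to (object + 4), the kernel reads
# the attribute words as a memchunk header:
print(MEMCHUNK_HDR.unpack_from(obj, 4))   # (1, 0, 0)

# Once these pages are mapped to userland, the vtable pointer can be rewritten;
# closing the object's handle later makes the kernel call through it.
struct.pack_into("<I", obj, 0, 0x0010_0000)   # made-up address of a fake vtable
print(hex(SLAB_OBJ.unpack(obj)[0]))           # 0x100000
```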
Basically, whenever you can write to arbitrary memory, you can just overwrite code. These are features you don’t want on a security processor. laughter So let’s go. It turns out that there have been lots of exploits over the years, and most of them are fixed. Most of them used the normal command interface, but in this case we’re taking a different approach. On the 3DS, the memory-mapped I/O is split up into 3 regions. There’s the ARM9-only I/O: crypto, the DMA engine, things like that. Then there’s the shared I/O region. And finally there’s the ARM11 I/O region, which contains the GPU and the video decoder. Thanks to derrek and smea we have full ARM11 control; we execute in kernel mode. So the question is: can we somehow use the shared I/O region to own the ARM9? It turns out the interface for reading old DS cartridges is actually in the shared I/O region. We’re not sure why, but they have it there for some reason. Only the ARM9 actually uses this interface, but the ARM11 still has access to it. When you insert a cartridge, the ARM9 starts by reading the banner. It does this by writing a magic value to the CTRL register, which basically asks for 0x200 bytes. And then there’s this loop; the assembly code is on the right side of the slide. You can see it basically waits for some bits to clear or set, then reads 4 bytes, then waits for another bit. There’s no range check on the buffer. But it’s always 0x200 bytes, so it should be fine. What if we overwrite the CTRL register from the ARM11, asking for 0x4000 bytes instead? Boom! We have a nice buffer overrun. It’s in the BSS segment, but… it’s still nice. And we can control the data, because it actually comes from the cartridge. So we need to make our own DS cartridge. There’s this old device called the PassMe. It’s for the original DS: you plug an old DS cartridge into it, and it modifies the header as it’s read.
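A Python model of the cartridge overrun: the copy loop trusts a length that comes from the shared CTRL register, so whoever can write that register decides how far past the 0x200-byte buffer the copy runs. The memory layout, buffer offset, and function names are hypothetical; the real loop is ARM assembly in Process9 and reads each word from the cartridge data register rather than from a bytes object.

```python
BANNER_LEN = 0x200  # what Process9 asks the cartridge for


def cart_read(ram: bytearray, dst: int, ctrl_len: int, cart: bytes) -> None:
    """Copy ctrl_len bytes from the cartridge into ram[dst:], one 32-bit word at a time.

    ctrl_len models the length encoded in the CTRL register, which the ARM11 can
    also write. As in the loop from the talk, nothing compares it with the size
    of the 0x200-byte destination buffer.
    """
    for i in range(0, ctrl_len, 4):
        word = cart[i:i + 4]
        if len(word) != 4 or dst + i + 4 > len(ram):
            raise IndexError("model limits exceeded (not a check the real loop does)")
        ram[dst + i:dst + i + 4] = word


def cart_read_checked(ram: bytearray, dst: int, ctrl_len: int, cart: bytes,
                      buf_len: int = BANNER_LEN) -> None:
    """Same copy, bounded by the buffer the caller actually owns."""
    if ctrl_len > buf_len:
        raise ValueError(f"CTRL asks for {ctrl_len:#x} bytes, buffer holds {buf_len:#x}")
    cart_read(ram, dst, ctrl_len, cart)


ram = bytearray(0x8000)               # banner buffer at offset 0, other .bss data after it
evil_cart = b"\x41" * 0x4000          # contents we control with a custom cartridge
cart_read(ram, 0, 0x4000, evil_cart)  # ARM11 rewrote CTRL to ask for 0x4000 bytes
print(ram[0x200:0x204])               # bytearray(b'AAAA'), past the end of the buffer
```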
So, these are available online for 5 bucks. And then you add an FPGA. I implemented this and it works, but it’s very gimmicky; I don’t recommend it. And here’s my soldering, it’s not very nice. This gives us ARM9 code execution, and it works on the latest firmware. But we want something better. Let’s look at the chain of trust. The idea of a chain of trust is, of course, that you verify all the code that is running; but you’re basically verifying everything at load time. The 3DS has the simplest chain of trust you can have: there’s the Boot ROM at the start, it loads the firmware binary from NAND, and it jumps to it. On the new 3DS they were a bit clever and added an extra crypto layer on the ARM9 portion. But it’s actually part of the firmware binary; we call it the ‘ARM9 loader’. The theory Nintendo had was: “Let’s add another layer of crypto: we introduce new keys, and they can’t break it.” But they didn’t have any good place to put those keys, so they placed them in NAND! They’re encrypted with a per-console key that’s based on a hash of the OTP, which is unique to each console. And OTP access is disabled early in the boot, so later on you can’t dump the OTP and you can’t figure out the keys. This looks safe, in theory. So here’s the implementation. They calculate a hash of the OTP. They read the key sector from NAND, decrypt the key, and put it in a keyslot, which is basically an isolated memory area. Then they generate a bunch of subkeys and verify that the key they loaded from NAND is the correct one, so even if we were to swap the key, they would detect that and just panic. Then they decrypt the ARM9 binary and jump to the entry point. But… they forgot to clear keyslot 0x11! So once we get code execution later on, we can just regenerate all those keys. So this implementation is useless. Okay.
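The first ARM9 loader's steps can be modelled in a few lines. This is only a sketch of the order of operations: the real loader uses the AES engine and write-only keyslots, whereas here HMAC-SHA256 from Python's standard library stands in for "encrypt with keyslot 0x11", XOR with an OTP hash stands in for decrypting the NAND key, and the labels, the check value, and the number of subkeys are made up. What it preserves is that the key is verified, but keyslot 0x11 is never cleared.

```python
import hashlib
import hmac


def aes_engine_stub(key: bytes, label: bytes) -> bytes:
    """Stand-in for 'encrypt `label` with this keyslot'; NOT the real 3DS crypto."""
    return hmac.new(key, label, hashlib.sha256).digest()[:16]


def arm9loader_v1(otp: bytes, key_sector: bytes, check_value: bytes,
                  keyslots: dict) -> list:
    otp_hash = hashlib.sha256(otp).digest()[:16]               # per-console secret
    key = bytes(a ^ b for a, b in zip(key_sector, otp_hash))   # "decrypt" the NAND key
    keyslots[0x11] = key                                       # load into keyslot 0x11
    subkeys = [aes_engine_stub(keyslots[0x11], b"subkey%d" % i) for i in range(4)]
    if aes_engine_stub(keyslots[0x11], b"check") != check_value:
        raise RuntimeError("panic: the key in NAND was swapped")
    # ... decrypt the ARM9 binary with the subkeys and jump to its entry point.
    # Missing step: keyslots.pop(0x11). Code that runs later can still *use*
    # the slot (real keyslots are write-only) to regenerate every subkey.
    return subkeys


otp = bytes(range(64))                  # made-up OTP contents
real_key = bytes(range(16, 32))         # made-up NAND key
sector = bytes(a ^ b for a, b in zip(real_key, hashlib.sha256(otp).digest()[:16]))
slots = {}
arm9loader_v1(otp, sector, aes_engine_stub(real_key, b"check"), slots)
print(0x11 in slots)                    # True: the key outlives the loader
```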
laughs applause And they fixed this, because they have more than one key hidden in the NAND. So they took the next key. It’s basically the same idea: you calculate the same hash, you read the key sector from NAND, you generate all the previous keys for compatibility, and then you decrypt a new key; we call it Key#2. Then you decrypt the ARM9 binary using the second key, you clear the keyslot, and you jump to the entry point. But they forgot to verify the second key! audience laughs This is epic fail! applause So let’s exploit this: ‘ARM9LOADERHAX’. We can change the second key. The ARM9 loader will just decrypt the binary to garbage and jump to it. If you look at the encoding of an ARM branch instruction, the probability is pretty high that random data decodes to a branch. If you try enough keys, the garbage will eventually start with a branch instruction to some memory. So if we try a lot of keys, eventually we will find some garbage that is useful. This is the NAND flash layout of an unmodified new 3DS. There’s a small key section, marked in teal, like, blue, and it contains the keys we’re talking about. Then there are 2 firmware partitions; one is a backup in case the other gets corrupted, so it doesn’t brick the device, whatever. We install our custom key. We install the largest FIRM binary we have in the firm0 partition, and we keep the one with the vulnerability in the firm1 partition. Then we append our code payload to the firm0 binary. And then we reboot. So what will happen? The Bootrom is executed. It loads the first firmware partition, which has our code at the end, but it doesn’t know about it. And then it decrypts it. And, you see, it looks okay: there’s the ARM9 loader stub in the front, then the encrypted binary, and then, finally, our payload. But the Bootrom checks the hash, right? And it fails. So it thinks the partition got corrupted.
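Why random garbage so often works: in the 32-bit ARM instruction set, an always-executed B or BL has a fixed condition field (1110) followed by the fixed bits 101, so 7 of its 32 bits are pinned and a uniformly random word is such a branch with probability 2^-7 = 1/128. The Python sketch below decodes that form and estimates the rate empirically. The load address is arbitrary, conditional branches and the BLX form are ignored for brevity, and it does not model whether the target lands in the payload, which the real attack also needs.

```python
import random
from typing import Optional, Tuple


def decode_arm_branch(word: int, addr: int) -> Optional[Tuple[bool, int]]:
    """Return (is_bl, target) if `word` is an always-executed ARM B/BL, else None.

    A32 encoding: cond[31:28] | 101[27:25] | L[24] | imm24[23:0].
    Only cond == 0b1110 (AL) is accepted here.
    """
    if (word >> 28) != 0b1110 or ((word >> 25) & 0b111) != 0b101:
        return None
    imm24 = word & 0xFFFFFF
    if imm24 & 0x800000:                 # sign-extend the 24-bit word offset
        imm24 -= 1 << 24
    target = (addr + 8 + (imm24 << 2)) & 0xFFFFFFFF   # PC reads 8 bytes ahead
    return bool((word >> 24) & 1), target


LOAD_ADDR = 0x0800_0000                  # arbitrary example address

print(decode_arm_branch(0xEAFFFFFE, LOAD_ADDR))   # (False, 134217728): "b ." loops in place

rng = random.Random(0)
trials = 200_000
hits = sum(decode_arm_branch(rng.getrandbits(32), LOAD_ADDR) is not None
           for _ in range(trials))
print(hits / trials)                     # should land near 1/128, about 0.0078
```

So, on average, about one key in 128 turns the first decrypted word into an unconditional branch; whether that branch reaches useful code is a separate filter on top.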
So it falls back to firm1 and loads that smaller image on top. Because firm1 is smaller, it doesn’t overwrite the end of what was loaded before, so our payload is still in memory at boot. Then it decrypts firm1, which also has an ARM9 loader and another encrypted ARM9 binary, and it jumps to the ARM9 loader, because this time the hash checks out. The ARM9 loader then decrypts our corrupted key from NAND, uses it to decrypt the ARM9 binary to garbage, and jumps to it. And hopefully it jumps to our code. So this gives us ARM9 code execution from a cold boot. Early, very early. It turns out we can actually use this to get some keys that are not available later: they use a certain memory area to seed the crypto engine to generate keys, and that memory is cleared later, so you can’t regenerate the keys. But with this we can get those 2 keys. They’re called the firmware 6.x save key and the firmware 7.x NCCH key. That’s a bonus. We talked a bit about the AES engine. It’s used for the crypto everywhere, for everything, basically. It supports all the usual block cipher modes. It has 2 security features. The first is write-only keys, which is really useful: you write a key and then you can never, ever read it back. This means the Bootrom can fill in the keys and we can’t dump them later, so they can keep the keys secret. Even if we hack the ARM9, even if we get code execution, we’ll never get the keys. And then there’s the key scrambler, which is optional: the actual key is calculated by a hardware function that we don’t know about, so the actual key is never exposed to the CPU. We just feed it 2 values, 2 keys, and it generates a new key based on them, and we don’t know what that key is. So this creates a situation similar to the isolated SPUs on the PS3, where you can ask it to decrypt stuff, but you don’t get the keys. And if you don’t get the keys, then… we want the keys!!
We want to decrypt things on our PC, because we’re lazy. So there are 2 keys; we call them KeyX and KeyY. They’re 128 bits each, and the normal key is derived as a function of those 2; that function is unknown. It’s implemented in hardware, in silicon. So even if we know X and Y, we can’t figure out the normal key, and we can’t decrypt things without asking the 3DS first. But we can poke this hardware engine. The first thing you notice when you do this is that flipping the N-th bit of KeyX gives the same result as flipping the (N+2)-th bit of KeyY. In general, you find that the function we’re looking for is actually just a function of one variable: the XOR of KeyX rotated left by 2 (rotation, not shift) with KeyY. So NormalKey = G((KeyX <<< 2) XOR KeyY). But we still don’t know the key, and we want to know keys. So… let’s step back a little bit. The key scrambler is used for Mii QR codes. It’s used for everything, right? It’s used for the network protocol called UDS, and for Download Play, which is when you download temporary games over WiFi. But the Wii U also supports all of this, and it doesn’t have the key scrambler in hardware. So the Wii U must be using normal keys. applause screamed from audience: WHAT? applause So we made a table of the shared keys: these are the 3 keys that are shared with the Wii U, and it shows where the KeyX and KeyY are set on the 3DS. 2 of them have their KeyY set by the firmware. We can’t read the KeyX set by the Bootrom, because it’s locked away. But can we still figure out G? Let’s see. Shoutout to shuffle2 and to fail0verflow, who hacked the Wii U; shuffle helped us extract the Wii U keys. So thank you! Now we have the KeyY, and we know the normal key from the Wii U. However, KeyX is still unknown. And if G(t) is ‘bad’, then a small change in the KeyY will only lead to a small change in the normal key. It’s bad! So let’s look at the data.
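The observation that KeyX bit N and KeyY bit N+2 are interchangeable is exactly what the structure NormalKey = G((KeyX <<< 2) XOR KeyY) predicts. Here is a small Python sketch of that structure, with 128-bit rotations done on plain integers; G is left as a parameter because at this point it is still unknown. Since it is a rotation, the index wraps around: bit 126 of KeyX pairs with bit 0 of KeyY.

```python
import random
from typing import Callable

MASK128 = (1 << 128) - 1


def rol128(x: int, n: int) -> int:
    """Rotate a 128-bit value left by n bits (rotation, not shift)."""
    n %= 128
    return ((x << n) | (x >> (128 - n))) & MASK128


def scrambler_input(key_x: int, key_y: int) -> int:
    """The single value the hardware function G actually sees."""
    return rol128(key_x, 2) ^ key_y


def normal_key(key_x: int, key_y: int, g: Callable[[int], int]) -> int:
    """NormalKey = G((KeyX <<< 2) XOR KeyY), with G supplied by the caller."""
    return g(scrambler_input(key_x, key_y))


rng = random.Random(1)
x, y = rng.getrandbits(128), rng.getrandbits(128)
n = 40
flip_x = scrambler_input(x ^ (1 << n), y)
flip_y = scrambler_input(x, y ^ (1 << ((n + 2) % 128)))
print(flip_x == flip_y)   # True, whatever G is
```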
When we flip one bit in the KeyY, we can brute-force all keys within a couple of bit flips of the normal key, and we find that the result is always the normal key with bits flipped at position 87 or 88, sometimes 89, but never 86. This reminds me of an adder, where a carry bit propagates to upper bits, but never to lower ones. So let’s guess that this is an adder. It’s an adder with a rotation, so we guess that G(t) = (t + C) <<< 87, where C is some constant we don’t know. Then we plug it into our original formula. Remember, we don’t know KeyX, because it’s set by the Bootrom, and we don’t know the constant C, because it’s in silicon, in hardware. But look at the formula and consider an inequality between two normal keys, where we first rotate right by 87, which undoes the outer rotation. Then we plug in our guess, subtract C from both sides, and end up comparing (KeyX <<< 2) XOR y0 with (KeyX <<< 2) XOR y1. So we’re XOR-ing 2 different KeyYs with the same KeyX rotated left by 2. If you stare at this for a bit, you’ll see that if y0 and y1 are equal except at one bit position, then the XOR is smaller for the one whose bit at that position matches the corresponding bit of KeyX <<< 2. It’s actually pretty simple, but it sounds difficult: XOR is zero if the inputs are the same and one if they’re different, and zero is smaller. So we look at this bit by bit and repeat it 128 times, and we recover all 128 bits of the KeyX. Once we have the KeyX, we can calculate the silicon constant C. So the end result is: the key scrambler is figured out, and as a bonus we also have the secret Bootrom KeyX for a couple of keyslots. applause, motivated by smea I didn’t put the constants in the slides, because I want this to be an exercise for the listener.
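The bit-by-bit recovery can be checked end to end against a simulated scrambler built from the guess G(t) = (t + C) <<< 87. In this Python sketch the secret KeyX and C are random values, not the real ones, and `normal_key(y)` is a hypothetical oracle standing in for what the attack actually had: the normal key for a known KeyY and for its one-bit neighbours, found by brute-forcing keys a few bit flips away. Rather than comparing the two sides directly, the code looks at their difference modulo 2^128, which is +2^k or -2^k and so reveals which XOR was smaller without knowing C. One wrinkle of this model: for bit 127 both cases give the same difference, and flipping that bit of KeyX <<< 2 together with the top bit of C leaves every normal key unchanged, so the sketch picks one of the two equivalent pairs.

```python
import random
from typing import Callable, Tuple

MASK128 = (1 << 128) - 1


def rol128(x: int, n: int) -> int:
    n %= 128
    return ((x << n) | (x >> (128 - n))) & MASK128


def ror128(x: int, n: int) -> int:
    return rol128(x, 128 - n % 128)


def make_scrambler(key_x: int, c: int) -> Callable[[int], int]:
    """Simulated hardware under the guess G(t) = (t + C) <<< 87."""
    def normal_key(key_y: int) -> int:
        t = rol128(key_x, 2) ^ key_y
        return rol128((t + c) & MASK128, 87)
    return normal_key


def recover_key_x_and_c(normal_key: Callable[[int], int], y0: int) -> Tuple[int, int]:
    """Recover (KeyX, C) from normal keys for y0 and its one-bit neighbours."""
    n0 = ror128(normal_key(y0), 87)           # ((KeyX <<< 2) ^ y0) + C
    xr = 0                                    # bits of KeyX <<< 2
    for k in range(127):
        n1 = ror128(normal_key(y0 ^ (1 << k)), 87)
        # The two XORs differ only in bit k, so C cancels in the difference.
        y0_larger = ((n0 - n1) & MASK128) == (1 << k)
        # Bit k of KeyX <<< 2 equals bit k of whichever KeyY gave the smaller XOR.
        bit = ((y0 >> k) & 1) ^ y0_larger
        xr |= bit << k
    xr |= y0 & (1 << 127)                     # bit 127: either choice works, C absorbs it
    c = (n0 - (xr ^ y0)) & MASK128
    return ror128(xr, 2), c


rng = random.Random(2015)
secret_x, secret_c = rng.getrandbits(128), rng.getrandbits(128)
oracle = make_scrambler(secret_x, secret_c)
got_x, got_c = recover_key_x_and_c(oracle, rng.getrandbits(128))
clone = make_scrambler(got_x, got_c)
print(all(clone(y) == oracle(y) for y in (rng.getrandbits(128) for _ in range(1000))))  # True
```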
When the new 3DS was released, they rushed it, we think, because they left some interesting commands in the ps:ps service. These included an early version of the NFC crypto used for the amiibo figurines. That first implementation uses a normal key, and the newer one switched to setting a KeyY. So they accidentally gave us one of these (KeyY, normal key) pairs in the firmware images, and we don’t need the Wii U at all. So anyone who can decrypt 3DS firmware binaries can perform this attack to get the constants. So anyone out there: good luck! And now, back to smea for a summary. applause smea: Right, I’m just gonna conclude really quickly. So, some take-aways from what we talked about today. They’re all pretty obvious lessons, but, you know, bear with me. Giving any application access to physical memory, through the GPU or whatever, is dangerous. You should always be careful about that. Even if you think you’ve protected stuff, there’s probably gonna be stuff that you forgot. So either don’t do it, or do it right. Another thing: shared I/O is dangerous; if you don’t know what can actually control the I/O, then, again, you should be very careful. Also, only checking your data before decryption is dangerous, and not checking a key when you know it could be modified by an attacker is a bad idea. And finally, secrets in hardware are great unless you give them away, so… don’t do that! laughs, audience laughs Beyond that, we just wanted to talk about the state of homebrew really quickly. You might recall that during the Wii U talk here 2 years ago, fail0verflow said that they didn’t necessarily think there was much of a future for console homebrew. And there’s definitely an argument for that, mostly with the rise of phones: anyone can make an app or a game for any number of devices and sell it to millions of people. But, you know, we disagree.
cheers and applause It’s been a year since we started releasing 3DS homebrew. And… this is supposed to be moving, but… let’s imagine it’s moving. Well, in there there’s a bunch of 3DS homebrew. It’s been awesome! We’ve been working on this really hard, and a lot of people have been joining us. It’s a great community effort. And basically what I want to say is: we want more developers. So if you’d like to join us, there’s our SDK; well, it’s not very mature, but it’s maturing. And you know what: reverse-engineering hardware when we don’t have any documentation is fun, and reverse-engineering software is fun. We can always use more reverse engineers, and just people who want to make cool shit, so… Yeah, oh… right! Just one more thing. Lately there has been a wave of patches by Nintendo for known exploits, which has been really annoying. That includes our browser hacks (well, yellows8’s browser hacks), menu hacks, stuff like that. Yellows8 has been working pretty hard, and he actually brought back browser hacks; it should have been released about 10 minutes ago. laughter, applause We also had ironhax for an eShop game, a free eShop game, so you could just download it. That was patched. The thing is, there’s actually a way to download the old version from the eShop application with some patches, so we’re also releasing that right now! So basically, if you can get homebrew, you can get onto the eShop with a patch and download the old version. That should also be released in about… well, whenever this is done. So get it as soon as possible: it’s a free game, and it will get you homebrew forever. So just do that. Also, yellows8 just released a new version of menuhax which works on the latest firmware version; this was also patched a couple of weeks or months ago. So this is all out right now. If you have a 3DS, get it. If you have friends who have 3DSs, tell them to get it, because it might not last super long.
Yeah, so we would like to thank yellows8, who unfortunately cannot be here tonight but has been super helpful and has been doing a ton of work on the 3DS. Honestly, a ton of this could not have been done without him. And thanks to everyone in the #3DSDEV homebrew channel, and everyone who is attending tonight. Thanks for this. And if you have any questions, I don’t think we have a lot of time, but we’ll accommodate. Thanks! applause Herald: Thank you for your patience. If you have questions, please come up front to these guys, because we have no more time for structured Q&A. Thank you! postroll music Subtitles created by c3subtitles.de in the year 2016. Join and help us!