33C3 preroll music Herald: You have been here on stage before. You successfully tampered with the Wii, You successfully tampered with the PS3 and got some legal challenges over there? marcan: Some unfounded legal challenges, yes. Herald: And then you fucked, and excuse my French over here – by the way, that is number 8021 to get the translation on your DECT phone. So you fucked with the Wii U as well. “Console Hacking 2016”, here we go! marcan: I’m a lazy guy, so I haven’t turned on my computer yet for the slides. So let me do that, hopefully this will work. My computer is a little bit special. It runs a lot of Open Source software. It runs FreeBSD. applause It even has things like OpenSSL in there, and nginx. And Cairo I think, and WebKit. It runs a lot of interesting Open Source software. But we all know that BSD is dying, so we can make it run something a little bit more interesting. And hopefully give a presentation about it. Let’s see if this works. It’s a good start, black screen, you know. It’s syncing to disk and file system shutting down. There we go! applause continued applause And yes, I run Gentoo Linux. applause This is the “Does Wi-Fi work?” moment. Hopefully. NTP, yeah, no… “NTP failed”. Well, that’s a bit annoying, but it still works. Hello? Yeah, it takes a bit to boot. It doesn’t run systemd, you know. It’s sane, it’s a tiny bit slower, but it’s sane. There we go. applause This is the “Does my controller work?” moment. Bluetooth in Saal 1. Okay, it does. Alright, so let’s get started. So this is “Console Hacking 2016 – PS4: PC Master Race”. I apologize for the horrible Nazi joke in the subtitle, but it’s a Reddit thing. “PC Master Race”, why? Well. PS4, is it a PC? Is it not a PC? But before we get started, I would like to dedicate this talk to my good friend Ben Byer who we all know as “bushing”. 
Unfortunately, he passed away in February of this year and he was a great hacker, he came to multiple congresses, one of the nicest people I’ve ever met. I’m sure that some of you who have met him would agree with that. If it weren’t for him, I wouldn’t be here. So, thank you. applause Alright. So, the PS4. Is it a PC? Is it not a PC? Well, it’s a little bit different from previous consoles. It has x86, it’s an x86 CPU. It runs FreeBSD, it runs WebKit. It doesn’t have a hypervisor, unfortunately. Then again, the PS3 had a hypervisor and it was useless, so there you go. So this is different from the PS3, but it’s not completely different. It does have a security processor that you can just ignore because it doesn’t secure anything. So that’s good. So how to own a PS4? Well, you write a WebKit exploit and you write a FreeBSD exploit, duh. Right? Everything runs WebKit, and FreeBSD is not exactly the most secure OS in the world, especially not with Sony customizations. So this is completely boring stuff. Like, what’s the point of talking about WebKit and FreeBSD exploits? Instead, this talk is going to be about something a little bit different. First of all, after you run an exploit, well, you know, step 3 “something”, step 4 “PROFIT”. What is this about? And not only that, though. Before you write an exploit, you usually want to have the code you’re trying to exploit. And with WebKit and FreeBSD you kinda do, but not the build they use, and it’s customized. And it’s annoying to write an exploit if you don’t have access to the binary. So how do you get the binary in the first place? Well, you dump the code, that’s an interesting step. So let’s get started with step zero: black-box code extraction, the fun way. A long time ago in a hackerspace far, far away fail0verflow got together after 31c3. And we looked at the PS4 motherboard and this is what we saw. So there’s an Aeolia southbridge, that’s a codename, by the way. 
Then there’s the Liverpool APU which is the main processor. It’s a GPU and a CPU which is done by AMD, and it has some RAM. And then the southbridge connects to a bunch of random crap like the USB ports, a hard disk, which is USB. For some inexplicable reason the internal disk on the PS4 is USB. Like it’s SATA to USB, and then to USB on the southbridge. Even though it has SATA, like, what? laughs The Blu-ray drive is SATA. The Wi-Fi and Bluetooth are SDIO, and Ethernet is GMII. Okay, how do we attack this? Well, GDDR5… What just…? Oh. I have a screensaver, apparently! That’s great. laughter I thought I killed that, let me kill that screensaver real quick. applause Something had to fail, it always does. I mean, of course I can SSH into my PS4, right? So there we go, okay. Could have sworn I’d fixed that. Anyway… Which one of these interfaces do you attack? Well, you know, USB, SATA, SDIO, GMII – that’s the raw ethernet interface, by the way – all these are CPU-controlled. The CPU issues commands and the devices reply. The devices can’t really do anything. They can’t write to memory or anything like that. You can exploit USB if you find a bug in the USB driver, but we’re back to the no-code issue. GDDR5, that would be great, we could just write to memory and basically own the entire thing. But it’s a very high-speed bus. It’s definitely exploitable. If you are making a secure system, don’t assume we can’t own GDDR5, because we will. But it’s not the path of least resistance, so we’re not gonna do that. However, there’s a thing called PCI Express in the middle there. Hmm, that’s interesting! PCIe is very fun for hacking – even though it might seem intimidating – because it’s bus mastering, that means you can DMA to memory. It’s complicated, and complicated things are hard to implement properly. It’s robust. People think that PCIe is this voodoo-highspeed… No it’s not! It’s high-speed, but you don’t need matched traces to make it work. It will run over wet string.
You can hotwire PCIe with pieces of wire and it will work. At least at short distances anyway. Believe me, it’s not as bad as you think. It’s delay-tolerant, so you can take your time to reply. And the drivers are full of fail because nobody writes a PCIe driver assuming the device is evil even though of course everybody should because devices can and will be evil. But nobody does that. So, what can we do? Well, we have a PCIe link, let’s cut the lines and plug in the southbridge to a PC motherboard that we stick on the side. Now the southbridge is a PCIe card for us. And we connect the APU to an FPGA board which then can pretend to be a PCIe device. So we can man-in-the-middle this PCIe bus and it’s now x1 width instead of x4 because it’s easier that way, but it will negotiate, that’s fine. So how do we connect that motherboard and the FPGA? There’s of course many ways of doing this. How many of you have done any hardware hacking, even Arduino or anything like that? Raise your hand! I think that’s about a third to a half or something like that, at least. When you hack some hardware, you build some hardware, after you blink an LED, what is the first interface you use to talk to your hardware? Serial port! So we run PCIe over RS232 at 115 kBaud which makes this PCIe… laughter and applause I said it was delay-tolerant! So it makes this PCIe 0.00002x. And eventually there was a Gigabit ethernet port on the FPGA so I upgraded to that, but I only got around to doing it in one direction. So now it’s PCIe 0.0002x in one direction and 0.5x in the other direction which has to make this one of the most asymmetric buses in the world. But it works, believe me. This is hilarious. We can run PCIe over serial. Also, we were ASCII encoding, so half the bandwidth. It works fine. It’s fine. So, PCIe 101. It’s a reliable packet-switched network. It uses a thing called “Transaction Layer Packets” which are basically just packets you send.
It can be… Memory Read, Memory Write, IO Read, IO Write, Configuration Read, Configuration Write. There can be a message-signaled interrupt which is a way of saying: “Hey, listen to me!” by writing to an address in memory. Because we can write the thing, so why not write for interrupts? It has legacy interrupts which are basically emulating the old wire-low-for-interrupt-and-high-for-no-interrupt thing, you can tunnel that over PCIe. And it has completions, which are basically the replies. So if you read a value from memory the completion is what you get back with the value you tried to read. So that’s PCIe, we can just go wild with DMA. We can just read all memory, dump the kernel. Hey, it’s awesome, right? Except there’s an IOMMU in the APU. But... first, the IOMMU will protect the host from the devices. It will only let you access the memory that is mapped to your device. So the host has to allow you to read and write to memory. But just because there’s an IOMMU doesn’t mean that Sony uses it properly. Here’s some pseudo-code, it has a buffer on the stack, it says: “please read from flash to this buffer” with the correct length. Can anyone see the problem with this code? Well, it maps the buffer and it reads and it unmaps the buffer. But IOMMUs don’t just map byte “foo” to byte “bar”, they map pages, and pages are 64k on the PS4. So Sony has just mapped 64k of its stack to the device so it can just DMA straight into the stack, basically the whole stack, and take over. Now we got code execution, FreeBSD kernel dump, and WebKit and OS libs dump, just from mapping the flash. Okay, that’s step zero. We have the code. But that’s not the PS4 that we did this on, it was a giant mess of wires. Someone here knows about that, you know, flying over on Facebook. We don’t make a ‘nice’ exploit. We haven’t done that because, as I said, WebKit, FreeBSD, whatever. What comes after that? We want to do something. Of course we want to run Linux, duh! How do you go from FreeBSD to Linux?
It’s not a trivial process. But you use something that we call “ps4-kexec”. So how does this work? It’s simple, right? You just want to run Linux? Just ‘jmp’ to Linux, right? Well… kind of. You need to load Linux into contiguous physical RAM, set up boot parameters, shut down FreeBSD cleanly, halt secondary CPUs, make new pagetables etc. A lot of random things. I’m not going to bore you with this crap because you can read the code. But there’s a lot of iteration in getting this to work. Let’s assume that you do all this magical cleanup and you get Linux into a nice state and you can ‘jmp’ to Linux. Now we jmp to Linux, right? It’s cool. Yeah, you can technically jmp to Linux, and it will technically run …for a little bit. And it will stop. And you will not get any serial or any video or anything. What’s going on here? Let’s talk about hardware. What is x86? x86 is a mediocre instruction set architecture by Intel. It’s okay, I guess. It’s not great. PS4 is definitely x86, it’s x86-64. What is a PC? Aah! PC is a horrible, horrible thing built upon piles and piles of legacy crap dating back to 1981. The PS4 is definitely -not- a PC. That’s practically Sony-level hardware fail, so it could be, but it’s not. So what’s going on? A legacy PC basically has an 8259 Programmable Interrupt Controller, an 8253 Programmable Interval Timer, a UART at I/O 3f8h, which is the standard address for a serial port. It has a PS/2 keyboard controller, 8042. It has an RTC, a real-time clock with a CMOS, and everyone knows the CMOS, right? MC146818 is the chip number for that. An ISA bus – even if you think you don’t have an ISA bus your computer has an ISA bus inside the southbridge somewhere. And it has VGA. The PS4 doesn’t have -any- of these things. So what do we do? Let’s look a little bit at how a PC works and how a PS4 works. This is a general simple PC system. There’s an APU or an Intel Core CPU with a southbridge, Intel calls it PCH, AMD FCH.
There’s an interface that is basically PCIe although Intel calls it DMI and AMD calls it UMI. DDR3 RAM and a bunch of peripherals and SATA, whatever. The PS4 kind of looks like that, right? So you think this can’t be that dif… What’s so hard about this? Because all the crap I mentioned earlier is in the southbridge on a PC, right? The PS4 has a southbridge, right? Right? Right? Umm… so the southbridge, the AMD standard FCH implements Intel legacy from 1981. The Marvell Aeolia – Marvell is the maker of the PS4 southbridge – implements Intel legacy from 2002. What does that mean? Ah! That’s no southbridge, that’s a Marvell Armada SoC! So it’s not actually a southbridge, it was never a southbridge. It’s an ARM system-on-a-chip CPU with everything. It’s a descendant of Intel StrongARM or XScale. It has a bunch of peripherals. And what they did is, they stuck a PCIe bridge on the side and said: “Hey x86, you can now use all my ARM shit.” So it exposes all of its ARM peripherals to the x86. They added some stuff they really needed for PCs and it has its own RAM. Why do they do this? Well, it also runs FreeBSD on the ARM in standby mode. And that’s how they do the whole “download updates in the background, get content, update, whatever”. All that crap is because they have a separate OS on a separate chip running in standby mode. Okay, that’s great, but it’s also batshit insane. laughter Quick recap: This is what a PCIe bus number looks like, sorry, a device number. It has a bus number, which is 8 bits, a device number, which is 5 bits, and a function number, which is 3 bits. You’ve probably seen this in lspci if you’ve ever done that. This is what a regular southbridge looks like. It has a USB controller, a PCI, ISA bridge, SATA, whatever. And it has a bunch of devices. So one southbridge pretends to be multiple devices. Because you only have three bits for a function number so you can only have up to eight functions in one device.
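As a minimal sketch of the device-number layout just described (8-bit bus, 5-bit device, 3-bit function), the following Python packs and unpacks such a number and prints it the way lspci does. The function names are made up for illustration and are not taken from the talk or from any tool.

```python
# Sketch: PCIe bus/device/function (BDF) packing.
# Layout: bus in bits 15..8, device in bits 7..3, function in bits 2..0.
# All function names here are hypothetical, chosen for illustration only.

def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a single 16-bit number."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus is 8 bits, device is 5 bits, function is 3 bits")
    return (bus << 8) | (device << 3) | function


def unpack_bdf(bdf: int) -> tuple:
    """Split a 16-bit number back into (bus, device, function)."""
    return (bdf >> 8) & 0xFF, (bdf >> 3) & 0x1F, bdf & 0x7


def format_bdf(bdf: int) -> str:
    """Render in the bus:device.function style that lspci prints."""
    bus, device, function = unpack_bdf(bdf)
    return f"{bus:02x}:{device:02x}.{function}"


if __name__ == "__main__":
    # Bus 0, device 0x14, function 4, chosen as an arbitrary example.
    bdf = pack_bdf(0, 0x14, 4)
    print(hex(bdf), format_bdf(bdf))
```

With only three function bits, `pack_bdf` rejects a function number of 8 or more, which is the limit that forces a southbridge with many peripherals to either occupy several device numbers or cram several peripherals into one function.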
An Intel southbridge just says: “I’m device 14, 16, 1a, 1…, I’m just a bunch of devices, and you can talk to all of them.” If you lspci on a roughly unpatched Linux kernel on the PS4 you get something like this. So the Aeolia first of all clones itself into every PCIe device because they were too lazy to do “if device equals my number then reply, otherwise don’t reply”. No, they just said: “Oh, just reply to every single PCIe device that might query”. Linux sees the southbridge 31 different times, which is kind of annoying because it gets really confused when it sees 31 clones of the same southbridge. And then it has eight functions: ACPI, ethernet, SATA, SDMC, PCIe,… Eight functions, so all three bits. Turns out, eight functions are not enough for everybody. Function no. 4, “PCI Express Glue”, has a bridge config, MSI interrupt controller, ICC – we’ll talk about that later –, HPET timers, Flash controller, RTC, timers, 2 serial ports, I2C… All this smashed into one single PCIe device. Linux has a minimum system requirement to run on anything. You need a timer, you need interrupts, and you need some kind of console. The PS4 has no PIT, no PIC and no standard serial so none of the standard PC stuff is going to work here. The board has test points for an 8250 standard serial in a different place. So we run dmesg over that, okay, fine. Linux has earlycon which we can point to a serial port and say: “Please send all your dmesg here very early because I really want to see what’s going on”. Doesn’t need IRQs, you set console=uart8250, the type, the address, the speed. And you’ll see it says 3200 instead of 115 kBaud. That’s because their clock is different. So you set 3200 but it really means 115k. And that gets you dmesg. That actually gets you “Linux booting, uncompressing”, whatever. That’s pretty good. Okay, we need a timer. Because otherwise everything explodes. Linux supports the TSC, a built-in CPU timer which is super nice and super fun. And PS4 has that.
But Linux tries to calibrate it against the legacy timer which on the PS4 doesn’t exist so that’s fail. So again, the PS4 -really- is not a PC. What we need to do here is define a new subarchitecture because Linux supports this concept. Says: “this is not a PC, this is a PS4”. The bootloader tells Linux: “Hey! This is a PS4!” And then Linux says: “Okay, I’m not gonna do the old timestamp calibration, I’m gonna do it for the PS4” which has special code that we wrote that calibrates against the PS4 timer. And it disables the legacy crap. So now this is officially not a PC anymore. Now we can talk about ACPI. You might know ACPI for all its horribleness and all its evilness and all its Microsoft-y-ness. ACPI - most people associate it with “Suspend” and “Suspend to Hibernate”. It’s not just power, it does other stuff, too. So we need ACPI for PCI config, for the IOMMU, for the CPU frequency. The PS4 of course has broken ACPI tables because of course it would. So we fixed them in ps4-kexec. Now interrupts. We have timers, we have serial, we fixed some stuff. The PS4 does message-signaled interrupts which is, what I said, the non-legacy, the nice new thing where you just write a value, and what you do is you tell the device when you want to interrupt “please write this value to this address”. The device does that, and the CPU interrupt controller sees that write and says: “Oh, this is an interrupt” and then just fires off that interrupt into the CPU. That’s great. It’s super fast and very efficient. And the value directly tells the CPU: “That’s the interrupt vector you have to go to”. Okay, that’s the standard MSI way there. Your computer does MSI that way. This is how the PS4 does MSI: The Aeolia ignores the MSI config registers in the standard location. Instead it has its own MSI controller, all stuff that’s in Function 4, which is that “glue” device. Each function gets a shared address in memory to write to and the top 27 bits of data.
And every subfunction, because a lot of things are packed into one place, only gets the bottom 5 bits to differ. And all MSIs originate from Function 4, so this device has to fire an interrupt, then it goes to here, and then that device fires an interrupt. Like… what… this is all… what the hell is going on? Seriously, this is really fucked up. And – the i’s are missing in the front there. But yeah. So, driver hell. Now the devices are interdependent. Then the IRQ vector location is not sequential, so that’s not gonna work. And you need to modify all the drivers. This is really painful to develop for. So what we ended up doing is there is a core driver that implements an interrupt controller for this thing. And then we have to make sure that loads first, before the device driver. So Linux has a mechanism for that. And we had to patch the drivers. Some drivers we patched to use these interrupts. And others we wrapped around to use these interrupts. Unfortunately, because of the top bit thing, everything has to share one interrupt within a function. Thankfully, we can fix that with an IOMMU because it can redirect interrupts. So we can say: “Oh, interrupt no. 0 goes to here, 1 goes to here, 2 goes to here…”. That’s great 'cause it's consecutive, right? 0 1 2 3 4 5… it’s obviously gonna have the same top bits. But we have to fix the ACPI table for that because it’s broken. But this does work. So this gets us interrupts that function and they’re individual. So let’s look at the check list: we have interrupts, timers, early serial, late serial with interrupts. We can get some user space, we can stash some user space and binaries into the kernel. And it will boot and you can get a console, but you get a console and you try writing commands and sometimes it hangs. Okay. What’s going on there? So it turns out that FreeBSD masks interrupts with an AMD proprietary register set. We had to clean that up, too. And that fixes serial, and all the other interrupts.
This took ages to find. It’s like: “why… interrupts on CPU serial sometimes don’t…, yeah”. I ended up dumping register sets, and I saw this #FFFFF here, not #FFFFF, what’s that? But tracking through this stack to find this was really annoying. Alright. So we have the basics. We have like a core platform we can run Linux on, even though it won’t do anything interesting. Add drivers! So we have USB xHCI which has three controllers in one device. Again, because “Let’s make it insane!”. We have SDHCI, that’s SDIO for the Wi-Fi and the Bluetooth. Needs a non-standard config, it needs quirks. Ethernet needs more hacks. It’s still partially broken, it only runs at Gigabit speed. If you plug in a 100Mbit/s switch it just doesn’t send any data. Not sure why. And then all of this worked fine in Linux 4.4, and then just three days ago I think I tried to rebase on 4.9, and so we have the latest and the greatest. And everything failed. And DMA didn’t work. And all the drivers were just throwing their hands up in the air, “what’s going on here?”. exhales Aeolia strikes back. So. That’s what… the Aeolia looks like, normally. So you have… again, it’s an ARM SoC, it’s really not a device. It’s like its own little system. But it maps its low 2 GB of the address space to memory on the PC. And then the PC has a window into its registers that it can use to control those devices. So the PC can kind of play with the devices, and the DMA is to the same address and that works great. Because it’s mapped in the same place. And then it has its own RAM, in its own address space. This works fine. But now we had an IOMMU. Because we needed it for the interrupts.
And the IOMMU inserts its own address space in between and says: “Okay, you can map anything to anything you want, that’s great.“ It’s a page table, you can say “this address goes to that address.” Linux 4.4 did this: it would find some addresses at the bottom of the IOMMU address space, say: “page 1 goes to this, page 2 goes to that, page 3 goes to that”. And say: “device, you can now write to these pages”. And they go to this place in the x86. That worked fine. It turns out Linux 4.9, or somewhere between 4.4 and 4.9 it started doing this: it would map pages from the top of the IOMMU address space and that’s fine for the IOMMU but it’s not in the window in the Aeolia, so you say “ethernet DMA to address FExxx”, and instead of DMA-ing to the RAM on the PC it DMA-s to the RAM on the Aeolia which is not gonna work. Effectively the Aeolia implements 31 bit DMA, not 32 bit DMA because only the bottom half is usable. It’s like why… this is all really fucked up, guys! Seriously. And this is littered all over the code in Linux, so it needed more patches, and it works, but, yeah. Painful. Okay. Devices, laying out (?) devices’ work. Now for something completely different. Who can tell me who this character is? That’s Starsha from Space Battleship Yamato. And apparently that’s the code name for the PS4 graphics chip. Or at least that’s one of the code names. Because they don’t seem to be able to agree on like what the code names are. It’s got “Liverpool” in some places, and “Starsha” in other places. Then “ThebeJ” in other places. And we think Sony calls it “Starsha” and AMD calls it “Liverpool” but we’re not sure. We are calling it “Liverpool” everywhere just to avoid confusion. Okay. What’s this GPU about? Well, it’s an AMD Sea Islands generation GPU, which is spelled CI instead of SI because “S” was taken. It’s similar to other chips in the generation. So at least that’s not a batshit crazy new thing.
But it does have quirks and customizations and oddities and things that don’t work. What we did is we took Bonaire which is another GPU that is already supported by Linux in that generation, and just kind of added a new chip and said, okay, do all the Bonaire stuff, and then change things. And hopefully adapt it to the PS4. So hacking AMD drivers, okay, well, they’re open-source but AMD does not publish register docs. They publish 3D shader and command queue documentation, so we get all the user space 3D rendering commands, that’s documented. But they don’t publish all the kernel hardware register documentation. That’s what we really want for hacking on drivers. So that’s annoying. And you’re thinking “the code is the documentation”, right? “Just read the Linux drivers”. That’s great. Yeah, but they’re incomplete, then they have magic numbers, and it’s, you know, you don’t know if you need to write a new register that’s not there, and it really sucks to try to write a GPU driver by reading other GPU drivers with no docs. So what do we do? We’re hackers, right? We google. Every time we need information, hopefully Google will find it because Google knows everything. And any tip that you could find in any forum or code dumped somewhere is great. One of the things we found is we googled this little string, “R8XXGPU”. And we get nine results. And the second result is this place, it’s “Siliconkit”, token, was that okay? It’s an XML file. And if we look at that it looks like it’s an XML file that contains a dump of the Bonaire GPU register documentation. But it’s like broken XML, and it’s incomplete, it stops at one point. But like: “what’s this doing here?” And where did this come from, right? So let’s dig a little deeper. Okay Google, what do you know about this website? Well, there’s some random things like whatthehellno.txt and whatthehellyes.txt and some Excel files. Those are really Excel-like XML spreadsheets. And then there’s a thing in the (?)
there called RAI.GRAMMAR.4.TXT. I wonder what that is. And it looks like it’s a grammar, being a notation description for a syntax, of some kind of register documentation file. This looks like an AMD internal format but it’s on this website. Okay. So we have these two URLs, /pragmatic/bonaire.xml and /RAI/rai.grammar4.txt. Let’s try something. How about maybe /pragmatic/bonaire.rai – nah, it’s a 404. Okay, /pragmatic/RAI/bonaire.rai – aah! Bingo! laughter and applause So this is a full – almost full Bonaire register documentation with like full register field descriptions, breakdowns, all the addresses. It’s not 100% but it covers the vast majority. This seems to be AMD-internal stuff. And I looked this guy up, and apparently he worked at AMD at some point. So… But yeah… This is really, really helpful because now you know what everything means, and debug registers, and… yeah. So I wrote a working parser for this format. This was effectively writing an XML parser, something like convert this thing to XML but it was all broken. Oh – he was writing it in PHP, by the way, so there you go … So I wrote a working one in Python and you can dump it and then you can see what each register means, and it’ll tell you all the options. You can take a register dump and map it to the (?)(?) documented. You can diff dumps, you can generate defines, it’s very useful for AMD GPUs. And this, grossly speaking, applies to a lot of AMD GPUs, like they share a lot of registers. So this is useful for anyone hacking on AMD GPU stuff. Over 4,000 registers are documented in the … just in the main GPU address space alone. That’s great. Okay. So we have some docs. How do we get to a frame buffer? So if you… Israel (?) is HDMI it’s easy, right? The GPU has HDMI, and if you query the GPU information you actually get that it has an HDMI port and a DisplayPort port. Okay, maybe it’s unconnected, that’s fine, right? But if you actually ask the GPU it tells you: “HDMI is not connected, DP is connected”.
Okay. Yeah, they have an external HDMI encoder from DisplayPort to HDMI because just putting a wire from A to B is too difficult, because this is Sony, so: “let’s put a chip that converts some protocol A to protocol B…” sighs Yeah, yeah. applause It’s a Panasonic DisplayPort to HDMI bridge, not documented by the way. It needs config to work, that’s why it doesn’t just work. Even though some bridges do. And you’d think, okay, it’s hooked up to the GPU I2C bus, because GPUs have in the past used these bridges, and, not this one particularly but other AMD cards have had various chips that they stuck in front. And the code has support for talking to them through the GPU I2C interface, right? That’s easy. Yay, you wish – it’s a Sony. sighs Enter ICC! So, remember the ICC thing in the Aeolia – it’s an RPC protocol you use to send commands to an MCU that is somewhere else on the motherboard. It’s a message box system, so you write some message to a memory place, and then you tell: “Hey, read this message!” and then it writes some message back, and it tells you “Hey, it’s the reply!”. The Aeolia – not the GPU – uses it for things like Power Button, the LEDs, turning the power on and off, and also the HDMI encoder I2C. So now we have a dependency from the GPU driver to the Aeolia driver, two different PCI devices and two different… sighs Yeah. And okay, again, ICC, but it’s I2C, you know, I2C is a simple protocol. You read a register, you write a register, that’s all you need. It’s super simple. Right? Now let’s make a byte code fucking scripting engine into which you put I2C commands and delays and bit masking and everything. And why, Sony, why, like why would you do this? Well, because ICC is so slow that if you actually tried to do one read and one write at a time it takes 2 seconds to bring up HDMI. exhales Yeah… I don’t even know at this point… applause I have no idea.
continued applause And by the way this thing has commands where you can send scripts in a script to be run when certain events happen. So “Yo dawg, I heard you like scripts, I put scripts in your scripts so you can I2C while you I2C”. Like: “let’s just go even deeper at this point”, right? Yeah. exhales Okay. We wrote some code for this, you need more hacks, it needs all DisplayPort lanes up, Linux tries to downscale, doesn’t work. Memory bandwidth calculation is broken. Mouse cursor size is from the previous GPU generation for some reason, I guess they forgot to update that. So wait! All this crap – we get a frame buffer. But X won’t start. Ah. Well, it turns out that PS4 uses a unified memory architecture so it has a single memory pool that is shared between the x86 and the GPU. And games just put a texture in memory and say: “Hey, GPU, render this!” and that works great. And this makes a lot of sense, and their driver uses this to the fullest extent. So there’s a VRAM, you know, the legacy… GPUs had a separate VRAM and all these integrated chip sets can emulate VRAM using a chunk of the system memory. And you can usually configure that in the BIOS if you have a PC that does this. And PS4 sets it to 16 MB which is actually the lowest possible setting. And 16 Megs is not enough to have more than one Full HD frame buffer. So, obviously, that’s going to explode in Linux pretty badly. So what we do is we actually reconfigure the memory controller in the system to give 1 GB of RAM to the VRAM, and we did it in ps4-kexec. So it’s basically doing like BIOSy things. We were reconfiguring the Northbridge at this point to make this work. But it works. And with this we can get X to start because it can allocate its frame buffer. But okay, it’s 3D time, right? – Neeaah, GPU acceleration doesn’t quite work yet. So we got at least, you know, X but let’s talk a bit about the Radeon GPU for a second.
So when you want to draw something on the GPU you send it a command and you do this by putting it into a ‘ring’ which is really just a structure in memory, that’s a (?)(?)(?)(?). And it wraps around. So that way you can queue things to be done in the GPU, and then it does it on its own and you can go and do other things. There’s a Graphics Ring for drawing, a Compute Ring for GPGPU, and a DMA Ring for copying things around. The commands are processed by the GPU Command Processor which is really a bunch of different CPUs inside the GPU. They are called F32. And they run a proprietary AMD microcode. So this is a custom architecture. Also the rings can call out to IBs which are indirect buffers. So you can say basically “Call this piece of memory, do this stuff there, return back to the ring”. And that’s actually how the user space thing does things. So this says: “Draw this stuff” and it tells the kernel: “Hey, draw this stuff”. And the kernel tells the GPU: “Jump to that stuff, read it, come back, keep doing stuff”. This is basically how most GPUs work but Radeon specifically works like, you know… with this F32 stuff. Okay. The driver complains: “Ring 0 test failed”. Technically (?), you test them, so at least you know it has a nice diagnostic, and how does the test work? It’s really easy. It writes a register with a value, and then it tells the GPU with a command “Please write this other value to the register”, runs it and then checks to see if the register was actually written with the new value. So the write doesn’t happen. Thankfully, thanks to that RAI file earlier we found some debug registers that tell you exactly what’s going on inside the GPU. And it shows the Command Processor is stuck, waiting for data in the ring, so it needs more data. After a NOP command?! Yeah… NOP is hard, let’s go stalling. So packet headers in this GPU thing have a size that is SIZE-2. Whoever thought that was a good idea. So a 2 word packet has a size of zero.
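The SIZE-2 encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field is taken to be 14 bits wide (an all-ones value is then 0x3FFF), and every name below is hypothetical rather than taken from the driver or the firmware.

```python
# Sketch of a "size minus 2" packet length field.
# Assumption: the field is 14 bits wide, so all ones is 0x3FFF.
# Function and constant names are hypothetical, for illustration only.

COUNT_BITS = 14
COUNT_MASK = (1 << COUNT_BITS) - 1  # 0x3FFF


def encode_count(total_words: int) -> int:
    """Encode a packet length (header included) as total_words - 2."""
    if total_words < 1:
        raise ValueError("a packet has at least one word")
    # For a 1-word packet this is -1, which wraps to all ones in the field.
    return (total_words - 2) & COUNT_MASK


def decode_count_unsigned(count_field: int) -> int:
    """How a parser that only knows 'field + 2' reads the length."""
    return count_field + 2


def decode_count_with_one_word(count_field: int) -> int:
    """How a parser that treats all ones as a 1-word packet reads it."""
    if count_field == COUNT_MASK:
        return 1
    return count_field + 2


if __name__ == "__main__":
    for words in (2, 1):
        field = encode_count(words)
        print(words, hex(field),
              decode_count_unsigned(field),
              decode_count_with_one_word(field))
```

A parser that only knows the unsigned reading sees a 1-word packet as one that is many thousands of words long, so it sits waiting for more ring data, which is the kind of stall the debug registers showed.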
Then AMD implemented a 1 word packet with a size of -1. And old firmware doesn’t support that and thinks: “Oh it’s 3FFF so I’m just gonna wait for a shitload of code in the buffer”, right? It turns out that Hawaii, which is another GPU in the same gen has the same problem with old firmware. So they use a different NOP packet, so there was an exception in the driver for this. And we had to add ours to that. But again – getting to this point, many, many, many hours of headbanging. Okay. We fixed that. Now it says: “Ring 3 test failed”. That’s the SDMA ring. That’s for copying things in memory and it works in the same way. It puts a value in RAM. It tells the SDMA engine: “hey, write a different value”. And checks. This time we see the write happens but it writes “0” instead of the 0xDEADBEEF or whatever. Okay. So I tried this. I put two Write commands in the ring saying: “Write to one place, write to a different place”. And this time what I saw is that it wrote “1” to the first destination and “0” to the second destination. I’m thinking: “Okay, it’s supposed to write 0xDEADBEEF…” which is what you see there, it’s… 0xDEADBEEF is that word with the value. It writes “1”. Well, there’s a “1” there that wasn’t there before, it was a “0”, because of this padding, right? So it turns out they have it off by four, in the SDMA command parser and it reads from four words later than it should. exhales Again, this took many hours of headbanging. It was like: “Randomly try two commands, oh, one, one?” – “One”. So it reads four words too late but only in ring buffers. Indirect buffers work fine. That’s good because those come from user space. So we don’t have to muck with those. We can work around this, because it’s only used in two places in the kernel, by using a Fill command instead of a Write command. That works fine. Again,… how do they even make these mistakes?! Okay. But still the GPU doesn’t work.
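The NOP packet problem from a moment ago is easier to see with the arithmetic written out. The following is a minimal sketch, not the actual firmware or driver logic: it assumes the length field is 14 bits wide (inferred only from the 0x3FFF value mentioned in the talk), it does not model where the field sits in the real header, and all function names are hypothetical.

```python
# Sketch of the "size minus 2" packet length field described in the talk.
# Assumption: the field is 14 bits wide, inferred from the 0x3FFF value mentioned.
# The field's position inside the real packet header is not modelled here.

COUNT_BITS = 14
COUNT_MASK = (1 << COUNT_BITS) - 1  # 0x3FFF


def encode_count(total_words):
    """Encode a packet's total length in words as SIZE-2, truncated to the field width."""
    return (total_words - 2) & COUNT_MASK


def decode_count_new(field):
    """Hypothetical newer firmware: all-ones is read as -1, i.e. a 1-word packet."""
    if field == COUNT_MASK:
        return 1
    return field + 2


def decode_count_old(field):
    """Hypothetical older firmware: no special case, so all-ones means a huge packet."""
    return field + 2


if __name__ == "__main__":
    one_word_nop = encode_count(1)
    print(hex(encode_count(2)))            # a 2-word packet encodes as zero
    print(hex(one_word_nop))               # a 1-word packet wraps around to all ones
    print(decode_count_new(one_word_nop))  # newer decoder: one word
    print(decode_count_old(one_word_nop))  # older decoder: expects thousands of words
```

A decoder without the special case ends up waiting for far more words than the ring contains, which matches the stall the debug registers showed: the Command Processor sitting there after a NOP, waiting for more data.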
The ring tests pass but if you try to draw you get a bunch of page faults. And it turns out that what happens is that on the PS4 you can’t write the page table registers from actual commands in the GPU itself. You can write to them from the CPU directly. You can say just: “Write memory – memory register write”, and then I’ll write. But you can’t tell the GPU: “Please write to the page table register this”. So the page tables don’t work, the GPU can’t see any memory, so everything is broken. Linux uses this, FreeBSD doesn’t. It uses direct writes. And we think this is maybe a firewall somewhere in the Liverpool, some kind of security thing they added. We can directly write from the CPU. But it like breaks the regular… like it’s not asynchronous anymore. So this could break things. And it’s a really hacky solution. I would really like to fix this. And I’m thinking: “Maybe the firewall is in the firmware, right?”. But it’s proprietary and undocumented firmware. So let’s look at that firmware. It’s a thing, it needs microcode, a CP thing. It’s undocumented. But we take the blobs out of FreeBSD. And that’s great because we don’t have to ship them. Let’s dig deeper into those blobs. So how do you reverse-engineer an unknown CPU architecture? That’s really easy, run an instruction and see what it did. And then just keep doing that. Thankfully, we can upload custom firmware, so it’s actually really easy to just have like a two-instruction firmware that does something, and then writes a register to a memory location. And that’s actually really easy to find. If you first like write the memory instruction, it’s really easy to find in the binary because you see like GPU register offsets that stand out a bit in one column. So long story short, we wrote F32DIS which is a disassembler for the proprietary AMD F32 microcode. I shamelessly stole the instruction syntax from ARM. So you may recognize that if you’ve ever seen an ARM disassembly.
And this is not complete but it can disassemble every single instruction in all the firmware in Liverpool for PFP, ME, CE, MEC and RLC which are five different blocks in the GPU. As far as I know that’s never been done before, all the firmware was like in a voodoo black magic thing that’s been shipped. Not even the non-AMD kernel developers know anything about this. So… applause ongoing applause And you can disassemble the desktop GPU stuff, too. So this could be good for debugging strange GPU shenanigans in non-PS4 stuff. Alright. Alas, it’s not in the firmware. It seems to be blocked in hardware. I found a debug register that actually says: “there was an access violation in the bus when you tried to write this thing”. And I tried a bunch of workarounds and I even bought an AMD APU system, desktop. Dumped all the registers, diff’ed them against the one I had on Linux and tried setting every single value from the other GPU and hoping I find some magic bits somewhere, but… no. They probably have a setting for this, somewhere, but it’s a sea of ones and zeros, good luck finding it. It does work with a CPU Write workaround, though. So, hey, at least we get 3D! And it’s actually pretty stable, so if there’s a race condition I’m not really seeing it. So – checklist! What works, what doesn’t work. We have interrupts, and timers – the core thing you need to run any OS – we have a serial port, we can shutdown the system and reboot, and you’ll think that’s funny but actually that goes through ICC, so again, at least some interesting code there. I actually just implemented that about four hours ago. Because pulling the plug was getting old. The Power button works. USB works. There’s a funny story with USB as it used not to work. And we said: “Fix it later, there seems to be special code missing.” And then someone pulled a repo from the USB-not-working branch, and tested it, and said: “It’s working!” It seems we fixed it by accident, by changing something else.
The hard disk works which is via the USB. Blu-ray works, I wrote a driver for that, also four hours ago. – Three hours ago now? Yeah, something like that. And I spent 20 minutes looking for someone in the Hackcenter that had a DVD I could stick in to try. Apparently I’m from the past if I ask for DVDs. But it does work. So that’s good. Wi-Fi and Bluetooth work. Ethernet works, except only at GBit speeds. Frame buffer works. HDMI works. It’s currently hard-coded to 1080p so… It does work. We can fix that by improving the encoder implementation. 3D works with the ugly register write hack. And SPDIF audio works. So that’s good. HDMI audio doesn’t work. Mostly because I only got audio grossly working, in general, recently, and I haven’t had a chance to program the encoder to support the audio stuff yet. Because, again, new more annoying hacks there. And the real-time clock doesn’t work and everything. That’s simple, the clock, that device is simple. But ever since the PS2 the way Sony has implemented real-time clocks is that instead of reading and writing the time on the clock, which is what you would think is the normal thing to do, they never write the time on the clock. Instead, they store an offset from the clock to the real time, in some kind of storage location. And there’s a giant mess of… …registry it’s called, in the PS4, and I don’t even know where it’s stored. It might be on the hard drive, it might be encrypted. So basically, getting the real-time clock to actually show the right time involves a pile of nonsense that I haven’t had the chance to look at yet. But… we have NTP, right? So it’s good enough. – Oh, and we have Blinkenlights! Important! The Power LED does some interesting things, if you’re on Linux. So that’s good. So – the code: you can get the ps4-kexec code on our Github page. That has the kexec and the hardware configuration, and the bootloader Linux stuff.
You can get the ps4 Linux branch which is the… our fork of the kernel, rebased on 4.9 which is the latest (?) version, I think. You can get our Radeon patches which are three, I think, really tiny patches for user space libraries just to support this new chip. Really simple stuff, the NOP thing, and a couple of commands. And the RAI and F32DIS thing I mentioned. You can get Radeon tools at that Github repo. Just pushed that right before the talk. So if you’re interested – there you go. And if you’re going after the RAI file, well, we wanna put you on a run before the guys at that website realize they really should take that down! But I’m sure the Internet Wayback Machine has it somewhere. Okay! That’s everything for the story of how we got Linux running on the PS4. And you can reach us at that website or fail0verflow on Twitter. applause Thank you! ongoing applause I hope that wasn’t too fast, sorry, I had to rush through my 89 slides a little bit because I really wanted to do a demo. I think this kind of is the demo, right. But we can try something else. So maybe I can shut this – so I can aim with my controller. This is really not meant as a mouse! That’s not Right Button. Come on! Yeah, I think it is… Close? Close! Maybe… So we have this little icon here. I wonder what happens if it works. Do we have internet access? Hopefully Wi-Fi works, let’s then just check real quick. keyboard typing sounds This could bork really badly if we don’t. keyboard typing sounds mumbles ping 8.8.8.8 Yeah, we have internet access. So, Wi-Fi works! Okay. I wonder what happens if we click that! It takes a while to load. This is not optimized for… laughter and applause marcan laughs So the CPUs on this thing are a little bit slow. But… sounds of the machine Hey, it works! And now it’s a real game console! laughter and applause And this is… there we go, okay. So I think we can probably take some Q&A because this is a little bit slow to load. But we can try a game, maybe.
Herald: If you are for Q&A I think there will be some questions. So shall we start with one from the internet. Signal Angel: Hey! The internet wants to know if most of your research will be published, or if stuff’s going to stay private. marcan: All of this… the publishing is basically the code which… and you know the explanation I just gave… I said that everything’s on Github. So all the drivers we wrote, all the… I mean… and in this case also the spec is the code. If you really want to I could write some Wiki pages on this. But roughly speaking, what’s in the drivers is what we found out. The really interesting bit, I think, is that F32 stuff from the AMD GPU stuff. And that we have a repo for. But if you have any general questions, or name a particular device, or any details, feel free to ask. I don’t know… again, it would be nice if we wrote a bunch of docs and everything. But it’s not really a matter of not wanting to write them, it’s lazy engineers not wanting to write documentation. But the code is at least… the things we have on Github are fairly clean. So. Herald: Okay, so, someone is piling up on 4. Guys, if you have questions you see the microphones over here. Just pile up over there and I’m gonna point… 4 please! Question: Just a small question. How likely is it that you upstream some of that stuff. Because… I mean… marcan: So there’s two sides to that. One side is that we need to actually get together and upstream it. The code… some of it has horrible hacks, some of it isn’t too bad. So we want to upstream it. We have to sit down and actually do it. I think most of the custom x86 based machine stuff and the kernel is doable. The drivers are probably doable. Some people might scream at the interrupt hacks. But it’s probably not terrible. And if they have a better way of doing it I’m all ears, there are other kernel devs. The Radeon stuff is quite fishy because of the encoder thing that is like (?) non-standard. 
And also understandably AMD GPU driver developers that work for AMD may want to have nothing to do with this. And in fact I know for a fact that at least one of them doesn’t. But they can’t really stop us from upstreaming things into the Linux kernel, right? So I think as long as we get to come to a state where it’s doable it’s fine. But most likely I think… laughter …I think most likely the non-GPU stuff will go in first if we have a chance to do that. And of course, if you wanna try upstreaming it go ahead! It’s open source, right? So. Herald: Over to microphone 1, please. Question: Hi. First I think I should employ you to try and find Trammell Hudson. And cajole him into using your FreeBSD kexec implementation in Heads. Instead of having to run all of Linux in it, as a joke. But my real question is: if the reason you used Gentoo was because systemd was yet another hurdle in getting this to run? laughter marcan laughs marcan: I run Gentoo on my main machine, I run Gentoo on most of the machines I care about. I do run Arch on a few of the others and there I live with systemd. But the reason why I run Gentoo is, first it’s what I like and use. And second it’s super easy to use patches on Gentoo. You get those things we put onto Github, which are just patch files, it’s not really a repo. Because they’re so easy it’s not worth cloning everything. Just get those patch files, stick them in /etc/portage/patches/, have a little hook to patch, and that’s all you need. So it’s really easy to patch packages in Gentoo, that’s one of the main reasons. laughs about something in audience Herald: No. 3 please! Question: Will there be new exploits, new way to boot Linux on PS4 with modern firmwares because finding one with firmware 1.76 is really rare. marcan: That was 4.05! Question: Ah, okay.
marcan: But again, our goal is to focus on… I just told you the story of the pre-exploit thing because I think that’s good like a hacker story, a good knowledge suite trying new platforms. And the Linux thing we’re working on. The reason why we don’t want to publish the exploit or really get involved in the whole exploit scene is that there is a lot of drama, it’s not rocket science in that it’s like super custom code, this is WebKit and FreeBSD. It’s actually not that hard. And we know for a fact that several people have reproduced this on various firmwares. So there’s no need for us to be the exploit provider. And we don’t want to get into that because it’s a giant drama fest as we all know, anyway. Please DIY it this time! Question: Okay. Thanks. Herald: And what is the internet saying? Signal Angel: The internet wants to know if you ever had fun with the BSD on the second processor. marcan: Oh, that’s a very good question. I myself haven’t. I don’t know if anyone else has looked at it briefly. One of the commands for rebooting will boot that CPU into FreeBSD. And there’s probably fun to be had there. But we haven’t really looked into it. Herald: And over to 5, please. Question: I was wondering if any of that stuff was applicable to the PS4 VR edition or whatever it’s called, the new one? Did you ever test it? marcan: Sorry, say it again! Question: Sony brought up a new PS4 I thought. marcan: Oh, the Pro you mean, the PS4 Pro? Question: Yes. marcan: So Linux boots on the Pro, we got that far. GPU is broken. So we would like to get this ported to the Pro and also working. It’s basically an incremental update, so it’s not that hard, but the GPU needs a new definition, new jBullet(?) stuff. Yeah, you get a lot of C frames down-burned (?), yeah… So, as you can see, 3D works, and, there you go! synth speech from game applause I only have to look up and down in this game! continued synth speech from game Herald: Well, then number 3, please. 
Question: I want to ask you if you want to port these Radeon patches to the new amdgpu driver because AMD now supports the Southern Islands GPUs? marcan: Yes, that’s a very good question. Actually, the first attempt we made at writing this driver was with amdgpu. And at the time it wasn’t working at all. And there was a big concern about its freshness at the time and it was experimentally supporting this GPU generation. I’m told it should work. So I would like to port this… move to amdgpu and we have a working implementation, and we got to clean up code much better, we know where all the nits are, I want to try again with amdgpu and see if that works. That’s a very good question because the newer gen might require the driver maybe, so … Question: Thank you. Herald: Well then I’m gonna guess we ask the internet again. Signal Angel: Okay, the internet states that about a year ago you argued with someone on Twitter that the PS4 wasn’t a PC and now you’re saying that it kind of is something. So what about that? marcan: So again, the reason for saying it’s not a PC is that it’s not an IBM Personal Computer compatible device. It’s an x86 device that happens to be structured roughly like a current PC but if you look at the details so many things are completely different. It really isn’t a PC. Like on Linux I had to define “sub arch PS4”. It’s an x86 but it’s not a PC. And that’s actually a very important distinction because there’s a lot of things you have never heard of that are x86 but not PC. It’s like e.g. there’s a high chance your monitor at home has an 80186 CPU in it. So, yeah. Herald: So nobody’s piling at the microphones any more. Is there one last question from the internet? Signal Angel: Yes, there is. The question is… …if there was any decryption needed. marcan: No. So this is purely… you exploit WebKit, you get user mode, you exploit the kernel, you got kernel mode.
You jump to Linux… there’s no security like… there’s nothing like stopping you from doing all that stuff. There’s a sandbox in FreeBSD but obviously you exploit around the sandbox. There’s nothing… there’s no hypervisor, there’s no monitoring, there’s nothing like saying: “Oh this code should not be running.” There’s no like integrity checking. They have a security architecture but as is tradition for Sony you can just walk around it. laughter applause The PS3 was notable for the fact that the PS Jailbreak which is a USB… it’s effectively a piracy device that was released by someone that basically used a USB exploit in the kernel and only a USB exploit in the kernel to effectively enable piracy. So when you have like a stack of security and you break one thing and you get piracy that’s a fail! This is basically the same idea. Except I have no idea what you do to do piracy and I don’t care. But Sony doesn’t really know how to architect secure systems. That’s it. Herald: That’s it, here we go, that’s your applause! applause postroll music subtitles created by c3subtitles.de in the year 2017. Join, and help us!