33C3 preroll music
Herald: You have been
here on stage before.
You successfully tampered with the Wii,
you successfully tampered
with the PS3 and got
some legal challenges over there?
marcan: Some unfounded
legal challenges, yes.
Herald: And then you fucked,
and excuse my French over here
– by the way, that is number 8021 to get
the translation on your DECT phone.
So you fucked with the Wii U as well.
“Console Hacking 2016”,
here we go!
marcan: I’m a lazy guy, so I haven’t
turned on my computer yet for the slides.
So let me do that,
hopefully this will work.
My computer is a little bit special.
It runs a lot of Open Source software.
It runs FreeBSD.
applause
It even has things like OpenSSL
in there, and nginx.
And Cairo I think, and WebKit. It runs a
lot of interesting Open Source software.
But we all know that BSD is dying, so
we can make it run something a little bit
more interesting. And hopefully
give a presentation about it.
Let’s see if this works.
It’s a good start, black screen, you know.
It’s syncing to disk
and file system shutting down.
There we go!
applause
continued applause
And yes, I run Gentoo Linux.
applause
This is the “Does Wi-Fi work?” moment.
Hopefully.
NTP, yeah, no… “NTP failed”. Well,
that’s a bit annoying, but it still works.
Hello? Yeah, it takes a bit to boot.
It doesn’t run systemd, you know.
It’s sane, it’s a tiny bit slower,
but it’s sane.
There we go.
applause
This is the “Does my controller
work?” moment.
Bluetooth in Saal 1.
Okay, it does.
Alright, so let’s get started.
So this is “Console Hacking 2016 –
PS4: PC Master Race”.
I apologize for the horrible Nazi joke in
the subtitle, but it’s a Reddit thing.
“PC Master Race”, why? Well.
PS4, is it a PC? Is it not a PC?
But before we get started,
I would like to dedicate this talk
to my good friend Ben Byer
who we all know as “bushing”.
Unfortunately, he passed away
in February of this year and he was
a great hacker, he came to multiple
congresses, one of the nicest people
I’ve ever met. I’m sure that some of you
who have met him would agree with that.
If it weren’t for him, I wouldn’t be here.
So, thank you.
applause
Alright. So, the PS4.
Is it a PC? Is it not a PC?
Well, it’s a little bit different
from previous consoles.
It has x86, it’s an x86 CPU.
It runs FreeBSD, it runs WebKit.
It doesn’t have a hypervisor,
unfortunately.
Then again, the PS3 had a hypervisor
and it was useless, so there you go.
So this is different from the PS3,
but it’s not completely different.
It does have a security processor
that you can just ignore because
it doesn’t secure anything.
So that’s good.
So how to own a PS4? Well, you write
a WebKit exploit and you write
a FreeBSD exploit, duh. Right?
Everything runs WebKit,
and FreeBSD is not exactly the
most secure OS in the world,
especially not with Sony customizations.
So this is completely boring stuff.
Like, what’s the point of talking about
WebKit and FreeBSD exploits?
Instead, this talk is going to be about
something a little bit different.
First of all, after you run an exploit,
well, you know, step 3 “something”,
step 4 “PROFIT”. What is this about?
And not only that, though.
Before you write an exploit, you usually
want to have the code you’re trying
to exploit. And with WebKit and FreeBSD
you kinda do, but not the build they use,
and it’s customized. And it’s annoying to
write an exploit if you don’t have access
to the binary. So how do you get
the binary in the first place?
Well, you dump the code,
that’s an interesting step.
So let’s get started with step zero:
black-box code extraction, the fun way.
A long time ago
in a hackerspace far, far away
fail0verflow got together
after 31c3.
And we looked at the PS4 motherboard
and this is what we saw. So there’s
an Aeolia southbridge, that’s a codename,
by the way. Then there’s the Liverpool APU
which is the main processor.
It’s a GPU and a CPU
which is done by AMD, and
it has some RAM. And then
the southbridge connects to a bunch
of random crap like the USB ports,
a hard disk, which is USB. For some
inexplicable reason the internal disk
on the PS4 is USB. Like it’s SATA to USB,
and then to USB on the southbridge.
Even though it has SATA,
like, what? laughs
The Blu-ray drive is SATA. The Wi-Fi
and Bluetooth are SDIO, and Ethernet is GMII.
Okay, how do we attack this?
Well, GDDR5…
What just…?
Oh. I have a screensaver, apparently!
That’s great.
laughter
I thought I killed that,
let me kill that screensaver real quick.
applause
Something had to fail, it always does.
I mean, of course I can
SSH into my PS4, right?
So there we go, okay.
Could have sworn I’d fixed that. Anyway…
Which one of these interfaces
do you attack? Well, you know,
USB, SATA, SDIO, GMII – that’s
the raw ethernet interface, by the way –
all these are CPU-controlled. The CPU
issues commands and the devices reply.
The devices can’t really do anything. They
can’t write to memory or anything like that.
You can exploit USB if you
find a bug in the USB driver,
but we’re back to the no-code issue.
GDDR5, that would be great,
we could just write to our memory
and basically own the entire thing.
But it’s a very high-speed bus.
It’s definitely exploitable.
If you were making a secure system
don’t assume we can’t own GDDR5,
because we will.
But it’s not the path of least resistance,
so we’re not gonna do that.
However, there’s a thing called
PCI Express in the middle there.
Hmm, that’s interesting!
PCIe is very fun for hacking –
even though it might seem intimidating –
because it’s bus mastering,
that means you can DMA to memory.
It’s complicated, and complicated things
are hard to implement properly.
It’s robust. People think that PCIe is this
voodoo-highspeed… No it’s not!
It’s high-speed, but you don’t need
matched traces to make it work.
It will run over wet string. You can hotwire
PCIe with pieces of wire and it will work.
At least at short distances anyway.
Believe me, it’s not as bad as you think.
It’s delay-tolerant, so you
can take your time to reply.
And the drivers are full of fail because
nobody writes a PCIe driver assuming
the device is evil even though of course
everybody should because devices can
and will be evil.
But nobody does that.
So, what can we do?
Well, we have a PCIe link,
let’s cut the lines and plug in the
southbridge to a PC motherboard
that we stick on the side. Now
the southbridge is a PCIe card for us.
And we connect the APU to an FPGA
board which then can pretend to be
a PCIe device. So we can man-in-the-middle
this PCIe bus and it’s now x1 width
instead of x4 because it’s easier that
way, but it will negotiate, that’s fine.
So how do we connect that
motherboard and the FPGA?
There’s of course many ways of doing this.
How many of you have done
any hardware hacking, even Arduino or
anything like that? Raise your hand!
I think that’s about a third to a half
or something like that, at least.
When you hack some hardware,
you meld some hardware,
after you blink an LED, what is the first
interface you use to talk to your hardware?
Serial port! So we run
PCIe over RS232 at 115 kBaud
which makes this PCIe…
laughter and applause
I said it was delay-tolerant!
So it makes this PCIe 0.00002x.
And eventually there was a
Gigabit ethernet port on the FPGA
so I upgraded to that, but I only got
around to doing it in one direction.
So now it’s PCIe 0.00002x in one direction
and 0.5x in the other direction
which has to make this one of the most
asymmetric buses in the world.
But it works, believe me.
This is hilarious.
We ran PCIe over a serial port. Also, we
were ASCII encoding, so half the bandwidth.
It works fine. It’s fine.
So, PCIe 101.
It’s a reliable packet-switched network.
It uses a thing called
“Transaction Layer Packets”
which are basically just packets you send.
It can be… Memory Read, Memory Write,
IO Read, IO Write,
Configuration Read, Configuration Write.
There can be a message-signaled interrupt
which is a way of saying: “Hey,
listen to me!” by writing
to an address in memory.
Devices can already write to memory,
so why not use writes for interrupts?
It has legacy interrupts
which are basically emulating the old
wire-low-for-interrupt-and-
high-for-no-interrupt thing,
you can tunnel that over PCIe.
And it has completions, which are
basically the replies. So if you read
a value from memory the completion
is what you get back with the value
you tried to read. So that’s PCIe,
we can just go wild with DMA.
We can just read all memory, dump
the kernel. Hey, it’s awesome, right?
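As an illustration of the packet types listed above, the following minimal Python sketch classifies a TLP from the first byte of its header, where the 3-bit Fmt and 5-bit Type fields live. The encodings follow the PCI Express Base Specification; the function and table names are made up for this example, and only the common types mentioned in the talk are covered.

```python
# First header byte of every TLP: byte0 = (Fmt << 5) | Type
TLP_TYPES = {
    (0b000, 0b00000): "Memory Read (32-bit address)",
    (0b001, 0b00000): "Memory Read (64-bit address)",
    (0b010, 0b00000): "Memory Write (32-bit address)",
    (0b011, 0b00000): "Memory Write (64-bit address)",
    (0b000, 0b00010): "IO Read",
    (0b010, 0b00010): "IO Write",
    (0b000, 0b00100): "Configuration Read (Type 0)",
    (0b010, 0b00100): "Configuration Write (Type 0)",
    (0b000, 0b01010): "Completion (no data)",
    (0b010, 0b01010): "Completion with Data",
}


def classify_tlp(byte0: int) -> str:
    """Name the TLP type encoded in the first header byte."""
    fmt = (byte0 >> 5) & 0b111
    typ = byte0 & 0b11111
    if typ >> 3 == 0b10:  # Type 10rrr is a message; rrr selects the routing
        # Fmt bit 1 set means the TLP carries a data payload
        return "Message with Data" if fmt & 0b010 else "Message"
    return TLP_TYPES.get((fmt, typ), "unknown")


print(classify_tlp(0x40))  # Memory Write (32-bit address)
print(classify_tlp(0x4A))  # Completion with Data
```

An MSI is just a Memory Write on the wire, so classify_tlp cannot tell it apart from ordinary DMA; only the target address gives it away.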
Except there’s an IOMMU in the APU.
The IOMMU protects memory from
the devices. It will only let you access
the memory that is mapped to your device.
So the host has to allow you
to read and write to memory.
But just because there’s an IOMMU
doesn’t mean that Sony uses it properly.
Here’s some pseudo-code,
it has a buffer on the stack, it says:
“please read from flash to this buffer”
with the correct length. Can anyone
see the problem with this code?
Well, it maps the buffer and it
reads and it unmaps the buffer.
But IOMMUs don’t just map
byte “foo” to byte “bar”,
they map pages, and
pages are 64k on the PS4.
So Sony has just mapped 64k
of its stack to the device so
it can just DMA straight into the stack,
basically the whole stack, and take over.
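The arithmetic behind the stack leak can be sketched as follows, assuming the 64 KiB IOMMU page size mentioned above and a made-up kernel stack address. The mapping is rounded out to whole pages, so a small buffer drags the rest of its page along with it.

```python
PAGE_SIZE = 64 * 1024  # IOMMU page size on the PS4, per the talk


def iommu_exposed_range(buf_addr: int, buf_len: int, page_size: int = PAGE_SIZE):
    """Return the [start, end) range a device can reach after the buffer is mapped.

    An IOMMU maps whole pages, so the range is rounded out to page boundaries.
    """
    start = buf_addr & ~(page_size - 1)
    end = (buf_addr + buf_len + page_size - 1) & ~(page_size - 1)
    return start, end


# Hypothetical 512-byte stack buffer sitting in the middle of a page.
buf, length = 0xFFFF_FF80_1234_5600, 512
start, end = iommu_exposed_range(buf, length)
print(f"requested {length} bytes, device can reach {end - start} bytes")
# requested 512 bytes, device can reach 65536 bytes
print(f"{buf - start} bytes below the buffer, {end - (buf + length)} above it")
# 22016 bytes below the buffer, 43008 above it
```

On x86 the stack grows downward, so the bytes above the buffer belong to the callers' frames, return addresses included.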
Now we got code execution, FreeBSD
kernel dump, and WebKit and OS libs dump,
just from mapping the flash.
Okay, that’s step zero.
We have the code.
But the PS4 we did this on
was a giant mess of wires.
Someone here knows about that,
you know, flying over on Facebook.
We won’t go into the ‘nice’ exploit.
We’ve done that, but, as I said, it’s
WebKit, FreeBSD, whatever.
What comes after that?
We want to do something.
Of course we want to run Linux, duh!
How do you go from FreeBSD to Linux?
It’s not a trivial process.
But you use something
that we call “ps4-kexec”.
So how does this work? It’s simple,
right? You just want to run Linux?
Just ‘jmp’ to Linux, right?
Well… kind of.
You need to load Linux into contiguous
physical RAM, set up boot parameters,
shut down FreeBSD cleanly, halt secondary
CPUs, make new pagetables etc.
A lot of random things. I’m not going to
bore you with this crap because you
can read the code. But there’s a lot
of iteration in getting this to work.
Let’s assume that you do all this magical
cleanup and you get Linux into
a nice state and you can ‘jmp’ Linux.
Now we jmp Linux, right? It’s cool.
Yeah, you can technically jmp to Linux,
and it will technically run
…for a little bit. And it will stop.
And you will not get any serial or any
video or anything. What’s going on here?
Let’s talk about hardware.
What is x86?
x86 is a mediocre instruction set
architecture by Intel.
It’s okay, I guess.
It’s not great.
PS4 is definitely x86, it’s x86-64.
What is a PC? Aah!
PC is a horrible, horrible thing
built upon piles and piles of legacy crap
dating back to 1981.
The PS4 is definitely -not- a PC.
That’s practically Sony-level hardware fail,
so it could be, but it’s not.
So what’s going on? A legacy PC
basically has an 8259 Programmable
Interrupt Controller,
an 8253 Programmable Interval Timer,
a UART at I/O 3f8h,
which is the standard address
for a serial port.
It has a PS/2 keyboard controller, 8042.
It has an RTC, a real-time clock
with a CMOS, and everyone
knows the CMOS, right?
MC146818 is the chip number for that. An
ISA bus – even if you think you don’t have
an ISA bus your computer has an ISA bus
inside the southbridge somewhere.
And it has VGA.
The PS4 doesn’t have -any- of these things.
So what do we do?
Let’s look a little bit how a PC works
and how a PS4 works. This is a general
simple PC system. There’s an APU
or an Intel Core CPU with a southbridge,
Intel calls it PCH, AMD FCH.
There’s an interface that is basically
PCIe although Intel calls it DMI and AMD
calls it UMI. DDR3 RAM and a bunch
of peripherals and SATA, whatever.
The PS4 kind of looks like that, right?
So you think this can’t be that dif…
What’s so hard about this?
Because all the crap I mentioned earlier
is in the southbridge on a PC, right?
The PS4 has a southbridge, right?
Right? Right? Umm… so
the southbridge, the AMD standard FCH
implements Intel legacy from 1981.
The Marvell Aeolia
– Marvell is the maker of the PS4
southbridge – implements Intel legacy
from 2002. What does that mean?
Ah! That’s no southbridge,
that’s a Marvell Armada SoC!
So it’s not actually a southbridge,
it was never a southbridge.
It’s an ARM system-on-a-chip CPU
with everything. It’s a descendant
from Intel StrongARM or XScale.
It has a bunch of peripherals.
And what they did is, they stuck
a PCIe bridge on the side and said: “Hey
x86, you can now use all my ARM shit.”
So it exposes all of its ARM peripherals
to the x86. They added some stuff
they really needed for PCs
and it has its own RAM.
Why do they do this? Well, it also runs
FreeBSD on the ARM in standby mode.
And that’s how they do the whole
“download updates in the background,
get content, update, whatever”.
All that crap is because they have
a separate OS on a separate chip running
in standby mode. Okay, that’s great, but
it’s also batshit insane.
laughter
Quick recap: This is what a
PCIe bus number looks like,
sorry, a device number.
It has a bus number, which is 8 bits,
a device number, which is 5 bits,
and a function number, which is 3 bits.
You’ve probably seen this in lspci
if you’ve ever done that.
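A small sketch of that layout, with hypothetical helper names: bus, device and function packed into 16 bits and printed in the bb:dd.f form lspci uses. The 3-bit function field is why one device tops out at eight functions.

```python
def pack_bdf(bus: int, device: int, function: int) -> int:
    """Pack a PCI(e) address: 8-bit bus, 5-bit device, 3-bit function."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def format_bdf(bdf: int) -> str:
    """Render a packed address the way lspci prints it: bb:dd.f in hex."""
    return f"{bdf >> 8:02x}:{(bdf >> 3) & 0x1F:02x}.{bdf & 0x7:x}"


print(format_bdf(pack_bdf(0x00, 0x14, 4)))  # 00:14.4
```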
This is what a regular southbridge
looks like. It has a USB controller,
a PCI, ISA bridge, SATA, whatever.
And it has a bunch of devices.
So one southbridge pretends
to be multiple devices.
Because you only have three bits
for a function number so you can only have
up to eight functions in one device.
Intel southbridge just says:
“I’m device 14, 16, 1a, 1…,
I’m just a bunch of devices,
and you can talk to all of them.”
If you lspci on a roughly unpatched
Linux kernel on the PS4
you get something like this.
So the Aeolia first of all
clones itself into every PCIe device
because they were too lazy to do
“if device equals my number then
reply, otherwise don’t reply”. No,
they just said: “Oh, just reply to every
single PCIe device that might query”.
Linux sees the southbridge 31 different
times, which is kind of annoying
because it gets really confused when it
sees 31 clones of the same southbridge.
And then it has eight functions:
ACPI, ethernet, SATA, SDMC, PCIe,…
Eight functions, so all three bits.
Turns out, eight functions
are not enough for everybody.
Function no. 4, “PCI Express Glue”, has a
bridge config, MSI interrupt controller,
ICC – we’ll talk about that later –,
HPET timers, Flash controller,
RTC, timers, 2 serial ports, I2C… All
this smashed into one single PCIe device.
Linux has a minimum system requirement
to run on anything.
You need a timer, you need interrupts,
and you need some kind of console.
The PS4 has no PIT, no PIC and no standard
serial so none of the standard PC stuff
is going to work here. The board has
test points for an 8250 standard serial
in a different place. So we run
DMESG over that, okay, fine.
Linux has earlycon which we can
point to a serial port and say:
“Please send all your DMESG here
very early because I really want to see
what’s going on”. Doesn’t need IRQs,
you set console=uart8250,
the type, the address, the speed.
And you’ll see it says 3200 instead of
115 kBaud. That’s because their clock
is different. So you set 3200 but
it really means 115k.
And that gets you DMESG.
That actually gets you “Linux booting,
uncompressing”, whatever.
That’s pretty good.
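The 3200-versus-115200 mismatch is ordinary 8250/16550 divisor arithmetic: the divisor latch holds clock / (16 × baud). A sketch, assuming the console code computes the divisor from the standard 1.8432 MHz PC UART clock; the implied PS4 UART clock printed at the end is an inference from the talk's numbers, not a documented value.

```python
def uart_divisor(uart_clock_hz: int, baud: int) -> int:
    """16550-style divisor latch value: the UART samples at 16x the baud rate."""
    return round(uart_clock_hz / (16 * baud))


def actual_baud(uart_clock_hz: int, divisor: int) -> float:
    """Baud rate a given divisor really produces on a given input clock."""
    return uart_clock_hz / (16 * divisor)


ASSUMED_CLOCK = 1_843_200  # standard PC UART clock the console code assumes
div = uart_divisor(ASSUMED_CLOCK, 3200)
print(div)  # 36

# If a divisor of 36 really yields 115200 baud, the hardware clock must be
# 36 times the standard one (an inference, not a documented PS4 value).
real_clock = 115_200 * 16 * div
print(real_clock, actual_baud(real_clock, div))  # 66355200 115200.0
```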
Okay, we need a timer.
Because otherwise everything explodes.
Linux supports the TSC, a built-in CPU
timer which is super nice and super fun.
And PS4 has that. But Linux tries to
calibrate it against the legacy timer
which on the PS4 doesn’t exist
so that’s fail.
So again, the PS4 -really- is not a PC.
What we need to do here is
define a new subarchitecture
because Linux supports this concept.
Says: “this is not a PC, this is a PS4”.
The bootloader tells Linux:
“Hey! This is a PS4!”
And then Linux says: “Okay, I’m not gonna
do the old timestamp calibration,
I’m gonna do it for the PS4” which has
a special code that we wrote
that calibrates against the PS4 timer.
And it disables the legacy crap.
So now this is officially
not a PC anymore.
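The idea behind calibrating the TSC against another timer can be shown generically: count TSC ticks over an interval whose length is measured by a timer of known frequency. This is not the code from the PS4 patches; the 10 MHz reference frequency and the sample counts below are hypothetical.

```python
def calibrate_tsc_hz(tsc_start: int, tsc_end: int,
                     ref_start: int, ref_end: int, ref_hz: int) -> int:
    """Estimate the TSC frequency in Hz from a reference timer of known frequency.

    Both counters are sampled at the start and end of the same interval;
    the elapsed time comes from the reference timer. Integer math only.
    """
    tsc_ticks = tsc_end - tsc_start
    ref_ticks = ref_end - ref_start
    return tsc_ticks * ref_hz // ref_ticks


# Hypothetical samples: 100,000 ticks of a 10 MHz timer (10 ms) against
# 16,000,000 TSC ticks.
print(calibrate_tsc_hz(0, 16_000_000, 0, 100_000, 10_000_000))  # 1600000000
```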
Now we can talk about ACPI.
You might know ACPI for all its
horribleness and all its evilness
and all its Microsoft-y-ness.
ACPI - most people associate it with
“Suspend” and “Suspend to Hibernate”.
It’s not just power,
it does other stuff, too.
So we need ACPI for PCI config,
for the IOMMU, for the CPU frequency.
The PS4 of course has broken ACPI tables
because of course they are.
So we fixed them in ps4-kexec.
Now interrupts. We have timers,
we have serial, we fixed some stuff.
The PS4 does message-signaled interrupts
which is, what I said, the non-legacy,
the nice new thing where you just write
a value, and what you do is you tell
the device when you want to interrupt
“please write this value to this address”.
The device does that, and the CPU
interrupt controller sees that write
and says: “Oh, this is an interrupt”
and then just fires off that interrupt
into the CPU. That’s great.
It’s super fast and very efficient.
And the value directly tells the CPU:
“That’s the interrupt vector you have
to go to”. Okay, that’s the standard MSI
way there. Your computer does MSI that way.
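On x86 the address and data of a standard MSI have a fixed layout, defined in Intel's Software Developer's Manual: the address lies in the 0xFEExxxxx window with the destination APIC ID in bits 19:12, and the low byte of the data is the vector. A sketch of the simplest case only (physical destination, fixed delivery, edge-triggered):

```python
MSI_ADDRESS_BASE = 0xFEE0_0000  # x86 MSI writes target the 0xFEExxxxx window


def msi_address(dest_apic_id: int) -> int:
    """Address for a physical-destination MSI: APIC ID goes in bits 19:12."""
    return MSI_ADDRESS_BASE | ((dest_apic_id & 0xFF) << 12)


def msi_data(vector: int) -> int:
    """Data for a fixed-delivery, edge-triggered MSI.

    Bits 7:0 hold the vector; delivery mode (bits 10:8) and trigger mode
    (bit 15) are left at zero, meaning "fixed" and "edge".
    """
    if not 0 <= vector <= 0xFF:
        raise ValueError("vector must fit in 8 bits")
    return vector


print(hex(msi_address(3)), hex(msi_data(0x41)))  # 0xfee03000 0x41
```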
This is how the PS4 does MSI: The Aeolia
ignores the MSI config registers
in the standard location. Instead, it
has its own MSI controller,
all stuff that’s in Function 4,
which is that “glue” device.
Each function gets a shared address in
memory to write to and the top 27 bits
of data. And every sub-function, because
many devices are crammed into one function,
only gets a different bottom 5 bits.
And all MSIs originate from Function 4,
so this device has to fire an interrupt,
then it goes to here, and then
that device fires an interrupt. Like… what…
this is all… what the hell is going on?
Seriously, this is really fucked up. And
– the i’s are missing in the front there.
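A hypothetical sketch of the Aeolia layout described above: one shared MSI address and data value per function, with only the low 5 bits of the data differing per sub-function. The base value is made up.

```python
SUBFUNC_BITS = 5
SUBFUNC_MASK = (1 << SUBFUNC_BITS) - 1  # 0x1F


def aeolia_msi_data(function_data: int, subfunction: int) -> int:
    """Top 27 bits come from the function's shared value, low 5 from the sub-function."""
    if not 0 <= subfunction <= SUBFUNC_MASK:
        raise ValueError("only 32 sub-functions fit in 5 bits")
    return (function_data & ~SUBFUNC_MASK & 0xFFFF_FFFF) | subfunction


base = 0x0000_4020  # hypothetical value programmed for one function
values = [aeolia_msi_data(base, sub) for sub in range(4)]
print([hex(v) for v in values])  # ['0x4020', '0x4021', '0x4022', '0x4023']
```

Read as standard x86 MSI data, where the low byte is the vector, this means all sub-functions of one function must land in the same aligned block of 32 vectors.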
But yeah. So, driver hell. Now the devices
are interdependent. Then the IRQ vector
location is not sequential, so that’s not
gonna work. And you need to modify
all the drivers. This is really painful to
develop for. So what we ended up doing
is there is a core driver that implements
an interrupt controller for this thing.
And then we have to make sure that loads
first, before the device driver. So Linux
has a mechanism for that. And we had to
patch the drivers: some drivers we patched
to use these interrupts directly, and others
we wrapped to use these interrupts.
Unfortunately, because of the top bits
thing, everything has to share one interrupt
within a function. Thankfully, we can fix
that with the IOMMU, because it can redirect
interrupts. So we can say:
“Oh, interrupt no. 0 goes to here,
1 goes to here, 2 goes to here…”.
That’s great, because these are consecutive, right?
0 1 2 3 4 5… they’re obviously gonna have
the same top bits. But we have to fix
the ACPI table for that because it’s
broken. But this does work. So this
gets us interrupts that work, and
they’re individual. So let’s look at
the check list: we have interrupts, timers,
early serial, late serial with interrupts.
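The interrupt-remapping fix mentioned above can be modeled as a lookup table: with remapping enabled, the IOMMU uses (part of) the MSI data as an index into a table the OS controls, and each entry names the CPU vector to raise. This toy version uses the whole data value as the index, which is a simplification; the real table format is defined in AMD's IOMMU specification.

```python
class InterruptRemapTable:
    """Toy model of IOMMU interrupt remapping: index written by the device
    -> vector chosen by the OS. Uses the whole MSI data value as the index."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def set_entry(self, index: int, vector: int) -> None:
        self.entries[index] = vector

    def deliver(self, msi_data: int) -> int:
        """Return the CPU vector raised for an MSI carrying this data value."""
        return self.entries[msi_data]


table = InterruptRemapTable()
# Consecutive indices 0..3 differ only in their low bits, so they satisfy the
# "shared top 27 bits" constraint, yet each can go to an unrelated vector.
for index, vector in enumerate([0x31, 0x5A, 0x47, 0x80]):
    table.set_entry(index, vector)

print([hex(table.deliver(i)) for i in range(4)])  # ['0x31', '0x5a', '0x47', '0x80']
```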
We can get some userspace: we can stash
some userspace binaries into the kernel.
And it will boot and you can get a console,
but you get a console and you try
writing commands and sometimes it hangs.
Okay. What’s going on there?
So it turns out that FreeBSD masks
interrupts with an AMD proprietary
register set. We had to clean that up,
too. And that fixes serial,
and all the other interrupts.
This took ages to find. It’s like: “why…
interrupts on CPU serial
sometimes don’t…, yeah”.
I ended up dumping register sets,
and I saw this #FFFFF here, not #FFFFF,
what’s that? But tracking through this
stack to find this was really annoying.
Alright. So we have the basics. We have
like a core platform we can run Linux on,
even though it won’t do anything
interesting. Add drivers!
So we have USB xHCI which has three
controllers in one device. Again, because
“Let’s make it insane!”. We have SDHCI,
that’s SDIO for the Wi-Fi and the Bluetooth.
Needs a non-standard config, it needs
quirks. Ethernet needs more hacks.
It’s still partially broken, it only runs at
Gigabit speed. If you plug in a 100Mbit/s
switch it just doesn’t send any data.
Not sure why.
And then all of this worked fine in
Linux 4.4, and then just three days ago
I think I tried to rebase on 4.9, and so
we have the latest and the greatest.
And everything failed. And DMA didn’t
work. And all the drivers were just
throwing their hands up in the air,
“what’s going on here?”.
exhales
Aeolia strikes back. So.
That’s what… the Aeolia looks like,
normally. So you have… again,
it’s an ARM SoC, it’s really not a device.
It’s like its own little system. But
it maps its low 2 GB of address space
to memory on the PC. And then the PC
has a window into its registers that it
can use to control those devices.
So the PC can kind of play with the
devices, and the DMA is to the same address
and that works great. Because it’s mapped
in the same place. And it has its own RAM,
in its own address space. This works fine.
But now we had an IOMMU. Because
we needed it for the interrupts. And the
IOMMU inserts its own address space
in between and says: “Okay, you can map
anything to anything you want, that’s great.“
It’s a page table, you can say “this
address goes to that address.”
Linux 4.4 did this: it would find some
addresses at the bottom of the IOMMU
address space, say: “page 1 goes to this,
page 2 goes to that, page 3 goes to that”.
And say: “device, you can now write to these
pages”. And they go to this place in the x86.
That worked fine. It turns out Linux 4.9,
or somewhere between 4.4 and 4.9
it started doing this: it would map pages
from the top of the IOMMU address space
and that’s fine for the IOMMU but it’s
not in the window in the Aeolia, so
you say “ethernet DMA to address
FExxx”, and instead of DMA-ing
to the RAM on the PC it DMA-s to the RAM
on the Aeolia which is not gonna work.
Effectively the Aeolia implements 31 bit
DMA, not 32 bit DMA because only
the bottom half is usable. It’s like why…
this is all really fucked up, guys!
Seriously. And this is littered all over
the code in Linux, so that needed
more patches, and it works, but, yeah.
Painful. Okay. So now the devices work.
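The 31-bit window problem can be sketched like this, using 4 KiB pages purely for illustration and a hypothetical pair of allocators that mimic the bottom-up and top-down behavior described above.

```python
DMA_WINDOW_BITS = 31                     # only the low 2 GiB reach x86 RAM
DMA_WINDOW_LIMIT = 1 << DMA_WINDOW_BITS  # 0x8000_0000
PAGE = 4096                              # illustrative page size


def reaches_host_ram(iova: int, length: int) -> bool:
    """True if a DMA buffer at this IOMMU address lies entirely inside the
    window the Aeolia forwards to the x86; anything above hits Aeolia RAM."""
    return iova + length <= DMA_WINDOW_LIMIT


def allocate_bottom_up(n_pages: int) -> list[int]:
    """Hypothetical allocator: page 1, page 2, ... from the bottom."""
    return [i * PAGE for i in range(1, n_pages + 1)]


def allocate_top_down(n_pages: int, space_bits: int = 32) -> list[int]:
    """Hypothetical allocator: pages from the top of a 32-bit space."""
    top = 1 << space_bits
    return [top - i * PAGE for i in range(1, n_pages + 1)]


print(all(reaches_host_ram(a, PAGE) for a in allocate_bottom_up(3)))  # True
print(all(reaches_host_ram(a, PAGE) for a in allocate_top_down(3)))   # False
```

In Linux, a driver normally declares such a limit with a DMA mask, for example dma_set_mask(dev, DMA_BIT_MASK(31)), so that the DMA layer only hands out addresses below 2 GiB; the talk does not say which mechanism the PS4 patches use.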
Now for something completely different.
Who can tell me who this character is?
That’s Starsha from Space Battleship Yamato.
And apparently that’s the code name
for the PS4 graphics chip. Or at least that’s
one of the code names. Because
they don’t seem to be able to agree
on like what the code names are.
It’s got “Liverpool” in some places, and
“Starsha” in other places. Then “ThebeJ”
in other places. And we think Sony calls
it “Starsha” and AMD calls it “Liverpool”
but we’re not sure. We are calling it
“Liverpool” everywhere just to avoid
confusion. Okay.
What’s this GPU about?
Well, it’s an AMD Sea
Islands generation GPU,
which is spelled CI instead of SI because
“SI” was already taken by Southern Islands. It’s similar to other chips
in the generation. So at least that’s
not a bat shit crazy new thing.
But it does have quirks and customizations
and oddities and things that don’t work.
What we did is we took Bonaire which is
another GPU that is already supported
by Linux in that generation, and just kind
of added a new chip and said, okay,
do all the Bonaire stuff, and then change
things. And hopefully adapt it to the PS4.
So hacking AMD drivers, okay, well,
they’re open-source but AMD does not
publish register docs. They publish 3D
shader and command queue documentation,
so we get all the user space 3D rendering
commands, that’s documented. But they
don’t publish all the kernel hardware
register documentation. That’s what
we really want for hacking on drivers. So
that’s annoying. And you’re thinking
“the code is the documentation”,
right? “Just read the Linux drivers”.
That’s great. Yeah, but they’re incomplete,
then they have magic numbers, and
it’s, you know, you don’t know if you need
to write a new register that’s not there,
and it really sucks to try to write a GPU
driver by reading other GPU drivers
with no docs. So what do we do? We’re
hackers, right? We google. Every time
we need information, hopefully Google will
find it because Google knows everything.
And any tip that you could find in any
forum or code dumped somewhere is
great. One of the things we found is we
googled this little string, “R8XXGPU”.
And we get nine results. And the second
result is this place, it’s “Siliconkit”,
token, was that okay? It’s an XML file.
And if we look at that it looks like
it’s an XML file that contains a dump of
the Bonaire GPU register documentation.
But it’s like broken XML, and it’s
incomplete, it stops at one point.
But like: “what’s this doing here?”
And where did this come from, right?
So let’s dig a little deeper. Okay Google,
what do you know about this website?
Well, there’s some random things like
whatthehellno.txt and whatthehellyes.txt
and some Excel files. Those are
really Excel XML spreadsheets.
And then there’s a thing in the (?) there
called RAI.GRAMMAR.4.TXT.
I wonder what that is. And it looks like
it’s a grammar, a notation describing
the syntax of some kind of register
documentation file. This looks like
an AMD internal format but it’s on this
website. Okay. So we have these two URLs,
/pragmatic/bonaire.xml
and /RAI/rai.grammar4.txt.
Let’s try something. How about maybe
/pragmatic/bonaire.rai – nah, it’s a 404.
Okay, /pragmatic/RAI/bonaire.rai – aah!
Bingo!
laughter and applause
So this is a full – almost full Bonaire
register documentation with like
full register field descriptions, breakdowns,
all the addresses. It’s not 100%, but
it covers the vast majority. This seems to
be AMD-internal stuff. And I looked
this guy up, and apparently he worked
at AMD at some point. So…
But yeah… This is really, really helpful
because now you know what everything
means, and debug registers, and… yeah.
He had also been writing a parser for this format,
something to convert this thing to XML,
but it was all broken. Oh, he was writing
it in PHP, by the way, so there you go…
So I wrote a working one in Python and
you can dump it and then you can see
what each register means, and it’ll tell
you all the options. You can take
a register dump and decode it against
the documentation. You can diff dumps,
you can generate defines, it’s very useful
for AMD GPUs. And this, roughly speaking,
applies to a lot of AMD GPUs, like they
share a lot of registers. So this is useful
for anyone hacking on AMD GPU stuff. Over
4,000 registers are documented in the…
just in the main GPU address space alone.
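The RAI format itself is not shown in the talk, so the following is only a sketch of what decoding and diffing register dumps looks like once register descriptions are available. The register name, address and fields are hypothetical, not real AMD definitions.

```python
# Hypothetical register description: address -> (name, {field: (lsb, width)}).
REGISTERS = {
    0x1234: ("EXAMPLE_CNTL", {"ENABLE": (0, 1), "MODE": (1, 3), "DIVIDER": (8, 8)}),
}


def decode(addr: int, value: int) -> dict:
    """Split a raw register value into its named bit fields."""
    name, fields = REGISTERS[addr]
    return {f"{name}.{field}": (value >> lsb) & ((1 << width) - 1)
            for field, (lsb, width) in fields.items()}


def diff_dumps(before: dict, after: dict) -> dict:
    """Compare two {addr: value} dumps field by field; report what changed."""
    changes = {}
    for addr in before.keys() & after.keys():
        old, new = decode(addr, before[addr]), decode(addr, after[addr])
        changes.update({k: (old[k], new[k]) for k in old if old[k] != new[k]})
    return changes


print(decode(0x1234, 0x0305))
# {'EXAMPLE_CNTL.ENABLE': 1, 'EXAMPLE_CNTL.MODE': 2, 'EXAMPLE_CNTL.DIVIDER': 3}
print(diff_dumps({0x1234: 0x0305}, {0x1234: 0x0307}))
# {'EXAMPLE_CNTL.MODE': (2, 3)}
```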
That’s great. Okay. So we have some docs.
How do we get to a frame buffer?
The output is HDMI, so it’s easy, right? The GPU
has HDMI, and if you query the GPU
information you actually get that it has
an HDMI port and a DisplayPort port. Okay,
maybe it’s unconnected, that’s fine, right?
But if you actually ask the GPU it tells
you: “HDMI is not connected, DP is connected”.
Okay. Yeah, they have an external HDMI
encoder from DisplayPort to HDMI because
just putting a wire from A to B is too
difficult, because this is Sony, so:
“let’s put a chip that converts some
protocol A to protocol B…” sighs
Yeah, yeah.
applause
It’s a Panasonic DisplayPort to HDMI
bridge, not documented by the way.
It needs config to work, that’s why it
doesn’t just work, even though some bridges do.
And you’d think, okay, it’s hooked up to the
GPU I2C bus, because GPUs have in the past
used these bridges, and, not this one
particularly but other AMD cards have had
various chips that they stuck in front. And
the code has support for talking to them
through the GPU I2C interface, right?
That’s easy. Yay, you wish – it’s a Sony.
sighs
Enter ICC! So, remember the ICC thing
in the Aeolia – it’s an RPC protocol you
use to send commands to an MCU that is
somewhere else on the motherboard. It’s
a message box system, so you write some
message to a memory place, and then you
tell: “Hey, read this message!” and then
it writes some message back, and it tells
you “Hey, it’s the reply!”.
The Aeolia, not the GPU, uses it for things like
Power Button, the LEDs, turning the power
on and off, and also the HDMI encoder I2C.
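A toy model of the request/reply mailbox described above. The class, flag names and message contents are hypothetical; the real ICC message format and notification mechanism are not described in the talk.

```python
from dataclasses import dataclass


@dataclass
class Mailbox:
    """Shared-memory mailbox: one request slot, one reply slot, two flags."""
    request: bytes = b""
    reply: bytes = b""
    request_ready: bool = False
    reply_ready: bool = False


def mcu_poll(box: Mailbox, handler) -> None:
    """What the far side (the MCU) does when it notices a pending request."""
    if box.request_ready:
        box.request_ready = False
        box.reply = handler(box.request)
        box.reply_ready = True


def icc_call(box: Mailbox, message: bytes, mcu) -> bytes:
    """Send one message and wait for the reply (no timeout, for brevity)."""
    box.request, box.request_ready = message, True  # "Hey, read this message!"
    while not box.reply_ready:  # real code would wait for an interrupt
        mcu(box)
    box.reply_ready = False  # "Hey, it's the reply!" has been consumed
    return box.reply


box = Mailbox()
reply = icc_call(box, b"LED on", lambda b: mcu_poll(b, lambda msg: b"ack:" + msg))
print(reply)  # b'ack:LED on'
```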
So now we have the dependency from the
GPU driver to the Aeolia driver, two different
PCI devices and two different… sighs
Yeah. And okay, again, ICC, but it’s I2C,
you know, I2C is a simple protocol.
You read a register, you write a register,
that’s all you need. It’s super simple.
Right? Now let’s make a byte code
fucking scripting engine to which you send I2C
commands and delays and bit masking
and everything. And why, Sony, why, like
why would you do this? Well, because
ICC is so slow that if you actually tried
to do one read and one write at a time
it takes 2 seconds to bring up HDMI.
exhales
Yeah…
I don’t even know at this point…
applause
I have no idea.
continued applause
And by the way this thing has commands
where you can send scripts in a script
to be run when certain events happen. So
“Yo dawg, I heard you like scripts, I put
scripts in your scripts so you can I2C
while you I2C”. Like: “let’s just go
even deeper at this point”, right? Yeah.
exhales
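Why a scripting engine helps: one ICC round trip can carry a whole batch of I2C operations instead of one. A sketch of a tiny interpreter with hypothetical opcodes, running against a fake register file; the real ICC script format is not described in the talk.

```python
# Hypothetical opcodes, not the real ICC script encoding.
WRITE, READ, DELAY_US, UPDATE_BITS = range(4)


class FakeI2CDevice:
    """Stand-in for the HDMI encoder's register file."""

    def __init__(self) -> None:
        self.regs: dict[int, int] = {}
        self.slept_us = 0


def run_script(dev: FakeI2CDevice, script) -> list[int]:
    """Execute a batch of I2C operations on the MCU side in one go."""
    results = []
    for op, *args in script:
        if op == WRITE:
            reg, val = args
            dev.regs[reg] = val
        elif op == READ:
            (reg,) = args
            results.append(dev.regs.get(reg, 0))
        elif op == DELAY_US:
            dev.slept_us += args[0]
        elif op == UPDATE_BITS:  # read-modify-write under a mask
            reg, mask, val = args
            dev.regs[reg] = (dev.regs.get(reg, 0) & ~mask) | (val & mask)
        else:
            raise ValueError(f"bad opcode {op}")
    return results


script = [(WRITE, 0x10, 0x01), (DELAY_US, 500),
          (UPDATE_BITS, 0x10, 0x0F, 0x06), (READ, 0x10)]
dev = FakeI2CDevice()
print(run_script(dev, script))  # [6]
```

The four operations in the example cost one round trip instead of four; repeated over the whole HDMI bring-up sequence, that is the difference the talk attributes to ICC's slowness.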
Okay. We wrote some code for this,
you need more hacks, it needs all
DisplayPort lanes up; Linux tries to use fewer lanes,
which doesn’t work. Memory bandwidth calculation
is broken. Mouse cursor size is from the
previous GPU generation for some reason,
I guess they forgot to update that. So
wait! All this crap – we get a frame buffer.
But X won’t start. Ah. Well, it turns out
that PS4 uses a unified memory architecture
so it has a single memory pool that is
shared between the x86 and the GPU.
And games just put a texture in memory
and say: “Hey, GPU, render this!” and
that works great. And this makes a lot of
sense, and their driver uses this to the
fullest extent. So there’s a VRAM,
you know, the legacy… GPUs had
a separate VRAM and all these integrated
chip sets can emulate VRAM using a chunk
of the system memory. And you can usually
configure that in the BIOS if you have
a PC that does this. And PS4 sets it to
16 MB which is actually the lowest possible
setting. And 16 Megs is not enough to have
more than one Full HD frame buffer. So,
obviously, that’s going to explode in
Linux pretty badly. So what we do is
we actually reconfigure the memory
controller in the system to give 1 GB
of RAM to the VRAM, and we did it in
ps4-kexec. So it’s basically doing like
BIOSy things. We were reconfiguring the
Northbridge at this point to make this work.
But it works. And with this we can get X
to start because it can allocate its frame buffer.
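Some rough arithmetic on the 16 MB limit, assuming 32-bit color. Two tightly packed 1920×1080 buffers take about 15.8 MiB, which technically fits but leaves about 184 KiB for everything else; with rows padded to 2048 pixels (a pitch alignment assumed here for illustration; real alignment rules depend on the GPU and tiling mode), two buffers no longer fit at all.

```python
MIB = 1024 * 1024
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4  # Full HD, 32-bit color (assumed)
CARVEOUT = 16 * MIB                             # PS4 default VRAM size, per the talk


def framebuffer_bytes(width: int, height: int, bpp: int, pitch_align: int = 1) -> int:
    """Size of one framebuffer; each row is padded to a multiple of pitch_align pixels."""
    pitch = -(-width // pitch_align) * pitch_align  # round width up
    return pitch * height * bpp


tight = framebuffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
padded = framebuffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, pitch_align=256)
print(tight, 2 * tight <= CARVEOUT)    # 8294400 True
print(padded, 2 * padded <= CARVEOUT)  # 8847360 False
```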
But okay, it’s 3D time, right? – Neeaah,
GPU acceleration doesn’t quite work yet.
So we got at least, you know, X but let’s
talk a bit about the Radeon GPU
for a second. So when you want to draw
something on the GPU you send it a command
and you do this by putting it into ‘ring’
which is really just a structure in memory
that wraps around.
So that way you can queue things to be done
in the GPU, and then it does it on its own
and you can go and do other things.
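A generic sketch of ring bookkeeping: the CPU writes commands at a write pointer, the consumer reads at a read pointer, and both wrap around. Real Radeon rings use hardware pointer registers and fixed sizes; this only illustrates the wrap-around logic.

```python
class CommandRing:
    """Single-producer ring of dwords with wrapping read/write pointers."""

    def __init__(self, size_dwords: int) -> None:
        self.buf = [0] * size_dwords
        self.size = size_dwords
        self.wptr = 0  # next slot the CPU will write
        self.rptr = 0  # next slot the consumer will read

    def free(self) -> int:
        # One slot stays empty so that wptr == rptr always means "empty".
        return (self.rptr - self.wptr - 1) % self.size

    def push(self, dwords) -> None:
        if len(dwords) > self.free():
            raise BufferError("ring full")
        for dw in dwords:
            self.buf[self.wptr] = dw
            self.wptr = (self.wptr + 1) % self.size
        # A real driver would now tell the hardware the new wptr.

    def pop_all(self) -> list[int]:
        """Consumer side: read everything between rptr and wptr."""
        out = []
        while self.rptr != self.wptr:
            out.append(self.buf[self.rptr])
            self.rptr = (self.rptr + 1) % self.size
        return out


ring = CommandRing(8)
ring.push([1, 2, 3, 4, 5])
print(ring.pop_all())  # [1, 2, 3, 4, 5]
ring.push([6, 7, 8, 9])  # wraps past the end of the buffer
print(ring.pop_all())  # [6, 7, 8, 9]
```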
There’s a Graphics Ring for drawing,
a Compute Ring for GPGPU, and a DMA Ring
for copying things around. The commands
are processed by the GPU Command Processor
which is really a bunch of different CPUs
inside the GPU. They are called F32.
And they run a proprietary AMD microcode.
So this is a custom architecture.
Also the rings can call out to IBs which
are indirect buffers. So you can say
basically “Call this piece of memory, do
this stuff there, return back to the ring”.
And that’s actually how the user space
thing does things. So this says:
“Draw this stuff” and it tells the kernel:
“Hey, draw this stuff”. And the kernel
tells the GPU: “Jump to that stuff,
read it come back, keep doing stuff”.
This is basically how most GPUs work but
Radeon specifically works like, you know…
with this F32 stuff. Okay. The driver
complains: “Ring 0 test failed”.
The driver tests the rings, so at least
you know it has nice diagnostics,
and how does the test work? It’s really
easy. It writes a register with a value,
and then it tells the GPU with a command
“Please write this other value
to the register”, runs it, and then checks
to see if the register was actually written
with the new value. Here, the write doesn’t
happen. Thankfully, with that RAI file
from earlier, we found some debug registers that
tell you exactly what’s going on inside
the GPU. And it shows the Command
Processor is stuck, waiting for data
in the ring, so it needs more data.
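The ring test just described can be modelled in a few lines. This is a toy sketch, not the radeon driver's code: the register offset, the "WRITE_REG" command tuple and the ToyGPU class are hypothetical, and 0xCAFEDEAD is simply an arbitrary starting value. A stalled command processor, like the one the debug registers showed, makes the test fail.

```python
SCRATCH_REG = 0x1000  # hypothetical scratch register offset


class ToyGPU:
    """Stand-in for the GPU: MMIO registers plus a queue of ring commands."""

    def __init__(self, cp_stalled: bool = False) -> None:
        self.regs: dict[int, int] = {}
        self.pending: list[tuple[str, int, int]] = []
        # True models a command processor stuck waiting for more ring data.
        self.cp_stalled = cp_stalled

    def mmio_write(self, reg: int, value: int) -> None:
        self.regs[reg] = value  # direct CPU write, bypassing the ring

    def mmio_read(self, reg: int) -> int:
        return self.regs.get(reg, 0)

    def submit(self, cmd: tuple[str, int, int]) -> None:
        self.pending.append(cmd)  # queue a command in the ring

    def step(self) -> None:
        if self.cp_stalled:
            return  # nothing in the ring ever gets executed
        for op, reg, value in self.pending:
            if op == "WRITE_REG":
                self.regs[reg] = value
        self.pending.clear()


def ring_test(gpu: ToyGPU, max_polls: int = 100) -> bool:
    gpu.mmio_write(SCRATCH_REG, 0xCAFEDEAD)             # 1. CPU writes a known value
    gpu.submit(("WRITE_REG", SCRATCH_REG, 0xDEADBEEF))  # 2. ask the GPU for another
    for _ in range(max_polls):                          # 3. poll for the new value
        gpu.step()
        if gpu.mmio_read(SCRATCH_REG) == 0xDEADBEEF:
            return True
    return False
```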
After a NOP command?! Yeah…
NOP is hard, let’s go stalling. So packet
headers in this GPU have a size field
that holds the size minus 2. Whoever thought
that was a good idea? So a 2-word packet
has a size of zero. Then AMD implemented
a 1-word NOP packet with a size of -1.
And old firmware doesn’t support that and
thinks: “Oh, it’s 0x3FFF, so I’m just gonna wait
for a shitload of code in the buffer”,
right? It turns out that Hawaii,
which is another GPU in the same generation,
has the same problem with old firmware.
So it uses a different NOP packet; there was
an exception in the driver for this,
and we had to add ours to that.
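Here is the size-minus-2 encoding as a sketch. The header layout (packet type in bits 31:30, count in bits 29:16, opcode in bits 15:8) and the NOP opcode 0x10 follow the PACKET3 definitions in the Linux radeon driver; the "new firmware" special case is modelled from the description in the talk, not from the actual microcode.

```python
PACKET_TYPE3 = 3
PACKET3_NOP = 0x10  # NOP opcode as defined in the Linux radeon driver headers


def packet3(opcode: int, count: int) -> int:
    """Build a PM4 type-3 header; count is (total packet dwords - 2)."""
    return (PACKET_TYPE3 << 30) | ((count & 0x3FFF) << 16) | ((opcode & 0xFF) << 8)


def packet_dwords_old_fw(header: int) -> int:
    """Packet length as a decoder that only knows the size-minus-2 rule sees it."""
    return ((header >> 16) & 0x3FFF) + 2


def packet_dwords_new_fw(header: int) -> int:
    """Same, but a NOP with count 0x3FFF is a 1-dword packet (assumed behaviour)."""
    count = (header >> 16) & 0x3FFF
    opcode = (header >> 8) & 0xFF
    if opcode == PACKET3_NOP and count == 0x3FFF:
        return 1
    return count + 2
```

For the 1-word NOP, `packet3(PACKET3_NOP, -1)`, the count field becomes 0x3FFF, so the old-firmware decoder expects 0x4001 (16385) dwords and sits waiting for data that never arrives, which is the stalled Command Processor seen in the debug registers.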
But again – getting to this point, many,
many, many hours of headbanging.
Okay. We fixed that. Now it says:
“Ring 3 test failed”.
That’s the SDMA ring. That’s for copying
things in memory and it works
in the same way. It puts a value in RAM.
It tells the SDMA engine: “hey, write
a different value”. And checks. This time
we see the write happens but it writes “0”
instead of the 0xDEADBEEF or whatever.
Okay. So I tried this.
I put two Write commands in the ring
saying: “Write to one place, write to
a different place”. And this time,
what I saw is that it wrote “1”
to the first destination and “0” to the
second destination. I’m thinking:
“Okay, it’s supposed to write 0xDEADBEEF…”
which is what you see there, it’s…
0xDEADBEEF is that word
with the value. It writes “1”.
Well, there’s a “1” there that
wasn’t there before, it was a “0”,
because of this padding, right? So it
turns out they have an off-by-four bug
in the SDMA command parser,
and it reads from four words later
than it should.
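A toy model of this bug. The packet format, opcode values and addresses below are made up for illustration and are not the real SDMA encoding. A single write followed by zero padding stores 0, matching the first symptom described above; the toy does not try to reproduce the exact “1” and “0” from the two-write experiment.

```python
# Toy packet format (hypothetical): [WRITE, dst, value]; NOP words are padding.
NOP = 0x0
WRITE = 0xA


def run_toy_sdma(ring: list[int], payload_skew: int = 0) -> dict[int, int]:
    """Execute toy WRITE packets; payload_skew models fetching data too late."""
    mem: dict[int, int] = {}
    n = len(ring)
    i = 0
    while i < n:
        if ring[i] == WRITE:
            dst = ring[(i + 1) % n]               # destination is parsed correctly
            src = (i + 2 + payload_skew) % n      # data word, possibly skewed
            mem[dst] = ring[src]
            i += 3
        else:
            i += 1  # skip NOP padding
    return mem
```

With `payload_skew=4` the data comes from padding four words past the real payload, so the destination receives 0 instead of 0xDEADBEEF.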
exhales
Again, this took many hours of
headbanging. It was like:
“Randomly try two commands, oh, one, one?”
– “One”.
So it reads four words too late but only
in ring buffers. Indirect buffers work fine.
That’s good because those come from user
space. So we don’t have to mess with those.
We can work around this, because it’s
only used in two places in the kernel,
by using a Fill command instead of a Write
command. That works fine. Again…
how do they even make these mistakes?!
Okay. But still the GPU doesn’t work.
The ring tests pass, but if you try
to draw, you get a bunch of page faults.
And it turns out that what happens is that
on the PS4 you can’t write the page table
registers from actual commands in the GPU
itself. You can write to them from the CPU
directly. You can just do a memory-mapped
register write, and it’ll write.
But you can’t tell the GPU:
“Please write this to the page table register”.
So the page tables don’t work, the GPU
can’t see any memory, so everything is broken.
Linux uses this, FreeBSD doesn’t. It uses
direct writes. And we think this is maybe
a firewall somewhere in Liverpool,
some kind of security thing they added.
We can directly write from the CPU.
But it like breaks the regular…
like it’s not asynchronous anymore. So
this could break things. And it’s a really
hacky solution. I would really like to fix
this. And I’m thinking: “Maybe the firewall
is in the firmware, right?”. But it’s
proprietary and undocumented firmware.
So let’s look at that firmware. The CP
needs microcode, and that microcode
is undocumented. But we take the blobs
out of FreeBSD. And that’s great because
we don’t have to ship them. Let’s
dig deeper into those blobs. So how do you
reverse-engineer an unknown CPU
architecture? That’s really easy,
run an instruction and see what it did.
And then just keep doing that. Thankfully,
we can upload custom firmware, so it’s
actually really easy to just have like
a two-instruction firmware that does
something, and then writes a register
to a memory location. And that’s actually
really easy to find. If you first look for
the memory write instruction, it’s really
easy to find in the binary because you see
GPU register offsets that stand out
a bit in one column. So long story short,
we wrote F32DIS which is a disassembler
for the proprietary AMD F32 microcode.
I shamelessly stole the instruction
syntax from ARM. So you may recognize
that if you’ve ever seen an ARM disassembly.
And this is not complete but it can
disassemble every single instruction
in all the firmware in Liverpool for PFP,
ME, CE, MEC and RLC which are five
different blocks in the GPU. As far
as I know, that’s never been done before;
all the firmware was voodoo black magic
blobs that just got shipped.
Not even the non-AMD kernel developers
know anything about this. So…
applause
ongoing applause
And you can disassemble the desktop
GPU stuff, too. So this could be good for
debugging strange GPU shenanigans
in non-PS4 stuff.
Alright. Alas, it’s not in the firmware.
It seems to be blocked in hardware.
I found a debug register that actually
says: “there was an access violation
in the bus when you try to write this
thing”. And I tried a bunch of workarounds
and I even bought an AMD APU desktop
system. Dumped all the registers,
diff’ed them against the one I had on Linux
and tried setting every single value
from the other GPU and hoping I find some
magic bits somewhere, but… no.
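The register-diffing approach can be sketched like this. The dumps are plain dicts mapping register offset to value, and `write_reg` and `works` are hypothetical callbacks standing in for real MMIO access and for whatever check (such as the page table register write) decides success.

```python
from typing import Callable


def diff_dumps(ours: dict[int, int], reference: dict[int, int]) -> list[tuple[int, int, int]]:
    """Return (offset, our_value, reference_value) for shared registers that differ."""
    return [
        (off, ours[off], reference[off])
        for off in sorted(ours.keys() & reference.keys())
        if ours[off] != reference[off]
    ]


def try_each_candidate(
    diffs: list[tuple[int, int, int]],
    write_reg: Callable[[int, int], None],
    works: Callable[[], bool],
) -> list[int]:
    """Apply one reference value at a time, test it, then restore our value."""
    hits = []
    for off, ours_val, ref_val in diffs:
        write_reg(off, ref_val)
        if works():
            hits.append(off)
        write_reg(off, ours_val)  # put the register back before the next attempt
    return hits
```

Trying one register at a time assumes a single magic setting; if the fix needs several registers changed together, this search comes up empty.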
They probably have a setting for this,
somewhere, but it’s a sea of ones and zeros,
good luck finding it. It does work with
the CPU write workaround, though.
So, hey, at least we get 3D! And it’s
actually pretty stable, so if there’s
a race condition I’m not really seeing it.
So – checklist! What works,
what doesn’t work. We have interrupts,
and timers – the core thing you need
to run any OS – we have a serial port,
we can shut down the system and reboot,
and you’ll think that’s funny but actually
that goes through ICC, so again,
at least some interesting code there.
I actually just implemented that about
four hours ago. Because pulling the plug
was getting old. The Power button works.
USB works. There’s a funny story with USB,
as it didn’t use to work. And we said:
“Fix it later, there seems to be special
code missing.” And then someone
pulled a repo from the USB-not-working
branch, and tested it, and said:
“It’s working!” It seems we fixed it by
accident, by changing something else.
The hard disk works, which is via USB.
Blu-ray works, I wrote a driver for that,
also four hours ago. – Three hours ago
now? Yeah, something like that.
And I spent 20 minutes looking for someone
in the Hackcenter that had a DVD I could
stick in to try. Apparently I’m from
the past if I ask for DVDs.
But it does work. So that’s good. Wi-Fi
and Bluetooth work.
Ethernet works, except only at GBit speeds.
Frame buffer works. HDMI works.
It’s currently hard-coded to 1080p so…
It does work. We can fix that
by improving the encoder implementation.
3D works with the ugly register write hack.
And SPDIF audio works. So that’s good.
HDMI audio doesn’t work. Mostly because
I only got audio working at all
fairly recently, and I haven’t had
a chance to program the encoder to support
the audio stuff yet. Because, again,
more annoying hacks there. And the
real-time clock doesn’t work.
That’s simple, the clock, that device is
simple. But ever since the PS2 the way
Sony has implemented real-time clocks
is that instead of reading and writing
the time on the clock, which is what you
would think is the normal thing to do,
they never write the time on the clock.
Instead, they store an offset from the clock
to the real time, in some kind of storage
location. And there’s a giant mess called
the registry in the PS4, and
I don’t even know where it’s stored.
It might be on the hard drive, it might be
encrypted. So basically, getting
the real-time clock to actually show the
right time involves a pile of nonsense
that I haven’t had the chance to look at
yet. But… we have NTP, right?
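The clock scheme described above boils down to a free-running hardware counter plus a stored offset. A minimal sketch, assuming times in whole seconds; the function names are hypothetical, and where the PS4 actually keeps the offset is exactly the unknown part.

```python
def wall_clock(rtc_seconds: int, stored_offset: int) -> int:
    """Real time = raw RTC counter + offset kept in separate storage."""
    return rtc_seconds + stored_offset


def set_wall_clock(rtc_seconds: int, desired_time: int) -> int:
    """Setting the time never touches the RTC; it only computes a new offset."""
    return desired_time - rtc_seconds
```

Without the stored offset, the raw counter alone does not give the right time, which is why NTP is the practical fallback here.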
So it’s good enough. – Oh, and we have
Blinkenlights! Important! The Power LED
does some interesting things, if you’re
on Linux. So that’s good.
So – the code: you can get the ps4-kexec
code on our Github page. That has
the kexec and the hardware configuration,
and the bootloader Linux stuff.
You can get the ps4 Linux branch, which is
our fork of the kernel,
rebased on 4.9, which is the latest
version, I think.
You can get our Radeon patches which are
three, I think, really tiny patches for
user space libraries just to support this
new chip. Really simple stuff, the NOP
thing, and a couple of commands. And the
RAI and F32DIS thing I mentioned.
You can get Radeon tools at that Github
repo. Just pushed that right before the talk.
So if you’re interested – there you go.
And if you want to get the RAI file, well,
you want to grab it before the guys
at that website realize they really should
take that down! But I’m sure the internet
wayback machine has it somewhere.
Okay! That’s everything for the story of
how we got Linux running on the PS4.
And you can reach us at that website
or fail0verflow on Twitter.
applause
Thank you!
ongoing applause
I hope that wasn’t too fast, sorry, I had
to rush through my 89 slides a little bit
because I really wanted to do a demo.
I think this kind of is the demo, right.
But we can try something else.
So maybe I can shut this –
so I can aim with my controller.
This is really not meant as a mouse!
That’s not Right Button.
Come on! Yeah, I think it is…
Close? Close! Maybe…
So we have this little icon here.
I wonder what happens if it works.
Do we have internet access? Hopefully
Wi-Fi works, let’s just check real quick.
keyboard typing sounds
This could bork really badly if we don’t.
keyboard typing sounds
mumbles ping 8.8.8.8
Yeah, we have internet access.
So, Wi-Fi works!
Okay. I wonder what happens
if we click that!
It takes a while to load.
This is not optimized for…
laughter and applause
marcan laughs
So the CPUs on this thing are
a little bit slow. But…
sounds of the machine
Hey, it works!
And now it’s a real game console!
laughter and applause
And this is… there we go, okay.
So I think we can probably take some Q&A
because this is a little bit slow to load.
But we can try a game, maybe.
Herald: If you’re up for Q&A, I think
there will be some questions.
So shall we start with one
from the internet.
Signal Angel: Hey! The internet wants to
know if most of your research will be
published, or if stuff’s
going to stay private.
marcan: All of this… the publishing is
basically the code which… and you know
the explanation I just gave… I said that
everything’s on Github. So all the drivers
we wrote, all the… I mean… and in this
case also the spec is the code.
If you really want to I could write some
Wiki pages on this. But roughly speaking,
what’s in the drivers is what we found
out. The really interesting bit,
I think, is that F32 stuff from the AMD
GPU stuff. And that we have a repo for.
But if you have any general questions, or
name a particular device, or any details,
feel free to ask. I don’t know… again, it
would be nice if we wrote a bunch
of docs and everything. But it’s not really
a matter of not wanting to write them,
it’s lazy engineers not wanting to write
documentation. But the code is at least…
the things we have on Github are fairly
clean. So.
Herald: Okay, so, someone is piling up
on 4. Guys, if you have questions
you see the microphones over here.
Just pile up over there
and I’m gonna point… 4 please!
Question: Just a small question.
How likely is it that you upstream
some of that stuff? Because… I mean…
marcan: So there’s two sides to that.
One side is that we need to actually
get together and upstream it. The code…
some of it has horrible hacks, some of it
isn’t too bad. So we want to upstream it.
We have to sit down and actually do it.
I think most of the custom x86-based
machine stuff in the kernel is doable.
The drivers are probably doable.
Some people might scream at the interrupt
hacks. But it’s probably not terrible.
And if they have a better way of doing it
I’m all ears, there are other kernel devs.
The Radeon stuff is quite fishy because of
the encoder thing that is non-standard.
And also understandably
AMD GPU driver developers
that work for AMD may want to have nothing
to do with this. And in fact I know
for a fact that at least
one of them doesn’t. But
they can’t really stop us from upstreaming
things into the Linux kernel, right?
So I think as long as we get to come
to a state where it’s doable it’s fine.
But most likely I think…
laughter
…I think most likely the non-GPU stuff
will go in first if we have a chance
to do that. And of course, if you wanna
try upstreaming it go ahead!
It’s open source, right? So.
Herald: Over to microphone 1, please.
Question: Hi. First, I think I should
employ you to try and find Trammell Hudson
and talk him into using your FreeBSD
kexec implementation in Heads,
instead of having to run all of Linux in it,
as a joke. But my real question is
whether the reason you used Gentoo was
because systemd was yet another hurdle
in getting this to run?
laughter
marcan laughs
marcan: I run Gentoo on my main machine,
I run Gentoo on most of the machines
I care about. I do run Arch on a few of
the others, and there I live with systemd.
But the reason why I run Gentoo is, first
it’s what I like and use. And second it’s
super easy to use patches on Gentoo.
You get those things we put onto Github,
which are just patch files, it’s not really
a repo. Because they’re so easy
it’s not worth cloning everything. Just
get those patch files, stick them in
/etc/portage/patches/, have a little hook to patch,
and that’s all you need. So it’s really
easy to patch packages in Gentoo,
that’s one of the main reasons.
laughs about something in audience
Herald: No. 3 please!
Question: Will there be new exploits,
new ways to boot Linux
on the PS4 with modern firmwares?
Because finding one
with firmware 1.76 is really rare.
marcan: That was 4.05!
Question: Ah, okay.
marcan: But again, our goal is to focus
on… I just told you the story of the
exploit because I think
that’s a good hacker story, and good
knowledge when trying new platforms.
And the Linux thing is what we’re working on.
The reason why we don’t want to publish
the exploit or really get involved in the
whole exploit scene is that there is
a lot of drama, and it’s not rocket science,
in that it’s not like super custom code;
this is WebKit and FreeBSD. It’s actually not
that hard. And we know for a fact
that several people have reproduced this
on various firmwares. So there’s no need
for us to be the exploit provider. And
we don’t want to get into that because
it’s a giant drama fest as we all know,
anyway. Please DIY it this time!
Question: Okay. Thanks.
Herald: And what is the internet saying?
Signal Angel: The internet wants to know
if you ever had fun with the BSD
on the second processor.
marcan: Oh, that’s a very good question.
I myself haven’t. I don’t know if anyone
else has looked at it briefly. One of the
commands for rebooting will boot
that CPU into FreeBSD. And there’s
probably fun to be had there.
But we haven’t really looked into it.
Herald: And over to 5, please.
Question: I was wondering if any of that
stuff was applicable to the PS4 VR edition
or whatever it’s called, the new one?
Did you ever test it?
marcan: Sorry, say it again!
Question: Sony brought out a new PS4,
I thought.
marcan: Oh, the Pro you mean,
the PS4 Pro?
Question: Yes.
marcan: So Linux boots on the Pro,
we got that far. GPU is broken. So we
would like to get this ported to the Pro
and also working. It’s basically an
incremental update, so it’s not that hard,
but the GPU needs a new definition,
new jBullet(?) stuff.
Yeah, you get a lot of C frames
down-burned (?), yeah…
So, as you can see, 3D works,
and, there you go!
synth speech from game
applause
I only have to look up and down in this game!
continued synth speech from game
Herald: Well, then number 3, please.
Question: I want to ask you if you want to
port these Radeon patches to the new
amdgpu driver because AMD now supports
the Southern Islands GPUs?
marcan: Yes, that’s a very good question.
Actually, the first attempt we made
at writing this driver was with amdgpu.
And at the time it wasn’t working at all.
And there was a big concern about its
freshness at the time and it was
experimentally supporting this GPU
generation. I’m told it should work.
So I would like to port this… move to
amdgpu. Now that we have a working
implementation, and we’ve cleaned up the
code much better, and we know where all
the nits are, I want to try again with
amdgpu and see if that works.
That’s a very good question because
newer generations might require that driver, so…
Question: Thank you.
Herald: Well then I’m gonna guess we ask
the internet again.
Signal Angel: Okay, the internet states
that about a year ago you argued
with someone on twitter that the PS4 wasn’t
a PC, and now you’re saying it kind of
is one. So what’s up with that?
marcan: So again, the reason for saying
it’s not a PC is that it’s not an IBM
Personal Computer compatible device.
It’s an x86 device that happens to
be structured roughly like a current PC
but if you look at the details
so many things are completely different.
It really isn’t a PC. Like on Linux I had
to define “sub arch PS4”. It’s an x86
but it’s not a PC. And that’s actually
a very important distinction because
there’s a lot of things you have
never heard of that are x86 but not PC.
It’s like, e.g., there’s a high chance
your monitor at home has
an 80186 CPU in it. So, yeah.
Herald: So nobody’s piling at the
microphones any more.
Is there one last question
from the internet?
Signal Angel: Yes, there is.
The question is…
…if there was any
decryption needed.
marcan: No. So this is purely… you
exploit WebKit, you get user mode,
you exploit the kernel, you got kernel
mode. You jump to Linux…
there’s no security like… there’s nothing
like stopping you from doing
all that stuff. There’s a sandbox in
FreeBSD, but obviously you exploit
around the sandbox. There’s nothing…
there’s no hypervisor, there’s no monitoring,
there’s nothing like saying: “Oh this code
should not be running.” There’s no
like integrity checking. They have a security
architecture, but as is tradition for Sony
you can just walk around it.
laughter
applause
The PS3 was notable for the fact that
the PS Jailbreak which is a USB…
it’s effectively a piracy device
that was released by someone
that basically used a USB exploit
in the kernel and only a USB exploit
in the kernel to effectively enable piracy.
So when you have like a stack of security
and you break one thing and you get
piracy that’s a fail! This is basically
the same idea. Except I have no idea what
you do to do piracy and I don’t care.
But Sony doesn’t really know how to
architect secure systems.
That’s it.
Herald: That’s it, here we go,
that’s your applause!
applause
postroll music
subtitles created by c3subtitles.de
in the year 2017. Join, and help us!