<i>rC3 opening music</i>

Jiska: Hello everyone and welcome to my
talk, Fuzzing the phone in the iPhone. The

phone in the iPhone is the component that
receives SMS, sends SMS, receives phone

calls, makes phone calls and also manages
your Internet connection when you are not

on Wi-Fi. However, you might now wonder,
what is it exactly? So I'm talking about

CommCenter and fuzzing it via the QMI and
ARI interfaces. But this is a bit too

technical for most of you. So I will first
introduce you to the concept of fuzzing in

general and protocol fuzzing before I dive
into further details. For those of you

who have not yet heard about the concept of
fuzzing - you can send a lot of random

messages and then try to test the security
of an interface with this. And in this

video, you can see how I send SMS over a
Frida-based fuzzer with something like 400

fuzzcases per second. And then the IMH
receives them, catches them and sends a

couple of them also to the smartphone.
Let's start with a motivation and an

explanation to the attacker model. So, if
you look into a modern smartphone, you

have two components if you want to show it
in a simple way. So first of all, there's

the hardware part with a lot of chips. And
then on top of this, there is an operating

system and applications. However, it's not
as simple as this because even those chips

are so complex that they run their own
little real-time operating systems to

preprocess data. So this means that you
can even get code execution on such a

chip. And this is usually much easier than
in the operating system itself, because

those chips cannot have that many
mitigations. However, so what do you even

do if you have code execution in such a
chip, so if you are in a baseband chip,

then one escalation strategy from the chip
towards the operating system might be to

manipulate traffic in the browser.
However, I don't think that this is the

case, because if you look at the Zerodium
price list, then actually the browser

exploits are much more expensive. So it's
probably not done like this. And there

must be other ways to escalate from this
chip into the operating system. In

general, the traffic manipulation is
something that you can always do in

wireless transmission or also on the
Internet. So if you look how those systems

work these days, so you have something
like the Internet in general that serves

websites and so on, and also the core
network of your mobile provider. And there

are many, many ways to manipulate traffic,
either if you are a state level actor who

is able to have something in the core
network or just by sending around websites

or modifying websites. And then there is
the base station subsystem, there might

also be dragons. We don't know exactly.
And of course, there are over-the-air

transmissions and wireless transmissions
are very special because, if there is

something just slightly broken in the
encryption, for example, then it's also

possible to manipulate traffic there, if
you have a software defined radio, for

example. So all of this could be attacked
to manipulate traffic. And I don't think

that for this, one would craft a baseband
exploit. Already in 2014 at the CCC, there

have been two talks about the SS7 protocol,
which is run in the core network and is

actually meant to connect different
mobile carriers to each other. And this

can also be used to intercept phone calls,
for example. And this also has been

exploited recently. So even though, there
have been some mitigations, etc. since

then, it's still exploited for the same
purpose to spy on people. So really,

really, really, baseband exploits only
exist to escalate from the chip into the

operating system. But now the question is,
what are the strategies? So if it's not

via the browser, what else could it be? So
the browser really I'm sure it is not,

because also you need to have some
traffic and so on, it doesn't really work

instantly, you need to visit the website
to replace traffic on a website and so on.

There must be something else. So
if you are on the chip with remote code

execution and want to go into the
operating system, there is some interface.

And this means that something in those
interfaces needs to be exploitable, so

that you can escalate the privileges from
the chip into the system. And also, those

interfaces are very interesting from a
reverse engineer's perspective. So even if

you don't want to attack anything, just
understanding how they work, is also a

goal of this work. So, for example, if you
have a baseband debug profile, you can

just download it onto your iPhone and then
you open idevicesyslog, you can

already see a lot of management messages
that are exchanged between the chip and

the iPhone. And if you have a jailbreak
and Frida, you can even inject packets or

modify packets to change the behaviour of
your modem. But if you want to start to

work on such a thing, the question is
like, how do you even start? Where do you

start? And fuzzing is actually a method
that can be used to understand such an

interface. So initially, if you identified
an interface, just to check if it is the

correct interface, so, if it really
changes behaviour, if you flip some bytes,

but also how powerful this interface is.
So what are the features? What breaks

instantly? And if things break, also you
can check if the whole interface has been

designed with security in mind. Now, let's
start with an introduction to wireless

protocol fuzzing, this will also be a
short rant because the current tooling for

fuzzing is usually not made to fuzz a
protocol. So let's start with a very

simple fuzzer, a fuzzer that is just an
image parser. So, you browse your

smartphone for unicorn pictures or PNGs or
JPEGs, and then you send them to the image

parser and in the image parser you might
be able to observe which functions are

executed in the form of basic blocks. And
then, during this initialization, the

image parser can even report which parts
were executed, and you can just run it

again and again with different images and
get this basic block coverage back. In a

next step, you can then combine existing
images or flip bits in these images and

send them to the image parser and again
observe the coverage, most of the time, it

won't generate any new coverage. So you
just say you are not looking into this

image in particular, but sometimes you
might get new coverage, like here, and

then you add this image to your corpus. So
over time, you can increase your corpus

and increase your coverage. Another method
can be, if you know how exactly an image

format looks like, so you might know the
JPEG specification and because of this,

you could just generate images that are
more or less specification compliant and

they look more artificial like this. So
you just generate images and send them to

the image parser and at some point you
might observe a crash. So that also

depends, again, on your harnessing. Maybe
you can observe basic blocks, maybe you

can just observe crashes and then you know
at which image you had a crash. You might

even be able to combine these two
approaches just depending on what you know

about your input and how you can harness
your target. Now it looks a bit different

for a protocol. So, in a protocol, you can
have a very complex state. Let's say you

are in an active phone call or just
something like, you receive an SMS. You

can actually force the iPhone to receive
SMS, if you have a second iPhone and send

SMS. And then during the fuzzing, you can
replace some bits and bytes, like this and

then you would have a modification. So
this is a very simple approach and it

preserves the state. So no matter how
complex the thing is, that you're

currently doing, it's very simple to flip
a bit here and there in an active

interaction. But it's also a bit annoying,
because you need to have these active

phone calls, etc. So something that's more
efficient is injection. So you would

observe certain messages and then just send
them again - and then you don't even need

the second phone to make calls, etc., -
you can just send a lot, a lot, a lot of

data. And this is the effect, when your
iPhone goes di-di-di-di-dimm or something

because of all the notifications and all
the data that is sent. But the issue here is

that this does not preserve state. So
there might be actions where the iPhone

requests something that is then answered.
So, the iPhone might request, for example,

a date and only then the chip would reply
with a date and only then the iPhone would

accept a date. But it's still very
interesting to do this, even though you

cannot reach certain states, because you
can do this without a SIM card and you can

do this very, very fast. So, just to
summarize the issues here: if you fuzz the

wireless protocol, you can have very
significant state differences and just

injecting packets cannot reach all
states. The fact, that you cannot reach

all states also shows in very simple stuff
like a trace replay. So a trace of

something that you record. So let's say I
have an active phone call, I record all

the packets, and I can also observe the
coverage. So, with Frida, you can observe

coverage on an iPhone while the phone call
is active. And then, in a second step, you

would do some injection. But the only
thing that you can inject are the packets

sent from the baseband to the smartphone,
not the opposite direction. And this

results usually in much less coverage. So
you are missing a lot of things due to a

missing state. And even worse, if you do
the same thing again, you might be in a

different state, and you might observe a
different coverage. So you do the exact

same thing, but you get different
coverage. So, even replaying recorded

messages results in less or inconsistent
coverage. Anyway, let's take a look into

an injection example. So, in this video,
you can see how I'm in the Unicorn Network

on an iPhone 8, which has obviously 5G,
but also does a lot of fuzzing and in the

fuzzing, what is interesting is, that you
might do a lot of states in a combination

that are not usually possible, like you
have a lost network connection while you

have to confirm a pin or you have a
network connection during this, etc. To

summarize my rant, some states cannot be
reached solely by injecting packets. So,

even if we have a very good corpus and do
very good mutations, we might miss

80% of the code, but we can just fuzz
anyway. But we need to keep in mind, that

some stuff is just not fuzzable. We looked
into a lot of wireless protocols and have

seen more in the past, so it's worth
also considering which tooling we already

had available for fuzzing protocols. The
most advanced tooling, that we have, is

Frankenstein and it's built by Jan. So,
what Jan did is, he emulated the firmware

and attached it to a virtual modem and
also a Linux host. For this, he first

looked into the firmware, that's here, and
we had some partial symbols for this and

also some information about registers.
Then, Frankenstein is actually taking a

snapshot, that you can see here, including
some of those registers of the modem. And

with this, you can build a virtual modem
and fuzz input as if it would come over

the air. Then Frankenstein also emulates
the whole firmware, including thread

switches. So it gets into very complex
states and it's even attached to a Linux

host. So, it also fuzzes a bit of Linux
while actually fuzzing the firmware

itself. Now, the issue with this is that
baseband firmware is usually 10 times the

size of Bluetooth firmware or even more,
and we don't have any symbols for this, so

it's a lot of work to customize this. And
even if, one would do all those steps and

put all the work into this, it's only, so
to say, code execution in the baseband.

It's not yet a privilege escalation into
the operating system. The next interesting

tooling was built by Steffen and what
Steffen did, he built a fuzzer based on

DTrace and AFL. DTrace is a tool that can
provide function-level coverage in the

macOS kernel and user space. With some
modifications you can even get basic

block coverage in the user space, which is
required for AFL to work. So, in the end,

you have AFL or AFL++ as a fuzzer on any
program on macOS. It's even slightly

faster than Frida, at least the version
that he used. And he gets a couple of

thousand fuzz cases per second, even on a
very old iMac. So, in our lab, we just had

an old iMac 2012 for this and it works on
this. But the issue is, that Wi-Fi and

Bluetooth, which he fuzzed, are very complex
protocols, so he couldn't find any new

bugs with AFL. And also, in the kernel
space, you only get this function level

coverage. He still, despite not finding
any bugs in Wi-Fi or Bluetooth, got a CVE,

because DTrace also has bugs. So, at least
some finding. But on iOS, this is not

supported out of the box. So it might be
possible to get DTrace working with some

tweaks, but it's a lot of work. So
probably it's easier to just use Frida in

the iOS user space. Also during this, so
while Steffen was building all this very

advanced tooling, Wang Yu found issues in
the macOS Bluetooth and Wi-Fi drivers, and

so he was very, very successful in
comparison to us. That's really a pity.

And I think, what he did, is much better
state modelling, so, of how the messages

interact and what is important to reach
certain functions. So what is still left?

So, usually fuzzing the baseband means
that you need to modify firmware or also

emulate firmware, you need to implement
very complex specifications on a software

defined radio if you want to fuzz over the
air or build proof of concepts. And for

everything that's somewhat proprietary,
you need to do protocol reverse

engineering, so you can spend a lot of
time and money just to do very, very basic

research. Or, you can also use Frida, so
you can fuzz with Frida and all you need

to do for this is, write a few lines of
code in JavaScript. So I kid you not. The

option is Frida. Dennis, a thesis student
whom I advised, was the first in our team

to build a Frida-based fuzzer,
and it's called ToothPicker. It's based on

Frizzer and Radamsa. So what it does is,
well, it hooks into these connections or

into the protocols of the Bluetooth
daemon, you could also think of this upper

part here, as one block. So the protocols
are implemented in the Bluetooth daemon,

but we want to fuzz certain protocol
handlers. And to increase the coverage, he

creates a virtual connection. So a virtual
connection holds a connection and pretends

to the Bluetooth daemon that there would
be an active connection to a device. And

of course, the chip would then say, I
don't know anything about this connection.

So, there are also some abstractions in
here, so that the connection is not

terminated. So, that's a very simple tool,
but it really found a lot of bugs and

issues and even there were some issues in
the protocols themselves that also apply to

macOS. So it's not just iOS bugs, but also
protocol bugs in macOS that Dennis found.

And this really got me thinking,
because ToothPicker runs with only 20

fuzz cases per second, so it's really,
really slow and we were still able to find

Bluetooth vulnerabilities at this speed.
So, why is this? So first of all, if you

try to fuzz Bluetooth over the air, then
the over-the-air connections are

terminated after something like five
invalid packets. So, over-the-air fuzzing

is really, really inefficient. And with
Frida you can actually patch these

functions, so it's gone. Then the
virtual connections are a very important

factor. So they are really, really
important for having coverage. It's still

a lot of coverage that we missed during
replay and fuzzing. But it's

really an advantage compared to the
other fuzzing approaches where you just

inject packets. And in addition, there is
an issue here, because if you have a

virtual connection, it might be that this
virtual connection triggers behaviour,

that you cannot reproduce over the air.
So, that means that everything that you

find, you need also to confirm that it
works over the air. At least the

inconsistent coverage is also fixed in
ToothPicker, because ToothPicker

replays all packets five times in a row.
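
The replay idea can be sketched in a few lines of Python. This is a minimal illustration and not ToothPicker's actual code: `run_once` is a hypothetical callback that injects one packet and returns the basic-block addresses that were hit, and keeping only the blocks seen in every run (the intersection) is an assumption about how the five replays could be combined.

```python
from typing import Callable, FrozenSet, Iterable

Coverage = FrozenSet[int]


def stable_coverage(run_once: Callable[[bytes], Iterable[int]],
                    packet: bytes,
                    repeats: int = 5) -> Coverage:
    """Replay one packet `repeats` times and keep only blocks hit in every run."""
    runs = [frozenset(run_once(packet)) for _ in range(repeats)]
    return frozenset.intersection(*runs)


def make_flaky_target() -> Callable[[bytes], Iterable[int]]:
    """Toy target (hypothetical): the first call hits one extra block."""
    calls = {"n": 0}

    def run_once(packet: bytes) -> Iterable[int]:
        calls["n"] += 1
        blocks = {0x1000, 0x1010}
        if calls["n"] == 1:
            blocks.add(0x2000)  # state-dependent block, seen only once
        return blocks

    return run_once


if __name__ == "__main__":
    cov = stable_coverage(make_flaky_target(), b"\x01")
    print(sorted(hex(b) for b in cov))
```

The state-dependent block is filtered out, so the same packet reports the same coverage. As noted in the talk, this does not help with bugs that need a sequence of different packets.
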
But the issue here is that it also means

that if you have a sequence of packets,
that is like generating a certain bug -

so you need multiple packets - this is
nothing that the mutator is aware of and

also nothing that's logged properly in
ToothPicker. And because of this, I got a

bit anxious. Maybe we missed a 
lot of things? So once I got the

intuition that we are actually missing
certain state information, I had the idea

to replace bytes in active connections.
And this is one part of it that you can see

on a keyboard, so I'm just replacing bytes
on keyboard input and see what happens.
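
Replacing bytes in an active connection can be sketched as follows. This is an illustration in Python under stated assumptions, not the Frida script from the talk: `replace_bytes` is a hypothetical helper that would be called on each intercepted packet, and the replacement rate is an arbitrary choice.

```python
import random


def replace_bytes(packet: bytes, rng: random.Random, rate: float = 0.01) -> bytes:
    """Overwrite each byte with a random value with probability `rate`.

    The packet length never changes, so the surrounding connection state
    (established by the real devices) stays intact.
    """
    out = bytearray(packet)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a run can be repeated
    original = bytes(range(32))
    print(replace_bytes(original, rng, rate=0.1).hex())
```

Because only bytes of real traffic are overwritten, the fuzzer inherits whatever state the live connection is in, at the cost of needing someone to keep that connection busy.
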

And I let this run for a couple of weeks,
also for different protocols and so on to

see, if there are further bugs or not that
we didn't find previously. So here you

can see the same for AirPods with SCO and
then they produce cracking sounds for the

replaced bytes. It's even worse for ACL, so
actual music, because then you can hear

very noisy chirps. I let this fuzzer run
for multiple weeks and it didn't find

any bugs that ToothPicker hadn't
discovered before. So, I think the reason

for this is that I mainly fuzzed in active
connections like the one with the audio

or the keyboard, but I only fuzzed a few
active pairings because this requires me

to actually perform those pairings by
hand, so, nothing really interesting. The

only bad thing that I could produce with
this, but not worth a CVE, is that the

sound quality of my AirPods is now
really, really bad. Well, OK. And also the

Broadcom chips on iOS don't check the UART
lengths, but that's not that bad. So, I

mean, if you consider that they removed
the write-RAM recently, then you might now

still be able to write into the RAM via UART
buffer overflows. But yeah, nothing too

interesting. So after all of this, I asked
myself: "What is still left for fuzzing if

we cannot find any new Bluetooth or Wi-Fi
bugs?" Well, the iPhone baseband - or

actually the iPhone basebands, because
there are two. The first variant of iPhone

baseband, that you can get, are Qualcomm
chips, and they are in the US devices. They

use the Qualcomm MSM Interface. And this
interface comes with some documentation

and there are even open source
implementations for it. So it's something

that's probably easy to understand and
easy to fuzz. On the other hand, almost

all devices that I had on my table had
Intel chips. Intel has been recently

bought by Apple, at least the part that
does the baseband chips and these are the

chips in the European devices, that's
the reason why almost all my devices had

Intel chips. And they use a special
protocol. It's called Apple Remote

Invocation. And if you search for this on
the Internet, I even checked it like

just today, there are no Google hits at
all. So it really hasn't been researched

before, at least not publicly. It's
completely undocumented and it's a very

custom interface. So it's not even used
for Android. It's really an interface

just for Apple. The component that we are
going to fuzz in the following is CommCenter.

So CommCenter is the equivalent of, for
example, the Bluetooth or Wi-Fi daemon,

but for telephony. It's sandboxed as the
user "wireless", but it comes with a lot of

XPC interfaces. And this is something
that we will also see later in the

fuzzing results. The next part is that
there are two flavors of libraries, so

depending on if you have a Qualcomm or an
Intel chip, different libraries will be

used before certain actions or data
actually is then processed by the

CommCenter itself. So we have different
code paths here. But all of this runs in

user space, and this means that both
libraries can be hooked with Frida and can

be fuzzed with Frida. So that's very
interesting. There is still a lot of stuff

that goes on in the kernel. So what you
can see here is that QMI and ARI have some

management information that is sent to
CommCenter, but they don't contain the

raw network or audio data. So they don't
contain your phone call, they don't

contain your website that you are opening.
And the next issue is that QMI and ARI

are not directly sent over the air, but
what is sent over the air are normal

baseband interactions and these generate
QMI and ARI messages. So there's still

some section in between, but of course,
there are now two ways: either you have

interaction that you can do over the air,
that directly causes ARI and QMI messages,

which then cause an
issue in the upper layers. Or you might

have this full exploit chain requirement
that you first need to exploit the chip

over the air, and then from the chip
break the interface into the CommCenter.

Now, QMI, the code has a lot of
assertions. So it's really asserting

everything about the protocol, like the
TLV format and so on, and if anything goes

wrong, it really terminates CommCenter.
So if you just send one invalid packet,

CommCenter is terminated. This doesn't
matter a lot because if your protocol is

stable and you usually don't send any
invalid packets, then you know an attack

is ongoing, so it's valid to terminate
the CommCenter. And furthermore, it

doesn't matter a lot to the user. So the
worst thing that happens when CommCenter

crashes, for example, while you have an
active phone call, it's just that the

phone call gets lost or your LTE
connection is re-established. So you don't

really notice it. It just feels like your
Internet connection breaks for a short

moment. In contrast, there is the ARI
protocol, and this is the part that just

works very, very, very differently. So
whatever it's getting, it just parses it,

and it doesn't terminate CommCenter.
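
The difference between the two styles can be sketched with two toy parsers. This is only an illustration: the TLV layout used here (1-byte type, 2-byte little-endian length, then the value) is a simplifying assumption, and raising `ProtocolViolation` merely stands in for terminating the daemon. Neither function is taken from CommCenter.

```python
import struct
from typing import List, Tuple

TLV = Tuple[int, bytes]


class ProtocolViolation(Exception):
    """Stands in for 'assert and terminate the daemon'."""


def parse_tlvs_strict(buf: bytes) -> List[TLV]:
    """Assertion-heavy style: any malformed TLV aborts everything."""
    tlvs, off = [], 0
    while off < len(buf):
        if len(buf) - off < 3:
            raise ProtocolViolation("truncated TLV header")
        tlv_type, length = struct.unpack_from("<BH", buf, off)
        off += 3
        if length > len(buf) - off:
            raise ProtocolViolation("TLV length exceeds packet")
        tlvs.append((tlv_type, buf[off:off + length]))
        off += length
    return tlvs


def parse_tlvs_lenient(buf: bytes) -> List[TLV]:
    """Keep-going style: keep what parsed, silently drop the rest."""
    tlvs, off = [], 0
    while len(buf) - off >= 3:
        tlv_type, length = struct.unpack_from("<BH", buf, off)
        off += 3
        if length > len(buf) - off:
            break  # ignore the broken remainder and carry on
        tlvs.append((tlv_type, buf[off:off + length]))
        off += length
    return tlvs


if __name__ == "__main__":
    good = b"\x01\x02\x00AB"
    bad = good + b"\x02\xff\x00X"  # second TLV claims 255 bytes, has 1
    print(parse_tlvs_lenient(bad))
    try:
        parse_tlvs_strict(bad)
    except ProtocolViolation as exc:
        print("strict parser gave up:", exc)
```

For a fuzzer, the lenient style means the target survives thousands of invalid packets, so fuzzing can continue without restarting the process after every test case.
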
So you can send many, many,

many fancy things and it just
continues, continues, continues,

because the developers were probably very,
very happy once they got their special

protocol for Apple working and then they
never touched it again. But what does it

look like? So it has a very basic format,
also with some TLVs, and the first

thing that I noticed when I fuzzed it is
that in idevicesyslog, it always

complained about this sequence number
being wrong. So it just said I expected

the follow-up sequence number, so and so.
So I started to fix this. And if you open

it in IDA, you can see that the range
that is expected is between zero and

0x7ff hexadecimal. So you know it is
the range and then it gets weird. So the

sequence number is spread over three
different bytes in single bits and

shifted around and so on. And it's not
even continuous. So very weird code.
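
To give a feeling for what such a scattered field looks like in code, here is a small Python sketch. The bit positions in `SEQ_BITS` are invented purely for illustration, since the talk does not give the real layout. Only the 11-bit range (0 to 0x7ff) and the spread over three bytes come from the talk, and the wrap-around in `next_seq` is also an assumption.

```python
from typing import List, Tuple

SEQ_MAX = 0x7FF  # 11-bit range, as observed in the disassembly

# HYPOTHETICAL layout: (byte offset, bit index) for sequence bits 0..10.
# Deliberately non-contiguous; the real ARI positions are not given here.
SEQ_BITS: List[Tuple[int, int]] = [
    (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7),
    (1, 0), (1, 1),
    (2, 6), (2, 7),
]


def read_seq(header: bytes) -> int:
    """Gather the scattered bits into one integer."""
    value = 0
    for i, (byte, bit) in enumerate(SEQ_BITS):
        value |= ((header[byte] >> bit) & 1) << i
    return value


def write_seq(header: bytes, seq: int) -> bytes:
    """Scatter `seq` into the header without touching unrelated bits."""
    if not 0 <= seq <= SEQ_MAX:
        raise ValueError("sequence number out of range")
    out = bytearray(header)
    for i, (byte, bit) in enumerate(SEQ_BITS):
        out[byte] &= ~(1 << bit) & 0xFF
        out[byte] |= ((seq >> i) & 1) << bit
    return bytes(out)


def next_seq(seq: int) -> int:
    """Assumed wrap-around after 0x7ff."""
    return (seq + 1) & SEQ_MAX


if __name__ == "__main__":
    hdr = write_seq(b"\x00\x00\x00", 0x555)
    print(hdr.hex(), hex(read_seq(hdr)))
```

A fuzzer would call something like `write_seq` on every packet just before injection, so that mutated packets still carry the expected follow-up number.
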

Probably they just added those
sequence numbers to confirm some race

conditions or something. I really don't
know. Or out-of-order packets? Something

weird going on there. But I wrote the
code, I fixed the sequence number and

then during the replay of packets, I
noticed, well, it doesn't even matter! So

no matter if your sequence number is valid
or invalid, parsing continues and even

worse, even packets with a wrong sequence
number are parsed. Probably because

otherwise there would be too many issues,
because the protocol implementation is too

buggy. And there are also a couple of
other things, so, for example, if you send

the first four magic bytes wrong or a
wrong length or something, then the

packet is potentially ignored. But parsing
continues and CommCenter is not terminated

like in QMI. Since it's a proprietary
protocol, there is currently no tooling

available. But, Tobias is working on a
Wireshark dissector and once he finishes

his thesis, it will also be publicly
released. So you need to wait a while, but

then you will have a tool for this.
Anyway, let's also talk about fuzzing

this. I would not recommend fuzzing
this, because you might brick your device

or at least get into weird states. So
just don't do this on your production

iPhone. I mean, obviously, I know what
I'm doing, so, yeah, just fuzzing packets,

right? But I'm not so sure about what
exactly I'm doing, so the only direction

that I fuzz is from the baseband to the
iPhone here, not the opposite direction.

So I hopefully do prevent anything weird
on the chip, right? But the iPhone might

still answer with something invalid and
this might confuse the baseband or cause

other crashes. And so I actually had to
call for help, like mimimimimi, I broke my

iPhone - I mean, just one of my research
devices - but still so it booted into

pongoOS but no longer into iOS and it
didn't tell me any debug message that was

useful. Well, it turns out, at least
under Qualcomm chips, and that's where

this happens, it just boots after a
couple of hours again. But before it's

just entering a boot loop and on the
Intel iPhones I also almost bricked an

iPhone 8, but luckily it didn't
completely break. So the issue there is if

you enable the baseband debug profile,
then it writes a lot of stuff to the ISTP

files, so that is some debug format of
Intel, and every few minutes it just

creates something like 500MB of data, at
least on the iPhone 8. On the newer

iPhones, this debug format is a bit
shorter, so it doesn't create as much

data, but still a lot. And if you don't
delete this regularly, then of course

your disk will be full and an iPhone
behaves quite strange if it has a full

disk. So you can still interact with the
user interface, but you can no longer

delete photos because deleting a photo, it
seems, it just needs some file

interaction. Also, you can no longer log
in with SSH, which is also an issue

because it somehow seems to create a file
when logging in, so you can no longer

delete any files. And I was just
rebooting the iPhone after trying a couple

of things and luckily it came back and
deleted some files and I was able to log

in and removed the baseband logs. But be
careful when doing this. And of course,

all the iPhones are very confused from
the fuzzing. So they really lose

everything about their identity and
location and they want to be activated

again. So here you can see a smartphone
that lost its location and really wants

to be activated, activated, activated.
During SMS fuzzing, you might even get

Flash messages. And if you click on the
head menu on dark theme, they are

displayed black on gray, so probably
nobody ever tested it. Also great if you

have a locked iPhone, you can still
display SIM menus and SIM messages on top

of the lock. OK, so I guess I have to
revise my first instruction. So fuzz this!

Really, really fuzz this! It's a lot of
fun. Maybe just not on your primary

device, but you will enjoy fuzzing these
interfaces. But first of all, you

obviously need to build a fuzzer, so how
do you build a fuzzer? The first fuzzer

that I used was the one that I also used
for Bluetooth that just uses the

existing bytestream protocol and then
flips single bits and bytes. So it has

this high state-awareness. But it also
means that like some kind of monkey I was

just calling myself, writing SMS to
myself, enabling flight mode, everything

that you could just imagine. And it's a
very boring task. But it also found very

fancy bugs that I couldn't reproduce with
the other fuzzers yet, because it can

reach states that just injection of
packets cannot reach. So at least it was

quite successful. And I fuzzed with
this for something like three days and

already found bugs. That's very
different from the Bluetooth fuzzers, so

there seemed to be more bugs in
CommCenter. And so I just wrote to Apple

PR: "Hey there, I wrote this really,
really ugly 10-lines-of-code fuzzer and

see what it found. Awesome, awesome,
awesome! And crash logs are attached. And

obviously this is simple to reproduce
because I only fuzzed for three days. Got

most of these crashes multiple times.
Yeah. So here you go. Enjoy my fuzzer."

And this was probably quite
stupid because it's not that simple. So

it's really not easy to reproduce the
crashes. First of all, well, of course

this script is so generic that it runs on
all iPhones with an Intel chip, so no

matter if I take an iPhone 7 or an iPhone
11, it will just work. But the crash logs

that you get are very different depending
on if you fuzz on a pre-A12, so iPhone 7

and 8, or on later versions like the iPhone 11
and SE2. So you cannot reproduce the same

crash logs that easily. And also it depends
a lot on the SIM. So even on a passive

iPhone, if you don't do any phone calls
and so on, you would get different

results. So I started my fuzzing actually
with a Singaporean SIM card

without any data contract or phone
contract on top of it and already found a

couple of things. But it might just 
behave very differently on just a slightly

different configuration. Anyway, let's
listen to a null pointer that it found. And

this null pointer has been fixed in iOS
14.2 and it's in the audio controller, so

you can hear some loop going on there.
What you can see here is me calling the

Deutsche Telekom and so on. So they have
this very important text.

Announcement: Guten Tag, und herzlich
willkommen beim Kundenservice der Telekom.
[Hello, and a warm welcome to Telekom customer service.]

Jiska: And then I call again and have a
crash. And now let's listen to the crash.

<i>Telekom jingle starts playing,
 final part loops ten times</i>

Jiska: Just for the sound effect, I also recorded
another one, so this one is with ALDI TALK.

Announcement: Guten Tag, ALDI TALK gibt
die Senkung der Mehrwertsteuer vom ersten...
[Hello, ALDI TALK is giving the VAT reduction from the first...]

Jiska: And now let's listen to a special
offer by ALDI TALK.

In 3, 2, 1... di-dimm...

Announcement: Guten Tag, ALDI TALK gibt die
Senkung der Mehrwertsteuer vom

<i>loops ten times</i>
erst-erst-erst-erst-erst-erst-erst-erst-erst-er

Jiska: Since these first fuzzing results
were very promising, I decided to use

the latest ToothPicker version and extend
it for fuzzing ARI and I called it

ICEPicker because the Intel chips are also
called ICE. So I just cloned Dennis'

latest ToothPicker alpha, which is very,
very unstable, but this one actually

runs on the iPhone locally without any
interaction with macOS or Linux. So it

doesn't need to exchange any payload
via USB and also it's using AFL++, which

is a much faster mutator than Radamsa.
So from a speed consideration, this is a

much better design. However, AFL++ didn't
turn out to be the best fuzzer for

protocols, so most of the time is actually
spent trying to brute force the first

magic bytes, the first four bytes, because
it tries to shorten inputs. It's also not

aware of something like a packet order, so
it was just brute forcing those first four

bytes. And well, the next issue is, that
for some reason, if the first four bytes

are invalid, the ARI parser slows down a
lot. So I was suddenly down to something

like less than 10 fuzz cases per second.
And also, in this case, ICEPicker has no

awareness of the ARI host
state. So ARI sometimes shuts down this

interface, if it thinks that something is
very invalid and the fuzzer will just

continue. So I looked into idevicesyslog
after the fuzzer couldn't find any

new coverage for more than six hours.
And I was wondering: "What is the

issue here? Is the implementation
wrong or is it the fuzzer?" And it really

looks like the fuzzer is producing inputs
that are not good for protocol fuzzing.
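
One way to avoid wasting mutations on the header is to mutate only the payload and rebuild the header afterwards. The following Python sketch shows the idea under loud assumptions: the packet layout (four magic bytes followed by a 2-byte little-endian length) and the `MAGIC` value are placeholders for illustration, not the real ARI format, and this is not an AFL++ custom mutator.

```python
import random
import struct

MAGIC = b"MAGC"                    # placeholder, NOT the real magic bytes
HEADER = struct.Struct("<4sH")     # assumed layout: magic + payload length


def mutate_payload_only(packet: bytes, rng: random.Random) -> bytes:
    """Flip one random payload bit; keep the magic and fix the length."""
    magic, _old_length = HEADER.unpack_from(packet)
    payload = bytearray(packet[HEADER.size:])
    if payload:
        pos = rng.randrange(len(payload))
        payload[pos] ^= 1 << rng.randrange(8)
    # Rebuild the header so the parser does not reject the packet early.
    return HEADER.pack(magic, len(payload)) + bytes(payload)


if __name__ == "__main__":
    rng = random.Random(42)
    seed = HEADER.pack(MAGIC, 3) + b"abc"
    print(mutate_payload_only(seed, rng).hex())
```

Every generated input then passes the cheap header checks, so fuzzing time is spent in the message handlers instead of on brute forcing the first four bytes.
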

Of course, this is stuff that you can
optimize, so AFL++ can do a lot here, so

you can tell it a bit how the protocol
looks like and also get it to not brute

force the first four magic bytes. But for
this I would have to recompile the whole

thing. And it was something that compiled
on Dennis' machine, but it didn't compile

on my machine, because I had my Xcode
beta in a weird state. And well, of

course, some of you now say:
"Just download and install a new Xcode!"

But this takes so long that actually
writing the next fuzzer seemed to be

easier. Still, this variant of ICEPicker
was interesting to me because it was the

first time when I saw that the fuzzer
initialization works, including

coverage and also my replay works across
multiple iPhone versions. So my corpus that was

collected on an iPhone SE2 was replayable
on an iPhone 7. So it was not useless in

that sense, but I just decided to not
use this configuration. So I just wrote a

very simple fuzzer again and I didn't do
the porting of everything to run locally

on iOS. I just kept the design a bit
simpler or at least easier to code and had

my fuzzer running on Linux and then using
only Frida on iOS. It cannot reproduce all

the states and crashes that I observed
with my very first fuzzer, but most

crashes could be reproduced. I didn't do
any coverage. I didn't do any smart

mutations, just very stupid mutations. And
basically I just did a very blind

injection. But this was super fast, so
instead of the 20 fuzz cases per second, I

already had something like 400 fuzz cases
per second on an iPhone 7, which was about

the same speed or even faster than the
AFL++ variant. And I can at least correct

the length field, sequence number and so
on before injecting the payload. Since it

doesn't do such great mutations, I at
least need to collect a good corpus

with many SMSs, many calls. And I'm also
logging the packet order with this. So

it's at least aware of a packet sequence
in the sense of, I can reproduce the

sequence later on. I had this fuzzer
running on a couple of iPhones in

parallel for multiple weeks, and it found
a lot of interesting crashes. So that's

my go-to fuzzer. I still wanted to
confirm that not collecting coverage

wasn't an issue, so I also cloned the
publicly released version of ToothPicker, which

definitely finds new coverage, and it's
using the Radamsa mutator, which is very,

very slow, but it does a bit smarter
mutations, at least in terms of protocol

fuzzing. It's still only aware of
single packets and it's only using the

same packets five times in a row to
confirm coverage, etc. And also an issue

is that it cannot catch a lot of the
crashes of CommCenter. So it happens

quite often that CommCenter crashes. And
then if you cannot catch the crash with

Frida and everything crashes, then you
need to start the fuzzer again. But you

also need to delete the files in the
corpus that led to the crash because

otherwise you would just run into the same
crash very fast. So it needs a lot of

babysitting. I also had it running for a
couple of weeks, but sadly, it didn't find

any crashes. So at least I can be sure
that fuzzing, much slower, but with

coverage, is not any improvement. Still,
the mutations it creates are quite useful,

as you can see in the following. So you
can even see this phone number scrolling

here and so on. So it generated a very
long phone number correctly into some TLV

structure here. And that's quite
interesting to see. So this is something

that you could not reach by just 
flipping bits and bytes.
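
To illustrate why a structure-aware mutation reaches inputs that bit flipping does not, here is a minimal sketch that stretches one TLV value and rewrites its length field so the packet stays well-formed. The layout (1-byte type, 2-byte little-endian length, then the value) is an assumption for illustration and not ARI's actual wire format; the type numbers, the phone number, and the helper names are made up.

```python
import struct


def parse_tlvs(buf: bytes):
    """Split a buffer into (type, value) pairs.

    Assumed layout per entry: 1-byte type, 2-byte little-endian length, value.
    """
    tlvs, off = [], 0
    while off + 3 <= len(buf):
        tlv_type, length = struct.unpack_from("<BH", buf, off)
        off += 3
        tlvs.append((tlv_type, buf[off:off + length]))
        off += length
    return tlvs


def build_tlvs(tlvs) -> bytes:
    """Serialize (type, value) pairs, recomputing every length field."""
    return b"".join(struct.pack("<BH", t, len(v)) + v for t, v in tlvs)


def stretch_value(buf: bytes, target_type: int, factor: int) -> bytes:
    """Repeat the value of one TLV `factor` times and fix up its length."""
    tlvs = [(t, v * factor if t == target_type else v) for t, v in parse_tlvs(buf)]
    return build_tlvs(tlvs)


# Hypothetical packet: a one-byte flag and a phone number.
original = build_tlvs([(0x01, b"\x05"), (0x10, b"+4912345")])
mutated = stretch_value(original, 0x10, 20)
```

A random bit flip in the length field would almost always make the parser reject the packet early, whereas this mutation delivers a 160-byte phone number with a matching length.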

There is one big shortcoming that all of
these fuzzers have, including the initial

ToothPicker, which is that they don't have any kind
of memory sanitization. So the framework

that you would usually use in user space
on iOS is the MallocStackLogging

framework. I even got this running for
CommCenter, so it's a bit of command

line juggling. But in the end you can
enable MallocStackLogging also for

CommCenter. The issue here is that it
increases the memory usage a lot and even

if you configure CommCenter to have a
higher memory allowance, it is so high

that it's just immediately killed by the
out-of-memory killer. So this doesn't

work. Then there is also libgmalloc. It
doesn't exist for iOS, it just exists in

Xcode. I got one of the Xcode libraries
running on one of my iPhones. I have no

idea if this is an expected configuration
or not. At least I could execute smaller

programs. And then when you use this on
CommCenter, it just crashes with a

libgmalloc error on parsing some of the
configuration files very, very early when

starting the CommCenter. So all of this
didn't work. And this also means that the

fuzzer cannot find certain bug types or
crashes much later when encountering

bugs. So all of the fuzzers that I created
are not perfect, but at least they found

a lot of different crashes. Let's look
into this. I mean, the first obvious

number that you see here is the 42. So I
stopped fuzzing after 42 crashes - at

least crashes that I think are individual
crashes and that are not caused by Frida -

so I tried to filter out Frida crashes
and this corresponds to the total amount

of crashes, but only some of them are
replayable by either one or multiple

packets. And for the replayable crashes I
can also check if they were fixed in

recent iOS versions or the most recent iOS
14.3 or not. Then I also marked two

colors here because there are the Intel
libraries, but there are also the

Qualcomm libraries. And for the Qualcomm
libraries, I didn't spend as much time

fuzzing, because I have fewer Qualcomm
phones, but also all the asserts in the

code prevent a lot of issues from being
reached. So the libraries themselves have

fewer issues and also within CommCenter,
less of the code that has improper state

handling is reached. The location daemon is
marked also with a big grey box here,

because the location daemon is, similarly to
the CommCenter, using some of the raw

packet inputs and parses them. So it has
special parsers for Qualcomm and Intel.

And it's also an interesting target
because of this. Other than this I got

really a lot, a lot, a lot of different
daemons crashing. Some of them, even with

replayable behaviour. So, for example,
there is the wireless radio manager daemon

that you can just crash via one Intel
packet. But, this has been fixed. And then

there is one interesting crash that I
actually got via Qualcomm and Intel

libraries. So in the mobile Internet
sharing daemon, this also has been fixed

and some of the crashes only happened via
Qualcomm, but I'm not sure if that's like

a Qualcomm-specific thing or it's just
randomness of the fuzzer. So the mobile

Internet sharing daemon has an issue where
it accesses memory at configuration

strings, so there's different strings at
this memory address and I found this quite

early, but I was not aware of the fact,
that so many other daemons are actually

crashing when I fuzz CommCenter. So, I
didn't look into this in the very

beginning. And when I reported it to
Apple, they said: "Yeah, yeah, we already

know about this and we fixed it in a
beta prior to your report." So certainly

nothing that I got a CVE for. Another
interesting crash is in the CellMonitor, but

only in the Intel library. The CellMonitor
is something that is running passively in

the background all the time and it parses,
for example, GSM and UMTS cell

information. I already found this on the
Singaporean SIM without any active

data plan in my very first round of
fuzzing and reported it back then to

Apple. I don't know, if it's triggerable
over the air or not. So I guess it's

something that you first need to get code
execution for. And it has been fixed in

iOS 14.2. And I wrote a lot of emails with
Apple because I thought, that they didn't

fix it. And the reason for this is that
both the GSM cell info and the UMTS cell

info function, when they parse data, they
have two different bugs. So I still got

crashes in the same functions and I
thought: "OK, same function, still a

crash: The bug is not fixed." But actually,
it's very high quality code and it's just

multiple bugs per function. And there is
even one more issue in the CellMonitor,

even though I think the remaining bugs are
very simple crashes or nothing that could

be exploitable at all, but they still hint at
the great code quality. And the same story

is, that there're even more bugs to be
fixed. So most of them are probably just

stability improvements, but some of them
are still interesting. So, let's see how

this goes. So since I told you that it's a
very simple fuzzer, some of you might have

already started coding those 10 lines of
code for fuzzing, while I continued talking

and grabbed their old iPhones, that they are
willing to lose, if something goes wrong.

So, how can we actually build a fuzzer
that is performant and replicates some of

the bugs that I found just within a day?
Let's take a look. When you look at Frida

fuzzing, a lot of the stuff that you do,
is limited by the processing power of the

iPhone. So your iPhone will get very,
very, very hot and it might even drain

more battery, than it can get via the USB
port. So it might even discharge while

fuzzing. And performance is really key. So
you need to identify bottlenecks.

I said ToothPicker or ICEPicker, the
initial version is just 20 fuzz cases per

second and you can tune this to something
like 20,000 fuzz cases per second. So, I

already told you that I tuned it to something
like 400 or 500 fuzz cases per second,

but why the 20,000? So, initially, a
student of mine, did some fuzzing in a

very different parser and said: "On my
iPhone 6S, it's running with 20,000 fuzz

cases per second." I was like: "No way, no
way!" But actually, you can do this. So,

this depends a lot on the Frida design.
The first variant, how most Frida scripts

are written is, that you have some Python
script that runs on Linux or macOS, and it

has a couple of functions that you can see
here. So first of all, it has this

on_message callback. So, this on_message
callback is something that we need later.

And we just register it to our Frida
script, the Frida script, that I'm going

to show you in a second. And you load the
script and the script can then even call

functions on your iPhone. For this, you
load a second script on your iPhone. So

this is JavaScript injected into the iOS
target process and it can, for example,

use the send function to send something
back to the on_message function. And it

can export functions via RPCs. So, you can
then call them. All this happens via JSON.

And so it needs serialization and
deserialization, which means you cannot

send hex data or binary data directly. So
you have a hex string that you encode into

JSON, which is then parsed as binary data
and also it's all via USB. So you also

have the speed limitation by USB. And, of
course, if you use the Frida C-bindings

locally on the iOS smartphone, it is a bit
faster, but it's still not perfect. So,

the more you can avoid of this JSON
part and the USB part, the better. The

actual fuzzer looks a bit like this. So,
you are in the libARIServer, so that's the

lowest library from the diagram before.
And then you define this inbound message

callback function, which has two
arguments, which are the payload and the

length. So, this looks a bit cryptic, but
that's basically it. And then you can, but

you don't have to, add this interceptor
here because you might want to fix your

sequence number or add basic block 
coverage to your fuzzer, etc. So, this is also

done there. And then you can just call this
inbound message callback of ARI and send

ARI payloads. So, this already can be very
different. So, if you now call this via

RPC export, via a Python script on your
laptop, you can reach something like 500

fuzz cases per second, if you inject SMS,
which are quite processing-intensive

payloads. Or, if you just do the same
thing and if you just run this inbound

message callback in a loop, locally with
JavaScript, without any external Python

script, then you would get 22,000 fuzz
cases per second on the very same device.
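
To make the per-message overhead concrete, the following sketch shows the round trip a binary payload takes when it has to travel as a hex string inside JSON. This is plain Python and not Frida's actual message format; the field names `type` and `payload` and both helper names are assumptions for illustration.

```python
import json


def to_rpc_message(payload: bytes) -> str:
    """Host side: binary payload -> hex string -> JSON text."""
    return json.dumps({"type": "inject", "payload": payload.hex()})


def from_rpc_message(message: str) -> bytes:
    """Target side: JSON text -> hex string -> binary payload."""
    return bytes.fromhex(json.loads(message)["payload"])


payload = bytes(range(256))
message = to_rpc_message(payload)

print(len(payload), "bytes of payload become", len(message), "characters of JSON")
```

Every fuzz case pays for the hex encoding, which doubles the size, plus JSON parsing on both ends and the USB transfer, which is why a loop that stays inside the target process avoids most of that cost.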

So this is the speed difference that the
JSON serialization, deserialization and

the USB in between make. So, I did a few
more measurements, and currently on the

iPhone 8, there is a bug that prevents me
from collecting coverage. But, what you

can see is, so, the first part here is if
you have just a bit flipper in a loop that

calls the target function, you can get
17,000 fuzz cases per second on an iPhone 7.
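
The shape of such a loop can be sketched as follows. This is a host-side Python illustration of "a bit flipper in a loop that calls the target function", not the JavaScript that runs inside CommCenter; `target` is a hypothetical stand-in for the parser under test, and the measured rate says nothing about iPhone performance.

```python
import random
import time


def flip_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of `data` with exactly one randomly chosen bit flipped."""
    out = bytearray(data)
    pos = rng.randrange(len(out))
    out[pos] ^= 1 << rng.randrange(8)
    return bytes(out)


def target(packet: bytes) -> int:
    """Stand-in for the parser under test (hypothetical)."""
    return sum(packet) & 0xFF


def fuzz_loop(seed: bytes, iterations: int, rng: random.Random) -> float:
    """Mutate and call the target in a tight loop; return fuzz cases per second."""
    start = time.perf_counter()
    for _ in range(iterations):
        target(flip_bit(seed, rng))
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")


rate = fuzz_loop(b"\x00" * 64, 10_000, random.Random(0))
print(f"{rate:.0f} fuzz cases per second")
```

Nothing in the loop logs, serializes, or leaves the process, which is the property that makes the in-process variant so much faster than the remote one.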

As soon as you start collecting basic
block coverage, not processing it, just

collecting, you drop to 250 fuzz cases per
second. So, you need to ask yourself, if

your fuzzer gets really that much better
from collecting coverage. And another

thing is - that's this line above - so, if you
just print the packet, that you fuzzed or

injected and print this via Python to your
laptop, you also have a huge slow down,

which is not as large as the coverage
slowdown. But still, you can see every

print and every sending of a message in
between the Python script and JavaScript

takes a lot of time. Now, if you have this
remote SMS injection that I had before,

then you drop to 200 fuzz cases per
second. So it is a blind injection without

any coverage. If you collect coverage but
don't process coverage, then you are down

to 100 fuzz cases per second. So, for the
initial ToothPicker design, this would be

the optimum. But, because the Radamsa
mutator is very slow and because you also

need to process the coverage information,
et cetera, that's down to 20 fuzz cases

per second. So, this is the comparison
here. And now you can imagine why

collecting coverage probably isn't always
useful and why also having your laptop

calculating better mutations, because it's
easier to write a mutator there, than

directly in JavaScript, is not always the
best idea. So let's watch one last demo

video. What you can see here, is when you
try to delete SMS, after all of the

fuzzing, it really doesn't work, neither
via the settings nor via the SMS app. So,

you really need to reset your iPhone after
fuzzing it for too long. No other chance

than this to delete the messages. With
this, we are already at the end of this

talk, but of course, there will be a Q&A
session and if you missed the Q&A session,

you can also ask me on Twitter or write me
an email. Thanks for watching!

<i>rC3 music</i>

Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!