rC3 opening music
Jiska: Hello everyone and welcome to my
talk, Fuzzing the phone in the iPhone. The
phone in the iPhone is the component that
receives SMS, sends SMS, receives phone
calls, makes phone calls and also manages
your Internet connection when you are not
on Wi-Fi. However, you might now wonder,
what is it exactly? So I'm talking about
CommCenter and fuzzing it via the QMI and
ARI interfaces. But this is a bit too
technical for most of you. So I will first
introduce you to the concept of fuzzing in
general and protocol fuzzing before I dive
into further details. For those of you
who have not yet heard about the concept
of fuzzing: you send a lot of random
messages to an interface and thereby test
its security. And in this
video, you can see how I send SMS over a
Frida-based fuzzer with something like 400
fuzz cases per second. And then the iPhone
receives them, catches them and sends a
couple of them also to the smartphone.
Let's start with a motivation and an
explanation of the attacker model. If
you look into a modern smartphone, you
have two components, if you want to show it
in a simple way. First of all, there's
the hardware part with a lot of chips. And
then on top of this, there is an operating
system and applications. However, it's not
as simple as this because even those chips
are so complex that they run their own
little real-time operating systems to
preprocess data. So this means that you
can even get code execution on such a
chip. And this is usually much easier than
in the operating system itself, because
those chips cannot have that many
mitigations. But what do you even
do if you have code execution on such a
chip? If you are in a baseband chip,
then one escalation strategy from the chip
towards the operating system might be to
manipulate traffic in the browser.
However, I don't think that this is the
case, because if you look at the Zerodium
price list, then actually the browser
exploits are much more expensive. So it's
probably not done like this. And there
must be other ways to escalate from this
chip into the operating system. In
general, the traffic manipulation is
something that you can always do in
wireless transmission or also on the
Internet. If you look at how those systems
work these days, you have something
like the Internet in general that serves
websites and so on, and also the core
network of your mobile provider. And there
are many, many ways to manipulate traffic,
either if you are a state level actor who
is able to have something in the core
network or just by sending around websites
or modifying websites. And then there is
the base station subsystem, there might
also be dragons. We don't know exactly.
And of course, there are over-the-air
transmissions and wireless transmissions
are very special because, if there is
something just slightly broken in the
encryption, for example, then it's also
possible to manipulate traffic there, if
you have a software defined radio, for
example. So all of this could be attacked
to manipulate traffic. And I don't think
that for this, one would craft a baseband
exploit. Already in 2014 at the CCC, there
were two talks about the SS7 protocol,
which is run in the core network and is
actually meant to connect different
mobile carriers to each other. And this
can also be used to intercept phone calls,
for example. And this also has been
exploited recently. So even though there
have been some mitigations, etc. since
then, it's still exploited for the same
purpose, to spy on people. So really,
really, really, baseband exploits only
exist to escalate from the chip into the
operating system. But now the question is,
what are the strategies? So if it's not
via the browser, what else could it be? I'm
really sure it is not the browser, because
you also need to have some traffic and so
on. It doesn't really work instantly: the
user needs to visit a website before you
can replace traffic on that website.
There must be something else. So
if you are on the chip with remote code
execution and want to go into the
operating system, there is some interface.
And this means that something in those
interfaces needs to be exploitable, so
that you can escalate the privileges from
the chip into the system. And also, those
interfaces are very interesting from a
reverse engineer's perspective. So even if
you don't want to attack anything, just
understanding how they work, is also a
goal of this work. So, for example, if you
have a baseband debug profile, you can
just download it onto your iPhone, and when
you then open idevicesyslog, you can
already see a lot of management messages
that are exchanged between the chip and
the iPhone. And if you have a jailbreak
and Frida, you can even inject packets or
modify packets to change the behaviour of
your modem. But if you want to start to
work on such a thing, the question is:
how do you even start? Where do you
start? And fuzzing is actually a method
that can be used to understand such an
interface. Initially, once you have identified
an interface, you can check if it is the
correct interface, so, if the behaviour
really changes when you flip some bytes,
but also how powerful this interface is.
So what are the features? What breaks
instantly? And if things break, you can
also check if the whole interface has been
designed with security in mind. Now, let's
start with an introduction to wireless
protocol fuzzing, this will also be a
short rant because the current tooling for
fuzzing is usually not made to fuzz a
protocol. So let's start with a very
simple fuzzer, a fuzzer that just targets
an image parser. So, you browse your
smartphone for unicorn pictures or PNGs or
JPEGs, and then you send them to the image
parser, and in the image parser you might
be able to observe which functions are
executed in the form of basic blocks. And
then, during this initialization, the
image parser can even report which parts
were executed, and you can just call it
again and again with different images and
get this basic block coverage back. In a
next step, you can then combine existing
images or flip bits in these images and
send them to the image parser and again
observe the coverage. Most of the time, it
won't generate any new coverage, so you
just say you are not looking into this
image in particular. But sometimes you
might get new coverage, like here, and
then you add this image to your corpus. So
over time, you can increase your corpus
and increase your coverage. Another method
can be, if you know how exactly an image
format looks like, so you might know the
JPEG specification and because of this,
you could just generate images that are
more or less specification compliant and
they look more artificial like this. So
you just generate images and send them to
the image parser and at some point you
might observe a crash. So that also
depends, again, on your harnessing. Maybe
you can observe basic blocks, maybe you
can just observe crashes and then you know
at which image you had a crash. You might
even be able to combine these two
approaches just depending on what you know
about your input and how you can harness
your target. Now it looks a bit different
for a protocol. So, in a protocol, you can
have a very complex state. Let's say you
are in an active phone call or just
something like, you receive an SMS. You
can actually force the iPhone to receive
SMS, if you have a second iPhone and send
SMS. And then during the fuzzing, you can
replace some bits and bytes, like this and
then you would have a modification. So
this is a very simple approach and it
preserves the state. So no matter how
complex the thing is, that you're
currently doing, it's very simple to flip
a bit here and there in an active
interaction. But it's also a bit annoying,
because you need to have these active
phone calls, etc. So something that's more
efficient is injection. So you would
observe certain messages and then just send
them again - and then you don't even need
the second phone to make calls, etc., -
you can just send a lot, a lot, a lot of
data. And this is the effect, when your
iPhone goes di-di-di-di-dimm or something
because of all the notifications and all
the data that is sent. But the issue here is,
that this does not preserve state. So
there might be actions where the iPhone
requests something that is then answered.
So, the iPhone might request, for example,
a date and only then the chip would reply
with a date and only then the iPhone would
accept a date. But it's still very
interesting to do this, even though you
cannot reach certain states, because you
can do this without a SIM card and you can
do this very, very fast. So, just to
summarize the issues here: if you fuzz the
wireless protocol, you can have very
significant state differences and just
injecting packets cannot reach all
states. The fact, that you cannot reach
all states also shows in very simple stuff
like a trace replay. So a trace of
something that you record. So let's say I
have an active phone call, I record all
the packets, and I can also observe the
coverage. So, with Frida, you can observe
coverage on an iPhone while the phone call
is active. And then, in a second step, you
would do some injection. But the only
thing that you can inject are the packets
sent from the baseband to the smartphone,
not the opposite direction. And this
results usually in much less coverage. So
you are missing a lot of things due to a
missing state. And even worse, if you do
the same thing again, you might be in a
different state, and you might observe a
different coverage. So you do the exact
same thing, but you get different
coverage. So, even replaying recorded
messages results in less or inconsistent
coverage. Anyway, let's take a look into
an injection example. So, in this video,
you can see how I'm in the Unicorn Network
on an iPhone 8, which obviously has 5G,
but also does a lot of fuzzing. And in the
fuzzing, what is interesting is that you
might reach a lot of states in combinations
that are not usually possible, like you
have a lost network connection while you
have to confirm a PIN or you have a
network connection during this, etc. To
summarize my rant, some states cannot be
reached solely by injecting packets. So,
even if we have a very good corpus and do
very good mutations, we might miss
80% of the code, but we can just fuzz
anyway. But we need to keep in mind, that
some stuff is just not fuzzable. We have
looked into a lot of wireless protocols in
the past, so it's also worth
considering which tooling we already
had available for fuzzing protocols. The
most advanced tooling, that we have, is
Frankenstein and it's built by Jan. So,
what Jan did is, he emulated the firmware
and attached it to a virtual modem and
also a Linux host. For this, he first
looked into the firmware, that's here, and
we had some partial symbols for this and
also some information about registers.
Then, Frankenstein is actually taking a
snapshot, that you can see here, including
some of those registers of the modem. And
with this, you can build a virtual modem
and fuzz input as if it would come over
the air. Then Frankenstein also emulates
the whole firmware, including thread
switches. So it gets into very complex
states and it's even attached to a Linux
host. So, it also fuzzes a bit of Linux
while actually fuzzing the firmware
itself. Now, the issue with this is that
baseband firmware is usually 10 times the
size of Bluetooth firmware or even more,
and we don't have any symbols for this, so
it's a lot of work to customize this. And
even if one would do all those steps and
put all the work into this, it's only, so
to say, code execution in the baseband.
It's not yet a privilege escalation into
the operating system. The next interesting
tooling was built by Steffen and what
Steffen did, he built a fuzzer based on
DTrace and AFL. DTrace is a tool that can
provide function-level coverage in the
macOS kernel and user space. With some
modifications you can even get basic
block coverage in the user space, which is
required for AFL to work. So, in the end,
you have AFL or AFL++ as a fuzzer on any
program on macOS. It's even slightly
faster than Frida, at least the version
that he used. And he gets a couple of
thousand fuzz cases per second, even on a
very old iMac. So, in our lab, we just had
an old iMac 2012 for this and it works on
this. But the issue is, that Wi-Fi and
Bluetooth, which he fuzzed, are very complex
protocols, so he couldn't find any new
bugs with AFL. And also, in the kernel
space, you only get this function level
coverage. He still, despite not finding
any bugs in Wi-Fi or Bluetooth, got a CVE,
because DTrace also has bugs. So, at least
some finding. But on iOS, this is not
supported out of the box. So it might be
possible to get DTrace working with some
tweaks, but it's a lot of work. So
probably it's easier to just use Frida in
the iOS user space. Also during this, so
while Steffen was building all this very
advanced tooling, Wang Yu found issues in
the macOS Bluetooth and Wi-Fi drivers, and
so he was very, very successful in
comparison to us. That's really a pity.
And I think, what he did, is much better
state modelling, so, of how the messages
interact and what is important to reach
certain functions. So what is still left?
So, usually fuzzing the baseband means
that you need to modify firmware or also
emulate firmware, you need to implement
very complex specifications on a software
defined radio if you want to fuzz over the
air or build proof of concepts. And for
everything that's somewhat proprietary,
you need to do protocol reverse
engineering, so you can spend a lot of
time and money just to do very, very basic
research. Or, you can also use Frida, so
you can fuzz with Frida and all you need
to do for this is, write a few lines of
code in JavaScript. So I kid you not. The
option is Frida. Dennis, a thesis student
whom I advised, was the first in our team
to build a Frida-based fuzzer,
and it's called ToothPicker. It's based on
Frizzer and Radamsa. So what it does is,
well, it hooks into these connections or
into the protocols of the Bluetooth
daemon. You could also think of this upper
part here, as one block. So the protocols
are implemented in the Bluetooth daemon,
but we want to fuzz certain protocol
handlers. And to increase the coverage, he
creates a virtual connection. So a virtual
connection holds a connection and pretends
to the Bluetooth daemon that there would
be an active connection to a device. And
of course, the chip would then say, I
don't know anything about this connection.
So, there are also some abstractions in
here, so that the connection is not
terminated. So, that's a very simple tool,
but it really found a lot of bugs and
issues and even there were some issues in
the protocols themselves that also apply to
macOS. So it's not just iOS bugs, but also
protocol bugs in macOS that Dennis found.
And this really got me thinking,
because ToothPicker runs with only 20
fuzz cases per second, so it's really,
really slow, and we were still able to find
Bluetooth vulnerabilities at this speed.
So, why is this? So first of all, if you
try to fuzz Bluetooth over the air, then
the over-the-air connections are
terminated after something like five
invalid packets. So, over-the-air fuzzing
is really, really inefficient. And with
Frida you can actually patch these
functions, so it's gone. Then the
virtual connections are a very important
factor. So they are really, really
important for having coverage. It's still
a lot of coverage that we missed during
replay and fuzzing. But it's
really an advantage compared to the
other fuzzing approaches where you just
inject packets. And in addition, there is
an issue here, because if you have a
virtual connection, it might be that this
virtual connection triggers behaviour,
that you cannot reproduce over the air.
So, that means that everything that you
find, you need also to confirm that it
works over the air. At least the
inconsistent coverage is also fixed in
ToothPicker, because ToothPicker
replays all packets five times in a row.
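The repeat-and-compare idea can be sketched roughly as follows. This is a minimal Python illustration under assumptions, not ToothPicker's actual code: `run_target` is a hypothetical stand-in for "inject one packet and return the set of basic blocks that were hit", and `make_noisy_target` is a toy target whose coverage partly depends on background state.

```python
from typing import Callable, FrozenSet, Set


def stable_coverage(run_target: Callable[[bytes], Set[int]],
                    packet: bytes, repeats: int = 5) -> FrozenSet[int]:
    """Replay one packet several times and keep only blocks hit every time."""
    stable = set(run_target(packet))
    for _ in range(repeats - 1):
        stable &= run_target(packet)
    return frozenset(stable)


def make_noisy_target() -> Callable[[bytes], Set[int]]:
    """Hypothetical target: input-dependent blocks plus one state-dependent block."""
    calls = 0

    def run(packet: bytes) -> Set[int]:
        nonlocal calls
        calls += 1
        blocks = {0x1000 + b for b in packet}  # blocks that depend on the input
        blocks.add(0x9000 + calls)             # block that depends on hidden state
        return blocks

    return run


if __name__ == "__main__":
    target = make_noisy_target()
    print(sorted(hex(b) for b in stable_coverage(target, b"\x01\x02")))
```

Only the blocks that show up in all repetitions are attributed to the packet itself; everything else is treated as noise from whatever state the target happened to be in.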
But the issue here is that it also means
that if you have a sequence of packets,
that is like generating a certain bug -
so you need multiple packets - this is
nothing that the mutator is aware of and
also nothing that's logged properly in
ToothPicker. And because of this, I got a
bit anxious. Maybe we missed a
lot of things? So once I got the
intuition that we are actually missing
certain state information, I had the idea
to replace bytes in active connections.
And this is one example of that, which you can
see with a keyboard, so I'm just replacing bytes
on keyboard input and see what happens.
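As a rough illustration of this byte replacement, here is a minimal Python sketch. In the talk, the replacement happens on live traffic of an active connection; here `replace_bytes`, the replacement rate, and the sample packet are hypothetical stand-ins that only show the mutation step itself.

```python
import random


def replace_bytes(packet: bytes, rng: random.Random, rate: float = 0.01) -> bytes:
    """Return a copy of `packet` where each byte is replaced with probability `rate`.

    The length never changes, so the packet still fits into the ongoing
    connection and the state built up so far is preserved.
    """
    out = bytearray(packet)
    for i in range(len(out)):
        if rng.random() < rate:
            out[i] = rng.randrange(256)
    return bytes(out)


if __name__ == "__main__":
    rng = random.Random(1234)             # fixed seed so a run can be repeated
    original = bytes(range(32))           # hypothetical packet from a live session
    print(replace_bytes(original, rng, rate=0.1).hex())
```

A low rate keeps most packets intact, so the connection usually survives long enough to reach deeper states.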
And I let this run for a couple of weeks,
also for different protocols and so on to
see if there are further bugs that
we didn't find previously. So here you
can see the same for AirPods with SCO, and
then they produce crackling sounds for the
replaced bytes. It's even worse for ACL, so
actual music, because then you can hear
very noisy chirps. I let this fuzzer run
for multiple weeks and it didn't find
any bugs that ToothPicker hadn't
discovered before. So, I think the reason
for this is that I mainly fuzzed in active
connections like the one with the audio
or the keyboard, but I only fuzzed a few
active pairings because this requires me
to actually perform those pairings by
hand, so, nothing really interesting. The
only bad thing that I could produce with
this, but not worth a CVE, is that the
sound quality of my AirPods is now
really, really bad. Well, OK. And also the
Broadcom chips on iOS don't check the UART
lengths, but that's not that bad. So, I
mean, if you consider that they removed
the write-RAM recently, then you might now
still be able to write into the RAM via UART
buffer overflows. But yeah, nothing too
interesting. So after all of this, I asked
myself: "What is still left for fuzzing if
we cannot find any new Bluetooth or Wi-Fi
bugs?" Well, the iPhone baseband - or
actually the iPhone basebands, because
there are two. The first variant of iPhone
baseband that you can get is Qualcomm
chips. They are in the US devices, and they
use the Qualcomm MSM Interface. And this
interface comes with some documentation
and there are even open source
implementations for it. So it's something
that's probably easy to understand and
easy to fuzz. On the other hand, almost
all devices that I had on my table had
Intel chips. Intel's baseband chip
business has recently been bought by
Apple, and these are the
chips in the European devices, that's
the reason why almost all my devices had
Intel chips. And they use a special
protocol. It's called Apple Remote
Invocation. And if you search for this on
the Internet, I even checked it like
just today, there are no Google hits at
all. So it really hasn't been researched
before, at least not publicly. It's
completely undocumented and it's a very
custom interface. So it's not even used
for Android. It's really an interface
just for Apple. The component that we are
going to fuzz in the following is CommCenter.
So CommCenter is the equivalent of, for
example, the Bluetooth or Wi-Fi daemon,
but for telephony. It's sandboxed as the
user "wireless", but it comes with a lot of
XPC interfaces. And this is something
that we will also see later in the
fuzzing results. The next part is that
there are two flavors of libraries, so
depending on whether you have a Qualcomm or
an Intel chip, different libraries will be
used before certain actions or data
are then actually processed by
CommCenter itself. So we have
different code paths here. But all of this runs in
user space, and this means that both
libraries can be hooked with Frida and can
be fuzzed with Frida. So that's very
interesting. There is still a lot of stuff
that goes on in the kernel. So what you
can see here is that QMI and ARI have some
management information that is sent to
CommCenter, but they don't contain the
raw network or audio data. So they don't
contain your phone call, they don't
contain your website that you are opening.
And the next issue is that QMI and ARI
are not directly sent over the air, but
what is sent over the air are normal
baseband interactions and these generate
QMI and ARI messages. So there's still
some section in between, but of course,
there are now two ways: either there is an
interaction that you can do over the air
that directly causes ARI and QMI messages,
which in turn cause an
issue in the upper layers. Or you might
have this full exploit chain requirement,
so that you first need to exploit the chip
over the air, and then from the chip
break the interface into CommCenter.
Now, in QMI, the code has a lot of
assertions. So it's really asserting
everything about the protocol, including the
TLV format and so on, and if anything goes
wrong, it really terminates CommCenter.
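To make the "assert everything" behaviour concrete, here is a minimal Python sketch of a strict TLV parser. It is not Apple's code: the function name is hypothetical, and the layout (1-byte type, 2-byte little-endian length, then the value) is an assumption based on how open source QMI implementations describe QMI TLVs. Any inconsistency raises an exception, which plays the role of the assertion that takes down the process.

```python
import struct
from typing import List, Tuple


def parse_tlvs_strict(payload: bytes) -> List[Tuple[int, bytes]]:
    """Parse a sequence of TLVs and fail hard on the first inconsistency."""
    tlvs: List[Tuple[int, bytes]] = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < 3:
            raise ValueError("truncated TLV header")
        # "<BH": 1-byte type followed by a 2-byte little-endian length
        tlv_type, length = struct.unpack_from("<BH", payload, offset)
        offset += 3
        if offset + length > len(payload):
            raise ValueError("TLV length exceeds payload")
        tlvs.append((tlv_type, payload[offset:offset + length]))
        offset += length
    return tlvs


if __name__ == "__main__":
    print(parse_tlvs_strict(b"\x01\x02\x00\xaa\xbb\x10\x00\x00"))
    try:
        parse_tlvs_strict(b"\x01\x05\x00\xaa")  # claims 5 bytes, carries 1
    except ValueError as err:
        print("rejected:", err)
```

With a parser like this, a single flipped length byte is enough to abort processing, which matches the observation that one invalid packet terminates the daemon.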
So if you just send one invalid packet,
CommCenter is terminated. This doesn't
matter a lot because if your protocol is
stable and you usually don't send any
invalid packets, then you know an attack
is ongoing, so it's valid to terminate
the CommCenter. And furthermore, it
doesn't matter a lot to the user. So the
worst thing that happens when CommCenter
crashes, for example, while you have an
active phone call, it's just that the
phone call gets lost or your LTE
connection is re-established. So you don't
really notice it. It just feels like your
Internet connection breaks for a short
moment. In contrast, there is the ARI
protocol, and this is the part that just
works very, very, very differently. So
whatever it's getting, it just parses it,
and it doesn't terminate CommCenter.
So you can send many, many,
many fancy things and it just
continues, continues, continues,
because the developers were probably very,
very happy once they got their special
protocol for Apple working and then they
never touched it again. But what does it
look like? So it has a very basic format,
also with some TLVs, and the first
thing that I noticed when I fuzzed it is
that in idevicesyslog, it always
complained about this sequence number
being wrong. So it just said: I expected
the follow-up sequence number so and so.
So I started to fix this. And if you open
it in IDA, you can see that the range
that is expected is between zero and
0x7ff. So you know the
range, and then it gets weird. So the
sequence number is spread over three
different bytes in single bits and
shifted around and so on. And it's not
even continuous. So very weird code.
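To illustrate what "spread over three bytes and not continuous" means in practice, here is a small Python sketch. The bit positions in `FIELDS` are invented purely for illustration, since the real ARI layout is not given in this talk; only the 11-bit range (0 to 0x7ff) comes from the description above.

```python
# Hypothetical layout, NOT the real ARI one:
# (byte index in header, bit shift inside that byte, field width in bits)
FIELDS = [(0, 5, 3), (1, 0, 4), (2, 2, 4)]  # 3 + 4 + 4 = 11 bits


def pack_seq(header: bytearray, seq: int) -> None:
    """Scatter an 11-bit sequence number into the header, keeping other bits."""
    if not 0 <= seq <= 0x7FF:
        raise ValueError("sequence number out of range")
    for index, shift, width in FIELDS:
        mask = (1 << width) - 1
        cleared = header[index] & ~(mask << shift) & 0xFF
        header[index] = cleared | ((seq & mask) << shift)
        seq >>= width


def unpack_seq(header: bytes) -> int:
    """Collect the scattered bits back into one integer."""
    seq = 0
    bit = 0
    for index, shift, width in FIELDS:
        mask = (1 << width) - 1
        seq |= ((header[index] >> shift) & mask) << bit
        bit += width
    return seq


if __name__ == "__main__":
    hdr = bytearray(3)
    pack_seq(hdr, 0x5A3)
    print(hdr.hex(), hex(unpack_seq(hdr)))
```

Keeping the layout in one table makes it easy to swap in the real bit positions once they are known from reverse engineering.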
Probably they just added those
sequence numbers to confirm some race
conditions or something. I really don't
know. Or out-of-order packets? Something
weird going on there. But I wrote the
code, I fixed the sequence number and
then during the replay of packets, I
noticed, well, it doesn't even matter! So
no matter if your sequence number is valid
or invalid, parsing continues and even
worse, even packets with a wrong sequence
number are parsed. Probably because
otherwise there would be too many issues,
because the protocol implementation is too
buggy. And there are also a couple of
other things, so, for example, if you send
the first four magic bytes wrong or a
wrong length or something, then the
packet is potentially ignored. But parsing
continues and CommCenter is not terminated
like in QMI. Since it's a proprietary
protocol, there is currently no tooling
available. But, Tobias is working on a
Wireshark dissector and once he finishes
his thesis, it will also be publicly
released. So you need to wait a while, but
then you will have a tool for this.
Anyway, let's also talk about fuzzing
this. I would not recommend fuzzing
this, because you might brick your device
or at least get into weird states. So
just don't do this on your production
iPhone. I mean, obviously, I know what
I'm doing, so, yeah, just fuzzing packets,
right? But I'm not so sure about what
exactly I'm doing, so the only direction
that I fuzz is from the baseband to the
iPhone here, not the opposite direction.
So I hopefully do prevent anything weird
on the chip, right? But the iPhone might
still answer with something invalid and
this might confuse the baseband or cause
other crashes. And so I actually had to
call for help, like mimimimimi, I broke my
iPhone - I mean, just one of my research
devices - but still. So it booted into
pongoOS but no longer into iOS, and it
didn't give me any debug message that was
useful. Well, it turns out, at least
on Qualcomm chips, and that's where
this happens, it just boots again after a
couple of hours. But before that, it's
just stuck in a boot loop. And on the
Intel iPhones I also almost bricked an
iPhone 8, but luckily it didn't
completely break. So the issue there is if
you enable the baseband debug profile,
then it writes a lot of stuff to the ISTP
files, so that is some debug format of
Intel, and every few minutes it just
creates something like 500MB of data, at
least on the iPhone 8. On the newer
iPhones, this debug format is a bit
shorter, so it doesn't create as much
data, but still a lot. And if you don't
delete this regularly, then of course
your disk will be full and an iPhone
behaves quite strange if it has a full
disk. So you can still interact with the
user interface, but you can no longer
delete photos, because deleting a photo, it
seems, needs some file
interaction. Also, you can no longer log
in with SSH, because it somehow seems to
create a file when logging in, which is
an issue, so you can no longer
delete any files. And I was just
rebooting the iPhone after trying a couple
of things, and luckily it came back and
deleted some files, and I was able to log
in and remove the baseband logs. But be
careful when doing this. And of course,
all the iPhones are very confused from
the fuzzing. So they really lose
everything about their identity and
location and they want to be activated
again. So here you can see a smartphone
that lost its location and really wants
to be activated, activated, activated.
During SMS fuzzing, you might even get
Flash messages. And if you click on the
head menu on dark theme, they are
displayed black on gray, so probably
nobody ever tested it. Also great if you
have a locked iPhone, you can still
display SIM menus and SIM messages on top
of the lock. OK, so I guess I have to
revise my first instruction. So fuzz this!
Really, really fuzz this! It's a lot of
fun. Maybe just not on your primary
device, but you will enjoy fuzzing these
interfaces. But first of all, you
obviously need to build a fuzzer, so how
do you build a fuzzer? The first fuzzer
that I used was the one that I also used
for Bluetooth that just uses the
existing bytestream protocol and then
flips single bits and bytes. So it has
this high state-awareness. But it also
means that like some kind of monkey I was
just calling myself, writing SMS to
myself, enabling flight mode, everything
that you could just imagine. And it's a
very boring task. But it also found very
fancy bugs that I couldn't reproduce with
the other fuzzers yet, because it can
reach states that just injection of
packets cannot reach. So at least it was
quite successful. And I fuzzed with
this for something like three days and
already found bugs, which is very
different from the Bluetooth fuzzers, so
there seem to be more bugs in
CommCenter. And so I just wrote to Apple
PR: "Hey there, I wrote this really,
really ugly 10-lines-of-code fuzzer and
see what it found. Awesome, awesome,
awesome! And crash logs are attached. And
obviously this is simple to reproduce
because I only fuzzed for three days. Got
most of these crashes multiple times.
Yeah. So here you go. Enjoy my fuzzer."
And this was probably quite
stupid because it's not that simple. So
it's really not easy to reproduce the
crashes. First of all, well, of course
this script is so generic that it runs on
all iPhones with an Intel chip, so no
matter if I take an iPhone 7 or an iPhone
11, it will just work. But the crash logs
that you get are very different depending
on if you fuzz on a pre-A12, so iPhone 7
and 8, or on later versions like the iPhone 11
and SE2. So you cannot reproduce the same
crash logs that easily. And also it depends
a lot on the SIM. So even on a passive
iPhone, if you don't do any phone calls
and so on, you would get different
results. So I started my fuzzing actually
with a Singaporean SIM card
without any data contract or phone
contract on top of it and already found a
couple of things. But it might just
behave very differently on just a slightly
different configuration. Anyway, let's
listen to a null pointer that it found. And
this null pointer has been fixed in iOS
14.2 and it's in the audio controller, so
you can hear some loop going on there.
What you can see here is me calling the
Deutsche Telekom and so on. So they have
this very important text.
Announcement: Guten Tag, und herzlich
willkommen beim Kundenservice der Telekom.
Jiska: And then I call again and have a
crash. And now let's listen to the crash.
Telekom jingle starts playing,
final part loops ten times
Jiska: Just for the sound effect, I also recorded
another one, so this one is with ALDI TALK.
Announcement: Guten Tag, ALDI TALK gibt
die Senkung der Mehrwertsteuer vom ersten...
Jiska: And now let's listen to a special
offer by ALDI TALK.
In 3, 2, 1... di-dimm...
Announcement: Guten Tag, ALDI TALK gibt die
Senkung der Mehrwertsteuer vom
loops ten times
erst-erst-erst-erst-erst-erst-erst-erst-erst-er
Jiska: Since these first fuzzing results
were very promising, I decided to use
the latest ToothPicker version and extend
it for fuzzing ARI and I called it
ICEPicker because the Intel chips are also
called ICE. So I just cloned Dennis'
latest ToothPicker alpha, which is very,
very unstable, but this one actually
runs on the iPhone locally without any
interaction with macOS or Linux. So it
doesn't need to exchange any payload
via USB, and also it's using AFL++, which
is a much faster mutator than Radamsa.
So from a speed consideration, this is a
much better design. However, AFL++ didn't
turn out to be the best fuzzer for
protocols, so most of the time is actually
spent trying to brute force the first
magic bytes, the first four bytes, because
it tries to shorten inputs. It's also not
aware of something like a packet order, so
it was just brute forcing those first four
bytes. And well, the next issue is, that
for some reason, if the first four bytes
are invalid, the ARI parser slows down a
lot. So I was suddenly down to something
like less than 10 fuzz cases per second.
And also ICEPicker has no awareness,
in this case, of the ARI host
state. So ARI sometimes shuts down this
interface if it thinks that something is
very invalid, and the fuzzer will just
continue. So I looked into
idevicesyslog after the fuzzer couldn't find any
new coverage for more than six hours.
And I was wondering: "What is the
issue here? Is the implementation
wrong or is it the fuzzer?" And it really
looks like the fuzzer is producing inputs
that are not good for protocol fuzzing.
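One way around this is to keep the mutator away from the header entirely. The following Python sketch shows the idea under loud assumptions: the `MAGIC` value and the header layout (4 magic bytes followed by a 2-byte little-endian payload length) are hypothetical placeholders, not the real ARI format. Only the payload is mutated, and the header is rebuilt afterwards so that the parser is not slowed down by broken magic bytes or lengths.

```python
import random
import struct

MAGIC = b"\xde\xad\xbe\xef"   # hypothetical placeholder, not the real ARI magic
HEADER_LEN = len(MAGIC) + 2   # assumed header: magic + 2-byte length field


def mutate_packet(packet: bytes, rng: random.Random) -> bytes:
    """Mutate only the payload, then rebuild magic bytes and length field."""
    payload = bytearray(packet[HEADER_LEN:])
    choice = rng.randrange(3)
    if choice == 0 and payload:
        # flip a single bit
        i = rng.randrange(len(payload))
        payload[i] ^= 1 << rng.randrange(8)
    elif choice == 1:
        # insert a random byte
        payload.insert(rng.randrange(len(payload) + 1), rng.randrange(256))
    elif payload:
        # delete a byte
        del payload[rng.randrange(len(payload))]
    # the sketch assumes payloads stay below 65536 bytes
    return MAGIC + struct.pack("<H", len(payload)) + bytes(payload)


if __name__ == "__main__":
    rng = random.Random(7)
    seed_packet = MAGIC + struct.pack("<H", 4) + b"\x01\x02\x03\x04"
    for _ in range(3):
        seed_packet = mutate_packet(seed_packet, rng)
        print(seed_packet.hex())
```

The trade-off is that bugs in the header handling itself are no longer exercised, so it can make sense to run a small share of inputs without the fix-up step.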
Of course, this is stuff that you can
optimize, so AFL++ can do a lot here, so
you can tell it a bit what the protocol
looks like and also get it to not brute
force the first four magic bytes. But for
this I would have to recompile the whole
thing. And it was something that compiled
on Dennis' machine, but it didn't compile
on my machine, because I had my Xcode
beta in a weird state. And well, of
course, some of you now say:
"Just download and install a new Xcode!"
But this takes so long that actually
writing the next fuzzer seemed to be
easier. Still, this variant of ICEPicker
was interesting to me because it was the
first time when I saw that the fuzzer
initialization works, including
coverage and also my replay works across
multiple iPhone versions. So my call that was
collected on an iPhone SE2 was replayable
on an iPhone 7. So it was not useless in
that sense, but I just decided to not
use this configuration. So I just wrote a
very simple fuzzer again and I didn't do
the porting of everything to run locally
on iOS. I just kept the design a bit
simpler or at least easier to code and had
my fuzzer running on Linux and then using
only Frida on iOS. It cannot reproduce all
the states and crashes that I observed
with my very first fuzzer, but most
crashes could be reproduced. I didn't do
any coverage. I didn't do any smart
mutations, just very stupid mutations. And
basically I just did a very blind
injection. But this was super fast, so
instead of the 20 fuzz cases per second, I
already had something like 400 fuzz cases
per second on an iPhone 7, which was about
the same speed or even faster than the
AFL++ variant. And I can at least correct
the length field, sequence number and so
on before injecting the payload. Since it
doesn't do that great mutations, at
least, I need to collect a good corpus
with many SMS, many calls. And I'm also
logging the packet order with this. So
it's at least aware of a packet sequence
in the sense of, I can reproduce the
sequence later on. I had this fuzzer
running on a couple of iPhones in
parallel for multiple weeks, and it found
a lot of interesting crashes. So that's
my go-to fuzzer. I still wanted to
confirm that not collecting coverage
wasn't an issue, so I also cloned the
publicly released version of ToothPicker, which
definitely finds new coverage, and it's
using the Radamsa-mutator, which is very,
very slow, but it does a bit smarter
mutations, at least in terms of protocol
fuzzing. It's still only aware of
single packets and it's only using the
same packets five times in a row to
confirm coverage, etc. And also an issue
is that it cannot catch a lot of the
crashes of CommCenter. So it happens
quite often that CommCenter crashes. And
then if you cannot catch the crash with
Frida and everything crashes, then you
need to start the fuzzer again. But you
also need to delete the files in the
corpus that led to the crash because
otherwise you would just run into the same
crash very fast. So it needs a lot of
babysitting. I also had it running for a
couple of weeks, but sadly, it didn't find
any crashes. So at least I can be sure
that fuzzing, much slower, but with
coverage, is not any improvement. Still,
the mutations it creates are quite useful,
as you can see in the following. So you
can even see these phone numbers scrolling
here and so on. So it generated a very
long phone number correctly into some TLV
structure here. And that's quite
interesting to see. So this is something
that you could not reach by just
flipping bits and bytes.
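To make the difference to bit flipping concrete, here is a small sketch of a structure-aware mutation: it lengthens one value inside a TLV list and rewrites the length field to match. The TLV layout used here (1-byte type, 2-byte little-endian length, then the value) is an assumption for illustration and is not the ARI encoding. All function names are hypothetical, and this is not how Radamsa works internally.

```python
import struct


def parse_tlvs(data: bytes) -> list:
    """Split a buffer into (type, value) pairs using the assumed TLV layout."""
    tlvs = []
    offset = 0
    while offset + 3 <= len(data):
        tlv_type, length = struct.unpack_from("<BH", data, offset)
        offset += 3
        value = data[offset:offset + length]
        if len(value) != length:
            break  # truncated TLV, stop parsing
        tlvs.append((tlv_type, value))
        offset += length
    return tlvs


def build_tlvs(tlvs: list) -> bytes:
    """Serialize (type, value) pairs, recomputing every length field."""
    return b"".join(struct.pack("<BH", t, len(v)) + v for t, v in tlvs)


def stretch_value(data: bytes, index: int, factor: int) -> bytes:
    """Repeat the value of one TLV `factor` times and fix up its length."""
    tlvs = parse_tlvs(data)
    tlv_type, value = tlvs[index]
    tlvs[index] = (tlv_type, value * factor)
    return build_tlvs(tlvs)
```

A bit flipper would change either the digits or the length field, but rarely both consistently, whereas `stretch_value` always yields a longer value that still parses.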
There is one big shortcoming that all of
these fuzzers have, including the initial
ToothPicker, which is that they don't have any kind
of memory sanitization. So the framework
that you would usually use in user space
on iOS is the MallocStackLogging
framework. I even got this running for
CommCenter, so it's a bit of a command
line juggling. But in the end you can
enable MallocStackLogging also for
CommCenter. The issue here is that it
increases the memory usage a lot and even
if you configure CommCenter to have a
higher memory allowance, it is so high
that it's just immediately killed by the
out-of-memory killer. So this doesn't
work. Then there is also libgmalloc. It
doesn't exist for iOS, it just exists in
Xcode. I got one of the Xcode libraries
running on one of my iPhones. I have no
idea if this is an expected configuration
or not. At least I could execute smaller
programs. And then when you use this on
CommCenter, it just crashes with a
libgmalloc error on parsing some of the
configuration files very, very early when
starting the CommCenter. So all of this
didn't work. And this also means that the
fuzzer cannot find certain bug types or
crashes much later when encountering
bugs. So all of the fuzzers that I created
are not perfect, but at least they found
a lot of different crashes. Let's look
into this. I mean, the first obvious
number that you see here is the 42. So I
stopped fuzzing after 42 crashes - at
least crashes that I think are individual
crashes and that are not caused by Frida -
so I tried to filter out Frida crashes
and this corresponds to the total amount
of crashes, but only some of them are
replayable by either one or multiple
packets. And for the replayable crashes I
can also check if they were fixed in
recent iOS versions or the most recent iOS
14.3 or not. Then I also marked two
colors here because there is the Intel
libraries, but there's also the
Qualcomm libraries. And for the Qualcomm
libraries, I didn't spend as much time
fuzzing, because I have less Qualcomm
phones, but also all the asserts in the
code prevent a lot of issues from being
reached. So the libraries themselves have
less issues and also within CommCenter,
less of the code that has improper state
handling is reached. The location daemon is
marked also with a big grey box here,
because the location daemon is similarly to
the CommCenter using some of the raw
packet inputs and parses them. So it has
special parsers for Qualcomm and Intel.
And it's also an interesting target
because of this. Other than this I got
really a lot, a lot, a lot of different
daemons crashing. Some of them, even with
replayable behaviour. So, for example,
there is the wireless radio manager daemon
that you can just crash via one Intel
packet. But, this has been fixed. And then
there is one interesting crash that I
actually got via Qualcomm and Intel
libraries. So in the mobile Internet
sharing daemon, this also has been fixed
and some of the crashes only happened via
Qualcomm, but I'm not sure if that's like
a Qualcomm-specific thing or it's just
randomness of the fuzzer. So the mobile
Internet sharing daemon has an issue where
it accesses memory at configuration
strings, so there's different strings at
this memory address and I found this quite
early, but I was not aware of the fact,
that so many other daemons are actually
crashing when I fuzz CommCenter. So, I
didn't look into this in the very
beginning. And when I reported it to
Apple, they said: "Yeah, yeah, we already
know about this and we fixed it in a
beta prior to your report." So certainly
nothing that I got a CVE for. Another
interesting crash is in the CellMonitor, but
only in the Intel library. The CellMonitor
is something that is running passively in
the background all the time and it parses,
for example, GSM and UMTS cell
information. I already found this on the
Singaporean SIM without any active
data plan in my very first round of
fuzzing and reported it back then to
Apple. I don't know, if it's triggerable
over the air or not. So I guess it's
something that you first need to get code
execution for. And it has been fixed in
iOS 14.2. And I wrote a lot of emails with
Apple because I thought, that they didn't
fix it. And the reason for this is that
both the GSM cell info and the UMTS cell
info function, when they parse data, they
have two different bugs. So I still got
crashes in the same functions and I
thought: "OK, same function, still a
crash: The bug is not fixed.". But actually,
it's very high quality code and it's just
multiple bugs per function. And there is
even one more issue in the CellMonitor,
even though I think the remaining bugs are
very simple crashes or nothing that could
be exploitable at all, but still hints to
the great code quality. And the same story
is, that there're even more bugs to be
fixed. So most of them are probably just
stability improvements, but some of them
are still interesting. So, let's see how
this goes. So since I said that it's a
very simple fuzzer, some of you might have
already started coding those 10 lines of
code for fuzzing, while I continued talking
and grabbed their old iPhones, that they are
willing to lose, if something goes wrong.
So, how can we actually build a fuzzer
that is performant and replicates some of
the bugs that I found just within a day.
Let's take a look. When you do Frida
fuzzing, a lot of the stuff that you do,
is limited by the processing power of the
iPhone. So your iPhone will get very,
very, very hot and it might even drain
more battery, than it can get via the USB
port. So it might even discharge while
fuzzing. And performance is really key. So
you need to identify bottlenecks.
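One simple way to find such bottlenecks is to measure fuzz cases per second around whatever function delivers a payload. The sketch below is a generic timing loop and not part of the fuzzers from the talk. The `inject` callable is a stand-in for any injection path, and the function name is hypothetical.

```python
import time


def fuzz_cases_per_second(inject, payloads, duration=0.2):
    """Call inject(payload) in a loop for `duration` seconds.

    Returns (number of calls, calls per second). `inject` is any callable
    taking one payload; here it only stands in for a real injection path.
    """
    count = 0
    start = time.perf_counter()
    deadline = start + duration
    while time.perf_counter() < deadline:
        inject(payloads[count % len(payloads)])  # cycle through the corpus
        count += 1
    elapsed = time.perf_counter() - start
    return count, count / elapsed
```

Running it once with a bare injector and once with the same injector wrapped in extra work, such as printing each packet, gives a direct comparison of how much each step costs.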
I said ToothPicker or ICEPicker, the
initial version is just 20 fuzz cases per
second and you can tune this to something
like 20.000 fuzz cases per second. So, I
already said that I tuned it to something
like 400 or 500 fuzz cases per second,
but, why the 20.000? So, initially, a
student of mine, did some fuzzing in a
very different parser and said: "On my
iPhone 6S, it's running with 20.000 fuzz
cases per second." I was like: "No way, no
way!" But actually, you can do this. So,
this depends a lot on the Frida design.
The first variant, how most Frida scripts
are written is, that you have some Python
script that runs on Linux or macOS, and it
has a couple of functions that you can see
here. So first of all, it has this
on_message callback. So, this on_message
callback is something that we need later.
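The slide itself is not part of this transcript, so the following is only a sketch of what such a callback can look like. It relies on the message shape used by Frida's Python bindings, where `send()` calls arrive as a dict with `type` set to `"send"` plus a `payload`, and script errors arrive with `type` set to `"error"` plus a `description`. The `make_on_message` wrapper and the log format are hypothetical.

```python
def make_on_message(log):
    """Build an on_message callback that appends every event to `log`."""

    def on_message(message, data):
        # `message` is a dict from the injected script, `data` optional raw bytes
        if message.get("type") == "send":
            log.append(("send", message.get("payload"), data))
        elif message.get("type") == "error":
            log.append(("error", message.get("description"), None))
        else:
            log.append(("other", message, data))

    return on_message


# With a real Frida session, registration would look like:
#   script.on("message", make_on_message(events))
```

Keeping the callback free of printing and heavy processing matters here, because every message that crosses to the laptop costs time.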
And we just register it to our Frida
script, the Frida script, that I'm going
to show you in a second. And you load the
script and the script can then even call
functions on your iPhone. For this, you
load a second script on your iPhone. So
this is JavaScript injected into the iOS
target process and it can, for example,
use the send function to send something
back to the on_message function. And it
can export functions via RPCs. So, you can
then call them. All this happens via JSON.
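The cost of that JSON step can be sketched in plain Python, without Frida. The example wraps a binary payload as a hex string inside a JSON object and unwraps it again. The field names `payload` and `length` are made up for illustration, and this is not Frida's actual wire format.

```python
import json


def encode_rpc_args(payload: bytes) -> str:
    """Wrap a binary payload as hex text inside a JSON object."""
    return json.dumps({"payload": payload.hex(), "length": len(payload)})


def decode_rpc_args(text: str) -> bytes:
    """Parse the JSON text and turn the hex string back into bytes."""
    msg = json.loads(text)
    data = bytes.fromhex(msg["payload"])
    if len(data) != msg["length"]:
        raise ValueError("length field does not match payload")
    return data
```

Each byte becomes two hex characters, so the text on the wire is more than twice the size of the raw packet, and both sides spend time encoding and decoding on every fuzz case.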
And so it needs serialization and
deserialization, which means you cannot
send hex data or binary data directly. So
you have a hex string that you encode into
JSON, which is then parsed as binary data
and also it's all via USB. So you also
have the speed limitation by USB. And, of
course, if you use the Frida C-bindings
locally on the iOS smartphone, it is a bit
faster, but it's still not perfect. So,
the more you can prevent from this JSON
part and the USB part, the better. The
actual fuzzer looks a bit like this. So,
you are in the libARIServer, so that's the
lowest library from the diagram before.
And then you define this inbound message
callback function, which has two
arguments, which are the payload and the
length. So, this looks a bit cryptic, but
that's basically it. And then you can, but
you don't have to, add this interceptor
here because you might want to fix your
sequence number or add basic block
coverage to your fuzzer, etc. So, this is also
done there. And then you can just call this
inbound message callback of ARI and send
ARI payloads. So, this already can be very
different. So, if you now call this via
RPC export, via a Python script on your
laptop, you can reach something like 500
fuzz cases per second, if you inject SMS,
which is quite a processing-intensive
payload. Or, if you just do the same
thing and if you just run this inbound
message callback in a loop, locally with
JavaScript, without any external Python
script, then you would get 22.000 fuzz
cases per second on the very same device.
So this is the speed difference that the
JSON serialization, deserialization and
the USB in between make. So, I did a few
more measurements, and certainly on the
iPhone 8, there is a bug that prevents me
from collecting coverage. But, what you
can see is, so, the first part here is if
you have just a bit flipper in a loop that
calls the target function, you can get
17.000 fuzz cases per second on an iPhone 7.
As soon as you start collecting basic
block coverage, not processing it, just
collecting, you drop to 250 fuzz cases per
second. So, you need to ask yourself, if
your fuzzer gets really that much better
from collecting coverage. And another
thing is - that's this line above - so, if you
just print the packet, that you fuzzed or
injected and print this via Python to your
laptop, you also have a huge slow down,
which is not as large as the coverage
slowdown. But still, you can see every
print and every sending of a message in
between the Python script and JavaScript
takes a lot of time. Now, if you have this
remote SMS injection that I had before,
then you drop to 200 fuzz cases per
second. So it is a blind injection without
any coverage. If you collect coverage but
don't process coverage, then you are down
to 100 fuzz cases per second. So, for the
initial ToothPicker design, this would be
the optimum. But, because the Radamsa
mutator is very slow and because you also
need to process the coverage information,
et cetera, that's down to 20 fuzz cases
per second. So, this is the comparison
here. And now you can imagine why
collecting coverage probably isn't always
useful and why also having your laptop
calculating better mutations because it's
easier to write a mutator there, than
directly in JavaScript, is not always the
best idea. So let's watch one last demo
video. What you can see here, is when you
try to delete SMS, after all of the
fuzzing, it really doesn't work, neither
via the settings nor via the SMS app. So,
you really need to reset your iPhone after
fuzzing it for too long. No other chance
than this to delete the messages. With
this, we are already at the end of this
talk, but of course, there will be a Q&A
session and if you missed the Q&A session,
you can also ask me on Twitter or write me
an email. Thanks for watching!
rC3 music
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!