Sylvain Munaut: osmo-gmr: What's up with sat-phones ?

Edit subtitles

0:09 - 0:12

This talk will be by Sylvain Munaut
0:12 - 0:15

who works with the Osmocom project.
0:15 - 0:19

He will talk today about the
osmo-gmr satellite phones.
0:19 - 0:23

After the talk there will be the
opportunity to ask questions.
0:23 - 0:25

You can approach one of the microphones
0:25 - 0:28

and then you have the chance
to ask one question.
0:28 - 0:33

Please give a round of applause
to Mr Sylvain Munaut!
0:33 - 0:36

Applause
0:36 - 0:39

- Hi! Thank you for being here.
0:39 - 0:45

As he said I'm going to talk about
satellite phones today.
0:45 - 0:50

I'll first talk with a quick recap of the
previous work we've done.
0:50 - 0:54

At 28C3 there was a presentation
on this subject.
0:54 - 0:56

But there were some pieces missing.
0:56 - 0:59

So, I'm just gonna recap
what we did previously
0:59 - 1:01

and then move on to the new stuff
1:01 - 1:03

which is the speech codec and cipher
1:03 - 1:07

reverse engineering and cracking
that's been done.
1:07 - 1:10

First, what is GMR exactly?
1:10 - 1:13

GMR stands for GEO-Mobile Radio and
1:13 - 1:17

it's an ETSI standard
for satellite phones.
1:17 - 1:19

It's heavily inspired from GSM and
1:19 - 1:21

if you read the GMR specs
1:21 - 1:24

there is plenty of reference
to go see the GSM spec.
1:24 - 1:27

There is actually two standards named GMR:
1:27 - 1:29

GMR-1 and GMR-2.
1:29 - 1:31

They are not an evolution of one another.
1:31 - 1:33

They are competing standards
1:33 - 1:35

that have been developed
by distinct companies
1:35 - 1:38

and then both have been
standardized to ETSI.
1:38 - 1:41

Today, we're gonna be looking
at GMR-1.
1:41 - 1:46

That comes in 3 evolutions.
It started with plain old GMR-1
1:46 - 1:49

which is voice calls and SMS.
1:49 - 1:53

It evolved to include packet data.
That's called GMPRS.
1:53 - 1:55

And then there was a later evolution
1:55 - 1:58

to interconnect with the 3G core network
1:58 - 2:01

and improve the air interface
and that's GMR-1 3G.
2:01 - 2:04

But we'll be mostly looking at
the first version
2:04 - 2:08

which is still used.
2:09 - 2:13

Among the deployments the one
we focused on was Thuraya.
2:13 - 2:16

The reason why is that
it's the most common
2:16 - 2:19

commercial operator
where you can actually buy SIM
2:19 - 2:21

and place phone calls.
2:21 - 2:23

The satellites are visible from Europe
2:23 - 2:27

which is kind of a requirement for me
to look at the signal.
2:27 - 2:31

This operator is mostly active in the
middle east, Asia and Africa.
2:31 - 2:34

In Europe you don't see too many people
2:34 - 2:37

with satellite phones
because we have GSM coverage.
2:37 - 2:41

But there are other operators
and we actually read recently
2:41 - 2:44

that there is a new deployment
planned for Europe
2:44 - 2:47

called Solaris Mobile and they said
2:47 - 2:50

they're going to use GRM-1 3G
2:50 - 2:53

but the satellite isn't launched yet.
2:53 - 2:56

This is the coverage zone for Thuraya.
2:56 - 2:58

As you can see Europe is very well covered
2:58 - 3:01

so there is no problem if you want to
receive the signal.
3:01 - 3:04

If you are somewhere in that area
you can see it.
3:04 - 3:07

So, since it's so heavily inspired by GSM
3:07 - 3:10

it makes sense to compare it to it.
3:10 - 3:13

The first thing they did is to
rename everything.
3:13 - 3:16

So, instead of a base transceiver station
3:16 - 3:18

you have a GEO transceiver station.
3:18 - 3:20

Instead of a base station controller
3:20 - 3:22

you have a GEO station controller.
3:22 - 3:26

Instead of a mobile station
you have a mobile earth station.
3:26 - 3:30

Then they did some actual
useful stuff like
3:30 - 3:32

use specialized features for satellites.
3:32 - 3:35

The first one is
terminal-to-terminal calls.
3:35 - 3:37

When you're usually placing a call
3:37 - 3:40

your signal goes from your
satellite phone to the satellite
3:40 - 3:43

then it goes back to the
core network on earth
3:43 - 3:46

and then, if you're calling
another satellite phone
3:46 - 3:49

it would have to again go
back up to the satellite
3:49 - 3:51

and back to the other phone
and that means that
3:51 - 3:54

you pay 4 trips from earth to space
3:54 - 3:57

and since the satellite is in
geosynchronous orbit
3:57 - 3:59

it's pretty far away.
3:59 - 4:02

So, to reduce the delay in communication
4:02 - 4:06

they have the so-called
terminal-to-terminal calls where
4:06 - 4:09

the call network will still
handle establishing
4:09 - 4:12

the channel but once
the call is established
4:12 - 4:15

the data is going directly
from one phone to the satellite
4:15 - 4:18

and directly back down to the other phone
4:18 - 4:20

without any involvement from the network.
4:20 - 4:22

It never even sees the packets.
4:22 - 4:26

They also have this call called
“High Penetration Alerting”
4:26 - 4:29

That's because like in the movies
you see people using
4:29 - 4:32

satellite phones in bunkers
and stuff like that
4:32 - 4:34

in practice, as soon as you are inside
4:34 - 4:37

you can't see the satellite and
you can't place a call at all
4:37 - 4:40

but you still might want to
be able to at least…
4:40 - 4:42

you know, you can't pick up the phone
4:42 - 4:45

but you want to know that
somebody is calling you
4:45 - 4:48

and so they have this specialized channel
4:48 - 4:51

which has an incredible amount
of error correction on it
4:51 - 4:54

so that even with very very low signal
4:54 - 4:57

you can still get an indication
that somebody is trying to call you
4:57 - 5:01

please run outside so
you can take the call.
5:01 - 5:04

There's also a very
tight integration to GPS
5:04 - 5:07

For example, all the Almanac
and Ephemeris data
5:07 - 5:10

are actually broadcasted
by the Thuraya satellite
5:10 - 5:12

so that your phone can get a lock faster
5:12 - 5:15

and when you place a phone call
the very first thing
5:15 - 5:19

that your phone will do
is send your exact GPS coordinates
5:19 - 5:22

in clear to the operator.
5:23 - 5:25

They do this for two reasons:
5:25 - 5:29

The first one is:
If you're not on the right frequency
5:29 - 5:32

for this particular geographic location
they can tell you
5:32 - 5:34

“OK. You shouldn't be connecting
on this frequency
5:34 - 5:37

use this one! It has a much better signal
5:37 - 5:39

and you get better quality”.
5:39 - 5:42

That's one reason.
The other reason is purely commercial:
5:42 - 5:45

It's for billing because if you're calling
5:45 - 5:47

while you are in certain countries
it's gonna cost more
5:47 - 5:50

than if you are on the other side
of the border
5:50 - 5:52

and so they need to know where you are
5:52 - 5:53

to bill you correctly.
5:53 - 5:56

And then compared to GSM,
of course, they introduced
5:56 - 5:58

the two things we're gonna look at today:
5:58 - 6:01

They changed the speech codec
to something called AMBE
6:01 - 6:04

and they changed the cipher.
They are not using A5/1
6:04 - 6:06

or A5/2 or A5/3 or that kind of stuff.
6:06 - 6:10

They use something called A5/GMR-1.
6:10 - 6:13

If you look at the protocol stack
6:13 - 6:15

anything in the lower layers
6:15 - 6:18

(radio modulation, TDMA frame structure)
6:18 - 6:21

it's completely different
because you have to deal with
6:21 - 6:24

the particular of the
satellite communication.
6:24 - 6:26

You find the same kind of concept
6:26 - 6:29

and you find the control channels
and broadcast channels
6:29 - 6:32

but their implementation is different.
6:32 - 6:35

If you go up one layer
to the data link layer
6:35 - 6:39

you have something very similar to LAPDm.
6:39 - 6:43

Again, they needed some minor adaptation
6:43 - 6:45

because bandwidth costs a lot
on a satellite.
6:45 - 6:48

They reduce the size of the overhead
by reducing the header
6:48 - 6:50

and since you have so much delay
6:50 - 6:53

going from your phone to the satellite and
6:53 - 6:56

back down to the earth station
then again to your phone
6:56 - 7:00

you have something like
nearly 500 ms of delay
7:00 - 7:05

because you have basically 4…
no 250 ms, sorry,
7:05 - 7:10

because you have both paths up and down
7:10 - 7:14

and so, with that delay you need more
time for the acknowledgement
7:14 - 7:17

to come in,
so they increased the window size.
7:17 - 7:19

Anything above that, so, layer 3
7:19 - 7:22

radio resource, since
it's managed the radio channel
7:22 - 7:25

is still a bit different
but anything above that
7:25 - 7:28

is strictly the same as GSM.
It actually interoperates
7:28 - 7:31

with the GSM core network
which means I can take
7:31 - 7:34

my Belgian operator SIM
put it into a Thuraya phone
7:34 - 7:38

and I can roam onto the satellite
and place calls
7:38 - 7:40

which is really nice
except for the price you pay
7:40 - 7:43

at the end of the month.
7:43 - 7:46

For packet data it's essentially
the same thing.
7:46 - 7:47

Anything that's close to the lower level
7:47 - 7:50

like RLC and MAC is gonna be different,
7:50 - 7:53

anything above interoperates with GSM
7:53 - 7:56

so you're gonna be speaking to
a GSM core network
7:56 - 8:00

SGSN and GGSN and all that good stuff.
8:00 - 8:05

So, osmo-gmr, what we presented last time,
8:05 - 8:10

is essentially everything you needed to go
from RF to wireshark.
8:10 - 8:13

That includes the hardware setup
8:13 - 8:20

(how you can build an antenna,
what LNA to use, what SDR receiver to use)
8:20 - 8:23

At that time we didn't
actually have RTL-SDR,
8:23 - 8:26

the cheap DVB dongle you can use as SDR.
8:26 - 8:29

Nowadays we do which means that
for less than 100 bucks
8:29 - 8:33

you can get an antenna,
LNA and SDR receiver and all
8:33 - 8:36

that we need to listen to those signals.
8:36 - 8:39

Then we did all the
SDR processing which is
8:39 - 8:41

taking that raw data, filtering it,
selecting the channel
8:41 - 8:45

doing the demodulation
getting actual data bits out of it.
8:45 - 8:48

Then, channel coding which is
converting those data bits into
8:48 - 8:52

layer 2 frames. Then,
interpreting those layer 2 frames
8:52 - 8:56

into channel assignment
and follow those assignments
8:56 - 8:59

into a demonstration application.
8:59 - 9:02

And finally, the wireshark dissector
which takes this
9:02 - 9:05

and presents it nicely in a way
you can read stuff.
9:05 - 9:09

But there is two things we couldn't do:
9:09 - 9:11

First, we couldn't see past the
ciphering mode command
9:11 - 9:14

which as soon as ciphering was enabled
9:14 - 9:16

we couldn't see anything because
9:16 - 9:18

we knew the key because
it was our own calls
9:18 - 9:20

and we can read the key from the SIM
9:20 - 9:23

but we didn't know the algorithm
that was used to cipher
9:23 - 9:26

so there was no way for us to
decipher that.
9:26 - 9:29

And we also had no way to
get the voice data,
9:29 - 9:33

converting the frame into
actual audio that we can listen to
9:33 - 9:38

We couldn't do that and that's
what we're gonna look into.
9:38 - 9:44

So the first thing is the speech codec.
9:46 - 9:51

It's called AMBE for
Advanced Multi-Band Excitation.
9:51 - 9:56

It's not a codec in itself.
AMBE is more a family of codecs
9:56 - 10:00

which means you have several codecs
which are named AMBE
10:00 - 10:02

which are different versions
of one another
10:02 - 10:05

slightly different so that
they're not compatible.
10:05 - 10:08

It's not documented in the standard
10:08 - 10:12

which is really annoying.
When I started working on GMR
10:12 - 10:15

I really thought that
the codec was in there
10:15 - 10:20

and then I discovered it wasn't.
That was a bad day.
10:20 - 10:23

There is a bit of specification in there
10:23 - 10:27

but it only gives a high level
description like
10:27 - 10:31

“The codec takes audio as input
and produces 80 bits of output
10:31 - 10:35

every 20 ms” but nothing
that can be used to
10:35 - 10:39

realistically implement a decoder.
10:41 - 10:44

That codec is made by a company DVSI Inc.
10:44 - 10:49

whose entire business is codecs.
That's all they do
10:49 - 10:51

which is probably why there are
incompatible versions
10:51 - 10:55

of AMBE because they can
sell different versions.
10:55 - 10:59

These guys, they do sell
a small USB stick that
10:59 - 11:02

can decode some of the variants.
For example, you can decode
11:02 - 11:07

DStar audio which is also
another AMBE codec
11:07 - 11:11

or P25 which is used in
law enforcement radio in the US
11:11 - 11:14

(it's also an AMBE variant)
11:14 - 11:17

and the cheap USB stick can decode that.
11:17 - 11:20

Unfortunately, it's incompatible
with the GMR-1 variant
11:20 - 11:22

because that's the first thing I did,
I contacted DVSI
11:22 - 11:25

to know what were my options.
11:25 - 11:31

And short of buying a source code license
which is just too expensive
11:32 - 11:35

is buying the appliance
which is called NET-2000.
11:35 - 11:39

And not only does it cost like 2000 Euros
11:39 - 11:43

which is way too much for a hobby project
11:43 - 11:46

but you also have to special-order it
with a GMR-1 firmware
11:46 - 11:50

and you have to sign some
non-reverse engineering agreement
11:50 - 11:54

which… That wasn't an option.
11:54 - 11:57

We still had some hope.
The first hope we had is that
11:57 - 12:03

as I said P25 uses an AMBE variant as well
12:03 - 12:07

and this particular variant
is actually documented.
12:07 - 12:11

So you can download the standard
for that particular
12:11 - 12:15

variant of AMBE and you have
all the math and all the specs
12:15 - 12:18

so you can write a decoder
and somebody actually wrote
12:18 - 12:22

a decoder which is open source
which is called ambelib
12:22 - 12:28

and so we can use that code as a base
to modify it. That was a lead.
12:28 - 12:32

The other lead, of course, is that the
codec is somewhere in the phone.
12:32 - 12:34

The phone is obviously capable
of decoding audio
12:34 - 12:37

so maybe we can find it in there.
12:37 - 12:41

But before we can start
searching for the codec
12:41 - 12:45

it's good to have a basic understanding
of how the codec works
12:45 - 12:48

so that we know what we're looking for.
12:48 - 12:50

The first thing to understand:
it's a vocoder.
12:50 - 12:53

It's not a general-purpose audio codec
12:53 - 12:58

which means if you try to feed it music
it's gonna perform horribly
12:58 - 13:01

because a general-purpose
audio codec will try to model
13:01 - 13:06

what your ear can actually hear
and what it can't
13:06 - 13:10

whereas a vocoder is gonna try
to model the speech
13:10 - 13:13

and reconstructs
something that sounds like
13:13 - 13:16

but is not actually the same thing.
13:16 - 13:19

They will drop a lot of information.
13:19 - 13:23

To do that the first thing they do
is to split your speech
13:23 - 13:26

into small segments which are
compressed independently.
13:26 - 13:32

In the case of GMR-1
those segments are 10 ms long and
13:32 - 13:37

they are combined by pair
and each pair is compressed
13:37 - 13:41

into a single frame. And then,
for each of those small segments
13:41 - 13:46

they will represent it in 4 parameters.
Those 4 parameters are
13:46 - 13:52

the pitch which is the
fundamental frequency
13:52 - 13:56

of the periodic component of your voice,
13:56 - 13:59

the gain which is
how loud you are talking,
13:59 - 14:01

something called
voiced/unvoiced decision,
14:01 - 14:04

we're gonna see what this is right after
14:04 - 14:07

but essentially you have the
fundamental frequency
14:07 - 14:09

and for each harmonic,
two times the frequency,
14:09 - 14:11

three times the frequency,
four times the frequency,
14:11 - 14:16

you have a bit that says
is that component voiced or unvoiced
14:16 - 14:22

and then you have the amplitudes
for the various harmonics.
14:22 - 14:27

If you want to play back the voice
you have to do 3 things:
14:27 - 14:29

The first thing is unpacking.
14:29 - 14:33

That's taking the 80 bits
of the voice frame
14:33 - 14:37

and reconstruct quantized
parameters from it.
14:37 - 14:40

All the parameters are not just
put in order one after the other
14:40 - 14:43

because some of the bits in a frame
will receive more error correction
14:43 - 14:47

than others and some of the bits
are more sensitive to errors than others
14:47 - 14:51

and so they will be placed at
specific places in the frame
14:51 - 14:55

so that if errors do happen,
hopefully it will affect something
14:55 - 14:57

that is not too important.
14:57 - 15:00

That's the unpacking step.
15:00 - 15:02

The second one is dequantization.
15:02 - 15:05

That's going from the quantized,
compressed parameters
15:05 - 15:09

into floating point value
or integer values
15:09 - 15:11

which will represent the real parameters
15:11 - 15:13

that are going to be used in the last step
15:13 - 15:17

which is synthesis. And synthesis
is taking those parameters
15:17 - 15:23

and reconstructing actual audio
that sounds like the original voice.
15:23 - 15:26

To do a quick overview
of the synthesis step
15:26 - 15:29

what you do is you take
for each possible harmonic,
15:29 - 15:32

so, f0, 2 f0, 3 f0,…
15:32 - 15:36

you have the voiced/unvoiced decision
15:36 - 15:40

and what voiced means is
that you're gonna reconstruct
15:40 - 15:43

that particular frequency
as a pure sinusoidal tone,
15:43 - 15:50

and if it's unvoiced you're just gonna
fill that small band with noise.
15:50 - 15:55

And with that they can reconstruct
human voice pretty well.
15:55 - 16:00

So, now that we know more or less
how the codec works
16:00 - 16:03

and what we are looking for
16:03 - 16:08

we can start looking at
where we are gonna find this codec.
16:08 - 16:13

The phone we looked at is
the Thuraya SO-2510.
16:13 - 16:17

We looked at it mostly because
it was the cheapest phone on eBay.
16:17 - 16:20

And that also means that it's the simplest
16:20 - 16:25

which means you have
less places to look into.
16:26 - 16:31

We knew the codec was in the DSP.
How did we know that is that
16:31 - 16:33

there is simply no other places it could be.
16:33 - 16:37

There are a bunch of chips on that phone, of course,
16:37 - 16:40

there is only two where it could possibly be:
16:40 - 16:42

the TI OMAP main processor,
16:42 - 16:44

and then they also have like a small ASIC
16:44 - 16:46

which is named the DALMA ASIC because
16:46 - 16:50

there is a big DALMA marking on it.
16:50 - 16:54

At first, we thought that
that chip was the AMBE codec.
16:54 - 16:58

But it turned out that
we found some Korean paper
16:58 - 17:00

from the phone manufacturer
that described
17:00 - 17:02

the phone architecture.
And there is a schematic
17:02 - 17:05

of the internals of that ASIC
and it showed
17:05 - 17:08

it only had satellite radio functions
17:08 - 17:10

and absolutely nothing
to do with the codec
17:10 - 17:14

which means there was a good chance
it was in the TI OMAP processor.
17:14 - 17:17

Why in the DSP? Well, because
it just makes sense
17:17 - 17:20

to make the codec in the DSP
especially since DVSI
17:20 - 17:25

the company that makes the codec
mostly sells those codecs as a
17:25 - 17:30

DSP precompiled library that you can buy.
17:31 - 17:39

Pretty standard process: We get the
firmware update from the Internet.
17:39 - 17:42

From there we can extract
the actual DSP image
17:42 - 17:46

which is the code that is
loaded into the DSP.
17:46 - 17:48

Thankfully it's supported by IDA
17:48 - 17:51

which makes the whole process
easier to reverse engineer.
17:51 - 17:54

But it's still like 250KB of binary data
17:54 - 17:57

and since it's a DSP it has no I/O
17:57 - 18:01

which means there is not
a single string in it.
18:01 - 18:05

And also it's a DSP which means
it's written in DSP assembly.
18:05 - 18:08

I don't know if you ever
tried to read that
18:08 - 18:13

but it's not the most
understandable thing in the world
18:13 - 18:16

because it's highly optimized
for speed, of course.
18:16 - 18:19

And I actually looked at that
code and went over it a few times
18:19 - 18:22

and took a few hours and looked at it
18:22 - 18:25

and I couldn't see anything in it.
18:25 - 18:31

And one day I get an email
from a colleague in OSMOCOM,
18:31 - 18:35

Dieter Spaar, asking me
if I had ever looked at that codec
18:35 - 18:37

and I told him “yeah I looked
18:37 - 18:38

but I couldn't really find anything.
18:38 - 18:40

Do you see anything?”
18:40 - 18:42

And six hours later he sent me
18:42 - 18:44

“I think this is the entry point
18:44 - 18:46

for the encoding and decoding functions”
18:46 - 18:49

“Oh, OK. Good!”
18:49 - 18:53

The way he found that
from what I remember is that
18:53 - 18:57

you look at what would a codec use.
18:57 - 19:00

And of course it's gonna
access the audio path
19:00 - 19:06

so you can look at the audio DMA setup,
and audio DMA interrupts.
19:06 - 19:09

You can also look for constants
that you know are used in the codec,
19:09 - 19:12

for example, I said that
the frames are 80 bits,
19:12 - 19:15

so you can look for the constant 80
somewhere in the code.
19:15 - 19:17

You can look at it.
19:17 - 19:20

It will produce 160 audio samples
per frame decoded
19:20 - 19:23

so you can look at the constant 160.
19:23 - 19:26

Once you actually found one function
19:26 - 19:29

there is something
that kind of stands out,
19:29 - 19:32

it's that since they ship that
as a binary library
19:32 - 19:37

they don't really trust the guy
who implements the rest of the DSP code
19:37 - 19:39

to setup the C runtime properly
19:39 - 19:43

and so what they do is they switch
everything at each call into the codec.
19:43 - 19:48

They switch the stack, they
re-setup/reconfigure the C runtime,
19:48 - 19:51

so that it's really independent.
19:51 - 19:54

Once you find one function,
finding the other is pretty easy,
19:54 - 20:00

you can just look for that
particular function prelude
20:00 - 20:07

that happens at each call at
the audio library.
20:07 - 20:11

OK, so, we know where the code is
20:11 - 20:13

but it's still a massive amount of code.
20:13 - 20:18

I think it's like one third of
the DSP code is the audio codec.
20:18 - 20:24

So, It's not something you can easily
reverse engineer in a few hours.
20:24 - 20:29

So, before trying reverse
engineering we just tried,
20:29 - 20:32

OK, maybe we can just run it.
20:32 - 20:35

It turns out that TI has this
really nice program called
20:35 - 20:39

TI Code Composer Studio
which includes a simulator
20:39 - 20:43

and it's a really nice piece
of software because
20:43 - 20:47

not only can you pretty accurately
simulate any of the DSPs
20:47 - 20:51

you can also completely configure
the surrounding environment
20:51 - 20:56

in which it runs so you can setup
arbitrary memory map.
20:56 - 20:59

You can also trace all memory access
to see what part
20:59 - 21:04

of memory is being read or being written
or being executed.
21:04 - 21:07

You can also access a file
on your host system.
21:07 - 21:09

So you can use fread and fwrite
21:09 - 21:12

and the simulator would
automatically translate those calls
21:12 - 21:17

into actual reading from the audio,
hard drive and write to your hard drive
21:17 - 21:22

so you can read a compressed
file that you saved
21:22 - 21:28

and it will write an audio
wave file at the output.
21:28 - 21:32

And the way to do that is
pretty straightforward.
21:32 - 21:35

You essentially just need to
take the binary dump that
21:35 - 21:39

you have from the firmware update
21:39 - 21:45

and you need to find a way to
mix in your own custom code.
21:45 - 21:51

The way to do that: You create
a new object file like ELF
21:51 - 21:55

except in this case it's called COFF,
a common object file format.
21:55 - 21:58

But it's the same idea.
21:58 - 22:05

You can just link to it like you would
link any other piece of binary.
22:05 - 22:07

You write a simple main function
22:07 - 22:12

that will basically fread your
compressed samples,
22:12 - 22:15

call the decoding function
or at least what you hope
22:15 - 22:20

is the decoding function
and fwrite the audio samples.
22:20 - 22:25

And surprisingly it worked.
It didn't require that much effort.
22:25 - 22:28

In like a couple of days
it was all working.
22:28 - 22:33

It didn't work on the first try.
There was a couple of pitfalls.
22:33 - 22:37

But essentially the IDE is there
and it can probably be applied
22:37 - 22:40

to other things as well.
22:40 - 22:42

Of course there is some
pretty big downsides.
22:42 - 22:46

At first, it's pretty slow.
Definitely sub-real-time
22:46 - 22:51

even on a modern laptop.
22:51 - 22:54

It also requires TI Code Composer Studio
22:54 - 22:58

which is a Windows application
and it's also not free.
22:58 - 23:02

You have a 30 days evaluation period
but that's about it.
23:02 - 23:05

So it's really not that practical.
23:05 - 23:08

But it was running in the simulator.
23:08 - 23:11

There is no real reason
it cannot run on real hardware.
23:11 - 23:14

Of course, that's what we try next.
23:14 - 23:17

Unfortunately, on real hardware
23:17 - 23:20

you can't just put memory
anywhere you want.
23:20 - 23:24

When the codec runs in the phone,
23:24 - 23:29

it runs inside a TI-OMAP
which is an ARM plus a DSP
23:29 - 23:31

and that chip has an MMU
which means
23:31 - 23:35

you can create
any virtual memory you want.
23:35 - 23:39

But if you're trying to run
on a cheap DSP board
23:39 - 23:42

that you can get for 50 bucks,
23:42 - 23:46

those usually don't include an MMU at all
23:46 - 23:48

which means you need to find a board
23:48 - 23:50

which has memory
at the right places
23:50 - 23:55

because the binary you have has been
linked to a specific physical addresses
23:55 - 23:58

and it needs to run at
that particular addresses.
23:58 - 24:06

Thankfully, Dieter found one such board
with a compatible memory map
24:06 - 24:12

and it had SDRAM.
Well, we needed it to have some memory.
24:12 - 24:17

It also included Ethernet which means
we can make some networked appliance
24:17 - 24:20

where we submit samples and get them back.
24:20 - 24:23

That was really nice.
24:23 - 24:26

The process of migrating the code
from the simulator
24:26 - 24:29

to the board is actually
pretty smooth because
24:29 - 24:32

of the way TI Code Composer Studio works.
24:32 - 24:34

You just basically select that as a target
24:34 - 24:40

and since the memory was at the right
place it pretty much worked.
24:40 - 24:43

Now, it wasn't quite as fast as we want it
24:43 - 24:48

because it was only running
about real-time at the first try
24:48 - 24:52

and that's because SDRAM is slow
especially when you do
24:52 - 24:57

a bunch of random accesses
the SDRAM isn't really fast.
24:57 - 25:00

The codec has unfortunately
a lot of big data tables
25:00 - 25:05

which are accessed constantly
like to compute cosines.
25:05 - 25:10

So, what Dieter did is basically
identify a couple
25:10 - 25:14

of those big tables
that are accessed very often
25:14 - 25:16

and then just migrated them
to internal SRAM.
25:16 - 25:19

You just find the one or two places
25:19 - 25:22

where the code reads from those tables
25:22 - 25:26

and just patch the address to another address
25:26 - 25:29

and redirect it to there.
25:29 - 25:33

And that got the code running much faster,
about 16x faster than real-time.
25:33 - 25:35

And honestly, that was good.
25:35 - 25:39

I wasn't really planning on doing
anything else on the hardware
25:39 - 25:44

and so I ordered the same board…
except I didn't.
25:44 - 25:48

I somehow ended up clicking
on the wrong button in the
25:48 - 25:50

TI store or something and
when the board arrived
25:50 - 25:53

I didn't have the right one.
25:53 - 25:56

At that point I had two choices:
25:56 - 25:59

I could just order the same board
25:59 - 26:01

and wait one more week,
26:01 - 26:04

or I could try to run
the code on it anyway.
26:04 - 26:07

The second board didn't have Ethernet.
It had USB.
26:07 - 26:10

But most importantly it didn't
have any SDRAM at all.
26:10 - 26:12

It only had internal SRAM
26:12 - 26:16

and unfortunately not at
the right address.
26:16 - 26:21

So I just figured: OK.
I'm just gonna try to relocate the binary
26:21 - 26:25

and make it run on another address.
How hard can it be?
26:25 - 26:28

It turned out to be pretty straightforward
because
26:28 - 26:31

in I think less than a day it was running
26:31 - 26:34

and that's really thanks to IDAPython
and the simulator.
26:34 - 26:38

Basically what I did is in IDAPython
which is the
26:38 - 26:42

scripting language in
IDA reverse engineering software
26:42 - 26:46

you can iterate through all the opcodes,
then for that
26:46 - 26:50

opcode you can test:
Is there an absolute memory reference
26:50 - 26:53

somewhere in that opcode?
If no, then don't do anything.
26:53 - 26:56

If there is an absolute memory reference,
26:56 - 27:00

is that reference falling
somewhere in a place that
27:00 - 27:02

I'm trying to relocate or not?
If it is a place
27:02 - 27:04

I'm trying to relocate then
just patch that opcode
27:04 - 27:07

and change the absolute memory reference
to the new place.
27:07 - 27:10

Go through all the code like that
and try to run it.
27:10 - 27:13

Of course, it didn't work
the first time around
27:13 - 27:20

because there was a couple of
pitfalls in that approach
27:20 - 27:25

but using the simulator it was
pretty easy to find out where
27:25 - 27:30

things were going wrong because
you take the original code
27:30 - 27:33

you run everything in the simulator
and you trace every
27:33 - 27:38

memory access, then you take your
potentially relocated code
27:38 - 27:42

you run it again in the simulator
and trace every memory access
27:42 - 27:45

Decoding the same frame
it should be deterministic
27:45 - 27:50

and so you should have the exact same
memory access patterns
27:50 - 27:53

in both cases. And as soon as you see
that they are diverging
27:53 - 27:56

you know that somewhere
a branch wasn't taken or
27:56 - 27:59

something went wrong and
you can go at that place in your code
27:59 - 28:03

look and see why the
relocation didn't work.
28:03 - 28:07

It only took two tries to
have the code running.
28:07 - 28:11

It was running on my board and
I didn't have to order a new one
28:11 - 28:13

which was pretty nice.
28:13 - 28:16

So we have a hardware USB decoder
28:16 - 28:19

which is very useful but
you still have to carry it around
28:19 - 28:23

or plug it into a network
server or something
28:23 - 28:26

which isn't necessarily
something you want.
28:26 - 28:32

I wanted something I could
checkout into git, basically.
28:32 - 28:39

So, I started reverse engineering
the entire codec.
28:39 - 28:43

Now, it turned out to be very painful.
28:43 - 28:47

As I said, to reconstruct audio
you have three main steps.
28:47 - 28:51

The first one is unpacking and
it's just basically bit shuffling.
28:51 - 28:55

It's like the first thing
the codec does which means
28:55 - 28:58

it's really easy to find and
it's really easy to follow.
28:58 - 29:00

So, no problem there.
29:00 - 29:04

The second step is dequantization
and it took like 95% percent
29:04 - 29:08

of the work because it's lot of math.
29:08 - 29:12

It's all written in DSP assembly,
in fixed-point
29:12 - 29:15

and sometimes you're looking at
a function to figure out
29:15 - 29:19

what that function does you
have to run it with different
29:19 - 29:23

parameters to see what does
that thing compute.
29:23 - 29:26

Graphing is really useful for that.
You just feed some
29:26 - 29:29

inputs into the simulator see
what output is generated
29:29 - 29:33

and just plot it in PyLab or something.
29:33 - 29:37

And then you can see, OK,
that actually computes a logarithm.
29:37 - 29:42

There is still a lot of code there
and there was no
29:42 - 29:46

other option for that part
because the quantization step
29:46 - 29:49

is completely specific to GMR.
It's not shared with any
29:49 - 29:52

of the other AMBE variants.
29:52 - 29:55

The synthesis part is even
more complicated, is even more
29:55 - 29:58

code in the DSP but I really didn't want…
29:58 - 30:02

after finishing the dequantization
I really didn't want
30:02 - 30:05

to the synthesis, so I just assumed
that the synthesis step
30:05 - 30:09

was gonna be very similar
to the one in P25.
30:09 - 30:14

So I took the mbelib existing
open-source code for P25,
30:14 - 30:18

I ripped out the unpacking
and dequantization out of it.
30:18 - 30:21

That's because it's the one for P25,
of course.
30:21 - 30:25

I just plugged in my own implementation
30:25 - 30:28

and run it through the synthesis.
30:28 - 30:31

There's a couple of things
you need to take care of
30:31 - 30:38

mostly because P25 uses 20 ms frames,
GMR uses 10 ms frames.
30:38 - 30:42

So, I basically dropped one frame
every other frame in GMR
30:42 - 30:46

and just used the 10 ms parameter
synthesized 20 ms of speech
30:46 - 30:49

but it produced something
where you could follow
30:49 - 30:52

a conversation and it worked.
30:52 - 30:56

I was still not really happy about it
because the code quality
30:56 - 30:59

of mbelib was really not that good.
30:59 - 31:01

Its performance is also not that good.
31:01 - 31:05

It wasn't actually faster
than the hardware decoder.
31:05 - 31:10

So, in the end I went back to
the actual P25 specifications
31:10 - 31:14

that I had and
rewrote a clean implementation
31:14 - 31:17

of the synthesis step.
I just guessed the difference.
31:17 - 31:21

Every time it says 20 ms
I just put 10 ms. When there is
31:21 - 31:26

a 512-bin FFT I just put 256
31:26 - 31:30

and adapted stuff as I went.
31:30 - 31:34

And the resulting code is in the git
31:34 - 31:37

and you can decode voice.
31:37 - 31:40

I think it works pretty well.
31:40 - 31:43

You can hear some subtle differences
31:43 - 31:45

compared to the original codec
31:45 - 31:48

because DVSI did do a bunch of stuff to
31:48 - 31:51

improve the voice quality.
It's their whole business.
31:51 - 31:56

And I didn't go through all of that.
31:56 - 32:00

But… From my point of view,
I'm done on this and I'm not
32:00 - 32:03

gonna go any further.
We have C code we can run.
32:03 - 32:06

So. That's good.
32:06 - 32:13

Applause
32:14 - 32:17

Thank you.
32:17 - 32:20

Next, we move on to the cipher.
32:20 - 32:24

We didn't reverse engineer the cipher.
32:24 - 32:28

That was done by a team
at the Bochum university
32:28 - 32:30

led by Benedikt Driessen.
32:30 - 32:34

Right after we publ…
After 28C3, basically…
32:34 - 32:38

Actually, at 28C3 we already had
some contacts with them but
32:38 - 32:41

the paper wasn't published yet.
32:41 - 32:45

And we actually have them validate
their data and their attack
32:45 - 32:46

using osmo-gmr.
32:46 - 32:50

I definitely encourage you to
read the paper that they published
32:50 - 32:52

because the method they used.
They are actually the ones
32:52 - 32:57

who extracted the DSP image which
we used to reverse engineer the codec.
32:57 - 33:02

They were the first ones
to extract it from the
33:02 - 33:05

file that you can download online.
33:05 - 33:09

And then, there is some pretty
interesting stuff in there.
33:09 - 33:11

As I said, they also published
an attack on that cipher
33:11 - 33:14

which is a ciphertext-only attack
33:14 - 33:18

and I'll show some of the results later on.
33:18 - 33:23

So, what does the cipher look like?
Well, it looks like that.
33:23 - 33:27

If you are familiar with A5/2 or even A5/1
33:27 - 33:30

that should look very familiar because
33:30 - 33:32

it's pretty much the same thing.
33:32 - 33:36

So you have 4 linear feedback registers.
33:36 - 33:40

The first 3 ones are combined
using a nonlinear output function
33:40 - 33:44

into the actual key stream.
And then you have a 4th register
33:44 - 33:48

which a few of the bits are tapped.
They go to a nonlinear
33:48 - 33:51

clocking function and that function
is gonna determine
33:51 - 33:55

how fast the other registers are clocked.
33:55 - 34:00

So, not all registers advance
at each output bit.
34:00 - 34:05

And as I said, it is very similar to…
it's actually a copy of A5/2
34:05 - 34:08

except they changed every number like…
they don't…
34:08 - 34:11

the polynomial for the linear
feedback registers are different,
34:11 - 34:14

which bits are tapped to combine
into the output are different.
34:14 - 34:18

Same thing for R4 but
it's the exact same structure
34:18 - 34:23

which means we can reuse the attack
that works on A5/2.
34:23 - 34:27

This cipher obviously has
an initialization step
34:27 - 34:30

where you basically take the key
and the frame number
34:30 - 34:32

that you want to encrypt
34:32 - 34:36

and you have some process to initialize
34:36 - 34:38

the LFSR with those data.
34:38 - 34:40

But really the only thing to retain
34:40 - 34:43

is that the initialization
process is entirely linear.
34:43 - 34:46

There is some XOR and LFSR being involved.
34:46 - 34:50

So that's pretty much…
34:50 - 34:54

…the thing to remember.
34:54 - 35:00

The actual irregular clocking
only is applied when you
35:00 - 35:06

generate bitstream
for encrypting the frame.
35:06 - 35:10

Since we are gonna talk about
attacks on ciphers
35:10 - 35:15

in some details there are a few things
that you may not remember
35:15 - 35:18

if you don't do algebra all the time.
35:18 - 35:21

This is a very very quick reminder.
35:21 - 35:26

So, you should know is that
when you see GF(2) that's
35:26 - 35:28

just a fancy term for binary.
35:28 - 35:32

What we call addition in GF(2)
is just the XOR operation.
35:32 - 35:34

Multiplication is just the logic AND
35:34 - 35:36

and you can do everything with that.
35:36 - 35:40

If you accept those definitions,
then algebra over binary
35:40 - 35:44

is gonna be pretty much the same as
algebra over real numbers
35:44 - 35:47

and all the nice properties
are gonna be applicable
35:47 - 35:50

to binary math as well.
35:50 - 35:54

I mentioned previously “linear”
and I will mention it
35:54 - 35:56

several times in the rest of this talk.
35:56 - 36:01

When I say “linear” it's either a constant,
36:01 - 36:04

or a variable or unknown
multiplied by a constant
36:04 - 36:08

but you never have two variables
multiplied together
36:08 - 36:13

because then it becomes quadratic
and it's not linear anymore.
36:14 - 36:18

Being linear has some nice properties
like linear systems of equations
36:18 - 36:23

we have very fast algorithms to
solve them
36:23 - 36:28

and every linear operation can be
represented as matrix operation
36:28 - 36:31

which is especially useful…
we're gonna be dealing with
36:31 - 36:34

hundreds of unknowns and
spelling them out is not an option.
36:34 - 36:38

We have to have some way of
representing that and that's
36:38 - 36:42

matrix operations, essentially.
36:42 - 36:46

One other important thing to know is:
36:46 - 36:50

When you have an operation
that adds redundancy, for example,
36:50 - 36:53

channel coding where you add the CRC
36:53 - 36:55

or where you do convolutional coding,
36:55 - 36:58

adding error correction adds
redundancy to your message
36:58 - 37:01

and when you have redundancy
you can create what's called
37:01 - 37:05

a parity check matrix to check
if that redundancy in that present.
37:05 - 37:08

So, for a CRC you check
the CRC for example.
37:08 - 37:11

But it can be generalized to anything
that adds redundancy.
37:11 - 37:14

It can be checked.
And that becomes very useful to
37:14 - 37:17

make the attack much faster.
37:17 - 37:21

So, a little more detail about
what the guys at Bochum did.
37:21 - 37:25

They published a ciphertext-only attack
37:25 - 37:28

that means you only need to
capture data off the air
37:28 - 37:30

and directly with that
you feed it to the attack
37:30 - 37:32

and it can give you the key.
37:32 - 37:35

Their target was called TCH3 frames
37:35 - 37:38

and those are traffic frames,
they're what carries the voice
37:38 - 37:41

which means you have a lot of them because
37:41 - 37:44

as long as the conversation is still going,
37:44 - 37:46

frames that are being generated constantly,
37:46 - 37:49

several tens of them per second.
37:49 - 37:52

They didn't start from nothing. As I said
37:52 - 37:55

the cipher is heavily based on A5/2, so,
37:55 - 37:57

it kind of makes sense to read the papers
37:57 - 38:01

that were published about A5/2 attacks.
38:01 - 38:06

The most interesting one and one of the
most advanced one is called
38:06 - 38:12

“Instant Ciphertext-Only Cryptanalysis
of GSM encrypted communication”
38:12 - 38:16

It's kind of math-heavy but
38:16 - 38:21

it provides an attack on A5/2
that recovers the key instantly
38:21 - 38:24

which is really nice.
38:24 - 38:26

They modified this attack.
They tweaked it for GMR-1
38:26 - 38:28

because the frames are different.
38:28 - 38:30

There's different amount of bits.
38:30 - 38:33

The cipher is slightly different.
38:33 - 38:35

And they also did something a little weird.
38:35 - 38:37

They tried to guess more bits
38:37 - 38:39

some of the bits they tried to brute-force.
38:39 - 38:42

I'm not exactly sure why.
38:42 - 38:44

I think it's to reduce
the number of unknowns.
38:44 - 38:48

But I don't know if it actually
had the desired effect.
38:48 - 38:52

And in the RUB paper
they published their result:
38:52 - 39:00

With 350 GB of precomputed data
using 32 TCH3 frames
39:00 - 39:03

they managed to recover the key
in something like 40 minutes
39:03 - 39:06

which is not bad, definitely, and it works.
39:06 - 39:09

But the original attack is named “instant”.
39:09 - 39:13

40 minutes isn't instant – to me, at least.
39:13 - 39:18

So, we started looking for a better attack.
39:18 - 39:22

We started off the same paper
because it's kind of
39:22 - 39:25

the state of the art of A5/2 attacks
39:25 - 39:27

but we really didn't do anything fancy.
39:27 - 39:33

We just used it exactly as they said
in the original paper
39:33 - 39:37

and we just did the necessary tweaks
for A5/GMR-1,
39:37 - 39:40

changing the matrix and
the length of the vector,
39:40 - 39:42

really nothing fancy.
39:42 - 39:46

We also decided to attack
a different type of channel
39:46 - 39:49

and this is probably what
makes the most difference.
39:49 - 39:52

We decided to target FACCH3 control frames
39:52 - 39:55

instead of the TCH3 voice frames.
39:55 - 39:59

There is a very big advantage to
targeting those frames.
39:59 - 40:03

First, they use a different kind of
modulation and training sequence.
40:03 - 40:06

From an RF point of view
they are much easier to receive.
40:06 - 40:10

And that's pretty important if
you're trying to mount an attack because
40:10 - 40:13

as soon as you have a
bit error in the reception
40:13 - 40:15

it will make your equations inconsistent
40:15 - 40:19

and you won't actually be able
to solve for the key as easily.
40:19 - 40:22

So, this is a very nice property
of those frames.
40:22 - 40:25

The other very nice property is
that we have known
40:25 - 40:29

plaintext on the FACCH3 channel,
the control channel.
40:29 - 40:32

There are some predictable frames
that we know will there
40:32 - 40:33

and we know where they will be.
40:33 - 40:36

the voice traffic is
completely unpredictable.
40:36 - 40:38

It's someone talking.
40:38 - 40:41

The control traffic will follow
some very known patterns
40:41 - 40:46

and I'll show an example of
plaintext which is really trivial.
40:47 - 40:50

Some other of the nice properties
of the control channel is:
40:50 - 40:53

There is much more redundancy.
40:53 - 40:58

In TCH3 not all the bits are
actually error-corrected
40:58 - 41:01

which means you don't have
that much information
41:01 - 41:05

while for the control channel
each layer-2 bit that you transmit
41:05 - 41:10

is almost quadrupled
before you put it on the air
41:10 - 41:15

which means you can build
a lot of equations very fast
41:15 - 41:20

without many bursts.
In the RUB attack they needed 32 frames,
41:20 - 41:27

probably because there is not much
redundancy in the TCH3 frames.
41:27 - 41:30

And also, as a nice bonus that
we discovered later is that
41:30 - 41:34

the FACCH3 control channel
is also used to negotiate
41:34 - 41:38

other types of channels like
the channels that are used
41:38 - 41:43

when you send FAX or
when you establish a data call
41:43 - 41:45

over satellite networks.
41:45 - 41:48

It's actually negotiated using
that type of control channel
41:48 - 41:51

which means the same kind of attack
can work to crack
41:51 - 41:55

not only voice but also data calls and FAX.
41:55 - 41:58

This is an example of the plaintext.
41:58 - 42:01

I don't know if you can really see it.
42:01 - 42:03

But essentially on the top
you have the cipher mode command
42:03 - 42:06

the first line in blue is
the cipher mode command.
42:06 - 42:09

So everything after that
is in theory ciphered.
42:09 - 42:11

This is, of course, the enciphered view.
42:11 - 42:14

And the three marked packets
in gray and black
42:14 - 42:16

are the very predictable texts
42:16 - 42:19

and what they are is acknowledgements.
42:19 - 42:21

Because of the delay on the GSM channel
42:21 - 42:23

there is some outstanding
acknowledgement
42:23 - 42:28

that still need to be sent
when the ciphering is turned on.
42:28 - 42:32

You get empty packets with
incrementing sequence number
42:32 - 42:35

and the ACK bit set.
42:35 - 42:37

Something changes in them,
the sequence number, but
42:37 - 42:41

as soon as you can count,
predicting them is trivial.
42:41 - 42:45

This definitely works in practice
like 100% of the time.
42:45 - 42:50

I never really had any problem with that.
42:51 - 42:56

So, how do you build a plaintext attack?
43:04 - 43:08

The general goal is…
what you're trying to achieve
43:08 - 43:12

is build a giant system of equations
that's linear
43:12 - 43:13

and that you can solve.
43:13 - 43:17

It's gonna have the form
A * x = b
43:17 - 43:21

b is the cipher stream that
is generated by A5/GMR-1
43:21 - 43:26

x is some representation of
the internal state of the cipher
43:26 - 43:28

that you are trying to find
43:28 - 43:31

and A is a giant matrix
that you can precompute
43:31 - 43:35

that's only dependent on
the structure of the cipher.
43:35 - 43:40

Each row of A and b are gonna represent
one bit of the output
43:40 - 43:45

and how to combine that internal state
to generate that cipher stream bit.
43:45 - 43:48

As I mentioned,
the initialization process is linear
43:48 - 43:51

and that's very important
for two things.
43:51 - 43:54

First,
from the internal state of the cipher
43:54 - 43:58

you're gonna be able to recover the key
43:58 - 44:02

which is not entirely true.
There is one bit of the key
44:02 - 44:05

that you can't recover
because that bit of the key
44:05 - 44:08

turns out to have absolutely no effect
on the output whatsoever.
44:08 - 44:12

so, you can't recover it
but doesn't matter either.
44:12 - 44:15

The other thing is that
44:15 - 44:18

you're gonna need to
build a lot of equations.
44:18 - 44:21

That system is gonna be big.
44:21 - 44:24

In a single burst of data
you won't have enough equations.
44:24 - 44:26

You're gonna need multiple of them.
44:26 - 44:29

But, of course, those multiple bursts
are gonna have different
44:29 - 44:31

frame numbers because they're sequential.
44:31 - 44:34

They're not gonna wrap before a long time.
44:34 - 44:37

But since the initialization
is entirely linear
44:37 - 44:41

you can actually derive equation
from one frame number
44:41 - 44:44

from an equation built out of
another frame number.
44:44 - 44:50

But, of course, it would be too easy.
The cipher isn't entirely linear.
44:50 - 44:53

so, we have to “linearize” it and
44:53 - 44:57

when I first started looking at
a crypto attack on A5/2
44:57 - 45:00

in the paper it was always
one line that said
45:00 - 45:03

oh yeah, we linearize the cipher
45:03 - 45:06

and they kind of always assumed
the readers know what that means
45:06 - 45:09

which I didn't at the time.
45:09 - 45:12

So, I decided to be a little more explicit.
45:12 - 45:17

The two nonlinear elements are
the output function
45:17 - 45:20

which uses a majority function
where you take three bits
45:20 - 45:24

and it outputs whichever
bit is more common
45:24 - 45:27

and you can rewrite that
majority function as being
45:27 - 45:31

the sum of a plus b multiplied by c
45:31 - 45:37

or if you wish a XOR b AND c.
45:37 - 45:41

And the problem is the
b multiplied by c because
45:41 - 45:44

that's quadratic, that's not linear.
45:44 - 45:46

That means that
we can't introduce that.
45:46 - 45:50

And the way to linearize that
is to kind of ignore it.
45:50 - 45:54

For every possible quadratic term
that's gonna be generated
45:54 - 45:58

by that nonlinear output function
45:58 - 46:00

you generate a new unknown.
46:00 - 46:03

In reality that unknown
isn't really unknown.
46:03 - 46:07

It's a function of your other variables but
46:07 - 46:10

you can't represent it in a linear fashion.
46:10 - 46:15

So, you just ignore it and say
“OK, this is a new unknown, deal with it”.
46:15 - 46:19

Actually, there is a lot of
possible combinations
46:19 - 46:22

which adds 594 new unknowns
to your equation system
46:22 - 46:25

which means that you're gonna need
a bunch more equations
46:25 - 46:28

to be able to actually solve it.
46:28 - 46:32

But at least it makes it linear.
So, there is hope.
46:33 - 46:40

The other nonlinear thing in that cipher
is the clocking.
46:40 - 46:44

As I said, all the LSFR don't advance
at the same rate.
46:44 - 46:50

Some are clocked once or zero times
at each output bit.
46:50 - 46:56

And this is a function of the value of R4
at that particular time.
46:56 - 47:00

So, how do we linearize that?
47:00 - 47:05

Well, R4 is a 17-bit
linear feedback register.
47:05 - 47:09

So, we can know
all its future states from any…
47:09 - 47:13

As soon as we have a good state
at one particular time
47:13 - 47:16

you can predict all the future states
47:16 - 47:19

because it's clocked all the time.
47:20 - 47:24

And in those 17 bits
the initialization step forces
47:24 - 47:26

one of those bits to one.
47:26 - 47:29

So, there is only 16 bits
that you don't know
47:29 - 47:33

which means there is only
65536 possible clocking
47:33 - 47:36

patterns for the cipher.
47:36 - 47:40

So, the way we approach
that is we just brute-force it.
47:40 - 47:43

You just assume that R4 is
gonna be equal to that value
47:43 - 47:48

after the initialization
and you just assume that
47:48 - 47:53

and try to solve and crack the cipher
by using that assumption.
47:53 - 47:56

And if you find the Kc, well,
then maybe it's the right one.
47:56 - 48:01

You just repeat that 65536 times
and most of the time
48:01 - 48:04

if your guess about R4 is incorrect
48:04 - 48:07

the system will end up being inconsistent
48:07 - 48:11

and you won't actually have any solution.
48:11 - 48:18

But this still requires solving
65536 systems
48:18 - 48:22

which is fast but not that fast.
48:22 - 48:28

There is actually a trick that we can
apply to do better guesses.
48:28 - 48:32

When you build these
giant equation systems
48:32 - 48:35

it turns out that some of the
equations that you construct
48:35 - 48:38

they are not gonna be independent.
Some of them are
48:38 - 48:40

gonna be related to each other.
48:40 - 48:45

And so some of the rows are redundant.
And what's nice is:
48:45 - 48:49

If they're redundant
we can actually test for that redundancy.
48:49 - 48:53

For the 65536 possible systems
that you need to solve,
48:53 - 48:55

you also have an
An * x = b
48:55 - 48:59

there is a bunch of rows in An
that are gonna be linear dependent.
48:59 - 49:04

So, for each of them you can
build a parity check matrix Hn
49:04 - 49:08

so that you have this nice property
of a parity check matrix
49:08 - 49:13

where you multiply b by Hn
and you get a zero matrix.
49:13 - 49:16

But all those Hn, you can precompute them.
49:16 - 49:20

That's the offline data of the attack.
49:20 - 49:23

You precompute those giant matrices,
49:23 - 49:29

then you take b which is the cipher text
you got from the cipher,
49:30 - 49:33

you multiply it by this matrix
and if it gives you zero,
49:33 - 49:37

you know that this particular R4 value
might be correct.
49:37 - 49:39

You don't know if it's correct or not,
49:39 - 49:41

but at least it might be.
49:41 - 49:44

If the result is non-zero
then you know that R4 is definitely
49:44 - 49:47

not the correct R4 and so
instead of solving an entire
49:47 - 49:50

system of equations the only thing
you need to do is
49:50 - 49:54

do one matrix multiplication
and that is pretty fast.
49:54 - 49:58

And in the end you're gonna have
a couple of possible matches
49:58 - 50:02

so you're gonna solve like maybe
3 or 4 systems of equations
50:02 - 50:05

instead of solving 65536 and
that's much much faster.
50:05 - 50:10

Now, how do you go from
a known-plaintext attack
50:10 - 50:13

to a ciphertext-only attack?
50:13 - 50:16

I'm not gonna detail that much
but essentially
50:16 - 50:20

the channel coding operation
introduces redundancy
50:20 - 50:24

and you can kind of null out that…
50:24 - 50:27

if you take the last equation you have
50:27 - 50:30

H multiplied by y plus g.
50:30 - 50:34

H is a function of the channel coding.
50:34 - 50:36

You can compute it. It's known.
50:36 - 50:40

y is the data that
you received off the air.
50:40 - 50:42

So, you know it as well.
50:42 - 50:46

g is also a function of
the channel coding operation.
50:46 - 50:50

So, you know it.
So everything on the left you know.
50:50 - 50:55

And the last value is H multiplied by A
50:55 - 50:58

which is what you had in
the known-plaintext attack,
50:58 - 51:00

so you know it as well.
51:00 - 51:03

And everything you don't know
is x and so, again,
51:03 - 51:06

you have a linear system of equations
that you can solve.
51:06 - 51:10

And the same R4 quick-scan
method can be applied
51:10 - 51:14

to scan very quickly and not
have to solve everything.
51:14 - 51:17

So, this is the result.
51:17 - 51:21

If you use the fact that
you know some plaintext
51:21 - 51:24

you need about 50 Megabytes
of data precomputed,
51:24 - 51:27

you need to receive 8 bursts
or even 4 if you
51:27 - 51:31

by any chance have the right
frame number alignment,
51:31 - 51:33

and in like 500 ms
you can get the key back.
51:33 - 51:36

And that's on a 6 year old laptop.
51:36 - 51:40

On any recent machine it's really instant.
51:40 - 51:43

If you want to use the
ciphertext-only variant,
51:43 - 51:46

and honestly I really don't know
why you would
51:46 - 51:50

because the known-plaintext works,
so you might as well use it.
51:50 - 51:54

But if you want it purely
for academic purposes,
51:54 - 51:57

then you need 8 bursts,
51:57 - 52:01

about 5 Gigabytes of data,
52:01 - 52:04

and then it takes a little longer
52:04 - 52:09

like about 1 second to recover the key.
52:09 - 52:13

A few things I would like to
look at in the future,
52:13 - 52:15

is C-band.
52:15 - 52:20

The satellite has to talk back
to the earth station
52:20 - 52:23

and this happens
in a different frequency band
52:23 - 52:25

and this is what we would like to look at,
52:25 - 52:27

essentially, the feeder link
52:27 - 52:30

If any of you have a large dish,
satellite dish,
52:30 - 52:33

and want to provide data for us
please contact us,
52:33 - 52:35

we're looking for it.
52:35 - 52:38

We're also started
looking into packet data
52:38 - 52:41

and some other stuff.
52:41 - 52:45

A quick note about
other satellite phone systems,
52:45 - 52:48

please don't assume that
just because Thuraya GMR-1
52:48 - 52:51

is broken that the others aren't
unless you have
52:51 - 52:53

actual proof that they aren't
because the only
52:53 - 52:56

reason we chose Thuraya is
because the phone was
52:56 - 52:58

the cheapest on eBay.
52:58 - 53:02

So, given that all the satellite
phone systems also…
53:02 - 53:06

you know, you can find commercial
interceptors for them,
53:06 - 53:11

there is a big chance that they're
not much more secure.
53:11 - 53:14

I like to thank a few people:
53:14 - 53:19

Dimitry Stolnikov,
the main other author in osmo-gmr
53:19 - 53:23

doing a bunch of the RF capture stuff,
53:23 - 53:28

Dieter Spaar
for reverse engineering of the AMBE codec
53:28 - 53:30

and all the help he provided there
53:30 - 53:33

and the RUB team, of course,
for their support in
53:33 - 53:35

reverse engineering the cipher,
53:35 - 53:40

and the organizer for having me
53:40 - 53:43

and thank you for your attention!
53:43 - 53:47

Applause
53:47 - 53:52

- Thank you very much!
We now have time for a few questions.
53:52 - 53:57

First, are there questions
from the Internet?
53:58 - 54:01

Not yet. So,
if you have a question, please
54:01 - 54:03

approach one of the microphones.
54:03 - 54:06

Yes, please, microphone number 3!
54:06 - 54:13

- Yes, I have a rather not technical
but political question:
54:13 - 54:18

So, you said in the beginning
there was like a short
54:18 - 54:21

way of communication that
only the satellite transmits
54:21 - 54:25

the voice data, so there is
no ground station involved?
54:25 - 54:30

- Yes. - Do you think that this
might upset some of the people
54:30 - 54:34

who like to intercept calls
because it makes intercepting
54:34 - 54:36

the voice data much more complicated
54:36 - 54:39

than just going to ground station and say
54:39 - 54:41

“I want to listen to this”?
54:41 - 54:44

- First, just because the satellite
will send the data back
54:44 - 54:46

directly to their mobile phone
54:46 - 54:49

doesn't mean the satellite
can't also provide a copy
54:49 - 54:53

of it to the earth station.
It just doesn't go through.
54:53 - 54:56

But the satellite can obviously
send a copy of it if they want to.
54:56 - 55:00

- And probably there will be a
possibility for the ground station
55:00 - 55:02

to say
“Don't do that direction connection”.
55:02 - 55:06

- On the other hand they can also
just intercept it off the air
55:06 - 55:08

because it's so easy to listen to it.
55:08 - 55:11

Last year there was a talk with somebody
55:11 - 55:14

that looked at satellite pictures
55:14 - 55:17

and they saw that one of the
spy satellites was relocated
55:17 - 55:22

like right next to the
Thuraya-2 satellite.
55:22 - 55:25

You can look that up.
It was a presentation last year.
55:25 - 55:29

It was pretty interesting
to see that… yeah…
55:29 - 55:31

If you want to listen to it you can do it.
55:31 - 55:33

- Thank you!
55:33 - 55:37

- Next question from microphone number 2!
55:39 - 55:46

- I've two questions: Can you hear
both sides of the conversation
55:46 - 55:50

or only one side of the conversation?
The second question is:
55:50 - 55:53

How big of a dish do you need?
55:53 - 55:55

- If you want to receive L-band which is
55:55 - 55:58

just listen to the communication
from the satellite to the phone
55:58 - 56:02

you don't need a dish,
you need like a small antenna.
56:02 - 56:06

You can even do it with a modified
GPS antenna and a piece of metal.
56:06 - 56:10

So, it's really easy and
really cheap to do.
56:10 - 56:13

You can only hear the the
part of the conversation
56:13 - 56:16

that comes from the satellite
to the phone.
56:16 - 56:20

So, if by any chance that guy
is speaking to another phone
56:20 - 56:24

you can hear both sides
because you can intercept both
56:24 - 56:27

channels simultaneously.
56:27 - 56:30

But in general, you either
56:30 - 56:34

only hear one side or you can
also be close enough to the
56:34 - 56:39

actual satellite phones
to receive the uplink
56:39 - 56:43

that goes from the phone to the satellite.
56:43 - 56:46

Dimitry showed me like a
commercial intercept system
56:46 - 56:51

where apparently they use like
planes to intercept the uplink.
56:51 - 56:54

They just fly over the area of interest
56:54 - 56:59

so they can more easily receive
the uplink from the phones.
56:59 - 57:01

- One final question. Please make it short
57:01 - 57:03

because we are running out of time.
57:03 - 57:05

- I'll try my best.
57:05 - 57:08

Launching your own satellite
is certainly not an option
57:08 - 57:11

but do you see any practical applications
57:11 - 57:16

in terms of what you did with GSM?
57:17 - 57:21

- Not really. It would be
interesting to try start a net
57:21 - 57:24

because there is nothing that says
that you have to be on a satellite.
57:24 - 57:28

You could technically start a base station on earth
57:28 - 57:32

using that protocol and it would work.
57:32 - 57:35

And the phones have much more power
than a GSM phone.
57:35 - 57:38

But other than that,
you could build an IMSI catcher
57:38 - 57:41

for satellite phones if you wanted to
57:41 - 57:44

and you'd definitely beat the
signal strength of the satellite
57:44 - 57:47

with your own setup.
57:47 - 57:48

- Thank you.
57:48 - 57:54

- Please give the speaker
another round of applause!
57:54 - 58:06

subtitles created by c3subtitles.de
Join, and help us!

Title:: Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
Description:: http://media.ccc.de/browse/congress/2014/31c3_-_6267_-_en_-_saal_g_-_201412271600_-_osmo-gmr_what_s_up_with_sat-phones_-_sylvain_munaut.html

At 28C3 we introduced the very first steps of the osmo-gmr projects. During this talk, we will present the various advances that have been made in this project on various aspects (voice codec, crypto algorithm, ...)

Sylvain Munaut

more » « less
Video Language:: English
Duration:: 58:06

	C3Subtitles edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
	s.gesemann edited English subtitles for Sylvain Munaut: osmo-gmr: What's up with sat-phones ?

Show all

English subtitles

Revisions

Revision 15 Edited

C3Subtitles

Sylvain Munaut: osmo-gmr: What's up with sat-phones ?

Revisions

Our website uses cookies

Operating cookies (Required)