Return to Video

Sylvain Munaut: osmo-gmr: What's up with sat-phones ?

  • 0:09 - 0:12
    This talk will be by Sylvain Munaut
  • 0:12 - 0:15
    who works with the Osmocom project.
  • 0:15 - 0:19
    He will talk today about the
    osmo-gmr satellite phones.
  • 0:19 - 0:23
    After the talk there will be the
    opportunity to ask questions.
  • 0:23 - 0:25
    You can approach one of the microphones
  • 0:25 - 0:28
    and then you have the chance
    to ask one question.
  • 0:28 - 0:33
    Please give a round of applause
    to Mr Sylvain Munaut!
  • 0:33 - 0:36
    Applause
  • 0:36 - 0:39
    - Hi! Thank you for being here.
  • 0:39 - 0:45
    As he said I'm going to talk about
    satellite phones today.
  • 0:45 - 0:50
    I'll first talk with a quick recap of the
    previous work we've done.
  • 0:50 - 0:54
    At 28C3 there was a presentation
    on this subject.
  • 0:54 - 0:56
    But there were some pieces missing.
  • 0:56 - 0:59
    So, I'm just gonna recap
    what we did previously
  • 0:59 - 1:01
    and then move on to the new stuff
  • 1:01 - 1:03
    which is the speech codec and cipher
  • 1:03 - 1:07
    reverse engineering and cracking
    that's been done.
  • 1:07 - 1:10
    First, what is GMR exactly?
  • 1:10 - 1:13
    GMR stands for GEO-Mobile Radio and
  • 1:13 - 1:17
    it's an ETSI standard
    for satellite phones.
  • 1:17 - 1:19
    It's heavily inspired from GSM and
  • 1:19 - 1:21
    if you read the GMR specs
  • 1:21 - 1:24
    there is plenty of reference
    to go see the GSM spec.
  • 1:24 - 1:27
    There is actually two standards named GMR:
  • 1:27 - 1:29
    GMR-1 and GMR-2.
  • 1:29 - 1:31
    They are not an evolution of one another.
  • 1:31 - 1:33
    They are competing standards
  • 1:33 - 1:35
    that have been developed
    by distinct companies
  • 1:35 - 1:38
    and then both have been
    standardized to ETSI.
  • 1:38 - 1:41
    Today, we're gonna be looking
    at GMR-1.
  • 1:41 - 1:46
    That comes in 3 evolutions.
    It started with plain old GMR-1
  • 1:46 - 1:49
    which is voice calls and SMS.
  • 1:49 - 1:53
    It evolved to include packet data.
    That's called GMPRS.
  • 1:53 - 1:55
    And then there was a later evolution
  • 1:55 - 1:58
    to interconnect with the 3G core network
  • 1:58 - 2:01
    and improve the air interface
    and that's GMR-1 3G.
  • 2:01 - 2:04
    But we'll be mostly looking at
    the first version
  • 2:04 - 2:08
    which is still used.
  • 2:09 - 2:13
    Among the deployments the one
    we focused on was Thuraya.
  • 2:13 - 2:16
    The reason why is that
    it's the most common
  • 2:16 - 2:19
    commercial operator
    where you can actually buy SIM
  • 2:19 - 2:21
    and place phone calls.
  • 2:21 - 2:23
    The satellites are visible from Europe
  • 2:23 - 2:27
    which is kind of a requirement for me
    to look at the signal.
  • 2:27 - 2:31
    This operator is mostly active in the
    middle east, Asia and Africa.
  • 2:31 - 2:34
    In Europe you don't see too many people
  • 2:34 - 2:37
    with satellite phones
    because we have GSM coverage.
  • 2:37 - 2:41
    But there are other operators
    and we actually read recently
  • 2:41 - 2:44
    that there is a new deployment
    planned for Europe
  • 2:44 - 2:47
    called Solaris Mobile and they said
  • 2:47 - 2:50
    they're going to use GRM-1 3G
  • 2:50 - 2:53
    but the satellite isn't launched yet.
  • 2:53 - 2:56
    This is the coverage zone for Thuraya.
  • 2:56 - 2:58
    As you can see Europe is very well covered
  • 2:58 - 3:01
    so there is no problem if you want to
    receive the signal.
  • 3:01 - 3:04
    If you are somewhere in that area
    you can see it.
  • 3:04 - 3:07
    So, since it's so heavily inspired by GSM
  • 3:07 - 3:10
    it makes sense to compare it to it.
  • 3:10 - 3:13
    The first thing they did is to
    rename everything.
  • 3:13 - 3:16
    So, instead of a base transceiver station
  • 3:16 - 3:18
    you have a GEO transceiver station.
  • 3:18 - 3:20
    Instead of a base station controller
  • 3:20 - 3:22
    you have a GEO station controller.
  • 3:22 - 3:26
    Instead of a mobile station
    you have a mobile earth station.
  • 3:26 - 3:30
    Then they did some actual
    useful stuff like
  • 3:30 - 3:32
    use specialized features for satellites.
  • 3:32 - 3:35
    The first one is
    terminal-to-terminal calls.
  • 3:35 - 3:37
    When you're usually placing a call
  • 3:37 - 3:40
    your signal goes from your
    satellite phone to the satellite
  • 3:40 - 3:43
    then it goes back to the
    core network on earth
  • 3:43 - 3:46
    and then, if you're calling
    another satellite phone
  • 3:46 - 3:49
    it would have to again go
    back up to the satellite
  • 3:49 - 3:51
    and back to the other phone
    and that means that
  • 3:51 - 3:54
    you pay 4 trips from earth to space
  • 3:54 - 3:57
    and since the satellite is in
    geosynchronous orbit
  • 3:57 - 3:59
    it's pretty far away.
  • 3:59 - 4:02
    So, to reduce the delay in communication
  • 4:02 - 4:06
    they have the so-called
    terminal-to-terminal calls where
  • 4:06 - 4:09
    the call network will still
    handle establishing
  • 4:09 - 4:12
    the channel but once
    the call is established
  • 4:12 - 4:15
    the data is going directly
    from one phone to the satellite
  • 4:15 - 4:18
    and directly back down to the other phone
  • 4:18 - 4:20
    without any involvement from the network.
  • 4:20 - 4:22
    It never even sees the packets.
  • 4:22 - 4:26
    They also have this call called
    “High Penetration Alerting”
  • 4:26 - 4:29
    That's because like in the movies
    you see people using
  • 4:29 - 4:32
    satellite phones in bunkers
    and stuff like that
  • 4:32 - 4:34
    in practice, as soon as you are inside
  • 4:34 - 4:37
    you can't see the satellite and
    you can't place a call at all
  • 4:37 - 4:40
    but you still might want to
    be able to at least…
  • 4:40 - 4:42
    you know, you can't pick up the phone
  • 4:42 - 4:45
    but you want to know that
    somebody is calling you
  • 4:45 - 4:48
    and so they have this specialized channel
  • 4:48 - 4:51
    which has an incredible amount
    of error correction on it
  • 4:51 - 4:54
    so that even with very very low signal
  • 4:54 - 4:57
    you can still get an indication
    that somebody is trying to call you
  • 4:57 - 5:01
    please run outside so
    you can take the call.
  • 5:01 - 5:04
    There's also a very
    tight integration to GPS
  • 5:04 - 5:07
    For example, all the Almanac
    and Ephemeris data
  • 5:07 - 5:10
    are actually broadcasted
    by the Thuraya satellite
  • 5:10 - 5:12
    so that your phone can get a lock faster
  • 5:12 - 5:15
    and when you place a phone call
    the very first thing
  • 5:15 - 5:19
    that your phone will do
    is send your exact GPS coordinates
  • 5:19 - 5:22
    in clear to the operator.
  • 5:23 - 5:25
    They do this for two reasons:
  • 5:25 - 5:29
    The first one is:
    If you're not on the right frequency
  • 5:29 - 5:32
    for this particular geographic location
    they can tell you
  • 5:32 - 5:34
    “OK. You shouldn't be connecting
    on this frequency
  • 5:34 - 5:37
    use this one! It has a much better signal
  • 5:37 - 5:39
    and you get better quality”.
  • 5:39 - 5:42
    That's one reason.
    The other reason is purely commercial:
  • 5:42 - 5:45
    It's for billing because if you're calling
  • 5:45 - 5:47
    while you are in certain countries
    it's gonna cost more
  • 5:47 - 5:50
    than if you are on the other side
    of the border
  • 5:50 - 5:52
    and so they need to know where you are
  • 5:52 - 5:53
    to bill you correctly.
  • 5:53 - 5:56
    And then compared to GSM,
    of course, they introduced
  • 5:56 - 5:58
    the two things we're gonna look at today:
  • 5:58 - 6:01
    They changed the speech codec
    to something called AMBE
  • 6:01 - 6:04
    and they changed the cipher.
    They are not using A5/1
  • 6:04 - 6:06
    or A5/2 or A5/3 or that kind of stuff.
  • 6:06 - 6:10
    They use something called A5/GMR-1.
  • 6:10 - 6:13
    If you look at the protocol stack
  • 6:13 - 6:15
    anything in the lower layers
  • 6:15 - 6:18
    (radio modulation, TDMA frame structure)
  • 6:18 - 6:21
    it's completely different
    because you have to deal with
  • 6:21 - 6:24
    the particular of the
    satellite communication.
  • 6:24 - 6:26
    You find the same kind of concept
  • 6:26 - 6:29
    and you find the control channels
    and broadcast channels
  • 6:29 - 6:32
    but their implementation is different.
  • 6:32 - 6:35
    If you go up one layer
    to the data link layer
  • 6:35 - 6:39
    you have something very similar to LAPDm.
  • 6:39 - 6:43
    Again, they needed some minor adaptation
  • 6:43 - 6:45
    because bandwidth costs a lot
    on a satellite.
  • 6:45 - 6:48
    They reduce the size of the overhead
    by reducing the header
  • 6:48 - 6:50
    and since you have so much delay
  • 6:50 - 6:53
    going from your phone to the satellite and
  • 6:53 - 6:56
    back down to the earth station
    then again to your phone
  • 6:56 - 7:00
    you have something like
    nearly 500 ms of delay
  • 7:00 - 7:05
    because you have basically 4…
    no 250 ms, sorry,
  • 7:05 - 7:10
    because you have both paths up and down
  • 7:10 - 7:14
    and so, with that delay you need more
    time for the acknowledgement
  • 7:14 - 7:17
    to come in,
    so they increased the window size.
  • 7:17 - 7:19
    Anything above that, so, layer 3
  • 7:19 - 7:22
    radio resource, since
    it's managed the radio channel
  • 7:22 - 7:25
    is still a bit different
    but anything above that
  • 7:25 - 7:28
    is strictly the same as GSM.
    It actually interoperates
  • 7:28 - 7:31
    with the GSM core network
    which means I can take
  • 7:31 - 7:34
    my Belgian operator SIM
    put it into a Thuraya phone
  • 7:34 - 7:38
    and I can roam onto the satellite
    and place calls
  • 7:38 - 7:40
    which is really nice
    except for the price you pay
  • 7:40 - 7:43
    at the end of the month.
  • 7:43 - 7:46
    For packet data it's essentially
    the same thing.
  • 7:46 - 7:47
    Anything that's close to the lower level
  • 7:47 - 7:50
    like RLC and MAC is gonna be different,
  • 7:50 - 7:53
    anything above interoperates with GSM
  • 7:53 - 7:56
    so you're gonna be speaking to
    a GSM core network
  • 7:56 - 8:00
    SGSN and GGSN and all that good stuff.
  • 8:00 - 8:05
    So, osmo-gmr, what we presented last time,
  • 8:05 - 8:10
    is essentially everything you needed to go
    from RF to wireshark.
  • 8:10 - 8:13
    That includes the hardware setup
  • 8:13 - 8:20
    (how you can build an antenna,
    what LNA to use, what SDR receiver to use)
  • 8:20 - 8:23
    At that time we didn't
    actually have RTL-SDR,
  • 8:23 - 8:26
    the cheap DVB dongle you can use as SDR.
  • 8:26 - 8:29
    Nowadays we do which means that
    for less than 100 bucks
  • 8:29 - 8:33
    you can get an antenna,
    LNA and SDR receiver and all
  • 8:33 - 8:36
    that we need to listen to those signals.
  • 8:36 - 8:39
    Then we did all the
    SDR processing which is
  • 8:39 - 8:41
    taking that raw data, filtering it,
    selecting the channel
  • 8:41 - 8:45
    doing the demodulation
    getting actual data bits out of it.
  • 8:45 - 8:48
    Then, channel coding which is
    converting those data bits into
  • 8:48 - 8:52
    layer 2 frames. Then,
    interpreting those layer 2 frames
  • 8:52 - 8:56
    into channel assignment
    and follow those assignments
  • 8:56 - 8:59
    into a demonstration application.
  • 8:59 - 9:02
    And finally, the wireshark dissector
    which takes this
  • 9:02 - 9:05
    and presents it nicely in a way
    you can read stuff.
  • 9:05 - 9:09
    But there is two things we couldn't do:
  • 9:09 - 9:11
    First, we couldn't see past the
    ciphering mode command
  • 9:11 - 9:14
    which as soon as ciphering was enabled
  • 9:14 - 9:16
    we couldn't see anything because
  • 9:16 - 9:18
    we knew the key because
    it was our own calls
  • 9:18 - 9:20
    and we can read the key from the SIM
  • 9:20 - 9:23
    but we didn't know the algorithm
    that was used to cipher
  • 9:23 - 9:26
    so there was no way for us to
    decipher that.
  • 9:26 - 9:29
    And we also had no way to
    get the voice data,
  • 9:29 - 9:33
    converting the frame into
    actual audio that we can listen to
  • 9:33 - 9:38
    We couldn't do that and that's
    what we're gonna look into.
  • 9:38 - 9:44
    So the first thing is the speech codec.
  • 9:46 - 9:51
    It's called AMBE for
    Advanced Multi-Band Excitation.
  • 9:51 - 9:56
    It's not a codec in itself.
    AMBE is more a family of codecs
  • 9:56 - 10:00
    which means you have several codecs
    which are named AMBE
  • 10:00 - 10:02
    which are different versions
    of one another
  • 10:02 - 10:05
    slightly different so that
    they're not compatible.
  • 10:05 - 10:08
    It's not documented in the standard
  • 10:08 - 10:12
    which is really annoying.
    When I started working on GMR
  • 10:12 - 10:15
    I really thought that
    the codec was in there
  • 10:15 - 10:20
    and then I discovered it wasn't.
    That was a bad day.
  • 10:20 - 10:23
    There is a bit of specification in there
  • 10:23 - 10:27
    but it only gives a high level
    description like
  • 10:27 - 10:31
    “The codec takes audio as input
    and produces 80 bits of output
  • 10:31 - 10:35
    every 20 ms” but nothing
    that can be used to
  • 10:35 - 10:39
    realistically implement a decoder.
  • 10:41 - 10:44
    That codec is made by a company DVSI Inc.
  • 10:44 - 10:49
    whose entire business is codecs.
    That's all they do
  • 10:49 - 10:51
    which is probably why there are
    incompatible versions
  • 10:51 - 10:55
    of AMBE because they can
    sell different versions.
  • 10:55 - 10:59
    These guys, they do sell
    a small USB stick that
  • 10:59 - 11:02
    can decode some of the variants.
    For example, you can decode
  • 11:02 - 11:07
    DStar audio which is also
    another AMBE codec
  • 11:07 - 11:11
    or P25 which is used in
    law enforcement radio in the US
  • 11:11 - 11:14
    (it's also an AMBE variant)
  • 11:14 - 11:17
    and the cheap USB stick can decode that.
  • 11:17 - 11:20
    Unfortunately, it's incompatible
    with the GMR-1 variant
  • 11:20 - 11:22
    because that's the first thing I did,
    I contacted DVSI
  • 11:22 - 11:25
    to know what were my options.
  • 11:25 - 11:31
    And short of buying a source code license
    which is just too expensive
  • 11:32 - 11:35
    is buying the appliance
    which is called NET-2000.
  • 11:35 - 11:39
    And not only does it cost like 2000 Euros
  • 11:39 - 11:43
    which is way too much for a hobby project
  • 11:43 - 11:46
    but you also have to special-order it
    with a GMR-1 firmware
  • 11:46 - 11:50
    and you have to sign some
    non-reverse engineering agreement
  • 11:50 - 11:54
    which… That wasn't an option.
  • 11:54 - 11:57
    We still had some hope.
    The first hope we had is that
  • 11:57 - 12:03
    as I said P25 uses an AMBE variant as well
  • 12:03 - 12:07
    and this particular variant
    is actually documented.
  • 12:07 - 12:11
    So you can download the standard
    for that particular
  • 12:11 - 12:15
    variant of AMBE and you have
    all the math and all the specs
  • 12:15 - 12:18
    so you can write a decoder
    and somebody actually wrote
  • 12:18 - 12:22
    a decoder which is open source
    which is called ambelib
  • 12:22 - 12:28
    and so we can use that code as a base
    to modify it. That was a lead.
  • 12:28 - 12:32
    The other lead, of course, is that the
    codec is somewhere in the phone.
  • 12:32 - 12:34
    The phone is obviously capable
    of decoding audio
  • 12:34 - 12:37
    so maybe we can find it in there.
  • 12:37 - 12:41
    But before we can start
    searching for the codec
  • 12:41 - 12:45
    it's good to have a basic understanding
    of how the codec works
  • 12:45 - 12:48
    so that we know what we're looking for.
  • 12:48 - 12:50
    The first thing to understand:
    it's a vocoder.
  • 12:50 - 12:53
    It's not a general-purpose audio codec
  • 12:53 - 12:58
    which means if you try to feed it music
    it's gonna perform horribly
  • 12:58 - 13:01
    because a general-purpose
    audio codec will try to model
  • 13:01 - 13:06
    what your ear can actually hear
    and what it can't
  • 13:06 - 13:10
    whereas a vocoder is gonna try
    to model the speech
  • 13:10 - 13:13
    and reconstructs
    something that sounds like
  • 13:13 - 13:16
    but is not actually the same thing.
  • 13:16 - 13:19
    They will drop a lot of information.
  • 13:19 - 13:23
    To do that the first thing they do
    is to split your speech
  • 13:23 - 13:26
    into small segments which are
    compressed independently.
  • 13:26 - 13:32
    In the case of GMR-1
    those segments are 10 ms long and
  • 13:32 - 13:37
    they are combined by pair
    and each pair is compressed
  • 13:37 - 13:41
    into a single frame. And then,
    for each of those small segments
  • 13:41 - 13:46
    they will represent it in 4 parameters.
    Those 4 parameters are
  • 13:46 - 13:52
    the pitch which is the
    fundamental frequency
  • 13:52 - 13:56
    of the periodic component of your voice,
  • 13:56 - 13:59
    the gain which is
    how loud you are talking,
  • 13:59 - 14:01
    something called
    voiced/unvoiced decision,
  • 14:01 - 14:04
    we're gonna see what this is right after
  • 14:04 - 14:07
    but essentially you have the
    fundamental frequency
  • 14:07 - 14:09
    and for each harmonic,
    two times the frequency,
  • 14:09 - 14:11
    three times the frequency,
    four times the frequency,
  • 14:11 - 14:16
    you have a bit that says
    is that component voiced or unvoiced
  • 14:16 - 14:22
    and then you have the amplitudes
    for the various harmonics.
  • 14:22 - 14:27
    If you want to play back the voice
    you have to do 3 things:
  • 14:27 - 14:29
    The first thing is unpacking.
  • 14:29 - 14:33
    That's taking the 80 bits
    of the voice frame
  • 14:33 - 14:37
    and reconstruct quantized
    parameters from it.
  • 14:37 - 14:40
    All the parameters are not just
    put in order one after the other
  • 14:40 - 14:43
    because some of the bits in a frame
    will receive more error correction
  • 14:43 - 14:47
    than others and some of the bits
    are more sensitive to errors than others
  • 14:47 - 14:51
    and so they will be placed at
    specific places in the frame
  • 14:51 - 14:55
    so that if errors do happen,
    hopefully it will affect something
  • 14:55 - 14:57
    that is not too important.
  • 14:57 - 15:00
    That's the unpacking step.
  • 15:00 - 15:02
    The second one is dequantization.
  • 15:02 - 15:05
    That's going from the quantized,
    compressed parameters
  • 15:05 - 15:09
    into floating point value
    or integer values
  • 15:09 - 15:11
    which will represent the real parameters
  • 15:11 - 15:13
    that are going to be used in the last step
  • 15:13 - 15:17
    which is synthesis. And synthesis
    is taking those parameters
  • 15:17 - 15:23
    and reconstructing actual audio
    that sounds like the original voice.
  • 15:23 - 15:26
    To do a quick overview
    of the synthesis step
  • 15:26 - 15:29
    what you do is you take
    for each possible harmonic,
  • 15:29 - 15:32
    so, f0, 2 f0, 3 f0,…
  • 15:32 - 15:36
    you have the voiced/unvoiced decision
  • 15:36 - 15:40
    and what voiced means is
    that you're gonna reconstruct
  • 15:40 - 15:43
    that particular frequency
    as a pure sinusoidal tone,
  • 15:43 - 15:50
    and if it's unvoiced you're just gonna
    fill that small band with noise.
  • 15:50 - 15:55
    And with that they can reconstruct
    human voice pretty well.
  • 15:55 - 16:00
    So, now that we know more or less
    how the codec works
  • 16:00 - 16:03
    and what we are looking for
  • 16:03 - 16:08
    we can start looking at
    where we are gonna find this codec.
  • 16:08 - 16:13
    The phone we looked at is
    the Thuraya SO-2510.
  • 16:13 - 16:17
    We looked at it mostly because
    it was the cheapest phone on eBay.
  • 16:17 - 16:20
    And that also means that it's the simplest
  • 16:20 - 16:25
    which means you have
    less places to look into.
  • 16:26 - 16:31
    We knew the codec was in the DSP.
    How did we know that is that
  • 16:31 - 16:33
    there is simply no other places it could be.
  • 16:33 - 16:37
    There are a bunch of chips on that phone, of course,
  • 16:37 - 16:40
    there is only two where it could possibly be:
  • 16:40 - 16:42
    the TI OMAP main processor,
  • 16:42 - 16:44
    and then they also have like a small ASIC
  • 16:44 - 16:46
    which is named the DALMA ASIC because
  • 16:46 - 16:50
    there is a big DALMA marking on it.
  • 16:50 - 16:54
    At first, we thought that
    that chip was the AMBE codec.
  • 16:54 - 16:58
    But it turned out that
    we found some Korean paper
  • 16:58 - 17:00
    from the phone manufacturer
    that described
  • 17:00 - 17:02
    the phone architecture.
    And there is a schematic
  • 17:02 - 17:05
    of the internals of that ASIC
    and it showed
  • 17:05 - 17:08
    it only had satellite radio functions
  • 17:08 - 17:10
    and absolutely nothing
    to do with the codec
  • 17:10 - 17:14
    which means there was a good chance
    it was in the TI OMAP processor.
  • 17:14 - 17:17
    Why in the DSP? Well, because
    it just makes sense
  • 17:17 - 17:20
    to make the codec in the DSP
    especially since DVSI
  • 17:20 - 17:25
    the company that makes the codec
    mostly sells those codecs as a
  • 17:25 - 17:30
    DSP precompiled library that you can buy.
  • 17:31 - 17:39
    Pretty standard process: We get the
    firmware update from the Internet.
  • 17:39 - 17:42
    From there we can extract
    the actual DSP image
  • 17:42 - 17:46
    which is the code that is
    loaded into the DSP.
  • 17:46 - 17:48
    Thankfully it's supported by IDA
  • 17:48 - 17:51
    which makes the whole process
    easier to reverse engineer.
  • 17:51 - 17:54
    But it's still like 250KB of binary data
  • 17:54 - 17:57
    and since it's a DSP it has no I/O
  • 17:57 - 18:01
    which means there is not
    a single string in it.
  • 18:01 - 18:05
    And also it's a DSP which means
    it's written in DSP assembly.
  • 18:05 - 18:08
    I don't know if you ever
    tried to read that
  • 18:08 - 18:13
    but it's not the most
    understandable thing in the world
  • 18:13 - 18:16
    because it's highly optimized
    for speed, of course.
  • 18:16 - 18:19
    And I actually looked at that
    code and went over it a few times
  • 18:19 - 18:22
    and took a few hours and looked at it
  • 18:22 - 18:25
    and I couldn't see anything in it.
  • 18:25 - 18:31
    And one day I get an email
    from a colleague in OSMOCOM,
  • 18:31 - 18:35
    Dieter Spaar, asking me
    if I had ever looked at that codec
  • 18:35 - 18:37
    and I told him “yeah I looked
  • 18:37 - 18:38
    but I couldn't really find anything.
  • 18:38 - 18:40
    Do you see anything?”
  • 18:40 - 18:42
    And six hours later he sent me
  • 18:42 - 18:44
    “I think this is the entry point
  • 18:44 - 18:46
    for the encoding and decoding functions”
  • 18:46 - 18:49
    “Oh, OK. Good!”
  • 18:49 - 18:53
    The way he found that
    from what I remember is that
  • 18:53 - 18:57
    you look at what would a codec use.
  • 18:57 - 19:00
    And of course it's gonna
    access the audio path
  • 19:00 - 19:06
    so you can look at the audio DMA setup,
    and audio DMA interrupts.
  • 19:06 - 19:09
    You can also look for constants
    that you know are used in the codec,
  • 19:09 - 19:12
    for example, I said that
    the frames are 80 bits,
  • 19:12 - 19:15
    so you can look for the constant 80
    somewhere in the code.
  • 19:15 - 19:17
    You can look at it.
  • 19:17 - 19:20
    It will produce 160 audio samples
    per frame decoded
  • 19:20 - 19:23
    so you can look at the constant 160.
  • 19:23 - 19:26
    Once you actually found one function
  • 19:26 - 19:29
    there is something
    that kind of stands out,
  • 19:29 - 19:32
    it's that since they ship that
    as a binary library
  • 19:32 - 19:37
    they don't really trust the guy
    who implements the rest of the DSP code
  • 19:37 - 19:39
    to setup the C runtime properly
  • 19:39 - 19:43
    and so what they do is they switch
    everything at each call into the codec.
  • 19:43 - 19:48
    They switch the stack, they
    re-setup/reconfigure the C runtime,
  • 19:48 - 19:51
    so that it's really independent.
  • 19:51 - 19:54
    Once you find one function,
    finding the other is pretty easy,
  • 19:54 - 20:00
    you can just look for that
    particular function prelude
  • 20:00 - 20:07
    that happens at each call at
    the audio library.
  • 20:07 - 20:11
    OK, so, we know where the code is
  • 20:11 - 20:13
    but it's still a massive amount of code.
  • 20:13 - 20:18
    I think it's like one third of
    the DSP code is the audio codec.
  • 20:18 - 20:24
    So, It's not something you can easily
    reverse engineer in a few hours.
  • 20:24 - 20:29
    So, before trying reverse
    engineering we just tried,
  • 20:29 - 20:32
    OK, maybe we can just run it.
  • 20:32 - 20:35
    It turns out that TI has this
    really nice program called
  • 20:35 - 20:39
    TI Code Composer Studio
    which includes a simulator
  • 20:39 - 20:43
    and it's a really nice piece
    of software because
  • 20:43 - 20:47
    not only can you pretty accurately
    simulate any of the DSPs
  • 20:47 - 20:51
    you can also completely configure
    the surrounding environment
  • 20:51 - 20:56
    in which it runs so you can setup
    arbitrary memory map.
  • 20:56 - 20:59
    You can also trace all memory access
    to see what part
  • 20:59 - 21:04
    of memory is being read or being written
    or being executed.
  • 21:04 - 21:07
    You can also access a file
    on your host system.
  • 21:07 - 21:09
    So you can use fread and fwrite
  • 21:09 - 21:12
    and the simulator would
    automatically translate those calls
  • 21:12 - 21:17
    into actual reading from the audio,
    hard drive and write to your hard drive
  • 21:17 - 21:22
    so you can read a compressed
    file that you saved
  • 21:22 - 21:28
    and it will write an audio
    wave file at the output.
  • 21:28 - 21:32
    And the way to do that is
    pretty straightforward.
  • 21:32 - 21:35
    You essentially just need to
    take the binary dump that
  • 21:35 - 21:39
    you have from the firmware update
  • 21:39 - 21:45
    and you need to find a way to
    mix in your own custom code.
  • 21:45 - 21:51
    The way to do that: You create
    a new object file like ELF
  • 21:51 - 21:55
    except in this case it's called COFF,
    a common object file format.
  • 21:55 - 21:58
    But it's the same idea.
  • 21:58 - 22:05
    You can just link to it like you would
    link any other piece of binary.
  • 22:05 - 22:07
    You write a simple main function
  • 22:07 - 22:12
    that will basically fread your
    compressed samples,
  • 22:12 - 22:15
    call the decoding function
    or at least what you hope
  • 22:15 - 22:20
    is the decoding function
    and fwrite the audio samples.
  • 22:20 - 22:25
    And surprisingly it worked.
    It didn't require that much effort.
  • 22:25 - 22:28
    In like a couple of days
    it was all working.
  • 22:28 - 22:33
    It didn't work on the first try.
    There was a couple of pitfalls.
  • 22:33 - 22:37
    But essentially the IDE is there
    and it can probably be applied
  • 22:37 - 22:40
    to other things as well.
  • 22:40 - 22:42
    Of course there is some
    pretty big downsides.
  • 22:42 - 22:46
    At first, it's pretty slow.
    Definitely sub-real-time
  • 22:46 - 22:51
    even on a modern laptop.
  • 22:51 - 22:54
    It also requires TI Code Composer Studio
  • 22:54 - 22:58
    which is a Windows application
    and it's also not free.
  • 22:58 - 23:02
    You have a 30 days evaluation period
    but that's about it.
  • 23:02 - 23:05
    So it's really not that practical.
  • 23:05 - 23:08
    But it was running in the simulator.
  • 23:08 - 23:11
    There is no real reason
    it cannot run on real hardware.
  • 23:11 - 23:14
    Of course, that's what we try next.
  • 23:14 - 23:17
    Unfortunately, on real hardware
  • 23:17 - 23:20
    you can't just put memory
    anywhere you want.
  • 23:20 - 23:24
    When the codec runs in the phone,
  • 23:24 - 23:29
    it runs inside a TI-OMAP
    which is an ARM plus a DSP
  • 23:29 - 23:31
    and that chip has an MMU
    which means
  • 23:31 - 23:35
    you can create
    any virtual memory you want.
  • 23:35 - 23:39
    But if you're trying to run
    on a cheap DSP board
  • 23:39 - 23:42
    that you can get for 50 bucks,
  • 23:42 - 23:46
    those usually don't include an MMU at all
  • 23:46 - 23:48
    which means you need to find a board
  • 23:48 - 23:50
    which has memory
    at the right places
  • 23:50 - 23:55
    because the binary you have has been
    linked to a specific physical addresses
  • 23:55 - 23:58
    and it needs to run at
    that particular addresses.
  • 23:58 - 24:06
    Thankfully, Dieter found one such board
    with a compatible memory map
  • 24:06 - 24:12
    and it had SDRAM.
    Well, we needed it to have some memory.
  • 24:12 - 24:17
    It also included Ethernet which means
    we can make some networked appliance
  • 24:17 - 24:20
    where we submit samples and get them back.
  • 24:20 - 24:23
    That was really nice.
  • 24:23 - 24:26
    The process of migrating the code
    from the simulator
  • 24:26 - 24:29
    to the board is actually
    pretty smooth because
  • 24:29 - 24:32
    of the way TI Code Composer Studio works.
  • 24:32 - 24:34
    You just basically select that as a target
  • 24:34 - 24:40
    and since the memory was at the right
    place it pretty much worked.
  • 24:40 - 24:43
    Now, it wasn't quite as fast as we want it
  • 24:43 - 24:48
    because it was only running
    about real-time at the first try
  • 24:48 - 24:52
    and that's because SDRAM is slow
    especially when you do
  • 24:52 - 24:57
    a bunch of random accesses
    the SDRAM isn't really fast.
  • 24:57 - 25:00
    The codec has unfortunately
    a lot of big data tables
  • 25:00 - 25:05
    which are accessed constantly
    like to compute cosines.
  • 25:05 - 25:10
    So, what Dieter did is basically
    identify a couple
  • 25:10 - 25:14
    of those big tables
    that are accessed very often
  • 25:14 - 25:16
    and then just migrated them
    to internal SRAM.
  • 25:16 - 25:19
    You just find the one or two places
  • 25:19 - 25:22
    where the code reads from those tables
  • 25:22 - 25:26
    and just patch the address to another address
  • 25:26 - 25:29
    and redirect it to there.
  • 25:29 - 25:33
    And that got the code running much faster,
    about 16x faster than real-time.
  • 25:33 - 25:35
    And honestly, that was good.
  • 25:35 - 25:39
    I wasn't really planning on doing
    anything else on the hardware
  • 25:39 - 25:44
    and so I ordered the same board…
    except I didn't.
  • 25:44 - 25:48
    I somehow ended up clicking
    on the wrong button in the
  • 25:48 - 25:50
    TI store or something and
    when the board arrived
  • 25:50 - 25:53
    I didn't have the right one.
  • 25:53 - 25:56
    At that point I had two choices:
  • 25:56 - 25:59
    I could just order the same board
  • 25:59 - 26:01
    and wait one more week,
  • 26:01 - 26:04
    or I could try to run
    the code on it anyway.
  • 26:04 - 26:07
    The second board didn't have Ethernet.
    It had USB.
  • 26:07 - 26:10
    But most importantly it didn't
    have any SDRAM at all.
  • 26:10 - 26:12
    It only had internal SRAM
  • 26:12 - 26:16
    and unfortunately not at
    the right address.
  • 26:16 - 26:21
    So I just figured: OK.
    I'm just gonna try to relocate the binary
  • 26:21 - 26:25
    and make it run on another address.
    How hard can it be?
  • 26:25 - 26:28
    It turned out to be pretty straightforward
    because
  • 26:28 - 26:31
    in I think less than a day it was running
  • 26:31 - 26:34
    and that's really thanks to IDAPython
    and the simulator.
  • 26:34 - 26:38
    Basically what I did is in IDAPython
    which is the
  • 26:38 - 26:42
    scripting language in
    IDA reverse engineering software
  • 26:42 - 26:46
    you can iterate through all the opcodes,
    then for that
  • 26:46 - 26:50
    opcode you can test:
    Is there an absolute memory reference
  • 26:50 - 26:53
    somewhere in that opcode?
    If no, then don't do anything.
  • 26:53 - 26:56
    If there is an absolute memory reference,
  • 26:56 - 27:00
    is that reference falling
    somewhere in a place that
  • 27:00 - 27:02
    I'm trying to relocate or not?
    If it is a place
  • 27:02 - 27:04
    I'm trying to relocate then
    just patch that opcode
  • 27:04 - 27:07
    and change the absolute memory reference
    to the new place.
  • 27:07 - 27:10
    Go through all the code like that
    and try to run it.
  • 27:10 - 27:13
    Of course, it didn't work
    the first time around
  • 27:13 - 27:20
    because there was a couple of
    pitfalls in that approach
  • 27:20 - 27:25
    but using the simulator it was
    pretty easy to find out where
  • 27:25 - 27:30
    things were going wrong because
    you take the original code
  • 27:30 - 27:33
    you run everything in the simulator
    and you trace every
  • 27:33 - 27:38
    memory access, then you take your
    potentially relocated code
  • 27:38 - 27:42
    you run it again in the simulator
    and trace every memory access
  • 27:42 - 27:45
    Decoding the same frame
    it should be deterministic
  • 27:45 - 27:50
    and so you should have the exact same
    memory access patterns
  • 27:50 - 27:53
    in both cases. And as soon as you see
    that they are diverging
  • 27:53 - 27:56
    you know that somewhere
    a branch wasn't taken or
  • 27:56 - 27:59
    something went wrong and
    you can go at that place in your code
  • 27:59 - 28:03
    look and see why the
    relocation didn't work.
  • 28:03 - 28:07
    It only took two tries to
    have the code running.
  • 28:07 - 28:11
    It was running on my board and
    I didn't have to order a new one
  • 28:11 - 28:13
    which was pretty nice.
  • 28:13 - 28:16
    So we have a hardware USB decoder
  • 28:16 - 28:19
    which is very useful but
    you still have to carry it around
  • 28:19 - 28:23
    or plug it into a network
    server or something
  • 28:23 - 28:26
    which isn't necessarily
    something you want.
  • 28:26 - 28:32
    I wanted something I could
    checkout into git, basically.
  • 28:32 - 28:39
    So, I started reverse engineering
    the entire codec.
  • 28:39 - 28:43
    Now, it turned out to be very painful.
  • 28:43 - 28:47
    As I said, to reconstruct audio
    you have three main steps.
  • 28:47 - 28:51
    The first one is unpacking and
    it's just basically bit shuffling.
  • 28:51 - 28:55
    It's like the first thing
    the codec does which means
  • 28:55 - 28:58
    it's really easy to find and
    it's really easy to follow.
  • 28:58 - 29:00
    So, no problem there.
  • 29:00 - 29:04
    The second step is dequantization
    and it took like 95% percent
  • 29:04 - 29:08
    of the work because it's lot of math.
  • 29:08 - 29:12
    It's all written in DSP assembly,
    in fixed-point
  • 29:12 - 29:15
    and sometimes you're looking at
    a function to figure out
  • 29:15 - 29:19
    what that function does you
    have to run it with different
  • 29:19 - 29:23
    parameters to see what does
    that thing compute.
  • 29:23 - 29:26
    Graphing is really useful for that.
    You just feed some
  • 29:26 - 29:29
    inputs into the simulator see
    what output is generated
  • 29:29 - 29:33
    and just plot it in PyLab or something.
  • 29:33 - 29:37
    And then you can see, OK,
    that actually computes a logarithm.
  • 29:37 - 29:42
    There is still a lot of code there
    and there was no
  • 29:42 - 29:46
    other option for that part
    because the quantization step
  • 29:46 - 29:49
    is completely specific to GMR.
    It's not shared with any
  • 29:49 - 29:52
    of the other AMBE variants.
  • 29:52 - 29:55
    The synthesis part is even
    more complicated, is even more
  • 29:55 - 29:58
    code in the DSP but I really didn't want…
  • 29:58 - 30:02
    after finishing the dequantization
    I really didn't want
  • 30:02 - 30:05
    to the synthesis, so I just assumed
    that the synthesis step
  • 30:05 - 30:09
    was gonna be very similar
    to the one in P25.
  • 30:09 - 30:14
    So I took the mbelib existing
    open-source code for P25,
  • 30:14 - 30:18
    I ripped out the unpacking
    and dequantization out of it.
  • 30:18 - 30:21
    That's because it's the one for P25,
    of course.
  • 30:21 - 30:25
    I just plugged in my own implementation
  • 30:25 - 30:28
    and run it through the synthesis.
  • 30:28 - 30:31
    There's a couple of things
    you need to take care of
  • 30:31 - 30:38
    mostly because P25 uses 20 ms frames,
    GMR uses 10 ms frames.
  • 30:38 - 30:42
    So, I basically dropped one frame
    every other frame in GMR
  • 30:42 - 30:46
    and just used the 10 ms parameter
    synthesized 20 ms of speech
  • 30:46 - 30:49
    but it produced something
    where you could follow
  • 30:49 - 30:52
    a conversation and it worked.
  • 30:52 - 30:56
    I was still not really happy about it
    because the code quality
  • 30:56 - 30:59
    of mbelib was really not that good.
  • 30:59 - 31:01
    Its performance is also not that good.
  • 31:01 - 31:05
    It wasn't actually faster
    than the hardware decoder.
  • 31:05 - 31:10
    So, in the end I went back to
    the actual P25 specifications
  • 31:10 - 31:14
    that I had and
    rewrote a clean implementation
  • 31:14 - 31:17
    of the synthesis step.
    I just guessed the difference.
  • 31:17 - 31:21
    Every time it says 20 ms
    I just put 10 ms. When there is
  • 31:21 - 31:26
    a 512-bin FFT I just put 256
  • 31:26 - 31:30
    and adapted stuff as I went.
  • 31:30 - 31:34
    And the resulting code is in the git
  • 31:34 - 31:37
    and you can decode voice.
  • 31:37 - 31:40
    I think it works pretty well.
  • 31:40 - 31:43
    You can hear some subtle differences
  • 31:43 - 31:45
    compared to the original codec
  • 31:45 - 31:48
    because DVSI did do a bunch of stuff to
  • 31:48 - 31:51
    improve the voice quality.
    It's their whole business.
  • 31:51 - 31:56
    And I didn't go through all of that.
  • 31:56 - 32:00
    But… From my point of view,
    I'm done on this and I'm not
  • 32:00 - 32:03
    gonna go any further.
    We have C code we can run.
  • 32:03 - 32:06
    So. That's good.
  • 32:06 - 32:13
    Applause
  • 32:14 - 32:17
    Thank you.
  • 32:17 - 32:20
    Next, we move on to the cipher.
  • 32:20 - 32:24
    We didn't reverse engineer the cipher.
  • 32:24 - 32:28
    That was done by a team
    at the Bochum university
  • 32:28 - 32:30
    led by Benedikt Driessen.
  • 32:30 - 32:34
    Right after we publ…
    After 28C3, basically…
  • 32:34 - 32:38
    Actually, at 28C3 we already had
    some contacts with them but
  • 32:38 - 32:41
    the paper wasn't published yet.
  • 32:41 - 32:45
    And we actually have them validate
    their data and their attack
  • 32:45 - 32:46
    using osmo-gmr.
  • 32:46 - 32:50
    I definitely encourage you to
    read the paper that they published
  • 32:50 - 32:52
    because the method they used.
    They are actually the ones
  • 32:52 - 32:57
    who extracted the DSP image which
    we used to reverse engineer the codec.
  • 32:57 - 33:02
    They were the first ones
    to extract it from the
  • 33:02 - 33:05
    file that you can download online.
  • 33:05 - 33:09
    And then, there is some pretty
    interesting stuff in there.
  • 33:09 - 33:11
    As I said, they also published
    an attack on that cipher
  • 33:11 - 33:14
    which is a ciphertext-only attack
  • 33:14 - 33:18
    and I'll show some of the results later on.
  • 33:18 - 33:23
    So, what does the cipher look like?
    Well, it looks like that.
  • 33:23 - 33:27
    If you are familiar with A5/2 or even A5/1
  • 33:27 - 33:30
    that should look very familiar because
  • 33:30 - 33:32
    it's pretty much the same thing.
  • 33:32 - 33:36
    So you have 4 linear feedback registers.
  • 33:36 - 33:40
    The first 3 ones are combined
    using a nonlinear output function
  • 33:40 - 33:44
    into the actual key stream.
    And then you have a 4th register
  • 33:44 - 33:48
    which a few of the bits are tapped.
    They go to a nonlinear
  • 33:48 - 33:51
    clocking function and that function
    is gonna determine
  • 33:51 - 33:55
    how fast the other registers are clocked.
  • 33:55 - 34:00
    So, not all registers advance
    at each output bit.
  • 34:00 - 34:05
    And as I said, it is very similar to…
    it's actually a copy of A5/2
  • 34:05 - 34:08
    except they changed every number like…
    they don't…
  • 34:08 - 34:11
    the polynomial for the linear
    feedback registers are different,
  • 34:11 - 34:14
    which bits are tapped to combine
    into the output are different.
  • 34:14 - 34:18
    Same thing for R4 but
    it's the exact same structure
  • 34:18 - 34:23
    which means we can reuse the attack
    that works on A5/2.
  • 34:23 - 34:27
    This cipher obviously has
    an initialization step
  • 34:27 - 34:30
    where you basically take the key
    and the frame number
  • 34:30 - 34:32
    that you want to encrypt
  • 34:32 - 34:36
    and you have some process to initialize
  • 34:36 - 34:38
    the LFSR with those data.
  • 34:38 - 34:40
    But really the only thing to retain
  • 34:40 - 34:43
    is that the initialization
    process is entirely linear.
  • 34:43 - 34:46
    There is some XOR and LFSR being involved.
  • 34:46 - 34:50
    So that's pretty much…
  • 34:50 - 34:54
    …the thing to remember.
  • 34:54 - 35:00
    The actual irregular clocking
    only is applied when you
  • 35:00 - 35:06
    generate bitstream
    for encrypting the frame.
  • 35:06 - 35:10
    Since we are gonna talk about
    attacks on ciphers
  • 35:10 - 35:15
    in some details there are a few things
    that you may not remember
  • 35:15 - 35:18
    if you don't do algebra all the time.
  • 35:18 - 35:21
    This is a very very quick reminder.
  • 35:21 - 35:26
    So, you should know is that
    when you see GF(2) that's
  • 35:26 - 35:28
    just a fancy term for binary.
  • 35:28 - 35:32
    What we call addition in GF(2)
    is just the XOR operation.
  • 35:32 - 35:34
    Multiplication is just the logic AND
  • 35:34 - 35:36
    and you can do everything with that.
  • 35:36 - 35:40
    If you accept those definitions,
    then algebra over binary
  • 35:40 - 35:44
    is gonna be pretty much the same as
    algebra over real numbers
  • 35:44 - 35:47
    and all the nice properties
    are gonna be applicable
  • 35:47 - 35:50
    to binary math as well.
  • 35:50 - 35:54
    I mentioned previously “linear”
    and I will mention it
  • 35:54 - 35:56
    several times in the rest of this talk.
  • 35:56 - 36:01
    When I say “linear” it's either a constant,
  • 36:01 - 36:04
    or a variable or unknown
    multiplied by a constant
  • 36:04 - 36:08
    but you never have two variables
    multiplied together
  • 36:08 - 36:13
    because then it becomes quadratic
    and it's not linear anymore.
  • 36:14 - 36:18
    Being linear has some nice properties
    like linear systems of equations
  • 36:18 - 36:23
    we have very fast algorithms to
    solve them
  • 36:23 - 36:28
    and every linear operation can be
    represented as matrix operation
  • 36:28 - 36:31
    which is especially useful…
    we're gonna be dealing with
  • 36:31 - 36:34
    hundreds of unknowns and
    spelling them out is not an option.
  • 36:34 - 36:38
    We have to have some way of
    representing that and that's
  • 36:38 - 36:42
    matrix operations, essentially.
  • 36:42 - 36:46
    One other important thing to know is:
  • 36:46 - 36:50
    When you have an operation
    that adds redundancy, for example,
  • 36:50 - 36:53
    channel coding where you add the CRC
  • 36:53 - 36:55
    or where you do convolutional coding,
  • 36:55 - 36:58
    adding error correction adds
    redundancy to your message
  • 36:58 - 37:01
    and when you have redundancy
    you can create what's called
  • 37:01 - 37:05
    a parity check matrix to check
    if that redundancy in that present.
  • 37:05 - 37:08
    So, for a CRC you check
    the CRC for example.
  • 37:08 - 37:11
    But it can be generalized to anything
    that adds redundancy.
  • 37:11 - 37:14
    It can be checked.
    And that becomes very useful to
  • 37:14 - 37:17
    make the attack much faster.
  • 37:17 - 37:21
    So, a little more detail about
    what the guys at Bochum did.
  • 37:21 - 37:25
    They published a ciphertext-only attack
  • 37:25 - 37:28
    that means you only need to
    capture data off the air
  • 37:28 - 37:30
    and directly with that
    you feed it to the attack
  • 37:30 - 37:32
    and it can give you the key.
  • 37:32 - 37:35
    Their target was called TCH3 frames
  • 37:35 - 37:38
    and those are traffic frames,
    they're what carries the voice
  • 37:38 - 37:41
    which means you have a lot of them because
  • 37:41 - 37:44
    as long as the conversation is still going,
  • 37:44 - 37:46
    frames that are being generated constantly,
  • 37:46 - 37:49
    several tens of them per second.
  • 37:49 - 37:52
    They didn't start from nothing. As I said
  • 37:52 - 37:55
    the cipher is heavily based on A5/2, so,
  • 37:55 - 37:57
    it kind of makes sense to read the papers
  • 37:57 - 38:01
    that were published about A5/2 attacks.
  • 38:01 - 38:06
    The most interesting one and one of the
    most advanced one is called
  • 38:06 - 38:12
    “Instant Ciphertext-Only Cryptanalysis
    of GSM encrypted communication”
  • 38:12 - 38:16
    It's kind of math-heavy but
  • 38:16 - 38:21
    it provides an attack on A5/2
    that recovers the key instantly
  • 38:21 - 38:24
    which is really nice.
  • 38:24 - 38:26
    They modified this attack.
    They tweaked it for GMR-1
  • 38:26 - 38:28
    because the frames are different.
  • 38:28 - 38:30
    There's different amount of bits.
  • 38:30 - 38:33
    The cipher is slightly different.
  • 38:33 - 38:35
    And they also did something a little weird.
  • 38:35 - 38:37
    They tried to guess more bits
  • 38:37 - 38:39
    some of the bits they tried to brute-force.
  • 38:39 - 38:42
    I'm not exactly sure why.
  • 38:42 - 38:44
    I think it's to reduce
    the number of unknowns.
  • 38:44 - 38:48
    But I don't know if it actually
    had the desired effect.
  • 38:48 - 38:52
    And in the RUB paper
    they published their result:
  • 38:52 - 39:00
    With 350 GB of precomputed data
    using 32 TCH3 frames
  • 39:00 - 39:03
    they managed to recover the key
    in something like 40 minutes
  • 39:03 - 39:06
    which is not bad, definitely, and it works.
  • 39:06 - 39:09
    But the original attack is named “instant”.
  • 39:09 - 39:13
    40 minutes isn't instant – to me, at least.
  • 39:13 - 39:18
    So, we started looking for a better attack.
  • 39:18 - 39:22
    We started off the same paper
    because it's kind of
  • 39:22 - 39:25
    the state of the art of A5/2 attacks
  • 39:25 - 39:27
    but we really didn't do anything fancy.
  • 39:27 - 39:33
    We just used it exactly as they said
    in the original paper
  • 39:33 - 39:37
    and we just did the necessary tweaks
    for A5/GMR-1,
  • 39:37 - 39:40
    changing the matrix and
    the length of the vector,
  • 39:40 - 39:42
    really nothing fancy.
  • 39:42 - 39:46
    We also decided to attack
    a different type of channel
  • 39:46 - 39:49
    and this is probably what
    makes the most difference.
  • 39:49 - 39:52
    We decided to target FACCH3 control frames
  • 39:52 - 39:55
    instead of the TCH3 voice frames.
  • 39:55 - 39:59
    There is a very big advantage to
    targeting those frames.
  • 39:59 - 40:03
    First, they use a different kind of
    modulation and training sequence.
  • 40:03 - 40:06
    From an RF point of view
    they are much easier to receive.
  • 40:06 - 40:10
    And that's pretty important if
    you're trying to mount an attack because
  • 40:10 - 40:13
    as soon as you have a
    bit error in the reception
  • 40:13 - 40:15
    it will make your equations inconsistent
  • 40:15 - 40:19
    and you won't actually be able
    to solve for the key as easily.
  • 40:19 - 40:22
    So, this is a very nice property
    of those frames.
  • 40:22 - 40:25
    The other very nice property is
    that we have known
  • 40:25 - 40:29
    plaintext on the FACCH3 channel,
    the control channel.
  • 40:29 - 40:32
    There are some predictable frames
    that we know will there
  • 40:32 - 40:33
    and we know where they will be.
  • 40:33 - 40:36
    the voice traffic is
    completely unpredictable.
  • 40:36 - 40:38
    It's someone talking.
  • 40:38 - 40:41
    The control traffic will follow
    some very known patterns
  • 40:41 - 40:46
    and I'll show an example of
    plaintext which is really trivial.
  • 40:47 - 40:50
    Some other of the nice properties
    of the control channel is:
  • 40:50 - 40:53
    There is much more redundancy.
  • 40:53 - 40:58
    In TCH3 not all the bits are
    actually error-corrected
  • 40:58 - 41:01
    which means you don't have
    that much information
  • 41:01 - 41:05
    while for the control channel
    each layer-2 bit that you transmit
  • 41:05 - 41:10
    is almost quadrupled
    before you put it on the air
  • 41:10 - 41:15
    which means you can build
    a lot of equations very fast
  • 41:15 - 41:20
    without many bursts.
    In the RUB attack they needed 32 frames,
  • 41:20 - 41:27
    probably because there is not much
    redundancy in the TCH3 frames.
  • 41:27 - 41:30
    And also, as a nice bonus that
    we discovered later is that
  • 41:30 - 41:34
    the FACCH3 control channel
    is also used to negotiate
  • 41:34 - 41:38
    other types of channels like
    the channels that are used
  • 41:38 - 41:43
    when you send FAX or
    when you establish a data call
  • 41:43 - 41:45
    over satellite networks.
  • 41:45 - 41:48
    It's actually negotiated using
    that type of control channel
  • 41:48 - 41:51
    which means the same kind of attack
    can work to crack
  • 41:51 - 41:55
    not only voice but also data calls and FAX.
  • 41:55 - 41:58
    This is an example of the plaintext.
  • 41:58 - 42:01
    I don't know if you can really see it.
  • 42:01 - 42:03
    But essentially on the top
    you have the cipher mode command
  • 42:03 - 42:06
    the first line in blue is
    the cipher mode command.
  • 42:06 - 42:09
    So everything after that
    is in theory ciphered.
  • 42:09 - 42:11
    This is, of course, the enciphered view.
  • 42:11 - 42:14
    And the three marked packets
    in gray and black
  • 42:14 - 42:16
    are the very predictable texts
  • 42:16 - 42:19
    and what they are is acknowledgements.
  • 42:19 - 42:21
    Because of the delay on the GSM channel
  • 42:21 - 42:23
    there is some outstanding
    acknowledgement
  • 42:23 - 42:28
    that still need to be sent
    when the ciphering is turned on.
  • 42:28 - 42:32
    You get empty packets with
    incrementing sequence number
  • 42:32 - 42:35
    and the ACK bit set.
  • 42:35 - 42:37
    Something changes in them,
    the sequence number, but
  • 42:37 - 42:41
    as soon as you can count,
    predicting them is trivial.
  • 42:41 - 42:45
    This definitely works in practice
    like 100% of the time.
  • 42:45 - 42:50
    I never really had any problem with that.
  • 42:51 - 42:56
    So, how do you build a plaintext attack?
  • 43:04 - 43:08
    The general goal is…
    what you're trying to achieve
  • 43:08 - 43:12
    is build a giant system of equations
    that's linear
  • 43:12 - 43:13
    and that you can solve.
  • 43:13 - 43:17
    It's gonna have the form
    A * x = b
  • 43:17 - 43:21
    b is the cipher stream that
    is generated by A5/GMR-1
  • 43:21 - 43:26
    x is some representation of
    the internal state of the cipher
  • 43:26 - 43:28
    that you are trying to find
  • 43:28 - 43:31
    and A is a giant matrix
    that you can precompute
  • 43:31 - 43:35
    that's only dependent on
    the structure of the cipher.
  • 43:35 - 43:40
    Each row of A and b are gonna represent
    one bit of the output
  • 43:40 - 43:45
    and how to combine that internal state
    to generate that cipher stream bit.
  • 43:45 - 43:48
    As I mentioned,
    the initialization process is linear
  • 43:48 - 43:51
    and that's very important
    for two things.
  • 43:51 - 43:54
    First,
    from the internal state of the cipher
  • 43:54 - 43:58
    you're gonna be able to recover the key
  • 43:58 - 44:02
    which is not entirely true.
    There is one bit of the key
  • 44:02 - 44:05
    that you can't recover
    because that bit of the key
  • 44:05 - 44:08
    turns out to have absolutely no effect
    on the output whatsoever.
  • 44:08 - 44:12
    so, you can't recover it
    but doesn't matter either.
  • 44:12 - 44:15
    The other thing is that
  • 44:15 - 44:18
    you're gonna need to
    build a lot of equations.
  • 44:18 - 44:21
    That system is gonna be big.
  • 44:21 - 44:24
    In a single burst of data
    you won't have enough equations.
  • 44:24 - 44:26
    You're gonna need multiple of them.
  • 44:26 - 44:29
    But, of course, those multiple bursts
    are gonna have different
  • 44:29 - 44:31
    frame numbers because they're sequential.
  • 44:31 - 44:34
    They're not gonna wrap before a long time.
  • 44:34 - 44:37
    But since the initialization
    is entirely linear
  • 44:37 - 44:41
    you can actually derive equation
    from one frame number
  • 44:41 - 44:44
    from an equation built out of
    another frame number.
  • 44:44 - 44:50
    But, of course, it would be too easy.
    The cipher isn't entirely linear.
  • 44:50 - 44:53
    so, we have to “linearize” it and
  • 44:53 - 44:57
    when I first started looking at
    a crypto attack on A5/2
  • 44:57 - 45:00
    in the paper it was always
    one line that said
  • 45:00 - 45:03
    oh yeah, we linearize the cipher
  • 45:03 - 45:06
    and they kind of always assumed
    the readers know what that means
  • 45:06 - 45:09
    which I didn't at the time.
  • 45:09 - 45:12
    So, I decided to be a little more explicit.
  • 45:12 - 45:17
    The two nonlinear elements are
    the output function
  • 45:17 - 45:20
    which uses a majority function
    where you take three bits
  • 45:20 - 45:24
    and it outputs whichever
    bit is more common
  • 45:24 - 45:27
    and you can rewrite that
    majority function as being
  • 45:27 - 45:31
    the sum of a plus b multiplied by c
  • 45:31 - 45:37
    or if you wish a XOR b AND c.
  • 45:37 - 45:41
    And the problem is the
    b multiplied by c because
  • 45:41 - 45:44
    that's quadratic, that's not linear.
  • 45:44 - 45:46
    That means that
    we can't introduce that.
  • 45:46 - 45:50
    And the way to linearize that
    is to kind of ignore it.
  • 45:50 - 45:54
    For every possible quadratic term
    that's gonna be generated
  • 45:54 - 45:58
    by that nonlinear output function
  • 45:58 - 46:00
    you generate a new unknown.
  • 46:00 - 46:03
    In reality that unknown
    isn't really unknown.
  • 46:03 - 46:07
    It's a function of your other variables but
  • 46:07 - 46:10
    you can't represent it in a linear fashion.
  • 46:10 - 46:15
    So, you just ignore it and say
    “OK, this is a new unknown, deal with it”.
  • 46:15 - 46:19
    Actually, there is a lot of
    possible combinations
  • 46:19 - 46:22
    which adds 594 new unknowns
    to your equation system
  • 46:22 - 46:25
    which means that you're gonna need
    a bunch more equations
  • 46:25 - 46:28
    to be able to actually solve it.
  • 46:28 - 46:32
    But at least it makes it linear.
    So, there is hope.
  • 46:33 - 46:40
    The other nonlinear thing in that cipher
    is the clocking.
  • 46:40 - 46:44
    As I said, all the LSFR don't advance
    at the same rate.
  • 46:44 - 46:50
    Some are clocked once or zero times
    at each output bit.
  • 46:50 - 46:56
    And this is a function of the value of R4
    at that particular time.
  • 46:56 - 47:00
    So, how do we linearize that?
  • 47:00 - 47:05
    Well, R4 is a 17-bit
    linear feedback register.
  • 47:05 - 47:09
    So, we can know
    all its future states from any…
  • 47:09 - 47:13
    As soon as we have a good state
    at one particular time
  • 47:13 - 47:16
    you can predict all the future states
  • 47:16 - 47:19
    because it's clocked all the time.
  • 47:20 - 47:24
    And in those 17 bits
    the initialization step forces
  • 47:24 - 47:26
    one of those bits to one.
  • 47:26 - 47:29
    So, there is only 16 bits
    that you don't know
  • 47:29 - 47:33
    which means there is only
    65536 possible clocking
  • 47:33 - 47:36
    patterns for the cipher.
  • 47:36 - 47:40
    So, the way we approach
    that is we just brute-force it.
  • 47:40 - 47:43
    You just assume that R4 is
    gonna be equal to that value
  • 47:43 - 47:48
    after the initialization
    and you just assume that
  • 47:48 - 47:53
    and try to solve and crack the cipher
    by using that assumption.
  • 47:53 - 47:56
    And if you find the Kc, well,
    then maybe it's the right one.
  • 47:56 - 48:01
    You just repeat that 65536 times
    and most of the time
  • 48:01 - 48:04
    if your guess about R4 is incorrect
  • 48:04 - 48:07
    the system will end up being inconsistent
  • 48:07 - 48:11
    and you won't actually have any solution.
  • 48:11 - 48:18
    But this still requires solving
    65536 systems
  • 48:18 - 48:22
    which is fast but not that fast.
  • 48:22 - 48:28
    There is actually a trick that we can
    apply to do better guesses.
  • 48:28 - 48:32
    When you build these
    giant equation systems
  • 48:32 - 48:35
    it turns out that some of the
    equations that you construct
  • 48:35 - 48:38
    they are not gonna be independent.
    Some of them are
  • 48:38 - 48:40
    gonna be related to each other.
  • 48:40 - 48:45
    And so some of the rows are redundant.
    And what's nice is:
  • 48:45 - 48:49
    If they're redundant
    we can actually test for that redundancy.
  • 48:49 - 48:53
    For the 65536 possible systems
    that you need to solve,
  • 48:53 - 48:55
    you also have an
    An * x = b
  • 48:55 - 48:59
    there is a bunch of rows in An
    that are gonna be linear dependent.
  • 48:59 - 49:04
    So, for each of them you can
    build a parity check matrix Hn
  • 49:04 - 49:08
    so that you have this nice property
    of a parity check matrix
  • 49:08 - 49:13
    where you multiply b by Hn
    and you get a zero matrix.
  • 49:13 - 49:16
    But all those Hn, you can precompute them.
  • 49:16 - 49:20
    That's the offline data of the attack.
  • 49:20 - 49:23
    You precompute those giant matrices,
  • 49:23 - 49:29
    then you take b which is the cipher text
    you got from the cipher,
  • 49:30 - 49:33
    you multiply it by this matrix
    and if it gives you zero,
  • 49:33 - 49:37
    you know that this particular R4 value
    might be correct.
  • 49:37 - 49:39
    You don't know if it's correct or not,
  • 49:39 - 49:41
    but at least it might be.
  • 49:41 - 49:44
    If the result is non-zero
    then you know that R4 is definitely
  • 49:44 - 49:47
    not the correct R4 and so
    instead of solving an entire
  • 49:47 - 49:50
    system of equations the only thing
    you need to do is
  • 49:50 - 49:54
    do one matrix multiplication
    and that is pretty fast.
  • 49:54 - 49:58
    And in the end you're gonna have
    a couple of possible matches
  • 49:58 - 50:02
    so you're gonna solve like maybe
    3 or 4 systems of equations
  • 50:02 - 50:05
    instead of solving 65536 and
    that's much much faster.
  • 50:05 - 50:10
    Now, how do you go from
    a known-plaintext attack
  • 50:10 - 50:13
    to a ciphertext-only attack?
  • 50:13 - 50:16
    I'm not gonna detail that much
    but essentially
  • 50:16 - 50:20
    the channel coding operation
    introduces redundancy
  • 50:20 - 50:24
    and you can kind of null out that…
  • 50:24 - 50:27
    if you take the last equation you have
  • 50:27 - 50:30
    H multiplied by y plus g.
  • 50:30 - 50:34
    H is a function of the channel coding.
  • 50:34 - 50:36
    You can compute it. It's known.
  • 50:36 - 50:40
    y is the data that
    you received off the air.
  • 50:40 - 50:42
    So, you know it as well.
  • 50:42 - 50:46
    g is also a function of
    the channel coding operation.
  • 50:46 - 50:50
    So, you know it.
    So everything on the left you know.
  • 50:50 - 50:55
    And the last value is H multiplied by A
  • 50:55 - 50:58
    which is what you had in
    the known-plaintext attack,
  • 50:58 - 51:00
    so you know it as well.
  • 51:00 - 51:03
    And everything you don't know
    is x and so, again,
  • 51:03 - 51:06
    you have a linear system of equations
    that you can solve.
  • 51:06 - 51:10
    And the same R4 quick-scan
    method can be applied
  • 51:10 - 51:14
    to scan very quickly and not
    have to solve everything.
  • 51:14 - 51:17
    So, this is the result.
  • 51:17 - 51:21
    If you use the fact that
    you know some plaintext
  • 51:21 - 51:24
    you need about 50 Megabytes
    of data precomputed,
  • 51:24 - 51:27
    you need to receive 8 bursts
    or even 4 if you
  • 51:27 - 51:31
    by any chance have the right
    frame number alignment,
  • 51:31 - 51:33
    and in like 500 ms
    you can get the key back.
  • 51:33 - 51:36
    And that's on a 6 year old laptop.
  • 51:36 - 51:40
    On any recent machine it's really instant.
  • 51:40 - 51:43
    If you want to use the
    ciphertext-only variant,
  • 51:43 - 51:46
    and honestly I really don't know
    why you would
  • 51:46 - 51:50
    because the known-plaintext works,
    so you might as well use it.
  • 51:50 - 51:54
    But if you want it purely
    for academic purposes,
  • 51:54 - 51:57
    then you need 8 bursts,
  • 51:57 - 52:01
    about 5 Gigabytes of data,
  • 52:01 - 52:04
    and then it takes a little longer
  • 52:04 - 52:09
    like about 1 second to recover the key.
  • 52:09 - 52:13
    A few things I would like to
    look at in the future,
  • 52:13 - 52:15
    is C-band.
  • 52:15 - 52:20
    The satellite has to talk back
    to the earth station
  • 52:20 - 52:23
    and this happens
    in a different frequency band
  • 52:23 - 52:25
    and this is what we would like to look at,
  • 52:25 - 52:27
    essentially, the feeder link
  • 52:27 - 52:30
    If any of you have a large dish,
    satellite dish,
  • 52:30 - 52:33
    and want to provide data for us
    please contact us,
  • 52:33 - 52:35
    we're looking for it.
  • 52:35 - 52:38
    We're also started
    looking into packet data
  • 52:38 - 52:41
    and some other stuff.
  • 52:41 - 52:45
    A quick note about
    other satellite phone systems,
  • 52:45 - 52:48
    please don't assume that
    just because Thuraya GMR-1
  • 52:48 - 52:51
    is broken that the others aren't
    unless you have
  • 52:51 - 52:53
    actual proof that they aren't
    because the only
  • 52:53 - 52:56
    reason we chose Thuraya is
    because the phone was
  • 52:56 - 52:58
    the cheapest on eBay.
  • 52:58 - 53:02
    So, given that all the satellite
    phone systems also…
  • 53:02 - 53:06
    you know, you can find commercial
    interceptors for them,
  • 53:06 - 53:11
    there is a big chance that they're
    not much more secure.
  • 53:11 - 53:14
    I like to thank a few people:
  • 53:14 - 53:19
    Dimitry Stolnikov,
    the main other author in osmo-gmr
  • 53:19 - 53:23
    doing a bunch of the RF capture stuff,
  • 53:23 - 53:28
    Dieter Spaar
    for reverse engineering of the AMBE codec
  • 53:28 - 53:30
    and all the help he provided there
  • 53:30 - 53:33
    and the RUB team, of course,
    for their support in
  • 53:33 - 53:35
    reverse engineering the cipher,
  • 53:35 - 53:40
    and the organizer for having me
  • 53:40 - 53:43
    and thank you for your attention!
  • 53:43 - 53:47
    Applause
  • 53:47 - 53:52
    - Thank you very much!
    We now have time for a few questions.
  • 53:52 - 53:57
    First, are there questions
    from the Internet?
  • 53:58 - 54:01
    Not yet. So,
    if you have a question, please
  • 54:01 - 54:03
    approach one of the microphones.
  • 54:03 - 54:06
    Yes, please, microphone number 3!
  • 54:06 - 54:13
    - Yes, I have a rather not technical
    but political question:
  • 54:13 - 54:18
    So, you said in the beginning
    there was like a short
  • 54:18 - 54:21
    way of communication that
    only the satellite transmits
  • 54:21 - 54:25
    the voice data, so there is
    no ground station involved?
  • 54:25 - 54:30
    - Yes. - Do you think that this
    might upset some of the people
  • 54:30 - 54:34
    who like to intercept calls
    because it makes intercepting
  • 54:34 - 54:36
    the voice data much more complicated
  • 54:36 - 54:39
    than just going to ground station and say
  • 54:39 - 54:41
    “I want to listen to this”?
  • 54:41 - 54:44
    - First, just because the satellite
    will send the data back
  • 54:44 - 54:46
    directly to their mobile phone
  • 54:46 - 54:49
    doesn't mean the satellite
    can't also provide a copy
  • 54:49 - 54:53
    of it to the earth station.
    It just doesn't go through.
  • 54:53 - 54:56
    But the satellite can obviously
    send a copy of it if they want to.
  • 54:56 - 55:00
    - And probably there will be a
    possibility for the ground station
  • 55:00 - 55:02
    to say
    “Don't do that direction connection”.
  • 55:02 - 55:06
    - On the other hand they can also
    just intercept it off the air
  • 55:06 - 55:08
    because it's so easy to listen to it.
  • 55:08 - 55:11
    Last year there was a talk with somebody
  • 55:11 - 55:14
    that looked at satellite pictures
  • 55:14 - 55:17
    and they saw that one of the
    spy satellites was relocated
  • 55:17 - 55:22
    like right next to the
    Thuraya-2 satellite.
  • 55:22 - 55:25
    You can look that up.
    It was a presentation last year.
  • 55:25 - 55:29
    It was pretty interesting
    to see that… yeah…
  • 55:29 - 55:31
    If you want to listen to it you can do it.
  • 55:31 - 55:33
    - Thank you!
  • 55:33 - 55:37
    - Next question from microphone number 2!
  • 55:39 - 55:46
    - I've two questions: Can you hear
    both sides of the conversation
  • 55:46 - 55:50
    or only one side of the conversation?
    The second question is:
  • 55:50 - 55:53
    How big of a dish do you need?
  • 55:53 - 55:55
    - If you want to receive L-band which is
  • 55:55 - 55:58
    just listen to the communication
    from the satellite to the phone
  • 55:58 - 56:02
    you don't need a dish,
    you need like a small antenna.
  • 56:02 - 56:06
    You can even do it with a modified
    GPS antenna and a piece of metal.
  • 56:06 - 56:10
    So, it's really easy and
    really cheap to do.
  • 56:10 - 56:13
    You can only hear the the
    part of the conversation
  • 56:13 - 56:16
    that comes from the satellite
    to the phone.
  • 56:16 - 56:20
    So, if by any chance that guy
    is speaking to another phone
  • 56:20 - 56:24
    you can hear both sides
    because you can intercept both
  • 56:24 - 56:27
    channels simultaneously.
  • 56:27 - 56:30
    But in general, you either
  • 56:30 - 56:34
    only hear one side or you can
    also be close enough to the
  • 56:34 - 56:39
    actual satellite phones
    to receive the uplink
  • 56:39 - 56:43
    that goes from the phone to the satellite.
  • 56:43 - 56:46
    Dimitry showed me like a
    commercial intercept system
  • 56:46 - 56:51
    where apparently they use like
    planes to intercept the uplink.
  • 56:51 - 56:54
    They just fly over the area of interest
  • 56:54 - 56:59
    so they can more easily receive
    the uplink from the phones.
  • 56:59 - 57:01
    - One final question. Please make it short
  • 57:01 - 57:03
    because we are running out of time.
  • 57:03 - 57:05
    - I'll try my best.
  • 57:05 - 57:08
    Launching your own satellite
    is certainly not an option
  • 57:08 - 57:11
    but do you see any practical applications
  • 57:11 - 57:16
    in terms of what you did with GSM?
  • 57:17 - 57:21
    - Not really. It would be
    interesting to try start a net
  • 57:21 - 57:24
    because there is nothing that says
    that you have to be on a satellite.
  • 57:24 - 57:28
    You could technically start a base station on earth
  • 57:28 - 57:32
    using that protocol and it would work.
  • 57:32 - 57:35
    And the phones have much more power
    than a GSM phone.
  • 57:35 - 57:38
    But other than that,
    you could build an IMSI catcher
  • 57:38 - 57:41
    for satellite phones if you wanted to
  • 57:41 - 57:44
    and you'd definitely beat the
    signal strength of the satellite
  • 57:44 - 57:47
    with your own setup.
  • 57:47 - 57:48
    - Thank you.
  • 57:48 - 57:54
    - Please give the speaker
    another round of applause!
  • 57:54 - 58:06
    subtitles created by c3subtitles.de
    Join, and help us!
Title:
Sylvain Munaut: osmo-gmr: What's up with sat-phones ?
Description:

http://media.ccc.de/browse/congress/2014/31c3_-_6267_-_en_-_saal_g_-_201412271600_-_osmo-gmr_what_s_up_with_sat-phones_-_sylvain_munaut.html

At 28C3 we introduced the very first steps of the osmo-gmr projects. During this talk, we will present the various advances that have been made in this project on various aspects (voice codec, crypto algorithm, ...)

Sylvain Munaut

more » « less
Video Language:
English
Duration:
58:06

English subtitles

Revisions