33c3 preroll music

Herald: As mentioned before, Internet of Things: it would be great if it would work, and one big part of the Internet of Things is the Internet part. So stuff has to talk, and cables are shit. So we use Wi-Fi and other wireless protocols. So our next speaker is going to take a very close look at the physical layer of LoRa, a low power wide area network, and he built some stuff to actually sniff what's happening and inject stuff. And apparently he offered his sacrifices to the gods, so we'll see something. Please give a warm round of applause to Matt Knight. applause

Matt Knight: Thank you for that warm introduction and thank you all for coming. I'm really excited to be here. So for the next hour or so, we're going to be talking about the LoRa PHY. And LoRa is a low power wide area network wireless technology that is designed for the Internet of Things. So first, a little bit of background on myself: I'm a software engineer and a security researcher with Bastille Networks. I have a bachelor's in engineering, electrical engineering and embedded systems, from Dartmouth. But really, my interests are in applied RF security research. So that means everything from reverse engineering wireless protocols to developing functional basebands in software and HDL, all the way up to software networking stacks. So all these things are interesting to me, but I'm really excited about the material we're going to talk about today. Before we get started: there aren't going to be any zero days or traditional security related exploits here, but we are going to take apart a cutting edge wireless protocol, and we'll talk about why that's important in a minute. But first, I'd just like to survey the room and get a sense for who's here so I can figure out where to spend more of my time. So if you'd be so kind, raise your hand if you've heard of software defined radio. That's a lot of hands, that's great. OK, how about raising your hand if you know what a Fourier transform is? Awesome. And how about a symbol in the context of wireless systems? OK, cool — this is going to go well, this is going to be fun. So why is this sort of network forensics interesting, why is it relevant, why is this important? The Cisco Internet Business Solutions Group has a figure that I really like that states that by 2020 there are going to be 50 billion devices connected to the Internet in some way. As we know, with the growth of mobile and the Internet of Things, fewer and fewer of those devices are connected with wires every year. And as we know, tools like Wireshark and monitor mode weren't always a thing, even for common interfaces like Wi-Fi and 802.11. Those tools that we have come to rely on every day exist because somebody thought to look below the layers they had and build them. And I believe that low level access to interfaces is essential for enabling comprehensive security on those interfaces. So we're going to begin by discussing LPWANs at a high level, and then we're going to do a little bit of background on some technical radio concepts, just so we can level out our domain knowledge and inform the rest of the conversation. Then I'm going to take you through my recent reverse engineering of the LoRa PHY, which was powered by software defined radio. And finally, I'm going to give you a demo of this tool called gr-lora that I've made.
It is an open source implementation of this PHY that will enable you to begin doing your own security research with it. So to begin: what is LoRa, what is this thing? It is a wireless IoT protocol — and IoT is in red because we're not marketers here, we're all engineers, we know that this is a dirty term, right? IoT is really code for connected embedded devices, and there are tons of common standards for embedded systems already: everything like 802.15.4 and all of its friends like ZigBee and 6LoWPAN, 802.11 Wi-Fi, and then also more common things like Bluetooth and Bluetooth Low Energy. And the list goes on. Right, we've got all these standards — what is wrong with them? Why don't we just use one of these existing ones? Well, all the ones we just mentioned require some degree of local provisioning. You need to connect your device to an SSID, or hook your ZigBee device up to a coordinator, in order to get it communicating. Some of them require gateways to talk out to the Internet. And in the case of 802.11, it's very power intensive, so you can't run a device for a long time on a battery. So what's ideal? What about cellular? Cellular works everywhere. It's easy to install, you don't have to worry about any hardware on premises; as long as you can talk to a tower, which could be miles away, you're good to go. Well, it's power intensive, and in the case of certain types of the standards, they're going away. And I'm talking about 2G, GSM and EDGE service: in the United States, AT&T, one of the largest carriers, is saying they're going to sunset their 2G network in about three days. In Australia this has already happened — Telstra, which is one of the largest telecom companies in Australia, sunset their GSM service earlier this month, and all the other major carriers are soon to follow. So 2G works everywhere, it's very battery conscious, and it's fairly cheap: this is exactly what the Internet of Things needs to power its communications. Now, say you're a developer and you want to move on to a new wireless standard that won't, you know, deprecate in three days. You can either go to 3G or a more modern cell stack, which comes with a more expensive radio and harsher power requirements, or you can wait for the 3GPP — which is the standards body that makes and maintains the cellular standards — to come out with their IoT focused standards that are currently in development. And the indications that I've gotten state that those won't be ready until the end of next year, really, at the earliest. So it's going to be the end of 2017 or the beginning of 2018 before we start to see these things in the wild, which means that until then there's a massive hole in the market. So if you want to develop an embedded system that requires this type of connectivity, you're going to have to look elsewhere. And that brings us to the topic of low power wide area networks. You can think of these networks as being just like cellular, but optimized for IoT and M2M communications. The architecture is almost exactly the same, in that you have a network of base stations or gateways, and then end nodes uplink directly to those base stations without any meshing or routing among themselves — it's just like a star network. Basically you have these nodes that connect directly to the base station, and they have a range on the order of miles. It's a very similar topology to cellular.
There are tons of standards popping up, more and more every day, but the two that have the most momentum are LoRa and Sigfox. There's been a ton of investment in both of these technologies. Actually, just last month Sigfox closed a hundred and fifty million euro Series E, a late stage funding round, and the Wall Street Journal wrote an article recently that stated they were investigating a U.S. IPO soon. Additionally, Senet and Actility, two of the biggest backers of LoRa, have raised a combined fifty-one million dollars in the last year or two. So between fifty-one million and a hundred and fifty million dollars, they're absolutely going for it — they're investing like crazy in these technologies. So when we say that these networks are optimized for the Internet of Things, we're really talking about two things. They're battery conscious: Sigfox advertises that they can get up to ten years of battery life on the amount of energy in a single AAA battery. And they're long range: if you turn all the knobs on LoRa just right and have a perfect noiseless channel, they advertise that you can get thirteen point six miles — so these are very long range devices. And if you compare that with some of the standards we talked about earlier, that's pretty competitive. So how do they do that? How does that work? Well, they've designed the entire system around the fact that they're willing to accept compromises in the protocol and the functionality of these devices. When I talk about compromises, I'm talking about aggressive duty cycling, both transmitting and listening; very sparse datagrams, so tiny packet sizes; and they're highly limited, meaning they can't send that many packets that often. For example, Sigfox limits — this is built into the PHY — limits devices to 140 twelve-byte datagrams per day. That's like nothing; it's tiny. And then Weightless-N, another LPWAN standard, is uplink only, so it can only send messages up to a gateway but can't receive any downlink. So, for example, if you had a device deployed, you could never deliver firmware to it later unless you rolled a truck to it or climbed up the telephone pole where it's mounted. And finally, LoRaWAN Class A devices can only receive downlink for a brief window after they uplink. So if you're an application operator and you want to send a message to a device you have in the field, you have to wait for that device to call home before you get your brief window to tell it what you want. So these systems are built around compromises, but that's what enables them to get some pretty incredible performance. All right, let's get into the details with LoRa. So LoRa is an LPWAN. It's developed by Semtech, which is a French semiconductor company. The PHY was patented in June 2014, and the LoRaWAN MAC and network stack was published in January of 2015. So this entire standard is less than two and a half years old — it's brand new — and it's supported by an industry trade group called the LoRa Alliance, which has tripled in size every year since its founding. So it's growing quite a bit. Before we move on, I just want to clear up some nomenclature that will help us focus in on what this talk is going to center on, and that is to disambiguate
LoRa and LoRaWAN. LoRa refers strictly to the PHY, the physical layer of the standard. LoRaWAN defines a MAC and networking — some upper layer stacks that ride on top of LoRa. The LoRaWAN standard, the upper layer, has been published and that's public, but the PHY itself is totally closed. So the LoRaWAN upper layer spec gives some information about its topology. It's kind of interesting — it suggests that they were really thinking about security when they designed it. There are kind of four stages in the network. All the way out in the field, on your sensor, you have the node, and that connects to a gateway over a wireless link — that's the LoRa link. And then once you get into the gateway, everything from there up is all on IP networks, just standard commercial IP networks. And then they have roaming that works across different networks, so you'll be able to take your device and move to different areas of coverage and have it all play nicely. And then you can hook your application server up to that as well, to receive packets to and from the network servers — it's all over IP. And they actually went as far as to define two different mechanisms for encrypting it. There are two different keys: you have the network key, which covers from the node up to the network server, and then you have the application key, which is actually fully end to end — it goes from the end device all the way up to the application server. So if you design that right, the network should never see your traffic unencrypted. And they also provide a mechanism for having unique keys per device. It's built into the standard, but it's not required, so it's still up to the implementer to do that and get it right. So there are some good thoughts that went into security with LoRaWAN. However, that's not what we're talking about today. That's all we're going to say about LoRaWAN — we're just going to tell you it exists and that it rides above LoRa, but we're not going to go into any more detail than that. So from here on out it's all LoRa, all the time; we're just talking about the PHY here. So let's get into what makes that really interesting. One of the big defining features of LoRa and Sigfox, the two biggest LPWANs, is that they're designed to use what's called ISM spectrum — that's what it's called in the United States; it stands for industrial, scientific and medical. And what's cool about these bands is that they're what are called unlicensed, which means that you don't need a specific license from the FCC, or your telecom regulatory authority, to operate on them. So if you go and buy any Wi-Fi router on Amazon, you take it home, you plug it in, you don't need to then go and apply for a specific license to be able to communicate on it, because it was built to a certain standard: it is compliant with those unlicensed band rules and therefore can just work. So these devices use that same spectrum, but to much greater effect, at much longer ranges, in a much different use case. So that's quite novel. And some other things that use these bands are, you know, Wi-Fi, Bluetooth, cordless phones, baby monitors, things like that. So you can think of this as occupying the same space in the spectrum as those. Now, why is this noteworthy? Well, contrast it with the cellular model, where cellular technologies use protected spectrum, where you have to have specific rights to transmit on it in order to legally use it.
And regulatory authorities sell that spectrum for fortunes — billions of dollars is what spectrum sells for in the US, and I'm sure it's the same over here. And I just want to call your attention to how expensive this is. On the left here we have a picture — it's an excerpt from a document that I found related to the FCC's TV white space reverse auction. They're trying to repurpose a lot of spectrum that used to be used for digital TV; they're selling it off. And if you want to come in and buy some really prime low UHF spectrum to use for whatever purposes you have — mind you, this is just one TV station in the New York area — you can get out your checkbook, write a nine hundred million dollar check, and take over CBS TV in New York. So getting into the cellular game is crazy expensive, it costs a fortune. But there are a lot of us in here; maybe we can pass the hat and buy some spectrum at the end of this. So as a result of this unlicensed nature, there are a number of different models of commercialization that are starting to emerge. We have the traditional telecom model, which we're seeing through companies like Senet, which is a company that deploys home heating oil tank monitoring solutions in the United States. They're also opening the network up for IoT applications to ride on top of that traffic as well. And you'd operate with them just like you would operate with Verizon or AT&T or Deutsche Telekom or whoever you work with here. Also interesting: I believe it's KPN that has rolled out a commercial LoRaWAN network throughout the entirety of the Netherlands. So the country is entirely covered with LoRa. So that's the commercial side. In the middle, we also have crowdsourced networks. The one that I like to talk about is this group called The Things Network, where basically they have defined, in the cloud, the network server architecture for operating a worldwide LoRaWAN network. So if you want to provide LoRaWAN service on the Things Network in your area, you can get your hands on a LoRa gateway, point it at their network servers, and basically become a base station in their network from your living room, which is kind of cool. So it can spread and grow organically based on the needs of people like me and you who want this sort of service. Then finally, all the way at the independent, amateur end, we have people like Travis Goodspeed and some of his friends who are working on a technology called LoRaHam. And that's leveraging the fact that you can actually get LoRa radios that work around 433 megahertz, which is in, I think, the 70 centimeter ham band in the United States. So you can actually put a reasonable amount of power behind LoRa to do text-based communications in the clear. So they're developing a LoRa-based mesh networking system for doing basic ASCII packet radio and communicating. It's not public yet, but he's blessed me to come and tell you that he's working on this and it should be out soon. So there are all sorts of different ways to use these technologies. This is a very different paradigm from what we're used to, and it's opening up lots of different opportunities for how this technology might be used and grow. OK, so that wraps up our background on LoRa.
We're about to get into some really technical stuff, but before we do, I want to go through a very short crash course on some basic radio fundamentals to try to even the playing field so that we can all understand this. I call it the obscenely short radio crash course, but, with apologies to any real telecom whizzes in the room, I think this title is probably more appropriate. We're going to blow through this material, and I'm just going to try to pick out a few points that are really essential to understanding the rest of this talk. I'll tell you what's important — just try to grab those concepts and we'll reiterate them later as we go through. So, again, we're going to be talking about the physical layer. And if you think about the OSI model that we've all seen, the physical layer refers to how your bits, your data, get mapped into the physical phenomena that represent them in reality. When you're dealing with wireless systems, the mapping maps the bits into patterns of energy in an RF medium. RF stands for radio frequency, and it's basically electromagnetic waves, or energy, that is just everywhere. And you can manipulate RF by using a device called a radio. Radios can either be hardware-defined, where the RF mechanics and the protocol are baked into the silicon and are inflexible, or you can use a software defined radio, where you have some very general, flexible silicon up front that basically just grabs raw information and feeds it to some sort of processor — which can either be a traditional CPU or an FPGA — to implement some of the more radio specific things. And SDR has come a long way in the most recent few years; it's now incredibly powerful. So we're going to be talking about both hardware-defined radios and software-defined radios throughout this talk. So if you put together a radio coherently, you can start to develop it into a PHY. And a PHY has several components, but one of the main ones is this notion of the modulation. The modulation is the algorithm that defines how your digital values, your bits, are mapped into RF energy. And there are a few parameters that we can tweak to do that: those are amplitude, frequency and phase, and we can also use some combination of them together. Modulators can modulate either analog or digital information, but we're going to be talking about modulating digital information today. And an essential concept with that is this notion of a symbol — this is something that's very important to remember. A symbol represents a discrete RF energy state that represents some quantity of information. So it's discretely sampled; just think of it as being like a state in your RF medium that means something. And we'll illustrate this in just a moment. So here we have two pictures of two different modulations, and I just want to put these up here to help you maybe get a grasp on what a symbol looks like. On top we have frequency shift keying, where you can see your signal is alternating between two frequencies: when it's on the left, it's dwelling on one frequency; when it's on the right, it's dwelling on another frequency. Which symbol is present is based on basically what frequency that signal is on at a discretely sampled moment in time. So you could think of this as being, you know, a zero when the signal is dwelling on the first frequency, the one on the left, and a one
when the signal is dwelling on the right frequency, frequency two. And you can see the analog with the bottom modulation, on-off keying, where the signal being present represents a one and the signal being off represents a zero. So hopefully that helps you get a grasp of what it is that we're talking about. There are, of course, more complicated IoT PHYs. We have spread spectrum, where data can be basically chipped at a higher rate — it'll occupy more spectrum, but it makes it more resilient to noise — and we have technologies like 802.15.4 that use a spread spectrum mechanism. So, we talked a bit about radios just a moment ago. We're going to use two different kinds of radios when going through this talk. First, we have a hardware-defined radio, which is a Microchip LoRa RN2903 module. This is basically a dev board that has a hardware-defined LoRa radio built onto it, and this is going to be the transmitter we're going to be targeting. And then the receiver is the software defined radio right here: this is an Ettus USRP B210. It's just a commodity software defined radio board, and basically what this thing does is it gets raw RF information from the air and serves it to my computer, so I can start to work with it with commodity tools like Python, GNU Radio, things like that, to start to process it. One last thing to cover is the fast Fourier transform. The Fourier transform basically takes a signal and decomposes it into all of the smaller signals, the subcarriers, that compose it — any periodic signal can be modeled as a sum of harmonic sine waves. So basically the FFT takes any signal and unravels it into its components. And why we care about this is that it's basically a very easy way of analyzing and visualizing signals in the frequency domain. So when we take a bunch of FFTs and put them together, we get this picture called a spectrogram, where — in the ones we're going to be looking at — time is on the Y axis, frequency is on the X axis, and power is on the Z axis. So the intensity of the color is how powerful that component is at that instant in time. So here you can start to visualize all the different signals that are present. OK, raise your hand if that lost you — I see a few heads. Hopefully this is all that we're going to need; I'm going to reiterate some of these concepts as we go through, so I really hope that doesn't have you running for the door. It's going to be very visual as we go through it, and hopefully the graphics will help keep this all grounded. So let's get into the meat of how this LoRa PHY works. LoRa uses a really neat proprietary PHY that's built on a modulation called chirp spread spectrum, CSS for short. Now, what is a chirp? A chirp is a signal whose frequency continuously increases or decreases — you can think of it as being like a swept tone. And if we visualize it using a spectrogram, as before, it looks kind of like this. In this case, we have a finite amount of bandwidth, and the frequency either increases or decreases — you can have up chirps or down chirps — until it reaches the end of its band, and then it wraps around back to the bottom, back to the beginning, and continues. So here you can see that the chirp rate, that is the first derivative of frequency, is constant: the frequency is always increasing or decreasing at the same rate. And when it hits the end of the band, it just wraps and keeps going.
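To make the spectrogram and chirp ideas concrete, here's a minimal sketch — not from the talk, and using made-up toy parameters rather than LoRa's real ones — that synthesizes a few wrapping baseband up-chirps in Python and views them with a spectrogram:

```python
# Minimal illustration (not from the talk): synthesize wrapping baseband
# up-chirps and view them with a spectrogram. The sample rate, bandwidth,
# and symbol time are arbitrary toy values, not LoRa's real parameters.
import numpy as np
import matplotlib.pyplot as plt

fs = 125e3      # sample rate; chosen equal to the chirp bandwidth here
bw = 125e3      # the chirp sweeps this much spectrum
sym_t = 1e-3    # duration of one chirp
t = np.arange(int(fs * sym_t * 8)) / fs        # eight repeated chirps

# Instantaneous frequency ramps from -bw/2 to +bw/2 and wraps each chirp.
f_inst = (bw / sym_t) * (t % sym_t) - bw / 2
phase = 2 * np.pi * np.cumsum(f_inst) / fs     # integrate frequency to phase
iq = np.exp(1j * phase)                        # complex baseband signal

# Time vs. frequency, with color intensity showing power, as described above.
plt.specgram(iq, NFFT=256, Fs=fs)
plt.xlabel("time (s)")
plt.ylabel("frequency (Hz)")
plt.show()
```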
So why use something like CSS? It has properties that make it really resilient to noise and very performant at low power. So with IoT focused radios needing very long battery life, these are properties that lend directly to that sort of efficiency. It's also really resilient to multipath and Doppler, which is great for urban and mobile uses. So this is an interesting set of features here. Where else do we see chirps? Radar — I just heard it, thank you. Yeah, radar is a really common usage, and you'll see military and marine radars sometimes refer to chirps as wideband or pulse compression if they're using chirping in the radar scheme. They're also used for scientific over-the-horizon radars as well, and there's an open source project called GNU Chirp Sounder that has features for visualizing these over-the-horizon scientific radars. And also, in a past life, I worked on a scientific radar called SuperDARN, which is a similar over-the-horizon radar for visualizing ionospheric activity. Cool. So that's a little bit of background on the technology here. So this is kind of my journey into starting to work with LoRa. In December 2015 I joined this company, Bastille, where I currently am, and on the research team we have these weekly meetings where we get together and we look at either new RF techniques or protocols, things that are interesting, and we basically just have a deep brainstorm on how they work and what's interesting. And in the first meeting that I participated in — it was the first week that I joined — they were talking about these LPWAN technologies, and they sounded pretty cool. So we broke for Christmas, and I went back to New York, where I'm from, and, you know, brought my radio and sort of poked around to see what I could find. My colleagues looked in San Francisco and Atlanta, and I also looked in Boston, where I was too. And we didn't see LoRa anywhere in December. Fortunately, a few weeks later I was at a meetup and I encountered this company, Senet — I was living in Cambridge, Massachusetts, at the time — and they were talking about their home heating oil monitoring network. It sounded pretty cool, so I looked them up later and was watching one of their marketing videos, and there was like a two or three second bit where you could see one of their technicians operating a computer. Right. And they put up this picture, and this looks just like a coverage map. Right. So, you know, this could be fake data or it could be live. And I took a bit of a closer look and I realized where that is: that's Portsmouth, New Hampshire. That's like an hour away from Boston. So there's really only one thing to do. So I hop in my car, I drive up to the New Hampshire–Maine border, and there's, you know, me behind the wheel of my Saab with the USRP on the dash. And after about ten minutes in the Marriott parking lot across the street from their headquarters, we have our first sighting of LoRa in the wild. There it is — it's the first signal I recorded. So let's take a closer look at what we have here. If we look at the top third of the picture, we have a series of repeated up chirps. You can see the signal is just continuously increasing until it hits the edge of the band, and then it wraps and continues.
And knowing what we know about digital communication systems, most of them have some notion of a preamble or training sequence to tell a receiver: hey, heads up, you're about to get a packet. So that's probably what that is. Following that, you can see the chirp direction changes right in the middle and you have two and a quarter down chirps. And this looks like a start of frame delimiter, or a synchronization element. So this tells the receiver: hey, heads up, the preamble's over, you're about to get the data, you're about to get the payload here. And finally, you can see the chirp direction changes again, back to up chirps, but this time the chirps are kind of choppy. You see they jump around throughout the band, you know, kind of arbitrarily. It's not arbitrary, though — that's actually the data being encoded into the PHY. So here we can see that the chirp rate, that is the first derivative of the frequency, the rate at which the frequency changes, remains constant. However, the instantaneous frequency may change within the band. So you may have these jumps, but remember that the rate at which it's changing is always constant; you just have those discontinuities, and those instantaneous frequency changes represent data being modulated onto the chirps. You can kind of think of this as being like a frequency modulated chirp: with an FM signal, you have a static carrier, a carrier at a fixed frequency, that you're modulating to produce the modulated signal. Here, rather than having a fixed frequency that you're modulating, you're modulating this continuous chirp. Cool. So let's get our hands dirty, let's figure out how this thing works and start to pull some data out of it. Before we dive into demodulating it, let's take a look at what we know through some open source intelligence. Using open source intelligence is a great way to really shortcut the reverse engineering process, because otherwise you can wind up doing a lot more work than you have to. So there are a few things that are really useful, and we'll talk about these as we go through this material. The first thing I found was the Semtech European patent application. It was for the EU market, but it basically defined a modulation that looked a lot like what LoRa could be. That's the number if you want to look it up later. It had some pretty good information on their PHY. Secondly, we have the LoRaWAN spec. And again, that's the layer two and up spec — that's open, not the PHY — but it still has some references and defines some terms that are likely going to be analogous to the PHY, so it's still pretty useful. And finally, we have two application notes from Semtech that were pretty juicy. The first one, the .18 one, contained a number of reference algorithms for implementing a whitening sequence, which is like a scrambler — we'll talk about that momentarily. And the .22 one had just a general overview of the PHY and defined some terms. Also, there was some prior art online. There was a partial implementation in rtl-sdrangelove that didn't really seem to be maintained — it seemed pretty neglected and I never really got it to do anything at all, but it was still good to look at and had some really good hints in there.
And then there were also some very high level observations on the PHY in a wiki page about decoding LoRa. It was mostly just looking at the spectrum and seeing that it's a chirp modulation, plus example recordings and things like that. So from this documentation we can start to pull out some definitions. We have the bandwidth, which is how much spectrum the chirp can occupy; the spreading factor, which is the number of bits encoded per symbol — and remember, the symbol is just an RF state, so this is the number of bits in each RF state within the modulation; and then finally we have this thing called the chirp rate, which we've kind of hinted at — it's the first derivative of the chirp frequency, the rate at which that chirp signal is constantly changing. And we can pull some numbers out of this documentation to define those. We actually have some common constants for the first two, and then we find a formula in one of those documents that states the chirp rate is a function of those first two. And since there's a finite number of values there, we can start to iterate, just try all the different values, and find one that works. So in this case, what is the symbol? We've talked about how this modulation is basically frequency modulated chirps. Right. So what we're going to try to do with this demodulator is quantify exactly where the chirp jumps to whenever we have one of those discontinuities. So let's start working through it here. There are really three steps we're going to achieve. One, we're going to identify the preamble, which is the beginning of the frame. Two, we're going to find the start of the PHY data unit by synchronizing against the sync word, which is those down chirps that are there. And then finally, step three, we're going to figure out how to extract the data from these instantaneous frequency transitions, and to do that we need to quantify them. Now, there's a technique that I found pretty early on that was enormously helpful for doing this, and that is to transform the signal by dechirping it. I'll show you what the result is in just a moment, but first we're going to have to do some math. And don't flee because it's math — it's not really scary, it's actually pretty easy. So there's a basic property of complex signals that states that if you multiply two signals together, the resulting signal has the frequencies of each of the components added together. And from that, if we multiply a signal with one frequency against a signal that has the negative value of its frequency, the result is zero hertz: we get a DC, a constant signal. And we're working at baseband here, which means the center of the band is zero hertz, so we can see negative frequencies and things like that. So if you multiply an up and a down chirp together, what do you get? You get a constant frequency. Now, why do I say constant frequency rather than DC? If the chirps are out of phase with one another, there might be an offset from zero hertz there — it might not be perfectly aligned with zero hertz, we might expect to get some offset there. So what happens if you multiply a received chirp signal like this separately against an up chirp and a down chirp — two different operations producing two different products? What do you think is going to happen?
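Spelled out in my own notation — a sketch of the identity he's describing, ignoring the wrap-around at the band edge — the multiplication property is:

$$e^{j2\pi f_1 t}\cdot e^{j2\pi f_2 t} = e^{j2\pi (f_1+f_2)t}$$

A modulated up-chirp over one symbol can be written as $x(t) = e^{j2\pi f_{\mathrm{sym}} t}\, e^{j\pi k t^2}$, where $k$ is the chirp rate and $f_{\mathrm{sym}}$ is the frequency offset carrying the symbol. Multiplying by a locally generated down-chirp $e^{-j\pi k t^2}$ cancels the quadratic phase term:

$$x(t)\, e^{-j\pi k t^2} = e^{j2\pi f_{\mathrm{sym}} t}$$

which is a constant tone whose frequency encodes the symbol.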
Well, if you do that, you get these pretty pictures right here. So here you can see those really kind of tricky diagonal chirp signals, which cut across all of your spectrum and are hard to measure, are translated into these nice flat signals that are aligned in time. And that looks like something we can really start to work with and do something with. So we need to quantify those. So, again, remember symbols — we're going to keep coming back to this. A symbol is an RF state that represents some number of bits, and LoRa has this value called the spreading factor, which we found in some of the documentation, that defines the number of bits encoded per symbol. And from the picture we saw a little bit earlier, the common values are seven through twelve — or six through twelve; you see both in different markets. So from that, how many possible symbols can be expressed? Well, each bit can have two states, zero or one, and there are spreading-factor number of bits, so the number of symbols is two to the spreading factor. So how can we start to quantify these symbols and start to pull them out of the PHY? The steps that I found that were the trick to this were: channelize and resample the signal to the bandwidth; dechirp the signal with a locally generated chirp like we just talked about; then take a fast Fourier transform of that signal, where the number of bins of the FFT that we compute is equal to the number of possible symbols — we'll illustrate this momentarily. And then, if we do that correctly, the most powerful component in that Fourier transform — that is, the strongest frequency component we get back from that operation — is the symbol that we're looking for. By dechirping it, we get it into a form where we really expect there to be only one strong component per FFT, whereas if we didn't dechirp, when we took the FFT of a chirp's worth of samples we would see the energy kind of spread throughout all the different bins. But by dechirping it correctly, all that energy gets pushed into one bin and we get a single, clear value out of it. So if we do that, we get a picture that looks like this, and here the Z axis, again, is the intensity, the power present. We expect the strongest bin to be the symbol that we're looking for, and here it's aligned in time with the raw chirps on the left there. So here are the steps again — we mentioned this earlier. Let's look for the preamble first. What's a stupid simple algorithm for finding this? Let's do an FFT and look for the most powerful component being in the same bin for some number of consecutive FFTs. Easy. Finding the SFD is the same thing, but this time we're going to do it on the opposite dechirped product: when we dechirp, we get back two different streams — one of the dechirped up chirps and one of the dechirped down chirps — so we can look at the opposite stream and do the same algorithm, looking for the SFD there. Important caveat: accurately synchronizing on the SFD is essential for getting good data out of this modulation, because if you have a bad sync you can wind up having the samples that comprise your symbols spread between multiple adjacent FFTs, and if that happens you get incorrect data.
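As a rough illustration of the dechirp-then-FFT step just described — this is a simplified sketch, not the gr-lora code, and it assumes you already have exactly one synchronized symbol of complex baseband samples at a sample rate equal to the bandwidth:

```python
# Simplified sketch of dechirp-then-FFT symbol extraction (illustration only,
# not the gr-lora implementation). Assumes `samples` holds exactly one
# already-synchronized symbol of complex baseband at sample rate == bandwidth.
import numpy as np

def demod_symbol(samples, sf):
    n = 2 ** sf                                # possible symbols == FFT bins
    assert len(samples) == n                   # one symbol is 2**sf samples here
    k = np.arange(n)
    # Locally generated reference down-chirp (conjugate of the base up-chirp).
    down_chirp = np.exp(-1j * np.pi * k * k / n)
    dechirped = samples * down_chirp           # chirp * conjugate -> constant tone
    spectrum = np.fft.fft(dechirped, n)        # one bin per candidate symbol
    return int(np.argmax(np.abs(spectrum)))    # strongest bin ~ raw symbol value

# The raw symbols would then be normalized against the preamble (which
# dechirps to symbol zero), e.g. (sym - preamble_sym) % 2**sf, as described
# a little further on.
```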
Now let's illustrate what that looks like. If you look at rows thirty-eight and thirty-nine, you can see that visually it's almost impossible to tell which of those two readings represents the symbol — there are two different values that are really powerful. That's the result of basically half of the samples from chirp N and half of the samples from chirp N plus one winding up in the same FFT. So when we do that, we get those two components in there, and it's really ugly and hard to work with. We can solve this by using a technique called overlapping FFTs when looking for our SFD synchronization. Basically, what that means is we're going to process each sample multiple times, with the effect of getting better resolution in time out of our resulting FFTs. It's more computationally intensive, but it gets us much better fidelity here. So if we do that, this is what the result looks like. It's a little bit hard to see right now — I'll get you a better picture in a moment — but basically it's much less ambiguous in terms of which symbol is present. So if we use those overlapping FFTs, we can synchronize on that SFD. And then, once we know exactly where the first symbol of the data unit is in our buffer, we can go back to using non-overlapping FFTs, which are more computationally efficient, and get a nice read. On the right here, you can see that, again, if we look at lines thirty-eight and thirty-nine, that ambiguity is gone: you can see exactly which is the most intense bin and therefore which symbol is present. And here's the whole frame synchronized. So we've got the collisions on the left, which doesn't look that great; on the right it's much clearer. Cool. So again: we synchronize with the more computationally intensive overlapping FFTs, and then we get our data out. Now, one last thing we have to do to wrap up the demodulation. Remember, we were talking about the dechirp: if our chirps aren't perfectly aligned, then the resulting dechirped signal might not necessarily be off of the same reference. And of course, we don't know what chirp was used to generate the signal on the transmitter, so we have to find some way of normalizing this data to account for that discrepancy. We can do that by referencing the preamble. It just so happens that the preamble, when you dechirp it, always represents symbol value zero, so you can basically just do a modulo operation on your received symbols to rotate them back, so all the symbols are referenced off of the preamble, and you're good to go. And that's it, right? Not even close — we're just getting started, people. Why is that? Because the data here is encoded. What is encoding? Basically, encoding is a transformation that is applied to the data before it's transmitted. Why would you do something like that? Because encoding increases over-the-air resiliency. Why is this necessary? Remember that we're dealing with unlicensed spectrum. This is what the nine hundred megahertz band, which is what LoRa uses in the United States, looks like — look at all that stuff. It's not LoRa, right? That stuff is there to ruin your day; it's there to create all sorts of interference and make your receiver not work the way you expect. So RF is a really brutal environment. There's all sorts of interference.
And basically the encoding is a way of treating your data so that even if you have a non-ideal reception, you can still get the data out of the frame. So what do we have here? Remember that LoRa is closed source: we have some material that's available through data sheets, but we really don't know for sure, definitively, what's in this PHY. So, again, we're going to go back to open source intelligence to figure out what we know, and then try to narrow in on how we're going to iterate through this and figure out how it works. So from the patent we have a number of very good clues. First of all, it refers to a stage called gray indexing, which, as it's defined there, should add error tolerance in the event that you read a symbol as being off by one — off by one bit — if you read a symbol incorrectly. Then, secondly, you have data whitening, which induces randomness into the frame — we'll talk about that momentarily. Then interleaving, which scrambles the bits within the frame. Then you have forward error correction, which adds correcting parity bits — you can think of it as being parity bits on steroids: rather than just telling you that an error occurred, it can actually help you correct the error without needing a retransmit. So we have four different things that comprise the encoding there in the patent. Right, so that's awesome — it's easy, right? Why is that? Because documentation lies to us, and even the clearest clues can lead us into dead ends. So let me show you how. The gray indexing we read to represent Gray coding, which is just a basic binary transformation that you can use to treat data. For data whitening, we actually have, defined in one of the application notes, reference designs for the pseudorandom number generators that you use for the whitening — it's like C code that you can copy and paste, so this should be rock solid. Step three, we have an actual algorithm for the interleaver that is defined in the patent — I'll show you what it is momentarily. And then finally, step four suggests that a Hamming code is used, which is just a standard forward error correction mechanism. So the first thing to focus on figuring out here is the data whitening, and that's a critical step, because the way the whitening works is you XOR your message against a pseudorandom string, and unless you know what that random string is, you're not going to be able to make any sense of what follows. So figuring out that random string is essential to even making sense of the rest. So, again, with whitening, you take your buffer that's going out to the radio and you XOR it against a precomputed pseudorandom string that is known to both the transmitter and the receiver. Then, when the receiver gets the frame, it XORs the received buffer against the same sequence that the transmitter used, and you get back the original data — because, if you remember, XOR is its own inverse, so it nicely undoes itself. Now, why would we bother with whitening? That's because having random data is really good for receivers, similar to Manchester encoding, where basically, by encoding the data such that you don't have long runs of consecutive symbols of the same value, you get this nice random-looking data stream.
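As a tiny illustration of what whitening amounts to — the sequence bytes below are made-up placeholders, not LoRa's real sequence, which is exactly what had to be recovered:

```python
# Tiny illustration of whitening: XOR against a sequence both ends know.
# The sequence bytes here are made-up placeholders, NOT LoRa's real sequence.
def whiten(data: bytes, sequence: bytes) -> bytes:
    return bytes(d ^ s for d, s in zip(data, sequence))

sequence = bytes([0xFF, 0x87, 0xB8, 0x59])   # placeholder values only
payload = bytes([0xDE, 0xAD, 0xBE, 0xEF])

on_air = whiten(payload, sequence)       # transmitter applies the sequence
recovered = whiten(on_air, sequence)     # receiver applies the same XOR
assert recovered == payload              # XOR is its own inverse
```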
What whitening does is create lots of edges for your receiver to do clock recovery against, so you get better reception of longer messages, or when your clocks are bad. Manchester, of course, comes with the penalty of a reduced bit rate — it actually cuts the effective bit rate that you can use in half — whereas whitening does not. The caveat is that you have to know what the string is in order for it to work. So let's find the whitening sequence. We've got these algorithms in the application note, we've got some examples in rtl-sdrangelove — none of them worked, so we had to figure this out empirically. How can we do that when there's interleaving and forward error correction in the pipeline too? You know, we can send something that might put the whitening in a certain state that we could leverage, but we still have these unknown transforms that follow it. How are we going to be able to figure out the whitening when those operations are in the loop, too? Well, we need to bound the problem and make some assumptions so that we can start to iterate through this black box problem. So we're going to assume that the forward error correction is what the documentation tells us it is — Hamming(8,4) — and we're also going to make another assumption and set the spreading factor equal to eight bits per symbol. Basically, if you do that, it makes it such that we'll have exactly one Hamming(8,4) codeword per symbol, because if we set the number of total bits in our Hamming error correcting code to eight, and there are eight bits per symbol, it fits very nicely and should work out well. Now, there's another very useful property of the Hamming(8,4) error correcting code scheme that we're also going to exploit, and that's that Hamming(8,4) codewords contain four data bits and four parity bits each. And in 14 of those 16 states — again, remember, two possible states per bit, to the power of four data bits per codeword — in 14 of those 16 codeword possibilities, there are four ones and four zeroes each. However, for the data nibble zero — that's four zeros — the codeword is eight zeros. So it's totally non-additive: if we send our error correcting scheme a string of zeros to apply itself to, we get back twice as many zeros. So we can leverage that to try to cancel out that forward error correcting stage. So let's go ahead and transmit a string of zeros. Again, if it's Hamming(8,4) as we assume, we expect that forward error correction stage to cancel out, right? What about the interleaver? Let's take a look at the algorithm that's suggested in the patent — there it is. The key takeaway from this is that, if it's implemented in a way that's similar to this, it should be totally non-additive: it should just move bits around but not add any bits. So if it is in fact non-additive and all we pass through is a bunch of zeros — what happens when you shuffle around a bunch of zeros? You get the same thing out. So that falls away, too. So we're left with two stages: we have our symbol gray indexing stage and our data whitening stage. Whitening is what we're solving for — that's our variable. And gray indexing: the quote-unquote 'indexing' is a bit of an ambiguous term, but it likely refers to some variant of Gray coding, which we mentioned earlier.
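To make the codeword properties being exploited here concrete, here is a textbook extended Hamming(8,4) encoder sketch — the bit and parity ordering below is my own arbitrary choice for illustration and is not claimed to match LoRa's actual layout:

```python
# Textbook extended Hamming(8,4) encoder. The bit/parity ordering here is an
# arbitrary illustrative choice and is NOT claimed to match LoRa's layout.
def hamming84(nibble: int) -> int:
    d = [(nibble >> i) & 1 for i in range(4)]     # data bits d0..d3
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]   # Hamming(7,4) codeword
    overall = 0
    for b in bits:
        overall ^= b                              # extra overall parity bit
    bits.append(overall)
    return sum(b << i for i, b in enumerate(bits))

assert hamming84(0x0) == 0x00   # all-zeros nibble -> all-zeros codeword
assert hamming84(0xF) == 0xFF   # all-ones nibble  -> all-ones codeword
# The other 14 codewords each contain four ones and four zeros, e.g.:
assert bin(hamming84(0x5)).count("1") == 4
```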
And as for that first gray indexing stage: whether it's Gray coding, or Gray decoding, or nothing at all — something they just didn't implement — that leaves only three permutations here. So we've just reduced all the ambiguity of figuring out what the whitening sequence is to really just figuring out which of those three operations this first gray indexing stage is. So we try all three. That's only three things to attempt in order to derive the whitening sequence from the transmitter, because, again, if we send through a string of zeros, what does the whitening do? It XORs the zeros against the pseudorandom string — and what is anything XOR zero? It's the input. So we can do this and get the transmitter to tell us what its whitening sequence is, so we can implement the receiver, read that out, plug it back in, and then start to solve for the rest. Cool. The next stage is the interleaver. Again, we had that formula from the patent; surprise, surprise — implemented, it was no good. So let's figure out how this works now. We're going to move very quickly through this, because this was the hardest part of all of it, and I'm going to show you the process without taking up all the time of staring at a bunch of graph paper and trying things that went into this. But again, just like with the whitening sequence, we're going to exploit properties of the Hamming FEC to reveal patterns in the interleaver. So, again looking at our Hamming(8,4) codewords that we know and love and that are very useful: this time we're going to use the codeword for four ones, the codeword for hex F, and in that case the codeword is eight ones. So we construct a bunch of packets — we take four bytes, which is eight symbols at SF8 — and we walk the position of those ones through our frame here, and we can start to look for patterns. Who sees it? I'll save you the trouble. Look at the bottom row, second from the right, and you'll see the pattern. Basically, it's a diagonal interleaver, but the first two — the two most significant bits — are flipped. So we can take this and start to map those diagonal positions into positions within an interleaving matrix. So if we do that, we walk through all the different states and map those positions out with data that we know, and we get this nice table. Now let's put this table next to the data that we're looking for. So here we've decomposed the Hamming codewords for the data we sent in — which is, of course, our beloved DEADBEEF — in the middle column. On the left, we have the data values, the four data bits that we're looking for, and then the right column there holds the parity bits that we're looking for. Again, I'm going to make this easy for you: if you stare at this for long enough, you become compelled to reverse the order. And then, if you continue staring at it, you start to see some patterns. That looks like our data, right? So if we go a step further, we can start to map some of these Hamming error correcting fields into this matrix here. So here we see the four data bits are the rightmost bits, and then we can see that parity bits one and two correlate very nicely. And if you go a step further, we can see that
the last two parity bits fit in very nicely as well, although they're flipped: you'll see that parity bit four is actually more significant than parity bit three. So we're almost there. All we have left to do is apply it, and we're done. And that's the whole thing — that's the demodulation and decoding. applause Thank you. So let's talk briefly about these red herrings and try to wrap this up — I want to do a demo before our Q&A. So we had these four different encoding stages here, right? We had great documentation for all of them. But empirically, after implementing them, we were able to establish that, well, three of the four just weren't the case. One of them was actually what it said it was. So, yeah. Anyway, how were we able to work through this? I think it's important to reflect and try to get some takeaways from this — hopefully this is useful as you approach your own reverse engineering challenges. Basically, what was essential here was being able to bound the problem and hold certain things constant so that we could solve for unknowns. And if you remember, we kind of did this in two stages. We were able to cancel out the interleaving and the forward error correction and hold those static in order to figure out the whitening sequence and the gray indexing, kind of all in one go. And then, once we controlled the gray indexing and the whitening sequence, and were pretty confident about what the forward error correction was, there was really only one variable that we had to solve for — really only one thing we actually had to go into the bits and dig out of this thing. So by making these assumptions, using open source information, really bounding the problem, and working through it coherently, we were able to reduce these four stages down to really one experimental variable and just solve for it. So that's really the trick here. OK, I'm going to blow through this next part and talk very briefly about the structure of the LoRa PHY packet. This is a picture pulled out of one of the data sheets. We already talked about the preamble, those repeated chirps. One thing that's not pictured here is the sync word and the start of frame delimiter, which is right there. And then we have this thing called the header, and it says here that the header is only present in explicit mode. So there's this notion of implicit versus explicit headers in LoRa. The explicit header includes some information, such as the length of the payload and the coding scheme that's applied to the remainder of the payload — not the header itself, but the rest of it — and there's also an optional CRC that can be included. Implicit mode assumes that the receiver knows the modulation parameters and skips that bit. So no problem, right? We can use implicit mode to figure out what the whitening sequence is, and then switch back to explicit mode, use the whitening sequence from implicit mode, and figure out what the header is by just looking to see what the values are as we change the modulation. Yeah, right. None of this is easy — really, nothing helps us here. As it turns out, implicit and explicit header modes use different whitening sequences, so the header remains obscured even if we know what the implicit-mode whitening sequence is.
So let's see what we know. Again, we've got this header here, and this picture tells us the code rate is always 4/8 for the header. So no matter what code rate — that is, the number of bits in the Hamming forward error correcting code — is used for the rest of the packet, the header's code rate is always 4/8. Well, what about the spreading factor? As it turns out, the header is always sent at a spreading factor that is two less than the rest of your modulation: the code rate is still 4/8, and the spreading factor for the header is the spreading factor minus two — so two fewer bits per symbol — even if the header is implicit. And I have to credit Thomas Telkamp for giving me the tip that actually led to putting this all together; thanks to him. So again: the first eight symbols, no matter whether you're in implicit or explicit mode, are always sent at SF minus two and code rate 4/8. That's always the case. Also, there's this mode called low data rate optimization where, if that's set on, then all of the symbols in the remainder of the PHY packet are also sent at spreading factor SF minus two. It basically gets you some extra margin in case you're dealing with a noisy channel and need to get data through. So that's the PHY. Who wants some tools to go with it? Who's curious about this and wants to start playing with it? Does LoRa seem cool? So with that, that brings us to gr-lora, which is an out-of-tree GNU Radio module that I've been working on for the last couple of months. It's an open source implementation of the PHY that works very nicely with the GNU Radio software defined radio digital signal processing toolkit. GNU Radio is open source software, it's free software, it's got a great community built up around it — it's really cool. If you're curious about it, there are loads of good tutorials, and if you're a wizard, well, you already know what this is. But it's a really great piece of software and ecosystem. And why is having an open source version of this interesting? Well, existing interfaces to LoRa are layer two and above, both with the data sheets that go with each of the different LoRa radios and the standards that are available and open — it's all layer two and up. We don't have any insight into what the PHY state machine actually does. And PHY security really can't be taken for granted. To back this up, I'm going to point to some 802.15.4 exploits from a couple of years ago that kind of reinforce this. We have Travis Goodspeed's packet-in-packet work, which showed that he was able to do a full seven-layer compromise by basically encoding data that would induce the preamble and start-of-frame symbols for 802.15.4 within the payload of another message — he was able to get some really wonky things to happen to radio state machines in doing so. And related to that, we have this wireless intrusion detection system evasion that was done by Travis Goodspeed and some friends of mine from Dartmouth, where they were basically able to fingerprint how different 802.15.4 radio state machines work and construct packets that would be heard by some but not others. So from that, you could basically generate versions of packets that weren't totally compliant with the standard but would still be heard by certain receivers and not others. So some really tricky stuff here. PHYs really matter — you can't take them for granted in the picture of security.
So my hope is that by getting this tool out there, we can actually start to look at this attack surface, figure out how it works and how it can be made better, and really start to get involved with improving the security of this new protocol. Some prior art to cite: Josh Blum has a LoRa module for Pothos, which is kind of like a competitor to GNU Radio, another framework. It gets the modulation right, but the decoding is basically straight off the documentation, so it can talk to itself but it can't talk to actual hardware, because it doesn't implement the real decoding stages that we had to reverse engineer. And there's another gr-lora out there made by this guy rpp0 on GitHub. When I first looked at it, it was this Python thing that I couldn't quite get to work, but I looked at it again last night and it actually looks pretty cool, so you might check that out too if you're interested in this; it looks pretty solid.

So my gr-lora implements modulation and encoding in separate blocks so that you can be modular and experiment. If you want to add, say, another layer of forward error correction for better resiliency, you can write that in without having to touch the demodulator; the two are decoupled for you. Also, there's a very simple asynchronous PDU interface for passing data between the blocks, and you basically write to it just using a socket, which is really easy; I'll demonstrate in a minute. It's just like gr-ieee802-15-4, which is a really great 802.15.4 module made by Bastian, who I think is here; a really cool tool that I use all the time. The demodulator and the decoder implement the process that we just reverse engineered, using the FFTs and all that. The modulator and the encoder use a more efficient method that does direct synthesis of chirps: rather than computing FFT results and then taking an inverse FFT of that, we can index into a precomputed chirp, which makes the generation a lot more computationally efficient. If you want the source, it's right there; I just pushed a giant update to it about two hours ago. So if you're interested in playing with it, there it is.

Let's run through a quick demo before we're out of time here. So here's the scenario: I've written you guys a poem, I'm going to play you guys a poem, and I want to be able to sniff it and show you what it is. To transmit, we have our Adafruit board; it's basically an Arduino with a LoRa radio on it. And to receive, we're going to use our USRP right down here, and of course it's all being received by gr-lora. So I'm going to jump over to my VM and see if I can get this up on the other screen. Bear with me one moment. There we go. We're going to start a receiver here, and now I'm going to just open a socket here, and I'm going to start my transmitter, and let's see what we have for you. In case you're unsure of what you're looking at: that all just came in over LoRa. There are a few to-dos; if you want to contribute, I'd be happy to have you do so.
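On the direct chirp synthesis mentioned above, here is a minimal NumPy sketch, assuming one sample per chip and a sample rate equal to the bandwidth; the parameters are examples and this is not gr-lora's actual modulator. Each symbol is emitted as a cyclic rotation of one precomputed upchirp, and the toy receiver dechirps with the conjugate and takes an FFT per symbol.

```python
import numpy as np

# Illustrative sketch of direct chirp synthesis (not gr-lora's actual code),
# assuming one sample per chip and a sample rate equal to the bandwidth.

SF = 8                       # example spreading factor
N = 2 ** SF                  # chips (and samples) per symbol

n = np.arange(N)
# Complex-baseband upchirp sweeping the full bandwidth once per symbol.
base_upchirp = np.exp(2j * np.pi * (n ** 2 / (2 * N) - n / 2))

def modulate(symbols):
    """Emit each symbol value (0 .. 2**SF - 1) as a cyclic rotation of the
    precomputed upchirp instead of synthesizing it with an inverse FFT."""
    return np.concatenate([np.roll(base_upchirp, -int(s)) for s in symbols])

def demodulate(samples):
    """Toy receiver: dechirp with the conjugate chirp, then take one FFT per
    symbol and pick the strongest bin."""
    out = []
    for i in range(0, len(samples), N):
        dechirped = samples[i:i + N] * np.conj(base_upchirp)
        out.append(int(np.argmax(np.abs(np.fft.fft(dechirped)))))
    return out

if __name__ == "__main__":
    tx_symbols = [3, 42, 200, 17]
    print(demodulate(modulate(tx_symbols)))   # -> [3, 42, 200, 17]
```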
Some additional resources if you want to know more: I've written this all up in detail in Travis Goodspeed's PoC||GTFO; the most recent issue has that in there. Also, if you want to learn more about radios and SDR, my colleague Marc and I are giving a talk at ShmooCon and Troopers called "So You Want to Hack Radios", which is going to go through how to reverse engineer really basic IoT modulations. It'll spend a lot more time on some of the basics and show you how to actually apply this stuff yourself.

To wrap up: LPWANs are exploding. They have tons of momentum and are popping up everywhere. RF stacks are also becoming more diverse, so when you're talking about securing your wireless airspace, you're not just worried about Wi-Fi anymore. If you're a corporate security administrator or you work in corporate IT, you also have to worry about all these other IoT appliances that are coming into your enterprise and starting to take root. On a technical note, we've shown how to go from some obscure modulation to bits, and we've added a new tool to the researcher's arsenal. I want to thank Balint Seeber of Bastille; he's an incredible resource and this wouldn't have been possible without him. Also the open source contributors who helped us all get here, and finally the Chaos Computer Club for organizing 33c3 and having me. So thank you very much, thank you for your attention, and I'd be happy to take your questions. applause

Herald: We are almost out of time. Thank you very much, Matt. We're able to take a very few brief questions. Microphone in front right, please. Matt: I remember you; we met in a video conference. Good to see you. Mic: Yes. Are there ways to quantify the reliability of a dense LoRa network? Matt: Could you repeat that, please? Mic: Are there ways to quantify the reliability of a dense LoRa network? Matt: I'm sure there are. I haven't really looked at benchmarking or figuring out what the limits are; my interest has really been in getting the decoding and information extraction done. I know there's a group in San Francisco that's building a LoRa product or network of some sort. They've done some benchmarking of how LoRa performs in cities and they have a blog post that's pretty good; you might check that out. Herald: We have one question from the Internet via our Signal Angel. Signal Angel: Someone on IRC is asking, how long did it take to figure all of this out? Matt: So, you know, I first saw LoRa in the wild in January and kind of just let the capture sit on my hard drive for a while. It probably took about four or five weeks of working on this more or less full time; I had some other things I was working on too. I'd say probably four weeks from when I actually said, all right, let's figure this thing out, to having the initial results. Herald: Another question from the rear right microphone. Mic: So in decoding those unknown layers, you had your proprietary hardware, and you could send it data and it won't do the AES and encryption stuff, it just sends the raw encoding? Matt: That's a great question; I kind of skipped over that. The Microchip LoRa radio that I had, this guy right here (I also had another one that was a LoRaWAN radio), is a LoRa radio, but it actually exposes an API to pause the MAC state machine, so you can turn off all the layer two stuff that would add a header and encryption, stuff like that, and send what are close to arbitrary frames. And I say close to arbitrary frames because you can't turn off the explicit header; it's always in explicit header mode. So this more or less exposed raw payload injection.
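As a rough sketch of the raw-frame injection described in this answer, assuming a Microchip RN2483/RN2903-style UART command set (the specific module, serial port, and settings here are assumptions; check your radio's command reference): pausing the LoRaWAN MAC and transmitting a near-arbitrary payload looks roughly like this.

```python
import serial  # pyserial

# Rough sketch, assuming a Microchip RN2483/RN2903-style UART command set;
# the port name and radio settings are placeholders. The module still adds
# the explicit PHY header itself, so this is "close to arbitrary" injection.

def cmd(port, line):
    """Send one command line and return the module's one-line response."""
    port.write((line + "\r\n").encode("ascii"))
    return port.readline().decode("ascii").strip()

if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", 57600, timeout=2) as radio:
        print(cmd(radio, "mac pause"))            # suspend the LoRaWAN MAC
        print(cmd(radio, "radio set mod lora"))   # plain LoRa modulation
        print(cmd(radio, "radio set sf sf7"))     # example spreading factor
        print(cmd(radio, "radio tx 48656c6c6f"))  # hex payload: "Hello" -> "ok"
        print(radio.readline().decode().strip())  # then "radio_tx_ok" on success
```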
Mic: OK, thanks. Herald: Yeah, we're already in overtime. We're taking one last question from our Signal Angel on IRC and then we'll have to wrap up. Matt: I'll be happy to hang out and answer questions after the fact too. Signal Angel: Many people are wondering what implications it has that basically the patent is not followed at all. Could you say that the technology is patent-free, in a way? Matt: I am not a lawyer, but I have known lawyers, and I know they're clever enough not to fall for that. I'm sure the patent was defined as generally as possible, and again, it describes a modulation similar to LoRa. I'm again not a lawyer, but I'm almost certain that it would be covered. But that's a clever thought. Herald: Thank you, Matt. Please give him a warm round of applause. Thank you again. applause 33c3 postroll music Subtitles created by c3subtitles.de in the year 2021. Join, and help us!