[Music]
Herald: Has anyone in here ever worked with libusb or PyUSB? Hands up. Okay. Who also thinks USB is a pain? [laughter] Okay. Sergey and Alexander were here back at 26C3, that's a long time ago, I think it was back in Berlin, and back then they presented their first homemade, or not quite homemade, SDR, software-defined radio. This year they are back again and they want to show us how they implemented another one, using an FPGA, and to communicate with it they used PCI Express. So if you thought USB was a pain, let's see what they can tell us about PCI Express. A warm round of applause for Alexander and Sergey for building a high-throughput, low-latency, PCIe-based software-defined radio.
[Applause]
Alexander Chemeris: Hi everyone, good morning, and welcome to the first day of the Congress. Just a little bit of background about what we've done previously and why we are doing what we are doing right now: we started working with software-defined radios and, by the way, who knows what a software-defined radio is? Okay, perfect. [laughter] And who has actually used a software-defined radio? RTL-SDR or...? Okay, fewer people, but that's still quite a lot. Okay, good. I wonder whether anyone here has used more expensive radios like USRPs? Fewer people, but okay, good. Cool.
So before 2008 I had no idea what software-defined radio was; I was a voice-over-IP software person, etc., etc. Then in 2008 I heard about OpenBTS, got introduced to software-defined radio, and I wanted to make it really work, and that's what led us to today. In 2009 we had to develop the ClockTamer, a piece of hardware which allowed you to use the USRP1 to run GSM without problems. Anyone who has ever tried doing this without a good clock source knows what I'm talking about. And we presented this, it wasn't an SDR, it was just a clock source, we presented this in 2009 at 26C3. Then I realized that using the USRP1 is not really a good idea, because we wanted to build robust, industrial-grade base stations. So we started developing our own software-defined radio, which we call UmTRX; we started this in 2011. Our first base stations with it were deployed in 2013, but I always wanted to have something really small and really inexpensive, and back then it wasn't possible. My original idea in 2011 was to build a PCI Express card. Sorry, not a PCI Express card but a mini PCI card.
If you remember, there were all these Wi-Fi cards in the mini PCI form factor, and I thought it would be really cool to have an SDR in mini PCI, so I could plug it into my laptop or into some embedded PC and have nice SDR equipment. But back then it just was not really possible, because electronics were bigger and more power-hungry, and it just didn't work that way, so we designed UmTRX to work over gigabit Ethernet, and it was about that size. So now we've spent this year designing something which really brings me back to what I wanted all those years ago: the XTRX is a mini PCI Express card. Again, there was no PCI Express back then; now it's mini PCI Express, which is even smaller than mini PCI, and it's built to be embedded-friendly, so you can plug it into an embedded single-board computer. If you have a laptop with a mini PCI Express slot, you can plug it into your laptop and you have really small software-defined radio equipment. And we really want to make it inexpensive. That's why I was asking how many of you have ever worked with RTL-SDR and how many of you have ever worked with USRPs, because the gap between them is pretty big, and we really want to bring software-defined radio to the masses. It definitely won't be as cheap as an RTL-SDR, but we try to make it as close as possible.
And at the same time, so at the size of an RTL-SDR, at a price that is well higher but hopefully affordable to pretty much everyone, we really want to bring high performance into your hands. And by high performance I mean this is a full transmit/receive device with two transmit channels and two receive channels, which is usually called 2x2 MIMO in the radio world. The goal was to bring it to 160 megasamples per second, which can roughly give you 120 MHz of radio spectrum. So what we were able to achieve is, again, a mini PCI Express form factor. It has a small Artix-7, the smallest and most inexpensive FPGA which is able to work with PCI Express. It has an LMS7002M chip as the RFIC: a very high-performance, very tightly integrated chip with even DSP blocks inside. It even has a GPS chip; on the upper right side you can actually see the GPS chip, so you can synchronize your SDR to GPS for perfect clock stability, so you won't have any problems running telecommunication systems like GSM, 3G, 4G due to clock problems. And it also has an interface for SIM cards, so you can actually create a software-defined radio modem and run open source projects to build one for 4G LTE, such as srsUE, if you're interested, etc., etc. So a really, really tightly packed device.
And to put this into perspective: that's how it all started in 2006, and that's what you have ten years later. It's pretty impressive.
[Applause]
Thanks. But I think it actually applies to the whole industry, which is working on shrinking the sizes, because we just put stuff on the PCB, you know; we're not building the silicon itself. The interesting thing is our first approach: we said let's pack everything, let's do a very tight PCB design. We did an eight-layer PCB design, and when we sent it to a fab to estimate the cost, it turned out to be 15,000 US dollars per piece. In small volumes, obviously, but still a little bit too much. So we had to redesign it, and the first thing we did is that we still kept eight layers, because in our experience the number of layers nowadays has only a minimal impact on the cost of the device. So six versus eight layers, the price difference is not so big. But we did a complete rerouting, kept only 2-deep microvias and never used buried vias. This makes it much easier and much faster for the fab to manufacture, and the price suddenly went down five, six times, and in volume it will again be significantly cheaper. And, just as geek porn, that's how the PCB looks inside. So now let's get into the real stuff. PCI Express: why did we choose PCI Express? As was said, USB is a pain in the ass. You can't really use USB in industrial systems.
For a whole variety of reasons, it's just unstable. So we used Ethernet for many years successfully, but Ethernet has a problem: inexpensive Ethernet is only one gigabit, and one gigabit does not offer enough bandwidth to carry all the data we want; plus it's power-hungry, etc., etc. So PCI Express is really a good choice, because it's low power, it has low latency, it has very high bandwidth, and it's available almost universally. When we started looking into this, we realized that even some ARM boards have mini PCI Express slots, which was a big surprise for me, for example. The problem is that, unlike USB, you do need to write your own kernel driver for this, and there's no way around it. And it is really hard to write this driver universally, so we are writing it for Linux, obviously, because we're working with embedded systems; if we want to port it to Windows or macOS, we'll have to do a lot of rewriting. So we focus on Linux only right now. And now the hardest part: debugging is really non-trivial. One small error and your PC completely hangs because you did something wrong. And you have to reboot it and restart. That's like debugging a kernel, but sometimes even harder. To make it worse, there is no really easy-to-use plug-and-play interface.
If you want to restart: normally, when you develop a PCI Express card and you want to restart it, you have to restart your development machine. Again, not a nice way to work; it's really hard. So the first thing we did is we found that we can use Thunderbolt 3, which was just recently released, and it has the ability to work directly with the PCI Express bus. It basically has a mode in which it turns PCI Express into a plug-and-play interface. So if you have a laptop which supports Thunderbolt 3, then you can use this to plug and unplug your device to make your development easier. There are always problems: there's no easy way, there's no documentation. Thunderbolt is not compatible with Thunderbolt: Thunderbolt 3 is not compatible with Thunderbolt 2. So we had to buy a special laptop with Thunderbolt 3, with special cables, all this hard stuff. And if you really want to get the documentation, you have to sign an NDA and send them a business plan so they can approve that your business makes sense.
[Laughter]
I mean... [laughs] So we actually opted out. We decided not to go through this. What we did is we found that someone is actually making PCI Express to Thunderbolt 3 converters and selling them as dev boards, and that was a big relief, because it saved us lots of time and lots of money.
You just order it from some Asian company. And this is how this converter looks. So you buy several pieces, you plug your PCI Express card in there, and you plug this into your laptop. And this is it with the XTRX already plugged in. Now, the only problem we found is that typically UEFI has security control enabled, so that no random Thunderbolt device can hijack your PCI bus, get access to your kernel memory, and do some bad stuff. Which is a good idea; the only problem is that it's not fully implemented in Linux. Under Windows, if you plug in a device which has no security features, which is not certified, it will politely ask you: "Do you really trust this device? Do you want to use it?", and you can say yes. Under Linux it just does not work. [laughs] So we spent some time trying to figure out how to get around this. There are some patches from Intel which are not mainline, and we were not able to actually get them to work. So we just had to disable all these security measures in the laptop. So be aware that this is the case, and we suspect that happy users of Apple might not be able to do this, because Apple doesn't have a BIOS setup, so you probably can't disable this feature. So that's probably a good incentive for someone to actually finish writing the driver.
So now to the goal: we want to achieve 160 megasamples per second, 2x2 MIMO, which means two transmit and two receive channels at 12 bits, which is roughly 7.5 Gbit/s. So, first result: when we got this board from the fab, it didn't work.
Sergey Kostanbaev [mumbles]: As expected.
Alexander Chemeris: Yes, as expected. So the first interesting thing we realized is that the FPGA has hardware blocks for talking to PCI Express, called GTP transceivers, which basically implement the PCI Express serial physical layer. But the thing is, the lane numbering is reversed between PCI Express and the FPGA, and we did not realize this, so we had to do very, very fine soldering to actually swap the lanes [laughs]; you can see this very fine work there. We also found that one of the components was a "dead bug", which is a well-known term for chips which get placed mirrored at the design stage; we accidentally mirrored the pinout, so we had to solder it upside down, and if you realize how small it is, you can also appreciate the work done. And what's funny: when I was looking into dead bugs, I actually found a manual from NASA which describes how to properly solder dead bugs to get it approved.
[Laughter]
So this is the link; I think you can go there and enjoy it, there's also fun stuff there.
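For reference, the "roughly 7.5 Gbit/s" figure follows directly from the sample format stated above; a quick sanity check (assuming 12-bit I and Q components per complex sample):

```python
# Raw bandwidth needed for 2x2 MIMO at 160 MSPS with 12-bit I/Q samples.
SAMPLE_RATE = 160e6      # complex samples per second, per channel
CHANNELS = 2             # two receive (or transmit) streams: 2x2 MIMO
COMPONENTS = 2           # I and Q per complex sample
BITS_PER_COMPONENT = 12  # converter resolution

bits_per_second = SAMPLE_RATE * CHANNELS * COMPONENTS * BITS_PER_COMPONENT
print(bits_per_second / 1e9)  # → 7.68, i.e. "roughly 7.5 Gbit/s"
```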
So after fixing all of this, on our next attempt it kind of works. The next stage is debugging the FPGA code, which has to talk to PCI Express; PCI Express has to talk to the Linux kernel; the kernel has to talk to the driver; and the driver has to talk to user space. The peripherals were easy: the UART and the SPIs we got to work almost immediately, no problems with that. But DMA was a real beast. We spent a lot of time trying to get DMA to work, and the problem is that the DMA engine is on the FPGA, so you can't just place a breakpoint like you do in C or C++ or other languages; it's real-time hardware running on the fabric. So Sergey, who was mainly developing this, had to write a lot of small test benches and test everything piece by piece. All the DMA code we had was wrapped into small test benches which emulated all the corner cases, and, as the classics predicted, it took about five to ten times longer than actually writing the code. So we really blew our predicted timelines by doing this, but in the end we got really stable operation.
Some suggestions for anyone who will try to repeat this exercise: there is a logic analyzer built into the Xilinx tools, and you can use it; it's nice and sometimes very helpful, but you can't debug transient bugs which only show up under some weird conditions. So you have to implement some readback registers which expose important statistics about how your system behaves; in our case, various counters on the DMA interface. Then you can actually see what's happening with your data: Is it received? Is it sent? How much is sent and how much is received? For example, we can see when we saturate the bus, or when there is actually an underrun, i.e. the host is not providing data fast enough, so we can at least understand whether it's a host problem or an FPGA problem, and which part to debug next. Because again, it's a very multi-layer problem: you start with the FPGA, then PCI Express, kernel, driver, user space, and any part can fail, so you can't work blind like this. So again, the goal was to get 160 MSPS; with the first implementation we could do 2 MSPS, roughly 80 times slower. The problem was that the software just wasn't keeping up and wasn't sending data fast enough.
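The readback-counter idea can be sketched like this (the counter and register names are illustrative, not the actual XTRX register map): by comparing a producer counter and a consumer counter read back from the device, the host can tell which side is falling behind.

```python
def classify_dma_state(host_buffers_written, fpga_buffers_consumed, ring_size):
    """Classify TX DMA health from two device readback counters.

    host_buffers_written  - buffers the host driver has handed to the device
    fpga_buffers_consumed - buffers the FPGA has already streamed out
    ring_size             - number of buffers in the DMA ring
    """
    in_flight = host_buffers_written - fpga_buffers_consumed
    if in_flight == 0:
        return "underrun"    # host too slow: the FPGA is starved of data
    if in_flight >= ring_size:
        return "saturated"   # bus/FPGA too slow: the ring is completely full
    return "ok"

print(classify_dma_state(100, 100, 8))  # → underrun
print(classify_dma_state(108, 100, 8))  # → saturated
```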
So many things were done, but the most important parts are: use real-time priority if you want to get very stable results, and, well, fix software bugs. One of the most important bugs we had was that DMA buffers were not freed immediately, so they stayed busy for longer than they should have, which introduced extra cycles and basically just reduced the bandwidth. At this point, let's talk a little bit about how to implement a high-performance driver for Linux, because if you want to get real performance, you have to start with the right design. There are basically two approaches, with a whole spectrum in between which you could call a third. The first approach is full kernel control, in which case the kernel driver not only handles the transfer, it actually has all the logic for controlling your device and exports ioctls to user space; that's the traditional way of writing drivers. Your user space is completely abstracted from all the details. The problem is that this is probably the slowest way to do it. The other way is the "zero-copy" interface: only control is handled in the kernel, and the raw data is provided to user space as-is. So you avoid memory copies, which makes it faster.
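The zero-copy idea can be sketched in user space: instead of read() copying each buffer, the application maps the driver's buffer region once and accesses samples in place. Here an ordinary temporary file stands in for the device node (a real driver would expose its DMA ring through its own mmap handler):

```python
import mmap
import os
import tempfile

# Stand-in for the device's DMA region: in a real driver this buffer
# lives in the kernel and is exposed via mmap() on the device node.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x01\x02\x03\x04" * 1024)   # pretend these are IQ samples

dma = mmap.mmap(fd, 4096, prot=mmap.PROT_READ)
samples = dma[:8]   # access the data in place: no read() syscall copy
dma.close()
os.close(fd)
os.unlink(path)
print(samples)
```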
But it's still not fast enough if you really want to achieve maximum performance, because you still have context switches between the kernel and user space. The fastest approach possible is a full user-space implementation, where the kernel just exposes everything and says "now you do it yourself", and you have almost no context switches, and you can really optimize everything. So what are the problems with this? The pros I already mentioned: no switches between kernel and user space, very low latency because of this, and very high bandwidth. But if you are not interested in getting the absolute maximum performance and you just want some low-bandwidth operation, then you will have to add hacks, because you can't get notifications from the kernel that more data is available. It also makes things vulnerable, because if user space can access everything, then it can do whatever it wants. One more important thing about getting the best performance out of the bus: the question is whether to poll your device, or not to poll and instead get notified. What is polling?
I guess everyone who is a programmer understands it: polling is when you ask repeatedly "Are you ready? Are you ready? Are you ready?", and when it's ready you get the data immediately. It's basically a busy loop: you're just constantly asking the device what's happening. You need to dedicate a full core, and thank God we have multi-core CPUs nowadays, so you can dedicate a full core to this polling and just poll constantly. But again, if you don't need the highest performance, you just need to get something, then you will be wasting a lot of CPU resources. In the end, we decided on a combined architecture, where it is possible to poll, but there's also a way to get a notification from the kernel, for applications which need only low bandwidth but also better CPU efficiency. Which I think is the best way if you are trying to target both worlds. Very quickly, the architecture of the system: we tried to make it very portable and flexible. There is a kernel driver, which talks to a low-level library which implements all the logic we took out of the driver: to control PCI Express, to work with DMA, to hide all the details of the actual bus implementation.
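The poll-versus-notify trade-off can be contrasted with a small user-space sketch (a pipe stands in for the device's file descriptor; a real driver would wake the poller from its interrupt handling):

```python
import os
import select

r, w = os.pipe()   # the read end stands in for the device node

# Notification mode: sleep in the kernel until the "device" signals data.
os.write(w, b"buffer 0 ready")
poller = select.poll()
poller.register(r, select.POLLIN)
events = poller.poll(1000)           # block up to 1 s until fd is readable
notified = os.read(r, 64) if events else None

# Busy-poll mode: a tight loop asking "are you ready?" over and over;
# lowest latency, but burns 100% of one core while waiting.
os.write(w, b"buffer 1 ready")
polled = None
while polled is None:
    ready, _, _ = select.select([r], [], [], 0)   # non-blocking check
    if ready:
        polled = os.read(r, 64)

os.close(r)
os.close(w)
```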
And then there is a high-level library which talks to this low-level library, and also to libraries which implement control of the actual peripherals, most importantly to the library which implements control over our RFIC chip. This way it's very modular: we can replace PCI Express with something else later, we might be able to port it to other operating systems, and that's the goal. Another interesting issue: when you start writing a Linux kernel driver, you very quickly realize that while LDD, the classic book on Linux driver writing, is good and gives you good insight, it's not actually up to date. It's more than ten years old, and there are a lot of new interfaces which are not described there, so you have to resort to reading the manuals and the documentation in the kernel itself. Well, at least you get up-to-date information. The decision we made is to make everything easy. We use a TTY for the GPS, so you can attach pretty much any application which talks to GPS. All existing applications just work out of the box. And we also wanted to be able to synchronize the system clock to GPS, so we get automatic clock synchronization across multiple systems, which is very important when we are deploying many, many devices around the world.
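Exposing the GPS as a TTY works because the receiver simply streams NMEA 0183 sentences that any client can parse. A minimal checksum validator for such a sentence (the sentence below is a common textbook example, not captured from the XTRX):

```python
def nmea_checksum_ok(sentence: str) -> bool:
    """Validate an NMEA 0183 sentence of the form $...*HH."""
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, checksum = sentence[1:].partition("*")
    calc = 0
    for ch in body:
        calc ^= ord(ch)   # checksum is the XOR of all bytes between $ and *
    return calc == int(checksum, 16)

print(nmea_checksum_ok(
    "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
))  # → True
```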
We plan to do two interfaces: one is kernel PPS, and the other is DCD, because the DCD line of the UART is exposed over the TTY. Again, we found that there are two types of applications: some support one API, others support the other API, and there is no common thing, so we have to support both. As described, we want to have poll support, so we can get notifications from the kernel when data is available and we don't need to do real busy-looping all the time. After all the software optimizations, we got to about 10 MSPS: still very, very far from what we want to achieve. Now there should have been a lot of explanation about PCI Express here, but when we actually wrote down everything we wanted to say, we realized it's a full two-hour talk just on PCI Express. So we are not going to give it here; I'll just give some of the most interesting highlights. If there is real interest, we can set up a workshop on one of the later days and talk in more detail about PCI Express specifically. The thing is, there are no open source cores for PCI Express which are optimized for high-performance, real-time applications. There is Xillybus, which as I understand is supposed to be open source, but they provide you the source only if you pay them. It's very popular because it's very, very easy to use, but it's not giving you performance. If I remember correctly, the best it can do is maybe 50 percent bus saturation.
There's also the Xilinx implementation, but if you are using the Xilinx implementation with the AXI bus, then you're really locked in to the AXI bus and to Xilinx. And it is also not very efficient in terms of resources, and if you remember, we want to make this very, very inexpensive. So our goal is to be able to fit everything into the smallest Artix-7 FPGA, and that's quite challenging with all the stuff in there; we just can't waste resources. So the decision was to write our own PCI Express implementation. That's how it looks; I'm not going to discuss it right now. There were several iterations: initially it looked much simpler, which turned out not to work well. So, some interesting stuff about PCI Express which we stumbled upon: it was working really well on Atom, which is our main development platform, because we do a lot of embedded stuff. It worked really well. When we tried to plug it into a Core i7, it just started hanging once in a while. So after maybe several days of debugging, Sergey found a very interesting statement in the standard which says that the value zero in the byte count field actually stands not for zero bytes but for 4096 bytes. I mean, that's a really cool optimization. Another thing is the completion, which is a term in PCI Express basically for an acknowledgment, which can also carry some data back for your request.
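That byte-count encoding exists because the field in a completion header is exactly 12 bits wide: 4096 cannot be represented, while a zero-byte completion is meaningless, so zero is reused to mean 4096. A decoder therefore has to special-case it, roughly like this:

```python
def decode_byte_count(field: int) -> int:
    """Decode the 12-bit Byte Count field of a PCIe completion TLP.

    Per the PCIe spec, the encoded value 0 means 4096 bytes: a
    zero-length completion cannot occur, and 4096 does not fit
    into 12 bits, so the all-zeros encoding is reused for it.
    """
    assert 0 <= field < 4096, "field is only 12 bits wide"
    return field if field != 0 else 4096

print(decode_byte_count(64))  # → 64
print(decode_byte_count(0))   # → 4096
```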
And sometimes, if you're not sending a completion, the device just hangs. And what happens in this case, due to some historical heritage of x86, is that reads just start returning you all-ones, FFFF. And if you have a register which says "Is your device okay?", and this register reads one to say "the device is okay", guess what will happen? You will always read that your device is okay. So the suggestion is not to use all-ones as the status for okay: use either zero or, better, a two-bit sequence, so you are definitely sure that you are okay and not just reading FFFFs. So when you have a device which, again, may fail at any of the layers, and you've just got this new board, it's really hard to debug because of memory corruption. We had a software bug which was writing DMA addresses incorrectly, and we were wondering why we were not getting any data in our buffers; at the same time, after several starts, the operating system just crashed. Well, that's the reason why there is this UEFI protection which prevents you from plugging devices like this into your computer: it was basically writing random data into random portions of your memory. So a lot of debugging, a lot of tests and test benches, and we were able to find this.
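The all-ones failure mode suggests a simple defensive pattern (the register values here are illustrative, not the actual XTRX register map): encode "alive" as a multi-bit pattern that can never be confused with the 0xFFFFFFFF a dead PCIe link returns.

```python
DEVICE_GONE = 0xFFFFFFFF   # what an x86 host reads from a hung PCIe device
ALIVE_MAGIC = 0x2          # a two-bit pattern: neither all-zeros nor all-ones

def device_ok(status_reg: int) -> bool:
    # All-ones means the read never completed: the bus substituted a
    # dummy value, so the device must be treated as gone.
    if status_reg == DEVICE_GONE:
        return False
    return (status_reg & 0x3) == ALIVE_MAGIC

print(device_ok(0x2))         # → True: healthy device
print(device_ok(0xFFFFFFFF))  # → False; with "1 == okay" this would pass
```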
And another thing is, if you deinitialize your driver 301 00:35:10,589 --> 00:35:15,250 incorrectly, and that's what happens when you have a plug-and-play device, which 302 00:35:15,250 --> 00:35:22,119 you can plug and unplug, then you may end up in a situation where you are 303 00:35:22,119 --> 00:35:28,039 trying to write into memory which has already been freed by the operating system and 304 00:35:28,039 --> 00:35:35,960 used for something else. A very well-known problem, but it also happens here. So the 305 00:35:35,960 --> 00:35:50,549 reason why DMA is really hard is that it has this completion architecture for 306 00:35:50,549 --> 00:35:56,440 reading data. Writes are easy. You just send the 307 00:35:56,440 --> 00:36:00,460 data and you forget about it. It's a fire-and-forget system. But for reading you 308 00:36:00,460 --> 00:36:10,420 really need to get your data back. And the thing is, it looks like this. You really 309 00:36:10,420 --> 00:36:16,020 hope that there would be some pointing device here. But basically on the top left 310 00:36:16,020 --> 00:36:24,240 you can see the read requests and on the right you can see the completion transactions. 311 00:36:24,240 --> 00:36:29,890 So basically each transaction can be, and most likely will be, split into multiple 312 00:36:29,890 --> 00:36:38,900 transactions. So first of all you have to collect all these pieces and write 313 00:36:38,900 --> 00:36:46,210 them into the proper parts of memory. But that's not all. The thing is, the 314 00:36:46,210 --> 00:36:53,369 latency between request and completion is really high. It's like 50 cycles. So if 315 00:36:53,369 --> 00:36:58,990 you have only a single transaction in flight, you will get really bad 316 00:36:58,990 --> 00:37:03,900 performance. You do need to have multiple transactions in flight. And the worst 317 00:37:03,900 --> 00:37:13,170 thing is that transactions can return data in random order.
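The read path described above, with multiple outstanding requests whose completions arrive split into pieces and out of order, can be modeled as a reassembly table keyed by transaction tag. This is a simplified software sketch of the idea, not the authors' actual HDL state machine:

```python
class ReadReassembler:
    """Toy model of PCIe read-completion handling.

    Each outstanding read gets a tag; completions for that tag may
    arrive as several pieces, in any order relative to other tags,
    and each piece must land at the right offset of the right buffer.
    """

    def __init__(self):
        self.pending = {}  # tag -> {"buf": bytearray, "remaining": int}

    def issue_read(self, tag: int, length: int):
        """Record an outstanding read request of `length` bytes."""
        self.pending[tag] = {"buf": bytearray(length), "remaining": length}

    def on_completion(self, tag: int, offset: int, data: bytes):
        """Handle one completion piece; return the full buffer when done."""
        req = self.pending[tag]
        req["buf"][offset:offset + len(data)] = data
        req["remaining"] -= len(data)
        if req["remaining"] == 0:          # all pieces have arrived
            return self.pending.pop(tag)["buf"]
        return None                        # still waiting for more pieces
```

Keeping several tags outstanding at once is what hides the roughly 50-cycle request-to-completion latency mentioned in the talk.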
So it's a much more 318 00:37:13,170 --> 00:37:19,820 complicated state machine than we expected originally. So when I said, you know, the 319 00:37:19,820 --> 00:37:25,589 architecture was much simpler originally: we didn't have all of this, and we had to 320 00:37:25,589 --> 00:37:31,670 realize this while implementing. So again, here was a whole description of how 321 00:37:31,670 --> 00:37:41,200 exactly this works. But not this time. So now, after all these optimizations, we've 322 00:37:41,200 --> 00:37:48,859 got 20 megasamples per second, which is just six times lower than what we are 323 00:37:48,859 --> 00:37:59,599 aiming at. So the next thing is PCI Express lane scalability. PCI Express 324 00:37:59,599 --> 00:38:07,220 is a serial bus. It has multiple lanes, and they allow you to basically 325 00:38:07,220 --> 00:38:14,350 scale your bandwidth horizontally. If one lane is x, then two lanes are 2x, four 326 00:38:14,350 --> 00:38:20,160 lanes are 4x. So the more lanes you have, the more bandwidth you are getting out of 327 00:38:20,160 --> 00:38:23,970 your bus. Bandwidth, 328 00:38:23,970 --> 00:38:31,700 not performance. The issue is that the mini 329 00:38:31,700 --> 00:38:38,600 PCI Express standard only standardized one lane, and the second lane is left as optional. 330 00:38:38,600 --> 00:38:46,099 So most motherboards don't support it. There are some that do, but not all of them. And we 331 00:38:46,099 --> 00:38:52,370 really wanted to get this done. So we designed a special converter board which 332 00:38:52,370 --> 00:38:57,530 allows you to plug your mini PCI Express card into a full-size PCI Express slot and 333 00:38:57,530 --> 00:39:06,790 get two lanes working.
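The lane arithmetic is straightforward: a Gen 1 lane runs at 2.5 GT/s and a Gen 2 lane at 5.0 GT/s, both with 8b/10b encoding, so raw bandwidth scales linearly with lane count. A quick sanity check:

```python
def lane_bandwidth_mbps(gen: int, lanes: int) -> float:
    """Raw per-direction bandwidth in MB/s for PCIe Gen 1 / Gen 2.

    Gen 1: 2.5 GT/s, Gen 2: 5.0 GT/s. Both use 8b/10b line encoding,
    so every 10 transferred bits carry 8 bits of data.
    """
    gigatransfers_per_sec = {1: 2.5, 2: 5.0}[gen]
    bytes_per_sec = gigatransfers_per_sec * 1e9 * (8 / 10) / 8
    return bytes_per_sec * lanes / 1e6
```

One Gen 1 lane gives 250 MB/s; adding a second lane doubles it, which is exactly why getting the optional second mini PCIe lane working mattered.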
And we're also planning to have a similar board which 334 00:39:06,790 --> 00:39:12,660 will have multiple slots, so you will be able to put multiple XTRX SDRs onto the 335 00:39:12,660 --> 00:39:21,270 same carrier board and plug this into, let's say, a PCI Express x16 slot, and 336 00:39:21,270 --> 00:39:29,059 you will get really a lot of IQ data, which will then be 337 00:39:29,059 --> 00:39:38,760 your problem to process. So with two lanes it's about twice the performance, 338 00:39:38,760 --> 00:39:48,930 so we are getting fifty megasamples per second. And then it's time to really cut 339 00:39:48,930 --> 00:39:59,230 the fat, because the real sample size of the LMS7 is 12 bits and we are transmitting 16, 340 00:39:59,230 --> 00:40:06,930 because it's easier: the CPU works with 8, 16, 32 bits. So we originally 341 00:40:06,930 --> 00:40:13,770 designed the driver to support 8-bit, 12-bit and 16-bit samples to be able to do this 342 00:40:13,770 --> 00:40:23,800 scaling. And for the test we said, okay, let's go from 16 to 8 bit. We'll lose 343 00:40:23,800 --> 00:40:32,960 some dynamic range, but who cares these days. The throughput still stayed the same, still 50 344 00:40:32,960 --> 00:40:41,980 megasamples per second, no matter what we did. There was a lot of interesting 345 00:40:41,980 --> 00:40:49,580 debugging going on, and we realized that we actually made another, not really a 346 00:40:49,580 --> 00:40:58,720 mistake; we just didn't know this when we designed it. But we should have 347 00:40:58,720 --> 00:41:04,450 used a higher voltage for this high-speed bus to get it to full performance. And 348 00:41:04,450 --> 00:41:12,619 at 1.8 volts the signal was just degrading too fast and the bus itself was not performing well. So 349 00:41:12,619 --> 00:41:21,859 our next prototype will be using a higher voltage specifically for this bus.
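The "cut the fat" idea, where the LMS7 produces 12-bit samples but 16-bit words are what the CPU handles most easily, comes down to packing: two 12-bit samples fit in three bytes instead of four, saving 25% of bus bandwidth. A minimal illustrative sketch of such a packing; this is not necessarily the XTRX driver's actual wire format:

```python
def pack12(samples):
    """Pack pairs of 12-bit samples into bytes: 2 samples -> 3 bytes.

    Compared with sending each sample as a 16-bit word (2 samples ->
    4 bytes), this saves 25% of the bus bandwidth.
    """
    assert len(samples) % 2 == 0, "pack two samples at a time"
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        a &= 0xFFF  # keep only the 12 significant bits
        b &= 0xFFF
        out += bytes([
            a & 0xFF,                        # low 8 bits of sample a
            (a >> 8) | ((b & 0xF) << 4),     # high 4 of a, low 4 of b
            b >> 4,                          # high 8 bits of sample b
        ])
    return bytes(out)
```

Four 12-bit samples pack into 6 bytes instead of the 8 bytes that 16-bit words would need.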
And 350 00:41:21,859 --> 00:41:26,559 this is the kind of stuff which makes designing hardware for high speed really 351 00:41:26,559 --> 00:41:32,210 hard, because you have to care about the coherence of the parallel buses in your 352 00:41:32,210 --> 00:41:38,550 system. At the same time, we do want to keep 1.8 volts for everything else 353 00:41:38,550 --> 00:41:43,480 as much as possible. Because another problem we are facing with this device is 354 00:41:43,480 --> 00:41:47,069 that by the standard, mini PCI Express allows only like ... 355 00:41:47,069 --> 00:41:51,220 Sergey Kostanbaev: ... 2.5 ... Alexander Chemeris: ... 2.5 watts of power 356 00:41:51,220 --> 00:41:58,369 consumption, no more. And we were very lucky that the LMS7 has such 357 00:41:58,369 --> 00:42:04,460 good power consumption performance. We actually had some extra 358 00:42:04,460 --> 00:42:10,049 headroom for the FPGA and GPS and all this stuff. But we just can't let the power 359 00:42:10,049 --> 00:42:14,880 consumption go up. Our measurements on this device showed about ... 360 00:42:14,880 --> 00:42:18,510 Sergey Kostanbaev: ... 2.3 ... Alexander Chemeris: ... 2.3 watts of power 361 00:42:18,510 --> 00:42:27,220 consumption. So we are at the limit at this point. So when we fix the bus with 362 00:42:27,220 --> 00:42:31,420 the higher voltage (you know, it's a theoretical exercise, because we haven't 363 00:42:31,420 --> 00:42:38,000 done this yet; that's planned to happen in a couple of months), we should be able to get 364 00:42:38,000 --> 00:42:47,330 to these numbers, which are just 1.2 times slower. Then the next thing will be to fix 365 00:42:47,330 --> 00:42:55,550 another issue which we created at the very beginning: we procured the wrong chip.
366 00:42:55,550 --> 00:43:05,270 Just a one-digit difference, you can see it highlighted in red and green, and 367 00:43:05,270 --> 00:43:13,230 this chip only supports generation 1 PCI Express, which is twice as slow as 368 00:43:13,230 --> 00:43:18,190 generation 2 PCI Express. So again, hopefully we'll replace the chip 369 00:43:18,190 --> 00:43:30,140 and get a very simple doubling of the performance. Still, it will be slower than 370 00:43:30,140 --> 00:43:39,770 we wanted it to be, and here is where practical versus theoretical numbers come in. 371 00:43:39,770 --> 00:43:47,119 Like every bus, it has overheads, and one of the things which, again, we 372 00:43:47,119 --> 00:43:51,279 realized while implementing this is that even though the standard 373 00:43:51,279 --> 00:43:58,910 allows a payload size of up to 4 kB, actual implementations are different. For 374 00:43:58,910 --> 00:44:08,390 example, desktop computers like Intel Core or Intel Atom only have a 128-byte 375 00:44:08,390 --> 00:44:18,740 payload. So there is much more overhead going over the bus to transfer data, and even 376 00:44:18,740 --> 00:44:29,180 theoretically you can only achieve 87% efficiency. And on Xeon, we tested and 377 00:44:29,180 --> 00:44:37,110 found that it uses a 256-byte payload size, and this can give you like 92% 378 00:44:37,110 --> 00:44:45,130 efficiency on the bus, and this is before other overheads, so the reality is even 379 00:44:45,130 --> 00:44:53,180 worse. An interesting thing which we also did not expect is that we originally were 380 00:44:53,180 --> 00:45:02,849 developing on Intel Atom and everything was working great. When we plugged this into a 381 00:45:02,849 --> 00:45:10,720 laptop with a Core i7, a multi-core, really powerful device, we didn't expect that it 382 00:45:10,720 --> 00:45:20,140 wouldn't work. Obviously a Core i7 should work better than an Atom: no, not always.
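The efficiency numbers quoted above follow directly from per-packet overhead: each TLP carries roughly 20 bytes of header and link framing around the payload. A back-of-the-envelope check, assuming ~20 bytes of overhead per TLP (the exact figure varies with header format and framing):

```python
def tlp_efficiency(payload_bytes: int, overhead_bytes: int = 20) -> float:
    """Fraction of link bandwidth that carries actual payload data.

    Assumes ~20 bytes of TLP header plus link-layer framing per packet;
    the real overhead depends on header format (32- vs 64-bit addressing)
    and physical-layer framing.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)

# 128-byte payload (desktop Atom/Core): ~86.5%, the ~87% from the talk.
# 256-byte payload (Xeon):              ~92.8%, the ~92% from the talk.
```

Doubling the maximum payload size halves the number of headers per byte of data, which is why the Xeon's 256-byte payload performs noticeably better.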
383 00:45:20,140 --> 00:45:26,369 The thing is, we were plugging into a laptop which had a built-in video card 384 00:45:26,369 --> 00:45:44,750 sitting on the same PCI bus, and probably the manufacturer hard-coded a higher 385 00:45:44,750 --> 00:45:50,590 priority for the video card than for everything else in the system, because you 386 00:45:50,590 --> 00:45:56,300 don't want your screen to flicker. And so when you move a window, you actually 387 00:45:56,300 --> 00:46:04,099 see the late packets coming to your PCI device. We had to introduce a jitter 388 00:46:04,099 --> 00:46:14,750 buffer and add more FIFO to the device to smooth it out. On the other hand, the 389 00:46:14,750 --> 00:46:20,099 Xeon is performing really well. It's very optimized. That said, we have tested 390 00:46:20,099 --> 00:46:28,119 it with a discrete card and it outperforms everything by a whopping five to seven percent. 391 00:46:28,119 --> 00:46:38,799 That's what you get for the price. So this is actually the end of the presentation. 392 00:46:38,799 --> 00:46:43,839 We still have not scheduled any workshop, but if there is any interest in 393 00:46:43,839 --> 00:46:53,390 actually seeing the device working, or if you are interested in learning more about 394 00:46:53,390 --> 00:46:58,260 PCI Express in detail, let us know and we'll schedule something in the next few days. 395 00:46:58,260 --> 00:47:05,339 That's the end; I think we can proceed with questions if there are any. 396 00:47:05,339 --> 00:47:14,950 [Applause] Herald: Okay, thank you very much. If you 397 00:47:14,950 --> 00:47:17,680 are leaving now: please try to leave quietly, because we might have some 398 00:47:17,680 --> 00:47:22,960 questions and you want to hear them. If you have questions, please line up right 399 00:47:22,960 --> 00:47:28,819 behind the microphones, and I think we'll just wait, because we don't have anything 400 00:47:28,819 --> 00:47:34,990 from the signal angel.
However, if you are watching the stream, you can hop into the 401 00:47:34,990 --> 00:47:39,500 channels and onto social media to ask questions and they will be answered, 402 00:47:39,500 --> 00:47:47,890 hopefully. So, on that microphone. Question 1: What's the minimum and maximum 403 00:47:47,890 --> 00:47:52,170 frequency of the card? Alexander Chemeris: You mean RF 404 00:47:52,170 --> 00:47:55,940 frequency? Question 1: No, the minimum frequency you 405 00:47:55,940 --> 00:48:05,640 can sample at. Most SDR devices can only sample at over 50 MHz. Is there a 406 00:48:05,640 --> 00:48:09,190 similar limitation on your card? Alexander Chemeris: Yeah, so if you're 407 00:48:09,190 --> 00:48:15,650 talking about RF frequency, it can go from almost zero, even though it 408 00:48:15,650 --> 00:48:27,289 works worse below 50 MHz, all the way to 3.8 GHz, if I remember correctly. And in 409 00:48:27,289 --> 00:48:34,880 terms of the sample rate, right now it works from about 2 MSPS to about 410 00:48:34,880 --> 00:48:40,089 50. But again, we're planning to get it to the numbers we quoted. 411 00:48:40,089 --> 00:48:45,720 Herald: Okay. The microphone over there. Question 2: Thanks for your talk. Did you 412 00:48:45,720 --> 00:48:48,630 manage to get your Linux kernel driver into the mainline? 413 00:48:48,630 --> 00:48:53,519 Alexander Chemeris: No, not yet. I mean, it's not even fully published. I 414 00:48:53,519 --> 00:48:59,019 did not say this in the beginning, sorry. We only just manufactured the first 415 00:48:59,019 --> 00:49:03,830 prototype, which we debugged heavily. So we are only planning to manufacture the 416 00:49:03,830 --> 00:49:10,290 second prototype with all these fixes, and then we will release the kernel 417 00:49:10,290 --> 00:49:16,700 driver and everything. And maybe we'll try mainlining it, or maybe we won't; we haven't decided yet. 418 00:49:16,700 --> 00:49:18,310 Question 2: Thanks. Herald: Okay...
419 00:49:18,310 --> 00:49:21,599 Alexander Chemeris: ... and that will be a whole other experience. 420 00:49:21,599 --> 00:49:26,099 Herald: Okay, over there. Question 3: Hey, it looks like you went 421 00:49:26,099 --> 00:49:30,349 through some incredible amounts of pain to make this work. So I was wondering, 422 00:49:30,349 --> 00:49:34,960 aren't there any simulators, at least for parts of the system, or the PCIe bus, for 423 00:49:34,960 --> 00:49:40,150 the DMA, something? Any simulator so that you can actually first design the system 424 00:49:40,150 --> 00:49:44,630 there and debug it more easily? Sergey Kostanbaev: Yes, there are 425 00:49:44,630 --> 00:49:50,400 simulators available, but the problem is they are all non-free. So you have to pay 426 00:49:50,400 --> 00:49:57,109 for them. So yeah, we chose the hard way. 427 00:49:57,109 --> 00:49:59,520 Question 3: Okay, thanks. Herald: We have a question from the signal 428 00:49:59,520 --> 00:50:03,180 angel. Question 4: Yeah, are the FPGA code, Linux 429 00:50:03,180 --> 00:50:07,650 driver, library code, and the design project files public, and if so, have you 430 00:50:07,650 --> 00:50:13,480 posted them yet? They can't find them on xtrx.io. 431 00:50:13,480 --> 00:50:17,970 Alexander Chemeris: Yeah, so they're not published yet. As I said, we haven't 432 00:50:17,970 --> 00:50:24,579 released them. The drivers and libraries will definitely be available; 433 00:50:24,579 --> 00:50:28,589 the FPGA code we are considering, and it probably will also be available as open 434 00:50:28,589 --> 00:50:36,359 source. But we will publish them together with the public announcement of the 435 00:50:36,359 --> 00:50:42,220 device. Herald: Okay, that microphone. 436 00:50:42,220 --> 00:50:46,010 Question 5: Yes.
Did you guys see any signal integrity issues on the PCI 437 00:50:46,010 --> 00:50:50,009 bus, or on the bus to the LMS chip, the Lime Micro chip, I think, doing 438 00:50:50,009 --> 00:50:51,009 the RF? AC: Right. 439 00:50:51,009 --> 00:50:56,359 Question 5: Did you try to measure signal integrity issues, or... because there were 440 00:50:56,359 --> 00:51:01,130 some reliability issues, right? AC: Yeah, we actually... so, PCI. With PCI 441 00:51:01,130 --> 00:51:02,559 we never had issues, if I remember correctly. 442 00:51:02,559 --> 00:51:04,760 SK: No. AC: It was just working. 443 00:51:04,760 --> 00:51:10,940 SK: Well, the board is so small, and with such short traces there's no problem 444 00:51:10,940 --> 00:51:14,790 with signal integrity. So that actually saved us. 445 00:51:14,790 --> 00:51:20,599 AC: Yeah. Designing a small board is easier. With the LMS7, the problem is not 446 00:51:20,599 --> 00:51:26,099 signal integrity in terms of differences in the length of the traces, 447 00:51:26,099 --> 00:51:37,319 but rather the fact that at this voltage the signal degrades as the speed 448 00:51:37,319 --> 00:51:44,010 goes up, and drops below the detection level, and all this stuff. We 449 00:51:44,010 --> 00:51:47,220 did some measurements. I actually wanted to add some pictures here, but decided 450 00:51:47,220 --> 00:51:54,359 that it wasn't going to be super interesting. H: Okay. Microphone over there. 451 00:51:54,359 --> 00:51:58,359 Question 6: Yes. Thanks for the talk. How much work would it be to convert the two- 452 00:51:58,359 --> 00:52:05,610 by-two SDR into an 8-input logic analyzer, in terms of hardware and software? So that you 453 00:52:05,610 --> 00:52:12,289 have a really fast logic analyzer with which you can record unlimited traces? 454 00:52:12,289 --> 00:52:18,980 AC: A logic analyzer...
Q6: So basically it's just also an analog-to- 455 00:52:18,980 --> 00:52:27,040 digital converter, and you mostly want fast sampling and a large amount of memory 456 00:52:27,040 --> 00:52:30,900 to store the traces. AC: Well, I just think it's not the best 457 00:52:30,900 --> 00:52:40,300 use for it. It's probably... I don't know. Maybe Sergey has some ideas, but I think it 458 00:52:40,300 --> 00:52:47,549 may just be easier to get a high-speed ADC and replace the Lime chip with a high- 459 00:52:47,549 --> 00:52:56,720 speed ADC to get what you want, because the Lime chip has so many things in there 460 00:52:56,720 --> 00:53:01,450 specifically for RF. SK: Yeah, the main problem is that you cannot just 461 00:53:01,450 --> 00:53:09,099 sample the original data. You have to shift it in frequency, so you cannot sample the 462 00:53:09,099 --> 00:53:16,619 original signal, and using it for something other than spectrum analysis 463 00:53:16,619 --> 00:53:20,839 is hard. Q6: OK. Thanks. 464 00:53:20,839 --> 00:53:25,750 H: OK. Another question from the internet. Signal angel: Yes. Have you compared the 465 00:53:25,750 --> 00:53:32,240 sample rate of the ADC of the Lime chip to the USRP ADCs, and if so, how does the 466 00:53:32,240 --> 00:53:40,160 lower sample rate affect the performance? AC: So, comparing a low sample rate to a 467 00:53:40,160 --> 00:53:49,281 higher sample rate. We haven't done much testing of the RF performance yet, because 468 00:53:49,281 --> 00:53:58,440 we were so busy with all this stuff, so we are yet to see, in terms of low sample rates 469 00:53:58,440 --> 00:54:03,190 versus high sample rates. Well, a high sample rate always gives 470 00:54:03,190 --> 00:54:09,859 you better performance, but you also get higher power consumption. So I guess it's 471 00:54:09,859 --> 00:54:14,019 a question of what's more important for you. 472 00:54:14,019 --> 00:54:20,440 H: Okay. Over there.
Question 7: I gather there is no 473 00:54:20,440 --> 00:54:25,319 mixer bypass, so you can't directly sample the signal. Is there a way to use the same 474 00:54:25,319 --> 00:54:31,720 antenna for send and receive, though? AC: Actually, there is an input for the ADC. 475 00:54:31,720 --> 00:54:38,289 SK: But it's not a bypass, it's a dedicated pin on the LMS chip, and since we're 476 00:54:38,289 --> 00:54:45,569 very space-constrained, we didn't route it, so you cannot actually bypass it. 477 00:54:45,569 --> 00:54:50,359 AC: Okay, that's in our specific hardware. In general, in the LMS chip there is a 478 00:54:50,359 --> 00:54:58,170 special pin which allows you to drive your signal directly to the ADC without all the 479 00:54:58,170 --> 00:55:02,950 mixers, filters, all this radio stuff, just directly to the ADC. So, yes, 480 00:55:02,950 --> 00:55:06,869 theoretically that's possible. SK: We even thought about this, but it 481 00:55:06,869 --> 00:55:10,960 doesn't fit this design. Q7: Okay. And can I share antennas? 482 00:55:10,960 --> 00:55:15,700 Because I have an existing laptop with existing antennas, and I would use the 483 00:55:15,700 --> 00:55:22,140 same antenna to send and receive. AC: Yeah, so, I mean, that depends on 484 00:55:22,140 --> 00:55:25,619 what exactly you want to do. If you want a TDD system, then yes; if you 485 00:55:25,619 --> 00:55:30,869 want an FDD system, then you will have to put a small duplexer in there, but yeah, 486 00:55:30,869 --> 00:55:34,839 that's the idea. So you can plug this into your laptop and use your existing 487 00:55:34,839 --> 00:55:39,640 antennas. That's one of the ideas of how to use XTRX. 488 00:55:39,640 --> 00:55:41,799 Q7: Yeah, because there are all four connectors. 489 00:55:41,799 --> 00:55:45,400 AC: Yeah.
One thing which I actually forgot to mention, I kind of mentioned it 490 00:55:45,400 --> 00:55:53,930 in the slides, is that any other SDRs which are based on Ethernet or on USB 491 00:55:53,930 --> 00:56:02,309 can't work with CSMA wireless systems, and the most famous CSMA system is Wi-Fi. 492 00:56:02,309 --> 00:56:09,259 So it turns out that because of the latency between your operating system and 493 00:56:09,259 --> 00:56:17,569 your radio on USB, you just can't react fast enough for Wi-Fi to work, because, 494 00:56:17,569 --> 00:56:23,240 as you probably know, in Wi-Fi you do carrier sensing, and if you sense that the 495 00:56:23,240 --> 00:56:29,579 spectrum is free, you start transmitting. That doesn't make sense when you have huge 496 00:56:29,579 --> 00:56:36,160 latency, because all you know is that the spectrum was free back then. So 497 00:56:36,160 --> 00:56:43,730 with XTRX you actually can work with CSMA systems like Wi-Fi, so again it makes it 498 00:56:43,730 --> 00:56:51,390 possible to have a fully software implementation of Wi-Fi in your laptop. It 499 00:56:51,390 --> 00:56:58,660 obviously won't work as well as your commercial Wi-Fi, because you will have to 500 00:56:58,660 --> 00:57:03,839 do a lot of processing on your CPU, but for some purposes like experimentation, 501 00:57:03,839 --> 00:57:07,980 for example for wireless labs and R&D labs, that's really valuable. 502 00:57:07,980 --> 00:57:11,400 Q7: Thanks. H: Okay. Over there. 503 00:57:11,400 --> 00:57:15,519 Q8: Okay, what PCB design package did you use? 504 00:57:15,519 --> 00:57:17,819 AC: Altium. SK: Altium, yeah. 505 00:57:17,819 --> 00:57:22,940 Q8: And I'd be interested in the PCIe workshop. It would be really great if you do 506 00:57:22,940 --> 00:57:24,940 this one. AC: Say that again? 507 00:57:24,940 --> 00:57:28,069 Q8: It would be really great if you do the PCI Express workshop. 508 00:57:28,069 --> 00:57:32,720 AC: Ah, the PCI Express workshop. Okay.
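The CSMA timing argument above can be quantified: an 802.11 station must start its ACK one SIFS after a frame ends (16 µs for OFDM PHYs), while host round trips over USB or Ethernet are typically hundreds of microseconds or more. A rough comparison; the non-PCIe latency figures below are assumed order-of-magnitude values, not measurements from the talk:

```python
SIFS_US = 16.0  # 802.11 OFDM SIFS: the radio must respond within this window

# Assumed, order-of-magnitude host round-trip latencies in microseconds
bus_latency_us = {
    "PCIe": 1.0,
    "USB": 500.0,
    "Ethernet (UDP)": 300.0,
}

def can_meet_sifs(latency_us: float) -> bool:
    """A software MAC can only respond in time if the host round trip
    fits inside the SIFS window."""
    return latency_us < SIFS_US
```

Under these assumptions, only the PCIe path leaves the host any chance of reacting inside the SIFS window, which is why a fully software Wi-Fi implementation needs a low-latency bus to the radio.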
Thank you. 509 00:57:32,720 --> 00:57:36,690 H: Okay, I think we have one more question from the microphones, and that's you. 510 00:57:36,690 --> 00:57:42,880 Q9: Okay. Great talk. And again, I would appreciate a PCI Express workshop, if it 511 00:57:42,880 --> 00:57:47,190 ever happens. What are the synchronization options between multiple 512 00:57:47,190 --> 00:57:55,089 cards? Can you synchronize the ADC clock, and can you synchronize the presumably 513 00:57:55,089 --> 00:58:04,609 digitally created IF? SK: Yes, so... unfortunately, IF synchronization alone is 514 00:58:04,609 --> 00:58:10,279 not possible, because the Lime chip doesn't expose the LO frequency. But we can 515 00:58:10,279 --> 00:58:16,000 synchronize digitally. So we have a dedicated one-PPS synchronization signal. We have 516 00:58:16,000 --> 00:58:25,180 lines for clock synchronization and other stuff. We can also do it in software. The 517 00:58:25,180 --> 00:58:31,789 Lime chip has a phase correction register, so when you measure... if there is a phase 518 00:58:31,789 --> 00:58:35,170 difference, you can compensate for it across different boards. 519 00:58:35,170 --> 00:58:39,309 Q9: Tune to a station a long way away and then rotate the phase until it aligns. 520 00:58:39,309 --> 00:58:41,819 SK: Yeah. Q9: Thank you. 521 00:58:41,819 --> 00:58:46,339 AC: A little tricky, but possible. So that's one of our plans for the future, 522 00:58:46,339 --> 00:58:52,819 because we do want to see 128-by-128 MIMO at home. 523 00:58:52,819 --> 00:58:56,060 H: Okay, we have another question from the internet. 524 00:58:56,060 --> 00:59:00,450 Signal angel: I actually have two questions. The first one is: What is the 525 00:59:00,450 --> 00:59:07,710 expected price after the prototype stage? And the second one is: Can you tell us 526 00:59:07,710 --> 00:59:10,400 more about the setup you had for debugging the PCIe 527 00:59:10,400 --> 00:59:15,970 issues? AC: Could you repeat the second question?
528 00:59:15,970 --> 00:59:20,269 SK: It's ????????????, I think. Signal angel: It's more about the setup you had for 529 00:59:20,269 --> 00:59:24,480 debugging the PCIe issues. SK: The second question, I think, is mostly 530 00:59:24,480 --> 00:59:31,200 a topic for our next workshop, because it's a more complicated setup, so... it's mostly 531 00:59:31,200 --> 00:59:35,580 beyond the scope of the current presentation. 532 00:59:35,580 --> 00:59:39,580 AC: Yeah, but in general, in terms of hardware setup, that was our hardware 533 00:59:39,580 --> 00:59:47,890 setup: we bought this PCI Express to Thunderbolt 3 adapter, we bought a laptop which 534 00:59:47,890 --> 00:59:53,089 supports Thunderbolt 3, and that's how we were debugging it. So we don't need a 535 00:59:53,089 --> 00:59:57,780 full-fledged PC, and we don't have to restart it all the time. In terms of 536 00:59:57,780 --> 01:00:06,650 price, we don't have a fixed price yet. All I can say right now is that we are 537 01:00:06,650 --> 01:00:18,349 targeting no more than your bladeRF or HackRF devices, and probably even cheaper 538 01:00:18,349 --> 01:00:25,210 for some versions. H: Okay. We are out of time, so thank you 539 01:00:25,210 --> 01:00:45,079 again Sergey and Alexander. [Applause] 540 01:00:45,079 --> 01:00:49,619 [Music] 541 01:00:49,619 --> 01:00:54,950 Subtitles created by c3subtitles.de in the year 20??. Join, and help us!