WEBVTT 00:00:03.959 --> 00:00:08.670 [Music] 00:00:08.670 --> 00:00:21.900 Herald: Has anyone in here ever worked with libusb or PyUSB? Hands up. Okay. Who 00:00:21.900 --> 00:00:32.168 also thinks USB is a pain? laughs Okay. Sergey and Alexander were here back at 00:00:32.168 --> 00:00:38.769 the 26C3, that's a long time ago. I think it was back in Berlin, and back then they 00:00:38.769 --> 00:00:45.120 presented their first homemade, or not homemade, SDR, software-defined radio. 00:00:45.120 --> 00:00:49.440 This year they are back again and they want to show us how they implemented 00:00:49.440 --> 00:00:55.420 another one, using an FPGA, and to communicate with it they used PCI Express. 00:00:55.420 --> 00:01:01.589 So I think if you thought USB was a pain, let's see what they can tell us about PCI 00:01:01.589 --> 00:01:06.690 Express. A warm round of applause for Alexander and Sergey for building a high 00:01:06.690 --> 00:01:12.430 throughput, low latency, PCIe-based software-defined radio 00:01:12.430 --> 00:01:20.220 [Applause] Alexander Chemeris: Hi everyone, good 00:01:20.220 --> 00:01:30.280 morning, and welcome to the first day of the Congress. So, just a little bit of 00:01:30.280 --> 00:01:36.180 background about what we've done previously and why we are doing what we 00:01:36.180 --> 00:01:42.229 are doing right now, is that we started working with software-defined radios and 00:01:42.229 --> 00:01:51.930 by the way, who knows what software defined radio is? Okay, perfect. laughs 00:01:51.930 --> 00:01:59.140 And who ever actually used a software- defined radio? RTL-SDR or...? Okay, less 00:01:59.140 --> 00:02:06.329 people but that's still quite a lot. Okay, good. I wonder whether anyone here used 00:02:06.329 --> 00:02:16.940 more expensive radios like USRPs? Less people, but okay, good. Cool. 
So before 00:02:16.940 --> 00:02:22.630 2008 I've had no idea what software- defined radio is, was working with voice 00:02:22.630 --> 00:02:30.330 over IP software person, etc., etc., so I in 2008 I heard about OpenBTS, got 00:02:30.330 --> 00:02:40.080 introduced to software-defined radio and I wanted to make it really work and that's 00:02:40.080 --> 00:02:52.250 what led us to today. In 2009 we had to develop a clock tamer. A hardware which 00:02:52.250 --> 00:03:00.170 allows to use, allowed to use USRP1 to run GSM without problems. If anyone ever tried 00:03:00.170 --> 00:03:05.420 doing this without a good clock source knows what I'm talking about. And we 00:03:05.420 --> 00:03:10.550 presented this - it wasn't an SDR it was just a clock source - we presented this in 00:03:10.550 --> 00:03:18.530 2009 in 26C3. Then I realized that using USRP1 is not 00:03:18.530 --> 00:03:23.760 really a good idea, because we wanted to build a robust, industrial-grade base 00:03:23.760 --> 00:03:29.980 stations. So we started developing our own software defined radio, which we call 00:03:29.980 --> 00:03:41.290 UmTRX and it was in - we started started this in 2011. Our first base stations with 00:03:41.290 --> 00:03:51.590 it were deployed in 2013, but I always wanted to have something really small and 00:03:51.590 --> 00:03:59.510 really inexpensive and back then it wasn't possible. My original idea in 2011, we 00:03:59.510 --> 00:04:07.680 were to build a PCI Express card. Mini, sorry, not PCI Express card but mini PCI 00:04:07.680 --> 00:04:10.100 card. 
If you remember there were like all the 00:04:10.100 --> 00:04:14.470 Wi-Fi cards and mini PCI form factor and I thought that would be really cool to have 00:04:14.470 --> 00:04:22.490 an SDR and mini PCI, so I can plug this into my laptop or in some embedded PC and 00:04:22.490 --> 00:04:31.710 have a nice SDR equipment, but back then it just was not really possible, because 00:04:31.710 --> 00:04:37.939 electronics were bigger and more power hungry and just didn't work that way, so 00:04:37.939 --> 00:04:49.539 we designed UmTRX to work over gigabit ethernet and it was about that size. So 00:04:49.539 --> 00:04:57.300 now we spend this year at designing something, which really brings me to what 00:04:57.300 --> 00:05:05.289 I wanted those years ago, so the XTRX is a mini PCI Express - again there was no PCI 00:05:05.289 --> 00:05:10.460 Express back then, so now it's mini PCI Express, which is even smaller than PCI, I 00:05:10.460 --> 00:05:17.719 mean mini PCI and it's built to be embedded friendly, so you can plug this 00:05:17.719 --> 00:05:23.669 into a single board computer, embedded single board computer. If you have a 00:05:23.669 --> 00:05:28.020 laptop with a mini PCI Express you can plug this into your laptop and you have a 00:05:28.020 --> 00:05:35.210 really small, software-defined radio equipment. And we really want to make it 00:05:35.210 --> 00:05:39.430 inexpensive, that's why I was asking how many of you have ever worked it with RTL- 00:05:39.430 --> 00:05:44.169 SDR, how many of you ever worked with you USRPs, because the gap between them is 00:05:44.169 --> 00:05:53.740 pretty big and we want to really bring the software-defined radio to masses. 00:05:53.740 --> 00:05:59.550 Definitely won't be as cheap as RTL-SDR, but we try to make it as close as 00:05:59.550 --> 00:06:03.330 possible. 
And at the same time, so at the size of 00:06:03.330 --> 00:06:09.659 RTL-SDR, at the price well higher but, hopefully it will be affordable to 00:06:09.659 --> 00:06:17.460 pretty much everyone, we really want to bring high performance into your hands. 00:06:17.460 --> 00:06:22.539 And by high performance I mean this is a full transmit/receive with two channels 00:06:22.539 --> 00:06:28.289 transmit, two channels receive, which is usually called 2x2 MIMO in the radio 00:06:28.289 --> 00:06:37.370 world. The goal was to bring it to 160 megasamples per second, which can roughly 00:06:37.370 --> 00:06:44.110 give you like 120 MHz of radio spectrum available. 00:06:44.110 --> 00:06:53.111 So what we were able to achieve is, again this is mini PCI Express form factor, it 00:06:53.111 --> 00:07:01.639 has a small Artix-7, that's the smallest and most inexpensive FPGA which has the ability 00:07:01.639 --> 00:07:18.029 to work with PCI Express. It has an LMS7000 chip for RFIC, very high performance, very 00:07:18.029 --> 00:07:27.449 tightly embedded chip with even DSP blocks inside. It has even a GPS chip 00:07:27.449 --> 00:07:37.340 here, you can actually on the right upper side, you can see a GPS chip, so you can 00:07:37.340 --> 00:07:44.060 actually synchronize your SDR to GPS for perfect clock stability, 00:07:44.060 --> 00:07:51.389 so you won't have any problems running any telecommunication systems like GSM, 3G, 4G 00:07:51.389 --> 00:07:58.650 due to clock problems, and it also has an interface for SIM cards, so you can 00:07:58.650 --> 00:08:06.330 actually create a software-defined radio modem and run other open source projects 00:08:06.330 --> 00:08:15.840 to build one, for LTE, called srsUE, if you're interested, etc., etc. So a really, 00:08:15.840 --> 00:08:22.080 really tightly packed one. And if you put this into perspective: that's how it all 00:08:22.080 --> 00:08:30.669 started in 2006 and that's what you have ten years later. 
It's pretty impressive. 00:08:30.669 --> 00:08:36.840 applause Thanks. But I think it actually applies to 00:08:36.840 --> 00:08:40.320 the whole industry who is working on shrinking the sizes because we just put 00:08:40.320 --> 00:08:48.890 stuff on the PCB, you know. We're not building the silicon itself. Interesting 00:08:48.890 --> 00:08:54.701 thing is that we did the first approach: we said let's pack everything, let's do a 00:08:54.701 --> 00:09:03.180 very tight PCB design. We did an eight layer PCB design and when we send it to a 00:09:03.180 --> 00:09:10.490 fab to estimate the cost it turned out it's $15,000 US per piece. Well in small 00:09:10.490 --> 00:09:18.940 volumes obviously but still a little bit too much. So we had to redesign this and 00:09:18.940 --> 00:09:26.712 the first thing which we did is we still kept eight layers, because in our 00:09:26.712 --> 00:09:32.810 experience number of layers nowadays have only minimal impact on the cost of the 00:09:32.810 --> 00:09:42.450 device. So like six, eight layers - the price difference is not so big. But we did 00:09:42.450 --> 00:09:52.190 complete rerouting and only kept 2-Deep MicroVIAs and never use the buried VIAs. 00:09:52.190 --> 00:09:57.240 So this make it much easier and much faster for the fab to manufacture it and 00:09:57.240 --> 00:10:03.740 the price suddenly went five, six times down and in volume again it will be 00:10:03.740 --> 00:10:18.140 significantly cheaper. And that's just for geek porn how PCB looks inside. So now 00:10:18.140 --> 00:10:25.140 let's go into real stuff. So PCI Express: why did we choose PCI Express? As it was 00:10:25.140 --> 00:10:33.310 said USB is a pain in the ass. You can't really use USB in industrial systems. For 00:10:33.310 --> 00:10:40.510 a whole variety of reasons just unstable. 
So we did use Ethernet for many years 00:10:40.510 --> 00:10:47.190 successfully but Ethernet has one problem: first of all inexpensive Ethernet is only 00:10:47.190 --> 00:10:51.780 one gigabit and one gigabit does not offer you enough bandwidth to carry all the data 00:10:51.780 --> 00:10:59.720 we want, plus its power-hungry etc. etc. So PCI Express is really a good choice 00:10:59.720 --> 00:11:06.420 because it's low power, it has low latency, it has very high bandwidth and 00:11:06.420 --> 00:11:11.380 it's available almost universally. When we started looking into this we realize that 00:11:11.380 --> 00:11:17.320 even ARM boards, some of ARM boards have PCI Express, mini PCI Express slots, which 00:11:17.320 --> 00:11:26.560 was a big surprise for me for example. So the problems is that unlike USB you do 00:11:26.560 --> 00:11:36.540 need to write your own kernel driver for this and there's no way around. And it is 00:11:36.540 --> 00:11:41.110 really hard to write this driver universally so we are writing it obviously 00:11:41.110 --> 00:11:45.300 for Linux because they're working with embedded systems, but if we want to 00:11:45.300 --> 00:11:51.030 rewrite it for Windows or for macOS we'll have to do a lot of rewriting. So we focus 00:11:51.030 --> 00:11:57.250 on what we want on Linux only right now. And now the hardest part: debugging is 00:11:57.250 --> 00:12:02.580 really non-trivial. One small error and your PC is completely hanged because you 00:12:02.580 --> 00:12:08.750 use something wrong. And you have to reboot it and restart it. That's like 00:12:08.750 --> 00:12:15.500 debugging kernel but sometimes even harder. To make it worse there is no 00:12:15.500 --> 00:12:19.400 really easy-to-use plug-and-play interface. If you want to restart; 00:12:19.400 --> 00:12:24.250 normally, when you when you develop a PCI Express card, when you want when you want 00:12:24.250 --> 00:12:31.050 to restart it you have to restart your development machine. 
Again not a nice way, 00:12:31.050 --> 00:12:39.420 it's really hard. So the first thing we did is we found that we can use 00:12:39.420 --> 00:12:47.100 Thunderbolt 3, which was just recently released, and it has the ability to work 00:12:47.100 --> 00:12:57.200 directly with the PCI Express bus. So it basically has a mode in which it converts 00:12:57.200 --> 00:13:01.410 PCI Express into a plug-and-play interface. So if you have a laptop which 00:13:01.410 --> 00:13:09.450 supports Thunderbolt 3 then you can use this to plug or 00:13:09.450 --> 00:13:16.480 unplug your device to make your development easier. There are always 00:13:16.480 --> 00:13:23.620 problems: there's no easy way, there's no documentation. Thunderbolt is not 00:13:23.620 --> 00:13:27.380 compatible with Thunderbolt: Thunderbolt 3 is not compatible with Thunderbolt 2. 00:13:27.380 --> 00:13:33.760 So we had to buy a special laptop with Thunderbolt 3, with special cables, 00:13:33.760 --> 00:13:40.120 all this hard stuff. And if you really want to get documentation you have 00:13:40.120 --> 00:13:47.500 to sign an NDA and send a business plan to them so they can approve that your 00:13:47.500 --> 00:13:50.670 business makes sense. laughter 00:13:50.670 --> 00:13:58.640 I mean... laughs So we actually opted out. We decided not to go through this. What 00:13:58.640 --> 00:14:05.340 we did is we found that someone is actually making PCI Express to Thunderbolt 00:14:05.340 --> 00:14:10.550 3 converters and selling them as dev boards, and that was a big relief because 00:14:10.550 --> 00:14:16.740 it saved us lots of time, lots of money. You just order it from some 00:14:16.740 --> 00:14:24.920 Asian company. And yeah, this is how this converter looks. So you buy it, 00:14:24.920 --> 00:14:29.970 like several pieces, you can plug in your PCI Express card there and you plug this 00:14:29.970 --> 00:14:38.330 into your laptop. 
And this is it with the XTRX already plugged into it. Now the only 00:14:38.330 --> 00:14:50.160 problem we found is that typically UEFI has a security control enabled, so that 00:14:50.160 --> 00:14:56.700 any random Thunderbolt device can't hijack your PCI bus and can't get access to your 00:14:56.700 --> 00:15:01.740 kernel memory and do some bad stuff. Which is a good idea - the only problem is that 00:15:01.740 --> 00:15:06.730 it's not fully implemented in Linux. So under Windows if you plug in a 00:15:06.730 --> 00:15:11.690 device which has no security features, which is not certified, it will 00:15:11.690 --> 00:15:16.510 politely ask you like: "Do you really trust this device? Do you want to use it?" 00:15:16.510 --> 00:15:21.940 You can say "yes". Under Linux it just does not work. laughs So we spent some 00:15:21.940 --> 00:15:25.730 time trying to figure out how to get around this. There are some patches from 00:15:25.730 --> 00:15:30.370 Intel which are not mainline and we were not able to actually get them to work. So we 00:15:30.370 --> 00:15:38.980 just had to disable all this security measure in the laptop. So be aware that 00:15:38.980 --> 00:15:46.610 this is the case and we suspect that happy users of Apple might not be able to do 00:15:46.610 --> 00:15:53.630 this, because Apple doesn't have a BIOS, so you probably can't disable this feature. So 00:15:53.630 --> 00:16:01.820 probably a good incentive for someone to actually finish writing the driver. 00:16:01.820 --> 00:16:08.130 So now to the goal: we want to achieve 160 mega samples per 00:16:08.130 --> 00:16:13.550 second, 2x2 MIMO, which means two transmit and two receive 00:16:13.550 --> 00:16:24.040 channels at 12 bits, which is roughly 7.5 Gbit/s. 
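As a rough check of the data-rate figure quoted above, here is a minimal sketch of the arithmetic. It assumes that every sample is a complex I/Q pair with 12 bits per component, which the talk does not spell out at this point, and it ignores all bus and protocol overhead. The function name `raw_bitrate` is purely illustrative.

```python
def raw_bitrate(sample_rate_hz, channels, bits_per_component, components_per_sample=2):
    """Raw payload bit rate in bits per second, with no bus overhead.

    components_per_sample=2 is an assumption: one I and one Q value per sample.
    """
    return sample_rate_hz * channels * bits_per_component * components_per_sample


# 160 MSPS, two channels, 12-bit I and 12-bit Q per sample
rate = raw_bitrate(160_000_000, channels=2, bits_per_component=12)
print(f"{rate / 1e9:.2f} Gbit/s")
```

Under these assumptions the raw rate comes out at about 7.7 Gbit/s, in line with the rough figure mentioned in the talk.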
So first result: when 00:16:24.040 --> 00:16:26.230 we got this board from the fab, it didn't work 00:16:26.230 --> 00:16:30.430 Sergey Kostanbaev mumbles: as expected Alexander Chemeris: yes, as expected. So the 00:16:30.430 --> 00:16:39.750 first interesting thing we realized is that: first of all the FPGA has hardware 00:16:39.750 --> 00:16:47.210 blocks for talking to PCI Express, called GTP, which basically implement 00:16:47.210 --> 00:16:56.850 like a PCI Express serial physical layer, but the thing is the numbering is reversed 00:16:56.850 --> 00:17:04.319 in PCI Express and in the FPGA, and we did not realize this, so we had to do very, very 00:17:04.319 --> 00:17:10.619 fine soldering to actually swap the laughs swap the lanes. You can see this 00:17:10.619 --> 00:17:18.490 very fine work there. We also found that one of the components 00:17:18.490 --> 00:17:28.870 was dead-bugged, which is a well-known term for chips which at the design stage are placed 00:17:28.870 --> 00:17:35.960 mirrored. So we accidentally mirrored the pinout, so we had to 00:17:35.960 --> 00:17:41.880 solder it upside down, and if you can realize how small it is you can also 00:17:41.880 --> 00:17:49.419 appreciate the work done. And what's funny, when I was looking at dead bugs I actually 00:17:49.419 --> 00:17:56.929 found a manual from NASA which describes how to properly solder dead bugs to get 00:17:56.929 --> 00:18:00.679 it approved. audience laughs 00:18:00.679 --> 00:18:08.230 So this is the link, I think you can go there and enjoy, there's also fun stuff there. 00:18:08.230 --> 00:18:17.379 So after fixing all of this, on our next attempt this kind of works. So the next stage 00:18:17.379 --> 00:18:23.340 is debugging the FPGA code, which has to talk to PCI Express, and PCI Express has to 00:18:23.340 --> 00:18:28.320 talk to the Linux kernel, and the kernel has to talk to the driver, the driver has to talk to 00:18:28.320 --> 00:18:37.749 the user space. 
So peripherals are easy, so the UART and SPI we got to work almost 00:18:37.749 --> 00:18:44.799 immediately, no problems with that, but DMA was a real beast. So we spent a lot of 00:18:44.799 --> 00:18:52.660 time trying to get DMA to work and the problem is that with DMA it's on the FPGA, so 00:18:52.660 --> 00:18:59.730 you can't just place a breakpoint like you do in C or C++ or in other languages. It's 00:18:59.730 --> 00:19:07.480 a real-time system, like it's real-time hardware, which is running 00:19:07.480 --> 00:19:16.351 on the fabric, so we, Sergey was mainly developing this, had to write a lot 00:19:16.351 --> 00:19:22.779 of small test benches and test everything piece by piece. 00:19:22.779 --> 00:19:31.480 So every part of the DMA code we had was wrapped into a small test bench which was 00:19:31.480 --> 00:19:39.720 emulating all the tricks and, as classics predicted, it took about five to 00:19:39.720 --> 00:19:47.679 ten times more than actually writing the code. So we really blew up our 00:19:47.679 --> 00:19:54.529 predicted timelines by doing this, but in the end we've got really stable work. 00:19:54.529 --> 00:20:03.760 So some suggestions for anyone who will try to repeat this exercise: there is a 00:20:03.760 --> 00:20:09.590 logic analyzer built into Xilinx and you can use it, it's nice, sometimes it's 00:20:09.590 --> 00:20:15.960 very helpful, but you can't debug transient bugs, which are coming out 00:20:15.960 --> 00:20:22.990 when some weird conditions are coming up. So you have to implement some read back 00:20:22.990 --> 00:20:28.809 registers which show important statistics, like important data about how your system 00:20:28.809 --> 00:20:35.340 behaves, in our case it's various counters on the DMA interface. So you can actually 00:20:35.340 --> 00:20:40.950 kind of see what's happening with your data: Is it received? Is it 00:20:40.950 --> 00:20:46.269 sent? 
How much is sent and how much is received? So like for example, we can see 00:20:46.269 --> 00:20:53.559 when we saturate the bus or when there is actually an underrun, so the host is not providing 00:20:53.559 --> 00:20:57.389 data fast enough, so we can at least understand whether it's a host problem or 00:20:57.389 --> 00:21:01.769 whether it's an FPGA problem, and which part we debug next, because again: 00:21:01.769 --> 00:21:07.770 it's a very multi layer problem, you start with FPGA, PCI Express, kernel, driver, 00:21:07.770 --> 00:21:15.340 user space, and any part can fail. So you can't work blind like this. So again the 00:21:15.340 --> 00:21:23.179 goal was to get 160 MSPS; with the first implementation we could do 2 MSPS: roughly 80 00:21:23.179 --> 00:21:30.220 times slower. The problem is that software just wasn't 00:21:30.220 --> 00:21:36.149 keeping up and wasn't sending data fast enough. So there were many things done, 00:21:36.149 --> 00:21:41.390 but the most important parts are: use real- time priority if you want to get very 00:21:41.390 --> 00:21:46.940 stable results and, well, fix software bugs. And one of the most important bugs we had 00:21:46.940 --> 00:21:54.240 was that DMA buffers were not freed immediately, in proper time, so they were busy 00:21:54.240 --> 00:21:59.429 for longer than they should be, which introduced extra cycles and basically just 00:21:59.429 --> 00:22:06.009 reduced the bandwidth. At this point let's talk a little bit 00:22:06.009 --> 00:22:14.389 about how to implement a high-performance driver for Linux, because if you want to 00:22:14.389 --> 00:22:20.870 get real performance you have to start with the right design. There are 00:22:20.870 --> 00:22:26.610 basically three approaches and the whole spectrum in between; like two approaches 00:22:26.610 --> 00:22:33.649 and the whole spectrum in between, which is where you can refer to three. 
The first 00:22:33.649 --> 00:22:41.529 approach is full kernel control, in which case the kernel driver not only does the 00:22:41.529 --> 00:22:45.701 transfer, it actually has all the logic of controlling your device and 00:22:45.701 --> 00:22:52.490 exports ioctls to the user space, and that's kind of the traditional way of 00:22:52.490 --> 00:22:57.669 writing drivers. Your user space is completely abstracted from all the 00:22:57.669 --> 00:23:07.029 details. The problem is that this is probably the slowest way to do it. The 00:23:07.029 --> 00:23:14.340 other way is what's called the "zero-copy interface": only control is held in 00:23:14.340 --> 00:23:21.380 the kernel and the raw data is provided to user space "as-is". So 00:23:21.380 --> 00:23:27.919 you avoid the memory copy, which makes it faster. But still not fast enough if you 00:23:27.919 --> 00:23:34.279 really want to achieve maximum performance, because you still have 00:23:34.279 --> 00:23:40.980 context switches between the kernel and the user space. The most... the fastest 00:23:40.980 --> 00:23:47.289 approach possible is to have a full user space implementation, when the kernel just 00:23:47.289 --> 00:23:53.059 exposes everything and says "now you do it yourself" and you have no 00:23:53.059 --> 00:24:02.429 context switches, like almost no, and you can really optimize everything. So what 00:24:02.429 --> 00:24:08.850 is... what are the problems with this? The pros I already mentioned: no 00:24:08.850 --> 00:24:13.539 switches between kernel and user space, it's very low latency because of this as 00:24:13.539 --> 00:24:20.980 well, it's very high bandwidth. 
But if you are not interested in getting the very 00:24:20.980 --> 00:24:27.940 high performance, the most performance, and you just want to have like some little, 00:24:27.940 --> 00:24:33.299 like say low bandwidth performance, then you will have to add hacks, because you 00:24:33.299 --> 00:24:36.710 can't get notifications from the kernel that a resource is available, that more data is 00:24:36.710 --> 00:24:45.570 available. It also makes it vulnerable, because if user space can 00:24:45.570 --> 00:24:55.310 access it, then it can do whatever it wants. We at the end decided that... one 00:24:55.310 --> 00:25:02.590 more important thing: how to actually get the best performance out of the 00:25:02.590 --> 00:25:10.299 bus. This is a very (?)(?) set as we want to poll your device or not to poll and get 00:25:10.299 --> 00:25:14.259 notified. What is polling? I guess everyone as a programmer understands it, so 00:25:14.259 --> 00:25:18.019 polling is when you ask repeatedly: "Are you ready?", "Are you ready?", "Are you 00:25:18.019 --> 00:25:20.369 ready?" and when it's ready you get the data immediately. 00:25:20.369 --> 00:25:25.259 It's basically a busy loop where you are just constantly asking the device what's 00:25:25.259 --> 00:25:33.350 happening. You need to dedicate a full core, and thank God we have multi-core 00:25:33.350 --> 00:25:39.519 CPUs nowadays, so you can dedicate the full core to this polling and you can just 00:25:39.519 --> 00:25:45.539 poll constantly. But again if you don't need this highest performance, you just 00:25:45.539 --> 00:25:53.190 need to get something, then you will be wasting a lot of CPU resources. 
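The trade-off between polling and notification described above can be sketched in a few lines. This is only an illustration, not the XTRX driver: a pipe stands in for the device, a background thread plays the hardware, and the names `busy_poll`, `wait_for_notification`, `fake_device` and `demo` are all hypothetical.

```python
import os
import select
import threading
import time


def busy_poll(fd):
    """Spin until data is ready: lowest latency, but keeps one core busy."""
    spins = 0
    while True:
        # Timeout 0 means "just check and return at once" (the "Are you ready?" question).
        ready, _, _ = select.select([fd], [], [], 0)
        if ready:
            return os.read(fd, 4096), spins
        spins += 1


def wait_for_notification(fd, timeout_s=1.0):
    """Sleep in the kernel until data is ready: frees the CPU, costs a wake-up."""
    ready, _, _ = select.select([fd], [], [], timeout_s)
    return os.read(fd, 4096) if ready else None


def fake_device(write_fd, delay_s, payload):
    """Stand-in for the hardware: delivers one buffer after a delay."""
    time.sleep(delay_s)
    os.write(write_fd, payload)


def demo(reader):
    """Run one reader strategy against the fake device and return its result."""
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(target=fake_device, args=(write_fd, 0.05, b"IQ"))
    worker.start()
    try:
        return reader(read_fd)
    finally:
        worker.join()
        os.close(read_fd)
        os.close(write_fd)


data, spins = demo(busy_poll)
print("busy poll got", data, "after", spins, "empty checks")
print("notification got", demo(wait_for_notification))
```

Both strategies return the same data. The difference is that the busy loop performs many empty checks while waiting, whereas the blocking call lets the scheduler run something else until the data arrives.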
At the end 00:25:53.190 --> 00:26:00.429 we decided to do a combined architecture: it is possible to poll, but 00:26:00.429 --> 00:26:05.500 there's also a way to get notifications from the kernel for 00:26:05.500 --> 00:26:11.049 applications which need low bandwidth, but also require better 00:26:11.049 --> 00:26:17.480 CPU performance. Which I think is the best way if you are trying to target both 00:26:17.480 --> 00:26:30.850 worlds. Very quickly: the architecture of the system. We try to make it very, very 00:26:30.850 --> 00:26:50.730 portable and flexible. There is a kernel driver, which talks to a low-level 00:26:50.730 --> 00:26:55.690 library which implements all this logic, which we took out of the driver: to 00:26:55.690 --> 00:27:01.309 control the PCI Express, to work with DMA, to provide 00:27:01.309 --> 00:27:09.360 all the... to hide all the details of the actual bus implementation. 00:27:09.360 --> 00:27:17.169 And then there is a high-level library which talks to this low-level library and 00:27:17.169 --> 00:27:22.179 also to libraries which implement control of actual peripherals, and most 00:27:22.179 --> 00:27:28.919 importantly to the library which implements control over our RFIC chip. 00:27:28.919 --> 00:27:35.119 This way it's very modular, we can replace PCI Express with something else later, we 00:27:35.119 --> 00:27:46.049 might be able to port it to other operating systems, and that's the goal. 00:27:46.049 --> 00:27:50.059 Another interesting issue is: when you start writing the Linux kernel driver you 00:27:50.059 --> 00:27:57.119 very quickly realize that while LDD, which is a classic book for Linux driver 00:27:57.119 --> 00:28:02.220 writing, is good and it will give you a good insight, it's not actually up-to- 00:28:02.220 --> 00:28:08.609 date. 
It's more than ten years old and there are a lot of new interfaces which are 00:28:08.609 --> 00:28:14.809 not described there, so you have to resort to reading the manuals and all the 00:28:14.809 --> 00:28:20.409 documentation in the kernel itself. Well, at least you get the up-to-date 00:28:20.409 --> 00:28:31.989 information. The decision we made is to make everything easy. We use TTY for GPS 00:28:31.989 --> 00:28:38.090 and so you can really attach pretty much any application which talks to GPS. So all 00:28:38.090 --> 00:28:45.970 of the existing applications can just work out of the box. And we also wanted to be able 00:28:45.970 --> 00:28:54.879 to synchronize the system clock to GPS, so we get automatic clock synchronization across 00:28:54.879 --> 00:28:59.009 multiple systems, which is very important when we are deploying many, many devices 00:28:59.009 --> 00:29:07.090 around the world. We plan to do two interfaces, one is KPPS 00:29:07.090 --> 00:29:15.919 and another is DCD, the DCD line on the UART exposed over TTY. Because 00:29:15.919 --> 00:29:20.259 again we found that there are two types of applications: ones that support one API, 00:29:20.259 --> 00:29:25.539 others that support the other API, and there is no common thing, so we have to support 00:29:25.539 --> 00:29:38.649 both. As we described, we want to have poll so we can get notifications from the 00:29:38.649 --> 00:29:48.130 kernel when data is available and we don't need to do real busy looping all the time. 00:29:48.130 --> 00:29:55.789 After all the software optimizations we've got to like 10 MSPS: still very, very far 00:29:55.789 --> 00:30:02.369 from what we want to achieve. Now there should have been a lot of 00:30:02.369 --> 00:30:06.570 explanations about PCI Express, but when we actually wrote everything we wanted to 00:30:06.570 --> 00:30:13.999 say we realized it's just like a full two-hour talk just on PCI Express. 
So we are 00:30:13.999 --> 00:30:17.760 not going to give it here, I'll just give some highlights which are most 00:30:17.760 --> 00:30:23.889 interesting. If there is real interest, we can set up a workshop on 00:30:23.889 --> 00:30:32.340 one of the later days and talk in more detail about PCI Express specifically. 00:30:32.340 --> 00:30:38.549 The thing is there are no open source cores for PCI Express which are optimized for 00:30:38.549 --> 00:30:48.010 high performance, real time applications. There is Xillybus which as I understand is 00:30:48.010 --> 00:30:53.350 going to be open source, but they provide you a source if you pay them. It's very 00:30:53.350 --> 00:30:59.610 popular because it's very, very easy to do, but it's not giving you performance. If I 00:30:59.610 --> 00:31:04.980 remember correctly the best it can do is maybe like 50 percent bus saturation. 00:31:04.980 --> 00:31:10.800 So there's also the Xilinx implementation, but if you are using the Xilinx implementation 00:31:10.800 --> 00:31:21.049 with AXI bus then you're really locked in with AXI bus, with Xilinx. And it's also not 00:31:21.049 --> 00:31:25.001 very efficient in terms of resources and if you remember we want to make this very, 00:31:25.001 --> 00:31:30.029 very inexpensive. So our goal is to be able to fit everything in the 00:31:30.029 --> 00:31:38.499 smallest Artix-7 FPGA, and that's quite challenging with all the stuff in there 00:31:38.499 --> 00:31:47.649 and we just can't waste resources. So the decision was to write our own PCI Express 00:31:47.649 --> 00:31:53.039 implementation. That's how it looks. I'm not going to discuss it right now. 00:31:53.039 --> 00:31:59.950 There were several iterations. Initially it looked much simpler, turned out not to 00:31:59.950 --> 00:32:06.100 work well. 
So some interesting stuff about PCI 00:32:06.100 --> 00:32:12.749 Express which we stumbled upon is that it was working really well on Atom, which is 00:32:12.749 --> 00:32:17.460 our main development platform because we are doing a lot of embedded stuff. Worked 00:32:17.460 --> 00:32:26.479 really well. When we tried to plug this into a Core i7 it just started hanging once in a 00:32:26.479 --> 00:32:35.090 while. So after like several, not days maybe, of debugging, Sergey found a 00:32:35.090 --> 00:32:39.330 very interesting statement in the standard which says that a value of zero in the byte 00:32:39.330 --> 00:32:45.869 count actually stands not for zero bytes but for 4096 bytes. 00:32:45.869 --> 00:32:58.739 I mean that's a really cool optimization. So another thing is completion, which is a 00:32:58.739 --> 00:33:03.639 term in PCI Express basically for acknowledgment, which also can carry some 00:33:03.639 --> 00:33:12.429 data back to your request. And sometimes if you're not sending a completion, the device 00:33:12.429 --> 00:33:20.740 just hangs. And what happens is that in this case, due to some historical heritage 00:33:20.740 --> 00:33:29.549 of x86, it just starts returning you FFF. And if you have a register which says: „Is 00:33:29.549 --> 00:33:35.470 your device okay?“ and this register shows one to say „The device is okay“, guess 00:33:35.470 --> 00:33:38.500 what will happen? You will be always reading that your 00:33:38.500 --> 00:33:46.590 device is okay. So the suggestion is not to use one as the status for okay and use 00:33:46.590 --> 00:33:52.790 either zero or better like a two-bit sequence. So you are definitely sure that 00:33:52.790 --> 00:34:03.659 you are okay and not getting FFF's. So when you have a device which again may 00:34:03.659 --> 00:34:10.440 fail at any of the layers, you just got this new board, it's really hard, it's 00:34:10.440 --> 00:34:17.639 really hard to debug because of memory corruption. 
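The two pitfalls just mentioned, the byte count of zero and the all-ones status read, can be illustrated with a small sketch. The 12-bit width of the byte count field follows the PCI Express completion header; everything else here is an assumption made for illustration: the register layout, the two-bit "OK" pattern `STATUS_MAGIC`, and the helper names are hypothetical and not taken from the XTRX design.

```python
ALL_ONES_READ = 0xFFFFFFFF  # what the host sees when a read gets no completion
STATUS_MAGIC = 0b10         # hypothetical two-bit "device OK" pattern


def decode_byte_count(field):
    """Decode a 12-bit byte count field, where the value 0 encodes 4096 bytes."""
    field &= 0xFFF
    return 4096 if field == 0 else field


def naive_device_ok(status_reg):
    """Fragile check: a single bit set to 1 means OK."""
    return bool(status_reg & 0b1)


def robust_device_ok(status_reg):
    """Safer check: the low two bits must match a pattern that is not all ones."""
    if status_reg == ALL_ONES_READ:
        return False
    return (status_reg & 0b11) == STATUS_MAGIC


print(decode_byte_count(0))              # 4096
print(decode_byte_count(64))             # 64
print(naive_device_ok(ALL_ONES_READ))    # True, although the device is hung
print(robust_device_ok(ALL_ONES_READ))   # False
print(robust_device_ok(0b10))            # True
```

The point of the two-bit pattern is that a hung device, which reads back as all ones, can never be mistaken for a healthy one.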
So we had a software bug and 00:34:17.639 --> 00:34:25.099 it was writing DMA addresses incorrectly and we were wondering why we 00:34:25.099 --> 00:34:32.179 are not getting any data in our buffers. At the same time, after several starts, the 00:34:32.179 --> 00:34:41.159 operating system just crashes. Well, that's the reason why there is this UEFI 00:34:41.159 --> 00:34:47.199 protection which prevents you from plugging in devices like this into your 00:34:47.199 --> 00:34:52.270 computer. Because it was basically writing data, like random data into random 00:34:52.270 --> 00:35:00.299 portions of your memory. So a lot of debugging, a lot of tests and test benches 00:35:00.299 --> 00:35:10.589 and we were able to find this. And another thing is if you deinitialize your driver 00:35:10.589 --> 00:35:15.250 incorrectly, and that's what's happening when you have a plug-and-play device, which 00:35:15.250 --> 00:35:22.119 you can plug and unplug, then you may end up in a situation where you are 00:35:22.119 --> 00:35:28.039 trying to write into memory which is already freed by the operating system and 00:35:28.039 --> 00:35:35.960 used for something else. Very well-known problem but it also happens here. So 00:35:35.960 --> 00:35:50.549 why DMA is really hard is because it has this completion architecture for 00:35:50.549 --> 00:35:56.440 writing, for ... sorry ... for reading data. Writes are easy. You just send the 00:35:56.440 --> 00:36:00.460 data, you forget about it. It's a fire- and-forget system. But for reading you 00:36:00.460 --> 00:36:10.420 really need to get your data back. And the thing is, it looks like this. You really 00:36:10.420 --> 00:36:16.020 hope that there would be some pointing device here. But basically on the top left 00:36:16.020 --> 00:36:24.240 you can see requests for read and on the right you can see completion transactions. 
00:36:24.240 --> 00:36:29.890 So basically each transaction can be, and most likely will be, split into multiple 00:36:29.890 --> 00:36:38.900 transactions. So first of all you have to collect all these pieces and write 00:36:38.900 --> 00:36:46.210 them into the proper parts of the memory. But that's not all. The thing is, the 00:36:46.210 --> 00:36:53.369 latency between request and completion is really high. It's like 50 cycles. So if 00:36:53.369 --> 00:36:58.990 you have only a single transaction in flight, you will get really bad 00:36:58.990 --> 00:37:03.900 performance. You do need to have multiple transactions in flight. And the worst 00:37:03.900 --> 00:37:13.170 thing is that transactions can return data in random order. So it's a much more 00:37:13.170 --> 00:37:19.820 complicated state machine than we expected originally. So when I said, you know, the 00:37:19.820 --> 00:37:25.589 architecture was much simpler originally, we didn't have all of this and we had to 00:37:25.589 --> 00:37:31.670 realize this while implementing. So again, here was a whole description of how 00:37:31.670 --> 00:37:41.200 exactly this works. But not this time. So now, after all these optimizations, we've 00:37:41.200 --> 00:37:48.859 got 20 mega samples per second, which is just six times lower than what we are 00:37:48.859 --> 00:37:59.599 aiming at. So now the next thing is PCI Express lane scalability. So PCI Express 00:37:59.599 --> 00:38:07.220 is a serial bus. So it has multiple lanes and they allow you to basically 00:38:07.220 --> 00:38:14.350 horizontally scale your bandwidth. One lane is like x, then two lanes is 2x, four 00:38:14.350 --> 00:38:20.160 lanes is 4x. So the more lanes you have, the more performance you are getting out of 00:38:20.160 --> 00:38:23.970 your bus. So the more bandwidth you're getting out of your bus. 00:38:23.970 --> 00:38:31.700 Not performance. 
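The read path described above (several requests in flight, each answered by multiple completions that interleave with completions for other requests) can be sketched in Python as a small bookkeeping structure. It is a software model for illustration, not the FPGA state machine from the talk. The class and method names are hypothetical, and it assumes that the pieces belonging to one request arrive in address order while different requests may complete in any order.

```python
from dataclasses import dataclass


@dataclass
class ReadRequest:
    dest_offset: int   # where in the buffer this request's data belongs
    length: int        # total bytes requested
    received: int = 0  # bytes delivered by completions so far


class CompletionCollector:
    """Tracks outstanding read requests by tag and reassembles their data."""

    def __init__(self, buffer_size: int, max_tags: int = 32):
        self.buffer = bytearray(buffer_size)
        self.free_tags = list(range(max_tags))
        self.pending = {}  # tag -> ReadRequest

    def issue_read(self, dest_offset: int, length: int):
        """Reserve a tag for a new request; returns None if all tags are in flight."""
        if not self.free_tags:
            return None
        tag = self.free_tags.pop()
        self.pending[tag] = ReadRequest(dest_offset, length)
        return tag

    def on_completion(self, tag: int, data: bytes) -> bool:
        """Store one completion's payload; returns True when the request is finished."""
        req = self.pending[tag]
        start = req.dest_offset + req.received
        self.buffer[start:start + len(data)] = data
        req.received += len(data)
        if req.received == req.length:
            del self.pending[tag]
            self.free_tags.append(tag)  # tag can be reused for the next request
            return True
        return False
```

Keeping several tags outstanding is what hides the request-to-completion latency; the per-tag offset is what lets interleaved completions land in the right part of the buffer.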
So the issue is that the mini 00:38:31.700 --> 00:38:38.600 PCI Express standard only standardized one lane. And the second lane is left as optional. 00:38:38.600 --> 00:38:46.099 So most motherboards don't support this. There are some, but not all of them. And we 00:38:46.099 --> 00:38:52.370 really wanted to get this done. So we designed a special converter board which 00:38:52.370 --> 00:38:57.530 allows you to plug your mini PCI Express card into a full-size PCI Express slot and 00:38:57.530 --> 00:39:06.790 get two lanes working. And we're also planning to have a similar board which 00:39:06.790 --> 00:39:12.660 will have multiple slots, so you will be able to get multiple XTRX SDRs onto the 00:39:12.660 --> 00:39:21.270 same carrier board and plug this into, let's say, PCI Express 16x, and 00:39:21.270 --> 00:39:29.059 you will get really a lot of ... SDR ... a lot of IQ data, which then will be 00:39:29.059 --> 00:39:38.760 your problem how to process. So with two lanes it's about twice the performance, 00:39:38.760 --> 00:39:48.930 so we are getting fifty mega samples per second. And that's the time to really cut 00:39:48.930 --> 00:39:59.230 the fat, because the real sample size of the LMS7 is 12 bits and we are transmitting 16 00:39:59.230 --> 00:40:06.930 because it's easier. Because the CPU is working on 8, 16, 32. So we originally 00:40:06.930 --> 00:40:13.770 designed the driver to support 8 bit, 12 bit and 16 bit to be able to do this 00:40:13.770 --> 00:40:23.800 scaling. And for the test we said, okay, let's go from 16 to 8 bit. We'll lose 00:40:23.800 --> 00:40:32.960 some dynamic range, but who cares these days. It still stayed the same, it's still 50 00:40:32.960 --> 00:40:41.980 mega samples per second, no matter what we did. And that was a lot of interesting 00:40:41.980 --> 00:40:49.580 debugging going on. And we realized that we actually made another, not really a 00:40:49.580 --> 00:40:58.720 mistake. 
We didn't really know this when we designed it. But we should have 00:40:58.720 --> 00:41:04.450 used a higher voltage for this high-speed bus to get it to the full performance. And 00:41:04.450 --> 00:41:12.619 at 1.8 it was just degrading too fast and the bus itself was not performing well. So 00:41:12.619 --> 00:41:21.859 our next prototype will be using a higher voltage specifically for this bus. And 00:41:21.859 --> 00:41:26.559 this is the kind of stuff which makes designing hardware for high speed really 00:41:26.559 --> 00:41:32.210 hard, because you have to care about coherence of the parallel buses on your 00:41:32.210 --> 00:41:38.550 system. So at the same time we do want to keep 1.8 volts for everything else 00:41:38.550 --> 00:41:43.480 as much as possible. Because another problem we are facing with this device is 00:41:43.480 --> 00:41:47.069 that by the standard, mini PCI Express allows only like ... 00:41:47.069 --> 00:41:51.220 Sergey Kostanbaev: ... 2.5 ... Alexander Chemeris: ... 2.5 watts of power 00:41:51.220 --> 00:41:58.369 consumption, no more. And we were very lucky that the LMS7 has such 00:41:58.369 --> 00:42:04.460 good power consumption performance. We actually had some extra 00:42:04.460 --> 00:42:10.049 space to have the FPGA and GPS and all this stuff. But we just can't let the power 00:42:10.049 --> 00:42:14.880 consumption go up. Our measurements on this device showed about ... 00:42:14.880 --> 00:42:18.510 Sergey Kostanbaev: ... 2.3 ... Alexander Chemeris: ... 2.3 watts of power 00:42:18.510 --> 00:42:27.220 consumption. So we are at the limit at this point. So when we fix the bus with 00:42:27.220 --> 00:42:31.420 the higher voltage, you know, it's a theoretical exercise because we haven't 00:42:31.420 --> 00:42:38.000 done this yet, that's planned to happen in a couple of months. We should be able to get 00:42:38.000 --> 00:42:47.330 to these numbers, which is just 1.2 times slower. 
Then the next thing will be to fix 00:42:47.330 --> 00:42:55.550 another mistake which we made at the very beginning: we procured the wrong chip. 00:42:55.550 --> 00:43:05.270 Just one digit difference, you can see it's highlighted in red and green, and 00:43:05.270 --> 00:43:13.230 this chip supports only generation 1 PCI Express, which is half as fast as 00:43:13.230 --> 00:43:18.190 generation 2 PCI Express. So again, hopefully we'll replace the chip 00:43:18.190 --> 00:43:30.140 and just get a very simple doubling of the performance. Still it will be slower than 00:43:30.140 --> 00:43:39.770 we wanted it to be, and here is where practical versus theoretical numbers come in. 00:43:39.770 --> 00:43:47.119 Well, as every bus, it has overheads, and one of the things which again we 00:43:47.119 --> 00:43:51.279 realized when we were implementing this is that even though the standard 00:43:51.279 --> 00:43:58.910 specifies a maximum payload size of 4 kB, actual implementations are different. For 00:43:58.910 --> 00:44:08.390 example, desktop computers like Intel Core or Intel Atom only have a 128-byte 00:44:08.390 --> 00:44:18.740 payload. So there is much more overhead going on the bus to transfer data, and even 00:44:18.740 --> 00:44:29.180 theoretically you can only achieve 87% efficiency. And on Xeon we tested and we 00:44:29.180 --> 00:44:37.110 found that they're using a 256-byte payload size, and this can give you like 92% 00:44:37.110 --> 00:44:45.130 efficiency on the bus, and this is before the overhead, so the reality is even 00:44:45.130 --> 00:44:53.180 worse. An interesting thing which we also did not expect is that we originally were 00:44:53.180 --> 00:45:02.849 developing on Intel Atom and everything was working great. When we plugged this into 00:45:02.849 --> 00:45:10.720 a laptop, like a Core i7, multi-core, really powerful device, we didn't expect that it 00:45:10.720 --> 00:45:20.140 wouldn't work. 
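The payload-size efficiency figures quoted above can be approximated with a short Python calculation. The 20 bytes of per-packet overhead is an assumption (2 bytes framing, 2 bytes sequence number, a 12-byte header and a 4-byte link CRC, as on generation 1 and 2 links); the talk does not state which overhead was used, and a 16-byte header would give 24 bytes instead. The function name is hypothetical.

```python
def tlp_efficiency(payload_bytes: int, overhead_bytes: int = 20) -> float:
    """Fraction of bytes on the link that are payload, for one data packet.

    overhead_bytes is an assumed per-packet cost; it ignores flow-control
    and acknowledgment traffic as well as line encoding.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    for payload in (128, 256, 4096):
        print(f"{payload:5d} B payload: {tlp_efficiency(payload):.1%}")
```

Under that assumption a 128-byte payload gives about 86.5% and a 256-byte payload about 92.8%, which is in the same range as the 87% and 92% mentioned in the talk.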
Obviously Core i7 should work better than Atom: no, not always. 00:45:20.140 --> 00:45:26.369 The thing is, we were plugging into a laptop which had a built-in video card 00:45:26.369 --> 00:45:44.750 which was sitting on the same PCI bus, and the manufacturer probably hard-coded a higher 00:45:44.750 --> 00:45:50.590 priority for the video card than for everything else in the system, because they 00:45:50.590 --> 00:45:56.300 don't want your screen to flicker. And so when you move a window you actually 00:45:56.300 --> 00:46:04.099 see the late packets coming to your PCI device. We had to introduce a jitter 00:46:04.099 --> 00:46:14.750 buffer and add more FIFO into the device to smooth it out. On the other hand, the 00:46:14.750 --> 00:46:20.099 Xeon is performing really well. So it's very optimized. That said, we have tested 00:46:20.099 --> 00:46:28.119 it with a discrete card and it outperforms everything by a whopping five to seven percent. 00:46:28.119 --> 00:46:38.799 That's what you get for the price. So this is actually the end of the presentation. 00:46:38.799 --> 00:46:43.839 We still have not scheduled any workshop, but if there is any interest in 00:46:43.839 --> 00:46:53.390 actually seeing the device working, or if you are interested in learning more about 00:46:53.390 --> 00:46:58.260 PCI Express in detail, let us know and we'll schedule something in the next few days. 00:46:58.260 --> 00:47:05.339 That's the end, I think we can proceed with questions if there are any. 00:47:05.339 --> 00:47:14.950 Applause Herald: Okay, thank you very much. If you 00:47:14.950 --> 00:47:17.680 are leaving now: please try to leave quietly because we might have some 00:47:17.680 --> 00:47:22.960 questions and you want to hear them. If you have questions please line up right 00:47:22.960 --> 00:47:28.819 behind the microphones and I think we'll just wait because we don't have anything 00:47:28.819 --> 00:47:34.990 from the signal angel. 
However, if you are watching on stream you can hop into the 00:47:34.990 --> 00:47:39.500 channels and over social media to ask questions and they will be answered, 00:47:39.500 --> 00:47:47.890 hopefully. So on that microphone. Question 1: What's the minimum and maximum 00:47:47.890 --> 00:47:52.170 frequency of the card? Alexander Chemeris: You mean RF 00:47:52.170 --> 00:47:55.940 frequency? Question 1: No, the minimum frequency you 00:47:55.940 --> 00:48:05.640 can sample at. Most SDR devices can only sample at over 50 MHz. Is there a 00:48:05.640 --> 00:48:09.190 similar limitation with your card? Alexander Chemeris: Yeah, so if you're 00:48:09.190 --> 00:48:15.650 talking about RF frequency, it can go from almost zero, even though that 00:48:15.650 --> 00:48:27.289 works worse below 50 MHz, all the way to 3.8 GHz if I remember correctly. And in 00:48:27.289 --> 00:48:34.880 terms of the sample rate, right now it works from about 2 MSPS to about 00:48:34.880 --> 00:48:40.089 50. But again, we're planning to get it to these numbers we quoted. 00:48:40.089 --> 00:48:45.720 Herald: Okay. The microphone over there. Question 2: Thanks for your talk. Did you 00:48:45.720 --> 00:48:48.630 manage to get your Linux kernel driver into the mainline? 00:48:48.630 --> 00:48:53.519 Alexander Chemeris: No, not yet. I mean, it's not even fully published. So I 00:48:53.519 --> 00:48:59.019 did not say this in the beginning, sorry for this. We only just manufactured the first 00:48:59.019 --> 00:49:03.830 prototype, which we debugged heavily. So we are only planning to manufacture the 00:49:03.830 --> 00:49:10.290 second prototype with all these fixes, and then we will release the kernel 00:49:10.290 --> 00:49:16.700 driver and everything. And maybe we'll try or maybe won't try, haven't decided yet. 00:49:16.700 --> 00:49:18.310 Question 2: Thanks Herald: Okay... 
00:49:18.310 --> 00:49:21.599 Alexander Chemeris: and that will be a whole other experience. 00:49:21.599 --> 00:49:26.099 Herald: Okay, over there. Question 3: Hey, looks like you went 00:49:26.099 --> 00:49:30.349 through some incredible amounts of pain to make this work. So, I was wondering, 00:49:30.349 --> 00:49:34.960 aren't there any simulators, at least for parts of the system, for the PCIe bus, for 00:49:34.960 --> 00:49:40.150 the DMA or something? Any simulator so that you can actually first design the system 00:49:40.150 --> 00:49:44.630 there and debug it more easily? Sergey Kostanbaev: Yes, there are 00:49:44.630 --> 00:49:50.400 simulators available, but the problem is they are all non-free. So you have to pay 00:49:50.400 --> 00:49:57.109 for them. So yeah, we chose the hard way. 00:49:57.109 --> 00:49:59.520 Question 3: Okay thanks. Herald: We have a question from the signal 00:49:59.520 --> 00:50:03.180 angel. Question 4: Yeah, are the FPGA code, Linux 00:50:03.180 --> 00:50:07.650 driver, library code, and the design project files public, and if so, did they 00:50:07.650 --> 00:50:13.480 post them yet? They can't find them on xtrx.io. 00:50:13.480 --> 00:50:17.970 Alexander Chemeris: Yeah, so they're not published yet. As I said, we haven't 00:50:17.970 --> 00:50:24.579 released them. So, the drivers and libraries will definitely be available. 00:50:24.579 --> 00:50:28.589 The FPGA code... We are considering this, it will probably also be available as open 00:50:28.589 --> 00:50:36.359 source. But we will publish them together with the public announcement of the 00:50:36.359 --> 00:50:42.220 device. Herald: Ok, that microphone. 00:50:42.220 --> 00:50:46.010 Question 5: Yes. Did you guys see any signal integrity issues on the PCI 00:50:46.010 --> 00:50:50.009 bus, or on this bus to the LMS chip, the Lime microchip, I think, that's doing 00:50:50.009 --> 00:50:51.009 the RF? AC: Right. 
00:50:51.009 --> 00:50:56.359 Question 5: Did you try to measure signal integrity issues, or... because there were 00:50:56.359 --> 00:51:01.130 some reliability issues, right? AC: Yeah, we actually... so, PCI. With PCI 00:51:01.130 --> 00:51:02.559 we never had issues, if I remember correctly. 00:51:02.559 --> 00:51:04.760 SK: No. AC: I just... it was just working. 00:51:04.760 --> 00:51:10.940 SK: Well, the board is so small, and when the traces are short there's no problem 00:51:10.940 --> 00:51:14.790 with signal integrity. So that actually saved us. 00:51:14.790 --> 00:51:20.599 AC: Yeah. Designing a small board is easier. Yeah, with the LMS7, the problem is not 00:51:20.599 --> 00:51:26.099 the signal integrity in terms of difference in the length of the traces, 00:51:26.099 --> 00:51:37.319 but rather the fact that the signal degrades over voltage, also over speed in 00:51:37.319 --> 00:51:44.010 terms of voltage, and drops below the detection level, and all this stuff. We 00:51:44.010 --> 00:51:47.220 did some measurements. I actually wanted to add some pictures here, but decided 00:51:47.220 --> 00:51:54.359 that's not going to be super interesting. H: Okay. Microphone over there. 00:51:54.359 --> 00:51:58.359 Question 6: Yes. Thanks for the talk. How much work would it be to convert the two 00:51:58.359 --> 00:52:05.610 by two SDR into an 8-input logic analyzer in terms of hard- and software? So that you 00:52:05.610 --> 00:52:12.289 have a really fast logic analyzer which you can record unlimited traces with? 00:52:12.289 --> 00:52:18.980 AC: A logic analyzer... Q6: So basically it's also just an analog-to- 00:52:18.980 --> 00:52:27.040 digital converter, and you largely want fast sampling and a large amount of memory 00:52:27.040 --> 00:52:30.900 to store the traces. AC: Well, I just think it's not the best 00:52:30.900 --> 00:52:40.300 use for it. It's probably... I don't know. 
Maybe Sergey has some ideas, but I think it 00:52:40.300 --> 00:52:47.549 just may be easier to get a high-speed ADC and replace the Lime chip with a high- 00:52:47.549 --> 00:52:56.720 speed ADC to get what you want, because the Lime chip has so many things there 00:52:56.720 --> 00:53:01.450 specifically for RF. SK: Yeah, the main problem is that you cannot just 00:53:01.450 --> 00:53:09.099 sample the original data. You have to shift it in frequency, so you cannot sample 00:53:09.099 --> 00:53:16.619 the original signal, and using it for something else except spectrum analysis 00:53:16.619 --> 00:53:20.839 is hard. Q6: OK. Thanks. 00:53:20.839 --> 00:53:25.750 H: OK. Another question from the internet. Signal angel: Yes. Have you compared the 00:53:25.750 --> 00:53:32.240 sample rate of the ADC of the Lime DA chip to the USRP ADCs, and if so, how does the 00:53:32.240 --> 00:53:40.160 lower sample rate affect the performance? AC: So, comparing low sample rate to 00:53:40.160 --> 00:53:49.281 higher sample rate. We haven't done much testing on the RF performance yet, because 00:53:49.281 --> 00:53:58.440 we were so busy with all this stuff, so we are yet to see in terms of low 00:53:58.440 --> 00:54:03.190 sample rate versus high sample rate. Well, a high sample rate always gives 00:54:03.190 --> 00:54:09.859 you better performance, but you also get higher power consumption. So, I guess it's 00:54:09.859 --> 00:54:14.019 a question of what's more important for you. 00:54:14.019 --> 00:54:20.440 H: Okay. Over there. Question 7: I've gathered there is no 00:54:20.440 --> 00:54:25.319 mixer bypass, so you can't directly sample the signal. Is there a way to use the same 00:54:25.319 --> 00:54:31.720 antenna for send and receive, yet? AC: Actually, there is... Input for ADC. 
00:54:31.720 --> 00:54:38.289 SK: But it's not a bypass, it's a dedicated pin on the LMS chip, and since we're 00:54:38.289 --> 00:54:45.569 very space-constrained, we didn't route it, so you cannot actually bypass it. 00:54:45.569 --> 00:54:50.359 AC: Okay, that's in our specific hardware. In general, in the LMS chip there is a 00:54:50.359 --> 00:54:58.170 special pin which allows you to drive your signal directly to the ADC without all the 00:54:58.170 --> 00:55:02.950 mixers, filters, all this radio stuff, just directly to the ADC. So, yes, 00:55:02.950 --> 00:55:06.869 theoretically that's possible. SK: We even thought about this, but it 00:55:06.869 --> 00:55:10.960 doesn't fit this design. Q7: Okay. And can I share antennas? 00:55:10.960 --> 00:55:15.700 Because I have an existing laptop with existing antennas, and I would use the 00:55:15.700 --> 00:55:22.140 same antenna to send and receive. AC: Yeah, so, I mean, that depends on 00:55:22.140 --> 00:55:25.619 what exactly you want to do. If you want a TDD system, then yes; if you 00:55:25.619 --> 00:55:30.869 want an FDD system, then you will have to put a small duplexer in there, but yeah, 00:55:30.869 --> 00:55:34.839 that's the idea. So you can plug this into your laptop and use your existing 00:55:34.839 --> 00:55:39.640 antennas. That's one of the ideas of how to use XTRX. 00:55:39.640 --> 00:55:41.799 Q7: Yeah, because there's all four connectors. 00:55:41.799 --> 00:55:45.400 AC: Yeah. One thing which I actually forgot to mention - I kind of mentioned it 00:55:45.400 --> 00:55:53.930 in the slides - is that any other SDRs which are based on Ethernet or on USB 00:55:53.930 --> 00:56:02.309 can't work with CSMA wireless systems, and the most famous CSMA system is Wi-Fi. 
00:56:02.309 --> 00:56:09.259 So, it turns out that because of the latency between your operating system and 00:56:09.259 --> 00:56:17.569 your radio on USB, you just can't react fast enough for Wi-Fi to work, because 00:56:17.569 --> 00:56:23.240 - you probably know that - in Wi-Fi you carrier sense, and if you sense that the 00:56:23.240 --> 00:56:29.579 spectrum is free, you start transmitting. That doesn't make sense when you have huge 00:56:29.579 --> 00:56:36.160 latency, because all you know is that the spectrum was free back then. So, 00:56:36.160 --> 00:56:43.730 with XTRX, you actually can work with CSMA systems like Wi-Fi, so again it makes it 00:56:43.730 --> 00:56:51.390 possible to have a fully software implementation of Wi-Fi in your laptop. It 00:56:51.390 --> 00:56:58.660 obviously won't work as well as your commercial Wi-Fi, because you will have to 00:56:58.660 --> 00:57:03.839 do a lot of processing on your CPU, but for some purposes like experimentation, 00:57:03.839 --> 00:57:07.980 for example for wireless labs and R&D labs, that's really valuable. 00:57:07.980 --> 00:57:11.400 Q7: Thanks. H: Okay. Over there. 00:57:11.400 --> 00:57:15.519 Q8: Okay. What PCB design package did you use? 00:57:15.519 --> 00:57:17.819 AC: Altium. SK: Altium, yeah. 00:57:17.819 --> 00:57:22.940 Q8: And I'd be interested in the PCIe workshop. Would be really great if you do 00:57:22.940 --> 00:57:24.940 this one. AC: Say this again? 00:57:24.940 --> 00:57:28.069 Q8: Would be really great if you do the PCI Express workshop. 00:57:28.069 --> 00:57:32.720 AC: Ah. PCI Express workshop. Okay. Thank you. 00:57:32.720 --> 00:57:36.690 H: Okay, I think we have one more question from the microphones, and that's you. 00:57:36.690 --> 00:57:42.880 Q9: Okay. Great talk. And again, I would appreciate a PCI Express workshop, if it 00:57:42.880 --> 00:57:47.190 ever happens. 
What are the synchronization options between multiple 00:57:47.190 --> 00:57:55.089 cards? Can you synchronize the ADC clock, and can you synchronize the presumably 00:57:55.089 --> 00:58:04.609 digitally created IF? SK: Yes, so... so, unfortunately, just IF synchronization is 00:58:04.609 --> 00:58:10.279 not possible, because the Lime chip doesn't expose the LO frequency. But we can 00:58:10.279 --> 00:58:16.000 synchronize digitally. So, we have special 1PPS signal synchronization. We have 00:58:16.000 --> 00:58:25.180 lines for clock synchronization and other stuff. We can do it in software. So the 00:58:25.180 --> 00:58:31.789 Lime chip has a phase correction register, so when you measure... if there is a phase 00:58:31.789 --> 00:58:35.170 difference, you can compensate for it on different boards. 00:58:35.170 --> 00:58:39.309 Q9: Tune to a station a long way away and then rotate the phase until it aligns. 00:58:39.309 --> 00:58:41.819 SK: Yeah. Q9: Thank you. 00:58:41.819 --> 00:58:46.339 AC: A little tricky, but possible. So, that's one of our plans for the future, 00:58:46.339 --> 00:58:52.819 because we do want to see, like, 128 by 128 MIMO at home. 00:58:52.819 --> 00:58:56.060 H: Okay, we have another question from the internet. 00:58:56.060 --> 00:59:00.450 Signal angel: I actually have two questions. The first one is: What is the 00:59:00.450 --> 00:59:07.710 expected price after the prototype stage? And the second one is: Can you tell us 00:59:07.710 --> 00:59:10.400 more about the setup you had for debugging the PCIe 00:59:10.400 --> 00:59:15.970 issues? AC: Could you repeat the second question? 00:59:15.970 --> 00:59:20.269 SK: It's ????????????, I think. SA: It's more about the setup you had for 00:59:20.269 --> 00:59:24.480 debugging the PCIe issues. SK: The second question, I think, is mostly 00:59:24.480 --> 00:59:31.200 about our next workshop, because it's a more complicated setup, so... 
we mostly 00:59:31.200 --> 00:59:35.580 removed everything about it from the current presentation. 00:59:35.580 --> 00:59:39.580 AC: Yeah, but in general, in terms of hardware setup, that was our hardware 00:59:39.580 --> 00:59:47.890 setup: we bought this PCI Express to Thunderbolt 3 adapter, we bought a laptop which 00:59:47.890 --> 00:59:53.089 supports Thunderbolt 3, and that's how we were debugging it. So, we don't need, like, 00:59:53.089 --> 00:59:57.780 a full-fledged PC, and we don't have to restart it all the time. So, in terms of 00:59:57.780 --> 01:00:06.650 price, we don't have a fixed price yet. So, all I can say right now is that we are 01:00:06.650 --> 01:00:18.349 targeting no more than your bladeRF or HackRF devices, and probably even cheaper. 01:00:18.349 --> 01:00:25.210 For some versions. H: Okay. We are out of time, so thank you 01:00:25.210 --> 01:00:45.079 again Sergey and Alexander. [Applause] 01:00:45.079 --> 01:00:49.619 [Music] 01:00:49.619 --> 01:00:54.950 subtitles created by c3subtitles.de in the year 20??. Join, and help us!