WEBVTT
00:00:03.959 --> 00:00:08.670
[Music]
00:00:08.670 --> 00:00:21.900
Herald: Has anyone in here ever worked
with libusb or PyUSB? Hands up. Okay. Who
00:00:21.900 --> 00:00:32.168
also thinks USB is a pain? [Laughs] Okay.
Sergey and Alexander were here back at
00:00:32.168 --> 00:00:38.769
the 26C3, that's a long time ago. I think
it was back in Berlin, and back then they
00:00:38.769 --> 00:00:45.120
presented their first homemade, or not
homemade, SDR, software-defined radio.
00:00:45.120 --> 00:00:49.440
This year they are back again and they
want to show us how they implemented
00:00:49.440 --> 00:00:55.420
another one, using an FPGA, and to
communicate with it they used PCI Express.
00:00:55.420 --> 00:01:01.589
So I think if you thought USB was a pain,
let's see what they can tell us about PCI
00:01:01.589 --> 00:01:06.690
Express. A warm round of applause for
Alexander and Sergey for building a high
00:01:06.690 --> 00:01:12.430
throughput, low latency, PCIe-based
software-defined radio.
00:01:12.430 --> 00:01:20.220
[Applause]
Alexander Chemeris: Hi everyone, good
00:01:20.220 --> 00:01:30.280
morning, and welcome to the first day of
the Congress. So, just a little bit
00:01:30.280 --> 00:01:36.180
of background about what we've done
previously and why we are doing what we
00:01:36.180 --> 00:01:42.229
are doing right now, is that we started
working with software-defined radios and
00:01:42.229 --> 00:01:51.930
by the way, who knows what software-
defined radio is? Okay, perfect. [Laughs]
00:01:51.930 --> 00:01:59.140
And who ever actually used a software-
defined radio? RTL-SDR or...? Okay, less
00:01:59.140 --> 00:02:06.329
people but that's still quite a lot. Okay,
good. I wonder whether anyone here used
00:02:06.329 --> 00:02:16.940
more expensive radios like USRPs? Less
people, but okay, good. Cool. So before
00:02:16.940 --> 00:02:22.630
2008 I had no idea what software-
defined radio was; I was working with voice
00:02:22.630 --> 00:02:30.330
over IP as a software person, etc., etc. So
in 2008 I heard about OpenBTS, got
00:02:30.330 --> 00:02:40.080
introduced to software-defined radio and I
wanted to make it really work and that's
00:02:40.080 --> 00:02:52.250
what led us to today. In 2009 we had to
develop a clock tamer: hardware which
00:02:52.250 --> 00:03:00.170
allowed us to use the USRP1 to run GSM
without problems. Anyone who has ever tried
00:03:00.170 --> 00:03:05.420
doing this without a good clock source
knows what I'm talking about. And we
00:03:05.420 --> 00:03:10.550
presented this - it wasn't an SDR it was
just a clock source - we presented this in
00:03:10.550 --> 00:03:18.530
2009 at 26C3.
Then I realized that using USRP1 is not
00:03:18.530 --> 00:03:23.760
really a good idea, because we wanted to
build robust, industrial-grade base
00:03:23.760 --> 00:03:29.980
stations. So we started developing our own
software-defined radio, which we call
00:03:29.980 --> 00:03:41.290
UmTRX, and we started
this in 2011. Our first base stations with
00:03:41.290 --> 00:03:51.590
it were deployed in 2013, but I always
wanted to have something really small and
00:03:51.590 --> 00:03:59.510
really inexpensive and back then it wasn't
possible. My original idea in 2011
00:03:59.510 --> 00:04:07.680
was to build a PCI Express card. Mini,
sorry, not PCI Express card but mini PCI
00:04:07.680 --> 00:04:10.100
card.
If you remember there were like all the
00:04:10.100 --> 00:04:14.470
Wi-Fi cards in mini PCI form factor and I
thought that would be really cool to have
00:04:14.470 --> 00:04:22.490
an SDR in mini PCI, so I can plug this
into my laptop or in some embedded PC and
00:04:22.490 --> 00:04:31.710
have nice SDR equipment, but back then
it just was not really possible, because
00:04:31.710 --> 00:04:37.939
electronics were bigger and more power
hungry and just didn't work that way, so
00:04:37.939 --> 00:04:49.539
we designed UmTRX to work over gigabit
ethernet and it was about that size. So
00:04:49.539 --> 00:04:57.300
now we spent this year designing
something, which really brings me to what
00:04:57.300 --> 00:05:05.289
I wanted those years ago, so the XTRX is a
mini PCI Express - again there was no PCI
00:05:05.289 --> 00:05:10.460
Express back then, so now it's mini PCI
Express, which is even smaller than PCI, I
00:05:10.460 --> 00:05:17.719
mean mini PCI and it's built to be
embedded friendly, so you can plug this
00:05:17.719 --> 00:05:23.669
into a single board computer, embedded
single board computer. If you have a
00:05:23.669 --> 00:05:28.020
laptop with a mini PCI Express you can
plug this into your laptop and you have a
00:05:28.020 --> 00:05:35.210
really small, software-defined radio
equipment. And we really want to make it
00:05:35.210 --> 00:05:39.430
inexpensive, that's why I was asking how
many of you have ever worked with RTL-
00:05:39.430 --> 00:05:44.169
SDR, how many of you ever worked with
USRPs, because the gap between them is
00:05:44.169 --> 00:05:53.740
pretty big and we want to really bring the
software-defined radio to the masses.
00:05:53.740 --> 00:05:59.550
Definitely won't be as cheap as RTL-SDR,
but we try to make it as close as
00:05:59.550 --> 00:06:03.330
possible.
And at the same time, so at the size of
00:06:03.330 --> 00:06:09.659
RTL-SDR, at a price, well, higher, but
hopefully it will be affordable to
00:06:09.659 --> 00:06:17.460
pretty much everyone, we really want to
bring high performance into your hands.
00:06:17.460 --> 00:06:22.539
And by high performance I mean this is a
full transmit/receive with two channels
00:06:22.539 --> 00:06:28.289
transmit, two channels receive, which is
usually called 2x2 MIMO in the radio
00:06:28.289 --> 00:06:37.370
world. The goal was to bring it to 160
megasamples per second, which can roughly
00:06:37.370 --> 00:06:44.110
give you like 120 MHz of radio spectrum
available.
00:06:44.110 --> 00:06:53.111
So what we were able to achieve is, again
this is mini PCI Express form factor, it
00:06:53.111 --> 00:07:01.639
has a small Artix-7, that's the smallest and
least expensive FPGA which has the ability
00:07:01.639 --> 00:07:18.029
to work with PCI Express. It has an LMS7002M
chip for RFIC, very high performance, very
00:07:18.029 --> 00:07:27.449
tightly embedded chip with even DSP
blocks inside. It has even a GPS chip
00:07:27.449 --> 00:07:37.340
here, on the upper right
side you can see a GPS chip, so you can
00:07:37.340 --> 00:07:44.060
actually synchronize your SDR to GPS for
perfect clock stability,
00:07:44.060 --> 00:07:51.389
so you won't have any problems running any
telecommunication systems like GSM, 3G, 4G
00:07:51.389 --> 00:07:58.650
due to clock problems, and it also has
an interface for SIM cards, so you can
00:07:58.650 --> 00:08:06.330
actually create a software-defined radio
modem and run other open source projects
00:08:06.330 --> 00:08:15.840
to build one, like the one for LTE called srsUE, if
you're interested, etc., etc. so really
00:08:15.840 --> 00:08:22.080
really tightly packed one. And if you put
this into perspective: that's how it all
00:08:22.080 --> 00:08:30.669
started in 2006 and that's what you have
ten years later. It's pretty impressive.
00:08:30.669 --> 00:08:36.840
[Applause]
Thanks. But I think it actually applies to
00:08:36.840 --> 00:08:40.320
the whole industry who is working on
shrinking the sizes because we just put
00:08:40.320 --> 00:08:48.890
stuff on the PCB, you know. We're not
building the silicon itself. Interesting
00:08:48.890 --> 00:08:54.701
thing is that we did the first approach:
we said let's pack everything, let's do a
00:08:54.701 --> 00:09:03.180
very tight PCB design. We did an eight
layer PCB design and when we sent it to a
00:09:03.180 --> 00:09:10.490
fab to estimate the cost it turned out
it's $15,000 US per piece. Well in small
00:09:10.490 --> 00:09:18.940
volumes obviously but still a little bit
too much. So we had to redesign this and
00:09:18.940 --> 00:09:26.712
the first thing which we did is we still
kept eight layers, because in our
00:09:26.712 --> 00:09:32.810
experience the number of layers nowadays has
only minimal impact on the cost of the
00:09:32.810 --> 00:09:42.450
device. So like six, eight layers - the
price difference is not so big. But we did
00:09:42.450 --> 00:09:52.190
a complete rerouting and only kept 2-deep
microvias and never used buried vias.
00:09:52.190 --> 00:09:57.240
So this makes it much easier and much
faster for the fab to manufacture it and
00:09:57.240 --> 00:10:03.740
the price suddenly went five, six times
down and in volume again it will be
00:10:03.740 --> 00:10:18.140
significantly cheaper. And that's just for
geek porn: how the PCB looks inside. So now
00:10:18.140 --> 00:10:25.140
let's go into real stuff. So PCI Express:
why did we choose PCI Express? As it was
00:10:25.140 --> 00:10:33.310
said USB is a pain in the ass. You can't
really use USB in industrial systems. For
00:10:33.310 --> 00:10:40.510
a whole variety of reasons it's just unstable.
So we did use Ethernet for many years
00:10:40.510 --> 00:10:47.190
successfully but Ethernet has one problem:
first of all inexpensive Ethernet is only
00:10:47.190 --> 00:10:51.780
one gigabit and one gigabit does not offer
you enough bandwidth to carry all the data
00:10:51.780 --> 00:10:59.720
we want, plus it's power-hungry, etc., etc.
So PCI Express is really a good choice
00:10:59.720 --> 00:11:06.420
because it's low power, it has low
latency, it has very high bandwidth and
00:11:06.420 --> 00:11:11.380
it's available almost universally. When we
started looking into this we realized that
00:11:11.380 --> 00:11:17.320
even ARM boards, some ARM boards, have
PCI Express, mini PCI Express slots, which
00:11:17.320 --> 00:11:26.560
was a big surprise for me for example.
So the problem is that unlike USB you do
00:11:26.560 --> 00:11:36.540
need to write your own kernel driver for
this and there's no way around it. And it is
00:11:36.540 --> 00:11:41.110
really hard to write this driver
universally so we are writing it obviously
00:11:41.110 --> 00:11:45.300
for Linux because we're working with
embedded systems, but if we want to
00:11:45.300 --> 00:11:51.030
rewrite it for Windows or for macOS we'll
have to do a lot of rewriting. So we focus
00:11:51.030 --> 00:11:57.250
on what we want on Linux only right now.
And now the hardest part: debugging is
00:11:57.250 --> 00:12:02.580
really non-trivial. One small error and
your PC is completely hung because you
00:12:02.580 --> 00:12:08.750
use something wrong. And you have to
reboot it and restart it. That's like
00:12:08.750 --> 00:12:15.500
debugging the kernel but sometimes even
harder. To make it worse there is no
00:12:15.500 --> 00:12:19.400
really easy-to-use plug-and-play
interface. Normally,
00:12:19.400 --> 00:12:24.250
when you develop a PCI
Express card and you want
00:12:24.250 --> 00:12:31.050
to restart it you have to restart your
development machine. Again not a nice way,
00:12:31.050 --> 00:12:39.420
it's really hard. So the first thing we
did is we found that we can use
00:12:39.420 --> 00:12:47.100
Thunderbolt 3, which was just recently
released, and it has the ability to work
00:12:47.100 --> 00:12:57.200
directly with PCI Express bus. So it
basically has a mode in which it converts
00:12:57.200 --> 00:13:01.410
PCI Express into a plug-and-play
interface. So if you have a laptop which
00:13:01.410 --> 00:13:09.450
supports Thunderbolt 3 then you can use
this to plug or
00:13:09.450 --> 00:13:16.480
unplug your device to make your
development easier. There are always
00:13:16.480 --> 00:13:23.620
problems: there's no easy way, there's no
documentation. Thunderbolt is not
00:13:23.620 --> 00:13:27.380
compatible with Thunderbolt. Thunderbolt 3
is not compatible with Thunderbolt 2.
00:13:27.380 --> 00:13:33.760
So we had to buy a special laptop with
Thunderbolt 3, with special cables, like all
00:13:33.760 --> 00:13:40.120
this hard stuff. And if you
really want to get documentation you have
00:13:40.120 --> 00:13:47.500
to sign an NDA and send a business plan to
them so they can approve that your
00:13:47.500 --> 00:13:50.670
business makes sense.
[Laughter]
00:13:50.670 --> 00:13:58.640
I mean... [Laughs] So we actually opted
out. We decided not to go through this. What
00:13:58.640 --> 00:14:05.340
we did is we found that someone is
actually making PCI Express to Thunderbolt
00:14:05.340 --> 00:14:10.550
3 converters and selling them as dev
boards and that was a big relief because
00:14:10.550 --> 00:14:16.740
it saved us lots of time, lots of money.
You just order it from some
00:14:16.740 --> 00:14:24.920
Asian company. And yeah, this is what this
converter looks like. So you buy it,
00:14:24.920 --> 00:14:29.970
like several pieces you can plug in your
PCI Express card there and you plug this
00:14:29.970 --> 00:14:38.330
into your laptop. And this is it with
XTRX already plugged into it. Now the only
00:14:38.330 --> 00:14:50.160
problem we found is that typically UEFI
has a security control enabled, so that
00:14:50.160 --> 00:14:56.700
any random Thunderbolt device can't hijack
your PCI bus and can't get access to your
00:14:56.700 --> 00:15:01.740
kernel memory and do some bad stuff. Which
is a good idea - the only problem is that
00:15:01.740 --> 00:15:06.730
it's not fully implemented in
Linux. So under Windows if you plug in a
00:15:06.730 --> 00:15:11.690
device which has no security
features, which is not certified, it will
00:15:11.690 --> 00:15:16.510
politely ask you like: "Do you really
trust this device? Do you want to use it?"
00:15:16.510 --> 00:15:21.940
you can say "yes". Under Linux it just
does not work. [Laughs] So we spent some
00:15:21.940 --> 00:15:25.730
time trying to figure out how to get
around this. There are some patches from
00:15:25.730 --> 00:15:30.370
Intel which are not mainline, and we were
not able to actually get them to work. So we
00:15:30.370 --> 00:15:38.980
just had to disable this whole security
measure in the laptop. So be aware that
00:15:38.980 --> 00:15:46.610
this is the case and we suspect that happy
users of Apple might not be able to do
00:15:46.610 --> 00:15:53.630
this because Apple machines don't have a BIOS, so you
probably can't disable this feature. So
00:15:53.630 --> 00:16:01.820
probably good incentive for someone to
actually finish writing the driver.
00:16:01.820 --> 00:16:08.130
So now to the goal: we
want to achieve 160 megasamples per
00:16:08.130 --> 00:16:13.550
second, 2x2 MIMO, which means
two transmit, two receive
00:16:13.550 --> 00:16:24.040
channels at 12 bits, which is roughly 7.5
Gbit/s. So the first result:
00:16:24.040 --> 00:16:26.230
when we got this board from the fab, it
didn't work.
00:16:26.230 --> 00:16:30.430
Sergey Kostanbaev [mumbles]: As expected.
Alexander Chemeris: Yes, as expected. So the
00:16:30.430 --> 00:16:39.750
first interesting thing we realized is
that the FPGA has hardware
00:16:39.750 --> 00:16:47.210
blocks for talking to PCI Express, which
are called GTP, which basically implement
00:16:47.210 --> 00:16:56.850
the PCI Express serial physical layer,
but the thing is the numbering is reversed
00:16:56.850 --> 00:17:04.319
between PCI Express and the FPGA, and we did
not realize this, so we had to do very, very
00:17:04.319 --> 00:17:10.619
fine soldering to actually swap the
[laughs] swap the lanes. You can see this
00:17:10.619 --> 00:17:18.490
very fine work there.
We also found that one of the components
00:17:18.490 --> 00:17:28.870
was a dead bug, which is a well-known term for
chips which at the design stage were placed
00:17:28.870 --> 00:17:35.960
mirrored. So we accidentally
mirrored the pinout, so we had to
00:17:35.960 --> 00:17:41.880
solder it upside down, and if you
realize how small it is you can also
00:17:41.880 --> 00:17:49.419
appreciate the work done. And what's funny
when I was looking at dead bugs I actually
00:17:49.419 --> 00:17:56.929
found a manual from NASA which describes
how to properly solder dead bugs to get
00:17:56.929 --> 00:18:00.679
it approved.
[Audience laughs]
00:18:00.679 --> 00:18:08.230
So this is the link. I think you can go
there and enjoy it; there's also fun stuff there.
00:18:08.230 --> 00:18:17.379
So after fixing all of this our next
attempt kind of works. So the next stage
00:18:17.379 --> 00:18:23.340
is debugging the FPGA code, which has to
talk to PCI Express and PCI Express has to
00:18:23.340 --> 00:18:28.320
talk to the Linux kernel and the kernel has to
talk to the driver, and the driver has to talk to
00:18:28.320 --> 00:18:37.749
user space. So peripherals are easy:
the UART and SPI we got to work almost
00:18:37.749 --> 00:18:44.799
immediately, no problems with that, but DMA
was a real beast. So we spent a lot of
00:18:44.799 --> 00:18:52.660
time trying to get DMA to work and the
problem is that with DMA it's on the FPGA, so
00:18:52.660 --> 00:18:59.730
you can't just place a breakpoint like you
do in C or C++ or in other languages. It's
00:18:59.730 --> 00:19:07.480
a real-time system, I mean
it's real-time hardware which is running
00:19:07.480 --> 00:19:16.351
on the fabric, so we, or rather Sergey, who was
mainly developing this, had to write a lot
00:19:16.351 --> 00:19:22.779
of small test benches and test
everything piece by piece.
00:19:22.779 --> 00:19:31.480
So every part of the DMA code we had was
wrapped into a small test bench which was
00:19:31.480 --> 00:19:39.720
emulating all the tricks, and as the
classics predicted, it took about five to
00:19:39.720 --> 00:19:47.679
ten times longer than actually writing the
code. So we really blew up our
00:19:47.679 --> 00:19:54.529
predicted timelines by doing this, but in the
end we got really stable operation.
00:19:54.529 --> 00:20:03.760
So some suggestions for anyone who will
try to repeat this exercise: there is a
00:20:03.760 --> 00:20:09.590
logic analyzer built into Xilinx and you
can use it. It's nice, sometimes it's
00:20:09.590 --> 00:20:15.960
very helpful, but you can't debug
transient bugs, which come out
00:20:15.960 --> 00:20:22.990
when some weird conditions come up.
So you have to implement some readback
00:20:22.990 --> 00:20:28.809
registers which show important statistics,
important data about how your system
00:20:28.809 --> 00:20:35.340
behaves, in our case it's various counters
on the DMA interface. So you can actually
00:20:35.340 --> 00:20:40.950
see, kind of see, what's happening with
your data: Is it received? Is it
00:20:40.950 --> 00:20:46.269
sent? How much is sent and how much is
received? So for example, we can see
00:20:46.269 --> 00:20:53.559
when we saturate the bus or when there actually
is an underrun, so the host is not providing
00:20:53.559 --> 00:20:57.389
data fast enough, so we can at least
understand whether it's a host problem or
00:20:57.389 --> 00:21:01.769
whether it's an FPGA problem, and which
part we debug next, because again:
00:21:01.769 --> 00:21:07.770
it's a very multi-layer problem. You start
with FPGA, PCI Express, kernel, driver,
00:21:07.770 --> 00:21:15.340
user space, and any part can fail. So you
can't work blind like this. So again the
00:21:15.340 --> 00:21:23.179
goal was to get 160 MSPS. With the first
implementation we got 2 MSPS: roughly 60
00:21:23.179 --> 00:21:30.220
times slower.
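For a sense of scale, the raw data rate in one direction can be estimated from the sample rate. This is a minimal sketch, not something shown in the talk: it assumes each sample is a complex I/Q pair with 12 bits per component, two channels, and no packing or bus overhead. The function name and its parameters are illustrative.

```python
def raw_data_rate_bits_per_s(sample_rate_sps, channels=2,
                             bits_per_component=12,
                             components_per_sample=2):
    """Raw payload rate for one direction (receive or transmit).

    Assumptions (not from the talk): every sample is an I/Q pair,
    so components_per_sample is 2, and nothing is padded or packed.
    """
    return (sample_rate_sps * channels
            * components_per_sample * bits_per_component)


if __name__ == "__main__":
    # The stated goal and the first measured result, in MSPS.
    for msps in (160, 2):
        rate = raw_data_rate_bits_per_s(msps * 1_000_000)
        print(f"{msps} MSPS -> {rate / 1e9:.3f} Gbit/s")
```

Under these assumptions 160 MSPS works out to 7.68 Gbit/s, in line with the "roughly 7.5 Gbit/s" quoted for the goal.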
The problem is that software just wasn't
00:21:30.220 --> 00:21:36.149
keeping up and wasn't sending data fast
enough. So there were many things done,
00:21:36.149 --> 00:21:41.390
but the most important parts are: use real-
time priority if you want to get very
00:21:41.390 --> 00:21:46.940
stable results and, well, fix software bugs.
And one of the most important bugs we had
00:21:46.940 --> 00:21:54.240
was that DMA buffers were not freed
immediately, so they were busy
00:21:54.240 --> 00:21:59.429
for longer than they should be, which
introduced extra cycles and basically just
00:21:59.429 --> 00:22:06.009
reduced the bandwidth.
At this point let's talk a little bit
00:22:06.009 --> 00:22:14.389
about how to implement a high-performance
driver for Linux, because if you want to
00:22:14.389 --> 00:22:20.870
get real performance you have to
start with the right design. There are
00:22:20.870 --> 00:22:26.610
basically three approaches and the whole
spectrum in between; like two approaches
00:22:26.610 --> 00:22:33.649
and the whole spectrum in between, which
is what you can count as the third. The first
00:22:33.649 --> 00:22:41.529
approach is full kernel control, in which
case the kernel driver not only does the
00:22:41.529 --> 00:22:45.701
transfer, it actually has all the logic
of controlling your device and
00:22:45.701 --> 00:22:52.490
exports all of it via ioctl to user space, and
that's kind of the traditional way of
00:22:52.490 --> 00:22:57.669
writing drivers. Your user space is
completely abstracted from all the
00:22:57.669 --> 00:23:07.029
details. The problem is that this is
probably the slowest way to do it. The
00:23:07.029 --> 00:23:14.340
other way is what's called the "zero-copy
interface": only control is held in
00:23:14.340 --> 00:23:21.380
the kernel and data is provided, the raw
data is provided to user space "as-is". So
00:23:21.380 --> 00:23:27.919
you avoid the memory copy, which makes it
faster. But still not fast enough if you
00:23:27.919 --> 00:23:34.279
really want to achieve maximum
performance, because you still have
00:23:34.279 --> 00:23:40.980
context switches between the kernel and
the user space. The most... the fastest
00:23:40.980 --> 00:23:47.289
approach possible is to have a full user
space implementation, where the kernel just
00:23:47.289 --> 00:23:53.059
exposes everything and says "now you do it
yourself" and you have no
00:23:53.059 --> 00:24:02.429
context switches, like almost no, and you
can really optimize everything. So what
00:24:02.429 --> 00:24:08.850
is... what are the problems with this?
The pros I already mentioned: no
00:24:08.850 --> 00:24:13.539
switches between kernel and user space,
it's very low latency because of this as
00:24:13.539 --> 00:24:20.980
well, it's very high bandwidth. But if you
are not interested in getting the very
00:24:20.980 --> 00:24:27.940
high performance, the most performance, and
you just want to have like some little,
00:24:27.940 --> 00:24:33.299
like say low bandwidth performance, then
you will have to add hacks, because you
00:24:33.299 --> 00:24:36.710
can't get notifications from the kernel that
resources are available or more data is
00:24:36.710 --> 00:24:45.570
available. It also makes it
vulnerable, because if user space can
00:24:45.570 --> 00:24:55.310
access it, then it can do whatever it
wants. We at the end decided that... one
00:24:55.310 --> 00:25:02.590
more important thing: how to actually
get the best performance out of the
00:25:02.590 --> 00:25:10.299
bus. This is the question of whether you want
to poll your device or not to poll and get
00:25:10.299 --> 00:25:14.259
notified. What is polling? I guess
everyone as a programmer understands it, so
00:25:14.259 --> 00:25:18.019
polling is when you ask repeatedly: "Are
you ready?", "Are you ready?", "Are you
00:25:18.019 --> 00:25:20.369
ready?" and when it's ready you get the
data immediately.
00:25:20.369 --> 00:25:25.259
It's basically a busy loop where you are
just constantly asking the device what's
00:25:25.259 --> 00:25:33.350
happening. You need to dedicate a full
core, and thank God we have multi-core
00:25:33.350 --> 00:25:39.519
CPUs nowadays, so you can dedicate a
full core to this polling and you can just
00:25:39.519 --> 00:25:45.539
poll constantly. But again if you don't
need this highest performance, you just
00:25:45.539 --> 00:25:53.190
need to get something, then you will be
wasting a lot of CPU resources. At the end
00:25:53.190 --> 00:26:00.429
we decided to do a combined architecture
where it is possible to poll, but
00:26:00.429 --> 00:26:05.500
there's also a way to get
notifications from the kernel for
00:26:05.500 --> 00:26:11.049
applications which need
low bandwidth but also require better
00:26:11.049 --> 00:26:17.480
CPU performance. Which I think is the best
way if you are trying to target both
00:26:17.480 --> 00:26:30.850
worlds. Very quickly: the architecture of
the system. We try to make it very, very
00:26:30.850 --> 00:26:50.730
portable and flexible. There is a
kernel driver, which talks to a low-level
00:26:50.730 --> 00:26:55.690
library which implements all this logic,
which we took out of the driver: to
00:26:55.690 --> 00:27:01.309
control the
PCI Express, to work with DMA, to provide
00:27:01.309 --> 00:27:09.360
all the... to hide all the details of the
actual bus implementation.
00:27:09.360 --> 00:27:17.169
And then there is a high-level library
which talks to this low-level library and
00:27:17.169 --> 00:27:22.179
also to libraries which implement control
of actual peripherals, and most
00:27:22.179 --> 00:27:28.919
importantly to the library which
implements control over our RFIC chip.
00:27:28.919 --> 00:27:35.119
This way it's very modular, we can replace
PCI Express with something else later, we
00:27:35.119 --> 00:27:46.049
might be able to port it to other
operating systems, and that's the goal.
00:27:46.049 --> 00:27:50.059
Another interesting issue is: when you
start writing the Linux kernel driver you
00:27:50.059 --> 00:27:57.119
very quickly realize that while LDD, which
is the classic book on Linux driver
00:27:57.119 --> 00:28:02.220
writing, is good and will give you
good insight, it's not actually up-to-
00:28:02.220 --> 00:28:08.609
date. It's more than ten years old and
there are a lot of new interfaces which are
00:28:08.609 --> 00:28:14.809
not described there, so you have to resort
to reading the manuals and all the
00:28:14.809 --> 00:28:20.409
documentation in the kernel itself. Well
at least you get the up-to-date
00:28:20.409 --> 00:28:31.989
information. The decision we made was to
make everything easy. We use a TTY for GPS,
00:28:31.989 --> 00:28:38.090
so you can really attach pretty much
any application which talks to GPS. So all
00:28:38.090 --> 00:28:45.970
the existing applications can just work out
of the box. And we also wanted to be able
00:28:45.970 --> 00:28:54.879
to synchronize system clock to GPS, so we
get automatic clock synchronization across
00:28:54.879 --> 00:28:59.009
multiple systems, which is very important
when we are deploying many, many devices
00:28:59.009 --> 00:29:07.090
around the world.
We plan to do two interfaces: one is kernel
00:29:07.090 --> 00:29:15.919
PPS (KPPS) and the other is the DCD line
on the UART exposed over the TTY. Because
00:29:15.919 --> 00:29:20.259
again we found that there are two types of
applications: ones that support one API,
00:29:20.259 --> 00:29:25.539
others that support the other API, and there is
no common thing, so we have to support
00:29:25.539 --> 00:29:38.649
both. As we described, we want to have
poll support, so we can get notifications from the
00:29:38.649 --> 00:29:48.130
kernel when data is available and we don't
need to do real busy looping all the time.
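The combined poll-or-get-notified idea can be sketched in user space as "spin briefly, then sleep in the kernel". This is only an illustration under assumptions, not the XTRX driver's interface: a pipe stands in for the device file, a zero-timeout `select` stands in for reading a device register, and `wait_for_data` with its parameters is a made-up name. It relies on `select` working on pipes, so it is Unix-only.

```python
import os
import select
import threading


def wait_for_data(fd, spin_iterations=1000, timeout_s=1.0):
    """Busy-poll fd for a bounded number of iterations, then block.

    Returns "polled" if data showed up while spinning, "notified" if the
    kernel woke us up during the blocking wait, or "timeout" otherwise.
    """
    # Phase 1: busy polling. A zero timeout makes select() return at once.
    for _ in range(spin_iterations):
        ready, _w, _x = select.select([fd], [], [], 0)
        if ready:
            return "polled"
    # Phase 2: give up the CPU and let the kernel notify us.
    ready, _w, _x = select.select([fd], [], [], timeout_s)
    return "notified" if ready else "timeout"


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    # Data arrives 50 ms later, well after the short spin phase is over.
    threading.Timer(0.05, os.write, args=(write_fd, b"samples")).start()
    print(wait_for_data(read_fd, spin_iterations=10))
    os.close(read_fd)
    os.close(write_fd)
```

A latency-critical application would raise `spin_iterations` (or spin forever on a dedicated core), while a low-bandwidth one would set it near zero and rely on the blocking wait.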
00:29:48.130 --> 00:29:55.789
After all the software optimizations we've
got to like 10 MSPS: still very, very far
00:29:55.789 --> 00:30:02.369
from what we want to achieve.
Now there should have been a lot of
00:30:02.369 --> 00:30:06.570
explanations about PCI Express, but when
we actually wrote everything we wanted to
00:30:06.570 --> 00:30:13.999
say, we realized it's like a full
two-hour talk just on PCI Express. So we are
00:30:13.999 --> 00:30:17.760
not going to give it here, I'll just give
some highlights which are most
00:30:17.760 --> 00:30:23.889
interesting. If there is real
interest, we can set up a workshop on
00:30:23.889 --> 00:30:32.340
one of the later days and talk in more
detail about PCI Express specifically.
00:30:32.340 --> 00:30:38.549
The thing is there are no open source cores
for PCI Express, which are optimized for
00:30:38.549 --> 00:30:48.010
high performance, real time applications.
There is Xillybus which as I understand is
00:30:48.010 --> 00:30:53.350
going to be open source, but they provide
you the source if you pay them. It's very
00:30:53.350 --> 00:30:59.610
popular because it's very very easy to do,
but it's not giving you performance. If I
00:30:59.610 --> 00:31:04.980
remember correctly the best it can do is
maybe like 50 percent bus saturation.
00:31:04.980 --> 00:31:10.800
So there's also the Xilinx implementation, but
if you are using the Xilinx implementation
00:31:10.800 --> 00:31:21.049
with AXI bus then you're really locked in
with AXI bus, with Xilinx. And it's also not
00:31:21.049 --> 00:31:25.001
very efficient in terms of resources and
if you remember we want to make this very,
00:31:25.001 --> 00:31:30.029
very inexpensive. So our goal
is to be able to fit everything in the
00:31:30.029 --> 00:31:38.499
smallest Artix-7 FPGA, and that's quite
challenging with all the stuff in there
00:31:38.499 --> 00:31:47.649
and we just can't waste resources. So
the decision was to write our own PCI Express
00:31:47.649 --> 00:31:53.039
implementation. That's what it looks like.
I'm not going to discuss it right now.
00:31:53.039 --> 00:31:59.950
There are several iterations. Initially it
looked much simpler, turned out not to
00:31:59.950 --> 00:32:06.100
work well.
So some interesting stuff about PCI
00:32:06.100 --> 00:32:12.749
Express which we stumbled upon is that it
was working really well on Atom which is
00:32:12.749 --> 00:32:17.460
our main development platform because we
are doing a lot of embedded stuff. Worked
00:32:17.460 --> 00:32:26.479
really well. When we tried to plug this into
a Core i7, it just started hanging once in a
00:32:26.479 --> 00:32:35.090
while. So after several, well, maybe not days,
of debugging, Sergey found a
00:32:35.090 --> 00:32:39.330
very interesting statement in the standard,
which says that a value of zero in the byte
00:32:39.330 --> 00:32:45.869
count actually stands not for zero bytes
but for 4096 bytes.
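The rule is easy to get wrong in code, so here is a minimal sketch of decoding such a field. It assumes a 12-bit byte count field, which is why the largest value, 4096, does not fit and is encoded as zero. The function name is illustrative.

```python
def decode_byte_count(field: int) -> int:
    """Decode a 12-bit byte count field where 0 means 4096 bytes."""
    if not 0 <= field <= 0xFFF:
        raise ValueError("byte count field must fit in 12 bits")
    # 4096 needs 13 bits, so the all-zero encoding is reused for it.
    return 4096 if field == 0 else field


if __name__ == "__main__":
    for raw in (0, 1, 64, 0xFFF):
        print(f"field={raw:#05x} -> {decode_byte_count(raw)} bytes")
```

Logic that treats zero as "nothing left to transfer" will stop early on exactly the largest transfers.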
00:32:45.869 --> 00:32:58.739
I mean that's a really cool optimization.
So another thing is completion which is a
00:32:58.739 --> 00:33:03.639
term in PCI Express basically for
acknowledgment which also can carry some
00:33:03.639 --> 00:33:12.429
data back to your request. And sometimes
if you're not sending a completion, the device
00:33:12.429 --> 00:33:20.740
just hangs. And what happens is that in
this case due to some historical heritage
00:33:20.740 --> 00:33:29.549
of x86 it just starts returning you FFF.
And if you have a register which says: "Is
00:33:29.549 --> 00:33:35.470
your device okay?" and this register shows
one to say "The device is okay", guess
00:33:35.470 --> 00:33:38.500
what will happen?
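The effect can be sketched in a few lines. This models a read from a hung device as a 32-bit all-ones value; the register layout, the constants and the function names are hypothetical and only serve to compare a single "OK" bit with a multi-bit pattern check.

```python
ALL_ONES = 0xFFFFFFFF       # modelled result of a read the device never answers

STATUS_OK_BIT = 0x1         # fragile scheme: bit 0 set means "okay"
STATUS_MASK = 0b11          # sturdier scheme: look at two bits...
STATUS_OK_PATTERN = 0b10    # ...and require a pattern that is not all ones


def ok_by_single_bit(value: int) -> bool:
    """Treat the device as healthy if bit 0 is set."""
    return bool(value & STATUS_OK_BIT)


def ok_by_pattern(value: int) -> bool:
    """Treat the device as healthy only if bits [1:0] equal 0b10."""
    return (value & STATUS_MASK) == STATUS_OK_PATTERN


if __name__ == "__main__":
    for name, value in (("healthy", 0b10), ("hung", ALL_ONES)):
        print(name, ok_by_single_bit(value | 1), ok_by_pattern(value))
```

With the pattern check, both an all-ones and an all-zeros read are rejected, so a dead device cannot look healthy by accident.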
You will always be reading that your
00:33:38.500 --> 00:33:46.590
device is okay. So the suggestion is not
to use one as the status for okay and use
00:33:46.590 --> 00:33:52.790
either zero or, better, a two-bit
sequence. So you are definitely sure that
00:33:52.790 --> 00:34:03.659
you are okay and not getting FFF's. So
when you have a device which again may
00:34:03.659 --> 00:34:10.440
fail at any of the layers, you just got
this new board, it's really hard, it's
00:34:10.440 --> 00:34:17.639
really hard to debug because of memory
corruption. So we had a software bug and
00:34:17.639 --> 00:34:25.099
it was writing DMA addresses
incorrectly and we were wondering why we
00:34:25.099 --> 00:34:32.179
are not getting any data in our buffers. At
the same time, after several starts, the
00:34:32.179 --> 00:34:41.159
operating system just crashes. Well, that's
the reason why there is this UEFI
00:34:41.159 --> 00:34:47.199
protection which prevents you from
plugging in devices like this into your
00:34:47.199 --> 00:34:52.270
computer. Because it was basically writing
data, like random data into random
00:34:52.270 --> 00:35:00.299
portions of your memory. So a lot of
debugging, a lot of tests and test benches
00:35:00.299 --> 00:35:10.589
and we were able to find this. And another
thing is if you deinitialize your driver
00:35:10.589 --> 00:35:15.250
incorrectly, and that's what's happening
when you have plug-and-play device, which
00:35:15.250 --> 00:35:22.119
you can plug and unplug, then you may end
up in a situation of your ... you are
00:35:22.119 --> 00:35:28.039
trying to write into memory which is
already freed by the operating system and
00:35:28.039 --> 00:35:35.960
used for something else. Very well-known
problem but it also happens here. So there
00:35:35.960 --> 00:35:50.549
... why DMA is really hard is because it
has this completion architecture for
00:35:50.549 --> 00:35:56.440
writing for ... sorry ... for reading
data. Writes are easy. You just send the
00:35:56.440 --> 00:36:00.460
data, you forget about it. It's a fire-
and-forget system. But for reading you
00:36:00.460 --> 00:36:10.420
really need to get your data back. And the
thing is, it looks like this. You really
00:36:10.420 --> 00:36:16.020
hope that there would be some pointing
device here. But basically on the top left
00:36:16.020 --> 00:36:24.240
you can see requests for read and on the
right you can see completion transactions.
00:36:24.240 --> 00:36:29.890
So basically each transaction can be and
most likely will be split into multiple
00:36:29.890 --> 00:36:38.900
transactions. So first of all you have to
collect all these pieces and like write
00:36:38.900 --> 00:36:46.210
them into proper parts of the memory.
But that's not all. The thing is the
00:36:46.210 --> 00:36:53.369
latency between request and completion is
really high. It's like 50 cycles. So if
00:36:53.369 --> 00:36:58.990
you have a single, only single transaction
in flight you will get really bad
00:36:58.990 --> 00:37:03.900
performance. You do need to have multiple
transactions in flight. And the worst
00:37:03.900 --> 00:37:13.170
thing is that transactions can return data
in random order. So it's a much more
00:37:13.170 --> 00:37:19.820
complicated state machine than we expected
originally. So when I said, you know, the
00:37:19.820 --> 00:37:25.589
architecture was much simpler originally,
we don't have all of this and we had to
00:37:25.589 --> 00:37:31.670
realize this while implementing. So again
here was a whole description of how
00:37:31.670 --> 00:37:41.200
exactly this works. But not this time. So
now after all these optimizations we've
00:37:41.200 --> 00:37:48.859
got 20 mega samples per second which is
just six times lower than what we are
00:37:48.859 --> 00:37:59.599
aiming at. So now the next thing is PCI
Express lanes scalability. So PCI Express
00:37:59.599 --> 00:38:07.220
is a serial bus. So it has multiple lanes
and they allow you to basically
00:38:07.220 --> 00:38:14.350
horizontally scale your bandwidth. One
lane is like x, then two lane is 2x, four
00:38:14.350 --> 00:38:20.160
lane is 4x. So the more lanes you have the
more performance you are getting out of
00:38:20.160 --> 00:38:23.970
your, out of your bus. So the more
bandwidth you're getting out of your bus.
00:38:23.970 --> 00:38:31.700
Not performance. So the issue is that
typical a mini PCI Express, so the mini
00:38:31.700 --> 00:38:38.600
PCI Express standard only standardized one
lane. And second lane is left as optional.
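To put rough numbers on the lane scaling described above, here is a back-of-the-envelope sketch. The line rates (2.5 GT/s for generation 1, 5 GT/s for generation 2) and the 8b/10b encoding are properties of PCI Express; the sample layout (two channels, I and Q components, 16 bits each) and the function names are assumptions made for illustration, and protocol overhead is ignored, so real throughput will be lower.

```python
# Line rate per lane in megatransfers per second (one bit per transfer).
LINE_RATE_MT_S = {1: 2500, 2: 5000}


def link_bandwidth_mb_s(generation: int, lanes: int) -> int:
    """Payload-capable bandwidth in MB/s, one direction, after 8b/10b encoding."""
    mbit_per_lane = LINE_RATE_MT_S[generation] * 8 // 10  # 8 data bits per 10 line bits
    return mbit_per_lane // 8 * lanes


def max_sample_rate_msps(generation: int, lanes: int,
                         bits_per_component: int = 16, channels: int = 2) -> float:
    """Upper bound on the sample rate, assuming I and Q components per channel."""
    bytes_per_sample = channels * 2 * bits_per_component / 8
    return link_bandwidth_mb_s(generation, lanes) / bytes_per_sample


if __name__ == "__main__":
    for gen, lanes in ((1, 1), (1, 2), (2, 2)):
        print(f"gen {gen} x{lanes}: {link_bandwidth_mb_s(gen, lanes)} MB/s, "
              f"at most {max_sample_rate_msps(gen, lanes):.2f} MSPS")
```

Under these assumptions a single generation 1 lane tops out near 31 MSPS and two lanes near 62 MSPS, before any packet overhead is taken into account.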
00:38:38.600 --> 00:38:46.099
So most motherboards don't support this.
There are some but not all of them. And we
00:38:46.099 --> 00:38:52.370
really wanted to get this done. So we
designed a special converter board which
00:38:52.370 --> 00:38:57.530
allows you to plug your mini PCI Express
into a full-size PCI Express and
00:38:57.530 --> 00:39:06.790
get two lanes working. And we're also
planning to have a similar board which
00:39:06.790 --> 00:39:12.660
will have multiple slots so you will be
able to get multiple XTRX-SDRs on to the
00:39:12.660 --> 00:39:21.270
same, onto the same carrier board and plug
this into let's say PCI Express 16x and
00:39:21.270 --> 00:39:29.059
you will get like really a lot of ... SDR
... a lot of IQ data which then will be
00:39:29.059 --> 00:39:38.760
your problem how to, how to process. So
with two x's it's about twice performance
00:39:38.760 --> 00:39:48.930
so we are getting fifty mega samples per
second. And that's the time to really cut
00:39:48.930 --> 00:39:59.230
the fat because the real sample size of
LMS7 is 12 bits and we are transmitting 16
00:39:59.230 --> 00:40:06.930
because it's easier. Because CPU is
working on 8, 16, 32. So we originally
00:40:06.930 --> 00:40:13.770
designed the driver to support 8 bit, 12
bit and 16 bit to be able to do this
00:40:13.770 --> 00:40:23.800
scaling. And for the test we said okay
let's go from 16 to 8 bit. We'll lose
00:40:23.800 --> 00:40:32.960
some dynamic range but who cares these
days. Still stayed the same, it's still 50
00:40:32.960 --> 00:40:41.980
mega samples per second, no matter what we
did. And that was a lot of interesting
00:40:41.980 --> 00:40:49.580
debugging going on. And we realized that
we actually made another, not a really
00:40:49.580 --> 00:40:58.720
mistake. We didn't, we didn't really know
this when we designed. But we should have
00:40:58.720 --> 00:41:04.450
used a higher voltage for this high speed
bus to get it to the full performance. And
00:41:04.450 --> 00:41:12.619
at 1.8 it was just degrading too fast and
the bus itself was not performing well. So
00:41:12.619 --> 00:41:21.859
our next prototype will be using higher
voltage specifically for this bus. And
00:41:21.859 --> 00:41:26.559
this is kind of stuff which makes
designing hardware for high speed really
00:41:26.559 --> 00:41:32.210
hard because you have to care about
coherence of the parallel buses on your,
00:41:32.210 --> 00:41:38.550
on your system. So at the same time we do
want to keep 1.8 volts for everything else
00:41:38.550 --> 00:41:43.480
as much as possible. Because another
problem we are facing with this device is
00:41:43.480 --> 00:41:47.069
that by the standard mini PCI Express
allows only like ...
00:41:47.069 --> 00:41:51.220
Sergey Kostanbaev: ... 2.5 ...
Alexander Chemeris: ... 2.5 watts of power
00:41:51.220 --> 00:41:58.369
consumption, no more. And that's we were,
we were very lucky that LMS7 has such so
00:41:58.369 --> 00:42:04.460
good, so good power consumption
performance. We actually had some extra
00:42:04.460 --> 00:42:10.049
space to have FPGA and GPS and all this
stuff. But we just can't let the power
00:42:10.049 --> 00:42:14.880
consumption go up. Our measurements on
this device showed about ...
00:42:14.880 --> 00:42:18.510
Sergey Kostanbaev: ... 2.3 ...
Alexander Chemeris: ... 2.3 watts of power
00:42:18.510 --> 00:42:27.220
consumption. So we are like at the limit
at this point. So when we fix the bus with
00:42:27.220 --> 00:42:31.420
the higher voltage, you know it's a
theoretical exercise, because we haven't
00:42:31.420 --> 00:42:38.000
done this yet, that's planned to happen in
a couple months. We should be able to get
00:42:38.000 --> 00:42:47.330
to these numbers, which are just 1.2 times
slower. Then the next thing will be to fix
00:42:47.330 --> 00:42:55.550
another issue which we made at the very
beginning: we have procured a wrong chip.
00:42:55.550 --> 00:43:05.270
Just one digit difference, you can see
it's highlighted in red and green, and
00:43:05.270 --> 00:43:13.230
this chip it supports only a generation 1
PCI Express which is twice slower than
00:43:13.230 --> 00:43:18.190
generation 2 PCI Express.
So again, hopefully we'll replace the chip
00:43:18.190 --> 00:43:30.140
and just get very simple doubling of the
performance. Still it will be slower than
00:43:30.140 --> 00:43:39.770
we wanted it to be and here is what comes
like practical versus theoretical numbers.
00:43:39.770 --> 00:43:47.119
Well, as every bus, it has overheads
and one of the things which again we
00:43:47.119 --> 00:43:51.279
realized when we were implementing this
is, that even though the standard
00:43:51.279 --> 00:43:58.910
standardizes the payload size of 4 kB,
actual implementations are different. For
00:43:58.910 --> 00:44:08.390
example desktop computers like Intel Core
or Intel Atom they only have 128 byte
00:44:08.390 --> 00:44:18.740
payload. So there is much more overhead
going on the bus to transfer data and even
00:44:18.740 --> 00:44:29.180
theoretically you can only achieve 87%
efficiency. And on Xeon we tested and we
00:44:29.180 --> 00:44:37.110
found that they're using 256-byte payload size
and this can give you like a 92%
00:44:37.110 --> 00:44:45.130
efficiency on the bus and this is before
the overhead so the real reality is even
00:44:45.130 --> 00:44:53.180
worse. An interesting thing which we also
did not expect, is that we originally were
00:44:53.180 --> 00:45:02.849
developing on Intel Atom and everything
was working great. When we plug this into
00:45:02.849 --> 00:45:10.720
laptop like Core i7 multi-core really
powerful device, we didn't expect that it
00:45:10.720 --> 00:45:20.140
wouldn't work. Obviously Core i7 should
work better than Atom: no, not always.
00:45:20.140 --> 00:45:26.369
The thing is, we were plugging into a
laptop, which had a built-in video card
00:45:26.369 --> 00:45:44.750
which was sitting on the same PCI bus and
probably manufacturer hard-coded the higher
00:45:44.750 --> 00:45:50.590
priority for the video card than for
everything else in the system, because I
00:45:50.590 --> 00:45:56.300
don't want your screen to flicker.
And so when you move a window you actually
00:45:56.300 --> 00:46:04.099
see the late packets coming to your PCI
device. We had to introduce a jitter
00:46:04.099 --> 00:46:14.750
buffer and add more FIFO into the device
to smooth it out. On the other hand the
00:46:14.750 --> 00:46:20.099
Xeon is performing really well. So it's
very optimized. That said, we have tested
00:46:20.099 --> 00:46:28.119
it with discrete card and it outperforms
everything by whopping five to seven percent.
00:46:28.119 --> 00:46:38.799
What you get for the price. So this
is actually the end of the presentation.
00:46:38.799 --> 00:46:43.839
We still have not scheduled any workshop,
but if there is any interest in
00:46:43.839 --> 00:46:53.390
actually seeing the device working or if
you are interested in learning more about the
00:46:53.390 --> 00:46:58.260
PCI Express in detail, let us know, we'll
schedule something in the next few days.
00:46:58.260 --> 00:47:05.339
That's the end, I think we can proceed
with questions if there are any.
00:47:05.339 --> 00:47:14.950
[Applause]
Herald: Okay, thank you very much. If you
00:47:14.950 --> 00:47:17.680
are leaving now: please try to leave
quietly because we might have some
00:47:17.680 --> 00:47:22.960
questions and you want to hear them. If
you have questions please line up right
00:47:22.960 --> 00:47:28.819
behind the microphones and I think we'll
just wait because we don't have anything
00:47:28.819 --> 00:47:34.990
from the signal angel. However, if you are
watching on stream you can hop into the
00:47:34.990 --> 00:47:39.500
channels and over social media to ask
questions and they will be answered,
00:47:39.500 --> 00:47:47.890
hopefully. So on that microphone.
Question 1: What's the minimum and maximum
00:47:47.890 --> 00:47:52.170
frequency of the card?
Alexander Chemeris: You mean RF
00:47:52.170 --> 00:47:55.940
frequency?
Question 1: No, the minimum frequency you
00:47:55.940 --> 00:48:05.640
can sample at. Most SDR devices can
only sample at over 50 MHz. Is there a
00:48:05.640 --> 00:48:09.190
similar limitation at your card?
Alexander Chemeris: Yeah, so if you're
00:48:09.190 --> 00:48:15.650
talking about RF frequency it can go
from like almost zero even though that
00:48:15.650 --> 00:48:27.289
works worse below 50MHz and all the way to
3.8GHz if I remember correctly. And in
00:48:27.289 --> 00:48:34.880
terms of the sample rate right now it
works from like about 2 MSPS and to about
00:48:34.880 --> 00:48:40.089
50 right now. But again, we're planning to
get it to these numbers we quoted.
00:48:40.089 --> 00:48:45.720
Herald: Okay. The microphone over there.
Question 2: Thanks for your talk. Did you
00:48:45.720 --> 00:48:48.630
manage to put your Linux kernel driver to
the main line?
00:48:48.630 --> 00:48:53.519
Alexander Chemeris: No, not yet. I mean,
it's not even like fully published. So I
00:48:53.519 --> 00:48:59.019
did not say in the beginning, sorry for
this. We only just manufactured the first
00:48:59.019 --> 00:49:03.830
prototype, which we debugged heavily. So
we are only planning to manufacture the
00:49:03.830 --> 00:49:10.290
second prototype with all these fixes and
then we will release, like, the kernel
00:49:10.290 --> 00:49:16.700
driver and everything. And maybe we'll try
or maybe won't try, haven't decided yet.
00:49:16.700 --> 00:49:18.310
Question 2: Thanks
Herald: Okay...
00:49:18.310 --> 00:49:21.599
Alexander Chemeris: and that will be the
whole other experience.
00:49:21.599 --> 00:49:26.099
Herald: Okay, over there.
Question 3: Hey, looks like you went
00:49:26.099 --> 00:49:30.349
through some incredible amounts of pain to
make this work. So, I was wondering,
00:49:30.349 --> 00:49:34.960
aren't there any simulators at least for
parts of the system, or the PCIe bus for
00:49:34.960 --> 00:49:40.150
the DMA something? Any simulator so that
you can actually first design the system
00:49:40.150 --> 00:49:44.630
there and debug it more easily?
Sergey Kostanbaev: Yes, there are
00:49:44.630 --> 00:49:50.400
available simulators, but the problem is
they are all non-free. So you have to pay
00:49:50.400 --> 00:49:57.109
for them. So yeah and we chose the hard
way.
00:49:57.109 --> 00:49:59.520
Question 3: Okay thanks.
Herald: We have a question from the signal
00:49:59.520 --> 00:50:03.180
angel.
Question 4: Yeah are the FPGA codes, Linux
00:50:03.180 --> 00:50:07.650
driver, and library code, and the design
project files public and if so, did they
00:50:07.650 --> 00:50:13.480
post them yet? They can't find them on
xtrx.io.
00:50:13.480 --> 00:50:17.970
Alexander Chemeris: Yeah, so they're not
published yet. As I said, we haven't
00:50:17.970 --> 00:50:24.579
released them. So, the drivers and
libraries will definitely be available,
00:50:24.579 --> 00:50:28.589
FPGA code... We are considering this
probably also will be available in open
00:50:28.589 --> 00:50:36.359
source. But we will publish them together
with the public announcement of the
00:50:36.359 --> 00:50:42.220
device.
Herald: Ok, that microphone.
00:50:42.220 --> 00:50:46.010
Question 5: Yes. Did you guys see any
signal integrity issues on the PCI
00:50:46.010 --> 00:50:50.009
bus, or on this bus to the LMS chip, the
Lime microchip, I think, that's doing
00:50:50.009 --> 00:50:51.009
the RF ?
AC: Right.
00:50:51.009 --> 00:50:56.359
Question 5: Did you try to measure signal
integrity issues, or... because there were
00:50:56.359 --> 00:51:01.130
some reliability issues, right?
AC: Yeah, we actually... so, PCI. With PCI
00:51:01.130 --> 00:51:02.559
we never had issues, if I remember
correctly.
00:51:02.559 --> 00:51:04.760
SK: No.
AC: I just... it was just working.
00:51:04.760 --> 00:51:10.940
SK: Well, the board is so small, and when
there are small traces there's no problem
00:51:10.940 --> 00:51:14.790
in signal integrity. So it actually
saved us.
00:51:14.790 --> 00:51:20.599
AC: Yeah. Designing a small board is easier.
Yeah, with the LMS7, the problem is not
00:51:20.599 --> 00:51:26.099
the signal integrity in terms of
difference in the length of the traces,
00:51:26.099 --> 00:51:37.319
but rather the fact that the signal
degrades over voltage, also over speed in
00:51:37.319 --> 00:51:44.010
terms of voltage, and drops below the
detection level, and all this stuff. We
00:51:44.010 --> 00:51:47.220
did some measurements. I actually wanted
to add some pictures here, but decided
00:51:47.220 --> 00:51:54.359
that's not going to be super interesting.
H: Okay. Microphone over there.
00:51:54.359 --> 00:51:58.359
Question 6: Yes. Thanks for the talk. How
much work would it be to convert the two
00:51:58.359 --> 00:52:05.610
by two SDR into an 8-input logic analyzer
in terms of hard- and software? So, if you
00:52:05.610 --> 00:52:12.289
have a really fast logic analyzer, where
you can record unlimited traces with?
00:52:12.289 --> 00:52:18.980
AC: A logic analyzer...
Q6: So basically it's just also an analog
00:52:18.980 --> 00:52:27.040
digital converter and you largely want
fast sampling and a large amount of memory
00:52:27.040 --> 00:52:30.900
to store the traces.
AC: Well, I just think it's not the best
00:52:30.900 --> 00:52:40.300
use for it. It's probably... I don't know.
Maybe Sergey has any ideas, but I think it
00:52:40.300 --> 00:52:47.549
just may be easier to get high-speed ADC
and replace the Lime chip with a high-
00:52:47.549 --> 00:52:56.720
speed ADC to get what you want, because
the Lime chip has so many things there
00:52:56.720 --> 00:53:01.450
specifically for RF.
SK: Yeah, the main problem you cannot just
00:53:01.450 --> 00:53:09.099
sample original data. You should shift it
over frequency, so you cannot sample
00:53:09.099 --> 00:53:16.619
original signal, and using it for
something else except spectrum analyzing
00:53:16.619 --> 00:53:20.839
is hard.
Q6: OK. Thanks.
00:53:20.839 --> 00:53:25.750
H: OK. Another question from the internet.
Signal angel: Yes. Have you compared the
00:53:25.750 --> 00:53:32.240
sample rate of the ADC of the Lime DA chip
to the USRP ADCs, and if so, how does the
00:53:32.240 --> 00:53:40.160
lower sample rate affect the performance?
AC: So, comparing low sample rate to
00:53:40.160 --> 00:53:49.281
higher sample rate. We haven't done much
testing on the RF performance yet, because
00:53:49.281 --> 00:53:58.440
we were so busy with all this stuff, so we
are yet to see in terms of low bit rates
00:53:58.440 --> 00:54:03.190
versus sample rates versus high sample
rate. Well, high sample rate always gives
00:54:03.190 --> 00:54:09.859
you better performance, but you also get
higher power consumption. So, I guess it's
00:54:09.859 --> 00:54:14.019
the question of what's more important
for you.
00:54:14.019 --> 00:54:20.440
H: Okay. Over there.
Question 7: I've gathered there is no
00:54:20.440 --> 00:54:25.319
mixer bypass, so you can't directly sample
the signal. Is there a way to use the same
00:54:25.319 --> 00:54:31.720
antenna for send and receive?
AC: Actually, there is... Input for ADC.
00:54:31.720 --> 00:54:38.289
SK: But it's not a bypass, it's a
dedicated pin on LMS chip, and since we're
00:54:38.289 --> 00:54:45.569
very space-constrained, we didn't route
them, so you can not actually bypass it.
00:54:45.569 --> 00:54:50.359
AC: Okay, in our specific hardware, so in
general, so in the LMS chip there is a
00:54:50.359 --> 00:54:58.170
special pin which allows you to drive your
signal directly to ADC without all the
00:54:58.170 --> 00:55:02.950
mixers, filters, all this radio stuff,
just directly to ADC. So, yes,
00:55:02.950 --> 00:55:06.869
theoretically that's possible.
SK: We even thought about this, but it
00:55:06.869 --> 00:55:10.960
doesn't fit this design.
Q7: Okay. And can I share antennas,
00:55:10.960 --> 00:55:15.700
because I have an existing laptop with
existing antennas, but I would use the
00:55:15.700 --> 00:55:22.140
same antenna to send and receive.
AC: Yeah, so, I mean, that's... depends on
00:55:22.140 --> 00:55:25.619
what exactly do you want to do. If you
want a TDD system, then yes, if you
00:55:25.619 --> 00:55:30.869
want an FDD system, then you will have to
put a small duplexer in there, but yeah,
00:55:30.869 --> 00:55:34.839
that's the idea. So you can plug this into
your laptop and use your existing
00:55:34.839 --> 00:55:39.640
antennas. That's one of the ideas of how
to use XTRX.
00:55:39.640 --> 00:55:41.799
Q7: Yeah, because there's all four
connectors.
00:55:41.799 --> 00:55:45.400
AC: Yeah. One thing which I actually
forgot to mention is - I kind of mentioned
00:55:45.400 --> 00:55:53.930
in the slides - is that any other SDRs
which are based on Ethernet or on the USB
00:55:53.930 --> 00:56:02.309
can't work with CSMA wireless systems,
and the most famous CSMA system is Wi-Fi.
00:56:02.309 --> 00:56:09.259
So, it turns out that because of the
latency between your operating system and
00:56:09.259 --> 00:56:17.569
your radio on USB, you just can't react
fast enough for Wi-Fi to work, because you
00:56:17.569 --> 00:56:23.240
- probably you know that - in Wi-Fi you
carrier sense, and if you sense that the
00:56:23.240 --> 00:56:29.579
spectrum is free, you start transmitting.
Doesn't make sense when you have huge
00:56:29.579 --> 00:56:36.160
latency, because all you know is that the
spectrum was free back then. So,
00:56:36.160 --> 00:56:43.730
with XTRX, you actually can work with CSMA
systems like Wi-Fi, so again it makes it
00:56:43.730 --> 00:56:51.390
possible to have a fully software
implementation of Wi-Fi in your laptop. It
00:56:51.390 --> 00:56:58.660
obviously won't work like as good as your
commercial Wi-Fi, because you will have to
00:56:58.660 --> 00:57:03.839
do a lot of processing on your CPU, but
for some purposes like experimentation,
00:57:03.839 --> 00:57:07.980
for example, for wireless labs and R&D
labs, that's really valuable.
00:57:07.980 --> 00:57:11.400
Q7: Thanks.
H: Okay. Over there.
00:57:11.400 --> 00:57:15.519
Q8: Okay. What PCB design package did you
use?
00:57:15.519 --> 00:57:17.819
AC: Altium.
SK: Altium, yeah.
00:57:17.819 --> 00:57:22.940
Q8: And I'd be interested in the PCIe
workshop. Would be really great if you do
00:57:22.940 --> 00:57:24.940
this one.
AC: Say this again?
00:57:24.940 --> 00:57:28.069
Q8: Would be really great if you do the
PCI Express workshop.
00:57:28.069 --> 00:57:32.720
AC: Ah. PCI Express workshop. Okay. Thank
you.
00:57:32.720 --> 00:57:36.690
H: Okay, I think we have one more question
from the microphones, and that's you.
00:57:36.690 --> 00:57:42.880
Q9: Okay. Great talk. And again, I would
appreciate a PCI Express workshop, if it
00:57:42.880 --> 00:57:47.190
ever happens. What are these
synchronization options between multiple
00:57:47.190 --> 00:57:55.089
cards? Can you synchronize the ADC clock,
and can you synchronize the presumably
00:57:55.089 --> 00:58:04.609
digitally created IF?
SK: Yes, so... so,
unfortunately, just IF synchronization is
00:58:04.609 --> 00:58:10.279
not possible, because Lime chip doesn't
expose the LO frequency. But we can
00:58:10.279 --> 00:58:16.000
synchronize digitally. So, we have special
one PPS signal synchronization. We have
00:58:16.000 --> 00:58:25.180
lines for clock synchronization and other
stuff. We can do it in software. So the
00:58:25.180 --> 00:58:31.789
Lime chip has phase correction register,
so when you measure... if there is a phase
00:58:31.789 --> 00:58:35.170
difference, so you can compensate it on
different boards.
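The software side of this phase compensation could look roughly like the sketch below. It assumes both boards capture the same reference signal as complex IQ samples; the functions `estimate_phase_offset` and `rotate` are hypothetical illustrations, and writing the result into the Lime chip's phase correction register is not shown.

```python
import cmath


def estimate_phase_offset(reference, other):
    """Phase in radians by which `other` must be rotated to line up with `reference`."""
    acc = sum(r * o.conjugate() for r, o in zip(reference, other))
    return cmath.phase(acc)


def rotate(samples, phase):
    """Rotate every complex IQ sample by `phase` radians."""
    rotation = cmath.exp(1j * phase)
    return [s * rotation for s in samples]


if __name__ == "__main__":
    # Synthetic tone as seen by board A, and the same tone lagging by 0.7 rad on board B.
    board_a = [cmath.exp(2j * cmath.pi * 0.05 * n) for n in range(256)]
    board_b = rotate(board_a, -0.7)

    offset = estimate_phase_offset(board_a, board_b)
    aligned = rotate(board_b, offset)
    worst = max(abs(a - b) for a, b in zip(board_a, aligned))
    print(f"estimated offset: {offset:.3f} rad, residual error: {worst:.2e}")
```

The estimate is a plain cross-correlation at zero lag, so it only corrects a constant phase difference and assumes the two captures are already aligned in time.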
00:58:35.170 --> 00:58:39.309
Q9: Tune to a station a long way away and
then rotate the phase until it aligns.
00:58:39.309 --> 00:58:41.819
SK: Yeah.
Q9: Thank you.
00:58:41.819 --> 00:58:46.339
AC: Little tricky, but possible. So,
that's one of our plans for future,
00:58:46.339 --> 00:58:52.819
because we do want to see, like 128 by 128
MIMO at home.
00:58:52.819 --> 00:58:56.060
H: Okay, we have another question from the
internet.
00:58:56.060 --> 00:59:00.450
Signal angel: I actually have two
questions. The first one is: What is the
00:59:00.450 --> 00:59:07.710
expected price after a prototype stage?
And the second one is: Can you tell us
00:59:07.710 --> 00:59:10.400
more about this setup you had for
debugging the PCIe
00:59:10.400 --> 00:59:15.970
issues?
AC: Could you repeat the second question?
00:59:15.970 --> 00:59:20.269
SK: It's ????????????, I think.
SA: It's more about the setup you had for
00:59:20.269 --> 00:59:24.480
debugging the PCIe issues.
SK: Second question, I think it's most
00:59:24.480 --> 00:59:31.200
about our next workshop, because it's a
more complicated setup, so... mostly
00:59:31.200 --> 00:59:35.580
removed everything about it from the current
presentation.
00:59:35.580 --> 00:59:39.580
AC: Yeah, but in general, and in terms of
hardware setup, that was our hardware
00:59:39.580 --> 00:59:47.890
setup, so we bought this PCI Express to
Thunderbolt3, we bought the laptop which
00:59:47.890 --> 00:59:53.089
supports Thunderbolt3, and that's how we
were debugging it. So, we don't need, like
00:59:53.089 --> 00:59:57.780
a full-fledged PC, we don't have to
restart it all the time. So, in terms of
00:59:57.780 --> 01:00:06.650
price, we don't have the fixed price yet.
So, all I can say right now is that we are
01:00:06.650 --> 01:00:18.349
targeting no more than your bladeRF or
HackRF devices, and probably even cheaper.
01:00:18.349 --> 01:00:25.210
For some versions.
H: Okay. We are out of time, so thank you
01:00:25.210 --> 01:00:45.079
again Sergey and Alexander.
[Applause]
01:00:45.079 --> 01:00:49.619
[Music]
01:00:49.619 --> 01:00:54.950
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!