1
00:00:03,959 --> 00:00:08,670
[Music]
2
00:00:08,670 --> 00:00:21,900
Herald: Has anyone in here ever worked
with libusb or PyUSB? Hands up. Okay. Who
3
00:00:21,900 --> 00:00:32,168
also thinks USB is a pain? laughs Okay.
Sergey and Alexander were here back at
4
00:00:32,168 --> 00:00:38,769
the 26C3, that's a long time ago. I think
it was back in Berlin, and back then they
5
00:00:38,769 --> 00:00:45,120
presented their first homemade, or not
homemade, SDR, software-defined radio.
6
00:00:45,120 --> 00:00:49,440
This year they are back again and they
want to show us how they implemented
7
00:00:49,440 --> 00:00:55,420
another one, using an FPGA, and to
communicate with it they used PCI Express.
8
00:00:55,420 --> 00:01:01,589
So I think if you thought USB was a pain,
let's see what they can tell us about PCI
9
00:01:01,589 --> 00:01:06,690
Express. A warm round of applause for
Alexander and Sergey for building a high
10
00:01:06,690 --> 00:01:12,430
throughput, low latency, PCIe-based
software-defined radio
11
00:01:12,430 --> 00:01:20,220
[Applause]
Alexander Chemeris: Hi everyone, good
12
00:01:20,220 --> 00:01:30,280
morning, and welcome to the first day of
the Congress. So, just a little bit
13
00:01:30,280 --> 00:01:36,180
of background about what we've done
previously and why we are doing what we
14
00:01:36,180 --> 00:01:42,229
are doing right now: we started
working with software-defined radios and
15
00:01:42,229 --> 00:01:51,930
by the way, who knows what software-
defined radio is? Okay, perfect. laughs
16
00:01:51,930 --> 00:01:59,140
And who ever actually used a software-
defined radio? RTL-SDR or...? Okay, less
17
00:01:59,140 --> 00:02:06,329
people but that's still quite a lot. Okay,
good. I wonder whether anyone here used
18
00:02:06,329 --> 00:02:16,940
more expensive radios like USRPs? Less
people, but okay, good. Cool. So before
19
00:02:16,940 --> 00:02:22,630
2008 I had no idea what software-
defined radio was; I was working with voice
20
00:02:22,630 --> 00:02:30,330
over IP software person, etc., etc., so
in 2008 I heard about OpenBTS, got
21
00:02:30,330 --> 00:02:40,080
introduced to software-defined radio and I
wanted to make it really work and that's
22
00:02:40,080 --> 00:02:52,250
what led us to today. In 2009 we had to
develop the ClockTamer, a piece of hardware which
23
00:02:52,250 --> 00:03:00,170
allowed using a USRP1 to run GSM
without problems. Anyone who ever tried
24
00:03:00,170 --> 00:03:05,420
doing this without a good clock source
knows what I'm talking about. And we
25
00:03:05,420 --> 00:03:10,550
presented this - it wasn't an SDR it was
just a clock source - we presented this in
26
00:03:10,550 --> 00:03:18,530
2009 at 26C3.
Then I realized that using the USRP1 is not
27
00:03:18,530 --> 00:03:23,760
really a good idea, because we wanted to
build robust, industrial-grade base
28
00:03:23,760 --> 00:03:29,980
stations. So we started developing our own
software defined radio, which we call
29
00:03:29,980 --> 00:03:41,290
UmTRX, and we started
this in 2011. Our first base stations with
30
00:03:41,290 --> 00:03:51,590
it were deployed in 2013, but I always
wanted to have something really small and
31
00:03:51,590 --> 00:03:59,510
really inexpensive and back then it wasn't
possible. My original idea in 2011 was
32
00:03:59,510 --> 00:04:07,680
to build a mini PCI card - sorry, not a
PCI Express card, but a mini PCI
33
00:04:07,680 --> 00:04:10,100
card.
If you remember there were like all the
34
00:04:10,100 --> 00:04:14,470
Wi-Fi cards in mini PCI form factor and I
thought that would be really cool to have
35
00:04:14,470 --> 00:04:22,490
an SDR in mini PCI, so I can plug this
into my laptop or in some embedded PC and
36
00:04:22,490 --> 00:04:31,710
have a nice SDR equipment, but back then
it just was not really possible, because
37
00:04:31,710 --> 00:04:37,939
electronics were bigger and more power
hungry and just didn't work that way, so
38
00:04:37,939 --> 00:04:49,539
we designed UmTRX to work over gigabit
ethernet and it was about that size. So
39
00:04:49,539 --> 00:04:57,300
now we spent this year designing
something which really brings me back to what
40
00:04:57,300 --> 00:05:05,289
I wanted those years ago, so the XTRX is a
mini PCI Express - again there was no PCI
41
00:05:05,289 --> 00:05:10,460
Express back then, so now it's mini PCI
Express, which is even smaller than PCI, I
42
00:05:10,460 --> 00:05:17,719
mean mini PCI and it's built to be
embedded friendly, so you can plug this
43
00:05:17,719 --> 00:05:23,669
into a single board computer, embedded
single board computer. If you have a
44
00:05:23,669 --> 00:05:28,020
laptop with a mini PCI Express you can
plug this into your laptop and you have a
45
00:05:28,020 --> 00:05:35,210
really small, software-defined radio
equipment. And we really want to make it
46
00:05:35,210 --> 00:05:39,430
inexpensive, that's why I was asking how
many of you have ever worked with RTL-
47
00:05:39,430 --> 00:05:44,169
SDR, how many of you ever worked with
USRPs, because the gap between them is
48
00:05:44,169 --> 00:05:53,740
pretty big, and we really want to bring
software-defined radio to the masses.
49
00:05:53,740 --> 00:05:59,550
Definitely won't be as cheap as RTL-SDR,
but we try to make it as close as
50
00:05:59,550 --> 00:06:03,330
possible.
And at the same time, so at the size of
51
00:06:03,330 --> 00:06:09,659
RTL-SDR, at a price - well, higher, but
hopefully it will be affordable to
52
00:06:09,659 --> 00:06:17,460
pretty much everyone, we really want to
bring high performance into your hands.
53
00:06:17,460 --> 00:06:22,539
And by high performance I mean this is a
full transmit/receive with two channels
54
00:06:22,539 --> 00:06:28,289
transmit, two channels receive, which is
usually called 2x2 MIMO in the radio
55
00:06:28,289 --> 00:06:37,370
world. The goal was to bring it to 160
megasamples per second, which can roughly
56
00:06:37,370 --> 00:06:44,110
give you like 120 MHz of radio spectrum
available.
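The arithmetic behind those figures can be sketched in C. The helper names here are mine, the 12-bit sample width comes up later in the talk, and the 75% usable-bandwidth factor is a common rule of thumb for anti-alias filter roll-off rather than a figure the speakers state:

```c
#include <stdint.h>

/* Raw bit rate of a complex-sampled stream: each sample has an I and a Q
 * component, and each channel of a 2x2 MIMO link carries its own stream. */
static uint64_t stream_bits_per_sec(uint64_t samples_per_sec,
                                    unsigned channels,
                                    unsigned bits_per_component)
{
    return samples_per_sec * (uint64_t)channels * 2u * bits_per_component;
}

/* Usable analog bandwidth is commonly ~75% of the complex sample rate,
 * the rest being lost to anti-alias filter roll-off (rule of thumb). */
static uint64_t usable_bandwidth_hz(uint64_t samples_per_sec)
{
    return samples_per_sec * 3 / 4;
}
```

With 160 MS/s, two channels, and 12 bits per component this works out to 7.68 Gbit/s, the "roughly 7.5 Gbit/s" quoted later, and about 120 MHz of spectrum.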
57
00:06:44,110 --> 00:06:53,111
So what we were able to achieve is, again
this is mini PCI Express form factor, it
58
00:06:53,111 --> 00:07:01,639
has a small Artix 7, that's the smallest and
most inexpensive FPGA which has the ability
59
00:07:01,639 --> 00:07:18,029
to work with PCI Express. It has the LMS7002M
chip as the RFIC, a very high performance, very
60
00:07:18,029 --> 00:07:27,449
tightly integrated chip with even DSP
blocks inside. It has even a GPS chip
61
00:07:27,449 --> 00:07:37,340
here - on the upper right
side you can see the GPS chip - so you can
62
00:07:37,340 --> 00:07:44,060
actually synchronize your SDR to GPS for
perfect clock stability,
63
00:07:44,060 --> 00:07:51,389
so you won't have any problems running any
telecommunication systems like GSM, 3G, 4G
64
00:07:51,389 --> 00:07:58,650
due to clock problems, and it also has
interface for SIM cards, so you can
65
00:07:58,650 --> 00:08:06,330
actually create a software-defined radio
modem and run other open source projects
66
00:08:06,330 --> 00:08:15,840
to build one, for LTE it's called srsUE, if
you're interested, etc., etc. So a really,
67
00:08:15,840 --> 00:08:22,080
really tightly packed one. And if you put
this into perspective: that's how it all
68
00:08:22,080 --> 00:08:30,669
started in 2006 and that's what you have
ten years later. It's pretty impressive.
69
00:08:30,669 --> 00:08:36,840
applause
Thanks. But I think it actually applies to
70
00:08:36,840 --> 00:08:40,320
the whole industry, which is working on
shrinking the sizes, because we just put
71
00:08:40,320 --> 00:08:48,890
stuff on the PCB, you know. We're not
building the silicon itself. Interesting
72
00:08:48,890 --> 00:08:54,701
thing is that on our first approach
we said: let's pack everything, let's do a
73
00:08:54,701 --> 00:09:03,180
very tight PCB design. We did an eight
layer PCB design, and when we sent it to a
74
00:09:03,180 --> 00:09:10,490
fab to estimate the cost, it turned out
to be $15,000 US per piece. Well, in small
75
00:09:10,490 --> 00:09:18,940
volumes obviously but still a little bit
too much. So we had to redesign this and
76
00:09:18,940 --> 00:09:26,712
the first thing which we did is we still
kept eight layers, because in our
77
00:09:26,712 --> 00:09:32,810
experience the number of layers nowadays has
only a minimal impact on the cost of the
78
00:09:32,810 --> 00:09:42,450
device. So like six, eight layers - the
price difference is not so big. But we did
79
00:09:42,450 --> 00:09:52,190
complete rerouting and only kept 2-deep
microvias and never used buried vias.
80
00:09:52,190 --> 00:09:57,240
This makes it much easier and much
faster for the fab to manufacture it and
81
00:09:57,240 --> 00:10:03,740
the price suddenly went down five or six
times, and in volume again it will be
82
00:10:03,740 --> 00:10:18,140
significantly cheaper. And that's, just as
geek porn, how the PCB looks inside. So now
83
00:10:18,140 --> 00:10:25,140
let's go into real stuff. So PCI Express:
why did we choose PCI Express? As it was
84
00:10:25,140 --> 00:10:33,310
said USB is a pain in the ass. You can't
really use USB in industrial systems. For
85
00:10:33,310 --> 00:10:40,510
a whole variety of reasons; it's just unstable.
So we did use Ethernet for many years
86
00:10:40,510 --> 00:10:47,190
successfully but Ethernet has one problem:
first of all inexpensive Ethernet is only
87
00:10:47,190 --> 00:10:51,780
one gigabit and one gigabit does not offer
you enough bandwidth to carry all the data
88
00:10:51,780 --> 00:10:59,720
we want, plus it's power-hungry, etc., etc.
So PCI Express is really a good choice
89
00:10:59,720 --> 00:11:06,420
because it's low power, it has low
latency, it has very high bandwidth and
90
00:11:06,420 --> 00:11:11,380
it's available almost universally. When we
started looking into this we realized that
91
00:11:11,380 --> 00:11:17,320
even ARM boards - some ARM boards have
mini PCI Express slots, which
92
00:11:17,320 --> 00:11:26,560
was a big surprise for me for example.
So the problem is that unlike USB you do
93
00:11:26,560 --> 00:11:36,540
need to write your own kernel driver for
this and there's no way around it. And it is
94
00:11:36,540 --> 00:11:41,110
really hard to write this driver
universally so we are writing it obviously
95
00:11:41,110 --> 00:11:45,300
for Linux, because we're working with
embedded systems, but if we want to
96
00:11:45,300 --> 00:11:51,030
rewrite it for Windows or for macOS we'll
have to do a lot of rewriting. So we focus
97
00:11:51,030 --> 00:11:57,250
on what we want, on Linux only, right now.
And now the hardest part: debugging is
98
00:11:57,250 --> 00:12:02,580
really non-trivial. One small error and
your PC completely hangs because you
99
00:12:02,580 --> 00:12:08,750
use something wrong. And you have to
reboot it and restart it. That's like
100
00:12:08,750 --> 00:12:15,500
debugging kernel but sometimes even
harder. To make it worse there is no
101
00:12:15,500 --> 00:12:19,400
really easy-to-use plug-and-play
interface. Normally,
102
00:12:19,400 --> 00:12:24,250
when you develop a PCI
Express card, when you want
103
00:12:24,250 --> 00:12:31,050
to restart it you have to restart your
development machine. Again not a nice way,
104
00:12:31,050 --> 00:12:39,420
it's really hard. So the first thing we
did is we found that we can use
105
00:12:39,420 --> 00:12:47,100
Thunderbolt 3, which was just recently
released, and which has the ability to work
106
00:12:47,100 --> 00:12:57,200
directly with the PCI Express bus. So it
basically has a mode in which it converts
107
00:12:57,200 --> 00:13:01,410
PCI Express into a plug-and-play
interface. So if you have a laptop which
108
00:13:01,410 --> 00:13:09,450
supports Thunderbolt 3 then you can use
this to plug or
109
00:13:09,450 --> 00:13:16,480
unplug your device to make your
development easier. There are always
110
00:13:16,480 --> 00:13:23,620
problems: there's no easy way, there's no
documentation. Thunderbolt is not
111
00:13:23,620 --> 00:13:27,380
compatible with Thunderbolt: Thunderbolt 3
is not compatible with Thunderbolt 2.
112
00:13:27,380 --> 00:13:33,760
So we had to buy a special laptop with
Thunderbolt 3, with special cables, all
113
00:13:33,760 --> 00:13:40,120
this hard stuff. And if you
really want to get documentation you have
114
00:13:40,120 --> 00:13:47,500
to sign an NDA and send a business plan to
them so they can approve that your
115
00:13:47,500 --> 00:13:50,670
business makes sense.
laughter
116
00:13:50,670 --> 00:13:58,640
I mean... laughs So we actually opted
out. We decided not to go through this; what
117
00:13:58,640 --> 00:14:05,340
we did is we found that someone is
actually making PCI Express to Thunderbolt
118
00:14:05,340 --> 00:14:10,550
3 converters and selling them as dev
boards and that was a big relief because
119
00:14:10,550 --> 00:14:16,740
it saved us lots of time, lots of money.
You just order it from some
120
00:14:16,740 --> 00:14:24,920
Asian company. And yeah, this is how this
converter looks. So you buy it,
121
00:14:24,920 --> 00:14:29,970
like, several pieces - you plug your
PCI Express card in there and you plug this
122
00:14:29,970 --> 00:14:38,330
into your laptop. And this is it with the
XTRX already plugged into it. Now the only
123
00:14:38,330 --> 00:14:50,160
problem we found is that typically UEFI
has a security control enabled, so that
124
00:14:50,160 --> 00:14:56,700
any random Thunderbolt device can't hijack
your PCI bus and can't get access to your
125
00:14:56,700 --> 00:15:01,740
kernel memory and do some bad stuff. Which
is a good idea - the only problem is that
126
00:15:01,740 --> 00:15:06,730
it's not fully implemented in
Linux. So under Windows if you plug in a
127
00:15:06,730 --> 00:15:11,690
device which has no security
features, which is not certified, it will
128
00:15:11,690 --> 00:15:16,510
politely ask you like: "Do you really
trust this device? Do you want to use it?"
129
00:15:16,510 --> 00:15:21,940
you can say "yes". Under Linux it just
does not work. laughs So we spent some
130
00:15:21,940 --> 00:15:25,730
time trying to figure out how to get
around this. There are some patches from
131
00:15:25,730 --> 00:15:30,370
Intel which are not mainlined, and we were
not able to actually get them to work. So we
132
00:15:30,370 --> 00:15:38,980
just had to disable all these security
measures in the laptop. So be aware that
133
00:15:38,980 --> 00:15:46,610
this is the case and we suspect that happy
users of Apple might not be able to do
134
00:15:46,610 --> 00:15:53,630
this, because Apple doesn't have a BIOS, so you
probably can't disable this feature. So
135
00:15:53,630 --> 00:16:01,820
probably a good incentive for someone to
actually finish writing the driver.
136
00:16:01,820 --> 00:16:08,130
So now to the goal: we
want to achieve 160 megasamples per
137
00:16:08,130 --> 00:16:13,550
second, 2x2 MIMO, which means two
transmit and two receive
138
00:16:13,550 --> 00:16:24,040
channels at 12 bits, which is roughly 7.5
Gbit/s. So, the first result:
139
00:16:24,040 --> 00:16:26,230
when we got this board back from the fab, it
didn't work.
140
00:16:26,230 --> 00:16:30,430
Sergey Kostanbaev mumbles: as expected
Alexander Chemeris: yes as expected so the
141
00:16:30,430 --> 00:16:39,750
first interesting thing we realized is
that the FPGA has hardware
142
00:16:39,750 --> 00:16:47,210
blocks for talking to PCI Express, which
are called GTP, which basically implement
143
00:16:47,210 --> 00:16:56,850
the PCI Express serial physical layer,
but the thing is the lane numbering is reversed
144
00:16:56,850 --> 00:17:04,319
between PCI Express and the FPGA, and we did
not realize this, so we had to do very, very
145
00:17:04,319 --> 00:17:10,619
fine soldering to actually swap the
laughs swap the lanes; you can see this
146
00:17:10,619 --> 00:17:18,490
very fine work there.
We also found that one of the components
147
00:17:18,490 --> 00:17:28,870
was a "dead bug", which is a well-known term for
chips whose footprint at the design stage is
148
00:17:28,870 --> 00:17:35,960
mirrored - we accidentally
mirrored the pinout - so we had to
149
00:17:35,960 --> 00:17:41,880
solder it upside down, and if you
realize how small it is, you can also
150
00:17:41,880 --> 00:17:49,419
appreciate the work done. And what's funny
when I was looking at dead bugs I actually
151
00:17:49,419 --> 00:17:56,929
found a manual from NASA which describes
how to properly solder dead bugs to get
152
00:17:56,929 --> 00:18:00,679
it approved.
audience laughs
153
00:18:00,679 --> 00:18:08,230
So this is the link, I think you can go
there and enjoy it; there's also fun stuff there.
154
00:18:08,230 --> 00:18:17,379
So after fixing all of this, on our next
attempt it kind of works. So the next stage
155
00:18:17,379 --> 00:18:23,340
is debugging the FPGA code, which has to
talk to PCI Express and PCI Express has to
156
00:18:23,340 --> 00:18:28,320
talk to the Linux kernel, and the kernel has to
talk to the driver, and the driver has to talk to
157
00:18:28,320 --> 00:18:37,749
the user space. So peripherals are easy:
the UART and SPIs we got to work almost
158
00:18:37,749 --> 00:18:44,799
immediately no problems with that, but DMA
was a real beast. So we spent a lot of
159
00:18:44,799 --> 00:18:52,660
time trying to get DMA to work and the
problem is that the DMA engine is on the FPGA, so
160
00:18:52,660 --> 00:18:59,730
you can't just place a breakpoint like you
do in C or C++ or in other languages; it's
161
00:18:59,730 --> 00:19:07,480
a real-time system;
it's real-time hardware, which is running
162
00:19:07,480 --> 00:19:16,351
on the fabric. So we - Sergey, who was
mainly developing this - had to write a lot
163
00:19:16,351 --> 00:19:22,779
of small test benches and test
everything piece by piece.
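The actual test benches here are HDL, but the piece-by-piece idea translates directly: isolate one small piece of logic, such as the index arithmetic of a DMA descriptor ring, and exercise it on its own. A hypothetical C model of such a piece (the names and ring size are mine, not from the talk):

```c
/* Toy model of a DMA descriptor ring's index arithmetic - the kind of
 * small, self-contained piece worth wrapping in its own test bench. */
#define RING_SIZE 32u  /* must be a power of two for the masking to work */

static unsigned ring_next(unsigned idx)
{
    return (idx + 1u) & (RING_SIZE - 1u);    /* wrap around at the end */
}

static unsigned ring_used(unsigned head, unsigned tail)
{
    return (head - tail) & (RING_SIZE - 1u); /* entries currently in flight */
}

static int ring_full(unsigned head, unsigned tail)
{
    /* one slot is sacrificed to tell "full" apart from "empty" */
    return ring_next(head) == tail;
}
```

Getting the wraparound cases (head below tail, ring almost full) right in isolation is exactly what catches the off-by-one bugs before they hang a whole PCIe transfer.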
164
00:19:22,779 --> 00:19:31,480
So every part of the DMA code we had was
wrapped into a small test bench which was
165
00:19:31,480 --> 00:19:39,720
emulating all the tricks, and as
the classics predicted, it took about five to
166
00:19:39,720 --> 00:19:47,679
ten times more than actually writing the
code. So we really blew up our
167
00:19:47,679 --> 00:19:54,529
predicted timelines by doing this, but in the
end we got really stable work.
168
00:19:54,529 --> 00:20:03,760
So some suggestions for anyone who will
try to repeat this exercise: there is a
169
00:20:03,760 --> 00:20:09,590
logic analyzer built into Xilinx tools which you
can use; it's nice, sometimes it's
170
00:20:09,590 --> 00:20:15,960
very helpful, but you can't debug
transient bugs which come out
171
00:20:15,960 --> 00:20:22,990
when some weird conditions arise.
So you have to implement some read-back
172
00:20:22,990 --> 00:20:28,809
registers which show important statistics,
important data about how your system
173
00:20:28,809 --> 00:20:35,340
behaves, in our case it's various counters
on the DMA interface. So you can actually
174
00:20:35,340 --> 00:20:40,950
kind of see what's happening with
your data: Is it received? Is it
175
00:20:40,950 --> 00:20:46,269
sent? How much is sent and how much is
received? So, for example, we can see
176
00:20:46,269 --> 00:20:53,559
when we saturate the bus or when there actually
is an underrun, so the host is not providing
177
00:20:53,559 --> 00:20:57,389
data fast enough, so we can at least
understand whether it's a host problem or
178
00:20:57,389 --> 00:21:01,769
whether it's an FPGA problem, and which
part we debug next, because again:
179
00:21:01,769 --> 00:21:07,770
it's a very multi-layer problem: you start
with FPGA, PCI Express, kernel, driver,
180
00:21:07,770 --> 00:21:15,340
user space, and any part can fail. So you
can't work blind like this. So again, the
181
00:21:15,340 --> 00:21:23,179
goal was to get 160 MSPS; with the first
implementation we could do 2 MSPS: roughly 60
182
00:21:23,179 --> 00:21:30,220
times slower.
The problem is that software just wasn't
183
00:21:30,220 --> 00:21:36,149
keeping up and wasn't sending data fast
enough. So there were many things done,
184
00:21:36,149 --> 00:21:41,390
but the most important part is: use real-
time priority if you want to get very
185
00:21:41,390 --> 00:21:46,940
stable results and well fix software bugs.
And one of the most important bugs we had
186
00:21:46,940 --> 00:21:54,240
was that DMA buffers were not freed
immediately, so they were busy
187
00:21:54,240 --> 00:21:59,429
for longer than they should be, which
introduced extra cycles and basically just
188
00:21:59,429 --> 00:22:06,009
reduced the bandwidth.
At this point let's talk a little bit
189
00:22:06,009 --> 00:22:14,389
about how to implement a high-performance
driver for Linux, because if you want to
190
00:22:14,389 --> 00:22:20,870
get real performance you have to
start with the right design. There are
191
00:22:20,870 --> 00:22:26,610
basically two approaches and the whole
spectrum in between, from which
192
00:22:26,610 --> 00:22:33,649
I will describe three. The first
193
00:22:33,649 --> 00:22:41,529
approach is full kernel control, in which
case the kernel driver not only does the
194
00:22:41,529 --> 00:22:45,701
transfer, it actually has all the logic
of controlling your device, and all this is
195
00:22:45,701 --> 00:22:52,490
exported via ioctl to the user space, and
that's kind of the traditional way of
196
00:22:52,490 --> 00:22:57,669
writing drivers. Your user space is
completely abstracted from all the
197
00:22:57,669 --> 00:23:07,029
details. The problem is that this is
probably the slowest way to do it. The
198
00:23:07,029 --> 00:23:14,340
other way is what's called a "zero-copy
interface": only control is held in
199
00:23:14,340 --> 00:23:21,380
the kernel and data is provided, the raw
data is provided to user space "as-is". So
200
00:23:21,380 --> 00:23:27,919
you avoid the memory copy, which makes it
faster. But still not fast enough if you
201
00:23:27,919 --> 00:23:34,279
really want to achieve maximum
performance, because you still have
202
00:23:34,279 --> 00:23:40,980
context switches between the kernel and
the user space. The most... the fastest
203
00:23:40,980 --> 00:23:47,289
approach possible is to have full user
space implementation, where the kernel just
204
00:23:47,289 --> 00:23:53,059
exposes everything and says "now you do it
yourself", and you have no
205
00:23:53,059 --> 00:24:02,429
context switches, like almost no, and you
can really optimize everything. So what
206
00:24:02,429 --> 00:24:08,850
is... what are the problems with this?
The pros I already mentioned: no
207
00:24:08,850 --> 00:24:13,539
switches between kernel and user space,
it's very low latency because of this as
208
00:24:13,539 --> 00:24:20,980
well, it's very high bandwidth. But if you
are not interested in getting the very
209
00:24:20,980 --> 00:24:27,940
high performance, the most performance, and
you just want to have like some little,
210
00:24:27,940 --> 00:24:33,299
like say low bandwidth performance, then
you will have to add hacks, because you
211
00:24:33,299 --> 00:24:36,710
can't get notifications from the kernel that
resources are available, that more data is
212
00:24:36,710 --> 00:24:45,570
available. It also makes it vulnerable,
because if user space can
213
00:24:45,570 --> 00:24:55,310
access it, then it can do whatever it
wants. At the end we decided that... one
214
00:24:55,310 --> 00:25:02,590
more important thing: how to actually
get the best performance out of the
215
00:25:02,590 --> 00:25:10,299
bus. This is a very important question: whether you want
to poll your device, or not poll and get
216
00:25:10,299 --> 00:25:14,259
notified. What is polling? I guess
everyone as a programmer understands it, so
217
00:25:14,259 --> 00:25:18,019
polling is when you ask repeatedly: "Are
you ready?", "Are you ready?", "Are you
218
00:25:18,019 --> 00:25:20,369
ready?" and when it's ready you get the
data immediately.
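That "are you ready?" loop over a memory-mapped status register might look like the sketch below. The register layout, bit name, and spin budget are hypothetical; a production busy loop on a dedicated core would simply spin forever:

```c
#include <stdint.h>

/* Busy-poll a memory-mapped status register until a ready bit is set.
 * Returns 0 when the bit came up, -1 if the spin budget ran out. */
static int wait_ready_spin(volatile const uint32_t *status_reg,
                           uint32_t ready_bit,
                           unsigned long max_spins)
{
    while (max_spins--) {
        if (*status_reg & ready_bit)
            return 0;          /* data is ready; no wakeup latency was paid */
    }
    return -1;                 /* gave up; a pure busy loop never would */
}
```

The `volatile` qualifier is what keeps the compiler actually re-reading the register on every iteration instead of hoisting the load out of the loop.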
219
00:25:20,369 --> 00:25:25,259
It's basically a busy loop where you
just constantly ask the device what's
220
00:25:25,259 --> 00:25:33,350
happening. You need to dedicate a full
core, and thank God we have multi-core
221
00:25:33,350 --> 00:25:39,519
CPUs nowadays, so you can dedicate the
full core to this polling and you can just
222
00:25:39,519 --> 00:25:45,539
poll constantly. But again, if you don't
need this highest performance, you just
223
00:25:45,539 --> 00:25:53,190
need to get something, then you will be
wasting a lot of CPU resources. At the end
224
00:25:53,190 --> 00:26:00,429
we decided to do a combined architecture
where it is possible to poll, but
225
00:26:00,429 --> 00:26:05,500
there's also a way to get
notifications from the kernel, for
226
00:26:05,500 --> 00:26:11,049
applications which need
low bandwidth but also require better
227
00:26:11,049 --> 00:26:17,480
CPU performance. Which I think is the best
way if you are trying to target both
228
00:26:17,480 --> 00:26:30,850
worlds. Very quickly: the architecture of the
system. We tried to make it very, very
229
00:26:30,850 --> 00:26:50,730
portable and flexible. There is a
kernel driver, which talks to a low-level
230
00:26:50,730 --> 00:26:55,690
library which implements all this logic,
which we took out of the driver: to
231
00:26:55,690 --> 00:27:01,309
control the
PCI Express, to work with DMA, to provide
232
00:27:01,309 --> 00:27:09,360
all the... to hide all the details of the
actual bus implementation.
233
00:27:09,360 --> 00:27:17,169
And then there is a high-level library
which talks to this low-level library and
234
00:27:17,169 --> 00:27:22,179
also to libraries which implement control
of actual peripherals, and most
235
00:27:22,179 --> 00:27:28,919
importantly to the library which
implements control over our RFIC chip.
236
00:27:28,919 --> 00:27:35,119
This way it's very modular, we can replace
PCI Express with something else later, we
237
00:27:35,119 --> 00:27:46,049
might be able to port it to other
operating systems, and that's the goal.
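One common C idiom for this kind of modularity is an operations table: the high-level library only calls through function pointers, so the PCIe backend could later be swapped for another transport. A hypothetical sketch, with a toy in-memory backend standing in for the real low-level library (none of these names are from the actual XTRX code):

```c
#include <stddef.h>
#include <stdint.h>

/* Bus abstraction: the high-level library only ever sees these operations. */
struct bus_ops {
    int (*reg_read)(void *dev, unsigned addr, uint32_t *val);
    int (*reg_write)(void *dev, unsigned addr, uint32_t val);
};

/* Toy in-memory backend standing in for the PCIe low-level library. */
static uint32_t toy_regs[16];

static int toy_read(void *dev, unsigned addr, uint32_t *val)
{
    (void)dev;
    *val = toy_regs[addr & 15u];
    return 0;
}

static int toy_write(void *dev, unsigned addr, uint32_t val)
{
    (void)dev;
    toy_regs[addr & 15u] = val;
    return 0;
}

static const struct bus_ops toy_bus = { toy_read, toy_write };

/* High-level helper written purely against the ops table:
 * a read-modify-write that never knows which bus it is running over. */
static int bus_set_bits(const struct bus_ops *ops, void *dev,
                        unsigned addr, uint32_t bits)
{
    uint32_t v;
    if (ops->reg_read(dev, addr, &v))
        return -1;
    return ops->reg_write(dev, addr, v | bits);
}
```

Porting to a new transport then means writing one new `bus_ops` table, while everything above it stays untouched.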
238
00:27:46,049 --> 00:27:50,059
Another interesting issue is: when you
start writing the Linux kernel driver you
239
00:27:50,059 --> 00:27:57,119
very quickly realize that while LDD, which
is the classic book on Linux driver
240
00:27:57,119 --> 00:28:02,220
writing, is good and will give you
good insight, it's not actually up-to-
241
00:28:02,220 --> 00:28:08,609
date. It's more than ten years old and
there are lots of new interfaces which are
242
00:28:08,609 --> 00:28:14,809
not described there, so you have to resort
to reading the manuals and all the
243
00:28:14,809 --> 00:28:20,409
documentation in the kernel itself. Well
at least you get the up-to-date
244
00:28:20,409 --> 00:28:31,989
information. The decision we made is to
make everything easy. We use a TTY for the GPS
245
00:28:31,989 --> 00:28:38,090
and so you can really attach pretty much
any application which talks to GPS. So all
246
00:28:38,090 --> 00:28:45,970
of existing applications can just work out
of the box. And we also wanted to be able
247
00:28:45,970 --> 00:28:54,879
to synchronize the system clock to GPS, so we
get automatic log synchronization across
248
00:28:54,879 --> 00:28:59,009
multiple systems, which is very important
when we are deploying many, many devices
249
00:28:59,009 --> 00:29:07,090
around the world.
We plan to do two interfaces: one is kernel
250
00:29:07,090 --> 00:29:15,919
PPS and the other is DCD, as the DCD line
on the UART is exposed over the TTY. Because
251
00:29:15,919 --> 00:29:20,259
again we found that there are two types of
applications: some support one API,
252
00:29:20,259 --> 00:29:25,539
others support the other API, and there is
no common thing so we have to support
253
00:29:25,539 --> 00:29:38,649
both. As we described, we want to have
poll support, so we can get notifications from the
254
00:29:38,649 --> 00:29:48,130
kernel when data is available and we don't
need to do real busy looping all the time.
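The notification side of that combined design maps onto the standard poll(2) interface: the driver implements the file's poll hook, and user space blocks until the kernel signals readiness. A user-space sketch (the readiness semantics for the device are hypothetical; the test below drives it with an ordinary pipe, which poll() treats the same way):

```c
#include <poll.h>

/* Block until the kernel reports the fd readable, or timeout_ms passes.
 * Returns 1 = data ready, 0 = timed out, -1 = error. */
static int wait_ready_notify(int fd, int timeout_ms)
{
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;   /* e.g. "a filled DMA buffer is ready to read" */

    int r = poll(&pfd, 1, timeout_ms);
    if (r < 0)
        return -1;         /* poll() itself failed */
    if (r == 0)
        return 0;          /* nothing became ready within the timeout */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

While blocked in poll() the process consumes no CPU at all, which is exactly the trade-off against the busy loop: a context-switch worth of latency in exchange for a free core.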
255
00:29:48,130 --> 00:29:55,789
After all the software optimizations we've
got to like 10 MSPS: still very, very far
256
00:29:55,789 --> 00:30:02,369
from what we want to achieve.
Now there should have been a lot of
257
00:30:02,369 --> 00:30:06,570
explanations about PCI Express, but when
we actually wrote everything we wanted to
258
00:30:06,570 --> 00:30:13,999
say, we realized it's just like a full two-
hour talk just on PCI Express. So we are
259
00:30:13,999 --> 00:30:17,760
not going to give it here, I'll just give
some highlights which are most
260
00:30:17,760 --> 00:30:23,889
interesting. If there is real
interest, we can set up a workshop on
261
00:30:23,889 --> 00:30:32,340
some of the later days and talk in more
detail about PCI Express specifically.
262
00:30:32,340 --> 00:30:38,549
The thing is, there are no open source cores
for PCI Express, which are optimized for
263
00:30:38,549 --> 00:30:48,010
high-performance, real-time applications.
There is Xillybus, which as I understand is
264
00:30:48,010 --> 00:30:53,350
going to be open source, but they provide
you the source if you pay them. It's very
265
00:30:53,350 --> 00:30:59,610
popular because it's very, very easy to use,
but it's not giving you performance. If I
266
00:30:59,610 --> 00:31:04,980
remember correctly the best it can do is
maybe like 50 percent bus saturation.
267
00:31:04,980 --> 00:31:10,800
So there's also the Xilinx implementation, but
if you are using the Xilinx implementation
268
00:31:10,800 --> 00:31:21,049
with the AXI bus then you're really locked
into AXI and into Xilinx. And it's also not
269
00:31:21,049 --> 00:31:25,001
very efficient in terms of resources and
if you remember we want to make this very,
270
00:31:25,001 --> 00:31:30,029
very inexpensive. So our goal
is to be able to fit everything in the
271
00:31:30,029 --> 00:31:38,499
smallest Artix 7 FPGA, and that's quite
challenging with all the stuff in there
272
00:31:38,499 --> 00:31:47,649
and we just can't waste resources. So the
decision was to write our own PCI Express
273
00:31:47,649 --> 00:31:53,039
implementation. That's how it looks.
I'm not going to discuss it right now.
274
00:31:53,039 --> 00:31:59,950
There are several iterations. Initially it
looked much simpler; that turned out not to
275
00:31:59,950 --> 00:32:06,100
work well.
So some interesting stuff about PCI
276
00:32:06,100 --> 00:32:12,749
Express which we stumbled upon is that it
was working really well on Atom which is
277
00:32:12,749 --> 00:32:17,460
our main development platform because we
are doing a lot of embedded stuff. Worked
278
00:32:17,460 --> 00:32:26,479
really well. When we tried to plug this into a
Core i7, it just started hanging once in a
279
00:32:26,479 --> 00:32:35,090
while. So after maybe several days
of debugging, Sergey found that
280
00:32:35,090 --> 00:32:39,330
very interesting statement in the standard
which says that a value of zero in the byte
281
00:32:39,330 --> 00:32:45,869
count actually stands not for zero bytes
but for 4096 bytes.
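In the PCIe completion header this is the 12-bit Byte Count field: the legal values are 1 through 4095, and the all-zero encoding means the maximum, 4096 bytes. A small decoder makes the rule explicit (the function name is mine):

```c
#include <stdint.h>

/* Decode the PCIe completion header's 12-bit Byte Count field.
 * An encoding of 0 does not mean zero bytes: it means the maximum, 4096. */
static unsigned tlp_byte_count(uint16_t field)
{
    unsigned bytes = field & 0xFFFu;  /* the field is 12 bits wide */
    return bytes ? bytes : 4096u;     /* 0 encodes the maximum */
}
```

Treating that 0 as "zero bytes remaining" is precisely the kind of bug that works on one host's request sizes and hangs on another's.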
282
00:32:45,869 --> 00:32:58,739
I mean that's a really cool optimization.
So another thing is the completion, which is a
283
00:32:58,739 --> 00:33:03,639
term in PCI Express basically for an
acknowledgment which can also carry some
284
00:33:03,639 --> 00:33:12,429
data back for your request. And sometimes
if you're not sending a completion, the device
285
00:33:12,429 --> 00:33:20,740
just hangs. And what happens is that in
this case due to some historical heritage
286
00:33:20,740 --> 00:33:29,549
of x86, it just starts returning you FFFF.
And if you have a register which says: "Is
287
00:33:29,549 --> 00:33:35,470
your device okay?" and this register shows
one to say "The device is okay", guess
288
00:33:35,470 --> 00:33:38,500
what will happen?
You will always be reading that your
289
00:33:38,500 --> 00:33:46,590
device is okay. So the suggestion is not
to use one as the status for okay and use
290
00:33:46,590 --> 00:33:52,790
either zero or, better, like a two-bit
sequence. So you are definitely sure that
291
00:33:52,790 --> 00:34:03,659
you are okay and not getting FFFFs. So
when you have a device which again may
292
00:34:03,659 --> 00:34:10,440
fail at any of the layers, you just got
this new board, it's really hard, it's
293
00:34:10,440 --> 00:34:17,639
really hard to debug because of memory
corruption. So we had a software bug and
294
00:34:17,639 --> 00:34:25,099
it was writing DMA addresses
incorrectly and we were wondering why we
295
00:34:25,099 --> 00:34:32,179
were not getting any data in our buffers. At
the same time, after several restarts, the
296
00:34:32,179 --> 00:34:41,159
operating system just crashes. Well, that's
the reason why there is this UEFI
297
00:34:41,159 --> 00:34:47,199
protection which prevents you from
plugging in devices like this into your
298
00:34:47,199 --> 00:34:52,270
computer. Because it was basically writing
data, like random data into random
299
00:34:52,270 --> 00:35:00,299
portions of your memory. So a lot of
debugging, a lot of tests and test benches
300
00:35:00,299 --> 00:35:10,589
and we were able to find this. And another
thing is if you deinitialize your driver
301
00:35:10,589 --> 00:35:15,250
incorrectly, and that's what's happening
when you have a plug-and-play device, which
302
00:35:15,250 --> 00:35:22,119
you can plug and unplug, then you may end
up in a situation where you are
303
00:35:22,119 --> 00:35:28,039
trying to write into memory which is
already freed by the operating system and
304
00:35:28,039 --> 00:35:35,960
used for something else. Very well-known
problem, but it also happens here. So
305
00:35:35,960 --> 00:35:50,549
why DMA is really hard is because it
has this completion architecture for
306
00:35:50,549 --> 00:35:56,440
reading
data. Writes are easy. You just send the
307
00:35:56,440 --> 00:36:00,460
data, you forget about it. It's a fire-
and-forget system. But for reading you
308
00:36:00,460 --> 00:36:10,420
really need to get your data back. And the
thing is, it looks like this. You really
309
00:36:10,420 --> 00:36:16,020
hope that there would be some pointing
device here. But basically on the top left
310
00:36:16,020 --> 00:36:24,240
you can see requests for read and on the
right you can see completion transactions.
311
00:36:24,240 --> 00:36:29,890
So basically each transaction can be, and
most likely will be, split into multiple
312
00:36:29,890 --> 00:36:38,900
completions. So first of all you have to
collect all these pieces and write
313
00:36:38,900 --> 00:36:46,210
them into proper parts of the memory.
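The bookkeeping for that reassembly can be sketched roughly like this; the tag-indexed structure is our illustration of the idea, not the actual gateware state machine:

```python
class ReadReassembler:
    """Reassemble DMA read completions, several requests in flight.

    One read request (identified by its tag) may come back as many
    completions, interleaved with completions for other outstanding
    tags, so every piece has to land at the right offset of the
    right buffer.
    """

    def __init__(self):
        # tag -> {"buf": destination, "offset": next write pos, "remaining": bytes left}
        self.pending = {}

    def request(self, tag, nbytes):
        # Keep several tags outstanding: with a single request in
        # flight the link sits idle for the whole round-trip latency.
        self.pending[tag] = {"buf": bytearray(nbytes), "offset": 0, "remaining": nbytes}

    def on_completion(self, tag, payload):
        req = self.pending[tag]
        req["buf"][req["offset"]:req["offset"] + len(payload)] = payload
        req["offset"] += len(payload)
        req["remaining"] -= len(payload)
        if req["remaining"] == 0:                   # last piece for this tag
            return self.pending.pop(tag)["buf"]
        return None                                 # still waiting for more
```

Completions for one tag arrive in address order, but different tags complete relative to each other in any order, which is what makes the state machine bigger than a naive FIFO.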
But that's not all. The thing is the
314
00:36:46,210 --> 00:36:53,369
latency between request and completion is
really high. It's like 50 cycles. So if
315
00:36:53,369 --> 00:36:58,990
you have only a single transaction
in flight, you will get really bad
316
00:36:58,990 --> 00:37:03,900
performance. You do need to have multiple
transactions in flight. And the worst
317
00:37:03,900 --> 00:37:13,170
thing is that transactions can return data
in random order. So it's a much more
318
00:37:13,170 --> 00:37:19,820
complicated state machine than we expected
originally. So when I said, you know, the
319
00:37:19,820 --> 00:37:25,589
architecture was much simpler originally,
we didn't have all of this, and we had to
320
00:37:25,589 --> 00:37:31,670
realize this while implementing. So again
here was a whole description of how
321
00:37:31,670 --> 00:37:41,200
exactly this works. But not this time. So
now after all these optimizations we've
322
00:37:41,200 --> 00:37:48,859
got 20 mega samples per second which is
just six times lower than what we are
323
00:37:48,859 --> 00:37:59,599
aiming at. So now the next thing is PCI
Express lane scalability. So PCI Express
324
00:37:59,599 --> 00:38:07,220
is a serial bus. So it has multiple lanes
and they allow you to basically
325
00:38:07,220 --> 00:38:14,350
horizontally scale your bandwidth. One
lane is like x, then two lanes is 2x, four
326
00:38:14,350 --> 00:38:20,160
lanes is 4x. So the more lanes you have the
more performance you are getting out of
327
00:38:20,160 --> 00:38:23,970
your, out of your bus. So the more
bandwidth you're getting out of your bus.
328
00:38:23,970 --> 00:38:31,700
Not performance. So the issue is that
typical mini PCI Express, so the mini
329
00:38:31,700 --> 00:38:38,600
PCI Express standard only standardized one
lane. And the second lane is left as optional.
330
00:38:38,600 --> 00:38:46,099
So most motherboards don't support this.
There are some but not all of them. And we
331
00:38:46,099 --> 00:38:52,370
really wanted to get this done. So we
designed a special converter board which
332
00:38:52,370 --> 00:38:57,530
allows you to plug your mini PCI Express
into a full-size PCI Express and
333
00:38:57,530 --> 00:39:06,790
get two lanes working. And we're also
planning to have a similar board which
334
00:39:06,790 --> 00:39:12,660
will have multiple slots so you will be
able to get multiple XTRX SDRs onto the
335
00:39:12,660 --> 00:39:21,270
same, onto the same carrier board and plug
this into, let's say, a PCI Express x16 and
336
00:39:21,270 --> 00:39:29,059
you will get really a lot of IQ data,
which then will be
337
00:39:29,059 --> 00:39:38,760
your problem to process. So
with two lanes it's about twice the performance,
338
00:39:38,760 --> 00:39:48,930
so we are getting fifty mega samples per
second. And that's the time to really cut
339
00:39:48,930 --> 00:39:59,230
the fat because the real sample size of
the LMS7 is 12 bits and we are transmitting 16
340
00:39:59,230 --> 00:40:06,930
because it's easier, because the CPU is
working on 8, 16, 32 bits. So we originally
341
00:40:06,930 --> 00:40:13,770
designed the driver to support 8 bit, 12
bit and 16 bit to be able to do this
342
00:40:13,770 --> 00:40:23,800
scaling. And for the test we said okay
let's go from 16 to 8 bit. We'll lose
343
00:40:23,800 --> 00:40:32,960
some dynamic range but who cares these
days. It still stayed the same, it's still 50
344
00:40:32,960 --> 00:40:41,980
mega samples per second, no matter what we
did. And that was a lot of interesting
345
00:40:41,980 --> 00:40:49,580
debugging going on. And we realized that
we actually made another, not really a
346
00:40:49,580 --> 00:40:58,720
mistake. We didn't really know
this when we designed. But we should have
347
00:40:58,720 --> 00:41:04,450
used a higher voltage for this high speed
bus to get it to the full performance. And
348
00:41:04,450 --> 00:41:12,619
at 1.8 volts it was just degrading too fast and
the bus itself was not performing well. So
349
00:41:12,619 --> 00:41:21,859
our next prototype will be using higher
voltage specifically for this bus. And
350
00:41:21,859 --> 00:41:26,559
this is kind of stuff which makes
designing hardware for high speed really
351
00:41:26,559 --> 00:41:32,210
hard because you have to care about
coherence of the parallel buses on your,
352
00:41:32,210 --> 00:41:38,550
on your system. So at the same time we do
want to keep 1.8 volts for everything else
353
00:41:38,550 --> 00:41:43,480
as much as possible. Because another
problem we are facing with this device is
354
00:41:43,480 --> 00:41:47,069
that by the standard mini PCI Express
allows only like ...
355
00:41:47,069 --> 00:41:51,220
Sergey Kostanbaev: ... 2.5 ...
Alexander Chemeris: ... 2.5 watts of power
356
00:41:51,220 --> 00:41:58,369
consumption, no more. And we were very
lucky that the LMS7 has such
357
00:41:58,369 --> 00:42:04,460
good power consumption
performance. We actually had some extra
358
00:42:04,460 --> 00:42:10,049
space to have FPGA and GPS and all this
stuff. But we just can't let the power
359
00:42:10,049 --> 00:42:14,880
consumption go up. Our measurements on
this device showed about ...
360
00:42:14,880 --> 00:42:18,510
Sergey Kostanbaev: ... 2.3 ...
Alexander Chemeris: ... 2.3 watts of power
361
00:42:18,510 --> 00:42:27,220
consumption. So we are like at the limit
at this point. So when we fix the bus with
362
00:42:27,220 --> 00:42:31,420
the higher voltage, you know it's a
theoretical exercise, because we haven't
363
00:42:31,420 --> 00:42:38,000
done this yet, that's planned to happen in
a couple of months. We should be able to get
364
00:42:38,000 --> 00:42:47,330
to these numbers, which are just 1.2 times
slower. Then the next thing will be to fix
365
00:42:47,330 --> 00:42:55,550
another mistake which we made at the very
beginning: we procured the wrong chip.
366
00:42:55,550 --> 00:43:05,270
Just one digit difference, you can see
it's highlighted in red and green, and
367
00:43:05,270 --> 00:43:13,230
this chip supports only generation 1
PCI Express, which is twice as slow as
368
00:43:13,230 --> 00:43:18,190
generation 2 PCI Express.
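The factor of two follows directly from the line rates; a quick sanity check, assuming the 8b/10b encoding both generations use (10 bits on the wire per data byte):

```python
def raw_bandwidth_MBps(gen, lanes):
    """Raw per-direction PCIe bandwidth before protocol overhead.

    Gen 1 signals at 2.5 GT/s and gen 2 at 5.0 GT/s, both with
    8b/10b encoding, so one lane carries 250 or 500 MB/s and
    bandwidth scales linearly with the lane count.
    """
    gigatransfers_per_s = {1: 2.5e9, 2: 5.0e9}[gen]
    bytes_per_s = gigatransfers_per_s * (8 / 10) / 8   # 8b/10b, 8 bits per byte
    return bytes_per_s / 1e6 * lanes

# The mistakenly procured gen-1 part at x2 tops out at 500 MB/s raw,
# while the intended gen-2 part would give 1000 MB/s.
```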
So again, hopefully we'll replace the chip
369
00:43:18,190 --> 00:43:30,140
and just get very simple doubling of the
performance. Still it will be slower than
370
00:43:30,140 --> 00:43:39,770
we wanted it to be, and here is where
practical versus theoretical numbers come in.
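A large part of that practical-versus-theoretical gap is plain per-packet overhead. A back-of-envelope sketch, assuming roughly 20 bytes of fixed cost per TLP at gen 1/2 speeds (framing, sequence number, 3DW header, LCRC):

```python
def tlp_efficiency(max_payload_bytes, overhead_bytes=20):
    """Fraction of link bandwidth left for payload data.

    Every TLP pays a fixed cost regardless of payload size, so a
    small maximum payload setting wastes a bigger slice per packet.
    The 20-byte figure is a rough estimate for gen 1/2 links.
    """
    return max_payload_bytes / (max_payload_bytes + overhead_bytes)

# 128-byte payloads (typical desktop Core/Atom) leave about 87%,
# 256-byte payloads about 92-93%.
```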
371
00:43:39,770 --> 00:43:47,119
Well, as every bus, it has overheads
and one of the things which again we
372
00:43:47,119 --> 00:43:51,279
realized when we were implementing this
is, that even though the standard
373
00:43:51,279 --> 00:43:58,910
standardized is the payload size of 4kB,
actual implementations are different. For
374
00:43:58,910 --> 00:44:08,390
example desktop computers like Intel Core
or Intel Atom only have a 128-byte
375
00:44:08,390 --> 00:44:18,740
payload. So there is much more overhead
going on the bus to transfer data and even
376
00:44:18,740 --> 00:44:29,180
theoretically you can only achieve 87%
efficiency. And on Xeon we tested and we
377
00:44:29,180 --> 00:44:37,110
found that they're using a 256-byte payload size
and this can give you like a 92%
378
00:44:37,110 --> 00:44:45,130
efficiency on the bus, and this is before
other overheads, so the reality is even
379
00:44:45,130 --> 00:44:53,180
worse. An interesting thing which we also
did not expect, is that we originally were
380
00:44:53,180 --> 00:45:02,849
developing on Intel Atom and everything
was working great. When we plugged this into a
381
00:45:02,849 --> 00:45:10,720
laptop with a Core i7, a multi-core, really
powerful device, we didn't expect that it
382
00:45:10,720 --> 00:45:20,140
wouldn't work. Obviously Core i7 should
work better than Atom: no, not always.
383
00:45:20,140 --> 00:45:26,369
The thing is, we were plugging into a
laptop, which had a built-in video card
384
00:45:26,369 --> 00:45:44,750
which was sitting on the same PCI bus and
probably the manufacturer hard-coded a higher
385
00:45:44,750 --> 00:45:50,590
priority for the video card than for
everything else in the system, because you
386
00:45:50,590 --> 00:45:56,300
don't want your screen to flicker.
And so when you move a window you actually
387
00:45:56,300 --> 00:46:04,099
see the late packets coming to your PCI
device. We had to introduce a jitter
388
00:46:04,099 --> 00:46:14,750
buffer and add more FIFO into the device
to smooth it out. On the other hand the
389
00:46:14,750 --> 00:46:20,099
Xeon is performing really well. So it's
very optimized. That said, we have tested
390
00:46:20,099 --> 00:46:28,119
it with a discrete card and it outperforms
everything by a whopping five to seven percent.
391
00:46:28,119 --> 00:46:38,799
That's what you get for the price. So this
is actually the end of the presentation.
392
00:46:38,799 --> 00:46:43,839
We still have not scheduled any workshop,
but if there is any interest in
393
00:46:43,839 --> 00:46:53,390
actually seeing the device working or if
you are interested in learning more about the
394
00:46:53,390 --> 00:46:58,260
PCI Express in detail, let us know and we'll
schedule something in the next few days.
395
00:46:58,260 --> 00:47:05,339
That's the end, I think we can proceed
with questions if there are any.
396
00:47:05,339 --> 00:47:14,950
Applause
Herald: Okay, thank you very much. If you
397
00:47:14,950 --> 00:47:17,680
are leaving now: please try to leave
quietly because we might have some
398
00:47:17,680 --> 00:47:22,960
questions and you want to hear them. If
you have questions please line up right
399
00:47:22,960 --> 00:47:28,819
behind the microphones and I think we'll
just wait because we don't have anything
400
00:47:28,819 --> 00:47:34,990
from the signal angel. However, if you are
watching on stream you can hop into the
401
00:47:34,990 --> 00:47:39,500
channels and over social media to ask
questions and they will be answered,
402
00:47:39,500 --> 00:47:47,890
hopefully. So on that microphone.
Question 1: What's the minimum and maximum
403
00:47:47,890 --> 00:47:52,170
frequency of the card?
Alexander Chemeris: You mean RF
404
00:47:52,170 --> 00:47:55,940
frequency?
Question 1: No, the minimum frequency you
405
00:47:55,940 --> 00:48:05,640
can sample at. Most SDR devices can
only sample at over 50 MHz. Is there a
406
00:48:05,640 --> 00:48:09,190
similar limitation on your card?
Alexander Chemeris: Yeah, so if you're
407
00:48:09,190 --> 00:48:15,650
talking about RF frequency it can go
from like almost zero even though that
408
00:48:15,650 --> 00:48:27,289
works worse below 50 MHz, and all the way to
3.8 GHz if I remember correctly. And in
409
00:48:27,289 --> 00:48:34,880
terms of the sample rate right now it
works from like about 2 MSPS to about
410
00:48:34,880 --> 00:48:40,089
50 right now. But again, we're planning to
get it to these numbers we quoted.
411
00:48:40,089 --> 00:48:45,720
Herald: Okay. The microphone over there.
Question 2: Thanks for your talk. Did you
412
00:48:45,720 --> 00:48:48,630
manage to put your Linux kernel driver to
the mainline?
413
00:48:48,630 --> 00:48:53,519
Alexander Chemeris: No, not yet. I mean,
it's not even like fully published. So I
414
00:48:53,519 --> 00:48:59,019
did not say in the beginning, sorry for
this. We only just manufactured the first
415
00:48:59,019 --> 00:49:03,830
prototype, which we debugged heavily. So
we are only planning to manufacture the
416
00:49:03,830 --> 00:49:10,290
second prototype with all these fixes and
then we will release, like, the kernel
417
00:49:10,290 --> 00:49:16,700
driver and everything. And maybe we'll try
or maybe won't try, haven't decided yet.
418
00:49:16,700 --> 00:49:18,310
Question 2: Thanks
Herald: Okay...
419
00:49:18,310 --> 00:49:21,599
Alexander Chemeris: and that will be the
whole other experience.
420
00:49:21,599 --> 00:49:26,099
Herald: Okay, over there.
Question 3: Hey, looks like you went
421
00:49:26,099 --> 00:49:30,349
through some incredible amounts of pain to
make this work. So, I was wondering,
422
00:49:30,349 --> 00:49:34,960
aren't there any simulators at least for
parts of the system, or the PCIe bus for
423
00:49:34,960 --> 00:49:40,150
the DMA something? Any simulator so that
you can actually first design the system
424
00:49:40,150 --> 00:49:44,630
there and debug it more easily?
Sergey Kostanbaev: Yes, there are
425
00:49:44,630 --> 00:49:50,400
available simulators, but the problem is
they are all non-free. So you have to pay
426
00:49:50,400 --> 00:49:57,109
for them. So yeah, we chose the hard
way.
427
00:49:57,109 --> 00:49:59,520
Question 3: Okay thanks.
Herald: We have a question from the signal
428
00:49:59,520 --> 00:50:03,180
angel.
Question 4: Yeah, are the FPGA code, Linux
429
00:50:03,180 --> 00:50:07,650
driver, and library code, and the design
project files public and if so, did they
430
00:50:07,650 --> 00:50:13,480
post them yet? They can't find them on
xtrx.io.
431
00:50:13,480 --> 00:50:17,970
Alexander Chemeris: Yeah, so they're not
published yet. As I said, we haven't
432
00:50:17,970 --> 00:50:24,579
released them. So, the drivers and
libraries will definitely be available,
433
00:50:24,579 --> 00:50:28,589
FPGA code... We are considering this
probably also will be available in open
434
00:50:28,589 --> 00:50:36,359
source. But we will publish them together
with the public announcement of the
435
00:50:36,359 --> 00:50:42,220
device.
Herald: Ok, that microphone.
436
00:50:42,220 --> 00:50:46,010
Question 5: Yes. Did you guys see any
signal integrity issues on the PCI
437
00:50:46,010 --> 00:50:50,009
bus, or on the bus to the LMS chip, the
Lime Micro chip, I think, that's doing
438
00:50:50,009 --> 00:50:51,009
the RF?
AC: Right.
439
00:50:51,009 --> 00:50:56,359
Question 5: Did you try to measure signal
integrity issues, or... because there were
440
00:50:56,359 --> 00:51:01,130
some reliability issues, right?
AC: Yeah, we actually... so, PCI. With PCI
441
00:51:01,130 --> 00:51:02,559
we never had issues, if I remember
correctly.
442
00:51:02,559 --> 00:51:04,760
SK: No.
AC: I just... it was just working.
443
00:51:04,760 --> 00:51:10,940
SK: Well, the board is so small, and when
there are small traces there's no problem
444
00:51:10,940 --> 00:51:14,790
in signal integrity. So it's actually
saved us.
445
00:51:14,790 --> 00:51:20,599
AC: Yeah. Designing a small board is easier.
Yeah, with the LMS7, the problem is not
446
00:51:20,599 --> 00:51:26,099
the signal integrity in terms of
difference in the length of the traces,
447
00:51:26,099 --> 00:51:37,319
but rather the fact that the signal
degrades over voltage, also over speed in
448
00:51:37,319 --> 00:51:44,010
terms of voltage, and drops below the
detection level, and all this stuff. We
449
00:51:44,010 --> 00:51:47,220
did some measurements. I actually wanted
to add some pictures here, but decided
450
00:51:47,220 --> 00:51:54,359
that's not going to be super interesting.
H: Okay. Microphone over there.
451
00:51:54,359 --> 00:51:58,359
Question 6: Yes. Thanks for the talk. How
much work would it be to convert the two
452
00:51:58,359 --> 00:52:05,610
by two SDR into an 8-input logic analyzer
in terms of hard- and software? So, if you
453
00:52:05,610 --> 00:52:12,289
have a really fast logic analyzer with which
you can record unlimited traces?
454
00:52:12,289 --> 00:52:18,980
AC: A logic analyzer...
Q6: So basically it's just also an analog-
456
00:52:18,980 --> 00:52:27,040
to-digital converter and you largely want
fast sampling and a large amount of memory
456
00:52:27,040 --> 00:52:30,900
to store the traces.
AC: Well, I just think it's not the best
457
00:52:30,900 --> 00:52:40,300
use for it. It's probably... I don't know.
Maybe Sergey has any ideas, but I think it
458
00:52:40,300 --> 00:52:47,549
just may be easier to get high-speed ADC
and replace the Lime chip with a high-
459
00:52:47,549 --> 00:52:56,720
speed ADC to get what you want, because
the Lime chip has so many things there
460
00:52:56,720 --> 00:53:01,450
specifically for RF.
SK: Yeah, the main problem is you cannot just
461
00:53:01,450 --> 00:53:09,099
sample the original data. You have to shift it
in frequency, so you cannot sample the
462
00:53:09,099 --> 00:53:16,619
original signal, and using it for
something else except spectrum analysis
463
00:53:16,619 --> 00:53:20,839
is hard.
Q6: OK. Thanks.
464
00:53:20,839 --> 00:53:25,750
H: OK. Another question from the internet.
Signal angel: Yes. Have you compared the
465
00:53:25,750 --> 00:53:32,240
sample rate of the ADC of the Lime DA chip
to the USRP ADCs, and if so, how does the
466
00:53:32,240 --> 00:53:40,160
lower sample rate affect the performance?
AC: So, comparing low sample rate to
467
00:53:40,160 --> 00:53:49,281
higher sample rate. We haven't done much
testing on the RF performance yet, because
468
00:53:49,281 --> 00:53:58,440
we were so busy with all this stuff, so we
are yet to see in terms of low bit rates
469
00:53:58,440 --> 00:54:03,190
versus sample rates versus high sample
rate. Well, high sample rate always gives
470
00:54:03,190 --> 00:54:09,859
you better performance, but you also get
higher power consumption. So, I guess it's
471
00:54:09,859 --> 00:54:14,019
the question of what's more more important
for you.
472
00:54:14,019 --> 00:54:20,440
H: Okay. Over there.
Question 7: I've gathered there is no
473
00:54:20,440 --> 00:54:25,319
mixer bypass, so you can't directly sample
the signal. Is there a way to use the same
474
00:54:25,319 --> 00:54:31,720
antenna for send and receive yet?
AC: Actually, there is... an input for the ADC.
475
00:54:31,720 --> 00:54:38,289
SK: But it's not a bypass, it's a
dedicated pin on the LMS chip, and since we're
476
00:54:38,289 --> 00:54:45,569
very space-constrained, we didn't route
it, so you cannot actually bypass it.
477
00:54:45,569 --> 00:54:50,359
AC: Okay, that's in our specific hardware. So in
general, in the LMS chip there is a
478
00:54:50,359 --> 00:54:58,170
special pin which allows you to drive your
signal directly to ADC without all the
479
00:54:58,170 --> 00:55:02,950
mixers, filters, all this radio stuff,
just directly to ADC. So, yes,
480
00:55:02,950 --> 00:55:06,869
theoretically that's possible.
SK: We even thought about this, but it
481
00:55:06,869 --> 00:55:10,960
doesn't fit this design.
Q7: Okay. And can I share antennas,
482
00:55:10,960 --> 00:55:15,700
because I have an existing laptop with
existing antennas, but I would use the
483
00:55:15,700 --> 00:55:22,140
same antenna to send and receive.
AC: Yeah, so, I mean, that's... depends on
484
00:55:22,140 --> 00:55:25,619
what exactly you want to do. If you
want a TDD system, then yes; if you
485
00:55:25,619 --> 00:55:30,869
want an FDD system, then you will have to
put a small duplexer in there, but yeah,
486
00:55:30,869 --> 00:55:34,839
that's the idea. So you can plug this into
your laptop and use your existing
487
00:55:34,839 --> 00:55:39,640
antennas. That's one of the ideas of how
to use XTRX.
488
00:55:39,640 --> 00:55:41,799
Q7: Yeah, because there are all four
connectors.
489
00:55:41,799 --> 00:55:45,400
AC: Yeah. One thing which I actually
forgot to mention is - I kind of mentioned
490
00:55:45,400 --> 00:55:53,930
in the slides - is that any other SDRs
which are based on Ethernet or on the USB
491
00:55:53,930 --> 00:56:02,309
can't work with CSMA wireless systems,
and the most famous CSMA system is Wi-Fi.
492
00:56:02,309 --> 00:56:09,259
So, it turns out that because of the
latency between your operating system and
493
00:56:09,259 --> 00:56:17,569
your radio on USB, you just can't react
fast enough for Wi-Fi to work, because you
494
00:56:17,569 --> 00:56:23,240
- probably you know that - in Wi-Fi you
carrier sense, and if you sense that the
495
00:56:23,240 --> 00:56:29,579
spectrum is free, you start transmitting.
That doesn't make sense when you have huge
496
00:56:29,579 --> 00:56:36,160
latency, because all you know is that... you
know the spectrum was free back then, so,
497
00:56:36,160 --> 00:56:43,730
with xtrx, you actually can work with CSMA
systems like Wi-Fi, so again it makes it
498
00:56:43,730 --> 00:56:51,390
possible to have a fully software
implementation of Wi-Fi in your laptop. It
499
00:56:51,390 --> 00:56:58,660
obviously won't work as well as your
commercial Wi-Fi, because you will have to
500
00:56:58,660 --> 00:57:03,839
do a lot of processing on your CPU, but
for some purposes like experimentation,
501
00:57:03,839 --> 00:57:07,980
for example, for wireless labs and R&D
labs, that's really valuable.
502
00:57:07,980 --> 00:57:11,400
Q7: Thanks.
H: Okay. Over there.
503
00:57:11,400 --> 00:57:15,519
Q8: Okay. What PCB design package did you
use?
504
00:57:15,519 --> 00:57:17,819
AC: Altium.
SK: Altium, yeah.
505
00:57:17,819 --> 00:57:22,940
Q8: And I'd be interested in the PCIe
workshop. Would be really great if you do
506
00:57:22,940 --> 00:57:24,940
this one.
AC: Say this again?
507
00:57:24,940 --> 00:57:28,069
Q8: Would be really great if you do the
PCI Express workshop.
508
00:57:28,069 --> 00:57:32,720
AC: Ah. PCI Express workshop. Okay. Thank
you.
509
00:57:32,720 --> 00:57:36,690
H: Okay, I think we have one more question
from the microphones, and that's you.
510
00:57:36,690 --> 00:57:42,880
Q9: Okay. Great talk. And again, I would
appreciate a PCI Express workshop, if it
511
00:57:42,880 --> 00:57:47,190
ever happens. What are the
synchronization options between multiple
512
00:57:47,190 --> 00:57:55,089
cards? Can you synchronize the ADC clock,
and can you synchronize the presumably
513
00:57:55,089 --> 00:58:04,609
digitally created IF?
SK: Yes, so... so,
unfortunately, just IF synchronization is
514
00:58:04,609 --> 00:58:10,279
not possible, because the Lime chip doesn't
expose the LO frequency. But we can
515
00:58:10,279 --> 00:58:16,000
synchronize digitally. So, we have a special
one-PPS signal for synchronization. We have
516
00:58:16,000 --> 00:58:25,180
lines for clock synchronization and other
stuff. We can do it in software. So the
517
00:58:25,180 --> 00:58:31,789
Lime chip has a phase correction register,
so when you measure... if there is a phase
518
00:58:31,789 --> 00:58:35,170
difference, so you can compensate it on
different boards.
519
00:58:35,170 --> 00:58:39,309
Q9: Tune to a station a long way away and
then rotate the phase until it aligns.
520
00:58:39,309 --> 00:58:41,819
SK: Yeah.
Q9: Thank you.
521
00:58:41,819 --> 00:58:46,339
AC: A little tricky, but possible. So,
that's one of our plans for the future,
522
00:58:46,339 --> 00:58:52,819
because we do want to see, like 128 by 128
MIMO at home.
523
00:58:52,819 --> 00:58:56,060
H: Okay, we have another question from the
internet.
524
00:58:56,060 --> 00:59:00,450
Signal angel: I actually have two
questions. The first one is: What is the
525
00:59:00,450 --> 00:59:07,710
expected price after the prototype stage?
And the second one is: Can you tell us
526
00:59:07,710 --> 00:59:10,400
more about this setup you had for
debugging the PCIe
527
00:59:10,400 --> 00:59:15,970
issues?
AC: Could you repeat the second question?
528
00:59:15,970 --> 00:59:20,269
SK: It's ????????????, I think.
Signal angel: It's more about the setup you had for
529
00:59:20,269 --> 00:59:24,480
debugging the PCIe issues.
SK: The second question, I think it's more
530
00:59:24,480 --> 00:59:31,200
a topic for our next workshop, because it's a
more complicated setup, so... we mostly
531
00:59:31,200 --> 00:59:35,580
removed everything about it from the
current presentation.
532
00:59:35,580 --> 00:59:39,580
AC: Yeah, but in general, and in terms of
hardware setup, that was our hardware
533
00:59:39,580 --> 00:59:47,890
setup: we bought this PCI Express to
Thunderbolt 3 adapter, we bought a laptop which
534
00:59:47,890 --> 00:59:53,089
supports Thunderbolt 3, and that's how we
were debugging it. So, we don't need, like
535
00:59:53,089 --> 00:59:57,780
a full-fledged PC, we don't have to
restart it all the time. So, in terms of
536
00:59:57,780 --> 01:00:06,650
price, we don't have the fixed price yet.
So, all I can say right now is that we are
537
01:00:06,650 --> 01:00:18,349
targeting no more than your bladeRF or
HackRF devices, and probably even cheaper.
538
01:00:18,349 --> 01:00:25,210
For some versions.
H: Okay. We are out of time, so thank you
539
01:00:25,210 --> 01:00:45,079
again Sergey and Alexander.
[Applause]
540
01:00:45,079 --> 01:00:49,619
[Music]
541
01:00:49,619 --> 01:00:54,950
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!