[Music]
Herald: Has anyone in here ever worked with libusb or PyUSB? Hands up. Okay. Who also thinks USB is a pain? [laughs] Okay. Sergey and Alexander were here back at 26C3, that's a long time ago; I think it was back in Berlin, and back then they presented their first homemade, or not quite homemade, SDR, software-defined radio. This year they are back again, and they want to show us how they implemented another one, using an FPGA, and to communicate with it they used PCI Express. So if you thought USB was a pain, let's see what they can tell us about PCI Express. A warm round of applause for Alexander and Sergey for building a high-throughput, low-latency, PCIe-based software-defined radio.
[Applause]
Alexander Chemeris: Hi everyone, good morning, and welcome to the first day of the Congress. A little bit of background about what we've done previously and why we are doing what we are doing now: we started working with software-defined radios and, by the way, who knows what software-defined radio is? Okay, perfect. [laughs] And who has actually used a software-defined radio? RTL-SDR or...? Okay, fewer people, but that's still quite a lot. Okay, good. I wonder whether anyone here has used more expensive radios like USRPs? Fewer people, but okay, good. Cool. Before 2008 I had no idea what software-defined radio was; I was a voice-over-IP software person, etc. Then in 2008 I heard about OpenBTS, got introduced to software-defined radio, and I wanted to make it really work, and that's what led us to today. In 2009 we had to develop ClockTamer, a piece of hardware which allowed the USRP1 to run GSM without problems. Anyone who has ever tried doing this without a good clock source knows what I'm talking about. And we presented this - it wasn't an SDR, it was just a clock source - at 26C3 in 2009.
Then I realized that using the USRP1 is not really a good idea, because we wanted to build robust, industrial-grade base stations. So we started developing our own software-defined radio, which we call UmTRX; we started this in 2011. Our first base stations with it were deployed in 2013. But I always wanted to have something really small and really inexpensive, and back then it wasn't possible. My original idea in 2011 was to build a mini PCI card. If you remember, there were all these Wi-Fi cards in mini PCI form factor, and I thought it would be really cool to have an SDR in mini PCI, so I could plug it into my laptop or into some embedded PC and have nice SDR equipment. But back then it just wasn't really possible, because electronics were bigger and more power-hungry and it just didn't work that way, so we designed UmTRX to work over gigabit Ethernet, and it was about that size.
So now we spent this year designing something which really brings me back to what I wanted all those years ago. The XTRX is a mini PCI Express card - again, there was no PCI Express back then, so now it's mini PCI Express, which is even smaller than mini PCI - and it's built to be embedded-friendly, so you can plug it into an embedded single-board computer. If you have a laptop with a mini PCI Express slot, you can plug it into your laptop and you have really small software-defined radio equipment. And we really want to make it inexpensive; that's why I was asking how many of you have ever worked with RTL-SDR and how many of you have ever worked with USRPs, because the gap between them is pretty big and we really want to bring software-defined radio to the masses. It definitely won't be as cheap as RTL-SDR, but we try to make it as close as possible.
And at the same time, at the size of an RTL-SDR, at a price that is, well, higher, but hopefully affordable to pretty much everyone, we really want to bring high performance into your hands. And by high performance I mean this is a full transmit/receive device with two transmit channels and two receive channels, which is usually called 2x2 MIMO in the radio world. The goal was to bring it to 160 megasamples per second, which can roughly give you 120 MHz of usable radio spectrum.
So what we were able to achieve: again, this is mini PCI Express form factor. It has a small Artix-7, the smallest and most inexpensive FPGA that is able to talk PCI Express. It has an LMS7002M RFIC, a very high-performance, very tightly integrated chip, even with DSP blocks inside. It even has a GPS chip - on the upper right side you can see it - so you can actually synchronize your SDR to GPS for perfect clock stability, so you won't have any problems running telecommunication systems like GSM, 3G, or 4G due to clock issues. It also has an interface for SIM cards, so you can actually create a software-defined radio modem and run open source projects that build one for LTE, like srsUE, if you're interested, etc. So it's a really tightly packed device. And to put this into perspective: that's how it all started in 2006, and that's what you have ten years later. It's pretty impressive.
[Applause]
Thanks. But I think the credit actually goes to the whole industry that is working on shrinking the sizes, because we just put stuff on the PCB, you know; we're not building the silicon itself. An interesting thing is our first approach: we said let's pack everything, let's do a very tight PCB design. We did an eight-layer PCB design, and when we sent it to a fab to estimate the cost, it turned out to be 15,000 US dollars per piece - in small volumes, obviously, but still a little bit too much. So we had to redesign it. The first thing we did is we still kept eight layers, because in our experience the number of layers nowadays has only a minimal impact on the cost of the device - six layers, eight layers, the price difference is not so big. But we did a complete rerouting and kept only 2-deep microvias and never used buried vias. This makes it much easier and much faster for the fab to manufacture, and the price suddenly went down five or six times, and in volume it will again be significantly cheaper. And, just as geek porn, that's how the PCB looks inside.
So now let's get into the real stuff. PCI Express: why did we choose PCI Express? As was said, USB is a pain in the ass. You can't really use USB in industrial systems, for a whole variety of reasons; it's just unstable. We did use Ethernet for many years successfully, but Ethernet has a problem: first of all, inexpensive Ethernet is only one gigabit, and one gigabit does not offer enough bandwidth to carry all the data we want; plus it's power-hungry, etc. So PCI Express is really a good choice, because it's low power, it has low latency, it has very high bandwidth, and it's available almost universally. When we started looking into this, we realized that even some ARM boards have PCI Express or mini PCI Express slots, which was a big surprise for me, for example.
The problem is that, unlike USB, you do need to write your own kernel driver for this, and there's no way around it. And it is really hard to write this driver universally, so we are writing it for Linux, obviously, because we're working with embedded systems; but if we want to port it to Windows or macOS, we'll have to do a lot of rewriting. So we focus on what we want, on Linux only, right now. And now the hardest part: debugging is really non-trivial. One small error and your PC hangs completely because you did something wrong, and you have to reboot and restart it. That's like debugging a kernel, but sometimes even harder. To make it worse, there is no really easy-to-use plug-and-play interface. Normally, when you develop a PCI Express card and you want to restart it, you have to restart your development machine. Again, not a nice way to work; it's really hard.
So the first thing we did is we found that we can use Thunderbolt 3, which was just recently released, and it has the ability to work directly with the PCI Express bus. It basically has a mode in which it turns PCI Express into a plug-and-play interface. So if you have a laptop which supports Thunderbolt 3, then you can use it to plug and unplug your device, to make your development easier. There are always problems: there's no easy way, there's no documentation. Thunderbolt is not compatible with Thunderbolt: Thunderbolt 3 is not compatible with Thunderbolt 2. So we had to buy a special laptop with Thunderbolt 3, with special cables, all this hard stuff. And if you really want to get the documentation, you have to sign an NDA and send them a business plan, so they can approve that your business makes sense.
[laughter]
I mean... [laughs] So we actually opted out. We decided not to go through this. What we did is we found that someone is actually making PCI Express to Thunderbolt 3 converters and selling them as dev boards, and that was a big relief, because it saved us lots of time and lots of money. You just order it from some Asian company. And this is how this converter looks. You buy several pieces, you plug your PCI Express card in there, and you plug it into your laptop. And this is it with the XTRX already plugged in.
Now, the only problem we found is that typically UEFI has a security control enabled, so that a random Thunderbolt device can't hijack your PCI bus, get access to your kernel memory, and do bad stuff. Which is a good idea - the only problem is that it's not fully implemented in Linux. Under Windows, if you plug in a device which has no security features, which is not certified, it will politely ask you: "Do you really trust this device? Do you want to use it?", and you can say "yes". Under Linux it just does not work. [laughs] So we spent some time trying to figure out how to get around this. There are some patches from Intel which are not mainline, and we were not able to actually get them to work. So we just had to disable all these security measures in the laptop. Be aware that this is the case, and we suspect that happy users of Apple might not be able to do this at all, because Apple doesn't have a BIOS setup, so you probably can't disable this feature there. So that's probably a good incentive for someone to actually finish writing the driver.
So now to the goal: we want to achieve 160 megasamples per second, 2x2 MIMO, which means two transmit and two receive channels at 12 bits, which is roughly 7.5 Gbit/s.
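For reference, that figure follows directly from the stated parameters:

    160 MS/s x 2 channels x 2 (I and Q) x 12 bits = 7.68 Gbit/s,

i.e. roughly 7.5 Gbit/s of raw sample data before any protocol overhead.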
So, the first result, when we got this board from the fab: it didn't work.
Sergey Kostanbaev (mumbles): As expected.
Alexander Chemeris: Yes, as expected. The first interesting thing we realized is that the FPGA has hardware blocks for talking to PCI Express, called GTP transceivers, which basically implement the PCI Express serial physical layer. But the thing is, the lane numbering is reversed between PCI Express and the FPGA, and we did not realize this, so we had to do very, very fine soldering to actually swap the [laughs] swap the lanes. You can see this very fine work there.
We also found that one of the components had to be mounted "dead bug", which is a well-known term for chips whose pinout was accidentally mirrored at the design stage, so we had to solder it upside down. And if you realize how small it is, you can also appreciate the work done. And what's funny: when I was reading up on dead bugs, I actually found a manual from NASA which describes how to properly solder dead bugs to get them approved.
[audience laughs]
So this is the link; I think you can go there and enjoy it, there's also fun stuff there.
After fixing all of this, our next attempt kind of works. The next stage is debugging the FPGA code, which has to talk to PCI Express, and PCI Express has to talk to the Linux kernel, and the kernel has to talk to the driver, and the driver has to talk to user space. The peripherals were easy: the UART and SPIs we got to work almost immediately, no problems with that. But DMA was a real beast. We spent a lot of time trying to get DMA to work, and the problem is that the DMA engine is on the FPGA, so you can't just place a breakpoint like you do in C or C++ or other languages; it's real-time hardware running on the fabric. So Sergey, who was mainly developing this, had to write a lot of small test benches and test everything piece by piece.
So every part of the DMA code we had was wrapped into a small test bench which emulated all the tricks, and, as the classics predicted, it took about five to ten times longer than actually writing the code. We really blew our predicted timelines by doing this, but in the end we got really stable operation.
Some suggestions for anyone who will try to repeat this exercise: there is a logic analyzer built into the Xilinx tools which you can use; it's nice and sometimes very helpful, but you can't debug transient bugs which only come out under some weird conditions. So you have to implement readback registers which expose important statistics about how your system behaves - in our case, various counters on the DMA interface. That way you can actually see what's happening with your data: Is it received? Is it sent? How much is sent and how much is received? So, for example, we can see when we saturate the bus, or when there is an underrun, i.e. the host is not providing data fast enough. Then we can at least understand whether it's a host problem or an FPGA problem, and which part to debug next, because again: it's a very multi-layer problem - you start with the FPGA, then PCI Express, kernel, driver, user space - and any part can fail, so you can't work blind.
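To give a feel for the host side of such readback registers, here is a minimal sketch in C that watches hypothetical DMA counters, assuming the counters sit at the start of BAR0 and the kernel exposes the BAR through sysfs; all register names and offsets are invented for illustration:

    /* Sketch: watch hypothetical FPGA DMA counters over a memory-mapped BAR. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    enum {                      /* invented register offsets, in 32-bit words */
        REG_RX_BUFS  = 0,       /* buffers received from the device   */
        REG_TX_BUFS  = 1,       /* buffers sent to the device         */
        REG_UNDERRUN = 2,       /* host did not provide data in time  */
        REG_OVERRUN  = 3,       /* host did not consume data in time  */
    };

    int main(void)
    {
        /* BAR0 of the device, exported by the PCI core as a sysfs file */
        int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0",
                      O_RDWR | O_SYNC);
        if (fd < 0) return 1;
        volatile uint32_t *regs = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) return 1;
        for (;;) {
            printf("rx=%u tx=%u underruns=%u overruns=%u\n",
                   regs[REG_RX_BUFS], regs[REG_TX_BUFS],
                   regs[REG_UNDERRUN], regs[REG_OVERRUN]);
            sleep(1);
        }
    }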
So again, the goal was to get 160 MSPS; with the first implementation we got 2 MSPS, roughly 80 times slower.
The problem was that the software just wasn't keeping up and wasn't sending data fast enough. Many things were done, but the most important parts are: use real-time priority if you want very stable results, and, well, fix software bugs. One of the most important bugs we had was that DMA buffers were not freed immediately at the proper time, so they stayed busy for longer than they should, which introduced extra cycles and simply reduced the bandwidth.
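As a concrete illustration of the real-time-priority point: on Linux, a streaming thread can be moved into the SCHED_FIFO class, which keeps ordinary tasks from preempting it. A minimal sketch (requires root or CAP_SYS_NICE):

    /* Sketch: give the current thread real-time (SCHED_FIFO) priority on Linux. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    static int make_realtime(int priority)
    {
        struct sched_param sp = { .sched_priority = priority };
        return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    }

    int main(void)
    {
        int err = make_realtime(50);   /* valid range 1..99; higher preempts lower */
        if (err) {
            fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
            return 1;
        }
        /* ... the streaming loop would run here ... */
        return 0;
    }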
At this point, let's talk a little bit about how to implement a high-performance driver for Linux, because if you want real performance, you have to start with the right design. There are basically two approaches and a whole spectrum in between, which you can refer to as three approaches. The first approach is full kernel control, in which case the kernel driver not only handles the transfer, it actually has all the logic for controlling your device and exports ioctls to user space. That's the traditional way of writing drivers: your user space is completely abstracted from all the details. The problem is that this is probably the slowest way to do it. The second way is what's called a "zero-copy interface": only control is handled in the kernel, and the raw data is provided to user space as-is. You avoid the memory copy, which makes it faster. But it's still not fast enough if you really want to achieve maximum performance, because you still have context switches between the kernel and user space. The fastest approach possible is a full user-space implementation, where the kernel just exposes everything and says "now you do it yourself": you have almost no context switches and you can really optimize everything. So what are the problems with this?
The pros I already mentioned: no switches between kernel and user space, very low latency because of this, and very high bandwidth. But if you are not interested in getting the maximum performance and you just want some low-bandwidth operation, then you will have to add hacks, because you can't get notifications from the kernel that resources or more data are available. It also makes the device vulnerable, because if user space can access it, then it can do whatever it wants. One more important consideration for getting the best performance out of the bus is whether to poll your device or to get notified. What is polling? I guess every programmer understands it: polling is when you ask repeatedly, "Are you ready?", "Are you ready?", "Are you ready?", and when it's ready you get the data immediately.
-
It's basically a busy loop of your you
just constantly asking device what's
-
happening. You need to dedicate a full
core, and thanks God we have multi-core
-
CPUs nowadays, so you can dedicate the
full core to this polling and you can just
-
pull constantly. But again if you don't
need this highest performance, you just
-
need to get something, then you will be
wasting a lot of CPU resources. At the end
-
we decided to do a combined architecture
of your, it is possible to pull but
-
there's also a chance and to get
notification from a kernel to for for
-
applications, which recover, which needs
low bandwidth, but also require a better
-
CPU performance. Which I think is the best
way if you are trying to target both
-
worlds. Very quickly: the architecture of
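A minimal sketch of what such a combined interface can look like from user space, assuming (hypothetically) that the driver exposes a character device whose file descriptor becomes readable when a DMA buffer is ready; the fast path spins on a mapped readback register for a bounded number of iterations, then falls back to a blocking poll():

    /* Sketch: hybrid busy-poll / kernel-notification receive loop. */
    #include <poll.h>
    #include <stdint.h>

    /* Hypothetical handles provided by a low-level library. */
    extern int dev_fd;                      /* driver char device       */
    extern volatile uint32_t *rx_ready_reg; /* mapped readback register */
    extern void consume_buffer(void);

    void rx_loop(int busy_spins)
    {
        for (;;) {
            /* Fast path: spin a bounded number of times. */
            for (int i = 0; i < busy_spins; i++)
                if (*rx_ready_reg) { consume_buffer(); goto next; }

            /* Slow path: sleep until the kernel signals readiness. */
            struct pollfd pfd = { .fd = dev_fd, .events = POLLIN };
            if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN))
                consume_buffer();
    next:   ;
        }
    }

Setting busy_spins to zero turns this into a pure low-CPU notification client; a large value approaches the dedicated-core polling mode.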
Very quickly, the architecture of the system. We tried to make it very portable and flexible. There is a kernel driver, which talks to a low-level library that implements all the logic we took out of the driver: controlling PCI Express, working with DMA, hiding all the details of the actual bus implementation. And then there is a high-level library which talks to this low-level library, and also to libraries which implement control of the actual peripherals, most importantly to the library which implements control over our RFIC chip. This way it's very modular: we can replace PCI Express with something else later, and we might be able to port it to other operating systems; that's the goal.
Another interesting issue: when you start writing a Linux kernel driver, you very quickly realize that while LDD, the classic book on Linux driver writing, is good and will give you good insight, it's not actually up to date. It's more than ten years old, and there are lots of new interfaces which are not described there, so you have to resort to reading the manuals and the documentation in the kernel itself. Well, at least there you get up-to-date information.
The decisions we made were to make everything easy. We use a TTY for GPS, so you can attach pretty much any application which talks to GPS; all existing applications just work out of the box. And we also wanted to be able to synchronize the system clock to GPS, so we get automatic clock synchronization across multiple systems, which is very important when we are deploying many, many devices around the world. We plan to do two interfaces: one is kernel PPS, and the other is DCD, i.e. the DCD line of the UART exposed over the TTY. Again, we found that there are two kinds of applications, ones that support one API and others that support the other, and there is no common ground, so we have to support both. And, as described, we want to support poll(), so we can get notifications from the kernel when data is available and don't need to busy-loop all the time.
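Coming back to the PPS side: Linux exposes pulse-per-second sources through the RFC 2783 API. A minimal sketch of fetching one timestamp from such a source, assuming the pps-tools headers are installed and the driver registers a /dev/pps0:

    /* Sketch: read one PPS timestamp via the RFC 2783 (timepps.h) API. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/timepps.h>

    int main(void)
    {
        int fd = open("/dev/pps0", O_RDONLY);
        if (fd < 0) return 1;

        pps_handle_t handle;
        if (time_pps_create(fd, &handle) < 0) return 1;

        pps_info_t info;
        struct timespec timeout = { .tv_sec = 3, .tv_nsec = 0 };
        /* Blocks until the next pulse (or the timeout) and returns its timestamp. */
        if (time_pps_fetch(handle, PPS_TSFMT_TSPEC, &info, &timeout) == 0)
            printf("assert edge at %lld.%09ld\n",
                   (long long)info.assert_timestamp.tv_sec,
                   info.assert_timestamp.tv_nsec);

        time_pps_destroy(handle);
        return 0;
    }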
After all the software optimizations we got to about 10 MSPS: still very, very far from what we want to achieve.
Now, there should have been a lot of explanation about PCI Express here, but when we actually wrote down everything we wanted to say, we realized it's a full two-hour talk just on PCI Express. So we are not going to give it here; I'll just give some of the most interesting highlights. If there is real interest, we can set up a workshop on one of the later days and talk in more detail about PCI Express specifically.
The thing is, there are no open source cores for PCI Express which are optimized for high-performance, real-time applications. There is Xillybus, which as I understand is going to be open source, but right now they provide you the source if you pay them. It's very popular because it's very easy to use, but it's not giving you performance. If I remember correctly, the best it can do is maybe 50 percent bus saturation.
There's also the Xilinx implementation, but if you are using the Xilinx implementation with the AXI bus, then you're really locked into AXI and Xilinx. It is also not very efficient in terms of resources, and if you remember, we want to make this very, very inexpensive. Our goal is to be able to fit everything into the smallest Artix-7 FPGA, and that's quite challenging with all the stuff in there; we just can't waste resources. So the decision was to write our own PCI Express implementation. That's how it looks; I'm not going to discuss it right now. There were several iterations: initially it looked much simpler, but that turned out not to work well.
Some interesting stuff about PCI Express which we stumbled upon: it was working really well on Atom, which is our main development platform, because we are doing a lot of embedded stuff. It worked really well. When we tried to plug this into a Core i7, it just started hanging once in a while. After maybe several days of debugging, Sergey found a very interesting statement in the standard which says that the value zero in the byte count field stands not for zero bytes, but for 4096 bytes.
I mean, that's a really cool optimization. Another thing is completions, which is the PCI Express term for, basically, acknowledgments which can also carry data back for your request. Sometimes, if you don't send a completion, the device just hangs. And what happens in this case, due to some historical heritage of x86, is that reads just start returning you FFFF. So if you have a register which says "Is your device okay?", and this register reads one to say "the device is okay", guess what will happen? You will always read that your device is okay. So the suggestion is: don't use all-ones as the "okay" status; use either zero or, better, a specific multi-bit pattern, so you are definitely sure that you are okay and not just reading FFFFs.
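A sketch of that defensive check on the host side, with a hypothetical status register whose healthy value is a distinctive pattern rather than 1:

    /* Sketch: status readback that cannot be faked by a dead bus.
     * A failed PCIe read returns all-ones (0xFFFFFFFF) on x86, so the
     * "alive" value must be distinguishable from both 0 and all-ones. */
    #include <stdbool.h>
    #include <stdint.h>

    #define STATUS_ALIVE_PATTERN 0xA55A0002u   /* hypothetical magic value */

    static bool device_alive(volatile const uint32_t *status_reg)
    {
        uint32_t v = *status_reg;
        if (v == 0xFFFFFFFFu)   /* bus error / missing completion */
            return false;
        return v == STATUS_ALIVE_PATTERN;
    }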
So when you have a device which, again, may fail at any of the layers - you just got this new board - it's really hard to debug, because of memory corruption. We had a software bug that was writing DMA addresses incorrectly, and we were wondering why we were not getting any data in our buffers; at the same time, after several starts, the operating system just crashed. Well, that's the reason why there is this UEFI protection which prevents you from plugging devices like this into your computer: it was basically writing random data into random portions of your memory. A lot of debugging, a lot of tests and test benches, and we were able to find this. Another thing: if you deinitialize your driver incorrectly - and that's what happens when you have a plug-and-play device which you can plug and unplug - then you may end up in a situation where you are trying to write into memory which has already been freed by the operating system and is used for something else. A very well-known problem, but it also happens here.
The reason why DMA is really hard is this completion architecture for reading data. Writes are easy: you just send the data and forget about it; it's a fire-and-forget system. But for reading, you really need to get your data back. And the thing is, it looks like this - you really hope there would be some pointing device here. Basically, on the top left you can see requests for reads, and on the right you can see completion transactions. Each read request can be, and most likely will be, split into multiple completion transactions. So first of all you have to collect all these pieces and write them into the proper parts of memory.
But that's not all. The latency between a request and its completion is really high - something like 50 cycles. So if you have only a single transaction in flight, you will get really bad performance; you need to have multiple transactions in flight. And the worst thing is that transactions can return data in random order. So it's a much more complicated state machine than we expected originally. When I said the architecture was much simpler originally: we didn't have all of this, and we had to realize it while implementing. So again, here there was a whole description of how exactly this works. But not this time.
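The usual way to cope is to stamp every outstanding read with a tag, keep a small reorder window, and scatter each arriving completion to the offset its tag maps to. A simplified host-side model in C, with an invented window size; in the real design this logic lives in the FPGA's state machine:

    /* Sketch: tag-based reassembly of out-of-order PCIe read completions. */
    #include <stdint.h>
    #include <string.h>

    #define MAX_TAGS 32              /* invented outstanding-request window */

    struct pending_read {
        uint8_t  *dst;               /* where this request's data belongs   */
        uint32_t  remaining;         /* bytes still expected for this tag   */
    };

    static struct pending_read pending[MAX_TAGS];

    /* Called for every arriving completion; 'offset' locates the chunk
     * within the original request, derived from the completion header. */
    void on_completion(uint8_t tag, uint32_t offset,
                       const uint8_t *payload, uint32_t len)
    {
        struct pending_read *p = &pending[tag % MAX_TAGS];
        memcpy(p->dst + offset, payload, len);
        p->remaining -= len;
        /* when p->remaining reaches 0, the buffer for this tag is complete
         * and the tag can be recycled for a new read request */
    }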
So now, after all these optimizations, we got 20 megasamples per second, which is about eight times lower than what we are
aiming at. The next thing is PCI Express lane scalability. PCI Express is a serial bus: it has multiple lanes, and they allow you to scale your bandwidth horizontally. If one lane gives you x, then two lanes give 2x and four lanes give 4x. So the more lanes you have, the more bandwidth you are getting out of your bus - bandwidth, not performance. The issue is that the mini PCI Express standard only standardizes one lane; the second lane is left as optional.
So most motherboards don't support it - there are some that do, but not all of them - and we really wanted to get this done. So we designed a special converter board which allows you to plug your mini PCI Express card into a full-size PCI Express slot and get two lanes working. And we're also planning a similar board which will have multiple slots, so you will be able to put multiple XTRX SDRs onto the same carrier board, plug it into, let's say, a PCI Express x16 slot, and get really a lot of SDR... a lot of IQ data, which will then be your problem to process. So with two lanes it's about twice the performance, and we are getting fifty megasamples per second.
And now it's time to really cut the fat, because the real sample size of the LMS7 is 12 bits, and we were transmitting 16 because it's easier - CPUs work on 8, 16, 32 bits. We originally designed the driver to support 8-bit, 12-bit, and 16-bit samples to be able to do this scaling. And for the test we said, okay, let's go from 16 to 8 bits; we'll lose some dynamic range, but who cares these days. The rate still stayed the same, 50 megasamples per second, no matter what we did.
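For illustration, this is what the wire-format scaling amounts to: two 12-bit samples fit into three bytes, so a 16-bit stream shrinks to 75% of its size. A sketch of such packing in C; the exact bit layout here is invented, not the XTRX one:

    /* Sketch: pack pairs of 12-bit samples into 3 bytes.
     * Bit layout chosen for illustration only. */
    #include <stddef.h>
    #include <stdint.h>

    void pack12(const int16_t *in, uint8_t *out, size_t n_pairs)
    {
        for (size_t i = 0; i < n_pairs; i++) {
            uint16_t a = (uint16_t)in[2 * i]     & 0x0FFF;  /* keep low 12 bits */
            uint16_t b = (uint16_t)in[2 * i + 1] & 0x0FFF;
            out[3 * i]     = (uint8_t)(a & 0xFF);                     /* a[7:0]           */
            out[3 * i + 1] = (uint8_t)((a >> 8) | ((b & 0x0F) << 4)); /* a[11:8] | b[3:0] */
            out[3 * i + 2] = (uint8_t)(b >> 4);                       /* b[11:4]          */
        }
    }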
-
debugging going on. And we realized that
we actually made another, not a really
-
mistake. We didn't, we didn't really know
this when we designed. But we should have
-
used a higher voltage for this high speed
bus to get it to the full performance. And
-
at 1.8 it was just degrading too fast and
the bus itself was not performing well. So
-
our next prototype will be using higher
voltage specifically for this bus. And
This is the kind of stuff which makes designing hardware for high speed really hard, because you have to care about the coherence of the parallel buses in your system. At the same time, we do want to keep 1.8 volts for everything else as much as possible, because another problem we are facing with this device is that the mini PCI Express standard allows only...
Sergey Kostanbaev: ...2.5...
Alexander Chemeris: ...2.5 watts of power consumption, no more. And we were very lucky that the LMS7 has such good power consumption; we actually had some headroom left for the FPGA and GPS and all this stuff. But we just can't let the power consumption go up. Our measurements on this device showed about...
Sergey Kostanbaev: ...2.3...
Alexander Chemeris: ...2.3 watts of power consumption. So we are at the limit at this point. When we fix the bus with the higher voltage - this is a theoretical exercise, because we haven't done it yet; it's planned to happen in a couple of months - we should be able to get to numbers which are just 1.2 times slower than the goal. Then the next thing will be to fix another issue which we created at the very beginning: we procured the wrong chip. Just a one-digit difference - you can see it highlighted in red and green - and this chip supports only generation 1 PCI Express, which is twice as slow as generation 2.
So again, hopefully we'll replace the chip
-
and just get very simple doubling of the
performance. Still it will be slower than
-
we wanted it to be and here is what comes
like practical versus theoretical numbers.
-
Well as every bus it has it has overheads
and one of the things which again we
-
realized when we were implementing this
is, that even though the standard
-
standardized is the payload size of 4kB,
actual implementations are different. For
-
example desktop computers like Intel Core
or Intel Atom they only have 128 byte
-
payload. So there is much more overhead
going on the bus to transfer data and even
-
theoretically you can only achieve 87%
efficiency. And on Xeon we tested and we
-
found that they're using 256 payload size
and this can give you like a 92%
-
efficiency on the bus and this is before
the overhead so the real reality is even
-
worse. An interesting thing which we also
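Those percentages are easy to reproduce if you assume roughly 20 bytes of TLP header and link framing around each payload (the exact figure depends on header format and framing details):

    efficiency = payload / (payload + per-packet overhead)
    128 / (128 + 20) = 0.865  ->  about 87%
    256 / (256 + 20) = 0.928  ->  about 92%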
Another interesting thing which we did not expect: we were originally developing on Intel Atom and everything was working great. When we plugged this into a laptop with a Core i7 - a multi-core, really powerful device - we didn't expect that it wouldn't work. Obviously a Core i7 should work better than an Atom: no, not always. The thing is, we were plugging into a laptop which had a built-in video card sitting on the same PCI bus, and the manufacturer probably hard-coded a higher priority for the video card than for everything else in the system, because you don't want your screen to flicker. So when you move a window, you actually see the late packets coming to your PCI device. We had to introduce a jitter buffer and add more FIFO into the device to smooth it out. On the other hand, the Xeon performs really well; it's very optimized. That said, we tested it with a discrete video card, and it outperforms everything by a whopping five to seven percent - that's what you get for the price. So this is actually the end of the presentation.
We still have not scheduled any workshop, but if there is any interest in actually seeing the device working, or if you are interested in learning more about PCI Express in detail, let us know and we'll schedule something in the next few days. That's the end; I think we can proceed with questions, if there are any.
[Applause]
Herald: Okay, thank you very much. If you are leaving now, please try to leave quietly, because we might have some questions and you want to hear them. If you have questions, please line up right behind the microphones. I think we'll just wait, because we don't have anything from the signal angel. However, if you are watching the stream, you can hop into the channels and over social media to ask questions, and they will be answered, hopefully. So, on that microphone.
Question 1: What's the minimum and maximum frequency of the card?
Alexander Chemeris: You mean RF frequency?
Question 1: No, the minimum frequency you can sample at. Most SDR devices can only sample at over 50 MHz. Is there a similar limitation on your card?
Alexander Chemeris: Yeah, so if you're talking about RF frequency, it can go from almost zero - even though it works worse below 50 MHz - all the way to 3.8 GHz, if I remember correctly. And in terms of the sample rate, right now it works from about 2 MSPS to about 50. But again, we're planning to get it to the numbers we quoted.
Herald: Okay. The microphone over there.
Question 2: Thanks for your talk. Did you manage to get your Linux kernel driver into the mainline?
Alexander Chemeris: No, not yet. I mean, it's not even fully published. I did not say this in the beginning, sorry: we only just manufactured the first prototype, which we debugged heavily. So we are only now planning to manufacture the second prototype with all these fixes, and then we will release the kernel driver and everything. And maybe we'll try mainlining, or maybe we won't; we haven't decided yet.
Question 2: Thanks.
Herald: Okay...
Alexander Chemeris: ...and that will be a whole other experience.
Herald: Okay, over there.
Question 3: Hey, looks like you went
-
through some incredible amounts of pain to
make this work. So, I was wondering,
-
aren't there any simulators at least for
parts of the system, or the PCIe bus for
-
the DMA something? Any simulator so that
you can actually first design the system
-
there and debug it more easily?
Sergey Kostanbaev: Yes, there are
-
available simulators, but the problem's
all there are non-free. So you have to pay
-
for them. So yeah and we choose the hard
way.
-
Question 3: Okay thanks.
Herald: We have a question from the signal angel.
Question 4: Yeah, are the FPGA code, Linux driver, library code, and the design project files public, and if so, did you post them yet? They can't find them on xtrx.io.
Alexander Chemeris: Yeah, so they're not published yet. As I said, we haven't released them. The drivers and libraries will definitely be available; the FPGA code we are considering, and it will probably also be available as open source. But we will publish them together with the public announcement of the device.
Herald: Ok, that microphone.
Question 5: Yes. Did you guys see any signal integrity issues on the PCI bus, or on the bus to the LMS chip - the Lime Microsystems chip, I think, that does the RF?
AC: Right.
Question 5: Did you try to measure signal integrity issues? Because there were some reliability issues, right?
AC: Yeah, so, PCI... With PCI we never had issues, if I remember correctly.
SK: No.
AC: It was just working.
SK: Well, the board is so small, and with such short traces there's no problem with signal integrity. That actually saved us.
AC: Yeah, designing a small board is easier. With the LMS7, the problem is not signal integrity in terms of differences in trace lengths, but rather the fact that the signal degrades in voltage as the speed goes up and drops below the detection level, all this stuff. We did some measurements; I actually wanted to add some pictures here, but decided it was not going to be super interesting.
H: Okay. Microphone over there.
Question 6: Yes, thanks for the talk. How much work would it be to convert the two-by-two SDR into an 8-input logic analyzer, in terms of hardware and software? So that you'd have a really fast logic analyzer you could record unlimited traces with?
AC: A logic analyzer...
Q6: So basically it's just also an analog-to-digital converter, and you mostly want fast sampling and a large amount of memory to store the traces.
AC: Well, I just think it's not the best use for it. I don't know, maybe Sergey has ideas, but I think it may just be easier to get a high-speed ADC and replace the Lime chip with it, because the Lime chip has so many things in there specifically for RF.
SK: Yeah, the main problem is you cannot just sample the original data. You have to shift it in frequency, so you cannot sample the original signal, and using it for anything other than spectrum analysis is hard.
Q6: OK. Thanks.
H: OK. Another question from the internet.
Signal angel: Yes. Have you compared the sample rate of the ADC of the Lime chip to the USRP ADCs, and if so, how does the lower sample rate affect the performance?
AC: So, comparing a low sample rate to a higher sample rate: we haven't done much testing on the RF performance yet, because we were so busy with all this other stuff, so we have yet to see how low sample rates compare to high sample rates. Well, a high sample rate always gives you better performance, but you also get higher power consumption. So I guess it's a question of what's more important for you.
H: Okay. Over there.
Question 7: I've gathered there is no mixer bypass, so you can't directly sample the signal. And is there a way to use the same antenna for send and receive?
AC: Actually, there is an input for the ADC.
SK: But it's not a bypass; it's a dedicated pin on the LMS chip, and since we're very space-constrained, we didn't route it, so you cannot actually bypass it.
AC: Okay, that's in our specific hardware. In general, in the LMS chip there is a special pin which allows you to drive your signal directly to the ADC without all the mixers, filters, all this radio stuff - just directly to the ADC. So yes, theoretically that's possible.
SK: We even thought about this, but it didn't fit this design.
Q7: Okay. And can I share antennas? Because I have an existing laptop with existing antennas, and I would use the same antenna to send and receive.
AC: Yeah, I mean, that depends on what exactly you want to do. If you want a TDD system, then yes; if you want an FDD system, then you will have to put a small duplexer in there. But yeah, that's the idea: you can plug this into your laptop and use your existing antennas. That's one of the ideas for how to use XTRX.
Q7: Yeah, because there are all four connectors.
AC: Yeah. One thing which I actually forgot to mention - I kind of mentioned it in the slides - is that any other SDRs, the ones based on Ethernet or USB, can't work with CSMA wireless systems, and the most famous CSMA system is Wi-Fi. It turns out that because of the latency between your operating system and your radio over USB, you just can't react fast enough for Wi-Fi to work. Because - you probably know this - in Wi-Fi you carrier-sense, and if you sense that the spectrum is free, you start transmitting. That doesn't make sense when you have huge latency, because you only know that the spectrum was free back then. With XTRX you actually can work with CSMA systems like Wi-Fi, so again, it makes it possible to have a fully software implementation of Wi-Fi in your laptop. It obviously won't work as well as your commercial Wi-Fi, because you will have to do a lot of processing on your CPU, but for some purposes, like experimentation in wireless labs and R&D labs, that's really valuable.
Q7: Thanks.
H: Okay. Over there.
Q8: Okay, what PCB design package did you use?
AC: Altium.
SK: Altium, yeah.
Q8: And I'd be interested in the PCIe workshop. It would be really great if you do this one.
AC: Say this again?
Q8: It would be really great if you do the PCI Express workshop.
AC: Ah, the PCI Express workshop. Okay, thank you.
H: Okay, I think we have one more question at the microphones, and that's you.
Q9: Okay. Great talk. And again, I would appreciate a PCI Express workshop, if it ever happens. What are the synchronization options between multiple cards? Can you synchronize the ADC clock, and can you synchronize the presumably digitally created IF?
SK: Yes, so... unfortunately, IF synchronization alone is not possible, because the Lime chip doesn't expose the LO. But we can synchronize digitally. We have a special 1PPS signal for synchronization, we have lines for clock synchronization and other stuff, and we can do it in software. The Lime chip has a phase correction register, so when you measure a phase difference, you can compensate for it on different boards.
Q9: Tune to a station a long way away and then rotate the phase until it aligns.
SK: Yeah.
Q9: Thank you.
AC: A little tricky, but possible. That's one of our plans for the future, because we do want to see 128-by-128 MIMO at home.
H: Okay, we have another question from the internet.
Signal angel: I actually have two questions. The first one is: What is the expected price after the prototype stage? And the second one is: Can you tell us more about the setup you had for debugging the PCIe issues?
AC: Could you repeat the second question?
SK: It's [unintelligible], I think.
SA: It's more about the setup you had for debugging the PCIe issues.
SK: The second question, I think, is mostly one for our next workshop, because it's a more complicated setup - mostly beyond the current presentation.
AC: Yeah, but in general, in terms of hardware setup, that was our hardware setup: we bought this PCI Express to Thunderbolt 3 converter, we bought the laptop which supports Thunderbolt 3, and that's how we were debugging it. That way we don't need a full-fledged PC, and we don't have to restart it all the time. In terms of price, we don't have a fixed price yet. All I can say right now is that we are targeting no more than your bladeRF or HackRF devices, and probably even cheaper, for some versions.
H: Okay, we are out of time, so thank you again, Sergey and Alexander.
[Applause]
[Music]
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!