[Music]
Herald: Has anyone in here ever worked
with libusb or PyUSB? Hands up. Okay. Who
also thinks USB is a pain? [laughs] Okay.
Sergey and Alexander were here back at
26C3, that's a long time ago - I think
it was back in Berlin - and back then they
presented their first homemade, or not quite
homemade, SDR, software-defined radio.
This year they are back again and they
want to show us how they implemented
another one, using an FPGA, and to
communicate with it they used PCI Express.
So I think if you thought USB was a pain,
let's see what they can tell us about PCI
Express. A warm round of applause for
Alexander and Sergey for building a high
throughput, low latency, PCIe-based
software-defined radio
[Applause]
Alexander Chemeris: Hi everyone, good
morning, and welcome to the first day of
the Congress. Just a little bit of
background about what we've done
previously and why we are doing what we
are doing right now. We started working
with software-defined radios - and,
by the way, who knows what a software-defined
radio is? Okay, perfect. [laughs]
And who has actually used a software-defined
radio, an RTL-SDR or something like that? Okay,
fewer people, but that's still quite a lot.
Good. I wonder whether anyone here has used
more expensive radios like USRPs? Fewer
people, but okay, good. Cool. So, before
2008 I had no idea what software-defined
radio was - I was a voice-over-IP software
person, etc., etc. In 2008 I heard about
OpenBTS, got introduced to software-defined
radio, and I wanted to make it really work,
and that's what led us to today. In 2009 we
developed the ClockTamer, a piece of hardware
that allowed the USRP1 to run GSM
without problems. Anyone who has ever tried
doing this without a good clock source
knows what I'm talking about. And we
presented this - it wasn't an SDR, it was
just a clock source - at 26C3 in 2009.
Then I realized that using the USRP1 is not
really a good idea, because we wanted to
build robust, industrial-grade base
stations. So we started developing our own
software-defined radio, which we call
UmTRX; we started this in 2011. Our first
base stations with it were deployed in 2013,
but I always wanted to have something really
small and really inexpensive, and back then
it wasn't possible. My original idea in 2011
was to build a PCI Express card - mini...
sorry, not a PCI Express card, but a mini PCI
card.
If you remember, there were all these
Wi-Fi cards in the mini PCI form factor, and I
thought it would be really cool to have
an SDR in mini PCI, so I could plug it
into my laptop or into some embedded PC and
have nice SDR equipment. But back then
it just wasn't really possible, because the
electronics were bigger and more power-hungry
and it just didn't work that way, so
we designed UmTRX to work over gigabit
Ethernet, and it was about that size. So
now we have spent this year designing
something which really brings me back to what
I wanted all those years ago. The XTRX is a
mini PCI Express card - again, there was no PCI
Express back then; now it's mini PCI
Express, which is even smaller than mini PCI -
and it's built to be
embedded-friendly, so you can plug it
into an embedded single-board computer. If you
have a laptop with a mini PCI Express slot you can
plug it into your laptop, and you have
really small software-defined radio
equipment. And we really want to make it
inexpensive - that's why I was asking how
many of you have ever worked with an RTL-SDR
and how many of you have ever worked with
USRPs, because the gap between them is
pretty big, and we really want to bring
software-defined radio to the masses.
It definitely won't be as cheap as an RTL-SDR,
but we are trying to make it as close as
possible.
So, at the size of an RTL-SDR, at a price
that is, well, higher, but
hopefully affordable to
pretty much everyone, we really want to
bring high performance into your hands.
And by high performance I mean this is a
full transmit/receive device with two
transmit channels and two receive channels, which is
usually called 2x2 MIMO in the radio
world. The goal was to bring it to 160
megasamples per second, which can roughly
give you 120 MHz of usable radio spectrum.
So, what we were able to achieve: again,
this is the mini PCI Express form factor; it
has a small Artix-7, the smallest and
most inexpensive FPGA family with the ability
to work with PCI Express. It has an LMS7002M
chip as the RFIC - a very high-performance, very
tightly integrated chip, with even some DSP
blocks inside. It even has a GPS chip -
in the upper right you can see the GPS chip -
so you can actually synchronize your SDR to GPS for
perfect clock stability,
so you won't have any problems running
telecommunication systems like GSM, 3G or 4G
due to clock issues. It also has an
interface for SIM cards, so you can
actually create a software-defined-radio
modem and build one with other open-source
projects - for LTE there is srsUE, if
you're interested - etc., etc. So it's a
really tightly packed device. And if you put
this into perspective: that's how it all
started in 2006, and that's what you have
ten years later. It's pretty impressive.
[Applause]
Thanks. But I think it actually applies to
the whole industry that is working on
shrinking the sizes, because we just put
stuff on the PCB, you know - we're not
building the silicon itself. An interesting
thing is what happened with our first approach:
we said let's pack everything, let's do a
very tight PCB design. We did an eight-layer
PCB design, and when we sent it to a
fab to estimate the cost, it turned out to be
15,000 US dollars per piece - in small
volumes, obviously, but still a little bit
too much. So we had to redesign it, and
the first thing to note is that we still
kept eight layers, because in our
experience the number of layers nowadays has
only a minimal impact on the cost of the
device - six layers versus eight layers, the
price difference is not so big. But we did a
complete re-routing, kept only 2-deep
microvias and never used buried vias.
This makes it much easier and much
faster for the fab to manufacture, and
the price suddenly went down five or six
times - and in volume it will again be
significantly cheaper. And, just as
geek porn, that's how the PCB looks inside.
So now let's get into the real stuff. PCI Express:
why did we choose PCI Express? As was
said, USB is a pain in the ass. You can't
really use USB in industrial systems;
for a whole variety of reasons it's just unstable.
We did use Ethernet for many years
successfully, but Ethernet has a problem:
inexpensive Ethernet is only
one gigabit, and one gigabit does not offer
enough bandwidth to carry all the data
we want, plus it's power-hungry, etc., etc.
So PCI Express is really a good choice,
because it's low power, it has low
latency, it has very high bandwidth, and
it's available almost universally. When we
started looking into this we realized that
even some ARM boards have PCI Express or
mini PCI Express slots, which
was a big surprise for me, for example.
The problem is that, unlike with USB, you do
need to write your own kernel driver for
this - there's no way around it. And it is
really hard to write such a driver
universally, so we are obviously writing it
for Linux, because we're working with
embedded systems; but if we want to
port it to Windows or macOS we'll
have to do a lot of rewriting. So we focus
on Linux only right now.
And now the hardest part: debugging is
really non-trivial. One small error and
your PC hangs completely because you
used something wrong, and you have to
reboot and restart it. That's like
debugging the kernel, but sometimes even
harder. To make it worse, there is no
really easy-to-use plug-and-play
interface. Normally, when you develop a PCI
Express card and you want to restart it,
you have to restart your
development machine. Again, not a nice way to work;
it's really hard. So the first thing we
did is we found that we can use
Thunderbolt 3, which was just recently
released, and it has the ability to work
directly with the PCI Express bus. It
basically has a mode in which it turns
PCI Express into a plug-and-play
interface. So if you have a laptop which
supports Thunderbolt 3, you can use
this to plug and unplug your device, which
makes your development easier. There are always
problems: there's no easy way, there's no
documentation, and Thunderbolt is not
compatible with Thunderbolt - Thunderbolt 3
is not compatible with Thunderbolt 2.
So we had to buy a special laptop with
Thunderbolt 3, with special cables, all
this hard stuff. And if you
really want to get the documentation, you have
to sign an NDA and send them a business plan
so they can approve that your
business makes sense.
[Laughter]
I mean... [laughs] So we actually opted
out. We decided not to go through this. What
we did is, we found that someone is
actually making PCI Express to Thunderbolt
3 converters and selling them as dev
boards, and that was a big relief, because
it saved us lots of time and lots of money.
You just order it from some
Asian company. And this is how the converter
looks. So you buy several pieces,
you plug your PCI Express card in there,
and you plug this into your laptop. And here it is
with the XTRX already plugged into it. Now the only
problem we found is that typically UEFI
has a security control enabled, so that
a random Thunderbolt device can't hijack
your PCI bus, get access to your
kernel memory and do bad stuff. Which
is a good idea - the only problem is that
it's not fully implemented in
Linux. Under Windows, if you plug in a
device which has no security
features, which is not certified, it will
politely ask you: "Do you really
trust this device? Do you want to use it?"
and you can say "yes". Under Linux it just
does not work. [laughs] So we spent some
time trying to figure out how to get
around this. There are some patches from
Intel which are not mainlined, and we were
not able to actually get them to work. So we
just had to disable all these security
measures in the laptop. Be aware that
this is the case, and we suspect that happy
users of Apple might not be able to do
this, because Apple doesn't have a BIOS setup
where you can disable this feature. So that's
probably a good incentive for someone to
actually finish writing the driver.
So now to the goal: we want to achieve 160
megasamples per second with 2x2 MIMO, which
means two transmit and two receive
channels at 12 bits - roughly 7.5 Gbit/s
(160 MS/s x 2 channels x 2 for I and Q x 12
bits ≈ 7.7 Gbit/s). So, first result: when
we got this board back from the fab, it
didn't work.
Sergey Kostanbaev mumbles: as expected
Alexander Chemeris: Yes, as expected. The
first interesting thing we realized is
that the FPGA has hardware
blocks for talking to PCI Express,
called GTP transceivers, which basically implement
the PCI Express serial physical layer -
but the lane numbering in PCI Express is reversed
with respect to the FPGA, and we did
not realize this. So we had to do very, very
fine soldering to swap the
lanes; you can see this
very fine work there.
We also found that one of the components
was a "dead bug", which is a well-known term for
a chip whose pinout was accidentally
mirrored at the design stage, so we had to
solder it upside down - and if you
realize how small it is, you can also
appreciate the work done. What's funny is
that when I was reading about dead bugs I actually
found a manual from NASA which describes
how to properly solder dead bugs to get
it approved.
[Audience laughs]
So this is the link; I think you can go
there and enjoy it, there's other fun stuff there too.
So after fixing all of this, our next
attempt kind of worked. The next stage
was debugging the FPGA code, which has to
talk to PCI Express; PCI Express has to
talk to the Linux kernel, the kernel has to
talk to the driver, and the driver has to
talk to user space. The peripherals are easy -
the UART and SPI we got to work almost
immediately, no problems with that - but DMA
was a real beast. We spent a lot of
time trying to get DMA to work, and the
problem is that the DMA is on the FPGA, so
you can't just place a breakpoint like you
do in C or C++ or in other languages;
it's real-time hardware running on
the fabric. So Sergey, who was
mainly developing this, had to write a lot
of small test benches and test
everything piece by piece.
Every part of the DMA code we had was
wrapped into a small test bench which
emulated all the tricks, and, as the
classics predicted, it took about five to
ten times longer than actually writing the
code. We really blew past our
predicted timelines by doing this, but in the
end we got really stable operation.
So, some suggestions for anyone who will
try to repeat this exercise: there is a
logic analyzer built into the Xilinx tools
which you can use; it's nice, and sometimes it's
very helpful, but you can't debug
transient bugs which only come out
under some weird conditions.
So you have to implement some read-back
registers which expose important statistics
about how your system
behaves - in our case, various counters
on the DMA interface. Then you can actually
see what's happening with your data:
Is it received? Is it
sent? How much is sent and how much is
received? So, for example, we can see
when we saturate the bus, or when there
is an underrun - the host not providing
data fast enough - so we can at least
understand whether it's a host problem or
an FPGA problem, and which
part we debug next. Because, again,
it's a very multi-layer problem: you start
with the FPGA, PCI Express, the kernel, the driver,
user space, and any part can fail. You
can't work blind like this.
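For illustration, a host-side read of such debug counters could look roughly like this - assuming the counters are exposed through BAR0; the sysfs path and register offsets below are hypothetical, not the real XTRX register map:

```c
/* Minimal sketch: reading debug counters from a PCIe BAR on the host.
 * The sysfs path and the register offsets are purely illustrative. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR0_PATH    "/sys/bus/pci/devices/0000:01:00.0/resource0"
#define REG_RX_PKTS  0x100   /* hypothetical: DMA packets received  */
#define REG_TX_PKTS  0x104   /* hypothetical: DMA packets sent      */
#define REG_UNDERRUN 0x108   /* hypothetical: host-underrun counter */

int main(void)
{
    int fd = open(BAR0_PATH, O_RDONLY | O_SYNC);
    if (fd < 0) { perror("open BAR0"); return 1; }

    volatile uint32_t *bar =
        mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); return 1; }

    /* If the underrun counter grows while TX counts stall, the host
     * side is too slow; if it stays at zero, look at the FPGA side. */
    printf("rx=%u tx=%u underrun=%u\n",
           bar[REG_RX_PKTS / 4], bar[REG_TX_PKTS / 4],
           bar[REG_UNDERRUN / 4]);

    munmap((void *)bar, 4096);
    close(fd);
    return 0;
}
```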
So again: the goal was to get to 160 MSPS;
with the first implementation we got 2 MSPS - roughly 60
times slower.
The problem was that the software just wasn't
keeping up and wasn't sending data fast
enough. Many things were done,
but the most important parts are: use real-time
priority if you want to get very
stable results, and, well, fix your software bugs.
One of the most important bugs we had
was that DMA buffers were not freed
immediately, so they stayed busy
for longer than they should, which
introduced extra cycles and basically just
reduced the bandwidth.
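For reference, giving the streaming thread real-time priority on Linux is one call with the standard pthread API; this is generic Linux code, not XTRX-specific:

```c
/* Minimal sketch: give the streaming thread real-time priority so the
 * DMA buffers are serviced in time. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

static int make_realtime(pthread_t thread)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = 50;   /* mid-range RT priority */

    /* Needs CAP_SYS_NICE or an rtprio limit in /etc/security/limits.conf */
    int err = pthread_setschedparam(thread, SCHED_FIFO, &sp);
    if (err)
        fprintf(stderr, "SCHED_FIFO failed: %s\n", strerror(err));
    return err;
}
```

Calling make_realtime(pthread_self()) from the streaming thread is usually enough to get stable buffer turnaround.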
At this point let's talk a little bit
about how to implement a high-performance
driver for Linux, because if you want to
get real performance you have to
start with the right design. There are
basically two approaches and a whole
spectrum in between - let's say three
approaches. The first
approach is full kernel control, in which
case the kernel driver not only does the
transfers, it actually has all the logic
for controlling your device and exports
ioctls to user space. That's
the traditional way of
writing drivers: your user space is
completely abstracted from all the
details. The problem is that this is
probably the slowest way to do it. The
other way is what's called a "zero-copy
interface": only the control is held in
the kernel, and the raw
data is provided to user space as-is. So
you avoid a memory copy, which makes it
faster. But it's still not fast enough if you
really want to achieve maximum
performance, because you still have
context switches between the kernel and
user space. The fastest
approach possible is a full user-space
implementation, where the kernel just
exposes everything and says "now you do it
yourself": you have (almost) no
context switches, and you
can really optimize everything. So what
are the problems with this?
The pros I already mentioned: no
switches between kernel and user space,
and because of this it's very low latency
and very high bandwidth. But if you
are not interested in getting the absolute
maximum performance, and
you just want something
low-bandwidth, then
you will have to add hacks, because you
can't get notifications from the kernel that
more data is available.
It also makes the system
vulnerable, because if user space can
access the device directly, then it can do
whatever it wants. One
more important thing for
getting the best performance out of the
bus is this question: do you poll your
device, or do you wait to get
notified? What is polling? I guess
everyone who is a programmer understands it:
polling is when you ask repeatedly "Are
you ready?", "Are you ready?", "Are you
ready?", and when it's ready you get the
data immediately.
It's basically a busy loop: you
just constantly ask the device what's
happening. You need to dedicate a full
core - and thank God we have multi-core
CPUs nowadays - so you can dedicate a
full core to this polling and just
poll constantly. But again, if you don't
need the highest performance and you just
need to get something, then you will be
wasting a lot of CPU resources. In the end
we decided to do a combined architecture,
where it is possible to poll, but
there's also a way to get a
notification from the kernel, for
applications which need
low bandwidth but also require better
CPU efficiency. Which I think is the best
way if you are trying to target both
worlds. Very quickly, the architecture of the
system: we tried to make it very
portable and flexible. There is a
kernel driver, which talks to a low-level
library that implements all the logic
we took out of the driver: controlling
PCI Express, working with DMA, hiding
all the details of the
actual bus implementation.
And then there is a high-level library
which talks to this low-level library and
also to libraries which implement control
of the actual peripherals - most
importantly to the library which
implements control over our RFIC chip.
This way it's very modular: we can replace
PCI Express with something else later, and we
might be able to port it to other
operating systems; that's the goal.
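As an illustration of that layering, the low-level library can be reduced to a small table of bus operations, so that the PCIe backend is just one implementation behind it; all names here are hypothetical, not the actual API:

```c
/* Sketch of the layering idea: the high-level library only sees a small
 * "bus ops" interface, so the PCIe backend could later be swapped for
 * another transport.  Names are hypothetical. */
#include <stddef.h>
#include <stdint.h>

struct xtrx_bus_ops {
    int  (*open)(const char *device, void **ctx);
    void (*close)(void *ctx);
    int  (*reg_read)(void *ctx, uint32_t addr, uint32_t *val);
    int  (*reg_write)(void *ctx, uint32_t addr, uint32_t val);
    /* Hand a filled RX DMA buffer to the application (zero-copy). */
    int  (*dma_rx_get)(void *ctx, void **buf, size_t *len);
    int  (*dma_rx_release)(void *ctx, void *buf);
};

/* One backend per transport; only the table is exposed upward. */
extern const struct xtrx_bus_ops xtrx_bus_pcie_ops;   /* hypothetical */
```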
Another interesting issue: when you
start writing a Linux kernel driver you
very quickly realize that while LDD, the
classic book on Linux driver
writing, is good and will give you
good insight, it's not actually up to
date. It's more than ten years old, and
there are a lot of new interfaces which are
not described there, so you have to resort
to reading the manuals and the
documentation in the kernel itself. Well,
at least you get up-to-date
information. The decisions we made were to
make everything easy. We use a TTY for the GPS,
so you can attach pretty much
any application that talks to a GPS - all
the existing applications just work out
of the box. We also wanted to be able
to synchronize the system clock to GPS, so we
get automatic clock synchronization across
multiple systems, which is very important
when we are deploying many, many devices
around the world.
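On the host side, consuming such a GPS pulse-per-second source can use the standard Linux PPS API (RFC 2783), assuming the driver registers a /dev/ppsN device; a minimal sketch:

```c
/* Minimal sketch: fetch one GPS PPS pulse with the Linux PPS API
 * (needs the pps-tools header).  The device name is illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/timepps.h>

int main(void)
{
    int fd = open("/dev/pps0", O_RDWR);
    if (fd < 0) { perror("open pps"); return 1; }

    pps_handle_t h;
    if (time_pps_create(fd, &h) < 0) { perror("pps_create"); return 1; }

    pps_info_t info;
    struct timespec timeout = { 3, 0 };       /* wait up to 3 s */
    if (time_pps_fetch(h, PPS_TSFMT_TSPEC, &info, &timeout) < 0) {
        perror("pps_fetch");
        return 1;
    }
    printf("PPS assert at %ld.%09ld\n",
           (long)info.assert_timestamp.tv_sec,
           info.assert_timestamp.tv_nsec);

    time_pps_destroy(h);
    return 0;
}
```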
We plan to do two interfaces: one is PPS,
and the other is the DCD line
on the UART, exposed over the TTY. Because,
again, we found that there are two types of
applications: some support one API,
others support the other, and there is
no common ground, so we have to support
both. And, as described, we want to have
poll() support, so we can get notifications from the
kernel when data is available and we don't
need to do real busy-looping all the time.
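A rough sketch of what that dual mode looks like from user space - block in poll() for low-rate applications, spin on the same descriptor when you need every last microsecond; the device node is hypothetical:

```c
/* Sketch of the "both worlds" idea: sleep in poll() when modest rates
 * are enough, busy-spin when latency matters most. */
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#define DEV "/dev/xtrx0"   /* hypothetical char device */

int wait_for_samples(int fd, int busy)
{
    if (busy) {
        /* Highest throughput / lowest latency: burn a core polling. */
        struct pollfd p = { .fd = fd, .events = POLLIN };
        while (poll(&p, 1, 0) == 0)   /* timeout 0 = never sleep */
            ;                         /* spin */
        return 0;
    }
    /* Low-bandwidth mode: sleep until the kernel signals data. */
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, -1) > 0 ? 0 : -1;
}
```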
After all the software optimizations we
got to about 10 MSPS: still very, very far
from what we want to achieve.
Now, there should have been a lot of
explanation about PCI Express here, but when
we actually wrote down everything we wanted to
say, we realized it's a full two-hour
talk just on PCI Express. So we are
not going to give it here; I'll just give
some of the highlights which are most
interesting. If there is real
interest, we can set up a workshop on
one of the later days and talk in more
detail about PCI Express specifically.
The thing is, there are no open-source cores
for PCI Express which are optimized for
high-performance, real-time applications.
There is Xillybus, which, as I understand, is
going to be open source, but right now they provide
you the source only if you pay them. It's very
popular because it's very, very easy to use,
but it doesn't give you performance - if I
remember correctly, the best it can do is
maybe 50 percent bus saturation.
There's also the Xilinx implementation, but
if you are using the Xilinx implementation
with the AXI bus, then you're really locked
into the AXI bus and into Xilinx. It's also not
very efficient in terms of resources, and
if you remember, we want to make this very,
very inexpensive. Our goal is
to be able to fit everything into the
smallest Artix-7 FPGA, and that's quite
challenging with all the stuff in there;
we just can't waste resources. So the
decision was to write our own PCI Express
implementation. That's how it looks.
I'm not going to discuss it right now.
There were several iterations. Initially it
looked much simpler, but that turned out not to
work well.
Some interesting stuff about PCI
Express which we stumbled upon: it
was working really well on the Atom, which is
our main development platform because we
do a lot of embedded work. It worked
really well. When we tried to plug it into a
Core i7, it just started hanging once in a
while. After maybe several days of
debugging, Sergey found a
very interesting statement in the standard
which says that the value zero in the byte
count field actually stands not for zero bytes
but for 4096 bytes.
I mean, that's a really cool optimization.
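In other words, the 12-bit Byte Count field in a completion header has to be decoded with a special case for zero, roughly like this:

```c
/* PCIe completion headers carry a 12-bit Byte Count field; the
 * encoding defines 0 as 4096 bytes, not 0 bytes. */
#include <stdint.h>

static inline uint32_t cpl_byte_count(uint32_t field)
{
    uint32_t bc = field & 0xFFF;   /* low 12 bits */
    return bc ? bc : 4096u;
}
```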
Another thing is completions. A completion is
the PCI Express term, basically, for an
acknowledgment, which can also carry some
data back for your request. And sometimes,
if you don't send a completion, the device
just hangs. What happens in that case, due
to some historical heritage
of x86, is that reads just start returning you FFFF.
And if you have a register which says "Is
your device okay?", and that register reads
one to mean "the device is okay" - guess
what will happen?
You will always be reading that your
device is okay. So the suggestion is not
to use one as the status for "okay": use
either zero or, better, something like a two-bit
sequence, so you are definitely sure that
you are okay and not just reading FFFFs.
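A host-side check along those lines might look like this; the register offset and the "alive" pattern are illustrative:

```c
/* Sketch: on x86 a failed PCIe read comes back as all-ones, so a
 * status register must never use all-ones (or a single 1) as "OK". */
#include <stdbool.h>
#include <stdint.h>

#define REG_STATUS   0x000            /* hypothetical status register */
#define STATUS_ALIVE 0x5A5A00A5u      /* multi-bit "alive" pattern    */

static bool device_alive(const volatile uint32_t *bar)
{
    uint32_t v = bar[REG_STATUS / 4];
    if (v == 0xFFFFFFFFu)             /* bus error / device gone */
        return false;
    return v == STATUS_ALIVE;
}
```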
So when you have a brand-new device which,
again, may fail at any of the layers, it's
really hard to debug, because of memory
corruption. We had a software bug that
was writing DMA addresses
incorrectly, and we were wondering why we
were not getting any data in our buffers,
while at the same time, after several starts,
the operating system would just crash. Well, that's
the reason this UEFI protection exists, the
one that prevents you from
plugging devices like this into your
computer: our board was basically writing
random data into random
portions of memory. A lot of
debugging, a lot of tests and test benches,
and we were able to find this. Another
thing: if you deinitialize your driver
incorrectly - and that's what happens
when you have a plug-and-play device which
you can plug and unplug - then you may end
up in a situation where you are
trying to write into memory which has
already been freed by the operating system and
is being used for something else. A very well-known
problem, but it also happens here. The
reason DMA is really hard is
this completion architecture for
reading data. Writes are easy: you just send the
data and forget about it - a fire-and-forget
system. But for reading, you
really need to get your data back. And the
thing is, it looks like this - you really
hope there would be a pointing
device here, but basically on the top left
you can see the read requests, and on the
right you can see the completion transactions.
Each transaction can be, and
most likely will be, split into multiple
completions, so first of all you have to
collect all these pieces and write
them into the proper parts of memory.
But that's not all. The thing is, the
latency between a request and its completion is
really high - something like 50 cycles. So if
you have only a single transaction
in flight, you will get really bad
performance; you need to have multiple
transactions in flight. And the worst
thing is that transactions can return data
in random order. So it's a much more
complicated state machine than we expected
originally. When I said the
architecture was much simpler originally:
we didn't have all of this, and we had to
realize it while implementing. Again,
there was supposed to be a whole description of how
exactly this works, but not this time.
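To make the reassembly problem concrete, here is a very simplified, host-visible model of what the read engine has to do per outstanding request; in reality this lives in the FPGA fabric, and the offsets come from the completion headers:

```c
/* Simplified model: one DMA read request is answered by several
 * completions, possibly out of order, so payloads are scattered by
 * offset and the request is only retired when every byte arrived. */
#include <stdint.h>
#include <string.h>

struct read_req {
    uint8_t  *dst;        /* where the data finally goes */
    uint32_t  total;      /* bytes requested             */
    uint32_t  received;   /* bytes seen so far           */
};

/* Called once per completion TLP belonging to this request (tag). */
static int on_completion(struct read_req *r, uint32_t offset,
                         const uint8_t *payload, uint32_t len)
{
    memcpy(r->dst + offset, payload, len);
    r->received += len;
    return r->received == r->total;   /* 1 when the request is done */
}
```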
So now, after all these optimizations, we
got 20 megasamples per second, which is
just six times lower than what we are
aiming at. The next thing is PCI
Express lane scalability. PCI Express
is a serial bus; it has multiple lanes,
and they allow you to basically
scale your bandwidth horizontally: one
lane gives you x, two lanes give you 2x, four
lanes give you 4x. So the more lanes you have,
the more bandwidth you are getting out of
your bus. The issue is that
the mini PCI Express standard only
standardizes one lane;
a second lane is left optional,
so most motherboards don't support it -
some do, but not all of them. We
really wanted to get this working, so we
designed a special converter board which
lets you plug your mini PCI Express card
into a full-size PCI Express slot and
get two lanes working. We're also
planning a similar board which
will have multiple slots, so you will be
able to put several XTRX SDRs onto the
same carrier board, plug
it into, say, a PCI Express x16 slot, and
you will get really a lot of IQ
data - which will then be
your problem to process. With
two lanes it's about twice the performance,
so we are getting fifty megasamples per
second. And that's when it's time to really cut
the fat, because the real sample size of
the LMS7 is 12 bits and we were transmitting 16,
because it's easier - CPUs
work on 8, 16, 32 bits. So we originally
designed the driver to support 8-bit, 12-bit
and 16-bit samples, to be able to do this
scaling.
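For illustration, packing two 12-bit samples into three bytes instead of four saves a quarter of the bus bandwidth; the bit layout here is illustrative, not the actual XTRX wire format:

```c
/* Sketch: pack pairs of 12-bit samples into 3 bytes (24 bits)
 * instead of 4 bytes, cutting the bus load by 25%. */
#include <stddef.h>
#include <stdint.h>

static void pack12(const int16_t *in, uint8_t *out, size_t nsamples)
{
    for (size_t i = 0; i + 1 < nsamples; i += 2) {
        uint16_t a = (uint16_t)in[i]     & 0x0FFF;
        uint16_t b = (uint16_t)in[i + 1] & 0x0FFF;
        *out++ = (uint8_t)(a & 0xFF);                      /* a[7:0]           */
        *out++ = (uint8_t)((a >> 8) | ((b & 0x0F) << 4));  /* a[11:8] | b[3:0] */
        *out++ = (uint8_t)(b >> 4);                        /* b[11:4]          */
    }
}
```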
For the test we said: okay, let's go from 16 to 8 bits;
we'll lose some dynamic range, but who cares these
days. It stayed the same - still 50
megasamples per second, no matter what we
did. There was a lot of interesting
debugging going on, and we realized that
we had made another mistake - not really a
mistake, we just didn't know this when we
designed the board: we should have
used a higher voltage for this high-speed
bus to get it to full performance. At
1.8 V the signal was just degrading too fast and
the bus itself was not performing well, so
our next prototype will use a higher
voltage specifically for this bus. This is
the kind of stuff which makes
designing hardware for high speed really
hard, because you have to care about the
coherence of the parallel buses in
your system. At the same time, we do
want to keep 1.8 volts for everything else
as much as possible, because another
problem we are facing with this device is
that the mini PCI Express standard
allows only ...
Sergey Kostanbaev: ... 2.5 ...
Alexander Chemeris: ... 2.5 watts of power
consumption, no more. And we were
very lucky that the LMS7 has such
good power consumption
that we actually had some headroom
left for the FPGA and the GPS and all this
stuff. But we just can't let the power
consumption go up. Our measurements on
this device showed about ...
Sergey Kostanbaev: ... 2.3 ...
Alexander Chemeris: ... 2.3 watts of power
consumption. So we are basically at the limit
at this point. When we fix the bus with
the higher voltage - you know, this is a
theoretical exercise, because we haven't
done it yet; that's planned to happen in
a couple of months - we should be able to get
to these numbers, which are just 1.2 times
slower than the goal. The next thing will be to fix
another issue we created at the very
beginning: we procured the wrong chip.
Just a one-digit difference - you can see
it highlighted in red and green - and
this chip supports only generation 1
PCI Express, which is twice as slow as
generation 2 PCI Express.
So again, hopefully we'll replace the chip
and get a very simple doubling of the
performance. Still, it will be slower than
we wanted it to be, and here is where
practical versus theoretical numbers come in.
Like every bus, it has overheads,
and one of the things which, again, we
realized when we were implementing this
is that, even though the standard
allows a payload size of up to 4 kB,
actual implementations differ. For
example, desktop processors like Intel Core
or Intel Atom only support a 128-byte
payload, so there is much more overhead
going over the bus to transfer the data, and even
theoretically you can only achieve 87%
efficiency. On a Xeon we tested, we
found that it uses a 256-byte payload size,
and this can give you around 92%
efficiency on the bus - and that is before
the other overheads, so reality is even worse.
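Those efficiency figures follow from simple arithmetic: assuming roughly 20 bytes of framing and header overhead per TLP, the payload fraction works out close to the numbers above:

```c
/* Back-of-the-envelope check of the payload-size figures, assuming
 * roughly 20 bytes of per-TLP overhead (start + sequence + 3DW header
 * + LCRC + end). */
#include <stdio.h>

int main(void)
{
    const double overhead = 20.0;
    int sizes[] = { 128, 256, 4096 };

    for (int i = 0; i < 3; i++) {
        double eff = sizes[i] / (sizes[i] + overhead);
        printf("payload %4d B -> %.1f%% bus efficiency\n",
               sizes[i], 100.0 * eff);
    }
    return 0;
}
```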
An interesting thing which we also did not
expect: we were originally
developing on an Intel Atom and everything
was working great. When we plugged this into
a laptop with a Core i7 - a multi-core, really
powerful device - we didn't expect that it
wouldn't work. Obviously a Core i7 should
work better than an Atom? No, not always.
The thing is, we were plugging it into a
laptop which had a built-in video card
sitting on the same PCI bus, and the
manufacturer probably hard-coded a higher
priority for the video card than for
everything else in the system, because you
don't want your screen to flicker.
So when you move a window, you actually
see the late packets coming to your PCI
device. We had to introduce a jitter
buffer and add more FIFO to the device
to smooth it out. On the other hand, the
Xeon performs really well - it's
very optimized. That said, we have tested
it with a discrete graphics card, and it outperforms
everything by a whopping five to seven percent -
what you get for the price. So this
is actually the end of the presentation.
We still have not scheduled any workshop,
but if there is any interest in
actually seeing the device working, or if
you are interested in learning more about
PCI Express in detail, let us know and we'll
schedule something in the next few days.
That's the end; I think we can proceed
with questions, if there are any.
[Applause]
Herald: Okay, thank you very much. If you
are leaving now, please try to leave
quietly, because we might have some
questions and you want to hear them. If
you have questions, please line up right
behind the microphones, and I think we'll
just wait, because we don't have anything
from the signal angel yet. However, if you are
watching the stream, you can hop into the
channels or onto social media to ask
questions, and they will be answered,
hopefully. So, that microphone.
Question 1: What's the minimum and maximum
frequency of the card?
Alexander Chemeris: You mean RF
frequency?
Question 1: No, the minimum frequency you
can sample at. Most SDR devices can
only sample at over 50 MHz. Is there a
similar limitation on your card?
Alexander Chemeris: Yeah, so if you're
talking about RF frequency, it can go
from almost zero - even though it
works worse below 50 MHz - all the way up to
3.8 GHz, if I remember correctly. And in
terms of sample rate, right now it
works from about 2 MSPS up to about
50. But again, we're planning to
get it to the numbers we quoted.
Herald: Okay. The microphone over there.
Question 2: Thanks for your talk. Did you
manage to get your Linux kernel driver into
the mainline?
Alexander Chemeris: No, not yet. I mean,
it's not even fully published yet. I
didn't say this at the beginning, sorry:
we have only just manufactured the first
prototype, which we debugged heavily. We
are now planning to manufacture the
second prototype with all these fixes, and
then we will release the kernel
driver and everything. And maybe we'll try
mainlining it, or maybe we won't - we haven't decided yet.
Question 2: Thanks
Herald: Okay...
Alexander Chemeris: ...and that will be a
whole other experience.
Herald: Okay, over there.
Question 3: Hey, it looks like you went
through an incredible amount of pain to
make this work. So I was wondering,
aren't there any simulators, at least for
parts of the system - for the PCIe bus, for
the DMA, something? Any simulator, so that
you could first design the system
there and debug it more easily?
Sergey Kostanbaev: Yes, there are
simulators available, but the problem is
that they are all non-free, so you have to pay
for them. So yeah, we chose the hard
way.
Question 3: Okay thanks.
Herald: We have a question from the signal
angel.
Question 4: Yeah, are the FPGA code, the Linux
driver and library code, and the design
project files public, and if so, have they
been posted yet? They can't be found on
xtrx.io.
Alexander Chemeris: Yeah, so they're not
published yet. As I said, we haven't
released them. The drivers and
libraries will definitely be available;
for the FPGA code, we are considering it -
it will probably also be available as open
source. But we will publish them together
with the public announcement of the
device.
Herald: Ok, that microphone.
Question 5: Yes. Did you guys see any
signal integrity issues on the PCI
bus, or on the bus to the LMS chip, the
Lime Micro chip, I think, the one doing
the RF?
AC: Right.
Question 5: Did you try to measure signal
integrity issues? Because there were
some reliability issues, right?
AC: Yeah, we actually... so, PCI. With PCI
we never had issues, if I remember
correctly.
SK: No.
AC: It was just working.
SK: Well, the board is so small, and with
such short traces there's no problem
with signal integrity. So that actually
saved us.
AC: Yeah, designing a small board is easier.
With the LMS7, the problem is not
signal integrity in the sense of
differences in trace length,
but rather that the signal
degrades in voltage as the speed goes up,
and drops below the
detection level, and all this stuff. We
did some measurements; I actually wanted
to add some pictures here, but decided
that it wasn't going to be super interesting.
H: Okay. Microphone over there.
Question 6: Yes, thanks for the talk. How
much work would it be to convert the two-by-two
SDR into an 8-input logic analyzer,
in terms of hardware and software? So that
you would have a really fast logic analyzer
with which you can record unlimited traces.
AC: A logic analyzer...
Q6: So basically it's also just an analog-to-digital
converter, and you mostly want
fast sampling and a large amount of memory
to store the traces.
AC: Well, I just think it's not the best
use for it. It's probably... I don't know.
Maybe Sergey has ideas, but I think it
may just be easier to get a high-speed ADC
and replace the Lime chip with a high-speed
ADC to get what you want, because
the Lime chip has so many things in there
specifically for RF.
SK: Yeah, the main problem is that you cannot
just sample the original data: you have to shift it
in frequency. So you cannot sample the
original signal, and using it for
anything other than spectrum analysis
is hard.
Q6: OK. Thanks.
H: OK. Another question from the internet.
Signal angel: Yes. Have you compared the
sample rate of the ADC in the Lime chip
to the USRP ADCs, and if so, how does the
lower sample rate affect the performance?
AC: So, comparing a low sample rate to a
higher sample rate: we haven't done much
testing of the RF performance yet, because
we were so busy with all this stuff, so we
have yet to see how the lower sample rate
compares with a high sample rate. Well, a
high sample rate always gives
you better performance, but you also get
higher power consumption. So I guess it's
a question of what's more important
for you.
H: Okay. Over there.
Question 7: I've gathered there is no
mixer bypass, so you can't directly sample
the signal. Is there a way to use the same
antenna for send and receive yet?
AC: Actually, there is an input straight to the ADC.
SK: But it's not a bypass, it's a
dedicated pin on the LMS chip, and since we're
very space-constrained, we didn't route
it, so you can't actually use the bypass.
AC: Okay, that's in our specific hardware. In
general, the LMS chip has a
special pin which allows you to drive your
signal directly to the ADC, without all the
mixers, filters, all this radio stuff -
just directly to the ADC. So yes,
theoretically that's possible.
SK: We even thought about this, but it
didn't fit this design.
Q7: Okay. And can I share antennas?
Because I have an existing laptop with
existing antennas, and I would use the
same antenna to send and receive.
AC: Yeah, so, I mean, that depends on
what exactly you want to do. If you
want a TDD system, then yes; if you
want an FDD system, then you will have to
put a small duplexer in there. But yeah,
that's the idea: you can plug this into
your laptop and use your existing
antennas. That's one of the ideas of how
to use the XTRX.
Q7: Yeah, because there are all four
connectors.
AC: Yeah. One thing which I actually
forgot to mention - I kind of mentioned it
in the slides - is that SDRs
which are based on Ethernet or USB
can't work with CSMA wireless systems,
and the most famous CSMA system is Wi-Fi.
It turns out that, because of the
latency between your operating system and
your radio over USB, you just can't react
fast enough for Wi-Fi to work, because -
you probably know this - in Wi-Fi you do
carrier sensing, and if you sense that the
spectrum is free, you start transmitting.
That doesn't make sense when you have huge
latency, because all you know is that
the spectrum was free back then. So,
with the XTRX, you actually can work with CSMA
systems like Wi-Fi, so again it makes it
possible to have a fully software
implementation of Wi-Fi in your laptop. It
obviously won't work as well as your
commercial Wi-Fi, because you will have to
do a lot of processing on your CPU, but
for some purposes, like experimentation,
for example for wireless labs and R&D
labs, that's really valuable.
H: Okay. Over there.
Q8: Okay, what PCB design package did you
use?
AC: Altium.
SK: Altium, yeah.
Q8: And I'd be interested in the PCIe
workshop. It would be really great if you
did that one.
AC: Say that again?
Q8: It would be really great if you did the
PCI Express workshop.
AC: Ah, the PCI Express workshop. Okay, thank
you.
H: Okay, I think we have one more question
from the microphones, and that's you.
Q9: Okay. Great talk. And again, I would
appreciate a PCI Express workshop, if it
ever happens. What are the
synchronization options between multiple
cards? Can you synchronize the ADC clock,
and can you synchronize the presumably
digitally created IF?
SK: Yes, so... unfortunately, IF
synchronization by itself is not possible,
because the Lime chip doesn't
expose the LO. But we can
synchronize digitally. We have a dedicated
one-PPS synchronization signal, we have
lines for clock synchronization and other
stuff, and we can do the rest in software:
the Lime chip has a phase correction register,
so if you measure a phase
difference, you can compensate for it on the
different boards.
Q9: Tune to a station a long way away and
then rotate the phase until it aligns.
SK: Yeah.
Q9: Thank you.
AC: A little tricky, but possible. That's
one of our plans for the future,
because we do want to see 128-by-128
MIMO at home.
H: Okay, we have another question from the
internet.
Signal angel: I actually have two
questions. The first one is: what is the
expected price after the prototype stage?
And the second one is: can you tell us
more about the setup you had for
debugging the PCIe
issues?
AC: Could you repeat the second question?
SK: It's ????????????, I think.
SA: It's more about the setup you had for
debugging the PCIe issues.
SK: The second question, I think, is mostly
a topic for our next workshop, because it's
a more complicated setup, and it goes
beyond what fits into the current
presentation.
AC: Yeah, but in general, in terms of the
hardware setup, that was our hardware
setup: we bought this PCI Express to
Thunderbolt 3 converter, we bought a laptop which
supports Thunderbolt 3, and that's how we
were debugging it. So we don't need a
full-fledged PC and we don't have to
restart it all the time. In terms of
price, we don't have a fixed price yet.
All I can say right now is that we are
targeting no more than your bladeRF or
HackRF devices, and probably even cheaper,
for some versions.
H: Okay. We are out of time, so thank you
again Sergey and Alexander.
[Applause]
[Music]
subtitles created by c3subtitles.de
in the year 20??. Join, and help us!