0:00:00.000,0:00:18.406
36C3 intro music
0:00:18.406,0:00:22.640
Herald: The next talk will be titled 'How[br]to Design Highly Reliable Digital
0:00:22.640,0:00:26.472
Electronics', and it will be delivered to[br]you by Szymon and Stefan. Warm Applause
0:00:26.472,0:00:30.199
for them.
0:00:30.199,0:00:36.360
applause
0:00:36.360,0:00:41.360
Stefan: All right. Good morning, Congress.[br]So perhaps every one of you in the room
0:00:41.360,0:00:45.600
here has at one point or another in their[br]lives witnessed their computer behaving
0:00:45.600,0:00:50.320
weirdly and doing things that it was not[br]supposed to do or what you didn't
0:00:50.320,0:00:54.400
anticipate it to do. And well, typically[br]that would have probably been the result
0:00:54.400,0:01:00.000
of a software bug of some sort somewhere[br]inside the huge software stack your PC is
0:01:00.000,0:01:04.720
running on. Have you ever considered what[br]the probability of this weird behavior
0:01:04.720,0:01:09.120
being caused by a bit flipped somewhere in[br]the memory of your computer might have
0:01:09.120,0:01:16.240
been? So what you can see in this video on[br]the screen now is a physics experiment
0:01:16.240,0:01:20.720
called a cloud chamber. It's a very simple[br]experiment that is actually able to
0:01:20.720,0:01:26.560
visualize and make apparent all the[br]constant stream of background radiation we
0:01:26.560,0:01:32.640
all are constantly exposed to. So what's[br]happening here is that highly energetic
0:01:32.640,0:01:39.040
particles, for example, from space they[br]trace through gaseous alcohol and they
0:01:39.040,0:01:42.160
collide with alcohol molecules and they[br]form in this process a trail of
0:01:42.160,0:01:48.240
condensation while they do that. And if[br]you think about your computer, a typical
0:01:48.240,0:01:53.200
cell of RAM, of which you might have, I[br]don't know, 4, 8, 10 gigabytes in your
0:01:53.200,0:01:58.400
machine is only 80 nanometers[br]wide. So it's very, very tiny. And you
0:01:58.400,0:02:02.560
probably are able to appreciate the small[br]amount of energy that is needed or that is
0:02:02.560,0:02:08.480
used to store the information inside each[br]of those bits. And the sheer amount of
0:02:08.480,0:02:12.560
those bits you have in your RAM and your[br]computer. So a couple of years ago, there
0:02:12.560,0:02:17.600
was a study that concluded that in a[br]computer with about four gigabytes of RAM,
0:02:17.600,0:02:23.600
a bit flip, um, caused by such an event by[br]cosmic background radiation can occur
0:02:23.600,0:02:29.200
about once every 33 hours. So a[br]bit less than one per day. In an
0:02:29.200,0:02:34.960
incident in 2008, a Qantas[br]flight actually nearly crashed, and the
0:02:34.960,0:02:40.080
reason for this crash was traced back to[br]be very likely caused by a bit flipped
0:02:40.080,0:02:44.400
somewhere in one of the CPUs of the[br]avionics system and nearly caused the
0:02:44.400,0:02:50.480
death of a lot of passengers on this[br]plane. In 2003, in Belgium, a small
0:02:50.480,0:02:56.880
municipal vote actually had a weird hiccup[br]in which one of the candidates in this
0:02:56.880,0:03:02.153
election actually got 4096 more votes added in a single instance.
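A note on the number itself: 4096 is exactly 2^12, which is what a single flipped bit at position 12 of a binary counter would add. The following is only a minimal Python sketch of that arithmetic. The vote count of 514 is a made-up value, since the real count is not given in the talk, and `flip_bit` is a hypothetical helper.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit position inverted (XOR with a one-hot mask)."""
    return value ^ (1 << bit)


votes = 514                      # hypothetical count; bit 12 is clear in this value
corrupted = flip_bit(votes, 12)  # a single upset in bit 12 of the stored counter
print(corrupted - votes)         # 4096
```

Flipping the same bit a second time restores the original value, which is why a single upset of this kind is fully reversible once it has been located.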
0:03:02.153,0:03:06.480
And that was traced back to be very likely[br]caused by cosmic background radiation,
0:03:06.480,0:03:10.000
flipping a memory cell somewhere that[br]stored the vote count. And it was only
0:03:10.000,0:03:14.560
discovered that this happened because this[br]number of votes for this particular
0:03:14.560,0:03:18.880
candidate was considered unreasonable, but[br]otherwise would have gotten away probably
0:03:18.880,0:03:27.360
without being detected. So a few words[br]about us: Szymon and I, we both work at
0:03:27.360,0:03:32.480
CERN in the microelectronics section and[br]we both develop electronics that need to
0:03:32.480,0:03:37.360
be tolerant to these sorts of effects. So[br]we develop radiation tolerant electronics
0:03:37.360,0:03:42.846
for the experiments at CERN, at the LHC.[br]Among a lot of other applications, you can
0:03:42.846,0:03:48.330
meet the two of us at the Lötlabor Jena[br]assembly if you are interested in what we
0:03:48.330,0:03:55.847
are talking about today. And we will also[br]give a small talk or a small workshop
0:03:55.847,0:03:59.190
about radiation detection tomorrow, in one[br]of the seminar rooms. So feel free to pass
0:03:59.190,0:04:02.544
by there, it will be a quick introduction.[br]To give you a small idea of what kind of
0:04:02.544,0:04:08.541
environment we are working for: So if you[br]would use one of your default Intel i7
0:04:08.541,0:04:14.294
CPUs from your notebook and would put it[br]anywhere where we operate our electronics,
0:04:14.294,0:04:19.632
it would very shortly die in a matter of[br]probably one or two minutes and it would
0:04:19.632,0:04:24.626
die for more than just one reason, which[br]is rather interesting and compelling. So
0:04:24.626,0:04:30.985
the idea for today's talk is to give you[br]all an insight into all the things that
0:04:30.985,0:04:34.575
need to be taken into account when you[br]design electronics for radiation
0:04:34.575,0:04:39.152
environments. What kinds of different[br]challenges come when you try to do that.
0:04:39.152,0:04:43.116
We classify and explain the different[br]types of radiation effects that exist. And
0:04:43.116,0:04:47.617
then we also present what you can do to[br]mitigate these effects and also validate
0:04:47.617,0:04:52.116
that what you did to care for them or[br]protect your circuits actually worked. And
0:04:52.116,0:04:57.477
of course, as we do that, we'll try to[br]give our view on how we develop radiation
0:04:57.477,0:05:03.257
tolerant electronics at CERN and what our[br]workflow looks like to make sure this
0:05:03.257,0:05:08.272
works. So let's first maybe take a step[br]back and have a look at what we mean when
0:05:08.272,0:05:12.997
we say radiation environments. The first[br]one that you probably have in mind right
0:05:12.997,0:05:19.044
now when you think about radiation is[br]space. So, this interstellar space is
0:05:19.044,0:05:24.292
basically filled with, very high speed,[br]highly energetic electrons and protons and
0:05:24.292,0:05:28.716
all sorts of high energy particles. And[br]while they, for example, traverse close to
0:05:28.716,0:05:34.513
planets such as our Earth - these planets[br]sometimes do have a magnetic field and the
0:05:34.513,0:05:39.317
highly energetic particles are actually[br]deflected by these magnetic fields and
0:05:39.317,0:05:43.824
they can protect planets such as our[br]own, for example, from this highly
0:05:43.824,0:05:47.986
energetic radiation. But in the process,[br]there around these planets sometimes they
0:05:47.986,0:05:52.107
form these radiation belts - known as the[br]Van Allen belts after the guy who
0:05:52.107,0:05:56.043
discovered this effect a long time ago.[br]And a satellite in space as it orbits
0:05:56.043,0:06:01.620
around the Earth might, depending on what[br]orbit is chosen, sometimes go through
0:06:01.620,0:06:05.647
these belts of highly intense radiation.[br]That, of course, then needs to be taken
0:06:05.647,0:06:11.552
into account when designing electronics[br]for such a satellite. And if Earth itself
0:06:11.552,0:06:17.191
is not able to give you enough radiation,[br]you may think of the very famous Juno
0:06:17.191,0:06:22.874
Jupiter mission that has become famous[br]about a year ago. They actually in the
0:06:22.874,0:06:28.288
environment of Jupiter they anticipated so[br]much radiation that they actually decided
0:06:28.288,0:06:33.408
to put all the electronics of the[br]satellite inside a one centimeter thick
0:06:33.408,0:06:39.831
cube of titanium, which is famously known[br]as the Juno Radiation Vault. But not only
0:06:39.831,0:06:43.870
space offers radiation environments.[br]Another form of radiation you probably all
0:06:43.870,0:06:48.292
recognize this when I show you this[br]picture, which is an X-ray image of a
0:06:48.292,0:06:54.936
hand. And X-ray is also considered a form[br]of radiation. And while, of course, the
0:06:54.936,0:07:01.320
doses or amounts of radiation any patient[br]is exposed to while doing diagnosis or
0:07:01.320,0:07:05.801
treatment of some disease, that might not[br]be the full story when it comes to medical
0:07:05.801,0:07:10.220
applications. So this is a medical[br]particle accelerator which is used for
0:07:10.220,0:07:15.288
cancer treatment. And in these sorts of[br]accelerators, typically carbon ions or
0:07:15.288,0:07:20.389
protons are accelerated and then focused[br]and used to treat and selectively destroy
0:07:20.389,0:07:25.302
cancer cells in the body. And this comes[br]already relatively close to the
0:07:25.302,0:07:29.695
environment we are working in and working[br]for. So Szymon and I are working, for
0:07:29.695,0:07:36.616
example, on electronics for the CMS[br]detector inside the LHC, for which we build
0:07:36.616,0:07:43.906
dedicated, radiation tolerant, integrated[br]circuits which have to withstand very,
0:07:43.906,0:07:49.373
very large amounts and doses of short[br]lived radiation in order to function
0:07:49.373,0:07:54.414
correctly. And if we didn't specifically[br]design electronics for that, basically the
0:07:54.414,0:08:01.893
whole system would never be able to work.[br]To illustrate a bit how you can imagine
0:08:01.893,0:08:06.062
the scale of this environment: This is a[br]single plot of a collision event that was
0:08:06.062,0:08:11.161
recorded in the ATLAS experiment. And each[br]of those tiny little traces you can make
0:08:11.161,0:08:15.997
out in this diagram is actually either one[br]or multiple secondary particles that were
0:08:15.997,0:08:22.166
created in the initial collision of two[br]proton bunches inside the experiment. And
0:08:22.166,0:08:27.501
each of those, of course, races through[br]the detector electronics, which make these
0:08:27.501,0:08:32.817
traces visible, itself then decaying into[br]multiple other secondary particles which
0:08:32.817,0:08:37.856
all go through our electronics. And if[br]that doesn't sound, let's say, bad enough
0:08:37.856,0:08:42.576
for digital electronics, these collisions[br]happen about 40 million times a second. Of
0:08:42.576,0:08:47.608
course, multiplying the number of events[br]or problems they can cause in our
0:08:47.608,0:08:54.608
circuits. So we now want to introduce all[br]the things that can happen, the different
0:08:54.608,0:08:59.570
radiation effects. But first, probably we[br]take a step back and look at what we mean
0:08:59.570,0:09:05.805
when we say digital electronics or digital[br]logic, which we want to focus on today. So
0:09:05.805,0:09:11.058
from your university lectures or your[br]reading, you probably know the first class
0:09:11.058,0:09:14.577
of digital logic, which is the[br]combinatorial logic. So this is typically
0:09:14.577,0:09:19.222
logic that just does a simple linear[br]relation of the inputs of a circuit and
0:09:19.222,0:09:23.956
produces an output as exemplified with[br]these AND and OR, NAND, XOR gates that you
0:09:23.956,0:09:28.829
see here. But if you want to build - I[br]mean even though we use those everywhere
0:09:28.829,0:09:32.775
in our circuits - you probably also want[br]to store state in a more complex circuit,
0:09:32.775,0:09:37.857
for example, in the registers of your CPU[br]they store some sort of internal
0:09:37.857,0:09:41.736
information. And for that we use the other[br]class of logic, which is called the
0:09:41.736,0:09:44.726
sequential logic. So this is typically[br]clocked with some system clock frequency
0:09:44.726,0:09:50.883
and it changes its output with relation to[br]the inputs whenever this clock signal changes.
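The difference between the two classes of logic can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description: `xor_gate` and `DFlipFlop` are hypothetical names, and the clock edge is modeled as a plain method call.

```python
def xor_gate(a: int, b: int) -> int:
    """Combinatorial logic: the output depends only on the current inputs."""
    return a ^ b


class DFlipFlop:
    """Sequential logic: the stored output only changes on a clock edge."""

    def __init__(self) -> None:
        self.q = 0  # stored state, initially 0

    def clock(self, d: int) -> int:
        """Model one active clock edge: capture input d and present it at the output."""
        self.q = d
        return self.q


ff = DFlipFlop()
ff.clock(xor_gate(1, 0))  # on the clock edge the state becomes 1
held = ff.q               # stays 1 until the next edge, whatever the inputs do
```

The stored value `q` is exactly the kind of state that a radiation-induced bit flip can corrupt, whereas a disturbance in `xor_gate` only matters if it is captured at a clock edge.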
0:09:50.883,0:09:54.263
And now if we look at how all[br]these different logic functionalities are
0:09:54.263,0:09:58.292
implemented. So typically nowadays for[br]that you may know that we use CMOS
0:09:58.292,0:10:02.340
technologies and basically represent all[br]this logic functionality as digital gates
0:10:02.340,0:10:10.558
using small P-MOS and N-MOS MOSFET[br]transistors in CMOS technologies. And if
0:10:10.558,0:10:16.408
we kind of try to build a model for more[br]complex digital circuits, we typically use
0:10:16.408,0:10:21.814
something we call the finite state machine[br]model, in which we use a model that
0:10:21.814,0:10:25.822
consists of a combinatorial and a[br]sequential part. And you can see that the
0:10:25.822,0:10:31.031
output of the circuit depends both on the[br]internal state inside the register as well
0:10:31.031,0:10:35.331
as also the input to the combinatorial[br]logic. And accordingly, also the state
0:10:35.331,0:10:40.924
that is internal is always changed by the[br]inputs as well as the current state. So
0:10:40.924,0:10:44.604
this is kind of the simple model for more[br]complex systems that can be used to model
0:10:44.604,0:10:50.214
different effects. Um, now let's try to[br]actually look at what the radiation can do
0:10:50.214,0:10:53.948
to transistors. And for that we are going[br]to have a quick recap at what the
0:10:53.948,0:10:57.895
transistor actually is and what it looks[br]like. As you may perhaps know, in
0:10:57.895,0:11:03.736
CMOS technologies, transistors are built[br]on wafers of high purity silicon. So this
0:11:03.736,0:11:09.074
is a crystalline, very regularly organized[br]lattice of silicon atoms. And what we do
0:11:09.074,0:11:14.092
to form a transistor on such a wafer is[br]that we add dopants. So in order to form
0:11:14.092,0:11:19.629
diffusion regions, which later will become[br]the source and drain of our transistors.
0:11:19.629,0:11:24.474
And then on top of that we grow a layer of[br]insulating oxide. And on top of that we
0:11:24.474,0:11:28.713
put polysilicon, which forms the gate[br]terminal of the transistor. And in the end
0:11:28.713,0:11:32.813
we end up with an equivalent circuit a bit[br]like that. And now to put things back into
0:11:32.813,0:11:37.670
perspective - you may also note that the[br]dimension of these structures are very
0:11:37.670,0:11:42.543
tiny. So we talk about tens of nanometers[br]for some of the dimensions I've outlined
0:11:42.543,0:11:47.958
here. And as the technologies shrink,[br]these become smaller and smaller and
0:11:47.958,0:11:52.284
therefore you'll probably also realize or[br]are able to appreciate the small amount of
0:11:52.284,0:11:56.560
energy that are used to store information[br]inside these digital circuits, which makes
0:11:56.560,0:12:02.390
them perhaps more sensitive to radiation.[br]So let's take a look. What different types
0:12:02.390,0:12:08.385
of radiation effects exist? We typically[br]in this case, differentiate them into two
0:12:08.385,0:12:13.268
main classes of events. The first one[br]would be the cumulative effects, which are
0:12:13.268,0:12:17.362
effects that, as the name implies,[br]accumulate over time. So as the circuit is
0:12:17.362,0:12:22.127
placed inside some radiation environment,[br]over time it accumulates more and more
0:12:22.127,0:12:26.969
dose and therefore worsens its performance[br]or changes how it operates. And on the
0:12:26.969,0:12:30.549
other side, we have the Single Event[br]Effects, which are always events that
0:12:30.549,0:12:35.075
happen at some instantaneous point in[br]time, and then suddenly, without being
0:12:35.075,0:12:39.316
predictable, change how the circuit[br]operates or how it functions or if it
0:12:39.316,0:12:43.931
works in the first place or not. So I'm[br]going to first go into the class of
0:12:43.931,0:12:47.685
cumulative effects and then later on,[br]Szymon will go into the other class of the
0:12:47.685,0:12:53.173
Single Event Effects. So in terms of these[br]accumulating effects, we basically have
0:12:53.173,0:12:57.580
two main subclasses: The first one being[br]ionization or TID effects, for Total
0:12:57.580,0:13:02.033
Ionizing Dose - and the second one being[br]displacement damages. So displacement
0:13:02.033,0:13:07.137
damages do exactly what they sound like.[br]It is all the effects that happen when an
0:13:07.137,0:13:11.249
atom in the silicon lattice is actually[br]displaced, so removed from its lattice
0:13:11.249,0:13:15.266
position and actually changes the[br]structure of the semiconductor. But
0:13:15.266,0:13:19.548
luckily, these effects don't have a big[br]impact in the CMOS digital circuits that
0:13:19.548,0:13:23.164
we are looking at today. So we will[br]disregard them for the moment and we'll be
0:13:23.164,0:13:28.120
looking more at the ionization damage, or[br]TID. So ionization - as a quick recap - is
0:13:28.120,0:13:35.901
whenever electrons are removed or added to[br]an atom, effectively transforming it into
0:13:35.901,0:13:42.747
an ion. And these effects are especially[br]critical for the circuits we are building
0:13:42.747,0:13:46.316
because of what they do is that they[br]change the behavior of the transistors.
0:13:46.316,0:13:50.233
And without looking too much into the[br]semiconductor details, I just want to show
0:13:50.233,0:13:55.730
their typical effect that we are concerned[br]about in this very simple circuit here. So
0:13:55.730,0:14:00.348
this is just an inverter circuit[br]consisting of two transistors here and
0:14:00.348,0:14:05.812
there. And what the circuit does in normal[br]operation is it just takes an input signal
0:14:05.812,0:14:10.062
and inverts and basically gives the[br]inverted signal at the output. And as the
0:14:10.062,0:14:15.549
transistors are irradiated and accumulate[br]dose, you can see that the edges of the
0:14:15.549,0:14:20.391
output signal get slower. So the[br]transistor takes longer to turn on and off.
0:14:20.391,0:14:24.574
And what that does in turn is that it[br]limits the maximum operation frequency of
0:14:24.574,0:14:28.795
your circuit. And of course, that is not[br]something you want to do. You want your
0:14:28.795,0:14:31.723
circuit to operate at some frequency in[br]your final system. And if the maximum
0:14:31.723,0:14:35.600
frequency it can work at degrades over[br]time, at some point it will fail as the
0:14:35.600,0:14:39.276
maximum frequency is just too low. So[br]let's have a look at what we can do to
0:14:39.276,0:14:44.395
mitigate these effects. The first one and[br]I already mentioned it when talking about
0:14:44.395,0:14:48.488
the Juno mission, is shielding. So if you[br]can actually put a box around your
0:14:48.488,0:14:52.586
electronics and shield any radiation from[br]actually hitting your transistors, it is
0:14:52.586,0:14:56.900
obvious that they will last longer and[br]will suffer less from the radiation damage
0:14:56.900,0:15:01.241
that it would otherwise do. So this[br]approach is very often used in space
0:15:01.241,0:15:04.988
applications like on satellites, but it's[br]not very useful if you are actually trying
0:15:04.988,0:15:08.209
to measure the radiation with your[br]circuits as we do, for example, in the
0:15:08.209,0:15:12.415
particle accelerators we build integrated[br]circuits for. So there first of all, we
0:15:12.415,0:15:16.344
want to measure the radiation so we cannot[br]shield our detectors from the radiation.
0:15:16.344,0:15:20.592
And also, we don't want to influence the[br]tracks of these secondary collision
0:15:20.592,0:15:24.162
products with any shielding material that[br]would be in the way. So this is not very
0:15:24.162,0:15:28.315
useful in a particle accelerator[br]environment, let's say. So we have to
0:15:28.315,0:15:33.880
resort to different methods. So as I said,[br]we do have to design our own integrated
0:15:33.880,0:15:38.826
circuits in the first place. So we have[br]some freedom in what we call transistor
0:15:38.826,0:15:45.236
level design. So we can actually alter the[br]dimensions of the transistors. We can make
0:15:45.236,0:15:50.055
them larger to withstand larger doses of[br]radiation and we can use special
0:15:50.055,0:15:54.354
techniques in terms of layout that we can[br]experimentally verify to be more
0:15:54.354,0:15:59.266
resistant to radiation effects. And as a[br]third measure, which is probably the most
0:15:59.266,0:16:03.491
important one for us, is what we call[br]modeling. So we actually are able to
0:16:03.491,0:16:08.358
characterize all the effects that[br]radiation will have on a transistor. And
0:16:08.358,0:16:12.442
if we can do that, if we will know: 'If I[br]put it into a radiation environment for a
0:16:12.442,0:16:17.000
year, how much slower will it become?'[br]Then it is of course easy to say: 'OK, I
0:16:17.000,0:16:20.648
can just over-design my circuit and make[br]it a bit more simple, maybe have less functionality,
0:16:20.648,0:16:24.464
but be able to operate at a[br]higher frequency and therefore withstand
0:16:24.464,0:16:30.240
the radiation effects for a longer time[br]while still working sufficiently well at
0:16:30.240,0:16:35.118
the end of its expected lifetime.' So[br]that's more or less what we can do about
0:16:35.118,0:16:38.254
these effects. And I'll hand over to[br]Szymon for the second class.
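The over-design argument above can be sketched as a small timing-margin check in Python. Everything numeric here is an assumption for illustration: the gate delay, the 50% end-of-life slowdown, the logic depths and the target frequency are made-up values, and the critical path is simplified to identical gates in series.

```python
def max_frequency_mhz(gate_delay_ps: float, logic_depth: int) -> float:
    """Maximum clock frequency if the critical path is logic_depth gates in series."""
    path_delay_ps = gate_delay_ps * logic_depth
    return 1e6 / path_delay_ps  # a period of 1 ps corresponds to 1e6 MHz


def meets_timing_at_end_of_life(fresh_delay_ps: float, slowdown: float,
                                logic_depth: int, target_mhz: float) -> bool:
    """Check the target frequency against the radiation-degraded gate delay."""
    aged_delay_ps = fresh_delay_ps * (1.0 + slowdown)
    return max_frequency_mhz(aged_delay_ps, logic_depth) >= target_mhz


# Hypothetical numbers: 50 ps gates that become 50% slower over the mission.
print(max_frequency_mhz(50.0, 20))                        # 1000.0 when fresh
print(meets_timing_at_end_of_life(50.0, 0.5, 20, 800.0))  # False: too slow once aged
print(meets_timing_at_end_of_life(50.0, 0.5, 15, 800.0))  # True: simpler logic has margin
```

In this toy model, reducing the logic depth from 20 to 15 gates is the "a bit more simple, maybe less functionality" trade: the fresh circuit is faster than needed so that the aged circuit is still fast enough.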
0:16:38.254,0:16:42.655
Szymon: Contrary to the cumulative effects[br]presented by Stefan, the other group are
0:16:42.655,0:16:46.424
Single Event Effects which are caused by[br]high energy deposits, which are caused by
0:16:46.424,0:16:52.143
a single particle or shower of particles.[br]And they can happen at any time, even
0:16:52.143,0:16:57.089
seconds after irradiation is started. It[br]means that if your circuit is vulnerable
0:16:57.089,0:17:01.667
to this class of effects, it can fail[br]immediately after radiation is present.
0:17:01.667,0:17:06.313
And here we also classify these effects[br]into several groups. The first are hard,
0:17:06.313,0:17:11.450
or permanent, errors, which as the name[br]indicates can permanently destroy your
0:17:11.450,0:17:20.260
circuit. And these types of errors are[br]typically critical for power devices where
0:17:20.260,0:17:24.340
you have large power densities and they[br]are not so much of a problem for digital
0:17:24.340,0:17:30.100
circuits. The other class of effects[br]are soft errors. And here we distinguish
0:17:30.100,0:17:34.100
transient, or Single Event Transient[br]errors, which are spurious signals
0:17:34.100,0:17:41.220
propagating in your circuit as a result of[br]a gate being hit by a particle and they
0:17:41.220,0:17:45.700
are especially problematic for analog[br]circuits or asynchronous digital circuits,
0:17:45.700,0:17:51.460
but under some circumstances they can be[br]also problematic for synchronous systems.
0:17:51.460,0:17:56.420
And the other class of problems are[br]static, or Single Event Upset problems,
0:17:56.420,0:18:01.220
which basically means that your memory[br]element like a register gets flipped. And
0:18:01.220,0:18:05.060
then of course, if your system is not[br]designed to handle this type of errors
0:18:05.060,0:18:09.620
properly, it can lead to a failure. So in[br]the following part of the presentation
0:18:09.620,0:18:15.300
we'll focus mostly on soft errors. So[br]let's try to understand what is the origin
0:18:15.300,0:18:20.820
of this type of problem. So as Stefan[br]mentioned, the typical transistor is built
0:18:20.820,0:18:25.230
out of diffusions, gate and channel. So[br]here you can see one diffusion. Let's
0:18:25.230,0:18:29.230
assume that it is a drain diffusion. And[br]then when a particle goes through and
0:18:29.230,0:18:36.700
deposits charge, it creates free electron and[br]hole pairs, which then in the presence of
0:18:36.700,0:18:43.320
electric fields, they get collected by[br]means of drift, which results in a large
0:18:43.320,0:18:46.930
current spike, which is very short. And[br]then the rest of the charge could be
0:18:46.930,0:18:50.940
collected by diffusion which is a much[br]slower process and therefore also the
0:18:50.940,0:18:56.390
amplitude of the event is much, much[br]smaller. So let's try to understand what
0:18:56.390,0:19:01.230
could happen in a typical memory cell. So[br]on this schematic, you can see the
0:19:01.230,0:19:05.740
simplest memory cell, which is composed of[br]two back-to-back inverters. And let's
0:19:05.740,0:19:12.810
assume that node A is at high and node B[br]is at low potential initially. And then we
0:19:12.810,0:19:17.210
have a particle hitting the drain of[br]transistor M1 which creates a short
0:19:17.210,0:19:22.590
circuit current between drain and ground,[br]bringing the drain of transistor M1 to low
0:19:22.590,0:19:29.871
potential, which also acts on the gates of[br]second inverter, temporarily changing its
0:19:29.871,0:19:38.734
state from low to high, which reinforces[br]the wrong state in the first inverter. And
0:19:38.734,0:19:45.340
at this time the error is locked in your[br]memory cell and you basically lost your
0:19:45.340,0:19:49.652
information. So you may be asking[br]yourself: 'How much charge is needed
0:19:49.652,0:19:54.281
really to flip a state of a memory cell?'.[br]And you can get this number from either
0:19:54.281,0:19:59.952
simulations or from measurements. So let's[br]assume that what we could do, we could try
0:19:59.952,0:20:04.605
to inject some current into the sensitive[br]node, for example, drain of transistor M1.
0:20:04.605,0:20:08.790
And here what I will show is that on the[br]top plot you will have current as a function
0:20:08.790,0:20:13.484
of time. On the second plot you will have[br]output voltage. So voltage at node B as a
0:20:13.484,0:20:19.121
function of time and at the lowest plot you[br]will see a probability of having a bit
0:20:19.121,0:20:23.097
flip. So if you inject very little[br]current, of course nothing changes at the
0:20:23.097,0:20:27.670
output, but once you start increasing the[br]amount of current you are injecting, you
0:20:27.670,0:20:33.306
see that something appears at the output[br]and at some point the output will toggle,
0:20:33.306,0:20:39.747
so it will switch to the other state. And[br]at this point, if you really calculate
0:20:39.747,0:20:46.369
what is the area under the current curve[br]you can find what is the critical charge
0:20:46.369,0:20:53.499
needed to flip the memory cell. And if you[br]go further, if you start injecting even
0:20:53.499,0:21:00.701
more current, you will not see that much[br]difference in the output voltage waveform.
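The "area under the current curve" mentioned above is simply the time integral of the injected current, which gives a charge. The critical charge is that integral for the smallest pulse that still flips the cell. A minimal Python sketch using the trapezoidal rule follows. The triangular pulse with a 100 microampere peak and 100 picosecond width is a made-up shape, not a value from the talk.

```python
def collected_charge(times_s, currents_a):
    """Integrate current over time with the trapezoidal rule (result in coulombs)."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (currents_a[i] + currents_a[i - 1]) * dt
    return total


# Hypothetical triangular current pulse: 0 A, 100 uA peak at 50 ps, back to 0 A at 100 ps.
times = [0.0, 50e-12, 100e-12]
currents = [0.0, 100e-6, 0.0]
q = collected_charge(times, currents)  # about 5e-15 C, i.e. 5 fC
print(q)
```

With simulated or measured waveforms, the same integral would be evaluated for the pulse at which the output first toggles.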
0:21:00.701,0:21:05.112
It could become only slightly faster. And[br]at this point, you also can notice that
0:21:05.112,0:21:09.528
the probability now jumped to one, which[br]means that any time you inject so much
0:21:09.528,0:21:17.431
current there is a fault in your circuit.[br]So for now, we just found what is the
0:21:17.431,0:21:23.414
probability of having a bit-flip from 0 to[br]1 in node B. Of course we should also
0:21:23.414,0:21:27.904
calculate the same for the other[br]direction, so from 1 to zero. And usually
0:21:27.904,0:21:32.377
it is slightly different. And then of[br]course we should inject in all the other
0:21:32.377,0:21:37.817
nodes, for example node B and also should[br]study all possible transitions. And then
0:21:37.817,0:21:43.492
at the end, if you calculate the[br]superposition of these effects and you
0:21:43.492,0:21:48.655
multiply them by the active area of each[br]node, you will end up with what we call
0:21:48.655,0:21:52.420
the cross section, which has a dimension[br]of centimeters squared, which will tell
0:21:52.420,0:21:57.357
you how sensitive your circuit is to this[br]type of effects. And then knowing the
0:21:57.357,0:22:03.761
radiation profile of your environment, you[br]can calculate the expected upset rate in
0:22:03.761,0:22:10.105
the final application. So now, having[br]covered the basics of the single event
0:22:10.105,0:22:16.517
effects, let's try to check how we can[br]mitigate them. And here also technology
0:22:16.517,0:22:20.875
plays a significant role. So of course,[br]newer technologies offer us much smaller
0:22:20.875,0:22:26.692
devices. And together with that, what[br]follows is that usually supply voltages
0:22:26.692,0:22:31.047
are getting smaller and smaller as well as[br]the node capacitance, which means that for
0:22:31.047,0:22:35.565
our Single Event Upsets it is very bad[br]because the critical charge which is
0:22:35.565,0:22:40.207
required to flip our bit is getting less[br]and less. But at the end, at the same
0:22:40.207,0:22:44.135
time, physical dimensions of our[br]transistors are getting smaller, which
0:22:44.135,0:22:48.097
means that the cross section for them[br]being hit is also getting smaller. So
0:22:48.097,0:22:52.495
overall, the effects really depend on the[br]circuit topology and the radiation
0:22:52.495,0:22:59.181
environment. So another protection method[br]could be introduced on the cell level. And
0:22:59.181,0:23:04.914
here we could imagine increasing the[br]critical charge. And that could be done in
0:23:04.914,0:23:10.819
the easiest way by just increasing the[br]node capacitance by, for example, putting
0:23:10.819,0:23:16.096
larger transistors. But of course, this[br]also increases the collection electrode,
0:23:16.096,0:23:22.657
which is not nice. And another way could[br]be just increase the capacitance by adding
0:23:22.657,0:23:28.336
some extra metal capacitance, but it, of[br]course, slows down the circuit. Another
0:23:28.336,0:23:33.615
approach could be to try to store the[br]information on more than two nodes. So I
0:23:33.615,0:23:38.377
showed you that on a simple SRAM cell we[br]store information only on two nodes, so
0:23:38.377,0:23:43.102
you could try to come up with some other[br]cells, for example, like that one in which
0:23:43.102,0:23:47.406
the information is stored on four nodes.[br]So you can see that the architecture is
0:23:47.406,0:23:53.800
very similar to the basic SRAM cell. But[br]you should be careful always to very
0:23:53.800,0:23:59.000
carefully simulate your design, because if[br]we analyze this circuit, you will quickly
0:23:59.000,0:24:02.936
realize that this circuit, even though the[br]information is stored in four different
0:24:02.936,0:24:09.867
nodes, the same type of loop exists as in[br]the basic circuit. Meaning that at the end
0:24:09.867,0:24:15.227
the circuit offers basically no hardening[br]with respect to the previous cell. So
0:24:15.227,0:24:21.074
actually we can do it better. So here you[br]can see a typical dual interlocked cell.
0:24:21.074,0:24:26.445
So the amount of transistors is exactly[br]the same as in the previous example, but
0:24:26.445,0:24:30.819
now they are interconnected slightly[br]differently. And here you can see that
0:24:30.819,0:24:36.262
this cell also has two stable configurations.[br]But this time, the low
0:24:36.262,0:24:40.587
level from a given node can propagate[br]only to the left hand side, while high
0:24:40.587,0:24:47.872
level can propagate to the right hand[br]side. And each stage being inverting means
0:24:47.872,0:24:54.918
that the fault cannot propagate for more[br]than one node. Of course, this cell has
0:24:54.918,0:25:00.379
some drawbacks: It consumes more area than[br]a simple SRAM cell and also write access
0:25:00.379,0:25:04.240
requires accessing at least two nodes at[br]the same time to really change the state
0:25:04.240,0:25:09.801
of the cell. And so you may ask yourself,[br]how effective is this cell? So here I will
0:25:09.801,0:25:13.709
show you a cross section plot. So it is[br]the probability of having an error as a
0:25:13.709,0:25:18.883
function of injected energy. And as a[br]reference, you can see a pink curve on the
0:25:18.883,0:25:25.650
top, which is for a normal, not protected[br]cell. And on the green you can see the
0:25:25.650,0:25:31.399
cross section for the error in the DICE[br]cell. So as you can see, it is one order
0:25:31.399,0:25:36.934
of magnitude better than the normal cell.[br]But still, the cross section is far from
0:25:36.934,0:25:41.426
being negligible. So the problem was[br]identified: it was found that the
0:25:41.426,0:25:45.679
problem was caused by the fact that some[br]sensitive nodes were very close together
0:25:45.679,0:25:50.807
on the layout and therefore they could be[br]upset by the same particle. Because as we
0:25:50.807,0:25:54.721
mentioned, single devices are very[br]small. We are talking about dimensions
0:25:54.721,0:25:59.675
below a micron. So after realizing that,[br]we designed another cell in which we
0:25:59.675,0:26:04.799
separated the sensitive nodes more and we[br]ended up with the blue curve, and as you
0:26:04.799,0:26:08.907
can see the cross section was reduced by[br]two more orders of magnitude and the
0:26:08.907,0:26:14.205
threshold was increased significantly. So[br]if you don't want to redesign your
0:26:14.205,0:26:18.771
standard cells, you could also apply some[br]mitigation techniques on block level. So
0:26:18.771,0:26:24.717
here we can use some encoding to encode[br]our state better. And as an example, I
0:26:24.717,0:26:31.540
will show you a typical Hamming code. So[br]to protect four bits, we have to add three
0:26:31.540,0:26:38.052
additional parity bits which are calculated[br]according to this formula. And then once
0:26:38.052,0:26:44.133
you calculate the parity bits, you can use[br]those to check the state integrity of your
0:26:44.133,0:26:50.360
internal state. And if any of the parity[br]bits is not equal to zero, then the bits
0:26:50.360,0:26:55.375
instantaneously become syndromes,[br]indicating where the error happened. And
0:26:55.375,0:26:59.916
you can use this information to correct[br]the error. Of course, in this case, the
0:26:59.916,0:27:06.533
efficiency is not really nice because we[br]need three additional bits to protect only
0:27:06.533,0:27:11.828
four bits of information. But as the state[br]length increases, the protection also becomes
0:27:11.828,0:27:18.855
more efficient. Another approach would be[br]to do even less. Meaning that instead of
0:27:18.855,0:27:23.970
changing anything in your design,[br]you can just triplicate your design or
0:27:23.970,0:27:30.190
multiply it many times and just vote on[br]which state is correct. So this concept is
0:27:30.190,0:27:35.046
called triple modular redundancy and it is[br]based around a voter cell. So it is a
0:27:35.046,0:27:40.210
cell which has an odd number of[br]inputs, and its output is always equal to
0:27:40.210,0:27:45.040
the majority of its inputs. And as I mentioned,[br]the idea is that you have, for
0:27:45.040,0:27:49.292
example, three circuits: A, B and C, and[br]during normal operation, when they are
0:27:49.292,0:27:54.471
identical, the output is also the same.[br]However, when there is a problem, for
0:27:54.471,0:28:00.957
example, in logic part B, the output[br]is not affected. So this problem is
0:28:00.957,0:28:05.509
effectively masked by the voter cell[br]and it is not visible from outside of the
0:28:05.509,0:28:10.383
circuit. But you have to be careful not to[br]take this picture as a design
0:28:10.383,0:28:15.501
template. So let's try to analyze what[br]would happen with a state machine
0:28:15.501,0:28:20.329
similar to what Stefan introduced if you[br]were to just use this concept. So here you
0:28:20.329,0:28:24.859
can see three state machines and[br]a voter on the output. And as we can see,
0:28:24.859,0:28:29.484
if you have an upset in, for example, the[br]state register A, then the state is
0:28:29.484,0:28:36.676
broken. But still the output of the[br]circuit, which is indicated by the letter s, is
0:28:36.676,0:28:42.355
correct because the B and C registers are[br]still fine. But what happens if some time
0:28:42.355,0:28:49.283
later we have an upset in memory element B[br]or C? Then of course the state
0:28:49.283,0:28:56.028
of our system is broken and we can not[br]recover it. So you can ask yourself what
0:28:56.028,0:29:02.204
can we do better in order to avoid this[br]situation? So, just to be sure: please
0:29:02.204,0:29:06.654
do not use this technique to protect your[br]circuits. So the easiest mitigation could
0:29:06.654,0:29:13.201
be to use, as an input to your logic,[br]the output of the voter cell itself.
0:29:13.201,0:29:18.491
What it offers us is that now whenever you[br]have an upset in one of the memory
0:29:18.491,0:29:22.933
elements for the next computation, for the[br]next stage, we always use the voter
0:29:22.933,0:29:27.631
output, which ensures that the error[br]will be removed one clock cycle later. So
0:29:27.631,0:29:32.726
if you have another hit sometime later,[br]basically, it will not affect our state.
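
The difference between voting only on the output and feeding the voted value back into the logic can be sketched in a few lines of Python. This is a simplified software model, not the circuit from the slides: the 4-bit counter used as next-state logic, the function names, and the chosen upset cycles are all assumptions made for illustration.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote of three register values."""
    return (a & b) | (a & c) | (b & c)


def next_state(state: int) -> int:
    """Hypothetical next-state logic: a 4-bit counter."""
    return (state + 1) & 0xF


def run(feedback: bool, upsets: dict, cycles: int = 8) -> list:
    """Simulate three register copies with a voter on the output.

    upsets maps a cycle number to (copy index, bit mask to flip).
    With feedback=False every copy feeds its own logic (voter on the
    output only). With feedback=True all copies compute the next
    state from the voted value.
    """
    regs = [0, 0, 0]
    outputs = []
    for cycle in range(cycles):
        if cycle in upsets:
            copy, mask = upsets[cycle]
            regs[copy] ^= mask  # single event upset in one register copy
        voted = majority(*regs)
        outputs.append(voted)
        if feedback:
            regs = [next_state(voted)] * 3
        else:
            regs = [next_state(r) for r in regs]
    return outputs


golden = run(feedback=True, upsets={})
# Copy A is hit at cycle 2, copy B at cycle 5, both in the same bit.
hits = {2: (0, 0b0100), 5: (1, 0b0100)}
print(run(feedback=False, upsets=hits) == golden)  # False: second hit breaks the vote
print(run(feedback=True, upsets=hits) == golden)   # True: first hit was already cleaned up
```

Without feedback, the first upset stays in copy A forever, so the second upset leaves two wrong copies and the voter follows them. With feedback, the first error is overwritten one cycle later and the second hit is again a single fault.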
0:29:32.726,0:29:39.765
Until now we considered only upsets in our[br]registers, but what happens if we have
0:29:39.765,0:29:45.885
a transient in our voter? So you see that[br]if there is no state change, basically the
0:29:45.885,0:29:50.981
transient in the voter doesn't impact[br]our system. But if you are really unlucky
0:29:50.981,0:29:55.777
and the transient happens when the clock[br]transition happens, so whenever we
0:29:55.777,0:30:01.182
latch the data, we can corrupt the state[br]in three registers at the same time, which
0:30:01.182,0:30:05.605
is less than ideal. So to overcome this[br]limitation, you can consider skewing our
0:30:05.605,0:30:11.101
clocks by some time, which is larger than[br]the maximum transient time. And now,
0:30:11.101,0:30:18.050
because each register samples the[br]output of the voter at a slightly different
0:30:18.050,0:30:23.449
time, we can corrupt only one flip-flop[br]at a time. So of course, if you are
0:30:23.449,0:30:28.780
unlucky, we can have problematic[br]situations in which one register is
0:30:28.780,0:30:33.646
already in the new state. The other register[br]is still in the old state. And then it
0:30:33.646,0:30:39.728
can lead to a nondeterministic result. So it[br]is better, but still not ideal. So as a
0:30:39.728,0:30:46.578
general theme, you have seen that we were[br]adding and adding more resources so you
0:30:46.578,0:30:50.418
can ask yourself what would happen if we[br]triplicate everything. So in this case,
0:30:50.418,0:30:54.262
we triplicated the registers, we[br]triplicated our logic and our voters. And
0:30:54.262,0:30:59.138
now you can see that whenever we have an[br]upset in our register, it can only affect
0:30:59.138,0:31:04.513
one register at a time and the error[br]will be removed from the system one clock
0:31:04.513,0:31:08.912
cycle later. Also, if we have an upset[br]in the voter or in the logic, it can be
0:31:08.912,0:31:13.372
latched into only one register, which means[br]that in principle we created a system
0:31:13.372,0:31:17.885
which is really robust. Unfortunately,[br]nothing is for free. So here I compare a
0:31:17.885,0:31:22.823
different triplication schemes, and[br]you can see that the more protection
0:31:22.823,0:31:26.326
you want to have, the more you have to pay[br]in terms of resources, being power and
0:31:26.326,0:31:31.373
area. And also, usually, you pay a small[br]penalty in terms of maximum operational
0:31:31.373,0:31:37.597
speed. So which flavor of protection you[br]use depends really on the
0:31:37.597,0:31:42.420
application. So for the most sensitive[br]circuits, you probably want to use
0:31:42.420,0:31:48.493
full TMR and you may leave some other[br]bits of logic unprotected. Alternatively, if
0:31:48.493,0:31:54.749
your system is not mission critical and[br]you can tolerate some downtime, you can
0:31:54.749,0:32:00.294
consider scrubbing, which means periodically[br]checking the state of your system and refreshing it
0:32:00.294,0:32:05.120
if necessary, when an error is detected using[br]some parity bits or a copy of the data in
0:32:05.120,0:32:10.394
a safe space. Or you can have a[br]watchdog which will find out that
0:32:10.394,0:32:13.951
something went wrong and it will just[br]reinitialize the whole system. So now,
0:32:13.951,0:32:20.011
having covered the basics of all the effects[br]we will have to face, we would like
0:32:20.011,0:32:24.293
to show you the basic flow which we follow[br]during designing our radiation hardened
0:32:24.293,0:32:29.746
circuits. So of course we always start[br]with specifications. So we try to
0:32:29.746,0:32:34.228
understand our radiation environment in[br]which the circuit is meant to operate. So
0:32:34.228,0:32:38.750
we come up with some specifications for[br]total dose which could be accumulated and
0:32:38.750,0:32:45.348
for the rate of single event upsets. And[br]at this moment, it is also not very rare
0:32:45.348,0:32:49.705
that we have to decide to move some[br]functionality out of our detector volume,
0:32:49.705,0:32:56.133
outside, where we can use some sort of[br]commercial equipment to do number
0:32:56.133,0:33:04.820
crunching. But let's assume that we would[br]go with our ASIC. So having the
0:33:04.820,0:33:09.220
specifications, of course we proceed with[br]functional implementation. This we
0:33:09.220,0:33:14.260
typically do with hardware description[br]languages, so Verilog or VHDL, which you may
0:33:14.260,0:33:18.900
know from a typical FPGA flow. And of course[br]we write a lot of simulations to
0:33:18.900,0:33:24.205
understand whether we are meeting our[br]functional goals or whether our circuit
0:33:24.205,0:33:30.665
behaves as expected. And then we[br]select some parts of the
0:33:30.665,0:33:36.318
circuits which we want to protect from[br]radiation effects. So, for example, we can
0:33:36.318,0:33:42.290
decide to use triplication or some other[br]methods. So these days we typically use
0:33:42.290,0:33:46.645
triplication as the most straightforward[br]and very effective method. So you can ask
0:33:46.645,0:33:50.750
yourself how do we triplicate the logic?[br]So the simplest could be: Just copy
0:33:50.750,0:33:55.099
and paste the code three times, add some[br]postfixes like A, B and C and you are
0:33:55.099,0:34:01.653
done. But of course this solution has some[br]drawbacks. So it is time consuming and it
0:34:01.653,0:34:05.964
is very error prone. So maybe you have[br]noticed that I had a typo there. So of
0:34:05.964,0:34:10.220
course we don't want to do that. So we[br]developed our own tool, which we called
0:34:10.220,0:34:16.924
TMRG, which automates the process of[br]triplication and eliminates the two main
0:34:16.924,0:34:22.494
drawbacks, which I just described. So[br]after we have our code triplicated, and of
0:34:22.494,0:34:27.075
course not before rerunning all the[br]simulations to make sure that everything
0:34:27.075,0:34:34.230
went as expected, we then proceed to the[br]synthesis process in which we convert our
0:34:34.230,0:34:41.091
high level hardware description languages[br]to gate level netlists, in which all the functions
0:34:41.091,0:34:46.189
are mapped to gates, which were introduced[br]by Stefan, so both combinatorial and
0:34:46.189,0:34:53.631
sequential. And here we also have to be[br]careful because modern CAD tools have a
0:34:53.631,0:34:59.020
tendency, of course, to optimise the logic[br]as much as possible. And our logic in most
0:34:59.020,0:35:03.810
of the cases is really redundant. So it can[br]very easily be removed. So we
0:35:03.810,0:35:08.632
really have to make sure that it is not[br]removed. That's why our tool also provides
0:35:08.632,0:35:13.633
some constraints for the synthesizer to[br]make sure that our design intent is
0:35:13.633,0:35:20.900
clearly understood by the tool.[br]And once we have the output netlist, we
0:35:20.900,0:35:26.980
proceed to the place and route process, where[br]this kind of netlist representation is
0:35:26.980,0:35:32.580
mapped to a layout of what will soon[br]become our digital chip, where we place all
0:35:32.580,0:35:36.624
the cells and we route connections between[br]them. And here there is
0:35:36.624,0:35:40.907
another danger which I mentioned already,[br]it's that in modern technologies the cells
0:35:40.907,0:35:45.597
are so small that they could be easily[br]affected by a single particle at the same
0:35:45.597,0:35:51.892
time. So we have to really space out[br]the big cells which are responsible for
0:35:51.892,0:35:56.982
keeping the information about the state to[br]make sure that a single particle cannot
0:35:56.982,0:36:04.980
upset, for example, the A and B copies[br]of the same register. And then in the
0:36:04.980,0:36:09.540
last step, of course, we have to verify[br]that everything we have done is
0:36:09.540,0:36:13.926
correct. And at this level, we also try to[br]introduce some single event effects in our
0:36:13.926,0:36:19.971
simulations. So we could randomly flip[br]bits in our system. We can also inject
0:36:19.971,0:36:26.094
transients. And typically we used to do[br]that on the netlist level, which works
0:36:26.094,0:36:31.424
very well. And it is very nice. But the[br]problem with this approach is that we can
0:36:31.424,0:36:37.640
perform these actions very late in the[br]design cycle, which is less than ideal.
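
The idea of randomly flipping bits in simulation and comparing against a fault-free run can be sketched as a small injection campaign. This is a plain Python toy model, not our netlist-level flow: the 8-bit accumulator standing in for the design, the function names and the parameters are all assumptions made for illustration.

```python
import random


def simulate(cycles, flip=None, protected=False):
    """Run a hypothetical 8-bit accumulator and return its output trace.

    flip is None or a tuple (cycle, copy, bit) describing one injected
    single event upset. With protected=True the state is held in three
    copies and a majority voter drives the logic.
    """
    regs = [0, 0, 0] if protected else [0]
    trace = []
    for cycle in range(cycles):
        if flip is not None and flip[0] == cycle:
            regs[flip[1] % len(regs)] ^= 1 << flip[2]  # inject the upset
        if protected:
            a, b, c = regs
            value = (a & b) | (a & c) | (b & c)  # majority vote
        else:
            value = regs[0]
        trace.append(value)
        regs = [(value + 3) & 0xFF] * len(regs)  # next state, reloaded into all copies
    return trace


def campaign(runs, cycles=20, protected=False, seed=0):
    """Inject one random bit flip per run and count runs that differ from golden."""
    rng = random.Random(seed)
    golden = simulate(cycles, protected=protected)
    failures = 0
    for _ in range(runs):
        flip = (rng.randrange(cycles), rng.randrange(3), rng.randrange(8))
        if simulate(cycles, flip, protected) != golden:
            failures += 1
    return failures


print(campaign(100, protected=False))  # 100: every flip is visible on the output
print(campaign(100, protected=True))   # 0: a single flip is always outvoted
```

A real campaign would also inject transients and multiple upsets, which this sketch leaves out.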
0:36:37.640,0:36:43.084
And also, if we find that there is a[br]problem in our simulation, a typical netlist
0:36:43.084,0:36:48.437
at this level has probably a few orders of[br]magnitude more lines than our initial RTL
0:36:48.437,0:36:52.990
code. So to trace back what is the[br]problematic line of code is not so
0:36:52.990,0:36:57.533
straightforward at this time. So you can[br]ask yourself: why not try to inject
0:36:57.533,0:37:05.458
errors in the RTL design? And the answer[br]is that it is not so
0:37:05.458,0:37:10.670
trivial to map the hardware description[br]language's high level constructs to
0:37:10.670,0:37:15.585
what will become combinatorial or[br]sequential logic. So in order to eliminate
0:37:15.585,0:37:20.980
this problem, we also developed another open[br]source tool, which allows us to...
0:37:20.980,0:37:27.860
So we decided to use the Yosys open[br]source synthesis tool from Clifford, which
0:37:27.860,0:37:31.530
was presented at the Congress several[br]years ago. So we use this tool to make a
0:37:31.530,0:37:35.680
first pass through our RTL code to[br]understand which elements will be mapped
0:37:35.680,0:37:40.678
to sequential and combinatorial. And then[br]having this information, we use
0:37:40.678,0:37:45.951
cocotb, a Python verification[br]framework, which allows us programmatic
0:37:45.951,0:37:51.838
access to these nodes and we can[br]effectively inject the errors in our
0:37:51.838,0:37:56.660
simulations. And I forgot to mention that[br]the TMRG tool is also open source. So if
0:37:56.660,0:38:03.841
you are interested in one of the tools,[br]please feel free to contact us. And of
0:38:03.841,0:38:10.505
course, after our simulation is done, then in[br]the next step we would really tape out. And
0:38:10.505,0:38:14.637
so we submit our chip to manufacturing and[br]hopefully a few months later we receive
0:38:14.637,0:38:18.105
our chip back.[br]Stefan: All right. So after patiently
0:38:18.105,0:38:23.546
waiting then for a couple of months while[br]your chip is in manufacturing and you're
0:38:23.546,0:38:28.245
spending time on preparing a test set up[br]and preparing yourself to actually test if
0:38:28.245,0:38:33.772
your chip works as you expect it to, now[br]it's probably also a good time to think
0:38:33.772,0:38:38.307
about how to actually validate or test if[br]all the measures that you've taken to
0:38:38.307,0:38:41.389
protect your circuit from radiation[br]effects actually are effective or if they
0:38:41.389,0:38:46.196
are not. And so again, we will split this[br]in two parts. So you will probably want to
0:38:46.196,0:38:50.024
start with testing for the total ionizing[br]dose effects. So for the cumulative effect
0:38:50.024,0:38:54.554
and for that, you typically use X-ray[br]radiation relatively similar to the one
0:38:54.554,0:38:59.005
used in medical treatment. So this[br]radiation is relatively low-energy,
0:38:59.005,0:39:03.344
which has the upside of not producing any[br]single event effects, but you can really
0:39:03.344,0:39:07.462
only accumulate radiation dose and focus[br]on the accumulating effects. And typically
0:39:07.462,0:39:11.600
you would use a machine that looks[br]somewhat like this, a relatively compact
0:39:11.600,0:39:16.840
thing you can have in your laboratory, and[br]you can use that to really accumulate
0:39:16.840,0:39:21.520
large amounts of radiation dose on your[br]circuit. And then you need some sort of
0:39:21.520,0:39:26.641
mechanism to verify or to quantify how[br]much your circuit slows down due to this
0:39:26.641,0:39:31.285
radiation dose. And if you do that, you[br]typically end up with a graph such as
0:39:31.285,0:39:36.567
this one, where on the x axis you have the[br]radiation dose your circuit was exposed
0:39:36.567,0:39:40.639
to. And on the y axis, you see that the[br]frequency has gone down over time and you
0:39:40.639,0:39:44.536
can use this information to say:[br]"OK, in my final application, I expect this
0:39:44.536,0:39:49.324
level of radiation dose, and I can[br]still see that my circuit will work fine
0:39:49.324,0:39:53.565
under some given environmental condition[br]or some operation condition." So this is
0:39:53.565,0:39:58.285
the test for the first class of effects.[br]And the test for the second class of
0:39:58.285,0:40:02.318
effects for the single event effect is a[br]bit more involved. So there what you would
0:40:02.318,0:40:07.157
typically start to do is go for a heavy[br]ion test campaign. So you would go to a
0:40:07.157,0:40:12.760
specialized, relatively rare facility. We[br]have a couple of those in Europe, and it would
0:40:12.760,0:40:16.532
look perhaps somewhat like this. So it's a[br]small particle accelerator somewhere.
0:40:16.532,0:40:20.794
They typically have[br]different types of heavy ions at their
0:40:20.794,0:40:26.311
disposal that they can accelerate and then[br]shoot at your chip that you can place in a
0:40:26.311,0:40:32.390
vacuum chamber and these ions can deposit[br]very well known amounts of energy in your
0:40:32.390,0:40:36.818
circuit and you can use that information[br]to characterize your circuit. The downside
0:40:36.818,0:40:41.207
is that these facilities tend to be[br]relatively expensive to access and also a
0:40:41.207,0:40:45.161
bit hard to access. So typically you need[br]to book them a lot of time in advance and
0:40:45.161,0:40:50.351
that's sometimes not very easy. But what[br]it offers you is that you can use different types
0:40:50.351,0:40:55.244
of ions with different energies. You can[br]really make a very well-defined
0:40:55.244,0:41:00.190
sensitivity curve similar to the one that[br]Szymon has described, which you can get from
0:41:00.190,0:41:04.052
simulations, and really characterize your[br]circuit for how often any single event
0:41:04.052,0:41:09.026
effects will appear in the final[br]application if there are any remaining
0:41:09.026,0:41:12.827
effects left, if you have left something[br]unprotected. The problem here is that
0:41:12.827,0:41:18.190
these particle accelerators typically just[br]bombard your circuit with like thousands
0:41:18.190,0:41:23.310
of particles per second and they hit[br]basically the whole area in a random
0:41:23.310,0:41:26.940
fashion. So you don't really have a way of[br]steering those or measuring the position
0:41:26.940,0:41:30.964
of these particles. So typically you are a[br]bit in the dark and have to really
0:41:30.964,0:41:34.884
carefully know the behavior of your[br]circuit and all the quirks it has even
0:41:34.884,0:41:39.481
without the radiation to instantly notice[br]when something has gone wrong. And
0:41:39.481,0:41:44.088
this is typically not very easy[br]and you can kind of compare it with having
0:41:44.088,0:41:47.372
some weird crash somewhere in your[br]software stack and then having to
0:41:47.372,0:41:51.800
first take a look and see what actually[br]has happened. Typically
0:41:51.800,0:41:57.058
you find something that has not been[br]properly protected and you see some weird
0:41:57.058,0:42:01.847
effect on your circuit and then you try to[br]get a better idea of where that problem
0:42:01.847,0:42:06.256
actually is located. And the answer for[br]these types of problems involving position
0:42:06.256,0:42:11.381
is, of course, always lasers. So we have[br]two types of laser experiments available
0:42:11.381,0:42:15.796
that can be used to more selectively probe[br]your circuit for these problems. The first
0:42:15.796,0:42:19.691
one being the single photon absorption[br]laser. And this is relatively
0:42:19.691,0:42:24.709
simple in terms of setup. You just use a[br]single laser beam that shoots straight up
0:42:24.709,0:42:29.884
at your circuit from the back. And while[br]it does that, it deposits energy all along
0:42:29.884,0:42:34.180
the silicon and also in the diffusions of[br]your transistors and is therefore also
0:42:34.180,0:42:38.388
able to inject energy there, potentially[br]upsetting a bit of memory or exposing
0:42:38.388,0:42:43.053
whatever other single event effects you[br]have. And of course, you can steer this
0:42:43.053,0:42:46.880
beam across the surface of your chip or[br]whatever circuit you are testing and then
0:42:46.880,0:42:51.330
find the sensitive location. The problem[br]here is that the amount of energy that is
0:42:51.330,0:42:55.238
deposited is really large due to the fact[br]that it has to go through the whole
0:42:55.238,0:42:59.053
silicon until it reaches the transistor.[br]And therefore it's mostly used to find
0:42:59.053,0:43:02.582
these destructive effects that really[br]break something in your circuit. The more
0:43:02.582,0:43:07.972
clever and somehow beautiful experiment is[br]the two photon absorption laser experiment
0:43:07.972,0:43:12.624
in which you use two laser beams of a[br]different wavelength. And these actually
0:43:12.624,0:43:18.366
do not have enough energy to cause any[br]effect in your silicon if only one of the
0:43:18.366,0:43:22.174
laser beams is present, but only in the[br]small location where the two beams
0:43:22.174,0:43:26.874
intersect, the energy is actually large[br]enough to produce the effect. And this
0:43:26.874,0:43:30.664
allows you to very selectively and only on[br]a very small volume induce charge and
0:43:30.664,0:43:37.818
cause an effect in your circuit. And when[br]you do that now, you can systematically
0:43:37.818,0:43:41.964
scan both the X and Y directions across[br]your chip and also the Z direction and can
0:43:41.964,0:43:46.366
really measure the volume of sensitive[br]area. And this is what you would typically
0:43:46.366,0:43:50.804
get out of such an experiment. So in black and[br]white in the back, you'll see an infrared
0:43:50.804,0:43:54.621
image of your chip where you can really[br]make out the individual, say, structural
0:43:54.621,0:43:59.406
components. And then overlaid in blue, you[br]can basically highlight all the sensitive
0:43:59.406,0:44:03.897
points that made you measure something you[br]didn't expect, some weird bit flip in a
0:44:03.897,0:44:08.338
register or something. And you can really[br]then go to your layout software and find
0:44:08.338,0:44:13.644
what is the register or the gate in[br]your netlist that is responsible for
0:44:13.644,0:44:17.465
this. And then it's more like operating a[br]debugger in a software environment,
0:44:17.465,0:44:22.889
tracing back from there what the line of[br]code responsible for this bug is. And
0:44:22.889,0:44:31.260
to close out, it is always best to learn[br]from mistakes. And we offer our mistakes
0:44:31.260,0:44:35.901
as a guideline in case you ever feel[br]the need yourself to design radiation
0:44:35.901,0:44:40.695
tolerant circuits. So we want to present[br]two or three small issues we had in
0:44:40.695,0:44:45.300
circuits where we were convinced it should[br]have been working fine. So the first one,
0:44:45.300,0:44:50.018
which you will probably recognize, is this[br]full triple modular redundancy scheme that
0:44:50.018,0:44:55.279
Szymon has presented. So we made sure to[br]triplicate everything and we were relatively
0:44:55.279,0:44:59.102
sure that everything should be fine. The[br]only modification we did is that to all
0:44:59.102,0:45:03.506
those registers in our design, we've added[br]a reset, because we wanted to initialize
0:45:03.506,0:45:07.710
the system to some known state when we[br]started up, which is a very obvious thing
0:45:07.710,0:45:12.327
to do. Every CPU has a reset. But of[br]course, what we didn't think about here
0:45:12.327,0:45:16.577
was that at some point there's a buffer[br]driving this reset line somewhere. And if
0:45:16.577,0:45:20.355
there's only a single buffer. What happens[br]if this buffer experiences a small
0:45:20.355,0:45:24.501
transient event? Of course, the obvious[br]thing that happened is that as soon as
0:45:24.501,0:45:28.247
that happened, all the registers were[br]upset at the same time and were basically
0:45:28.247,0:45:32.205
cleared and all our fancy protection was[br]invalidated. So next time we decided,
0:45:32.205,0:45:37.679
let's be smarter this time. And of course,[br]we triplicate all the logic and all the
0:45:37.679,0:45:40.633
voters and all the registers. So let's[br]also triplicate the reset lines. And while
0:45:40.633,0:45:44.955
the designer of that block probably had[br]very good intentions, it turned out
0:45:44.955,0:45:49.268
that later, when we manufactured the[br]chip, it still sometimes showed a complete
0:45:49.268,0:45:54.570
reset without any good explanation for[br]that. And what was left out of the
0:45:54.570,0:45:59.981
scope of thinking here was that this reset[br]actually was connected to the system reset
0:45:59.981,0:46:05.033
of the chip that we had. And typically[br]pins on the chip are something that is
0:46:05.033,0:46:09.005
not available in huge quantities. So you[br]typically don't want to spend three pins
0:46:09.005,0:46:13.128
of your chip just for a stupid reset that[br]you don't use ninety nine percent of the
0:46:13.128,0:46:17.895
time. So what we did at some point we just[br]connected again the reset lines to a
0:46:17.895,0:46:21.972
single input buffer. That was then[br]connected to a pin of the chip. And of
0:46:21.972,0:46:25.590
course, this also represented a small[br]sensitive area in the chip. And again,
0:46:25.590,0:46:30.216
a single upset here was able to destroy[br]all three of our flip flops. All right.
0:46:30.216,0:46:35.132
And the last lesson I'm bringing is[br]one that goes back to the
0:46:35.132,0:46:38.930
implementation details that Szymon has[br]mentioned. So this time, a really simple
0:46:38.930,0:46:42.532
circuit. We were absolutely convinced it[br]must work because it was basically the
0:46:42.532,0:46:46.072
textbook example that Szymon was[br]presenting. And the code was so
0:46:46.072,0:46:49.817
small we were able to inspect everything[br]and were very much sure that nothing
0:46:49.817,0:46:54.690
should have happened. And what we saw when[br]we went for this laser testing experiment,
0:46:54.690,0:46:59.769
in simplified form, is basically that[br]only this first voter was sensitive. When this was
0:46:59.769,0:47:04.414
hit, all our registers were always[br]upset, while the other ones
0:47:04.414,0:47:09.161
never showed anything strange.[br]And it took us quite a while to actually
0:47:09.161,0:47:13.563
look at the layout later on and figure out[br]that what was in the chip was rather this.
0:47:13.563,0:47:17.250
So two of the voters were actually not[br]there. And Szymon mentioned the reason for
0:47:17.250,0:47:21.208
that. So synthesis tools these days are[br]really clever at identifying redundant
0:47:21.208,0:47:26.102
logic and because we forgot to tell it to[br]not optimize these redundant pieces of
0:47:26.102,0:47:30.248
logic, which the voters really are, it[br]just merged them into one. And that
0:47:30.248,0:47:34.393
explains why we only saw this one voter[br]being the sensitive one. And of course, if
0:47:34.393,0:47:38.255
you have a transient event there, then you[br]suddenly upset all your registers, and that
0:47:38.255,0:47:41.871
without even knowing it and while being[br]sure, having looked at every single line
0:47:41.871,0:47:45.652
of Verilog code, that[br]everything should have been fine. But that
0:47:45.652,0:47:51.805
seems to be how this business goes. So we[br]hope you
0:47:51.805,0:47:56.648
were able to get some insight into what[br]we do to make sure the experiments at the
0:47:56.648,0:48:01.966
LHC work fine, and what you can do to[br]make sure the satellite you are working on
0:48:01.966,0:48:06.393
might be working OK even before launching[br]it into space. If you're interested in
0:48:06.393,0:48:10.715
some more information on this topic, feel[br]free to pass by at the assembly I
0:48:10.715,0:48:15.014
mentioned at the beginning or just meet us[br]after the talk and otherwise thank you
0:48:15.014,0:48:22.286
very much.[br]Applause
0:48:22.286,0:48:27.041
Herald: Thank you very much indeed.[br]There's about 10 minutes left for Q and A,
0:48:27.041,0:48:31.872
so if you have any questions go to a[br]microphone. And as a cautious reminder,
0:48:31.872,0:48:38.297
questions are short sentences that[br]start with a question... well, end with a
0:48:38.297,0:48:42.548
question mark and the first question goes[br]to the Internet.
0:48:42.548,0:48:46.433
Internet: Well, hello. Um, do you also[br]incorporate radiation as a source for
0:48:46.433,0:48:50.596
randomness when that's needed?[br]Stefan: So we personally don't. So in our
0:48:50.596,0:48:56.880
designs we don't. But it is done indeed[br]for a random number generator. This is
0:48:56.880,0:49:01.081
sometimes done that they use radioactive[br]decay as a source for randomness. So this
0:49:01.081,0:49:03.989
is done, but we don't do it in our[br]experiments.
0:49:03.989,0:49:06.802
We rather want deterministic data out of[br]the things we built.
0:49:06.802,0:49:10.929
Herald: Okay. Next question goes to[br]microphone number four.
0:49:10.929,0:49:16.714
Mic 4: Do you do your triplication before[br]or after elaboration?
0:49:16.714,0:49:21.003
Szymon: So currently we do it before[br]elaboration. So we decided that our tool
0:49:21.003,0:49:25.764
works on Verilog input and it produces[br]Verilog output because it offers much more
0:49:25.764,0:49:30.496
flexibility in the way you can[br]incorporate different triplication
0:49:30.496,0:49:34.423
schemes. If you were to apply it only[br]after elaboration, then of course doing a
0:49:34.423,0:49:38.453
full triplication might be easy. But then[br]having really precise control
0:49:38.453,0:49:43.438
over the types of triplication on different[br]levels is much more difficult.
0:49:43.438,0:49:47.296
Herald: Next question from microphone[br]number two.
0:49:47.296,0:49:50.840
Mic 2: Is it possible to use DC-DC[br]converters or switch mode power supplies
0:49:50.840,0:49:54.630
within the radiation environment to power[br]your logic? Or do you use only linear power?
0:49:54.630,0:49:59.866
Szymon: Yes, alternatively we also have a[br]dedicated program which develops radiation
0:49:59.866,0:50:05.366
hardened DC-DC converters which operate[br]in our environments. So they are available
0:50:05.366,0:50:10.988
also for space applications, as far as I'm[br]aware. And they are hardened against total
0:50:10.988,0:50:16.027
ionizing dose as well as single event[br]upsets.
0:50:16.027,0:50:19.667
Herald: Okay next question goes to[br]microphone number one.
0:50:19.667,0:50:22.614
Mic 1: Thank you very much for the great[br]talk. I'm just wondering, would it be
0:50:22.614,0:50:27.435
possible to hook up every logic gate in[br]every voter in a way of mesh network? And
0:50:27.435,0:50:31.873
what are the pitfalls and limitations for[br]that?
0:50:31.873,0:50:36.734
Stefan: So that is not something I'm aware[br]of being done. So typically: No. I
0:50:36.734,0:50:41.473
wouldn't say that that's something we[br]would do.
0:50:41.473,0:50:43.431
Szymon: I'm not really sure if I[br]understood the question.
0:50:43.431,0:50:46.401
Stefan: So maybe you can rephrase what[br]your idea is?
0:50:46.401,0:50:52.613
Mic 1: On the last slide, there was a[br]lesson learned.
0:50:52.613,0:50:56.253
Stefan: Yes. One of those?[br]Mic 1: In here. Yeah. Would you be able to
0:50:56.253,0:51:00.309
connect everything interchangeably in a[br]mesh network?
0:51:00.309,0:51:04.030
Szymon: So what you are probably asking[br]about is whether we can build our own
0:51:04.030,0:51:08.166
FPGA, like programmable logic device.[br]Mic 1: Probably.
0:51:08.166,0:51:11.074
Szymon: Yeah. And so this we typically[br]don't do, because in our experiments, our
0:51:11.074,0:51:15.857
power budget is also very limited, so we[br]cannot really afford this level of
0:51:15.857,0:51:20.903
complexity. So of course you can make your[br]FPGA design radiation hard, but this is
0:51:20.903,0:51:24.890
not what we will typically do in our[br]experiments.
0:51:24.890,0:51:28.630
Herald: Next question goes to microphone[br]number two.
0:51:28.630,0:51:32.059
Mic 2: Hi, I would like to ask if the[br]orientation of your transistors and your
0:51:32.059,0:51:38.029
chip is part of your design. So mostly you[br]have something like a bounding box around
0:51:38.029,0:51:42.921
your design and with an attack surface in[br]different sizes. So do you use this
0:51:42.921,0:51:48.350
orientation to minimize the attack surface[br]of the radiation on chips, if you know
0:51:48.350,0:51:52.616
the source of the radiation?[br]Szymon: No. So I don't think we'd do that.
0:51:52.616,0:51:58.515
So, of course, we control our orientation[br]of transistors during the design phase.
0:51:58.515,0:52:02.651
But usually in our experiment, the[br]radiation is really perpendicular to the
0:52:02.651,0:52:07.981
chip area, which means that if you rotate[br]it by 90 degrees, you don't really gain
0:52:07.981,0:52:12.082
that much. And moreover, our chips,[br]usually they are mounted in a bigger
0:52:12.082,0:52:16.625
system where we don't control how they are[br]oriented.
0:52:16.625,0:52:24.420
Herald: Again, microphone number two.[br]Mic 2: Do you take metastability into
0:52:24.420,0:52:33.140
account when designing voters?[br]Szymon: The voter itself is combinatorial.
0:52:33.140,0:52:38.820
So ... -[br]Mic 2: Yeah, but if the state of the rest
0:52:38.820,0:52:45.300
can change at any time, then the[br]voters can have like glitches, yeah?
0:52:45.300,0:52:51.140
Szymon: Correct. So that's why - so to[br]avoid this, we don't take it into account
0:52:51.140,0:52:55.060
during the design phase. But if we use[br]that scheme which is just displayed here,
0:52:55.060,0:52:58.980
we avoid this problem altogether, right?[br]Because even if you have metastability in
0:52:58.980,0:53:05.300
one of the blocks like A, B or C, then it[br]will be fixed in the next clock cycle.
0:53:05.300,0:53:09.940
Because usually our systems operate on[br]clocks with low frequencies, hundreds of
0:53:09.940,0:53:13.236
megahertz, which means that any[br]metastability should be resolved by the next
0:53:13.236,0:53:15.065
clock cycle.[br]Mic 2: Thank you.
0:53:15.065,0:53:19.145
Herald: Next question microphone number[br]one.
0:53:19.145,0:53:23.014
Mic 1: How do you handle the register[br]duplication that can be performed by a
0:53:23.014,0:53:27.947
synthesis and place and route? So the tools[br]will try to optimize timing sometimes by
0:53:27.947,0:53:32.375
adding registers. And these registers are[br]not triplicated.
0:53:32.375,0:53:35.784
Stefan: Yes. So what we do is that I mean,[br]in a typical, let's say, standard ASIC
0:53:35.784,0:53:40.405
design flow, this is not what happens. So[br]you have to actually instruct a tool to do
0:53:40.405,0:53:44.585
that, to do retiming and add additional[br]registers. But for what we are doing, we
0:53:44.585,0:53:48.174
have to - let's say not do this[br]optimization and instruct a tool to keep
0:53:48.174,0:53:52.823
all the registers we described in our RTL[br]code to keep them until the very end. And
0:53:52.823,0:53:56.908
we really also constrain them to always[br]keep their associated logic triplicated.
0:53:56.908,0:54:01.759
Herald: The next question is from the[br]internet.
0:54:01.759,0:54:07.887
Internet: Do you have some simple tips for[br]improving radiation tolerance?
0:54:07.887,0:54:12.020
Stefan: Simple tips? Ahhhm...[br]Szymon: Put your electronics inside a
0:54:12.020,0:54:12.820
box.[br]Stefan: Yes.
0:54:12.820,0:54:17.380
some laughter[br]There's just no
0:54:17.380,0:54:22.980
single one size fits all textbook recipe[br]for this as it really always comes down to
0:54:22.980,0:54:28.020
analyzing your environment, really getting[br]an awareness first of what rate and what
0:54:28.020,0:54:31.940
number of events you are looking at, what[br]type of particles cause them, and then
0:54:31.940,0:54:36.420
take the appropriate measures to mitigate[br]them. So there is no one size fits all
0:54:36.420,0:54:38.095
thing, I'd say.[br]Herald: Next question goes to microphone
0:54:38.095,0:54:41.620
number two.[br]Mic 2: Hi. Thanks for the talk. How much
0:54:41.620,0:54:47.611
of your software used to design is[br]actually open source? I only know a super
0:54:47.611,0:54:54.495
expensive chip design software.[br]Stefan: You're right, the core of all the
0:54:54.495,0:55:00.604
implementation tools, like the synthesis[br]and place and route stages for the ASICs
0:55:00.604,0:55:04.987
that we design, are actually commercial[br]closed source tools. And if
0:55:04.987,0:55:10.443
you're asking for the fraction, that's a[br]bit hard to answer. I cannot give a
0:55:10.443,0:55:14.518
statement about the size of the commercial[br]closed tools. But we tried to do
0:55:14.518,0:55:18.638
everything we develop, tried to make it[br]available to the widest possible audience
0:55:18.638,0:55:22.353
and therefore decided to make the[br]extensions to this design flow available
0:55:22.353,0:55:26.237
in public form. And that's why these[br]tools that we develop and share among the
0:55:26.237,0:55:30.541
community of ASIC designers in this[br]environment are open source.
0:55:30.541,0:55:35.196
Herald: Microphone number four.[br]Mic 4: Have you ever tried using steered
0:55:35.196,0:55:41.098
ion beams for more localized radiation[br]ingress testing?
0:55:41.098,0:55:44.495
Stefan: Yes, indeed! And the picture I[br]showed actually, uh, I didn't disclaim
0:55:44.495,0:55:49.311
that, but the facility you saw here is[br]actually a facility in Darmstadt in
0:55:49.311,0:55:53.366
Germany and is actually a micro beam[br]facility. So it's a facility that allows
0:55:53.366,0:55:58.400
steering a heavy ion beam really on a[br]single position with less than a
0:55:58.400,0:56:01.808
micrometer accuracy. So it provides[br]probably exactly what you were asking for.
0:56:01.808,0:56:05.854
But that's not the typical case. That is[br]really a special thing. And it's probably
0:56:05.854,0:56:09.405
also the only facility in Europe that can[br]do that.
0:56:09.405,0:56:13.316
Herald: Microphone number one.[br]Mic 1: Was a very good, very good talk. Thank
0:56:13.316,0:56:19.282
you very much. My question is, did you[br]compare what you did to what is done for
0:56:19.282,0:56:25.380
securing secret chips? You know, when you[br]have credit card chips, you can make fault
0:56:25.380,0:56:29.949
attacks into them so you can make them[br]malfunction and extract the cryptographic
0:56:29.949,0:56:33.830
key for example from the banking card.[br]There are techniques here to harden these
0:56:33.830,0:56:38.207
chips against fault attacks. So which are[br]like voluntary faults while you have like
0:56:38.207,0:56:43.121
random less faults due to like involuntary[br]attacks. You know what? Can you explain if
0:56:43.121,0:56:47.294
you compared in a way what you did to[br]this?
0:56:47.294,0:56:50.861
Stefan: Um, so no, we didn't explicitly[br]compare it, but it is right that the
0:56:50.861,0:56:54.427
techniques we present can also be used in[br]a variety of different contexts. So one
0:56:54.427,0:56:59.134
thing that's not exactly what you are[br]referring to, but relatively on a similar
0:56:59.134,0:57:03.513
scale is that currently in very small[br]technologies you get two problems with the
0:57:03.513,0:57:07.855
reliability and yield of the manufacturing[br]process itself, meaning that sometimes
0:57:07.855,0:57:11.721
just the metal interconnection between two[br]gates in your circuit might be broken
0:57:11.721,0:57:16.297
after manufacturing and then adding the[br]sort of redundancy with the same kinds of
0:57:16.297,0:57:20.576
techniques can be used to make, to[br]produce more working chips out of a
0:57:20.576,0:57:24.715
manufacturing run. So in this sort of[br]context, these sorts of techniques are
0:57:24.715,0:57:30.674
used very often these days. But, um,[br]I'm pretty sure they can be applied to
0:57:30.674,0:57:34.953
these sorts of, uh, security fault attack[br]scenarios as well.
0:57:34.953,0:57:39.703
Herald: Next question from microphone[br]number two.
0:57:39.703,0:57:44.126
Mic 2: Hi, you briefly also mentioned the[br]mitigation techniques on the cell level
0:57:44.126,0:57:52.426
and yesterday there was a very nice talk[br]from the LibreSilicon people and they
0:57:52.426,0:57:55.914
are trying to build a standard cell[br]library, uh, open source standard cell
0:57:55.914,0:58:00.015
library. So are you in contact with them[br]or maybe you could help them to improve
0:58:00.015,0:58:03.980
their design and then the radiation[br]hardness?
0:58:03.980,0:58:07.430
Stefan: No. We also saw the talk[br]yesterday, but we are not yet in
0:58:07.430,0:58:14.180
contact with them. No.[br]Herald: Does the Internet have questions?
0:58:14.180,0:58:21.380
Internet: Yes, I do. Um, two in fact.[br]First one would be would TTL or other BJT
0:58:21.380,0:58:26.740
based logic be more resistant?[br]Szymon: Uh, yeah. So depending on which
0:58:26.740,0:58:31.126
type of errors we are considering. So BJT[br]transistors, they have ...
0:58:31.126,0:58:35.917
Stefan in his part mentioned that[br]displacement damage is not a problem for
0:58:35.917,0:58:40.305
CMOS devices, but it is not the case[br]for BJT devices. So when they are exposed
0:58:40.305,0:58:47.074
to high energy hadrons or protons,[br]they degrade a lot. So that's why we don't
0:58:47.074,0:58:52.393
really use them in our environment. They[br]could be probably much more robust to
0:58:52.393,0:58:57.369
single event effects because their[br]resistance everywhere is much lower. But
0:58:57.369,0:59:01.633
they would have other problems. And also[br]another problem which is worth
0:59:01.633,0:59:06.204
mentioning is that for those devices, they[br]consume much, much, much more power, which
0:59:06.204,0:59:13.041
we cannot afford in our applications.[br]Internet: And the last one would be how do
0:59:13.041,0:59:19.396
I use the output of the full TMR setup? Is[br]it still three signals? How do I know
0:59:19.396,0:59:26.260
which one to use and to trust?[br]Stefan: Um, yes. So with this, um,
0:59:26.260,0:59:30.047
architecture, what you could either do is[br]really do the full triplication scheme
0:59:30.047,0:59:34.804
to your whole logic tree basically and[br]really triplicate everything or, and
0:59:34.804,0:59:38.903
that's going in the direction of one of[br]the lessons learned I had, at some point
0:59:38.903,0:59:43.261
of course you have an interface to your[br]chip, so you have pins left and right that
0:59:43.261,0:59:46.630
are inputs and outputs. And then you have[br]to decide either you want to spend the
0:59:46.630,0:59:51.025
effort and also have three dedicated input[br]pins for each of the signals, or you at
0:59:51.025,0:59:54.260
some point have the voter and say, okay.[br]At this point, all these signals are
0:59:54.260,0:59:58.202
combined. But I was able to reduce the[br]amount of sensitive area in my chip
0:59:58.202,1:00:03.780
significantly and can live with the very[br]small remaining sensitive area that just
1:00:03.780,1:00:07.460
the input and output pins provide.[br]Szymon: So maybe I will add one more thing
1:00:07.460,1:00:11.780
is that typically in our systems, of[br]course we triplicate our logic internally,
1:00:11.780,1:00:15.300
but when we interface with external[br]world, we can apply another protection
1:00:15.300,1:00:20.340
mechanism. So for example, for our high[br]speed serialisers, we will use different types
1:00:20.340,1:00:23.733
of encoding to add protect...,[br]to add like forward error correction
1:00:23.733,1:00:30.340
codes which would allow us to recover from these[br]types of faults in the backend later on.
1:00:30.340,1:00:36.522
Herald: Okay. If... if we keep this very,[br]very short. Last question goes to
1:00:36.522,1:00:41.401
microphone number two.[br]Mic 2: I don't know much about physics. So
1:00:41.401,1:00:47.370
just the question, how important is the[br]physical testing after the chip is
1:00:47.370,1:00:51.895
manufactured? Isn't the simulation, the[br]computer simulation enough if you just
1:00:51.895,1:00:56.332
shoot particles at it?[br]Stefan: Yes and no. So in principle, of
1:00:56.332,1:01:01.267
course, you are right that you should be[br]able to simulate all the effects we look
1:01:01.267,1:01:06.531
at. The problem is that as the designs[br]grow big and they do grow bigger as the
1:01:06.531,1:01:10.892
technologies shrink, so[br]this final netlist that you end up with
1:01:10.892,1:01:15.175
can have millions or billions of nodes and[br]it just is not feasible anymore to
1:01:15.175,1:01:19.558
simulate it exhaustively because you have[br]to have so many dimensions. You have to
1:01:19.558,1:01:25.852
change when you inject. For example, bit[br]flips or transients in your design in any
1:01:25.852,1:01:30.745
of those nodes for varying time offsets.[br]And it's just the state space the circuit
1:01:30.745,1:01:34.553
can be in is just too huge to capture in a[br]full simulation. So it's not possible
1:01:34.553,1:01:38.803
to exhaustively test it in simulation. And[br]so typically you end up with having missed
1:01:38.803,1:01:43.048
something that you discover only in the[br]physical testing afterwards, which you
1:01:43.048,1:01:47.311
always want to do before you put your, uh,[br]your chip into the final experiment or on your
1:01:47.311,1:01:50.934
satellite and then realise it's not[br]working as intended. So it has a big
1:01:50.934,1:01:55.540
importance as well.[br]Herald: Okay. Thank you. Time is up. All
1:01:55.540,1:01:58.584
right. Thank you all very much.
1:01:58.584,1:02:04.602
applause
1:02:04.602,1:02:09.599
36c3 postroll music
1:02:09.599,1:02:32.100
Subtitles created by c3subtitles.de[br]in the year 2021. Join, and help us!