How physicists analyze massive data: LHC + brain + ROOT = Higgs (33c3)
-
0:00 - 0:13music
-
0:13 - 0:17Herald: Good morning and welcome back to
stage one. It's kind of going to be the -
0:17 - 0:21second talk about physics on this day
already and it's about big data and -
0:21 - 0:27science and big data became something like
Uber in science. It's everywhere, every -
0:27 - 0:33discipline has it. Axel Naumann's working
for CERN, the accelerator in Switzerland -
0:33 - 0:39and he talks about how physics and
computing bridge in this area and he works -
0:39 - 0:43a lot with ROOT, a program that helps
transform data into knowledge. A warm -
0:43 - 0:45welcome.
-
0:45 - 0:45Axel Naumann: Thank you.
-
0:45 - 0:51applause
-
0:51 - 0:58AN: Thanks a lot. So, well you know, when,
when I was discussing this abstract with -
0:58 - 1:01the science track people, they told me:
"Well, you know about three hundred people -
1:01 - 1:06might be in the audience." But well, hey,
you are huge, that's much more than three -
1:06 - 1:11hundred people. So thank you so much for
inviting me over, it's a real honor. And of -
1:11 - 1:15course, originally, when talking to 300
people who are all interested in science, I -
1:15 - 1:21thought, you know, I'd pick something fairly
narrow focus-wise, but then I learned I'm -
1:21 - 1:25going to be in Saal one and that's
different, so I decided to make the scope -
1:25 - 1:31a little bit wider and that's what I ended
up with. I'll talk a little bit about -
1:31 - 1:38CERN in society as well if you so choose,
you'll see what that means in a minute. So -
1:38 - 1:42the things I'll cover here are obviously
CERN, with a short introduction, -
1:42 - 1:46how we do physics, how we do computing,
what data means to us and I can tell you -
1:46 - 1:52it means everything, you heard about that
already, right? How we do data analysis in -
1:52 - 1:56high energy physics and just because
we've been doing it for a while and -
1:56 - 2:01because I've been doing it for more than
ten years, I'm one of the guys who's -
2:01 - 2:07providing the software to do data
analysis in high energy physics, so, you -
2:07 - 2:11know, because we know what we are doing
and we have some experience, I thought -
2:11 - 2:18maybe you might be interested in hearing
what my forecast is for data analysis in -
2:18 - 2:25general, in the future. So let's start
with CERN. And so if you wonder what CERN -
2:25 - 2:32is, you've all heard about CERN, about
the fantastic funds we love to use, then -
2:32 - 2:37you've probably also heard that we are
doing science. We were founded right after -
2:37 - 2:41the Second World War or soon after the
Second World War, basically as a way to -
2:41 - 2:47entertain those freaky scientists. You
know, that was the idea: peace Europe-wide. -
2:47 - 2:52And damn, that's working out really well,
so well that it's not just Europe -
2:52 - 2:58anymore these days. We are located near
Geneva, we are doing only fundamental -
2:58 - 3:02research, so we don't do any weapons,
no nuclear stuff, you -
3:02 - 3:10know, that kind of thing. The WWW was
invented at CERN, but that was just a, you -
3:10 - 3:15know, side effect; it happens sometimes that
we invent things. But usually we just do -
3:15 - 3:22science. So what we do is, we take money,
lots of it, and brains who like to discuss -
3:22 - 3:27and think and come up with ideas and from
that we generate knowledge. It's really -
3:27 - 3:33all about curiosity. The things we try to
answer are: What is mass? Which is a funny -
3:33 - 3:37question, right? Like we all know what mass
is but actually we don't. We know what -
3:37 - 3:42mass is in the universe. We understand
that masses attract one another: gravity. -
3:42 - 3:49Which is beautifully correct. And at the
small scale, for our particles, we know that -
3:49 - 3:53mass is energy and we can convert them.
But we don't understand how these two -
3:53 - 3:58things go together. Like there is no
bridge, they contradict one another. So we -
3:58 - 4:05are trying to understand what that bridge
might be. Part of that mass thing is of -
4:05 - 4:09course also what's out there in the
universe? That's a big question. We only -
4:09 - 4:14understand a few percent of that. 90 and
some percent are completely unknown to -
4:14 - 4:20us, and that's scary right? I mean we know
gravity really well, we can deal with -
4:20 - 4:28freaky things like black holes and yet we
don't understand what's out there. Now to -
4:28 - 4:32do all these things we are probing nature
at the smallest scale as we call it, so -
4:32 - 4:36that's particles, we are dealing with
things like the Higgs particle and -
4:36 - 4:44supersymmetry. Here's a little bit of a
fact sheet. We have about 12,000 -
4:44 - 4:48physicists who are working with CERN. We
are basically the workbench that you saw -
4:48 - 4:55in Andre's talk before. We are the table
that physicists use, okay? And, so they -
4:55 - 4:59come to CERN once in a while, about
10,000 physicists a year, or they work -
4:59 - 5:03remotely most of the time, from about 120
nations. So you're seeing it's not -
5:03 - 5:11European anymore, this is a global thing.
CERN in itself has about 2,500 employees, -
5:11 - 5:15you know those scrubbing the table,
setting things up and so on. And our -
5:15 - 5:21table is right here. In the far end we
have the Alps, it's in Switzerland -
5:21 - 5:26as I said, so the Alps are
always close, with Mont Blanc; we have -
5:26 - 5:32Lake Geneva, we have the Jura, the French
mountains, on the lower end here. It's just -
5:32 - 5:37beautiful. It's really nice, but we
needed to stick a 30-kilometer ring in -
5:37 - 5:44there somewhere and people would have
hated us had we put it like this. But -
5:44 - 5:50luckily people were smart back then in the
70s and built a tunnel, much better. So -
5:50 - 5:55now we have this huge tunnel, and we send
particles through in both directions near -
5:55 - 6:00the speed of light and the tunnel is
filled with magnets simply because if you -
6:00 - 6:08don't use a magnet, the particles will fly
straight, but we need them to go around. -
6:08 - 6:14Here you see what it's looking like, you
also see these big halls there that have -
6:14 - 6:22access shafts from the top and that's
where the experiments are. That's sort of -
6:22 - 6:29a sketch of one of the experiments. So
the LHC is one of the, no, is the biggest -
6:29 - 6:36particle accelerator at the moment, it's a
ring with a circumference of 27 kilometers, 100 -
6:36 - 6:40meters below Switzerland and France, it
has four big experiments and several -
6:40 - 6:45small ones and we are expected to run
until 2030. So you see that all of that -
6:45 - 6:50is large-scale simply because we're trying
to make good use of the money we have. -
6:50 - 6:56Here, you see one of these caverns that
are used by the experiments while it was -
6:56 - 7:01empty. The experiment was then lowered
through this hole in the roof, piece by -
7:01 - 7:07piece, and these things are humongous. To
give you an impression of how big it is, I -
7:07 - 7:13put Waldo in there, so your job for the
next three slides is to find Waldo. You -
7:13 - 7:16know, that gives you the scale. He's
friendlily waving at you, so it should be -
7:16 - 7:22easy to find him. So then we put a
detector in there. Here it's pulled apart -
7:22 - 7:26a little bit, so it looks nicer, you can
actually see something. You can for -
7:26 - 7:31example see the beam pipe, so that's where
the particles are flying through, and then -
7:31 - 7:35they're coming from both directions and
colliding in the center of the detector -
7:35 - 7:38and then things happen, and we try to
understand what -
7:38 - 7:45is happening. That's yet another view,
frontal view on one of the detectors and -
7:45 - 7:51now you have to imagine that, you know,
you can't just open up Amazon and order an -
7:51 - 7:56LHC experiment, right, that's not how it
works. We do this stuff ourselves, like -
7:56 - 8:03PhD students, postdocs, engineers. You
know, that's all done by hand, just like -
8:03 - 8:07the microscope you saw before. Of course
you order the parts, but you know the -
8:07 - 8:11design, the whole conception and actually
screwing these things together, making -
8:11 - 8:17sure that all fits, is all done by hand.
And I find that just beautiful, I mean -
8:17 - 8:22that's close to a miracle, right? That
nations, like people no matter what -
8:22 - 8:27nation, people across the globe work
together to build such a huge thing and -
8:27 - 8:39then you turn it on and it works. More or
less, but you get it to work. That's not -
8:39 - 8:44my applause, that's your applause, because
you make this possible. Really, but it's, -
8:44 - 8:50it's huge. This is for me one of the things
I love most about CERN: That is this -
8:50 - 8:55international thing that just works
smoothly. Now the detectors are like a -
8:55 - 9:01massive camera. We have lots of pixels and
we take many, many pictures a second. We -
9:01 - 9:07do this to identify particles and then
sort of estimate what has happened during -
9:07 - 9:15the collision. Now, life at CERN is of
course an important ingredient for -
9:15 - 9:20scientists as well, and if you live at
CERN then actually it's just work at CERN -
9:20 - 9:24and that's what it's about. But it's not
that bad, so we hang out together in our -
9:24 - 9:30control rooms, make sure that the
experiments work correctly. We also, you -
9:30 - 9:34know, study the forces.
laughter -
9:34 - 9:39We have scientific discourse, in the sun,
with a view of Mont Blanc and a good -
9:39 - 9:45coffee. We have lectures and we are
lectured and, of course, like you, we have -
9:45 - 9:55more laptops than people. And, then we do
stuff and so this presentation is going to -
9:55 - 9:59introduce you to some of the things we are
doing, and more on the computing and the -
9:59 - 10:04society side as I said. But because I have
so much to talk about, I decided that -
10:04 - 10:09you build your own talk: you tell me
what you want to hear. So let's do this, -
10:09 - 10:14you can choose between A, physics, and B,
model simulation and data. You remember -
10:14 - 10:19these books like from the old days when we
were all young? It's that kind of thing, -
10:19 - 10:24okay? You design your own talk here.
So, by applause, do you want to hear about -
10:24 - 10:28physics?
applause -
10:28 - 10:36Okay. Or the model simulation data part?
louder applause -
10:36 - 10:45Okay, there we go. So, this is what we
skip. Model simulation data it is. You're -
10:45 - 10:50a strange crowd, first time I meet people
who don't want to hear about physics... no -
10:50 - 10:51I'm kidding.
laughter -
10:51 - 10:54Audience: inaudible interjection
laughter -
10:54 - 11:00So model simulation data it is. So our
theory is actually incredibly precise. -
11:00 - 11:04It's so precise that our basic job is
really really boring, because we already -
11:04 - 11:11understand everything. Whenever there is a
collision, we know what's going to happen. -
11:11 - 11:15Except for these very rare things. So we
are trying to find these very rare things -
11:15 - 11:20out of this haystack of fairly boring
things that we really understand well. And -
11:20 - 11:26the weird things are, for example,
monopoles, supersymmetry, or black holes. -
11:26 - 11:32Now the theorists' job is to tell us what
we should be seeing in the detector, given -
11:32 - 11:42some fancy physics. Then we use simulation
to see how our detector would respond to -
11:42 - 11:53that. Now, of course the question is: We
are just counting, basically, when we do -
11:53 - 11:58experiments and the question is: How often
do we need to see something to say: "Well, -
11:58 - 12:03that's not just the ordinary. That is
something new, that's something that could -
12:03 - 12:10be explained by a weird theory." We use the
detector simulation as I said to basically -
12:10 - 12:15predict how much we expect to see things.
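The counting question raised here ("how often do we need to see something?") can be illustrated with a single-bin Poisson model. This is a hedged sketch, not the experiments' statistical machinery: it assumes one counting bin, a background expectation `b` taken as exact from simulation, no systematic uncertainties, and the conventional 5-sigma discovery threshold. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_tail(n_observed: int, background: float) -> float:
    """P(N >= n_observed) for a Poisson count with mean `background` (> 0)."""
    if n_observed <= 0:
        return 1.0
    k = n_observed
    # First term e^{-b} b^k / k!, computed in log space to avoid overflow.
    term = math.exp(-background + k * math.log(background) - math.lgamma(k + 1))
    tail = 0.0
    while True:
        tail += term
        k += 1
        term *= background / k
        # Terms shrink monotonically once k exceeds the mean; stop when negligible.
        if k > background and term <= tail * 1e-16:
            return tail


def significance(p_value: float) -> float:
    """One-sided p-value expressed as a Gaussian significance Z (in sigma)."""
    return -NormalDist().inv_cdf(p_value)


def events_needed(background: float, z_target: float = 5.0) -> int:
    """Smallest observed count whose background-only p-value reaches z_target."""
    p_target = 1.0 - NormalDist().cdf(z_target)
    n = 0
    while poisson_tail(n, background) > p_target:
        n += 1
    return n


# Hypothetical example: simulation predicts 10 background events in a bin.
b = 10.0
n = events_needed(b)
print(n, significance(poisson_tail(n, b)))
```

With a small expected background, the count needed for a 5-sigma claim ends up well above the expectation itself, which is why rare signals need so many collisions.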
We use reconstruction software which -
12:15 - 12:21tells us what has happened, or might have
happened in the detector to count how -
12:21 - 12:25often we saw something. And then we use
statistics to compare these two and to say -
12:25 - 12:32whether something is expected or not. Now,
that's fairly abstract but it's fairly -
12:32 - 12:37common, a fairly common approach. For
example, if you look at climate versus -
12:37 - 12:40weather, right, I mean we always have
temperature fluctuations because of -
12:40 - 12:46weather, and the question is: Is that rise
in temperature because of a weather effect -
12:46 - 12:50or because of a climate effect? Is that
large-scale or just a short-term -
12:50 - 12:56fluctuation? So there, we have a very
similar problem and here what you do is -
12:56 - 13:01you measure temperatures, and you want to
detect abnormal variations, and you can -
13:01 - 13:06improve that by measuring longer, like,
for 300 years instead of 20 years. That -
13:06 - 13:12gives you a better prediction of what you
would expect in the future. Also, larger -
13:12 - 13:14deviations help, right? If you look for
something that -
13:14 - 13:20is just 0.1 degree, then you might not be
able to find it. If there is a deviation -
13:20 - 13:25of 5 degrees, you will definitely find it.
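The two levers in the climate analogy, measuring longer and looking for a larger deviation, both enter through the standard error of a mean. A minimal sketch, assuming independent yearly measurements with a fixed spread; the 1-degree spread is an invented illustration value, not a climate statistic:

```python
import math


def deviation_significance(deviation: float, sigma: float, n_measurements: int) -> float:
    """How many standard errors a shift in the mean corresponds to.

    Assumes independent measurements with spread sigma; the standard error
    of their mean shrinks like sigma / sqrt(n).
    """
    standard_error = sigma / math.sqrt(n_measurements)
    return deviation / standard_error


SIGMA = 1.0  # hypothetical year-to-year spread in degrees
for years in (20, 300):
    for shift in (0.1, 5.0):
        z = deviation_significance(shift, SIGMA, years)
        print(f"{years:3d} years, shift {shift:3.1f} deg -> {z:6.1f} sigma")
```

With these assumed numbers, a 0.1-degree shift stays below 2 sigma even after 300 years (about 1.7), while a 5-degree shift exceeds 20 sigma after only 20 years.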
And for us it's very similar. So here we -
13:25 - 13:32have a plot, one of the first Higgs
discovery plots, and you can see that we -
13:32 - 13:39have many ingredients there. So, the black
dots are what we measure and they have -
13:39 - 13:44a certain uncertainty, because when we
measure, we count and we might have, you -
13:44 - 13:49know, not seen something, or we might have
seen more than we should have seen, so -
13:49 - 13:55there's always an uncertainty. And then we
also have theory, which tells us you -
13:55 - 14:00should have seen so many and so for the
red part that's something that we know -
14:00 - 14:05exists, it's nothing spectacular. It's
simply what theory is telling us what we -
14:05 - 14:11should be seeing. And you can see the data
follows the red part fairly well. But then -
14:11 - 14:16there is this other bump in our dots on
the right-hand side or in the center and -
14:16 - 14:21that does not make sense, unless you take
the Higgs into account, right, which is -
14:21 - 14:27the light blue part and so here you can
see how this interplay between different -
14:27 - 14:38sources of physics and statistics works
for us. Now just as for the climate, more -
14:38 - 14:44data helps. And there are two versions of
more data: Either by having more -
14:44 - 14:48collisions, which is why we are running
24/7, or more data by combining different -
14:48 - 14:52analyses which is what's happening here.
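Combining analyses can be illustrated, in its simplest form, with an inverse-variance weighted average: more precise measurements get more weight, and the combined uncertainty ends up smaller than any single input. This sketch assumes independent Gaussian uncertainties; the mass values are invented and are not the published Higgs measurements, and real combinations also model correlated systematic uncertainties.

```python
import math


def combine(measurements: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance weighted mean of (value, uncertainty) pairs."""
    weights = [1.0 / (sigma * sigma) for _, sigma in measurements]
    total_weight = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    return mean, 1.0 / math.sqrt(total_weight)


# Hypothetical mass measurements in GeV: (value, 1-sigma uncertainty).
channels = [(125.5, 0.6), (124.7, 0.5), (126.0, 1.2)]
mass, uncertainty = combine(channels)
print(f"combined: {mass:.2f} +- {uncertainty:.2f} GeV")
```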
So here you see all these different -
14:52 - 14:57analyses. If you combine them, of course
you get a much stronger prediction of, in -
14:57 - 15:03this case, the Higgs mass, than if you
just take any single one of them. You see -
15:03 - 15:09how similar what we are doing is to, you
know, any of the big data analyses out -
15:09 - 15:16there. Okay, so that was that part. Now
comes the obligatory part again, -
15:16 - 15:23computering. When we were designing the
LHC, not me, when people were designing the -
15:23 - 15:31LHC, they needed to project computing
power from 1990 to 2000, 2010 and so on. -
15:31 - 15:34And then they said: "Well, we need
a massive amount of computers," and for you -
15:34 - 15:38that's now "Ugh, everybody has that." We
have it as well, we have our racks of -
15:38 - 15:44computers. This is something that the big
companies usually don't show: you know, -
15:44 - 15:49there is actually a ramp where the trucks
arrive and they offload the things and -
15:49 - 15:54then someone needs to screw them together
and then it looks shiny. This is how we are -
15:54 - 16:01spending our CPU time: We have about
60,000 cores that are spinning all the -
16:01 - 16:07time for us, and they are distributed
around the world. You can see that CERN, -
16:07 - 16:15for example, is the red part there near
the bottom. Yeah, so we make good use of -
16:15 - 16:21that. We also monitor the efficiency, and
because 100 percent efficient is for -
16:21 - 16:29beginners we are actually about 700
percent efficient. Don't ask why. They -
16:29 - 16:34decided if you are multi-threading, then
we, you know, we multiply your efficiency -
16:34 - 16:40by the number of threads you have. Makes
no sense to me. We also have storage, -
16:40 - 16:45currently we use about 0.7 exabytes. We
also have 1.7 -
16:45 - 16:49exabytes available, so that's good, we make use of
the storage we have. It goes, you know, -
16:49 - 16:56tera-, peta-, exa-, so it's a lot, and here
you can see on the right-hand side, -
16:56 - 17:00for example, the tape usage on the
bottom and you see this dip that was -
17:00 - 17:04before we were starting the accelerator
again, we needed to make some space so we -
17:04 - 17:09monitor our hard disk usage all the time.
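The 700 percent figure makes sense if efficiency is computed as CPU time divided by wall-clock time; that definition is an assumption here about the accounting, not a description of the actual grid monitoring code. A multi-threaded job accumulates CPU time on every thread, so the ratio grows with the thread count, while dividing by the number of cores gives a per-core figure that stays at or below 100 percent. The job numbers are hypothetical.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU time divided by wall-clock time; exceeds 1.0 for multi-threaded jobs."""
    return cpu_seconds / wall_seconds


def per_core_efficiency(cpu_seconds: float, wall_seconds: float, cores: int) -> float:
    """The same ratio normalized to the number of cores the job occupied."""
    return cpu_efficiency(cpu_seconds, wall_seconds) / cores


# Hypothetical 8-thread job: 1 hour of wall time, 7 hours of CPU time summed over threads.
wall = 3600.0
cpu = 7 * 3600.0
print(f"raw efficiency: {cpu_efficiency(cpu, wall):.0%}")               # raw efficiency: 700%
print(f"per-core efficiency: {per_core_efficiency(cpu, wall, 8):.1%}")  # per-core efficiency: 87.5%
```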
Hey, here comes the next decision point: -
17:09 - 17:14So, do you want to hear about, 1,
distributed computing or 2, measure -
17:14 - 17:18effects of bugs. So, 1, distributed
computing -
17:18 - 17:26applause
and 2, measure the effects of bugs -
17:26 - 17:36similar amount of applause
Okay, so that's my call, and I would say -
17:36 - 17:41we do... Measure the effects of
bugs, because it's shorter. -
17:41 - 17:47laughter
So this is one of the views you can, you -
17:47 - 17:51know, electronic views you can get from a
detector and you see how we trace the -
17:51 - 17:55particles that fly through the detector.
Now, that software right, that's the -
17:55 - 18:00result of software, and, you might not
believe it, we have bugs in there, in -
18:00 - 18:01that software.
-
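The bugs described next behave like an extra random smearing on top of the detector's own resolution. Assuming the smearing sources are independent and roughly Gaussian, their widths add in quadrature, which is why fixing one bug shrinks the total uncertainty. A toy sketch with made-up resolution numbers; it is not the experiments' reconstruction or uncertainty bookkeeping.

```python
import math
import random


def quadrature_sum(*uncertainties: float) -> float:
    """Total uncertainty of independent sources: sqrt(a^2 + b^2 + ...)."""
    return math.sqrt(sum(u * u for u in uncertainties))


def measured_spread(true_value: float, resolution: float, bug_smear: float,
                    n: int = 100_000, seed: int = 1) -> float:
    """Simulate n measurements smeared by resolution and a bug; return their std dev."""
    rng = random.Random(seed)
    values = [true_value + rng.gauss(0.0, resolution) + rng.gauss(0.0, bug_smear)
              for _ in range(n)]
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


RESOLUTION = 1.0  # hypothetical detector resolution (arbitrary units)
BUG_SMEAR = 0.5   # hypothetical extra smearing from a coordinate-transformation bug

print(quadrature_sum(RESOLUTION, BUG_SMEAR))         # expected spread with the bug
print(measured_spread(50.0, RESOLUTION, BUG_SMEAR))  # simulated spread, close to the line above
print(quadrature_sum(RESOLUTION))                    # spread once the bug is fixed
```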
18:03 - 18:07And you know, these bugs are sometimes
wrong coordinate transformations, so -
18:07 - 18:13things don't go this way but that way,
it's kind of weird if you look at it, and -
18:13 - 18:17the result is that our particles don't go
through the path that they should have -
18:17 - 18:25been going, but we are attributing them a
different path. Now, the nice thing -
18:25 - 18:31is that we are doing this a million times,
right? So all of that is smeared. We are -
18:31 - 18:36not systematically doing this wrong it's
just, we are always doing it a little bit -
18:36 - 18:42wrong. And so the net result is that if we
measure our particles, we will not measure -
18:42 - 18:47the right thing but always a little bit
wobbly left wobbly right you know? Things -
18:47 - 18:54are not as precise. That's simply an
uncertainty. So for us just like counting -
18:54 - 18:59has an uncertainty and predictions have
an uncertainty, software bugs introduce -
18:59 - 19:06another source of uncertainty. And here
you can see how we are tracking -
19:06 - 19:09uncertainties for all of our
analyses. We are trying to understand the -
19:09 - 19:16different sources of uncertainty. And
again, bugs are only one of the sources -
19:16 - 19:23here, so if we find the bug then we
reduce our uncertainty and we can find new -
19:23 - 19:28physics earlier, instead of having to
wait and collect more data. So for us -
19:28 - 19:32finding bugs is really key, we really
love finding bugs because it brings -
19:32 - 19:37physics closer. I thought that was
interesting. It's kind of rare that you're -
19:37 - 19:42in an environment where you're able to
measure the effect of bugs. Okay, so now -
19:42 - 19:48we are talking, we'll be talking about
data. I told you that we are -
19:48 - 19:53trying to find particle traces in our
data and the way we do this is by using -
19:53 - 19:57reconstruction programs and there are
multiple gigabytes of binaries in shared -
19:57 - 20:02libraries and stuff. They're huge, they're
experiment specific and they are curated -
20:02 - 20:06by the experiments, open-source for some
of them, and we want them to be correct -
20:06 - 20:14and efficient. The data format we use is
not comma separated values, it's binary -
20:14 - 20:21and for some strange reason it's our own
custom binary format. The reason is that -
20:21 - 20:27it's really targeted at the kind of
data we have. We have collisions -
20:27 - 20:32that are independent, so we only need one
in memory at any time and we have nested -
20:32 - 20:39collections which makes the regular table
layout a non-starter. We actually generate -
20:39 - 20:44them from C++ objects so from classes,
class definitions, C++ class definitions -
20:44 - 20:51and we can read them back into C++ but
also into JavaScript or Scala. Databases -
20:51 - 20:57just didn't do it for us. They have the
wrong model of data access, they don't -
20:57 - 21:03scale, it's just not the kind of system
that works for us. Also using a file -
21:03 - 21:09system as a storage back-end might sound
really very traditional and boring but it -
21:09 - 21:14works amazingly well and seems to be
future proof as well, so that's just the -
21:14 - 21:20way to go for us. There are many other
structured data formats out there, many of -
21:20 - 21:26those did not exist when we started ROOT,
our own data format. But they also miss -
21:26 - 21:30many things. For example, we wanted to
make sure that we have schema evolution -
21:30 - 21:34support. We can change the class layout
and still read back all data. We don't -
21:34 - 21:39want to throw away all data just because
we're changing the class. Also we do not -
21:39 - 21:43trust people. That is a, you know, as a
computer scientist or whatever you -
21:43 - 21:47probably know what I'm talking about
right? If people have to write their own -
21:47 - 21:51streaming algorithm, there will be bugs
and we will lose data. -
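The point about not letting every physicist hand-write streaming code can be sketched as serialization derived from the class definition. This is a toy in Python and emphatically not ROOT's file format or API: the `Track` and `Event` classes, the header layout and the version table are hypothetical. It writes one collision at a time with a variable-length nested list of tracks, tags each event with a schema version, and, when reading data written before a member was added, falls back to that member's default (a very small form of schema evolution that assumes members are only ever appended).

```python
import io
import struct
from dataclasses import dataclass, field, fields


@dataclass
class Track:
    pt: float
    eta: float
    charge: int = 0  # hypothetical member added in schema version 2


@dataclass
class Event:
    number: int
    tracks: list[Track] = field(default_factory=list)  # nested, variable length


# struct codes for the member types this toy supports.
TYPE_CODES = {float: "d", int: "q"}
# Per-member codes derived from the class definition itself.
TRACK_CODES = {f.name: TYPE_CODES[f.type] for f in fields(Track)}

# Layout history: which members each schema version wrote, in order.
CURRENT_VERSION = 2
TRACK_LAYOUTS = {
    1: ("pt", "eta"),
    CURRENT_VERSION: tuple(f.name for f in fields(Track)),
}

# Per-event header: schema version, event number, number of tracks.
HEADER = struct.Struct("<Hqi")


def write_event(stream, event: Event, version: int = CURRENT_VERSION) -> None:
    """Append one collision; the member list comes from the schema, not hand-written code."""
    stream.write(HEADER.pack(version, event.number, len(event.tracks)))
    for track in event.tracks:
        for name in TRACK_LAYOUTS[version]:
            stream.write(struct.pack("<" + TRACK_CODES[name], getattr(track, name)))


def read_events(stream):
    """Yield one Event at a time, so only a single collision is held in memory."""
    while chunk := stream.read(HEADER.size):
        version, number, n_tracks = HEADER.unpack(chunk)
        tracks = []
        for _ in range(n_tracks):
            values = {}
            for name in TRACK_LAYOUTS[version]:
                code = "<" + TRACK_CODES[name]
                (values[name],) = struct.unpack(code, stream.read(struct.calcsize(code)))
            # Members missing from older layouts fall back to the class defaults.
            tracks.append(Track(**values))
        yield Event(number, tracks)


buffer = io.BytesIO()
write_event(buffer, Event(1, [Track(45.2, 0.3, -1), Track(12.8, -1.1, 1)]))
write_event(buffer, Event(2, [Track(30.0, 2.0)]), version=1)  # old layout, no charge
buffer.seek(0)
for event in read_events(buffer):
    print(event.number, [(t.pt, t.charge) for t in event.tracks])
```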
21:51 - 21:55We really don't want to do this, so we
were trying to automate this, based on the -
21:55 - 22:03class definition. So, last decision point
for the story. Do you want to hear about -
22:03 - 22:10Cling, our C++ interpreter or about Open
Data and Applied Science? Let's start with -
22:10 - 22:15option 1, the C++ interpreter
applause -
22:15 - 22:21Okay and and Open Data and Applied
Science? -
22:21 - 22:30more applause than before
Yeah. I'm heading there. You miss a fish. -
22:30 - 22:35You can look at the slides later. Okay, so
there we go. Really? No. The slide number -
22:35 - 22:41is wrong. Oh a bug! So, Open Data and
Applied Science. Okay, you really wanted -
22:41 - 22:48to know about our budget, I understand
that. So we get from you about 1 billion -
22:48 - 22:51a year, and the currency doesn't really
matter anymore at this, at this point of -
22:51 - 22:54time.
laughter -
22:54 - 23:01And that is a lot of money. And you know?
We try to do really wonderful things, I -
23:01 - 23:05mean we really enjoy our job, we love it.
It's fantastic to work in such an -
23:05 - 23:09environment. And thank you very much for
making that possible. Really, I mean it. -
23:11 - 23:17But it also means that you decided as
society to enable something like CERN. -
23:17 - 23:22Which I think really deserves my applause
and yours probably as well. I think it's a -
23:22 - 23:24great decision to do something like this.
-
23:24 - 23:30applause
-
23:31 - 23:36So we realize this, right? We realized
that we are basically, that we can do what -
23:36 - 23:40we do because of you, and we are trying to
react to that by giving back what we do. -
23:40 - 23:47Software, research results, hardware and
data. So the way we share research results -
23:47 - 23:53is through open access. We have it,
finally. It took us a long time to fight -
23:53 - 23:58with publishers and, you know, the
establishment, but now we have it. We -
23:58 - 23:59also, yes thank you.
-
23:59 - 24:03applause
-
24:03 - 24:08We also put a lot of effort in
communicating our results and what we are -
24:08 - 24:13doing. And if you're in the region, it's
definitely worth a visit. I mean the URL -
24:13 - 24:18is really easy to remember, it's
visit.cern, and you know, works. And you -
24:18 - 24:22should go there by April, actually, if you
can, because then you can ask people how to -
24:22 - 24:28get underground, because the accelerator
is off at the moment. We also do applied -
24:28 - 24:32research, for example we have this super
cool experiment where we try to study how -
24:32 - 24:40clouds form, based on cosmic rays. So
the influence of cosmic rays on cloud -
24:40 - 24:46formation. Which is a key element in the
uncertainty of climate models. We are -
24:46 - 24:50trying to, to think about, you know, how
to make energy from nuclear waste. So -
24:50 - 24:55getting rid of nuclear waste while making
energy from it. And we are trying to -
24:55 - 25:02repurpose detectors that we have and you
know develop. We have something called -
25:02 - 25:08open hardware, for example White Rabbit:
deterministic Ethernet, we have Open Data, -
25:08 - 25:13and we have the LHC@home and some other
programs, where either you can donate -
25:13 - 25:21compute power or your brain and help us
get better results. We explicitly try to -
25:21 - 25:26use open source as much as possible, and
also feed back, whenever we see issues. -
25:28 - 25:34But we also create open source. For
example, we create Geant, which is a -
25:34 - 25:38program that allows you to simulate how
particles fly through matter, used for -
25:38 - 25:45example by NASA. We have Indico,
which allows us to schedule meetings, -
25:45 - 25:49upload slides, you know, these kind of
things. Across the globe, lots of people, -
25:49 - 25:53with access protection, all these kind of
things. And it's open source. We have -
25:53 - 25:59DaviX; did I mention we love HTTP? That's
the NeXT machine of Tim Berners-Lee. And -
25:59 - 26:03that's his futile effort in trying to
prevent the cleaning personnel from -
26:03 - 26:08switching it off. They don't speak
English, they did not back then at least. -
26:09 - 26:16So we use DaviX to transfer files
over HTTP with high bandwidth. Or we -
26:16 - 26:21have CVM-FS, which allows us to distribute
our binaries across the globe, and not -
26:21 - 26:27rely on admins downloading stuff and
making sure it actually runs, and these -
26:27 - 26:32kind of things. That is a lifesaver, it's
really fantastic, it's a great tool. But -
26:32 - 26:38nobody knows it. And we have ROOT, but
that's coming up. So now, the last -
26:38 - 26:43official part of this, of this
presentation, how do we do data analysis? -
26:43 - 26:45Not like that.
laughter -
26:45 - 26:52applause
We use, we use C++ and actually physicists -
26:52 - 26:58need to write their own analysis in C++.
We have very few people who have an actual -
26:58 - 27:04education in programming, so that's sort
of a clash. As I said, we need to keep one -
27:05 - 27:08collision in memory. And for what, you
know, what matters to us is throughput. We -
27:08 - 27:13want to have, we want to analyze as many
collisions as possible per second. What we -
27:13 - 27:17can do, is specialize our data format to
match the analysis, because we don't want -
27:17 - 27:23to waste I/O cycles, if we can, you know,
if we can make use of the CPU better. ROOT -
27:23 - 27:29has allowed us to do this for twenty years.
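Tailoring the data format to the analysis often means a column-wise layout: each attribute is stored contiguously, so an analysis that needs only, say, transverse momentum reads only that column. ROOT trees can store object members in separate branches along these lines, but the snippet below is a plain-Python toy with hypothetical column names that just compares how many bytes each layout has to touch.

```python
import struct

# Hypothetical per-track attributes, each stored as an 8-byte double.
COLUMNS = ("pt", "eta", "phi", "energy", "mass")
ROW_FORMAT = f"<{len(COLUMNS)}d"
N_TRACKS = 1_000


def row_layout(tracks):
    """Row-wise: all attributes of one track are stored next to each other."""
    return b"".join(struct.pack(ROW_FORMAT, *t) for t in tracks)


def column_layout(tracks):
    """Column-wise: one contiguous block of doubles per attribute."""
    return {
        name: struct.pack(f"<{len(tracks)}d", *(t[i] for t in tracks))
        for i, name in enumerate(COLUMNS)
    }


tracks = [(float(i), 0.1, 0.2, 2.0 * i, 0.0) for i in range(N_TRACKS)]
rows = row_layout(tracks)
columns = column_layout(tracks)

# A pt-only analysis has to scan every record in the row layout...
pts_from_rows = [record[0] for record in struct.iter_unpack(ROW_FORMAT, rows)]
bytes_row = len(rows)
# ...but only touches the pt block in the column layout.
pts_from_columns = struct.unpack(f"<{N_TRACKS}d", columns["pt"])
bytes_col = len(columns["pt"])

print(f"row layout reads {bytes_row} bytes, column layout reads {bytes_col} bytes")
```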
It's really the workhorse for the analysis -
27:29 - 27:35in high energy physics. And it's also an
interface to complex software. We have -
27:35 - 27:41serialization facilities, we have the
statistical tools, that people need, and -
27:41 - 27:44we have graphics, because once you have
done your analysis you need to communicate -
27:44 - 27:48that to your peers and convince people,
and publish, and so on, so that's part of -
27:48 - 27:54the game. All of that is open source, and,
of course, all of that is not just used by -
27:54 - 28:03high energy physics. So, to conclude: We
are here, because you make it possible. -
28:03 - 28:05Thank you very much. It's fantastic to
have you. -
28:05 - 28:11applause
We want to share and we have great people -
28:11 - 28:17for science outreach, but we have nobody
for software outreach, basically. So maybe -
28:17 - 28:25it's worth a look to see what CERN is
producing software-wise. Scientific -
28:25 - 28:30computing is nothing new, it has existed for
a long time, but we had to start fairly -
28:30 - 28:35early on a large scale. So when we were
building it up, we had to take... we were -
28:35 - 28:40trying to take pieces that existed and did
not find much. So now we ended up -
28:40 - 28:45with C++ data serialization, efficient
computing for non computer scientists -
28:45 - 28:50even... In the part that I skipped and,
you know, one of the alternate tracks, you -
28:50 - 28:54would have seen that we have a Python
binding as well for the whole software -
28:54 - 29:00stack in C++. And for us, what matters
most is scale. Now we are seeing that we -
29:00 - 29:04are not the only ones. There are many more
natural sciences arriving at a similar -
29:04 - 29:09challenge of having to analyze large
amounts of data. Now I promised to you -
29:09 - 29:12that I'll be bold and I'll try to make a
few statements of what will happen with -
29:12 - 29:17data analysis, not just in science.
Because what we see is that we actually -
29:17 - 29:23educate the people who will do data
analysis, not just in science. What we see -
29:23 - 29:31is that in the past, data volume mattered
most. So more data meant more power. Now -
29:31 - 29:36that's not the complete truth anymore.
It's a lot about finding correlations. So -
29:36 - 29:41even with the amount of data not growing
anymore, because it's already humongous, -
29:41 - 29:46we try to squeeze more knowledge out of
it. And for that, I/O becomes important -
29:46 - 29:54and CPU limitations are the crucial factor.
We see that multivariate techniques are -
29:54 - 29:59still rising and they will just be part of
the toolchain of the statistical tools; -
30:00 - 30:07except for generative parts, which, I
believe, will change the way we model. -
30:10 - 30:16Now, based on what I just described, this
is not a big surprise anymore. As we need -
30:16 - 30:21throughput, we need to have a language for
the core analysis part, that is close to -
30:21 - 30:27metal, so something like C++.
On the other hand writing analyses is -
30:27 - 30:32still complex, so you need a higher-level
language and for that people could, for -
30:32 - 30:36example, use Python. So, now language
binding becomes relevant all of a sudden. -
30:36 - 30:42It's much more important in the future.
And we need to tailor I/O to the actual -
30:42 - 30:49analysis to not waste CPU cycles. So
throughput is king and, from my point of -
30:49 - 30:54view, also in the future we will see much
more effort in increasing the throughput. -
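Since throughput is the figure of merit, the number to watch is collisions processed per second. A minimal sketch of measuring it with `time.perf_counter`; `analyze` is a hypothetical stand-in for real per-collision code, and the events are synthetic lists of momenta.

```python
import time


def analyze(event: list[float]) -> float:
    """Hypothetical per-collision work: sum of track momenta above a threshold."""
    return sum(pt for pt in event if pt > 20.0)


def throughput(events, analyze_fn) -> float:
    """Collisions processed per second of wall-clock time."""
    start = time.perf_counter()
    count = 0
    for event in events:
        analyze_fn(event)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


events = [[float((i * 7 + j) % 50) for j in range(30)] for i in range(10_000)]
print(f"{throughput(events, analyze):,.0f} collisions/s")
```

Timing the whole loop rather than single calls keeps timer overhead out of the number; the same harness can compare a pure-Python kernel against one moved into compiled code.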
30:56 - 31:03Okay, so that was it. In case you want to
discuss anything with me, like "That's -
31:03 - 31:08just wrong!", that's fine. I probably
have several bugs in there. I'm still here -
31:08 - 31:13until tomorrow. I don't know where yet,
so I'll wander around and you can contact -
31:13 - 31:17me by email or Twitter. Thank you very
much for your attention. Thank you. -
31:17 - 31:21applause
-
31:21 - 31:28music
-
31:28 - 31:45subtitles created by c3subtitles.de
in the year 2017. Join, and help us!
- Description:
-
https://media.ccc.de/v/33c3-8083-how_physicists_analyze_massive_data_lhc_brain_root_higgs
Physicists are not computer scientists. But at CERN and worldwide, they need to analyze petabytes of data, efficiently. For more than 20 years now, ROOT has helped them with interactive development of analysis algorithms (in the context of the experiments' multi-gigabyte software libraries), serialization of virtually any C++ object, fast statistical and general math tools, and high-quality graphics for publications. In other words, ROOT helps physicists transform data into knowledge.
The presentation will introduce the life of data, the role of computing for physicists and how physicists analyze data with ROOT. It will sketch out how some of us foresee the development of data analysis, given that the rest of the world all of a sudden also has big data tools: where they fit, where they don't, and what's missing.
- Video Language:
- English
- Duration:
- 31:45