Herald: Vincenzo Izzo is an entrepreneur and investor with a focus on cybersecurity. He has started up, gotten bought, and repeated this a few times, and now he is an advisor who advises people on starting up companies, getting bought, and repeating that. He is also a director at CrowdStrike and an associate at the MIT Media Lab. Just checking the time to make sure that we start on time... and we can start talking now, on the scale of cybersecurity. Please give a warm welcome to Vincenzo.

Applause

Vincenzo Izzo: So hi, everyone, thanks for being here. As Karen said, I have made a few changes to my career, but my background is originally technical, and what I wanted to do today is talk about a trend that I think we sort of take for granted; it's to some extent obvious but also underappreciated. And that is cloud scale in security. Specifically, when I say cloud scale, what I mean is the ability to process very large amounts of data as well as spawn computing power with ease, and how that has played a role in our industry in the past decade or so. But before I talk about that, I think some context is important. I joined the industry about 15 years ago, and back in the day even a place like the Congress was a much smaller place. It was to some extent cozier, and the community was tiny. The industry was fairly niche. And then something happened around 2010.
People realized that there were more and more state-sponsored attacks being carried out, from Operation Aurora against Google to the Mandiant APT1 report, which was the first public document showing how the Chinese PLA was hacking, let's call it, the Western world's infrastructure for IP theft. And that changed a lot for the industry. There have been two significant changes because of all of this attention. The first one is notoriety. We went from being, as I said, a relatively unknown industry to something that everyone talks about. If you open any kind of newspaper, there's almost always an article on cybersecurity, boardrooms talk about cybersecurity... and in a sense, again, back when I joined, cybersecurity wasn't a thing. It used to be called infosec, and now very few people know what infosec even means. So notoriety is one thing, but notoriety is not the only thing that changed. The other thing that changed is the amount of money deployed in the sector. Back in 2004, depending on the estimate you trust, the total spending on cybersecurity was between three and a half and ten billion dollars. Today it's over 120 billion dollars, so it kind of looks exponential. But the spending came with a very significant change in the type of players there are in the industry today. A lot of the traditional vendors that used to sell security software have kind of disappeared, and what you have today are largely two kinds of players.
You have the big tech vendors: companies like Google, Amazon, Apple, and so on, that have decided to take security more seriously. Some of them are trying to monetize security; others are trying to use it as a sort of slogan to sell more phones. The other group of entities are large cloud-based security vendors. And what both groups have in common is that they're using more and more cloud scale and cloud resources to try to tackle security problems. So what I want to discuss today, from a somewhat technical perspective, is how scale has made a significant impact on the way we approach problems, but also on the kind of people that we have in the industry today. What I'm going to do is give you a few examples of the change that we've gone through. And one of the important things to keep in mind is that what scale has done, at least in the past decade, is give defense a significant edge over offense. It's not necessarily here to stay, but I think it's an important trend that is somewhat overlooked. So let me start with endpoint security. Back in the 80s, a few people started to toy with this idea of IDS systems. And the idea behind an IDS system is pretty straightforward: you want to establish a baseline of benign behavior for a machine, and then if that machine starts to exhibit anomalous behavior, you would flag that as potentially malicious.
This was the first paper published on a host-based IDS system. Now, the problem with host-based IDS systems is that they never actually quite made it as a commercial product. There were largely two reasons for this. The first is that it was really hard to interpret results: really hard to figure out, "Hey, here's an anomaly, and this is why this anomaly might actually be a security incident." The second problem was that you had a lot of false positives, and it was hard to establish a benign baseline on a single machine, because you had a lot of variance in how an individual machine would behave. So what happened is that commercially we kind of got stuck with antivirus, antivirus vendors, and signatures for a very long time. Now, fast forward to 2013. As I mentioned, the APT1 report came out, and AV companies actually admitted that they weren't that useful at detecting stuff like Stuxnet or Flame. And so there was a new kid on the block, and the buzzword name for it was EDR: endpoint detection and response. But when you strip EDR of the marketing fluff, what EDR really is, is effectively a host-based intrusion detection system at scale. In other words, cloud scale has made IDS systems possible in two ways.
The first is that because you now have this sort of data lake covering a large number of machines, you have much larger datasets to train and test detections on. What that means is that it's much easier to establish the benign baseline, and it's much easier to create proper detections, so they don't catch just malware, but also malware-less attacks. The other thing is that EDR vendors, and also companies that run internal EDR systems, have to a large extent economies of scale. That means you can have a team of analysts that can create explanations, a sort of ontology, for why a given detection may actually represent a security incident. On top of that, because you have those data lakes, you are now able to mine them to figure out new attack patterns that you weren't aware of in the past. So this in itself is a pretty significant achievement, because we finally managed to move away from signatures to something that works much better and is able to detect a broader range of attacks. But the other thing that EDR systems solved, sort of as a side effect, is the data-sharing problem. If you've been around the industry for a long time, you know there have been many attempts at sharing threat data across different entities, and they all kind of failed because it was really hard to establish a protocol to share those data.
But implicitly, what EDR has done is force people to share and collect threat intelligence data, and in general data from endpoints. So now you have the vendors acting as the implicitly trusted third party that can use that data to write detections that apply to all the systems, not just an individual company or an individual machine. And the implication is that the meme that the attacker only needs to get it right once while the defender needs to get it right all the time is actually not that true anymore. In the past, if you had offensive infrastructure, whether servers or exploit chains, you could more often than not reuse it over and over again. Even if you had malware, all you had to do was slightly mutate the sample and you would pass any kind of detection. But today that is not true anymore in most cases. If you get detected on one machine, all of a sudden all of your offensive infrastructure has to be scrapped and you need to start from scratch. So this is the first example, and I think in itself it's quite significant. The second example that I want to talk about is fuzzing. And fuzzing is interesting also for another reason: it gives us a glimpse into what I think the future might look like.
As you're probably aware if you've done any appsec work in the past, fuzzing has been a staple of the appsec arsenal for a very long time. But in the past five years or so, fuzzing has gone through a kind of renaissance, in the sense that two things have improved massively. The first is that we finally found a better way to assess the fitness function that we use to guide fuzzing. A few years ago, somebody called Michal Zalewski released a fuzzer called AFL, and one of the primary intuitions behind AFL was that instead of using plain code coverage to drive the fuzzer, you could use path coverage, and that turned fuzzing into a much more effective instrument for finding bugs. But the second intuition, which I think is even more important and which changed fuzzing significantly, is that as far as fuzzing is concerned, speed is more important than smarts. What I mean by this is that AFL, as an example, is an extremely dumb fuzzer. It does stuff like byte flipping and bit flipping; it has very, very simple mutation strategies. But what AFL does very well is that it's an extremely optimized piece of C code and it scales very well. So you are in a situation where, if you have a reasonably good server to run AFL on, you can synthesize very complex file formats in very few iterations.
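The two ingredients just described, dumb cheap mutations plus a coverage-based fitness signal, can be sketched as follows. This is a simplified illustration of the AFL idea, not AFL's actual code: real AFL tracks edge transitions with hit counts in a shared-memory bitmap, while here coverage is reduced to a plain seen/unseen array.

```c
#include <stddef.h>
#include <stdint.h>

/* A "dumb" AFL-style mutation: flip one bit of a byte. No format
 * knowledge at all; the strength comes from doing this millions of
 * times across many cores. */
uint8_t flip_bit(uint8_t byte, unsigned bit)
{
    return byte ^ (uint8_t)(1u << (bit % 8));
}

/* The fitness signal: a mutated input is kept in the corpus only if its
 * execution trace touches coverage we have never seen before. */
int has_new_coverage(uint8_t seen[], const uint8_t trace[], size_t n)
{
    int new_cov = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i] && !seen[i]) {
            seen[i] = 1;   /* remember it for future runs */
            new_cov = 1;
        }
    }
    return new_cov;
}
```

Everything else in a coverage-guided fuzzer is an engineering problem: run the target as fast as possible, across as many cores as possible, and keep only the mutants that light up new coverage.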
And what I find amazing is that this intuition doesn't apply just to file formats; it applies to much more complicated state machines. The other example I want to talk about as far as fuzzing goes is ClusterFuzz. ClusterFuzz is a fuzzing harness used by the Chrome team to find bugs in Chrome, and it has been around for about six years. In those six years, ClusterFuzz found sixteen thousand bugs in Chrome alone, plus another eleven thousand bugs in a bunch of open source projects. If you compare ClusterFuzz with the second most successful fuzzer out there for JavaScript engines, jsfunfuzz, you'll find that jsfunfuzz found about six thousand bugs in the span of eight to nine years. And if you look at the code, the main difference between the two is not the mutation engine. The mutation engine is actually pretty similar; ClusterFuzz doesn't do anything particularly fancy. But what ClusterFuzz does very well is scale massively: it runs today on about twenty-five thousand cores. So with fuzzing we're now at a stage where the bug churn is so high that defense again has an advantage over offense, because it becomes much quicker to fix bugs than it is to fix exploit chains, which would have been unthinkable just a few years ago. The last example that I want to bring up is a slightly different one.
A few months ago, the TAG team at Google found in the wild a server that was used for a watering hole attack, and it was thought that the server was used against Chinese Muslim dissidents. What's interesting is that the way you would detect this kind of attack in the past was that you would have a compromised device and you would work backwards from there, trying to figure out how the device got compromised. Here, instead, the way they found the server was effectively to mine their local copy of the Internet. So again, this is another example of scale giving a significant advantage to defense over offense. In all of these examples, I think when you look deeper into them, you realise that it's not that the state of security has improved because we've necessarily gotten better at security. It has improved because we got better at handling large amounts of data, storing large amounts of data, and spawning computing power and resources quickly when needed. And if that is true, then the other thing to realise is that in many of these cases, the problem at scale looks very different from the problem at a much smaller scale, and the solution as a result is very different. So I'm going to use a silly example to try to drive the point home. Let's say that your job is to audit this function.
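The slide itself is not in the transcript, so what follows is a hypothetical reconstruction of the kind of function being described: a fixed-size buffer indexed by an attacker-controlled "pos" with no bounds check, plus a checked variant showing the one-line fix. The names (buf, set_byte) are invented for illustration.

```c
#include <stddef.h>

static char buf[16];

/* BUG: "pos" is attacker-controlled and used as an index with no bounds
 * check. A negative pos writes before buf (underflow); pos >= 16 writes
 * past it (overflow). */
void set_byte(int pos, char value)
{
    buf[pos] = value;
}

/* The fix is a bounds check: returns 0 on success, -1 on rejection. */
int set_byte_checked(int pos, char value)
{
    if (pos < 0 || pos >= (int)sizeof buf)
        return -1;
    buf[pos] = value;
    return 0;
}
```

Any of the tools mentioned next, a fuzzer, symbolic execution, static analysis, finds a flaw of this shape almost instantly; the point of the example is that those same tools stop working once the function implements a far more complex state machine.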
And so you need to find bugs in this function. In case you're not familiar with C code, the problem here is that you can overflow or underflow that buffer at your pleasure just by passing an arbitrary value for "pos". Now, if your job was to audit this function, you would have many tools you could use. You could do manual code auditing. You could use a symbolic execution engine. You could use a fuzzer. You could use static analysis. And a lot of the solutions that are optimal for this case end up being completely useless if your task now becomes auditing this other function, because the state machine that it implements is so complex that a lot of those tools don't scale to it. For a lot of the problems I've talked about, we face the same situation: the solution at scale to a problem at scale looks very different. And so one realization is that engineering skills today are actually more important than security skills in many ways. When you think back to fuzzers like ClusterFuzz or AFL, or again to EDR tools, what matters there is not really any kind of security expertise. What matters is the ability to design backend systems that scale arbitrarily well and to write code that is very performant, and none of this has much to do with traditional security skills.
The other thing you realize, when you combine these two observations, is that a lot of what we consider research is happening in a different world to some extent. About six years ago, I gave a talk at an academic conference called CCS, and my message there was basically that if academia wanted to do research that was relevant to the industry, they had to talk to the industry more. I think we have now reached the point where this is true for us as well, in the sense that if we want to keep producing significant research at places like CCC, we are in a bad spot, because a lot of the innovation that is practical in the real world is happening in very large environments that few of us have access to. I'm going to talk a bit more about this in a second. But before I do, there is a question that I think is worth digressing on a bit, and that is: have we changed significantly as an industry? Are we in a sort of new age of the industry? I think that if you were to split the industry into phases, we have left the artisanal phase, the phase where what mattered most was security knowledge. We're now in a phase where we have these large-scale expert systems that require significantly more engineering skills than they require security skills, but that still take input from security practitioners. And then there is the question: is this it?
Is this where the industry is going to stay, or is there more to come? I know better than to make predictions in security, because most of the time they tend to be wrong, but I want to draw a parallel with another field: machine learning. Rich Sutton, who is one of the godfathers of machine learning, wrote an essay called "The Bitter Lesson". In that essay he reflects on many decades of machine learning work, and what he says is that people tried for a very long time to embed knowledge in machine learning systems. The rationale was that if you could embed knowledge, you could build smarter systems. But it turns out that what actually worked were things that scale arbitrarily well with more computational power and more storage capability. And so what he realized was that what actually worked for machine learning was search and learning. When you look at something like AlphaGo today, AlphaGo works not really because it has a lot of Go knowledge. It works because it has a lot of computing power; it has the ability to train itself faster and faster. So there is a question of how much of this can potentially port over to security. Obviously, security is a bit different, it's more adversarial in nature, so it's not quite the same thing. But I think
we have only scratched the surface of what can be done toward a new level of automation, one where security knowledge will matter less and less. I want to go back to the AFL example that I brought up earlier, because one way to think about AFL is as a reinforcement-learning fuzzer. What I mean by this is on this slide: what AFL was capable of doing was to take one single JPEG file and, over roughly twelve hundred iterations of completely random, dumb mutations, arrive at another well-formed JPEG file. When you think about it, this is an amazing achievement, because there was no knowledge of the file format in AFL. So we are now more and more building systems that do not require any kind of expert knowledge as far as security is concerned. The other example that I want to talk about is the Cyber Grand Challenge. A few years ago DARPA started this competition called the Cyber Grand Challenge, and the idea behind it was to try to answer two questions: can you automatically do exploit generation, and can you automatically do patch generation? And obviously they did it in somewhat toy environments.
But if you talk today to anybody who does automatic exploit generation research, they'll tell you that we are probably five years away from being able to automatically synthesize non-trivial exploits, which is an amazing development, because if you had asked anybody five years ago, most people, myself included, would have told you that that time would not come anytime soon. The third example that I want to bring up is something called Amazon Macie, a new service released by Amazon. What it does is basically use machine learning to automatically identify PII and intellectual property in the data you store in AWS, and then try to give you a better sense of what happens to that data. In all of these cases, again, it's a scenario where there is very little security expertise needed; what matters more is engineering skills. So everything I've said so far makes a reasonably positive case for scale. But I think there is another side of scale that is worth touching on, and that is especially important for this audience to think about. The other side of scale is that scale breeds centralization. And so, to the point I was making earlier about where real-world-applicable research is happening: it happens increasingly in places like Amazon or Google, at large security vendors, or at some intelligence agencies.
And what that means is that the barriers to entry to the field are significantly higher. I said earlier that I joined the industry about 15 years ago. Back then, I was still in high school. And one of the things that was cool about the industry for me was that as long as you had a reasonably decent internet connection and a laptop, you could contribute at the top of the industry. You could see what everyone was up to. You could do research that was relevant to what the industry was working on. But today, the same 15- or 16-year-old kid in high school would have a much harder time contributing to the industry. Because scale breeds centralization, we are in a situation where we will likely raise the barrier to entry to a point where, if you want to contribute meaningfully to security, you will have to go through a very standardized path where you probably do computer science and then go work for a big tech company. And that's not necessarily a positive. So I think the same Kranzberg principle applies to scale, in a sense: it has done a lot of positive things for the sector, but it also comes with some consequences.
And if there is one takeaway from this talk that I would like you to have, it is to think about how much something pretty mundane, something we take for granted in our day to day, has changed the industry, and how much it will probably contribute to its next phase. And not just from a technical standpoint: it's not only that the solutions we use today are much different from what we used to use, but also that the kind of people who are part of the industry and the community has changed. And that's all I had. Thank you for listening.

Applause

Herald: Thank you very much. We have time for questions. So if you have any questions for Vincenzo, please line up behind the microphones that are marked with numbers and I will give you a signal when you can ask a question. We also have our wonderful signal angels who have been keeping an eye on the Internet to see if there are any questions from either Twitter, Mastodon or IRC. Are there any questions from the Internet? We'll just have to wait for microphone number nine to be turned on, and then we'll have a question from the Internet for Vincenzo. And please don't be shy. Line up behind the microphone. Ask any questions.

Signal Angel: Now it's on. But actually there are no questions from the Internet right now.

Herald: There must be people in the room that have some questions. I cannot see anybody lining up. Do you have any advice for people that want to work on security at scale?
Vincenzo: I mean, as I just said, a lot of the interesting research is happening more and more at big tech companies and similar places. And so, as much as it pains me, the advice is probably to think about whether you can find other ways to get access to large amounts of data and computational power, or maybe consider getting into one of those places.

Herald: And we now actually have questions at microphone number one.

Microphone 1: Can you hear me? Yeah. Thank you for the great talk. You're making a very strong case that information at scale has benefited security, but is there also statistical evidence for that?

Vincenzo: Well, it's a bit hard to answer that question, because a lot of the people that have an incentive to answer it are also kind of biased. But I think when you look at metrics like dwell time, in terms of how much time attackers spend on victims' machines, that has decreased significantly, statistically significantly. As for the other examples I brought up, like fuzzing and similar: as far as I'm aware, there hasn't been any sort of rigorous study showing that we have now reached the point where defense has an edge over offense. But when I talk to anybody with some offensive security knowledge, or who did work in offense, the overall feedback that I hear is that it's becoming much harder to keep bug chains alive for a long time.
And this is in large part not really because of countermeasures; it's in large part because bugs keep churning. So there isn't a lot of statistical evidence, but from what I can gather, it seems to be the case.

Herald: We have one more question from microphone number one.

Microphone 1: Thank you for the interesting talk. My question goes in the direction of the centralization that you mentioned, that the hyperscalers are converging to be the hotspots for security research. Is there any guidance you can give for us as a community on how to retain access to the field and contribute?

Vincenzo: Yes. So I think it's an interesting situation, because more and more there are open source tools that allow you to gather the data. But the problem with these data-gathering exercises is not so much how to gather the data; the problem is what to gather and how to keep it, because when you look at the cloud bill, for most players it's extraordinarily high. And unfortunately I don't have an easy solution to that. I mean, you can use pretty cheap cloud providers, but the expenditure is still an order of magnitude higher than it used to be. And, I don't know, maybe academia can step up. I'm not sure.

Herald: We have one last question from the Internet. And you can stay at the microphone if you have another question for Vincenzo.

Signal Angel: Yes, the Internet asks:
You talk a lot about fuzzing at scale; besides OSS-Fuzz, are you aware of any other large-scale fuzzing infrastructures?

Vincenzo: That are publicly available? No. But when you look, for instance, at the participants in the Cyber Grand Challenge, a lot of them were effectively using a significant amount of CPU power for fuzzing. So I'm not aware of any plug-and-play fuzzing infrastructure you can use aside from OSS-Fuzz. But as far as I'm aware, everyone out there who does fuzzing for a living now has access to significant resources and tries to scale their fuzzing infrastructure.

Herald: If we don't have any more questions, this is your last chance to run to a microphone or write a question on the Internet. Then I think we should give a big round of applause to Vincenzo.

Vincenzo: Thank you.

Applause

Subtitles created by c3subtitles.de in the year 2019. Join, and help us!