Herald: Vincenzo Izzo is an entrepreneur and investor with a focus on cybersecurity. He has started up, gotten bought, and repeated this a few times, and now he is an advisor who advises people on starting up companies, getting bought, and repeating that. He is also a director at CrowdStrike and an associate at the MIT Media Lab. Just checking the time to make sure that we start on time... and we can start talking now, on the scale of cybersecurity. Please give a warm welcome to Vincenzo.

Applause

Vincenzo Izzo: So hi, everyone, thanks for being here. As Karen said, I have made a few changes to my career, but my background is originally technical, and what I wanted to do today is talk about a trend that I think we sort of take for granted; it's to some extent obvious but also underappreciated. And that is cloud scale in security. Specifically, when I say cloud scale, what I mean is the ability to process very large amounts of data as well as spawn computing power with ease, and how that has played a role in our industry in the past decade or so. But before I talk about that, I think some context is important. I joined the industry about 15 years ago, and back in the day even a place like the Congress was a much smaller place. It was to some extent cozier, and the community was tiny. The industry was fairly niche. And then something happened around 2010.
People realized that there were more and more state-sponsored attacks being carried out, from Operation Aurora against Google to the Mandiant APT1 report, which was the first public document showing how the Chinese PLA was hacking, let's call it, the Western world's infrastructure for IP theft. And that changed a lot for the industry. There have been two significant changes because of all of this attention. The first one is notoriety. We went from being, as I said, a relatively unknown industry to something that everyone talks about. If you open any kind of newspaper, there's almost always an article on cybersecurity, boardrooms talk about cybersecurity... and in a sense, again, back when I joined, cybersecurity wasn't a thing. It used to be called infosec, and now very few people know what infosec even means. So notoriety is one thing, but notoriety is not the only thing that changed. The other thing that changed is the amount of money deployed in the sector. Back in 2004, depending on the estimate you trust, the total spending on cybersecurity was between three and a half and ten billion dollars. Today it's over 120 billion dollars, so it kind of looks exponential. But the spending came with a very significant change in the type of players there are in the industry today. A lot of the traditional vendors that used to sell security software have kind of disappeared, and what you have today are largely two kinds of players.
You have the big tech vendors: companies like Google, Amazon, Apple, and so on, that have decided to take security more seriously. Some of them are trying to monetize security; others are trying to use it as a sort of slogan to sell more phones. The other group of entities are large cloud-based security vendors. And what both groups have in common is that they're using more and more cloud scale and cloud resources to try to tackle security problems. So what I want to discuss today, from a somewhat technical perspective, is how scale has made a significant impact on the way we approach problems, but also on the kind of people that we have in the industry today. What I'm going to do is give you a few examples of the change that we've gone through. And one of the important things to keep in mind is that what scale has done, at least in the past decade, is give defense a significant edge over offense. It's not necessarily here to stay, but I think it's an important trend that is somewhat overlooked. So let me start with endpoint security. Back in the 80s, a few people started to toy with this idea of IDS systems. And the idea behind an IDS system is pretty straightforward: you want to establish a baseline of benign behavior for a machine, and then if that machine starts to exhibit anomalous behavior, you would flag that as potentially malicious.
This was the first paper published on a host-based IDS system. Now, the problem with host-based IDS systems is that they never actually quite made it as a commercial product. There were largely two reasons for this. The first is that it was really hard to interpret results: really hard to figure out, "Hey, here's an anomaly, and this is why this anomaly might actually be a security incident." The second problem was that you had a lot of false positives, and it was hard to establish a benign baseline on a single machine, because you had a lot of variance in how an individual machine would behave. So what happened is that commercially we kind of got stuck with antivirus, antivirus vendors, and signatures for a very long time. Now, fast forward to 2013. As I mentioned, the APT1 report came out, and AV companies actually admitted that they weren't that useful at detecting stuff like Stuxnet or Flame. And so there was a new kid on the block, and the buzzword name for it was EDR: endpoint detection and response. But when you strip EDR of the marketing fluff, what EDR really is, is effectively a host-based intrusion detection system at scale. In other words, cloud scale has made IDS systems possible in two ways.
The first is that because you now have this sort of data lake covering a large number of machines, you have much larger datasets to train and test detections on. What that means is that it's much easier to establish the benign baseline, and it's much easier to create proper detections, so they don't catch just malware, but also malware-less attacks. The other thing is that EDR vendors, and also companies that run internal EDR systems, have to a large extent economies of scale. That means you can have a team of analysts that can create explanations, a sort of ontology, for why a given detection may actually represent a security incident. On top of that, because you have those data lakes, you are now able to mine them to figure out new attack patterns that you weren't aware of in the past. So this in itself is a pretty significant achievement, because we finally managed to move away from signatures to something that works much better and is able to detect a broader range of attacks. But the other thing that EDR systems solved, sort of as a side effect, is the data-sharing problem. If you've been around the industry for a long time, you know there have been many attempts at sharing threat data across different entities, and they all kind of failed because it was really hard to establish a protocol to share those data.
But implicitly, what EDR has done is force people to share and collect threat intelligence data, and in general data from endpoints. So now you have the vendors acting as the implicitly trusted third party that can use that data to write detections that apply to all the systems, not just an individual company or an individual machine. And the implication is that the meme that the attacker only needs to get it right once while the defender needs to get it right all the time is actually not that true anymore. In the past, if you had offensive infrastructure, whether servers or exploit chains, you could more often than not reuse it over and over again. Even if you had malware, all you had to do was slightly mutate the sample and you would pass any kind of detection. But today that is not true anymore in most cases. If you get detected on one machine, all of a sudden all of your offensive infrastructure has to be scrapped and you need to start from scratch. So this is the first example, and I think in itself it's quite significant. The second example that I want to talk about is fuzzing. And fuzzing is interesting also for another reason: it gives us a glimpse into what I think the future might look like.
As you're probably aware if you've done any appsec work in the past, fuzzing has been a staple of the appsec arsenal for a very long time. But in the past five years or so, fuzzing has gone through a kind of renaissance, in the sense that two things have improved massively. The first is that we finally found a better way to assess the fitness function that we use to guide fuzzing. A few years ago, somebody called Michal Zalewski released a fuzzer called AFL, and one of the primary intuitions behind AFL was that instead of using plain code coverage to drive the fuzzer, you could use path coverage, and that turned fuzzing into a much more effective instrument for finding bugs. But the second intuition, which I think is even more important and which changed fuzzing significantly, is that as far as fuzzing is concerned, speed is more important than smarts. What I mean by this is that AFL, as an example, is an extremely dumb fuzzer. It does stuff like byte flipping and bit flipping; it has very, very simple mutation strategies. But what AFL does very well is that it's an extremely optimized piece of C code and it scales very well. So you are in a situation where, if you have a reasonably good server to run AFL on, you can synthesize very complex file formats in very few iterations.
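The two ingredients just described, dumb cheap mutations plus a coverage-based fitness signal, can be sketched as follows. This is a simplified illustration of the AFL idea, not AFL's actual code: real AFL tracks edge transitions with hit counts in a shared-memory bitmap, while here coverage is reduced to a plain seen/unseen array.

```c
#include <stddef.h>
#include <stdint.h>

/* A "dumb" AFL-style mutation: flip one bit of a byte. No format
 * knowledge at all; the strength comes from doing this millions of
 * times across many cores. */
uint8_t flip_bit(uint8_t byte, unsigned bit)
{
    return byte ^ (uint8_t)(1u << (bit % 8));
}

/* The fitness signal: a mutated input is kept in the corpus only if its
 * execution trace touches coverage we have never seen before. */
int has_new_coverage(uint8_t seen[], const uint8_t trace[], size_t n)
{
    int new_cov = 0;
    for (size_t i = 0; i < n; i++) {
        if (trace[i] && !seen[i]) {
            seen[i] = 1;   /* remember it for future runs */
            new_cov = 1;
        }
    }
    return new_cov;
}
```

Everything else in a coverage-guided fuzzer is an engineering problem: run the target as fast as possible, across as many cores as possible, and keep only the mutants that light up new coverage.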
And what I find amazing is that this intuition doesn't apply just to file formats; it applies to much more complicated state machines. The other example I want to talk about as far as fuzzing goes is ClusterFuzz. ClusterFuzz is a fuzzing harness used by the Chrome team to find bugs in Chrome, and it has been around for about six years. In those six years, ClusterFuzz found sixteen thousand bugs in Chrome alone, plus another eleven thousand bugs in a bunch of open source projects. If you compare ClusterFuzz with the second most successful fuzzer out there for JavaScript engines, jsfunfuzz, you'll find that jsfunfuzz found about six thousand bugs in the span of eight to nine years. And if you look at the code, the main difference between the two is not the mutation engine. The mutation engine is actually pretty similar; ClusterFuzz doesn't do anything particularly fancy. But what ClusterFuzz does very well is scale massively: it runs today on about twenty-five thousand cores. So with fuzzing we're now at a stage where the bug churn is so high that defense again has an advantage over offense, because it becomes much quicker to fix bugs than it is to fix exploit chains, which would have been unthinkable just a few years ago. The last example that I want to bring up is a slightly different one.
A few months ago, the TAG team at Google found in the wild a server that was used for a watering hole attack, and it was thought that the server was used against Chinese Muslim dissidents. What's interesting is that the way you would detect this kind of attack in the past was that you would have a compromised device and you would work backwards from there, trying to figure out how the device got compromised. Here, instead, the way they found the server was effectively to mine their local copy of the Internet. So again, this is another example of scale giving a significant advantage to defense over offense. In all of these examples, I think when you look deeper into them, you realise that it's not that the state of security has improved because we've necessarily gotten better at security. It has improved because we got better at handling large amounts of data, storing large amounts of data, and spawning computing power and resources quickly when needed. And if that is true, then the other thing to realise is that in many of these cases, the problem at scale looks very different from the problem at a much smaller scale, and the solution as a result is very different. So I'm going to use a silly example to try to drive the point home. Let's say that your job is to audit this function.
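The slide itself is not in the transcript, so what follows is a hypothetical reconstruction of the kind of function being described: a fixed-size buffer indexed by an attacker-controlled "pos" with no bounds check, plus a checked variant showing the one-line fix. The names (buf, set_byte) are invented for illustration.

```c
#include <stddef.h>

static char buf[16];

/* BUG: "pos" is attacker-controlled and used as an index with no bounds
 * check. A negative pos writes before buf (underflow); pos >= 16 writes
 * past it (overflow). */
void set_byte(int pos, char value)
{
    buf[pos] = value;
}

/* The fix is a bounds check: returns 0 on success, -1 on rejection. */
int set_byte_checked(int pos, char value)
{
    if (pos < 0 || pos >= (int)sizeof buf)
        return -1;
    buf[pos] = value;
    return 0;
}
```

Any of the tools mentioned next, a fuzzer, symbolic execution, static analysis, finds a flaw of this shape almost instantly; the point of the example is that those same tools stop working once the function implements a far more complex state machine.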
And so you need to find bugs in this function. In case you're not familiar with C code, the problem here is that you can overflow or underflow that buffer at your pleasure just by passing an arbitrary value for "pos". Now, if your job was to audit this function, you would have many tools you could use. You could do manual code auditing. You could use a symbolic execution engine. You could use a fuzzer. You could use static analysis. And a lot of the solutions that are optimal for this case end up being completely useless if your task now becomes auditing this other function, because the state machine that it implements is so complex that a lot of those tools don't scale to it. For a lot of the problems I've talked about, we face the same situation: the solution at scale to a problem at scale looks very different. And so one realization is that engineering skills today are actually more important than security skills in many ways. When you think back to fuzzers like ClusterFuzz or AFL, or again to EDR tools, what matters there is not really any kind of security expertise. What matters is the ability to design backend systems that scale arbitrarily well and to write code that is very performant, and none of this has much to do with traditional security skills.
The other thing you realize, when you combine these two observations, is that a lot of what we consider research is happening in a different world to some extent. About six years ago, I gave a talk at an academic conference called CCS, and my message there was basically that if academia wanted to do research that was relevant to the industry, they had to talk to the industry more. I think we have now reached the point where this is true for us as well, in the sense that if we want to keep producing significant research at places like CCC, we are in a bad spot, because a lot of the innovation that is practical in the real world is happening in very large environments that few of us have access to. I'm going to talk a bit more about this in a second. But before I do, there is a question that I think is worth digressing on a bit, and that is: have we changed significantly as an industry? Are we in a sort of new age of the industry? I think that if you were to split the industry into phases, we have left the artisanal phase, the phase where what mattered most was security knowledge. We're now in a phase where we have these large-scale expert systems that require significantly more engineering skills than they require security skills, but that still take input from security practitioners. And then there is the question: is this it?
Is this where the industry is going to stay, or is there more to come? I know better than to make predictions in security, because most of the time they tend to be wrong, but I want to draw a parallel with another field: machine learning. Rich Sutton, who is one of the godfathers of machine learning, wrote an essay called "The Bitter Lesson". In that essay he reflects on many decades of machine learning work, and what he says is that people tried for a very long time to embed knowledge in machine learning systems. The rationale was that if you could embed knowledge, you could build smarter systems. But it turns out that what actually worked were things that scale arbitrarily well with more computational power and more storage capability. And so what he realized was that what actually worked for machine learning was search and learning. When you look at something like AlphaGo today, AlphaGo works not really because it has a lot of Go knowledge. It works because it has a lot of computing power; it has the ability to train itself faster and faster. So there is a question of how much of this can potentially port over to security. Obviously, security is a bit different, it's more adversarial in nature, so it's not quite the same thing. But I think
we have only scratched the surface of what can be done toward a new level of automation, one where security knowledge will matter less and less. I want to go back to the AFL example that I brought up earlier, because one way to think about AFL is as a reinforcement-learning fuzzer. What I mean by this is on this slide: what AFL was capable of doing was to take one single JPEG file and, over roughly twelve hundred iterations of completely random, dumb mutations, arrive at another well-formed JPEG file. When you think about it, this is an amazing achievement, because there was no knowledge of the file format in AFL. So we are now more and more building systems that do not require any kind of expert knowledge as far as security is concerned. The other example that I want to talk about is the Cyber Grand Challenge. A few years ago DARPA started this competition called the Cyber Grand Challenge, and the idea behind it was to try to answer two questions: can you automatically do exploit generation, and can you automatically do patch generation? And obviously they did it in somewhat toy environments.
But if you talk today to anybody who does automatic exploit generation research, they'll tell you that we are probably five years away from being able to automatically synthesize non-trivial exploits, which is an amazing development, because if you had asked anybody five years ago, most people, myself included, would have told you that that time would not come anytime soon. The third example that I want to bring up is something called Amazon Macie, a new service released by Amazon. What it does is basically use machine learning to automatically identify PII and intellectual property in the data you store in AWS, and then try to give you a better sense of what happens to that data. In all of these cases, again, it's a scenario where there is very little security expertise needed; what matters more is engineering skills. So everything I've said so far makes a reasonably positive case for scale. But I think there is another side of scale that is worth touching on, and that is especially important for this audience to think about. The other side of scale is that scale breeds centralization. And so, to the point I was making earlier about where real-world-applicable research is happening: it happens increasingly in places like Amazon or Google, at large security vendors, or at some intelligence agencies.
And what that means is that the barriers to entry to the field are significantly higher. I said earlier that I joined the industry about 15 years ago. Back then, I was still in high school. And one of the things that was cool about the industry for me was that as long as you had a reasonably decent internet connection and a laptop, you could contribute at the top of the industry. You could see what everyone was up to. You could do research that was relevant to what the industry was working on. But today, the same 15- or 16-year-old kid in high school would have a much harder time contributing to the industry. Because scale breeds centralization, we are in a situation where we will likely raise the barrier to entry to a point where, if you want to contribute meaningfully to security, you will have to go through a very standardized path where you probably do computer science and then go work for a big tech company. And that's not necessarily a positive. So I think the same Kranzberg principle applies to scale, in a sense: it has done a lot of positive things for the sector, but it also comes with some consequences.
And if there is one takeaway from this talk that I would like you to have, it is to think about how much something pretty mundane, something we take for granted in our day to day, has changed the industry, and how much it will probably contribute to its next phase. And not just from a technical standpoint: it's not only that the solutions we use today are much different from what we used to use, but also that the kind of people who are part of the industry and the community has changed. And that's all I had. Thank you for listening.

Applause

Herald: Thank you very much. We have time for questions. So if you have any questions for Vincenzo, please line up behind the microphones that are marked with numbers and I will give you a signal when you can ask a question. We also have our wonderful signal angels who have been keeping an eye on the Internet to see if there are any questions from either Twitter, Mastodon or IRC. Are there any questions from the Internet? We'll just have to wait for microphone number nine to be turned on, and then we'll have a question from the Internet for Vincenzo. And please don't be shy. Line up behind the microphone. Ask any questions.

Signal Angel: Now it's on. But actually there are no questions from the Internet right now.

Herald: There must be people in the room that have some questions. I cannot see anybody lining up. Do you have any advice for people that want to work on security at scale?
Vincenzo: I mean, as I just said, a lot of the interesting research is happening more and more at big tech companies and similar places. And so, as much as it pains me, the advice is probably to think about whether you can find other ways to get access to large amounts of data and computational power, or maybe consider getting into one of those places.

Herald: And we now actually have questions at microphone number one.

Microphone 1: Can you hear me? Yeah. Thank you for the great talk. You're making a very strong case that information at scale has benefited security, but is there also statistical evidence for that?

Vincenzo: Well, it's a bit hard to answer that question, because a lot of the people that have an incentive to answer it are also kind of biased. But I think when you look at metrics like dwell time, in terms of how much time attackers spend on victims' machines, that has decreased significantly, statistically significantly. As for the other examples I brought up, like fuzzing and similar: as far as I'm aware, there hasn't been any sort of rigorous study showing that we have now reached the point where defense has an edge over offense. But when I talk to anybody with some offensive security knowledge, or who did work in offense, the overall feedback that I hear is that it's becoming much harder to keep bug chains alive for a long time.
And this is in large part not really because of countermeasures; it's in large part because bugs keep churning. So there isn't a lot of statistical evidence, but from what I can gather, it seems to be the case.

Herald: We have one more question from microphone number one.

Microphone 1: Thank you for the interesting talk. My question goes in the direction of the centralization that you mentioned, that the hyperscalers are converging to be the hotspots for security research. Is there any guidance you can give for us as a community on how to retain access to the field and contribute?

Vincenzo: Yes. So I think it's an interesting situation, because more and more there are open source tools that allow you to gather the data. But the problem with these data-gathering exercises is not so much how to gather the data; the problem is what to gather and how to keep it, because when you look at the cloud bill, for most players it's extraordinarily high. And unfortunately I don't have an easy solution to that. I mean, you can use pretty cheap cloud providers, but the expenditure is still an order of magnitude higher than it used to be. And, I don't know, maybe academia can step up. I'm not sure.

Herald: We have one last question from the Internet. And you can stay at the microphone if you have another question for Vincenzo.

Signal Angel: Yes, the Internet asks:
You talk a lot about fuzzing at scale; besides OSS-Fuzz, are you aware of any other large-scale fuzzing infrastructures?

Vincenzo: That are publicly available? No. But when you look, for instance, at the participants in the Cyber Grand Challenge, a lot of them were effectively using a significant amount of CPU power for fuzzing. So I'm not aware of any plug-and-play fuzzing infrastructure you can use aside from OSS-Fuzz. But as far as I'm aware, everyone out there who does fuzzing for a living now has access to significant resources and tries to scale their fuzzing infrastructure.

Herald: If we don't have any more questions, this is your last chance to run to a microphone or write a question on the Internet. Then I think we should give a big round of applause to Vincenzo.

Vincenzo: Thank you.

Applause

Subtitles created by c3subtitles.de in the year 2019. Join, and help us!