WEBVTT 00:00:00.000 --> 00:00:11.169 32C3 preroll music 00:00:11.169 --> 00:00:15.140 M.C.: Hey! So, can you hear me OK? Yeah. 00:00:15.140 --> 00:00:19.779 I am M.C. and I work on Transparency Toolkit along with Brennan Novak 00:00:19.779 --> 00:00:25.799 and Kevin Gallagher. Basically, what we try to do is “Watch the Watchers”. 00:00:25.799 --> 00:00:31.460 Back in May we released a database of over 27,000 people in the Intelligence 00:00:31.460 --> 00:00:37.340 Community called ICWATCH. And this is people who are talking about their work on 00:00:37.340 --> 00:00:41.780 classified programs on the public internet. So we collected it using 00:00:41.780 --> 00:00:46.310 search terms like the code words mentioned in the Snowden documents. 00:00:46.310 --> 00:00:50.710 And today we’re releasing an update to ICWATCH 00:00:50.710 --> 00:00:55.970 doubling the data in the database. 00:00:55.970 --> 00:01:00.920 applause 00:01:00.920 --> 00:01:07.309 And that’s already live, if anyone wants to look at it. 00:01:07.309 --> 00:01:12.159 For the people who aren’t familiar with this project and the sorts of things 00:01:12.159 --> 00:01:16.810 available and the research methods I’d like to go through an interesting example of 00:01:16.810 --> 00:01:20.350 research things that can be found in this database. 00:01:20.350 --> 00:01:26.449 So this is Lauren Russell, and she works at L-3, a major intelligence contractor. 00:01:26.449 --> 00:01:30.679 But she started her career as an Army interrogator in Iraq. She says that 00:01:30.679 --> 00:01:36.900 the information that she collected was used to capture dozens of people. 00:01:36.900 --> 00:01:40.190 But part of her job was also to assure safe and humane treatment of hundreds 00:01:40.190 --> 00:01:45.379 of detainees. So that’s good at least. But then, a few years after that, she went and 00:01:45.379 --> 00:01:50.389 worked for a different company called Exelis in Afghanistan. 
And this job 00:01:50.389 --> 00:01:55.580 was quite different. It involved finding people to kill. So she says as part 00:01:55.580 --> 00:01:59.840 of this work that she “utilized F3EA methodology to conduct analysis on raw and 00:01:59.840 --> 00:02:05.320 fused HUMINT, SIGINT, and COMINT helping to create 125 Targeting Support Packets 00:02:05.320 --> 00:02:09.299 then nominated to the Joint Priority Effects List (JPEL) for kinetic targeting.” 00:02:09.299 --> 00:02:14.280 So there’s a lot of not very obvious terms and gibberish there. And this is a pretty 00:02:14.280 --> 00:02:17.750 common problem when going through these résumés. So I want to break down how you 00:02:17.750 --> 00:02:22.849 would interpret that sentence. “Signals Intelligence” is what the NSA does. 00:02:22.849 --> 00:02:28.129 It’s collecting data from intercepted communications. COMINT – Communications 00:02:28.129 --> 00:02:31.449 Intelligence – is specifically Signals Intelligence from communication data. 00:02:31.449 --> 00:02:35.420 So what the NSA does when they read your email. 00:02:35.420 --> 00:02:38.580 HUMINT, Human Intelligence, is intelligence from human sources. 00:02:38.580 --> 00:02:45.650 So things like data gained from informers or from torture. 00:02:45.650 --> 00:02:50.210 The “Joint Priority Effects List” is a list of people the US military and its allies are 00:02:50.210 --> 00:02:54.720 trying to kill and capture in Afghanistan. 00:02:54.720 --> 00:02:58.740 F3EA stands for “Find, Fix, Finish, Exploit and Analyze”. It’s a rapid 00:02:58.740 --> 00:03:02.990 intelligence collection and analysis methodology used for targeting. And 00:03:02.990 --> 00:03:06.670 we recently found out in the Drone Papers that this is often used for 00:03:06.670 --> 00:03:12.869 drone targeting. And “Kinetic Targeting” simply means attacking a moving target. 
00:03:12.869 --> 00:03:16.800 So looking at her profile again: she says that she “utilized F3EA methodology 00:03:16.800 --> 00:03:20.819 to conduct analysis on raw and fused HUMINT, SIGINT and COMINT helping to 00:03:20.819 --> 00:03:24.899 create 125 Targeting Support Packets then nominated to the Joint Priority 00:03:24.899 --> 00:03:28.670 Effects List (JPEL) for kinetic targeting.” Basically what she means is that based on 00:03:28.670 --> 00:03:32.759 intercepted communications and information from human sources, possibly gained under 00:03:32.759 --> 00:03:38.560 duress or from torture, she is deciding who should be killed and captured. 00:03:42.755 --> 00:03:48.659 The Intelligence Community has long had an attitude of “Collect It All”. 00:03:48.659 --> 00:03:52.670 And General [Keith B.] Alexander started trying to collect all the data 00:03:52.670 --> 00:03:58.400 that they could from every source. One of the first projects to this end 00:03:58.400 --> 00:04:02.700 was something called Real Time Regional Gateway (RT-RG). It’s a master project to 00:04:02.700 --> 00:04:07.949 store, combine, search and analyze data from many different sources at once. 00:04:07.949 --> 00:04:11.530 Everything from intercepted communications to data from drones to data from 00:04:11.530 --> 00:04:17.930 interrogations to even mundane things like traffic patterns and the price of potatoes. 00:04:17.930 --> 00:04:22.970 They started this program in 2005. The initial version was built by SAIC 00:04:22.970 --> 00:04:27.270 for use in Iraq. And these days it’s mostly used in Afghanistan. 00:04:27.270 --> 00:04:31.520 It searches the US soil because according to documents published in “Der SPIEGEL” 00:04:31.520 --> 00:04:38.479 last year Germany is the 3rd largest contributor to RT-RG. 
These sorts 00:04:38.479 --> 00:04:41.400 of collection and analysis tools are used for some programs that you might have 00:04:41.400 --> 00:04:47.130 heard of too, like CoTraveller – the program the NSA has to figure out who is 00:04:47.130 --> 00:04:52.380 going places with who else. And there is a specific analytic tool that’s part of 00:04:52.380 --> 00:04:57.579 RT-RG called SIDEKICK that uses relative velocities to calculate this from many 00:04:57.579 --> 00:05:01.590 different data sources, so that they can calculate that for people across networks. 00:05:01.590 --> 00:05:04.030 Unfortunately, this is really computationally intensive because they 00:05:04.030 --> 00:05:09.459 need to pre-compute all of the travel behaviour for all the pairs of selectors. 00:05:09.459 --> 00:05:12.500 But it’s feasible for them to do computationally intensive things like 00:05:12.500 --> 00:05:18.199 this because it’s built on Hadoop and Accumulo for distributed data 00:05:18.199 --> 00:05:27.380 processing and storage. So they’re quite serious about this. The goals for RT-RG 00:05:27.380 --> 00:05:33.150 are quite lofty. One of the creators, in an interview with “Defense News”, described 00:05:33.150 --> 00:05:37.240 their aim as being able to use intercepted communications and integrate it with 00:05:37.240 --> 00:05:42.000 signals with geolocation. So that they can instantly find people and target them. 00:05:42.000 --> 00:05:47.200 Another counter-terrorism official told the Wall Street Journal that RT-RG 00:05:47.200 --> 00:05:53.079 literally allows them to predict the future. Decorrelation means it’s the 00:05:53.079 --> 00:05:56.890 strongest correlation tool ever. So their goals for this seem to be two-fold: First 00:05:56.890 --> 00:06:02.990 of all, to be able to kill or smite any potential enemies. And second, to be 00:06:02.990 --> 00:06:07.970 omniscient. To know everything that’s happening at once. 
And to correlate it and 00:06:07.970 --> 00:06:13.300 use that to predict what will happen in the future. And these goals sound a little bit beyond 00:06:13.300 --> 00:06:18.560 what you would expect from someone who is trying to simply protect people or 00:06:18.560 --> 00:06:21.569 stop terrorism. It sounds more like they’re trying to become some sort 00:06:21.569 --> 00:06:26.539 of God. Who by collecting and analyzing everything knows everything that’s 00:06:26.539 --> 00:06:32.280 happening everywhere and can just smite any enemies from above. Instantly. 00:06:32.280 --> 00:06:37.330 But the thing is they aren’t a God. There are people working on these and they’re 00:06:37.330 --> 00:06:40.289 normal people. And they have crazy resources and they intercept 00:06:40.289 --> 00:06:44.460 a lot of data. But they also use data that’s freely available to anyone for 00:06:44.460 --> 00:06:49.860 a lot of their work. Open Source Intelligence. This is a pamphlet from 00:06:49.860 --> 00:06:55.270 a startup called ZeroFox that uses data from Social Media to track ISIS. 00:06:55.270 --> 00:07:00.019 And tools like this are quite common. There’s another tool called “LM Wisdom” 00:07:00.019 --> 00:07:03.620 that’s made by Lockheed Martin. And they have a wonderful promotional video 00:07:03.620 --> 00:07:08.699 on their website explaining exactly how it works – that I’d like to play. 00:07:08.699 --> 00:07:11.960 with lowered voice: Hopefully this’ll work… 00:07:11.960 --> 00:07:15.819 audio/video starts Female Narrator: Social Media content has the power 00:07:15.819 --> 00:07:19.300 to incite organized movements and sway political outcomes. 
00:07:19.300 --> 00:07:22.879 Person in Video: “It’s an opposition terrorist organization in Iran.” 00:07:22.879 --> 00:07:26.259 Female Narrator: Monitoring and analyzing the massive and rapidly changing 00:07:26.259 --> 00:07:31.210 open source intelligence data, or OSINT, and turning it into actionable intelligence 00:07:31.210 --> 00:07:37.180 for decision-makers is an imperative. Lockheed Martin’s Wisdom software suite 00:07:37.180 --> 00:07:42.199 offers an advanced capability to collect, manage and analyze vast amounts 00:07:42.199 --> 00:07:47.620 of open source data. Enabling analysts to understand, measure and anticipate 00:07:47.620 --> 00:07:52.039 real-world events through Social Media. Person in Video: “Think of Wisdom as your 00:07:52.039 --> 00:07:58.520 eyes and ears on the web. Wisdom is that tool that would allow it to do this 00:07:58.520 --> 00:08:00.400 at scale!” Female Narrator: Wisdom’s advanced 00:08:00.400 --> 00:08:05.319 Big Data collection capability and data store automatically identify and harvest 00:08:05.319 --> 00:08:09.479 online Social Networking data of operational interest. As well as 00:08:09.479 --> 00:08:14.810 socio-cultural data from standard online open sources like newspaper feeds and 00:08:14.810 --> 00:08:20.110 structured databases. Wisdom’s high-performance analytic algorithms analyze 00:08:20.110 --> 00:08:25.510 the content in near realtime distinguishing noise from high-value information. 00:08:25.510 --> 00:08:30.980 Capturing trends, sentiment and influence; turning open source data into predictive, 00:08:30.980 --> 00:08:36.030 actionable intelligence. audio/video stops 00:08:36.030 --> 00:08:37.210 M.C.: Yeah, so… applause 00:08:37.210 --> 00:08:41.259 …that’s what they’re doing. And they’re not just using this to target terrorists. 
00:08:41.259 --> 00:08:46.450 It was recently revealed that they are helping Walmart use this to find employees 00:08:46.450 --> 00:08:50.230 that are organizing for better working conditions and find the main organizers 00:08:50.230 --> 00:08:53.820 and fire them. Using data from Social Media. 00:08:53.820 --> 00:08:59.320 So it’s used for Corporate purposes as well. And LM Wisdom wasn’t even made 00:08:59.320 --> 00:09:02.620 for surveillance in the first place. I tracked down one of the people 00:09:02.620 --> 00:09:09.020 who created it. And at that time he worked for General Electric and was hoping to 00:09:09.020 --> 00:09:14.320 make a… to help NBC make tools so that they can figure out which sites 00:09:14.320 --> 00:09:19.740 to partner with to make their videos go viral. So it’s not just governments that 00:09:19.740 --> 00:09:22.959 are using Open Source Intelligence because there’s no barriers to access it and 00:09:22.959 --> 00:09:27.510 there’s many applications. There’s even many people search databases that 00:09:27.510 --> 00:09:31.120 have information like people’s address, and phone number, and relatives, 00:09:31.120 --> 00:09:35.320 and how old they are. And these include many, many people. Probably everyone 00:09:35.320 --> 00:09:39.230 in the US. And they’re used by many people for all sorts of purposes from private 00:09:39.230 --> 00:09:47.839 detectives to people that are selling advertisements. If this data is available 00:09:47.839 --> 00:09:53.459 already and it’s used for everything from figuring out who to kill to stopping unions 00:09:53.459 --> 00:09:57.440 from organizing to trying to sell things to people – why can’t we use it to 00:09:57.440 --> 00:10:00.529 understand surveillance programs, too? Why can’t we use it to understand human 00:10:00.529 --> 00:10:05.170 rights abuses. Why not use it for accountability? 
So we started to build 00:10:05.170 --> 00:10:09.940 tools to do this and in the near future we’d like to make it possible for anyone 00:10:09.940 --> 00:10:14.400 to make something like ICWATCH or other databases in less than a day and without 00:10:14.400 --> 00:10:19.560 programming. Long-term goal is to build software similar to what the Intelligence 00:10:19.560 --> 00:10:24.310 Community has. Things similar to LM-Wisdom, things similar to Real Time Regional Gateway. 00:10:24.310 --> 00:10:29.779 So that people can collect all this information in one place and analyze it. 00:10:29.779 --> 00:10:33.389 I’d like to show a demo of some of the tools that we’ve been working on. It’s 00:10:33.389 --> 00:10:41.110 possible to just – this won’t work at all but we’ll see. So this is Harvester. It’s 00:10:41.110 --> 00:10:48.660 a tool for collecting data from online sources in an automated fashion. You can 00:10:48.660 --> 00:10:53.200 choose different data sources, say “Indeed” – this is a résumé website – and 00:10:53.200 --> 00:10:58.240 say you want to find anyone who mentioned XKeyscore and for sake of timing let’s 00:10:58.240 --> 00:11:08.160 just get people in Maryland. And “start collecting”, and it might take a second 00:11:08.160 --> 00:11:12.920 because it’s still a bit rough. But it opens a browser, goes finds other people 00:11:12.920 --> 00:11:19.069 who mention XKeyscore in Maryland and it goes and downloads all of their résumés 00:11:19.069 --> 00:11:24.149 in one place… you can kind of see them as they download because this is being 00:11:24.149 --> 00:11:48.709 slowed a bit down right now. That just works key services and fairly small. 
00:11:48.709 --> 00:11:57.699 Something shouted from out of the audience M.C.: laughs 00:11:57.699 --> 00:12:02.060 applause 00:12:05.800 --> 00:12:12.350 Takes a second to load, still kind of rough… 00:12:12.350 --> 00:12:18.930 Yeah, so we’re hoping to add many different data sources, so that people can collect 00:12:18.930 --> 00:12:22.690 data from sources online as well as just take a pile of pdf’s on their computer, 00:12:22.690 --> 00:12:26.570 point at the directory and it will load them and OCR them and people will be able 00:12:26.570 --> 00:12:31.470 to search through them in a searchable database. 00:12:31.470 --> 00:12:35.549 So while this is loading why don’t I go and walk through some of the rest of the 00:12:35.549 --> 00:12:40.020 pipeline. So our goal is to have tools for collecting data, loading it into 00:12:40.020 --> 00:12:46.770 a database; and then tools for matching data across various sources on the same 00:12:46.770 --> 00:12:50.220 person or the same company. So it should take someone’s résumés and Social Media 00:12:50.220 --> 00:12:54.130 profiles and everything and link it together and then also link that to the 00:12:54.130 --> 00:12:57.180 companies they work(ed) for, the other people they know, the locations they’ve 00:12:57.180 --> 00:13:01.540 lived. As well as tools for extracting things from data. So to be able to go 00:13:01.540 --> 00:13:04.330 through a résumé, extract all the code words mentioned, to be able to go through 00:13:04.330 --> 00:13:08.019 a document and extract all the companies mentioned and generating 00:13:08.019 --> 00:13:13.190 entities that way. And tools for searching through data in databases where you can 00:13:13.190 --> 00:13:17.699 search for search queries and browse by categories. And for viewing data and 00:13:17.699 --> 00:13:23.649 network graphs and maps. Let’s see if this is done… Right now it just shows the 00:13:23.649 --> 00:13:32.540 raw JSON. The connection between tools is a bit rough. 
But we should be able to 00:13:32.540 --> 00:13:41.240 index the data and load it into a search tool. Will take a second. Hopefully this 00:13:41.240 --> 00:14:05.760 works. Oh, it’s going! Yeah… So it takes a little bit. Index… And you can see… 00:14:05.760 --> 00:14:13.699 The data will be at… It kind of circle loaded into a subscriptions list… 00:14:13.699 --> 00:14:17.310 So there’s a searchable database on all the people who are working on XKeyscore 00:14:17.310 --> 00:14:27.400 in Maryland! applause, cheers from audience 00:14:27.400 --> 00:14:33.100 So I think that using Free Software and open data is really key 00:14:33.100 --> 00:14:38.070 because we have far, far fewer resources than the Intelligence Community. And we 00:14:38.070 --> 00:14:41.240 don’t even have the resources that a company like Lockheed Martin has. We can’t 00:14:41.240 --> 00:14:45.269 internally build all of this software. We can’t hope that we will anticipate every future 00:14:45.269 --> 00:14:50.609 use to be able to help people adapt to that. Having people be able to take our 00:14:50.609 --> 00:14:54.199 data, take our tools and adapt it to their own situations is absolutely key to 00:14:54.199 --> 00:14:58.380 actually ensuring that they’re useful. And there are also a lot of open source tools 00:14:58.380 --> 00:15:01.269 that the Intelligence Community has, really. Things like Accumulo, the thing 00:15:01.269 --> 00:15:05.399 that’s used in Real Time Regional Gateway. It was released by the NSA and made open 00:15:05.399 --> 00:15:11.029 source. And Gaffer which is a graph database recently released by GCHQ. 00:15:11.029 --> 00:15:15.660 So we can sort of take those and possibly also build on those in some cases. 
00:15:15.660 --> 00:15:17.940 As well as using the same tools chuckles 00:15:17.940 --> 00:15:22.050 And it’s appropriate because our goal is to enable people to collect and use 00:15:22.050 --> 00:15:27.529 information in the same way that the Intelligence Community can. 00:15:27.529 --> 00:15:31.880 But, while I think that we should aim to collect it all and collect all the 00:15:31.880 --> 00:15:35.009 information that we can, I think we also need to be careful to avoid a lot of the 00:15:35.009 --> 00:15:39.740 mistakes that the Intelligence Community has made. Because some of the effects are 00:15:39.740 --> 00:15:45.550 quite bad and lead to people being killed for no reason at all. And – it’s quite 00:15:45.550 --> 00:15:49.729 absurd. And the main one of these, I think, is de-humanizing people. 00:15:49.729 --> 00:15:53.370 Torture techniques are specifically designed to de-humanize people. 00:15:53.370 --> 00:15:56.100 When people are looking at data that they’ve intercepted, they’re not looking 00:15:56.100 --> 00:15:59.569 at a person, they’re looking at meta-data, they’re looking at numbers on a screen. 00:15:59.569 --> 00:16:05.819 It’s not something that’s easy to find a way around. When I was working on ICWATCH 00:16:05.819 --> 00:16:11.410 I was grappling with this problem quite a bit. So I decided to try to see who some 00:16:11.410 --> 00:16:15.649 of these people are and try to put faces to these issues. So I started going to 00:16:15.649 --> 00:16:19.440 Intelligence conferences. Many of these conferences are quite open and you can 00:16:19.440 --> 00:16:24.490 just go in. And I wasn’t that out of place either, I just told people that I made 00:16:24.490 --> 00:16:27.430 tools to collect and analyze Open Source Intelligence. 00:16:27.430 --> 00:16:29.139 laughter and applause 00:16:29.139 --> 00:16:35.590 There’re many people doing… 00:16:35.590 --> 00:16:38.080 There’re many people doing similar things out there, too. Like I met the
Like I met the 00:16:38.080 --> 00:16:42.409 ZeroFox people who were one of the examples I showed earlier at one of these conferences. 00:16:42.409 --> 00:16:45.409 They are actually very, very nice. And 00:16:45.409 --> 00:16:48.139 there were also some people who were quite interested in what I was doing. There was 00:16:48.139 --> 00:16:50.970 one recruiter from Northrop-Grumman who seemed somewhat interested in hiring me 00:16:50.970 --> 00:16:54.300 and I looked her up later and found a bunch of job listings where she was 00:16:54.300 --> 00:16:59.159 trying to hire people who… to work on programs related to XKeyscore. It wasn’t 00:16:59.159 --> 00:17:03.639 all good, I got kicked out of one conference. I got some strange requests like there was 00:17:03.639 --> 00:17:09.690 one guy who was trying to figure out how to use open data to help venture capitalists 00:17:09.690 --> 00:17:15.170 figure out what porn the founders of the startups they funded watched. I’m not sure 00:17:15.170 --> 00:17:18.109 that’s even possible. But it was really weird and he was asking me for help and 00:17:18.109 --> 00:17:20.260 I was like “I don’t think I can help with that, sorry!” 00:17:20.260 --> 00:17:27.160 laughter and applause 00:17:27.160 --> 00:17:30.940 Of course there were some negative comments on things like Manning and Snowden 00:17:30.940 --> 00:17:33.990 and some confusion like there was someone who is making insider threat detection 00:17:33.990 --> 00:17:39.130 software, who was talking about how it would stop a situation like when Snowden 00:17:39.130 --> 00:17:43.070 leaked documents to Wikileaks and things like that. So people don’t actually 00:17:43.070 --> 00:17:46.280 know what’s going on. But generally most of them were decent people and some of 00:17:46.280 --> 00:17:49.250 them were quite nice, some of them were quite funny. And some of them really 00:17:49.250 --> 00:17:52.570 seemed to think that what they were doing is saving lives. 
So they’re not evil people 00:17:52.570 --> 00:17:57.540 who want to hurt others but they’re not infallible either. They’re human beings. 00:17:57.540 --> 00:18:02.800 And our strategy – looking at individuals – scares a lot of people. But what you 00:18:02.800 --> 00:18:09.810 have to realize is that institutions are made up by people. It’s easier to just 00:18:09.810 --> 00:18:12.810 look at the institution. It’s easier to just look at an abstract program. Just 00:18:12.810 --> 00:18:15.590 like it’s easier not to think of the person who you just decided to kill in a 00:18:15.590 --> 00:18:21.430 drone strike as a person. That’s why these things continue to happen. I think that 00:18:21.430 --> 00:18:24.520 there’s a lot of benefit to looking at people as people, both to avoid some of 00:18:24.520 --> 00:18:28.970 the problems the Intelligence Community has as well as because people’s data trails 00:18:28.970 --> 00:18:31.780 are part of the data trails of the institutions. And if we’re only looking at 00:18:31.780 --> 00:18:36.490 institutions we’re missing part of the data trail the people leave. 00:18:36.490 --> 00:18:40.690 Though, of course, no one person is responsible for the wrong-doings of the 00:18:40.690 --> 00:18:46.900 Intelligence Community. So we shouldn’t demonize any one person. But… 00:18:46.900 --> 00:18:49.650 these are the people who go to work every day and perpetuate the actions of the 00:18:49.650 --> 00:18:54.810 Intelligence Community. So I think everyone involved is a little bit at fault. 00:18:54.810 --> 00:18:57.950 And the other benefit of looking at people as people is that we can start to 00:18:57.950 --> 00:19:01.220 understand them. Because you have to understand what their hopes are, what 00:19:01.220 --> 00:19:05.330 their fears are. How they see the world. What upsets them. And what might cause 00:19:05.330 --> 00:19:08.920 them to change their behaviour. 
And from that we can start to maybe come up with 00:19:08.920 --> 00:19:13.150 alternatives. So let’s look at some of these people and look at some of their 00:19:13.150 --> 00:19:21.960 stories. This is Jason Epperson. He works on Intelligence collection for Special 00:19:21.960 --> 00:19:27.420 Operations. In his spare time he enjoys coaching children’s sports. He currently 00:19:27.420 --> 00:19:32.050 works at the US Special Ops Command (USSOCOM) helping different agencies 00:19:32.050 --> 00:19:35.190 collect data, share it, and figure out what data they need, just generally 00:19:35.190 --> 00:19:39.340 helping them integrate it. But he started his career back in 1998, also 00:19:39.340 --> 00:19:43.950 working on collecting data for Special Operations. Then later, in 2004, he went 00:19:43.950 --> 00:19:49.650 to work at the US Central Command in the NSA cryptologic services group and he was 00:19:49.650 --> 00:19:53.330 focused on tracking down high-value targets and individuals. And he claimed 00:19:53.330 --> 00:19:56.710 that as a result of his work, numerous high-value individuals were captured 00:19:56.710 --> 00:20:03.990 or killed. It is especially interesting because he was working on this in 2007 00:20:03.990 --> 00:20:09.330 when PRISM was launched and at the top of his résumé he lists in his specialties 00:20:09.330 --> 00:20:14.620 PRISM as “possible”, so that’s kind of a dinagra but based on his background it 00:20:14.620 --> 00:20:20.640 might not be. So I think it probably is actually PRISM. 
00:20:20.640 --> 00:20:27.530 Then after he was working there he went and started working on counter-radicalization 00:20:27.530 --> 00:20:31.030 efforts – things like boosting the capacity of Muslim Faith Leaders to win 00:20:31.030 --> 00:20:33.910 hearts and minds and establishing competing social networks to counter 00:20:33.910 --> 00:20:37.150 Al Qaeda ideology and he’s very clear in his job description that he’s not killing 00:20:37.150 --> 00:20:43.480 people, he’s just helping allies of the US figure out who is who, set Interpol notices for. 00:20:43.480 --> 00:20:46.790 But the most interesting thing about him isn’t any of his jobs. It’s this 00:20:46.790 --> 00:20:50.940 publication that he has at the bottom of his résumé called “An Examination of the 00:20:50.940 --> 00:20:55.980 Effect of Government Data Mining on US Citizens”. And this is clearly an area where 00:20:55.980 --> 00:21:00.470 he has a lot of expertise. And he presented this at a conference back in 00:21:00.470 --> 00:21:04.810 2010. I still don’t have a copy yet. It’s not easily available. I think it might be 00:21:04.810 --> 00:21:09.630 possible to get either by buying it from the company directly or by going to the 00:21:09.630 --> 00:21:14.820 Library of Congress that seems to have some copies of the conference proceedings. 00:21:14.820 --> 00:21:19.670 That could be quite interesting. Both because he was relatively high up, he was 00:21:19.670 --> 00:21:23.700 in command of nearly 400 people back when PRISM started and he was working with the 00:21:23.700 --> 00:21:27.840 NSA. It’s possible that he had some role early on in the program and this might 00:21:27.840 --> 00:21:33.790 provide some clues. 
And then also the little “Data Mining on US Citizens” bit 00:21:33.790 --> 00:21:36.910 in the title is kind of interesting because that’s supposed to be the last 00:21:36.910 --> 00:21:40.500 protection – I think that’s kind of a super protection because most US citizens 00:21:40.500 --> 00:21:43.200 wouldn’t find it very comforting if the Chinese Government said: “Oh yeah, we have 00:21:43.200 --> 00:21:47.420 a mass surveillance program but we only spy on people who aren’t Chinese citizens.” 00:21:47.420 --> 00:21:50.680 That’s not really comforting to them, so I don’t see why it would be. But it’s been 00:21:50.680 --> 00:21:54.800 the one thing that people were repeating. “We don’t collect it on US citizens”. And 00:21:54.800 --> 00:21:59.960 just seeing that in the title of a paper is like a tiny admission that maybe they 00:21:59.960 --> 00:22:08.240 do. So some of these profiles tell other interesting stories about people’s lives. 00:22:08.240 --> 00:22:11.760 If you’ve seen any of my other talks, this is someone you’ve heard me talk about 00:22:11.760 --> 00:22:15.920 a lot. Solomon Varnado. He spent most of his life in the military intelligence 00:22:15.920 --> 00:22:20.190 community, focused on Signals Intelligence and Geolocation. He took down his résumé 00:22:20.190 --> 00:22:25.960 after ICWATCH launched. But I actually recently found another résumé of his on 00:22:25.960 --> 00:22:31.070 another website that has additional information like on the side in the 00:22:31.070 --> 00:22:35.580 military he ran diversity programs and a sexual assault prevention program and 00:22:35.580 --> 00:22:39.070 things like that. I first came across this profile because he mentions a lot of 00:22:39.070 --> 00:22:45.010 interesting code words. This is probably the first known mention of XKeyscore back 00:22:45.010 --> 00:22:54.610 in 2004/2005. But these aren’t the most interesting part of his résumé. 
Later on 00:22:54.610 --> 00:22:58.230 he… after he works on Intelligence Collection Management – just Standard 00:22:58.230 --> 00:23:05.170 Signals Intelligence Collection – he goes and he works for L-3 Stratis. And there he 00:23:05.170 --> 00:23:08.550 says that he identified, collected, and performed direction finding 00:23:08.550 --> 00:23:13.000 of specified target signals using PENNANTRACE, DISPLAYVIEW and CEGS. 00:23:13.000 --> 00:23:14.450 But I wasn’t sure what “PENNANTRACE” was 00:23:14.450 --> 00:23:17.200 so I found a definition very conveniently located in 00:23:17.200 --> 00:23:21.800 another résumé. That said it was an airborne collection platform for PENNANTRACE. 00:23:21.800 --> 00:23:27.500 That sounds like some sort of Signals Intelligence collection platform. 00:23:27.500 --> 00:23:31.760 And the other interesting thing about this job is that he said that he called for 00:23:31.760 --> 00:23:35.720 external review of intelligence management processes which is not something I see 00:23:35.720 --> 00:23:39.130 normally. And he was there for a fairly short time, only a couple of months. 00:23:39.130 --> 00:23:43.170 After staying at most of his other jobs for over a year. And then at his next job 00:23:43.170 --> 00:23:44.900 he was also there for only a couple of months. 00:23:44.900 --> 00:23:47.540 He was working at Pluribus International, also on Drone Intelligence, 00:23:47.540 --> 00:23:50.470 this time definitely Drone Intelligence, on Predator drones because he 00:23:50.470 --> 00:23:54.370 mentions Airhandler which we now know more about thanks to the catalogue 00:23:54.370 --> 00:23:58.320 released by The Intercept. It’s a 00:23:58.320 --> 00:24:02.290 geo-processing system for geolocation data from Predator drones. 00:24:02.290 --> 00:24:06.330 And the update to ICWATCH includes all the data on all of the words 00:24:06.330 --> 00:24:13.610 mentioned in that catalogue. 
And then he leaves the Intelligence Community 00:24:13.610 --> 00:24:19.090 entirely after that job. And he goes and works as a used car salesman at this used 00:24:19.090 --> 00:24:23.160 car dealership. And it turns out he is actually – found him on this other résumé 00:24:23.160 --> 00:24:25.580 that I just found – He’s actually quite a successful used car salesman. 00:24:25.580 --> 00:24:27.760 He’s won a bunch of awards. He’s one of the best 00:24:27.760 --> 00:24:30.740 salesmen in the region. So he’s doing quite well. And he won a bunch of awards 00:24:30.740 --> 00:24:32.420 and he’s in the military too, so it seems like 00:24:32.420 --> 00:24:35.730 he’s very committed to what he does. But still that’s quite a huge career 00:24:35.730 --> 00:24:39.880 change and it sounds like maybe he was starting to get upset with some of how 00:24:39.880 --> 00:24:42.840 things are really being done and he couldn’t figure out a way to fix it after 00:24:42.840 --> 00:24:46.840 calling for external review so he just left. 00:24:49.010 --> 00:24:54.190 applause 00:24:54.190 --> 00:25:02.360 And then, this is Michael Dial. Michael Dial is a pipe fitter and a plumber. And 00:25:02.360 --> 00:25:08.400 this is him with his family. He’s actually a pipe fitter and a plumber. But he’s not 00:25:08.400 --> 00:25:13.780 just any pipe fitter. He has security clearance. And he goes and he fits pipes 00:25:13.780 --> 00:25:17.990 in secure facilities. As you might expect he does a lot of pipe fitting for naval 00:25:17.990 --> 00:25:27.080 ships. He also does things like he goes to embassies and other secret locations in 00:25:27.080 --> 00:25:38.170 Afghanistan and Iraq, Ecuador, Serbia and sets up their pipes. He also did some 00:25:38.170 --> 00:25:43.620 pipe fitting in Djibouti at some sort of Homeland Security facility which 00:25:43.620 --> 00:25:50.170 coincidentally is also where many of the drone programs are run out of. 
So there’s 00:25:50.170 --> 00:25:54.640 some interesting cases like that where there are people like Michael Dial who 00:25:54.640 --> 00:25:59.020 aren’t involved in Intelligence at all, directly. But the information in the 00:25:59.020 --> 00:26:04.960 résumés still provides very interesting useful details about where secret 00:26:04.960 --> 00:26:07.880 facilities are located and other aspects of the Intelligence Community. Because 00:26:07.880 --> 00:26:11.090 secret facilities don’t just materialize out of thin air. They need people to build 00:26:11.090 --> 00:26:15.750 them, they need people to operate them. So from tracking down these people we can 00:26:15.750 --> 00:26:18.740 start to map them. And then there’re other useful things like we can figure out which 00:26:18.740 --> 00:26:25.740 companies clean the NSA. I’m sure that has all sorts of useful applications. 00:26:25.740 --> 00:26:33.850 This is Eleana Costa. He lives in D.C. and he works for the DOD. And this is him at his 00:26:33.850 --> 00:26:38.340 High School Graduation back in 1988. He has been working in Military and 00:26:38.340 --> 00:26:45.240 Intelligence for nearly 20 years. And back in 2003, he worked on PsyOps programs. 00:26:45.240 --> 00:26:50.880 Specifically he worked on PsyOps programs in Paraguay, Colombia and Bolivia. And 00:26:50.880 --> 00:26:55.970 these were in support of the DEA, the Drug Enforcement Agency, and the CIA. 00:26:55.970 --> 00:26:59.260 And there are a few other résumés in ICWATCH that mention involvement in PsyOps in 00:26:59.260 --> 00:27:04.480 Latin America for the DEA. It seems to be quite an extensive thing especially since 00:27:04.480 --> 00:27:08.900 I didn’t collect any data on this specifically, and I just suddenly had a bunch 00:27:08.900 --> 00:27:13.950 of people in the database on this, so: maybe worth looking into a bit. And then 00:27:13.950 --> 00:27:17.320 after that he went and he worked on PsyOps programs in Iraq.
So it’s kind of 00:27:17.320 --> 00:27:22.120 interesting. Then he went and worked at the DOD on Human Intelligence. 00:27:22.120 --> 00:27:27.240 The other interesting thing about Kiliana Costa is that he’s one of the people who 00:27:27.240 --> 00:27:34.010 deleted his résumé after ICWATCH launched and that was how I found him. 00:27:34.010 --> 00:27:41.090 laughter and applause 00:27:41.090 --> 00:27:46.050 So after ICWATCH launched a lot of people were positively interested in it, but we 00:27:46.050 --> 00:27:49.180 also got a lot of threats because… it’s really absurd, because all we’re doing is 00:27:49.180 --> 00:27:52.670 collecting information that people explicitly, independently, willingly 00:27:52.670 --> 00:27:56.720 posted online about their profession; we’re not posting addresses or 00:27:56.720 --> 00:28:02.930 anything like that. And making it more searchable. Just like Google does. 00:28:02.930 --> 00:28:07.200 But a lot of people in the Intelligence Community contacted us and for the first 00:28:07.200 --> 00:28:11.730 few weeks, we saw a new response every day. Some of these were kind of 00:28:11.730 --> 00:28:17.580 interesting and reveal some sort of nonsensical mindsets of people in the 00:28:17.580 --> 00:28:25.330 Intelligence Community. Like this guy. This is Alexander Irinovitch. He sent me 00:28:25.330 --> 00:28:29.380 a…, actually a nice email, a very nice email. It was really nice. Saying that he 00:28:29.380 --> 00:28:32.740 couldn’t understand why he was in ICWATCH because he wasn’t involved in surveillance. 00:28:32.740 --> 00:28:36.610 He was working at a private company that had nothing to do with surveillance.
00:28:36.610 --> 00:28:42.750 So I looked at his profile and I saw that he was working at Unit 8200, the Israeli 00:28:42.750 --> 00:28:46.930 Intelligence unit which, okay, there is mandatory military service, not that 00:28:46.930 --> 00:28:50.810 weird, though he was there for several years, not just the mandatory portion, 00:28:50.810 --> 00:28:57.800 and this is the Intelligence unit that spies on Palestinians. And then I looked 00:28:57.800 --> 00:29:02.700 at where he works now. And he works for a company called Verint. According to their 00:29:02.700 --> 00:29:09.160 website they make software for analyzing data from wiretaps. So I think that has to 00:29:09.160 --> 00:29:13.220 do with surveillance. I’m not sure why he interpreted that as “nothing to do with 00:29:13.220 --> 00:29:16.940 surveillance”. But it’s kind of an interesting interpretation. I think it makes sense for him 00:29:16.940 --> 00:29:20.220 to be in the database, but of course, for any particular profile, there is 00:29:20.220 --> 00:29:23.140 some noise. So it’s up to whoever is looking at it to make the call 00:29:23.140 --> 00:29:26.050 and do the research. 00:29:26.050 --> 00:29:30.040 And sometimes other people who complained also helped us find interesting details. 00:29:30.040 --> 00:29:34.420 Like this guy, Joshua Lively. He’s one of the people who reported us to the FBI for 00:29:34.420 --> 00:29:43.120 domestic terrorism. He worked as a linguist at this company. I looked at 00:29:43.120 --> 00:29:48.490 his profile and he mentions a lot of interesting code words in it. 00:29:48.490 --> 00:29:51.750 Some of them didn’t make so much sense at the time. This thing called ZB. 00:29:51.750 --> 00:29:55.740 And then a few weeks later The Intercept released this article on a thing called 00:29:55.740 --> 00:30:03.830 Skynet. It uses machine learning to analyze travel data from telecom 00:30:03.830 --> 00:30:08.130
And ZB is one of the databases they use and he, coincidently, has a lot 00:30:08.130 --> 00:30:12.130 of the databases that are used in this listed in his skills. And as a linguist 00:30:12.130 --> 00:30:14.860 professioned with the language that’s used in the region that’s mainly targeted 00:30:14.860 --> 00:30:18.510 in this… So I’m not sure if he’s involved in this particular program. But it seems 00:30:18.510 --> 00:30:22.860 like he’s involved in something similar. 00:30:22.860 --> 00:30:28.160 So it’s quite interesting. Generally there are a lot of angry people in the 00:30:28.160 --> 00:30:31.750 Intelligence Community. Some are nicer than others and were just asking questions 00:30:31.750 --> 00:30:35.910 being like “Can you please take my profile down!”, some other more afraid, some other 00:30:35.910 --> 00:30:40.640 were more violent and sending things like death threats. Our server started getting 00:30:40.640 --> 00:30:44.440 hit pretty hard and ICWATCH kept going down. We wanted to be sure that we weren’t 00:30:44.440 --> 00:30:48.090 going to be compelled to take the data down some way. And the easiest way not 00:30:48.090 --> 00:30:52.130 to be compelled to take the data down is to make it so you can’t really take the 00:30:52.130 --> 00:30:55.700 data down yourself. And the people had much less incentive to go after you. 00:30:55.700 --> 00:31:00.970 So we moved ICWATCH to Wikileaks which has been great, and they’ve been wonderful 00:31:00.970 --> 00:31:03.940 helping with all this. So thank you, Wikileaks! 00:31:03.940 --> 00:31:09.720 applause 00:31:09.720 --> 00:31:11.610 from the audience: Your welcome! 00:31:11.610 --> 00:31:13.760 M.C.: chuckles laughter 00:31:13.760 --> 00:31:17.500 As I mentioned earlier a lot of people are taking down their résumés in response to 00:31:17.500 --> 00:31:24.700 ICWATCH. Specifically 1.030 people have, out of the original 27.000. And others have 00:31:24.700 --> 00:31:29.120 edited them and made them private. 
So as part of the update in addition to doubling 00:31:29.120 --> 00:31:35.050 the number of résumés available we also recollected all of the initial résumés 00:31:35.050 --> 00:31:39.750 and you can go on the site and see which ones are removed, which ones are made 00:31:39.750 --> 00:31:43.590 private, which ones have been modified and all of that is fug so you can easily see 00:31:43.590 --> 00:31:50.540 how that’s changed. applause 00:31:50.540 --> 00:31:55.330 And some of these revealed details that people hadn’t posted… that many wish that 00:31:55.330 --> 00:32:00.760 they hadn’t posted in the first place. But they also provide useful updates on where 00:32:00.760 --> 00:32:05.480 people are working. Because we’re able to track people as they move from job to job. 00:32:05.480 --> 00:32:10.840 E.g. there’s this guy, Michael Acosta, from the original ICWATCH. From 2011 00:32:10.840 --> 00:32:15.750 to 2012 he worked at Guantanamo. He was primarily trying to find out about 00:32:15.750 --> 00:32:21.690 potential attacks on Guantanamo itself. He monitored various detainees and 00:32:21.690 --> 00:32:27.660 collaborated with the Behavioural Science Team and was trying to figure out if 00:32:27.660 --> 00:32:32.790 detainees were planning some sort of coup, I guess. And then he started working for 00:32:32.790 --> 00:32:41.030 the Air Force. And here he was working on Drone Intelligence and targeting and such 00:32:41.030 --> 00:32:44.230 things like how he was responsible for “the production made instant upgrade of 00:32:44.230 --> 00:32:47.960 DGS2 mission critical Intelligence databases which include high value target 00:32:47.960 --> 00:32:52.550 development folders” like the things used for JPEL targeting, regional fairbriefs, 00:32:52.550 --> 00:32:57.980 mission storyboards and mission target logs with document FMV mission rollups. 00:32:57.980 --> 00:33:00.520 But the most interesting thing on this résumé isn’t any of those things.
00:33:00.520 --> 00:33:05.510 It’s the thing that changed between the original launch of ICWATCH and now. 00:33:05.510 --> 00:33:08.980 And that’s that he moved and started working for a different company. 00:33:08.980 --> 00:33:14.160 He started working for this company called… called SOS International 00:33:14.160 --> 00:33:20.780 as an All Source Analyst. He unfortunately had to leave the position that he had 00:33:20.780 --> 00:33:24.880 on the side coaching High School Baseball which he seemed to really like. 00:33:24.880 --> 00:33:27.630 And he kind of liked it because right now he’s looking for Baseball opportunities 00:33:27.630 --> 00:33:31.610 in Germany. So he seems to be in Germany working for this company called SOS 00:33:31.610 --> 00:33:34.730 International that I never heard of before. So I went on the website and they 00:33:34.730 --> 00:33:38.040 have a list of the cities that they operate in, in Germany. These 6 cities, 00:33:38.040 --> 00:33:43.870 along with Guantanamo and a number of other sketchy locations. And based on 00:33:43.870 --> 00:33:47.610 Michael Acosta’s past record of working at Guantanamo and on Drone targeting and 00:33:47.610 --> 00:33:50.130 things like that it sounds like this company is probably doing something quite 00:33:50.130 --> 00:33:56.450 sketchy. By tracking changes to where people work we can start to find things 00:33:56.450 --> 00:34:00.360 like this we might not otherwise think to look at. That we might not otherwise think about 00:34:00.360 --> 00:34:03.070 as interesting. 00:34:03.070 --> 00:34:10.219 But it’s not just open data that we collect. Because the same tools for 00:34:10.219 --> 00:34:13.549 collecting and analyzing open data are also useful for other data sets. 00:34:13.549 --> 00:34:18.510
Like we made a search tool in collaboration with Courage Foundation 00:34:18.510 --> 00:34:22.149 for all of the published Snowden documents that allows you to search the full text of 00:34:22.149 --> 00:34:26.280 the documents, browse which code words are in these documents, see documents that 00:34:26.280 --> 00:34:33.139 mention particular countries, see the full PDFs and articles. And we also made a… 00:34:33.139 --> 00:34:37.230 when the Hacking Team data came out this summer we mirrored the data and became one 00:34:37.230 --> 00:34:41.659 of the primary mirrors of the data. We had a torrent that was almost downing the server 00:34:41.659 --> 00:34:44.350 with a lot of space and figured that none of the other people had that, so we put it 00:34:44.350 --> 00:34:51.510 up. And that got a lot of traffic, it got about 57 M hits in the first 2 days. 00:34:51.510 --> 00:34:54.300 And soon we realized there was a problem where our server charged a lot for 00:34:54.300 --> 00:34:59.370 bandwidth and it cost us $48 every time someone decided to download the 400GB 00:34:59.370 --> 00:35:07.480 with wget. So that was interesting but it’s been resolved now. It hopefully made 00:35:07.480 --> 00:35:11.030 the data more accessible to people who don’t have 400GB of hard drive space 00:35:11.030 --> 00:35:15.990 available or enough internet connectivity to download that. So then we’ve also made 00:35:15.990 --> 00:35:21.240 a search tool for all of the Hacking Team emails; that has a search interface that 00:35:21.240 --> 00:35:25.400 lets you browse them like you would in a normal email client with threading, and a 00:35:25.400 --> 00:35:28.870 network graph so that you can see the connections between senders and 00:35:28.870 --> 00:35:39.860 recipients. The Intelligence Community has a variety of collection disciplines: 00:35:39.860 --> 00:35:45.350 SIGINT, OSINT, HUMINT, Measurement and Signature Intelligence, Imagery 00:35:45.350 --> 00:35:49.080 Intelligence.
They have all these different sources that they’re gathering 00:35:49.080 --> 00:35:55.780 data from. I think that we should try to duplicate this. Because there are a lot 00:35:55.780 --> 00:35:58.230 of different sources that we can gather data from as well, and we need to find 00:35:58.230 --> 00:36:01.600 ways to better collect data from all these sources and to fuse them together. 00:36:01.600 --> 00:36:06.300 These are some other ones that I’ve been spending all the time looking at. 00:36:06.300 --> 00:36:10.170 And there’s open source Intelligence things like ICWATCH where you’re 00:36:10.170 --> 00:36:13.060 collecting data from purely public sources. But this is just part of the wider 00:36:13.060 --> 00:36:17.950 ecosystem that we can draw on. This is mostly information that people and 00:36:17.950 --> 00:36:21.230 institutions make public about themselves, either intentionally or 00:36:21.230 --> 00:36:25.840 unintentionally. And it’s really difficult to use because there’s a lot of it and it 00:36:25.840 --> 00:36:29.940 needs to be collected and matched up and pulled together in a browsable way for 00:36:29.940 --> 00:36:33.390 people to be able to use it. So you can’t really just manually go and use it at scale. 00:36:33.390 --> 00:36:39.900 You can do it a little bit but not nearly enough. And so we’re working on making 00:36:39.900 --> 00:36:44.540 this easier to use. The other sort of data is anonymously leaked documents, 00:36:44.540 --> 00:36:47.370 documents that were (?) sent journalists, that they think should be 00:36:47.370 --> 00:36:51.700 public and these often pretty explicitly reveal corruption, human rights abuses 00:36:51.700 --> 00:36:56.480 or other issues. But this can also be used to collect more data. Like we used the 00:36:56.480 --> 00:37:00.800 published Snowden documents very heavily to find code words that we could use to 00:37:00.800 --> 00:37:05.240 collect the data in ICWATCH.
And once we start to collect data on secret things 00:37:05.240 --> 00:37:10.800 that were recently not known at all, but now are, and we can find data on that, we 00:37:10.800 --> 00:37:14.140 can start to find data on unknown code words and unknown things that we might not 00:37:14.140 --> 00:37:20.560 otherwise recognize. And then there’s data released by governments, from FOIA 00:37:20.560 --> 00:37:25.400 requests through open data initiatives. This, of course, can be spun or things can 00:37:25.400 --> 00:37:31.370 be held back. So it’s not ideal to use on its own. But it can be used like the other 00:37:31.370 --> 00:37:34.740 2 types in combination with each other. You can use that to provide context, you 00:37:34.740 --> 00:37:42.540 can use open source data to frame FOIA requests and things like that. So the goal 00:37:42.540 --> 00:37:46.730 of Transparency Toolkit is to make it easier to collect all these types of data 00:37:46.730 --> 00:37:50.950 in one place and to start to use this data in the same ways that the Intelligence 00:37:50.950 --> 00:37:55.330 Community uses the data collected from all the various collection disciplines. 00:37:55.330 --> 00:38:00.400 Except our goal isn’t to kill people or be some sort of omniscient God-like being 00:38:00.400 --> 00:38:04.370 but we just want to build some sort of external structure of accountability. 00:38:04.370 --> 00:38:09.690 To make it easier to uncover and understand things like surveillance programs or human 00:38:09.690 --> 00:38:14.520 rights abuses or corruption. And when we can find the people and companies that are 00:38:14.520 --> 00:38:18.290 involved in things like surveillance we can start to map who’s doing what. 00:38:18.290 --> 00:38:21.870 And we can start to request information about specific contracts. And we know who 00:38:21.870 --> 00:38:24.580 we can ask questions about particular programs.
And then we can start to use the 00:38:24.580 --> 00:38:30.020 data to start legal cases against specific companies. And we can start to take more 00:38:30.020 --> 00:38:34.850 concrete actions than we would be able to, otherwise, if we were dealing simply in 00:38:34.850 --> 00:38:38.820 theory or in guesses as to what’s going on. 00:38:38.820 --> 00:38:42.310 So open source intelligence lets us be more proactive and more direct with 00:38:42.310 --> 00:38:49.280 our techniques. And it also lets us find some of this information earlier, because 00:38:49.280 --> 00:38:52.490 many of the programs mentioned in the Snowden documents were mentioned first 00:38:52.490 --> 00:38:58.890 in other open data sources. And if we can start to figure out where these are 00:38:58.890 --> 00:39:02.390 and start to figure out what they are, then we know what data we’re missing and 00:39:02.390 --> 00:39:05.410 we can start to go after it with FOIA requests or trying to find it by other 00:39:05.410 --> 00:39:14.060 means. But all of this is a really, really big project and we can’t… this is not 00:39:14.060 --> 00:39:17.220 going to work if it’s just us working on it. We need to work with other people. 00:39:17.220 --> 00:39:20.650 We need to work with activists who have ideas of how they want to use the data. 00:39:20.650 --> 00:39:23.640 We need to work with journalists that collect the data and write stories about 00:39:23.640 --> 00:39:27.130 it. We need to work with human rights lawyers to help them with their research and 00:39:27.130 --> 00:39:30.430 help them build legal cases based on the findings. We need to work with NGOs and 00:39:30.430 --> 00:39:34.800 human rights researchers who want to collect and use open data in their work. 00:39:34.800 --> 00:39:38.330 And we need more people going through databases like ICWATCH. This doesn’t 00:39:38.330 --> 00:39:42.340 require any special expertise.
You gain the knowledge that you need as you’re 00:39:42.340 --> 00:39:46.490 going through them looking up terms. It’s not easy but it can be quite interesting 00:39:46.490 --> 00:39:52.040 once you combine all of these obscure terms and it’s like “Oh, that’s what 00:39:52.040 --> 00:39:56.840 they’re doing!” and oftentimes what they’re doing is something entirely absurd 00:39:56.840 --> 00:40:01.300 like reading all your email or killing people. 00:40:01.300 --> 00:40:05.870 And we also need software developers to help develop software and help us figure 00:40:05.870 --> 00:40:11.130 out how all of these tools should fit together. So if anyone’s interested in 00:40:11.130 --> 00:40:14.770 working with us to take on the Intelligence Agencies of the world and 00:40:14.770 --> 00:40:18.430 figure out what they’re doing please let us know. I think it sounds a bit insane 00:40:18.430 --> 00:40:23.130 and I know that, but (they) have far more resources and far more experience but if 00:40:23.130 --> 00:40:27.720 we keep ignoring the situation and we continue as we are now making scattered 00:40:27.720 --> 00:40:30.640 attempts to change things that aren’t coordinated, that are based on limited 00:40:30.640 --> 00:40:36.290 information, nothing is going to change long-term. So I think we need to collect 00:40:36.290 --> 00:40:40.800 all the information we can and figure out how to effectively combine it and use it 00:40:40.800 --> 00:40:45.510 for concrete goals. And I think we need to do this with free software and open 00:40:45.510 --> 00:40:49.100 data, because against such powerful adversaries they’re probably the best 00:40:49.100 --> 00:40:51.490 hopes we have. 00:40:51.490 --> 00:41:01.940 applause 00:41:01.940 --> 00:41:05.960 Herald: Thank you, thank you so much! Now we have the round of Q&A, 00:41:05.960 --> 00:41:11.630 for anyone who’d like to ask a question, please come forward to the mikes on both sides 00:41:11.630 --> 00:41:17.070 of this Saal (Hall).
Start taking the question from… 00:41:17.070 --> 00:41:18.440 is nodding towards first person asking …yeah. 00:41:18.440 --> 00:41:24.610 Q: So I’d like to ask about documents which are scans. Which are sometimes 00:41:24.610 --> 00:41:30.010 released as official open source information. What kind of workflow do you 00:41:30.010 --> 00:41:35.950 have or even if you have any kind of workflow for some OCR on these…!? 00:41:35.950 --> 00:41:40.870 M.C.: A serious (?) that depends on the document. There’s some open source 00:41:40.870 --> 00:41:46.960 software called Tesseract that’s quite good, but it doesn’t always work in cases 00:41:46.960 --> 00:41:51.260 where there needs to be more specialized parsing. I like to use something that’s 00:41:51.260 --> 00:41:54.830 called ABBYY (FineReader) which is, unfortunately, not open source and we are 00:41:54.830 --> 00:41:59.220 looking for an alternative. For the published Snowden documents, because we 00:41:59.220 --> 00:42:03.560 needed to extract the classification headers and that wasn’t working so well with 00:42:03.560 --> 00:42:07.150 Tesseract. But Tesseract works for most things. 00:42:07.150 --> 00:42:10.030 listens to unrecorded comment from the audience 00:42:10.030 --> 00:42:15.190 Yeah. 00:42:15.190 --> 00:42:19.720 Herald: Thank you. Do we have a question from… [the internet]? Yeah, oui! 00:42:19.720 --> 00:42:24.310 Signal Angel: Yes, rooty is asking on IRC: What would you recommend the NSA to 00:42:24.310 --> 00:42:27.540 develop towards a future of Social Usefulness!?? 00:42:27.540 --> 00:42:35.780 E.g. what value have databases from 2015, people cell phone sensors in 2115!?? 00:42:35.780 --> 00:42:40.550 Could you give the NSA, maybe CEO there, useful work!?? 00:42:40.550 --> 00:42:42.760 M.C.: Can you rephr..-, sorry !??
00:42:42.760 --> 00:42:50.010 Signal Angel: naively repeats first of the apparent Troll questions 00:42:50.010 --> 00:42:52.290 M.C.: laughs Social Usefulness… 00:42:52.290 --> 00:42:56.070 Probably the most useful thing they could do is stop collecting the data in the 00:42:56.070 --> 00:43:01.760 first place, especially the data that’s being intercepted or illegally collected. 00:43:01.760 --> 00:43:07.250 There’s probably some amounts of useful tracking they could do, but I’m not sure 00:43:07.250 --> 00:43:10.300 that’s the best approach using the tactics that they use to collect the data at that 00:43:10.300 --> 00:43:12.670 time. 00:43:12.670 --> 00:43:16.070 Herald: Thank you. So, next question from you, please! 00:43:16.070 --> 00:43:20.490 Question: Hello, thanks for the talk, that was one of the best ones I’ve seen at this 00:43:20.490 --> 00:43:26.740 congress. I was wondering what you think about the question you’re raising about 00:43:26.740 --> 00:43:30.840 “we shouldn’t make the same mistakes”. Because I’m not totally sure that’s 00:43:30.840 --> 00:43:34.780 possible because of things I’ve seen in other communities. All communities have 00:43:34.780 --> 00:43:41.100 their extremists and they will abuse this data. And then that allows a political 00:43:41.100 --> 00:43:46.610 attack on you, because they say you made that happen, it’s not true. But it will celd 00:43:46.610 --> 00:43:50.230 people. So how do you protect against that? 00:43:50.230 --> 00:43:53.660 M.C.: I think it’s hard to entirely protect against it because we can’t 00:43:53.660 --> 00:43:57.330 control the actions of other people. But people could also go off and use this data 00:43:57.330 --> 00:44:01.530 negatively by collecting it on their own, independently of us.
I was actually quite 00:44:01.530 --> 00:44:05.280 impressed, after we launched ICWATCH, I haven’t heard of anyone complaining of 00:44:05.280 --> 00:44:07.380 threats that they’ve gotten from people… 00:44:07.380 --> 00:44:10.040 People in the Intelligence Community: I haven’t heard of anyone in the 00:44:10.040 --> 00:44:11.980 Intelligence Community complaining about threats that they’ve gotten as a result 00:44:11.980 --> 00:44:16.450 of ICWATCH being launched. All of the complaints have been theoretical. The only 00:44:16.450 --> 00:44:19.340 threats I’ve heard of resulting from ICWATCH are those from the Intelligence 00:44:19.340 --> 00:44:21.940 Community to us. I haven’t heard of anything, so I’ve been very impressed with 00:44:21.940 --> 00:44:27.190 the civility of the internet in that case. And I think that maybe, by framing it, and 00:44:27.190 --> 00:44:30.400 actually bringing it down to the individual level, and making it clear that 00:44:30.400 --> 00:44:35.460 these are people, that makes it a little bit less likely that people will go after 00:44:35.460 --> 00:44:37.610 them in a vicious way. 00:44:37.610 --> 00:44:43.260 Q: Have you thought of creating a kind of usage guidelines? I mean that's not gonna change what 00:44:43.260 --> 00:44:48.270 anyone does. But if someone does something you can then say “That’s against our usage 00:44:48.270 --> 00:44:52.170 guidelines” and it’s a political defence against someone accusing it… 00:44:52.170 --> 00:44:56.040 M.C.: Yeah, I don’t think there’s any way that we can enforce something like that. 00:44:56.040 --> 00:44:59.830 But we do try to be very careful with how we’re framing it in saying – like I 00:44:59.830 --> 00:45:02.920 spent a long time in this talk saying these are people, they are not evil people. They’re 00:45:02.920 --> 00:45:06.570 normal people that you should look at as such.
So I think being very careful of 00:45:06.570 --> 00:45:09.140 framing it and we’ll be developing some sort of guidelines. That’s definitely a 00:45:09.140 --> 00:45:11.230 good idea. 00:45:11.230 --> 00:45:13.740 Herald: Thank you. Your question, please! 00:45:13.740 --> 00:45:19.590 Troll: Hi! First, thank you very much for this tool that makes it possible to fight 00:45:19.590 --> 00:45:27.750 back against, legally. For people who try to punish or yeah… 00:45:27.750 --> 00:45:34.020 What I have to say, or my question is: I worked in the last 3 1/2 years, let’s say, 00:45:34.020 --> 00:45:39.530 in the field of IT Forensics. And I worked with Maltego and stuff, and so I know what 00:45:39.530 --> 00:45:45.210 a lot of work it is to collect data and bring it into good conditions, so others 00:45:45.210 --> 00:45:57.480 could read it or you can get a goal, or see a goal. And what I personally think 00:45:57.480 --> 00:46:04.700 is very important: this could be very sensible data to people and my question 00:46:04.700 --> 00:46:12.620 is: How do you care that this data which you will offer to download will keep 00:46:12.620 --> 00:46:20.470 safe? That’s the first question, and the second is: Did you think about 00:46:20.470 --> 00:46:27.830 verifications? So you are collecting a lot of data, and in a few years another person 00:46:27.830 --> 00:46:34.650 wants to see if this data was correct. So do you verify the sources like MD5 sum 00:46:34.650 --> 00:46:44.230 or so you can say “This fingerprint taken at this-day and this-time is correct?” 00:46:44.230 --> 00:46:51.220 M.C.: For the first question: I don’t think there’s really… I’m not sure (?) 00:46:51.220 --> 00:46:56.220 protected because this is a version that people posted publicly themselves. So they 00:46:56.220 --> 00:47:00.720 sort of said that they don’t want it to be protected or secured because they’re 00:47:00.720 --> 00:47:07.250 posting it on the public internet. 
So I’m not sure there’s really any reason to try 00:47:07.250 --> 00:47:11.510 to protect it when it’s something that they’ve published very publicly. 00:47:11.510 --> 00:47:16.050 And on the second one, for verification, that’s quite tricky with some of the data 00:47:16.050 --> 00:47:18.990 especially around the Intelligence Community because all of these things 00:47:18.990 --> 00:47:22.320 are secretive and it’s hard to confirm them. We can confirm them against each 00:47:22.320 --> 00:47:26.760 other like now we have multiple résumé sites on ICWATCH, so sometimes we can find 00:47:26.760 --> 00:47:31.020 the same person’s résumé on another site and compare over time and we can go 00:47:31.020 --> 00:47:34.410 find other profiles they have and try to combine as much data on the same 00:47:34.410 --> 00:47:36.310 person as possible and have it over time. 00:47:36.310 --> 00:47:41.790 Q: What I did: I made a fingerprint when I downloaded a website, I made a 00:47:41.790 --> 00:47:45.790 fingerprint and then I can say OK, this is… yeah. 00:47:45.790 --> 00:47:48.730 M.C.: Of truth verifying various actions collected, then. Yeah, I mean that's a bit harder to 00:47:48.730 --> 00:47:54.980 absolutely do that on the behalf all of the full text of the web page save, then 00:47:54.980 --> 00:48:01.350 we have it all published on GitHub so you can verify those collected then but, yeah. 00:48:01.350 --> 00:48:03.980 Herald: We’ll take the questions from up there. 00:48:03.980 --> 00:48:10.390 Jake Appelbaum: Hi, community extremist here… So I wanted to say something which 00:48:10.390 --> 00:48:13.380 is that I think what Julian did for leaking documents you’re doing for 00:48:13.380 --> 00:48:17.800 analysis. Which is really great! Because transparency is not enough – you need action! 00:48:17.800 --> 00:48:21.310 And so I just wanted to say that I hope that everyone can give and see in 00:48:21.310 --> 00:48:28.000 Transparency Toolkit a lot of material support.
And maybe a round of applause! 00:48:28.000 --> 00:48:33.750 applause 00:48:33.750 --> 00:48:37.940 Definitely the best talk at the congress and I had a couple of suggestions. But 00:48:37.940 --> 00:48:41.640 one of them is: I think it would be great if you could focus on American Domestic 00:48:41.640 --> 00:48:43.060 Police Agencies. M.C.: Hmm-mhm… 00:48:43.060 --> 00:48:48.110 Jake: In particular collecting the images of Police Academy Graduation photographs. 00:48:48.110 --> 00:48:53.340 And to be able to move in the direction of facial recognition, so that we can find 00:48:53.340 --> 00:48:56.440 Undercover Police Officers that are in our midst… 00:48:56.440 --> 00:49:01.740 applause 00:49:01.740 --> 00:49:06.640 And I think it would be great if you could create a FOIA wizard, essentially, ’cause 00:49:06.640 --> 00:49:10.720 everybody likes wizards, and who doesn’t like UNIX… So it’d be great if you could 00:49:10.720 --> 00:49:14.290 create a FOIA wizard where you could say: “I wanna know about these terms” and it 00:49:14.290 --> 00:49:19.310 would just generate automatically – maybe by partnering with MuckRock e.g. – 00:49:19.310 --> 00:49:22.890 interesting things, where there’s a kind of “Wait!”. Where you realize there’s a lot 00:49:22.890 --> 00:49:26.630 of people working on this classified program and it’s at this agency and they 00:49:26.630 --> 00:49:29.350 have a contract with this company and these are the people involved and just 00:49:29.350 --> 00:49:34.020 automatically generate those FOIAs and then get people to sort of sign up to put 00:49:34.020 --> 00:49:38.440 their name down and sort of sponsor a little transparency and to say “Oh, that’s 00:49:38.440 --> 00:49:41.610 the FOIA I wanna get behind, I’m gonna check on it, you know, once a week, I’m 00:49:41.610 --> 00:49:45.170 gonna do this thing. Through MuckRock.” I think that would be a way to take this 00:49:45.170 --> 00:49:49.410 information in a legal manner and to make it actionable.
And I think there’s lots of 00:49:49.410 --> 00:49:53.869 other interesting things you could do that are not about the law. But I leave that to 00:49:53.869 --> 00:49:57.270 the imagination of other people. It should be legal but it doesn’t need to be through 00:49:57.270 --> 00:50:02.090 legal channels like, say, FOIA. So thanks for the work that you’re doing, M.C. and 00:50:02.090 --> 00:50:06.170 I hope that you will expand it to, basically, all of the pigs of the whole 00:50:06.170 --> 00:50:10.190 world. And I would really encourage you to read Hannah Arendt’s “Eichmann in 00:50:10.190 --> 00:50:15.760 Jerusalem”, because you described a fundamental thing: these people aren’t 00:50:15.760 --> 00:50:21.280 evil. But actually, Evil itself doesn’t exist. These people are the Banality of 00:50:21.280 --> 00:50:26.040 Evil. They’re people who have soccer practice, and they have a dog, and they 00:50:26.040 --> 00:50:29.540 like to go home and fuck their wife, and they’re regular people who do drone 00:50:29.540 --> 00:50:31.520 strikes. 00:50:31.520 --> 00:50:36.340 applause 00:50:36.340 --> 00:50:40.150 Herald: Thank you. We have a question on mike 1. 00:50:40.150 --> 00:50:46.540 Q: How easy is it to add support for new databases or new sources of information? 00:50:46.540 --> 00:50:51.050 M.C.: It depends on the source and how that site is structured. But generally 00:50:51.050 --> 00:50:55.110 it’s not too difficult. Adding proper new sources does require 00:50:55.110 --> 00:51:00.060 programming at this point. But it’s not particularly complex programming and we 00:51:00.060 --> 00:51:03.350 have some libraries that make some parts of it easier, as well. And if you’re 00:51:03.350 --> 00:51:05.700 interested in adding a data source we’re more than happy to help with that. 00:51:05.700 --> 00:51:10.980 Q: Awesome! 
My favourite is the list of… the report of when people were denied 00:51:10.980 --> 00:51:16.440 security clearance and why and if their appeal was then, like, removed. 00:51:16.440 --> 00:51:18.280 M.C.: Yeah, that would be quite interesting! 00:51:18.280 --> 00:51:24.490 Q: Okay! 00:51:24.490 --> 00:51:29.050 Herald: If there’s no further questions… moment… 00:51:29.050 --> 00:51:34.140 yeah, okay! Please! 00:51:34.140 --> 00:51:44.010 Q: Yesterday it was said that we have to make sure that they know that we watch 00:51:44.010 --> 00:51:50.900 them and make sure that they know that we watch them. Because some day they will get 00:51:50.900 --> 00:51:57.680 prosecuted. So, in some way, I think you are doing exactly this. So this is 00:51:57.680 --> 00:52:12.350 brilliant. Are you already at the stage where you’re thinking you can start 00:52:12.350 --> 00:52:18.390 concrete legal actions against some individuals that you are getting 00:52:18.390 --> 00:52:24.590 information on with your tools? M.C.: We’ve been working with some lawyers towards that. 00:52:24.590 --> 00:52:29.230 We are looking to do more in this, so if you know… if you have any ideas for 00:52:29.230 --> 00:52:32.080 particular situations where this may be applicable, or lawyers that we should 00:52:32.080 --> 00:52:37.150 work with, let us know! But we’re working towards that and making some progress. 00:52:37.150 --> 00:52:41.730 Q: Thanks! 00:52:41.730 --> 00:52:44.690 Herald: Getting a question from up there, please! 00:52:44.690 --> 00:52:49.840 Q: I just wanna say that you are a visionary who is more passionate than 00:52:49.840 --> 00:52:53.420 anybody I have ever collaborated with and it’s a total honor. 00:52:53.420 --> 00:52:54.369 applause 00:52:54.369 --> 00:52:57.220 Herald: Thank you. 00:52:57.220 --> 00:53:02.780 M.C.: Yeah, and just to everyone, that’s Brennan who also works on Transparency 00:53:02.780 --> 00:53:06.710 Toolkit. 
He made the awesome UI for Harvester and Lookingglass that you saw 00:53:06.710 --> 00:53:09.470 in the Tabs of all this. 00:53:09.470 --> 00:53:14.780 applause 00:53:14.780 --> 00:53:17.900 Jake: If no one else is gonna ask a question, I’d like to ask a question which 00:53:17.900 --> 00:53:21.260 I know the answer to but no one else in the room does. And I think it’s very 00:53:21.260 --> 00:53:25.210 fascinating. I wonder if you could talk about lessons that you’ve learned from 00:53:25.210 --> 00:53:28.490 studying about the South African Resistance to Apartheid. 00:53:28.490 --> 00:53:30.020 M.C. is laughing Jake: And maybe you could talk about the 00:53:30.020 --> 00:53:34.880 things that drive you to work on these things. E.g. what inspires you to justice? 00:53:34.880 --> 00:53:39.310 E.g. experiences at MIT and maybe – I mean if you don’t want to talk about it, I’m 00:53:39.310 --> 00:53:42.940 sorry for asking it. But if you do wanna talk about it I think you can inspire 00:53:42.940 --> 00:53:48.930 everyone else here to raise their fist with you! In solidarity. 00:53:48.930 --> 00:53:57.150 M.C.: Yeah… Okay… I guess it’s been nearly 3 years now, so maybe that’s okay 00:53:57.150 --> 00:54:06.480 to talk about. 3 years ago there was this case at MIT… everyone has probably heard 00:54:06.480 --> 00:54:13.930 of Aaron Swartz and he was being prosecuted for downloading documents from 00:54:13.930 --> 00:54:22.480 JSTOR. And I was brought in trying to figure out MIT’s role in this situation, and whether we 00:54:22.480 --> 00:54:26.400 might be able to sway public opinion, with a few people in Boston. I think some of 00:54:26.400 --> 00:54:31.110 them are in this room. And we were trying to help him. And eventually, part way into 00:54:31.110 --> 00:54:35.770 the process, he became afraid and decided that it would be more risky for us to help 00:54:35.770 --> 00:54:38.890 him, with the prosecutor who might lash back, so we stopped. 
But one of the things 00:54:38.890 --> 00:54:45.650 that I did in this process was, I sent out a survey to all of the professors at MIT 00:54:45.650 --> 00:54:54.450 asking their opinion on his case. And whether they identified with his actions. 00:54:54.450 --> 00:54:59.280 And I got a lot of responses to this survey. Some were quite nice and were 00:54:59.280 --> 00:55:03.560 quite supportive. Some were very vicious, saying that he should go to jail and that 00:55:03.560 --> 00:55:09.040 he is a waste of humanity and he works at this Harvard Center for Ethics, so how is 00:55:09.040 --> 00:55:13.390 this ethical. And things like that. They were quite horrible. And initially he had 00:55:13.390 --> 00:55:17.540 access to this database and somehow over the next year, when we weren’t doing much, 00:55:17.540 --> 00:55:21.970 he lost access to this database. And he emailed me asking for access again. And 00:55:21.970 --> 00:55:26.800 back then I was on some stupid kick about research ethics and redaction and thought 00:55:26.800 --> 00:55:30.570 that there’s no reason to… It really seems that’s like “I cannot give you the answers 00:55:30.570 --> 00:55:34.770 about the names”. I was just stupid because the names are the most useful part of that 00:55:34.770 --> 00:55:42.470 data. And I kind of abandoned him, along with a lot of other people in that. And I 00:55:42.470 --> 00:55:50.119 feel like if I had given him the names that might have been something that could 00:55:50.119 --> 00:55:53.490 be used to find supporters within MIT or people who were rallying against him. And 00:55:53.490 --> 00:55:56.050 I don’t think it would have made a huge difference but it might have made just a 00:55:56.050 --> 00:56:02.140 little bit. And that was one of the things that really showed me the power of data on 00:56:02.140 --> 00:56:06.190 individuals and the role of individuals within institutions. And I feel like I 00:56:06.190 --> 00:56:10.780 really failed there. 
So I don’t want to do that again. 00:56:10.780 --> 00:56:16.270 applause 00:56:16.270 --> 00:56:20.540 Herald: Thank you. Unfortunately, we need to wrap up because we are out of time. 00:56:20.540 --> 00:56:26.900 Thank you for attending this very interesting lecture and, quite touching 00:56:26.900 --> 00:56:28.230 in the end. 00:56:28.230 --> 00:56:33.780 postroll music 00:56:33.780 --> 00:56:38.350 Subtitles created by c3subtitles.de in 2016. Join and help us do more!