
M. C. McGrath: Collect It All: Open Source Intelligence (OSINT) for Everyone

  • 0:00 - 0:11
    32C3 preroll music
  • 0:11 - 0:15
    M.C.: Hey! So, can you hear me OK? Yeah.
  • 0:15 - 0:20
    I am M.C. and I work on Transparency
    Toolkit along with Brennan Novak
  • 0:20 - 0:26
    and Kevin Gallagher. Basically, what
    we try to do is “Watch the Watchers”.
  • 0:26 - 0:31
    Back in May we released a database of
    over 27,000 people in the Intelligence
  • 0:31 - 0:37
    Community called ICWATCH. These are
    people who are talking about their work on
  • 0:37 - 0:42
    classified programs on the public
    internet. So we collected it using
  • 0:42 - 0:46
    search terms like the code words
    mentioned in the Snowden documents.
  • 0:46 - 0:51
    And today we’re releasing
    an update to ICWATCH
  • 0:51 - 0:56
    doubling the data in the database.
  • 0:56 - 1:01
    applause
  • 1:01 - 1:07
    And that’s already live, if
    anyone wants to look at it.
  • 1:07 - 1:12
    For the people who aren’t familiar with
    this project, its research methods and the sorts of things
  • 1:12 - 1:17
    that are available, I’d like
    to go through an interesting example of
  • 1:17 - 1:20
    what can
    be found in this database.
  • 1:20 - 1:26
    So this is Lauren Russell, and she works
    at L-3, a major intelligence contractor.
  • 1:26 - 1:31
    But she started her career as an army
    interrogator in Iraq. She says that
  • 1:31 - 1:37
    the information that she collected was
    used to capture dozens of people.
  • 1:37 - 1:40
    But part of her job was also to ensure
    safe and humane treatment of hundreds
  • 1:40 - 1:45
    of detainees. So that’s good at least. But
    then, a few years after that, she went and
  • 1:45 - 1:50
    worked for a different company called
    Exelis in Afghanistan. And this job
  • 1:50 - 1:56
    was quite different. It involved finding
    people to kill. So she says as part
  • 1:56 - 2:00
    of this work that she “utilized F3EA
    methodology to conduct analysis on raw and
  • 2:00 - 2:05
    fused HUMINT, SIGINT, and COMINT helping
    to create 125 Targeting Support Packets
  • 2:05 - 2:09
    then nominated to the Joint Priority
    Effects List (JPEL) for kinetic targeting.”
  • 2:09 - 2:14
    So there are a lot of obscure terms
    and gibberish there. And this is a pretty
  • 2:14 - 2:18
    common problem when going through these
    résumés. So I want to break down how you
  • 2:18 - 2:23
    would interpret that sentence. SIGINT, or “Signals
    Intelligence”, is what the NSA does.
  • 2:23 - 2:28
    It’s collecting data from intercepted
    communications. COMINT – Communications
  • 2:28 - 2:31
    Intelligence – is specifically Signals
    Intelligence from communication data.
  • 2:31 - 2:35
    So what the NSA does
    when they read your email.
  • 2:35 - 2:39
    HUMINT, Human Intelligence, is
    intelligence from human sources.
  • 2:39 - 2:46
    So things like data gained
    from informers or from torture.
  • 2:46 - 2:50
    The “Joint Priority Effects List” (JPEL) is a list of
    people the US military and its allies are
  • 2:50 - 2:55
    trying to kill and capture in Afghanistan.
  • 2:55 - 2:59
    F3EA stands for “Find, Fix, Finish,
    Exploit and Analyze”. It’s a rapid
  • 2:59 - 3:03
    intelligence collection and analysis
    methodology used for targeting. And
  • 3:03 - 3:07
    we recently found out in the Drone
    Papers that this is often used for
  • 3:07 - 3:13
    drone targeting. And “Kinetic Targeting”
    simply means attacking a moving target.
  • 3:13 - 3:17
    So looking at her profile again: she says
    that she “utilized F3EA methodology
  • 3:17 - 3:21
    to conduct analysis on raw and fused
    HUMINT, SIGINT and COMINT helping to
  • 3:21 - 3:25
    create 125 Targeting Support Packets
    then nominated to the Joint Priority
  • 3:25 - 3:29
    Effects List (JPEL) for kinetic targeting.” Basically
    what she means is that based on
  • 3:29 - 3:33
    intercepted communications and information
    from human sources, possibly gained under
  • 3:33 - 3:39
    duress or from torture, she is deciding
    who should be killed or captured.
  • 3:43 - 3:49
    The Intelligence Community has long
    had an attitude of “Collect It All”.
  • 3:49 - 3:53
    And General [Keith B.] Alexander
    started trying to collect all the data
  • 3:53 - 3:58
    that they could from every source.
    One of the first projects to this end
  • 3:58 - 4:03
    was something called Real Time Regional
    Gateway (RT-RG). It’s a master project to
  • 4:03 - 4:08
    store, combine, search and analyze data
    from many different sources at once.
  • 4:08 - 4:12
    Everything from intercepted communications
    to data from drones to data from
  • 4:12 - 4:18
    interrogations to even mundane things like
    traffic patterns and the price of potatoes.
  • 4:18 - 4:23
    They started this program in 2005.
    The initial version was built by SAIC
  • 4:23 - 4:27
    for use in Iraq. And these days it’s
    mostly used in Afghanistan.
  • 4:27 - 4:32
    It’s relevant to Germany, too, because according
    to documents published in “Der SPIEGEL”
  • 4:32 - 4:38
    last year Germany is the 3rd largest
    contributor to RT-RG. RT-RG’s collection
  • 4:38 - 4:41
    and analysis tools are used
    for some programs that you might have
  • 4:41 - 4:47
    heard of, too, like CoTraveller, the
    program the NSA uses to figure out who is
  • 4:47 - 4:52
    going places with whom. And there is
    a specific analytic tool that is part of
  • 4:52 - 4:58
    RT-RG, called SIDEKICK, that uses relative
    velocities to calculate this from many
  • 4:58 - 5:02
    different data sources, so that they can
    calculate it for people across networks.
  • 5:02 - 5:04
    Unfortunately, this is really
    computationally intensive because they
  • 5:04 - 5:09
    need to pre-compute all of the travel
    behaviour for all the pairs of selectors.
  • 5:09 - 5:12
    But it’s feasible for them to do
    computationally intensive things like this
  • 5:12 - 5:18
    because it’s built on
    Hadoop and Accumulo for distributed data
  • 5:18 - 5:27
    processing and storage. So they’re quite
    serious about this. The goals for RT-RG
  • 5:27 - 5:33
    are quite lofty. One of the creators, in
    an interview with “Defense News”, described
  • 5:33 - 5:37
    their aim as being able to take intercepted
    communications and integrate those
  • 5:37 - 5:42
    signals with geolocation, so that they can
    instantly find people and target them.
  • 5:42 - 5:47
    Another counter-terrorism official told
    the Wall Street Journal that RT-RG
  • 5:47 - 5:53
    literally allows them to predict the
    future, and that it’s the
  • 5:53 - 5:57
    strongest correlation tool ever. So their
    goals seem to be twofold: first,
  • 5:57 - 6:03
    to be able to kill or smite any
    potential enemies, and second, to be
  • 6:03 - 6:08
    omniscient: to know everything that’s
    happening at once, to correlate it, and to
  • 6:08 - 6:13
    use that to predict what will happen in the
    future. These goals sound a little bit beyond
  • 6:13 - 6:19
    what you would expect from someone
    who is simply trying to protect people or
  • 6:19 - 6:22
    stop terrorism. It sounds more like
    they’re trying to become some sort
  • 6:22 - 6:27
    of god who, by collecting and analyzing
    everything, knows everything that’s
  • 6:27 - 6:32
    happening everywhere and can instantly smite
    any enemies from above.
  • 6:32 - 6:37
    But the thing is, they aren’t gods. There are
    people working on these programs, and they’re
  • 6:37 - 6:40
    normal people. They have crazy
    resources and they intercept
  • 6:40 - 6:44
    a lot of data. But they also use data
    that’s freely available to anyone for
  • 6:44 - 6:50
    a lot of their work: Open Source
    Intelligence. This is a pamphlet from
  • 6:50 - 6:55
    a startup called ZeroFox that uses data
    from Social Media to track ISIS.
  • 6:55 - 7:00
    And tools like this are quite common.
    There’s another tool called “LM Wisdom”
  • 7:00 - 7:04
    that’s made by Lockheed Martin. And
    they have a wonderful promotional video
  • 7:04 - 7:09
    on their website explaining exactly how it
    works, which I’d like to play.
  • 7:09 - 7:12
    with lowered voice:
    Hopefully this’ll work…
  • 7:12 - 7:16
    audio/video starts Female Narrator:
    Social Media content has the power
  • 7:16 - 7:19
    to incite organized movements
    and sway political outcomes.
  • 7:19 - 7:23
    Person in Video: “It’s an opposition
    terrorist organization in Iran.”
  • 7:23 - 7:26
    Female Narrator: Monitoring and analyzing
    the massive and rapidly changing
  • 7:26 - 7:31
    open source intelligence data, or OSINT,
    and turning it into actionable intelligence
  • 7:31 - 7:37
    for decision-makers is an imperative.
    Lockheed Martin’s Wisdom software suite
  • 7:37 - 7:42
    offers an advanced capability to collect,
    manage and analyze vast amounts
  • 7:42 - 7:48
    of open source data. Enabling analysts
    to understand, measure and anticipate
  • 7:48 - 7:52
    real-world events through Social Media.
    Person in Video: “Think of Wisdom as your
  • 7:52 - 7:59
    eyes and ears on the web. Wisdom is
    that tool that would allow it to do this
  • 7:59 - 8:00
    at scale!”
    Female Narrator: Wisdom’s advanced
  • 8:00 - 8:05
    Big Data collection capability and data
    store automatically identify and harvest
  • 8:05 - 8:09
    online Social Networking data of
    operational interest. As well as
  • 8:09 - 8:15
    socio-cultural data from standard online
    open sources like newspaper feeds and
  • 8:15 - 8:20
    structured databases. Wisdom’s high-
    performance analytic algorithms analyze
  • 8:20 - 8:26
    the content in near realtime distinguishing
    noise from high-value information.
  • 8:26 - 8:31
    Capturing trends, sentiment and influence;
    turning open source data into predictive,
  • 8:31 - 8:36
    actionable intelligence.
    audio/video stops
  • 8:36 - 8:37
    M.C.: Yeah, so…
    applause
  • 8:37 - 8:41
    …that’s what they’re doing. And they’re
    not just using this to target terrorists.
  • 8:41 - 8:46
    It was recently revealed that they are
    helping Walmart use this to find employees
  • 8:46 - 8:50
    who are organizing for better working
    conditions, identify the main organizers,
  • 8:50 - 8:54
    and fire them, using
    data from Social Media.
  • 8:54 - 8:59
    So it’s used for corporate purposes as
    well. And LM Wisdom wasn’t even made
  • 8:59 - 9:03
    for surveillance in the first place.
    I tracked down one of the people
  • 9:03 - 9:09
    who created it. At that time he worked
    for General Electric and was hoping to
  • 9:09 - 9:14
    help NBC make tools so
    that they could figure out which sites
  • 9:14 - 9:20
    to partner with to make their videos go
    viral. So it’s not just governments that
  • 9:20 - 9:23
    are using Open Source Intelligence, because
    there are no barriers to accessing it and
  • 9:23 - 9:28
    there are many applications. There are even
    many people-search databases that
  • 9:28 - 9:31
    have information like people’s addresses,
    phone numbers, relatives,
  • 9:31 - 9:35
    and ages. And these include
    many, many people, probably everyone
  • 9:35 - 9:39
    in the US. And they’re used by many people
    for all sorts of purposes, from private
  • 9:39 - 9:48
    detectives to people who are selling
    advertisements. If this data is available
  • 9:48 - 9:53
    already and it’s used for everything from
    figuring out who to kill to stopping unions
  • 9:53 - 9:57
    from organizing to trying to sell things
    to people, why can’t we use it to
  • 9:57 - 10:01
    understand surveillance programs, too?
    Why can’t we use it to understand human
  • 10:01 - 10:05
    rights abuses? Why not use it for
    accountability? So we started to build
  • 10:05 - 10:10
    tools to do this, and in the near future
    we’d like to make it possible for anyone
  • 10:10 - 10:14
    to make something like ICWATCH or other
    databases in less than a day and without
  • 10:14 - 10:20
    programming. Our long-term goal is to build
    software similar to what the Intelligence
  • 10:20 - 10:24
    Community has: things similar to LM Wisdom,
    things similar to Real Time Regional Gateway,
  • 10:24 - 10:30
    so that people can collect all this
    information in one place and analyze it.
  • 10:30 - 10:33
    I’d like to show a demo of some of the
    tools that we’ve been working on. It’s
  • 10:33 - 10:41
    possible to just… this won’t work at all
    but we’ll see. So this is Harvester. It’s
  • 10:41 - 10:49
    a tool for collecting data from online
    sources in an automated fashion. You can
  • 10:49 - 10:53
    choose different data sources, say
    “Indeed”, which is a résumé website, and
  • 10:53 - 10:58
    say you want to find anyone who mentions
    XKeyscore; for the sake of time, let’s
  • 10:58 - 11:08
    just get people in Maryland. Then “start
    collecting”, and it might take a second
  • 11:08 - 11:13
    because it’s still a bit rough. But it
    opens a browser, finds the people
  • 11:13 - 11:19
    who mention XKeyscore in Maryland, and it
    downloads all of their résumés
  • 11:19 - 11:24
    in one place… you can kind of see them
    as they download because this is being
  • 11:24 - 11:49
    slowed down a bit right now. That’s just
    XKeyscore, so it’s fairly small.
  • 11:49 - 11:58
    Something shouted from out of the audience
    M.C.: laughs
  • 11:58 - 12:02
    applause
  • 12:06 - 12:12
    Takes a second to load,
    still kind of rough…
  • 12:12 - 12:19
    Yeah, so we’re hoping to add many different
    data sources, so that people can collect
  • 12:19 - 12:23
    data from sources online as well as just
    take a pile of PDFs on their computer,
  • 12:23 - 12:27
    point it at the directory, and it will load
    them and OCR them, and people will be able
  • 12:27 - 12:31
    to search through them
    in a database.
  • 12:31 - 12:36
    So while this is loading, why don’t I go
    and walk through some of the rest of the
  • 12:36 - 12:40
    pipeline. Our goal is to have tools
    for collecting data and loading it into
  • 12:40 - 12:47
    a database, and then tools for matching
    data from various sources on the same
  • 12:47 - 12:50
    person or the same company. So it should
    take someone’s résumés and Social Media
  • 12:50 - 12:54
    profiles and everything and link them
    together, and then also link that to the
  • 12:54 - 12:57
    companies they work or worked for, the other
    people they know, the places they’ve
  • 12:57 - 13:02
    lived. We also want tools for extracting
    things from data: to be able to go
  • 13:02 - 13:04
    through a résumé and extract all the code
    words mentioned, or to go through
  • 13:04 - 13:08
    a document and extract all the
    companies mentioned, generating
  • 13:08 - 13:13
    entities that way. And tools for searching
    through data in databases, where you can
  • 13:13 - 13:18
    run search queries and browse by
    categories, and for viewing data as
  • 13:18 - 13:24
    network graphs and maps. Let’s see if this
    is done… Right now it just shows the
  • 13:24 - 13:33
    raw JSON. The connection between tools
    is a bit rough. But we should be able to
  • 13:33 - 13:41
    index the data and load it into a search
    tool. It’ll take a second. Hopefully this
  • 13:41 - 14:06
    works. Oh, it’s going! Yeah… So it takes
    a little bit. Index… And you can see…
  • 14:06 - 14:14
    The data will be at… It kind of circle
    loaded into a subscriptions list…
  • 14:14 - 14:17
    So there’s a searchable database on all the
    people who are working on XKeyscore
  • 14:17 - 14:27
    in Maryland!
    applause, cheers from audience
  • 14:27 - 14:33
    So I think that using this Free
    Software and open data is really the key,
  • 14:33 - 14:38
    because we have far, far fewer resources
    than the Intelligence Community. And we
  • 14:38 - 14:41
    don’t even have the resources that a
    company like Lockheed Martin has. We can’t
  • 14:41 - 14:45
    build all of this software internally, or
    hope to anticipate every future
  • 14:45 - 14:51
    use in order to help people adapt to
    it. Having people be able to take our
  • 14:51 - 14:54
    data, take our tools and adapt them to their
    own situations is absolutely key to
  • 14:54 - 14:58
    actually ensuring that they’re useful. And
    there are also a lot of open source tools
  • 14:58 - 15:01
    from the Intelligence Community,
    like Accumulo, the data store
  • 15:01 - 15:05
    that’s used in Real Time Regional Gateway.
    It was released by the NSA as open
  • 15:05 - 15:11
    source. And Gaffer, which is a graph
    database recently released by GCHQ.
  • 15:11 - 15:16
    So we can sort of take those and possibly
    also build on them in some cases,
  • 15:16 - 15:18
    as well as using the same tools.
    chuckles
  • 15:18 - 15:22
    And it’s appropriate because our goal is
    to enable people to collect and use
  • 15:22 - 15:28
    information in the same way that the
    Intelligence Community can.
  • 15:28 - 15:32
    While I think that we should aim
    to collect it all and collect all the
  • 15:32 - 15:35
    information that we can, I think we also
    need to be careful to avoid a lot of the
  • 15:35 - 15:40
    mistakes that the Intelligence Community
    has made, because some of the effects are
  • 15:40 - 15:46
    quite bad and lead to people being killed
    for no reason at all, which is quite
  • 15:46 - 15:50
    absurd. And the main one of these,
    I think, is dehumanizing people.
  • 15:50 - 15:53
    Torture techniques are specifically
    designed to dehumanize people.
  • 15:53 - 15:56
    When people are looking at data that
    they’ve intercepted, they’re not looking
  • 15:56 - 16:00
    at a person, they’re looking at meta-data,
    they’re looking at numbers on a screen.
  • 16:00 - 16:06
    It’s not something that’s easy to find a
    way around. When I was working on ICWATCH
  • 16:06 - 16:11
    I was grappling with this problem quite a
    bit. So I decided to try to see who some
  • 16:11 - 16:16
    of these people are and try to put faces
    to these issues. So I started going to
  • 16:16 - 16:19
    Intelligence conferences. Many of these
    conferences are quite open and you can
  • 16:19 - 16:24
    just go in. And I wasn’t that out of place
    either: I just told people that I made
  • 16:24 - 16:27
    tools to collect and analyze
    Open Source Intelligence.
  • 16:27 - 16:29
    laughter and applause
  • 16:29 - 16:38
    There are many people doing similar
    things out there, too. Like, I met the
  • 16:38 - 16:42
    ZeroFox people, who were one of the examples
    I showed earlier, at one of these conferences.
  • 16:42 - 16:45
    They are actually very, very nice. And
  • 16:45 - 16:48
    there were also some people who were quite
    interested in what I was doing. There was
  • 16:48 - 16:51
    one recruiter from Northrop Grumman who
    seemed somewhat interested in hiring me,
  • 16:51 - 16:54
    and I looked her up later and found
    a bunch of job listings where she was
  • 16:54 - 16:59
    trying to hire people to work on
    programs related to XKeyscore. It wasn’t
  • 16:59 - 17:04
    all good: I got kicked out of one conference.
    I also got some strange requests, like there was
  • 17:04 - 17:10
    one guy who was trying to figure out how to
    use open data to help venture capitalists
  • 17:10 - 17:15
    figure out what porn the founders of the
    startups they funded watched. I’m not sure
  • 17:15 - 17:18
    that’s even possible. But it was really
    weird and he was asking me for help and
  • 17:18 - 17:20
    I was like “I don’t think I can
    help with that, sorry!”
  • 17:20 - 17:27
    laughter and applause
  • 17:27 - 17:31
    Of course there were some negative comments
    on things like Manning and Snowden
  • 17:31 - 17:34
    and some confusion, like there was someone
    who was making insider threat detection
  • 17:34 - 17:39
    software, who was talking about how it
    would stop a situation like when Snowden
  • 17:39 - 17:43
    leaked documents to WikiLeaks, and
    things like that. So people don’t actually
  • 17:43 - 17:46
    know what’s going on. But generally most
    of them were decent people; some of
  • 17:46 - 17:49
    them were quite nice, some of them were
    quite funny. And some of them really
  • 17:49 - 17:53
    seemed to think that what they were doing
    was saving lives. So they’re not evil people
  • 17:53 - 17:58
    who want to hurt others but they’re not
    infallible either. They’re human beings.
  • 17:58 - 18:03
    And our strategy, looking at individuals,
    scares a lot of people. But what you
  • 18:03 - 18:10
    have to realize is that institutions are
    made up of people. It’s easier to just
  • 18:10 - 18:13
    look at the institution. It’s easier to
    just look at an abstract program. Just
  • 18:13 - 18:16
    like it’s easier not to think of the
    person who you just decided to kill in a
  • 18:16 - 18:21
    drone strike as a person. That’s why these
    things continue to happen. I think that
  • 18:21 - 18:25
    there’s a lot of benefit to looking at
    people as people, both to avoid some of
  • 18:25 - 18:29
    the problems the Intelligence Community
    has and because people’s data trails
  • 18:29 - 18:32
    are part of the data trails of the
    institutions. And if we’re only looking at
  • 18:32 - 18:36
    institutions we’re missing part of the
    data trail the people leave.
  • 18:36 - 18:41
    Though, of course, no one person is
    responsible for the wrongdoings of the
  • 18:41 - 18:47
    Intelligence Community. So we shouldn’t
    demonize any one person. But…
  • 18:47 - 18:50
    these are the people who go to work every
    day and perpetuate the actions of the
  • 18:50 - 18:55
    Intelligence Community. So I think everyone
    involved is a little bit at fault.
  • 18:55 - 18:58
    And the other benefit of looking at people
    as people is that we can start to
  • 18:58 - 19:01
    understand them. You have to
    understand what their hopes are, what
  • 19:01 - 19:05
    their fears are. How they see the world.
    What upsets them. And what might cause
  • 19:05 - 19:09
    them to change their behaviour. And from
    that we can start to maybe come up with
  • 19:09 - 19:13
    alternatives. So let’s look at some of
    these people and look at some of their
  • 19:13 - 19:22
    stories. This is Jason Epperson. He works
    on Intelligence collection for Special
  • 19:22 - 19:27
    Operations. In his spare time he enjoys
    coaching children’s sports. He currently
  • 19:27 - 19:32
    works at the US Special Ops Command
    (USSOCOM), helping different agencies
  • 19:32 - 19:35
    collect data, share it, and figure out
    what data they need, just generally
  • 19:35 - 19:39
    helping them integrate it. He
    started his career back in 1998, also
  • 19:39 - 19:44
    working on collecting data for Special
    Operations. Then later, in 2004, he went
  • 19:44 - 19:50
    to work at the US Central Command in the
    NSA Cryptologic Services Group, where he was
  • 19:50 - 19:53
    focused on tracking down high-value
    targets and individuals. And he claimed
  • 19:53 - 19:57
    that as a result of his work, numerous
    high-value individuals were captured
  • 19:57 - 20:04
    or killed. It is especially interesting
    because he was working on this in 2007
  • 20:04 - 20:09
    when PRISM was launched, and at the top
    of his résumé he lists among his specialties
  • 20:09 - 20:15
    PRISM. It’s possible that’s a different
    PRISM, but based on his background it
  • 20:15 - 20:21
    might not be. So I think it probably is
    actually PRISM.
  • 20:21 - 20:28
    Then after he was working there he went
    and started working on counter-radicalization
  • 20:28 - 20:31
    efforts, things like boosting the
    capacity of Muslim Faith Leaders to win
  • 20:31 - 20:34
    hearts and minds and establishing
    competing social networks to counter
  • 20:34 - 20:37
    Al Qaeda ideology. And he’s very clear in
    his job description that he’s not killing
  • 20:37 - 20:43
    people; he’s just helping allies of the US
    figure out who is who and who to set Interpol notices for.
  • 20:43 - 20:47
    But the most interesting thing about him
    isn’t any of his jobs. It’s this
  • 20:47 - 20:51
    publication that he has at the bottom of
    his résumé called “An Examination of the
  • 20:51 - 20:56
    Effect of Government Data Mining on US
    Citizens”. And this is clearly an area where
  • 20:56 - 21:00
    he has a lot of expertise. And he
    presented this at a conference back in
  • 21:00 - 21:05
    2010. I still don’t have a copy yet. It’s
    not easily available. I think it might be
  • 21:05 - 21:10
    possible to get either by buying it from
    the company directly or by going to the
  • 21:10 - 21:15
    Library of Congress, which seems to have
    some copies of the conference proceedings.
  • 21:15 - 21:20
    That could be quite interesting. First,
    because he was relatively high up: he was
  • 21:20 - 21:24
    in command of nearly 400 people back when
    PRISM started, and he was working with the
  • 21:24 - 21:28
    NSA. It’s possible that he had some role
    early on in the program and this might
  • 21:28 - 21:34
    provide some clues. And then also the
    little “data mining on US Citizens” bit
  • 21:34 - 21:37
    in the title is kind of interesting
    because that’s supposed to be the last
  • 21:37 - 21:40
    protection. I think that’s kind of a superficial
    protection, because most US citizens
  • 21:40 - 21:43
    wouldn’t find it very comforting if the
    Chinese Government said: “Oh yeah, we have
  • 21:43 - 21:47
    a mass surveillance program but we only
    spy on people who aren’t Chinese citizens.”
  • 21:47 - 21:51
    That’s not really comforting to them, so I
    don’t see why it would be. But it’s been
  • 21:51 - 21:55
    the one thing that they keep repeating:
    “We don’t collect it on US citizens.” And
  • 21:55 - 22:00
    just seeing that on the title of a paper
    is like a tiny admission that maybe they
  • 22:00 - 22:08
    do. So some of these profiles tell other
    interesting stories about people’s lives.
  • 22:08 - 22:12
    If you’ve seen any of my other talks, this
    is someone you’ve heard me talk about
  • 22:12 - 22:16
    a lot. Solomon Varnado. He spent most of
    his life in the military intelligence
  • 22:16 - 22:20
    community, focused on Signals Intelligence
    and Geolocation. He took down his résumé
  • 22:20 - 22:26
    after ICWATCH launched. But I actually
    recently found another résumé of his on
  • 22:26 - 22:31
    another website that has additional
    information, like that on the side, in the
  • 22:31 - 22:36
    military he ran diversity programs and a
    sexual assault prevention program and
  • 22:36 - 22:39
    things like that. I first came across this
    profile because he mentions a lot of
  • 22:39 - 22:45
    interesting code words. This is probably
    the first known mention of XKeyscore back
  • 22:45 - 22:55
    in 2004/2005. But these aren’t the most
    interesting part of his résumé. Later on,
  • 22:55 - 22:58
    after he works on Intelligence
    Collection Management (just standard
  • 22:58 - 23:05
    Signals Intelligence collection), he goes
    and works for L-3 Stratis. And there he
  • 23:05 - 23:09
    says that he identified, collected, and
    performed direction finding
  • 23:09 - 23:13
    of specified target signals using
    PENNANTRACE, DISPLAYVIEW and CEGS.
  • 23:13 - 23:14
    But I wasn't sure what “PENNANTRACE” was
  • 23:14 - 23:17
    so I found a definition
    very conveniently located in
  • 23:17 - 23:22
    another résumé. It described PENNANTRACE as an
    airborne collection platform.
  • 23:22 - 23:28
    That sounds like some sort of
    Signals Intelligence collection platform.
  • 23:28 - 23:32
    And the other interesting thing about this
    job is that he said that he called for
  • 23:32 - 23:36
    external review of intelligence management
    processes, which is not something I
  • 23:36 - 23:39
    normally see. And he was there for a fairly
    short time, only a couple of months,
  • 23:39 - 23:43
    after staying at most of his other jobs
    for over a year. And then at his next job
  • 23:43 - 23:45
    he was also there for
    only a couple of months.
  • 23:45 - 23:48
    He was working at Pluribus International,
    also on Drone Intelligence,
  • 23:48 - 23:50
    this time definitely Drone Intelligence
    on Predator drones, because he
  • 23:50 - 23:54
    mentions Airhandler, which we now know
    more about thanks to the catalogue
  • 23:54 - 23:58
    released by The Intercept. It’s a
  • 23:58 - 24:02
    geo-processing system for geolocation
    data from Predator drones.
  • 24:02 - 24:06
    And the update to ICWATCH
    includes data on all of the code words
  • 24:06 - 24:14
    mentioned in that catalogue. And then
    he leaves the Intelligence Community
  • 24:14 - 24:19
    entirely after that job. And he goes and
    works as a used car salesman at this used
  • 24:19 - 24:23
    car dealership. And it turns out,
    according to this other résumé
  • 24:23 - 24:26
    that I just found, he’s actually quite
    a successful used car salesman.
  • 24:26 - 24:28
    He’s won a bunch of awards.
    He’s one of the best
  • 24:28 - 24:31
    salesmen in the region. So he’s doing quite
    well. And he won a bunch of awards
  • 24:31 - 24:32
    in the military too,
    so it seems like
  • 24:32 - 24:36
    he’s very committed to whatever he
    does. But still, that’s quite a huge career
  • 24:36 - 24:40
    change, and it sounds like maybe he was
    starting to get upset with how
  • 24:40 - 24:43
    things were really being done, and he
    couldn’t figure out a way to fix it after
  • 24:43 - 24:47
    calling for external review,
    so he just left.
  • 24:49 - 24:54
    applause
  • 24:54 - 25:02
    And then, this is Michael Dial. Michael
    Dial is a pipe fitter and a plumber. And
  • 25:02 - 25:08
    this is him with his family. He’s actually
    a pipe fitter and a plumber. But he’s not
  • 25:08 - 25:14
    just any pipe fitter. He has a security
    clearance. And he goes and he fits pipes
  • 25:14 - 25:18
    in secure facilities. As you might expect,
    he does a lot of pipe fitting for naval
  • 25:18 - 25:27
    ships. He also goes to
    embassies and other secret locations in
  • 25:27 - 25:38
    Afghanistan, Iraq, Ecuador and Serbia
    and sets up their pipes. He also did some
  • 25:38 - 25:44
    pipe fitting in Djibouti at some sort of
    Homeland Security facility, which,
  • 25:44 - 25:50
    coincidentally, is also where many of the
    drone programs are run out of. So there
  • 25:50 - 25:55
    are some interesting cases like that, where
    there are people like Michael Dial who
  • 25:55 - 25:59
    aren’t directly involved in Intelligence
    at all. But the information in their
  • 25:59 - 26:05
    résumés still provides very interesting,
    useful details about where secret
  • 26:05 - 26:08
    facilities are located and other aspects
    of the Intelligence Community. Because
  • 26:08 - 26:11
    secret facilities don’t just materialize
    out of thin air. They need people to build
  • 26:11 - 26:16
    them, they need people to operate them.
    So from tracking down these people we can
  • 26:16 - 26:19
    start to map them. And then there are other
    useful things like we can figure out which
  • 26:19 - 26:26
    companies clean the NSA. I’m sure that
    has all sorts of useful applications.
  • 26:26 - 26:34
    This is Eleana Costa. He lives in D.C. and
    he works for the DOD. And this is him at his
  • 26:34 - 26:38
    High School Graduation back in 1988. He
    has been working in Military and
  • 26:38 - 26:45
    Intelligence for nearly 20 years. And back
    in 2003, he worked on PsyOps programs.
  • 26:45 - 26:51
    Specifically, he worked on PsyOps programs
    in Paraguay, Colombia and Bolivia. And
  • 26:51 - 26:56
    these were in support of the DEA, the Drug
    Enforcement Administration, and the CIA.
  • 26:56 - 26:59
    And there are a few other résumés in ICWATCH
    that mention involvement in PsyOps in
  • 26:59 - 27:04
    Latin America for the DEA. It seems to me
    quite an extensive thing, especially since
  • 27:04 - 27:09
    I didn’t collect any data on this
    specifically, and I suddenly had a bunch
  • 27:09 - 27:14
    of people in the database on this, so it’s
    maybe worth looking into a bit. And then
  • 27:14 - 27:17
    after that he went and he worked on
    PsyOps programs in Iraq. So it’s kind of
  • 27:17 - 27:22
    interesting. Then he went and worked
    at the DOD on Human Intelligence.
  • 27:22 - 27:27
    The other interesting thing about Kiliana
    Costa is that he’s one of the people who
  • 27:27 - 27:34
    deleted his résumé after ICWATCH
    launched and that was how I found him.
  • 27:34 - 27:41
    laughter and applause
  • 27:41 - 27:46
    So after ICWATCH launched, a lot of people
    were positively interested in it, but we
  • 27:46 - 27:49
    also got a lot of threats. It’s
    really absurd, because all we’re doing is
  • 27:49 - 27:53
    collecting information that people
    explicitly, independently, willingly
  • 27:53 - 27:57
    posted online about their profession
    (we’re not posting addresses or
  • 27:57 - 28:03
    anything like that) and making it more
    searchable, just like Google does.
  • 28:03 - 28:07
    But a lot of people in the Intelligence
    Community contacted us and for the first
  • 28:07 - 28:12
    few weeks, we saw a new response
    every day. Some of these were kind of
  • 28:12 - 28:18
    interesting and revealed some sort of
    nonsensical mindsets of people in the
  • 28:18 - 28:25
    Intelligence Community. Like this guy.
    This is Alexander Irinovitch. He sent me
  • 28:25 - 28:29
    a… actually a nice email, a very nice
    email. It was really nice. Saying that he
  • 28:29 - 28:33
    couldn’t understand why he was in ICWATCH
    because he wasn’t involved in surveillance.
  • 28:33 - 28:37
    He was working at a private company that
    had nothing to do with surveillance.
  • 28:37 - 28:43
    So I looked at his profile and I saw that
    he was working at Unit 8200, the Israeli
  • 28:43 - 28:47
    Intelligence unit, which, okay, there's
    mandatory military service, so that's not that
  • 28:47 - 28:51
    weird, though he was there for several
    years, not just the mandatory portion,
  • 28:51 - 28:58
    and this is the Intelligence unit that
    spies on Palestinians. And then I looked
  • 28:58 - 29:03
    at where he works now. And he works for a
    company called Verint. According to their
  • 29:03 - 29:09
    website they make software for analyzing
    data from wiretaps. So I think that has to
  • 29:09 - 29:13
    do with surveillance. I’m not sure why he
    interpreted that as “nothing to do with
  • 29:13 - 29:17
    surveillance”. But it’s kind of an interesting
    interpretation. I think it makes sense for him
  • 29:17 - 29:20
    to be in the database, but of course,
    for any particular profile, there is
  • 29:20 - 29:23
    some noise. So it’s up to whoever
    is looking at it to make the call
  • 29:23 - 29:26
    and do the research.
  • 29:26 - 29:30
    And sometimes other people who complained
    also helped us find interesting details.
  • 29:30 - 29:34
    Like this guy, Joshua Lively. He’s one of
    the people who reported us to the FBI for
  • 29:34 - 29:43
    domestic terrorism. He worked as a
    linguist at this company. I looked at
  • 29:43 - 29:48
    his profile and he mentions a lot
    of interesting code words in it.
  • 29:48 - 29:52
    Some of them didn’t make much sense
    at the time, like this thing called ZB.
  • 29:52 - 29:56
    And then a few weeks later The Intercept
    released this article on a thing called
  • 29:56 - 30:04
    Skynet. It uses machine learning
    to analyze travel data from telecom
  • 30:04 - 30:08
    providers. And ZB is one of the databases
    it uses, and he, coincidentally, has a lot
  • 30:08 - 30:12
    of the databases that are used in this
    listed in his skills. And he's a linguist
  • 30:12 - 30:15
    proficient in the language that’s used
    in the region that’s mainly targeted
  • 30:15 - 30:19
    by this… So I’m not sure if he’s involved
    in this particular program. But it seems
  • 30:19 - 30:23
    like he’s involved in something similar.
  • 30:23 - 30:28
    So it’s quite interesting. Generally there
    are a lot of angry people in the
  • 30:28 - 30:32
    Intelligence Community. Some were nicer
    than others and were just asking questions,
  • 30:32 - 30:36
    like “Can you please take my profile
    down?”, some were more afraid, and some
  • 30:36 - 30:41
    were more violent and sent things like
    death threats. Our server started getting
  • 30:41 - 30:44
    hit pretty hard and ICWATCH kept going
    down. We wanted to be sure that we weren’t
  • 30:44 - 30:48
    going to be compelled to take the data
    down in some way. And the easiest way not
  • 30:48 - 30:52
    to be compelled to take the data down is
    to make it so you can’t really take the
  • 30:52 - 30:56
    data down yourself. Then people have
    much less incentive to go after you.
  • 30:56 - 31:01
    So we moved ICWATCH to Wikileaks which has
    been great, and they’ve been wonderful
  • 31:01 - 31:04
    helping with all this. So thank you,
    Wikileaks!
  • 31:04 - 31:10
    applause
  • 31:10 - 31:12
    from the audience: You're welcome!
  • 31:12 - 31:14
    M.C.: chuckles
    laughter
  • 31:14 - 31:18
    As I mentioned earlier a lot of people are
    taking down their résumés in response to
  • 31:18 - 31:25
    ICWATCH. Specifically 1.030 people have,
    out of the original 27.000. And others have
  • 31:25 - 31:29
    edited them and made them private. So as
    part of the update in addition to doubling
  • 31:29 - 31:35
    the number of résumés available we also
    recollected all of the initial résumés
  • 31:35 - 31:40
    and you can go on the site and see which
    ones are removed, which ones are made
  • 31:40 - 31:44
    private, which ones have been modified and
    all of that is diffed, so you can easily see
  • 31:44 - 31:51
    how that’s changed.
    applause
  • 31:51 - 31:55
    And some of these changes reveal details that
    many people wish
  • 31:55 - 32:01
    they hadn’t posted in the first place. But
    they also provide useful updates on where
  • 32:01 - 32:05
    people are working, because they let us
    track people as they move from job to job.
  • 32:05 - 32:11
    For example, there’s this guy, Michael Acosta,
    from the original ICWATCH. From 2011
  • 32:11 - 32:16
    to 2012 he worked at Guantanamo. He
    was primarily trying to find out about
  • 32:16 - 32:22
    potential attacks on Guantanamo itself.
    He monitored various detainees and
  • 32:22 - 32:28
    collaborated with the Behavioural Science
    Team and was trying to figure out if
  • 32:28 - 32:33
    detainees were planning some sort of coup,
    I guess. And then he started working for
  • 32:33 - 32:41
    the Air Force. And here he was working on
    drone intelligence and targeting, and
  • 32:41 - 32:44
    things like how he was responsible for
    “the production, maintenance and upgrade of
  • 32:44 - 32:48
    DGS-2 mission critical Intelligence
    databases which include high value target
  • 32:48 - 32:53
    development folders”, like the things used
    for JPEL targeting, regional fairbriefs,
  • 32:53 - 32:58
    mission storyboards and mission target
    logs which document FMV mission rollups.
  • 32:58 - 33:01
    But the most interesting thing on this
    résumé isn’t any of those things.
  • 33:01 - 33:06
    It’s the thing that changed between the
    original launch of ICWATCH and now.
  • 33:06 - 33:09
    And that’s that he moved and started
    working for a different company.
  • 33:09 - 33:14
    He started working for this company
    called SOS International
  • 33:14 - 33:21
    as an All Source Analyst. He unfortunately
    had to leave the position that he had
  • 33:21 - 33:25
    on the side coaching high school baseball,
    which he seemed to really like.
  • 33:25 - 33:28
    And he must have liked it, because right now
    he’s looking for baseball opportunities
  • 33:28 - 33:32
    in Germany. So he seems to be in Germany
    working for this company called SOS
  • 33:32 - 33:35
    International that I had never heard of
    before. So I went on their website, and they
  • 33:35 - 33:38
    have a list of the cities in Germany
    that they operate in. These six cities,
  • 33:38 - 33:44
    along with Guantanamo and a number of
    other sketchy locations. And based on
  • 33:44 - 33:48
    Michael Acosta’s past record of working at
    Guantanamo and on Drone targeting and
  • 33:48 - 33:50
    things like that it sounds like this
    company is probably doing something quite
  • 33:50 - 33:56
    sketchy. By tracking changes to where
    people work, we can start to find things
  • 33:56 - 34:00
    like this that we might not otherwise think to
    look at, or might not otherwise think of
  • 34:00 - 34:03
    as interesting.
  • 34:03 - 34:10
    But it’s not just open data that we
    collect, because the same tools for
  • 34:10 - 34:14
    collecting and analyzing open data
    are also useful for other data sets.
  • 34:14 - 34:19
    For example, we made a search tool
    in collaboration with Courage Foundation
  • 34:19 - 34:22
    for all of the published Snowden documents
    that allows you to search the full text of
  • 34:22 - 34:26
    the documents, browse which code words
    are in these documents, see documents that
  • 34:26 - 34:33
    mention particular countries, see the full
    PDFs and articles. And we also made a…
  • 34:33 - 34:37
    when the Hacking Team data came out this
    summer, we mirrored the data and became one
  • 34:37 - 34:42
    of the primary mirrors of it. We had a
    server lying around
  • 34:42 - 34:44
    with a lot of space and figured that none
    of the other people had that, so we put it
  • 34:44 - 34:52
    up. And that got a lot of traffic: it got
    about 57 million hits in the first 2 days.
  • 34:52 - 34:54
    And soon we realized there was a problem
    where our hosting provider charged a lot for
  • 34:54 - 34:59
    bandwidth, and it cost us $48 every time
    someone decided to download the 400 GB
  • 34:59 - 35:07
    with wget. So that was interesting, but
    it’s been resolved now. It hopefully made
  • 35:07 - 35:11
    the data more accessible to people who
    don’t have 400 GB of hard drive space
  • 35:11 - 35:16
    available or enough internet connectivity
    to download that. So then we’ve also made
  • 35:16 - 35:21
    a search tool for all of the Hacking Team
    emails; that has a search interface that
  • 35:21 - 35:25
    lets you browse them like you would in a
    normal email client with threading, and a
  • 35:25 - 35:29
    network graph so that you can see the
    connections between senders and
  • 35:29 - 35:40
    recipients. The Intelligence Community
    has a variety of collection disciplines:
  • 35:40 - 35:45
    SIGINT, OSINT, HUMINT, Measurement
    and Signature Intelligence, Symmetry
  • 35:45 - 35:49
    Intelligence. They have all these
    different sources that they’re gathering
  • 35:49 - 35:56
    data from. I think that we should try to
    duplicate this, because there are a lot
  • 35:56 - 35:58
    of different sources that we can gather
    data from as well, and we need to find
  • 35:58 - 36:02
    ways to better collect data from all these
    sources and to fuse them together.
  • 36:02 - 36:06
    These are some of the ones that I’ve
    been spending a lot of time looking at.
  • 36:06 - 36:10
    And there’s open source Intelligence
    things like ICWATCH where you’re
  • 36:10 - 36:13
    collecting data from purely public
    sources. But this is just part of the wider
  • 36:13 - 36:18
    ecosystem that we can draw on. This is
    mostly information that people and
  • 36:18 - 36:21
    institutions make about themselves
    publicly, either intentionally or
  • 36:21 - 36:26
    unintentionally. And it’s really difficult
    to use because there’s a lot of it and it
  • 36:26 - 36:30
    needs to be collected and matched up and
    pulled together in a browsable way for
  • 36:30 - 36:33
    people to be able to use it. So you can’t
    really just manually go and use it at scale.
  • 36:33 - 36:40
    You can do it a little bit but not nearly
    enough. And so we’re working on making
  • 36:40 - 36:45
    this easier to use. The other sort of data
    is anonymously leaked documents,
  • 36:45 - 36:47
    documents that were sent to
    journalists by sources who think they should be
  • 36:47 - 36:52
    public, and these often pretty explicitly
    reveal corruption, human rights abuses
  • 36:52 - 36:56
    or other issues. But this can also be used
    to collect more data. Like we used the
  • 36:56 - 37:01
    published Snowden documents very heavily
    to find code words that we could use to
  • 37:01 - 37:05
    collect the data in ICWATCH. And once we
    start to collect data on secret things
  • 37:05 - 37:11
    that until recently were not known at all, but
    now are, and we can find data on that, we
  • 37:11 - 37:14
    can start to find data on unknown code
    words and unknown things that we might not
  • 37:14 - 37:21
    otherwise recognize. And then there’s data
    released by governments, from FOIA
  • 37:21 - 37:25
    requests through open data initiatives.
    This, of course, can be spun or things can
  • 37:25 - 37:31
    be held back. So it’s not ideal to use on
    its own. But it can be used, like the other
  • 37:31 - 37:35
    2 types, in combination with each other.
    You can use that to provide context, you
  • 37:35 - 37:43
    can use open source data to frame FOIA
    requests and things like that. So the goal
  • 37:43 - 37:47
    of Transparency Toolkit is to make it
    easier to collect all these types of data
  • 37:47 - 37:51
    in one place and to start to use this data
    in the same ways that the Intelligence
  • 37:51 - 37:55
    Community uses the data collected from
    all the various collection disciplines.
  • 37:55 - 38:00
    Except our goal isn’t to kill people or be
    some sort of omniscient, God-like being;
  • 38:00 - 38:04
    we just want to build some sort of
    external structure of accountability.
  • 38:04 - 38:10
    To make it easier to uncover and understand
    things like surveillance programs or human
  • 38:10 - 38:15
    rights abuses or corruption. And when we
    can find the people and companies that are
  • 38:15 - 38:18
    involved in things like surveillance we
    can start to map who’s doing what.
  • 38:18 - 38:22
    And we can start to request information
    about specific contracts. And we know who
  • 38:22 - 38:25
    we can ask questions about particular
    programs. And then we can start to use the
  • 38:25 - 38:30
    data to start legal cases against specific
    companies. And we can start to take more
  • 38:30 - 38:35
    concrete actions than we would be able to,
    otherwise, if we were dealing simply in
  • 38:35 - 38:39
    theory or in guesses as to
    what’s going on.
  • 38:39 - 38:42
    So open source intelligence lets us
    be more proactive and more direct with
  • 38:42 - 38:49
    our techniques. And it also lets us find
    some of this information earlier, because
  • 38:49 - 38:52
    many of the programs mentioned in the
    Snowden documents were mentioned first
  • 38:52 - 38:59
    in open data sources. And if we
    can start to figure out where these are
  • 38:59 - 39:02
    and start to figure out what they are,
    then we know what data we’re missing and
  • 39:02 - 39:05
    we can start to go after it with FOIA
    requests or trying to find it by other
  • 39:05 - 39:14
    means. But all of this is a really, really
    big project and we can’t… this is not
  • 39:14 - 39:17
    going to work if it’s just us working on
    it. We need to work with other people.
  • 39:17 - 39:21
    We need to work with activists who have
    ideas of how they want to use the data.
  • 39:21 - 39:24
    We need to work with journalists that
    collect the data and write stories about
  • 39:24 - 39:27
    it. We need to work with human rights
    lawyers to help them with their research
  • 39:27 - 39:30
    and help them build legal cases based on the
    findings. We need to work with NGOs and
  • 39:30 - 39:35
    human rights researchers who want to
    collect and use open data in their work.
  • 39:35 - 39:38
    And we need more people going through
    databases like ICWATCH. This doesn’t
  • 39:38 - 39:42
    require any special expertise. You gain
    the knowledge that you need as you’re
  • 39:42 - 39:46
    going through them looking up terms. It’s
    not easy but it can be quite interesting
  • 39:46 - 39:52
    once you combine all of these obscure
    terms and it’s like “Oh, that’s what
  • 39:52 - 39:57
    they’re doing!” and oftentimes what
    they’re doing is something entirely absurd
  • 39:57 - 40:01
    like reading all your email
    or killing people.
  • 40:01 - 40:06
    And we also need software developers to
    help develop software and help us figure
  • 40:06 - 40:11
    out how all of these tools should fit
    together. So if anyone’s interested in
  • 40:11 - 40:15
    working with us to take on the
    Intelligence Agencies of the world and
  • 40:15 - 40:18
    figure out what they’re doing please let
    us know. I know it sounds a bit insane,
  • 40:18 - 40:23
    and I know that they have far more
    resources and far more experience, but if
  • 40:23 - 40:28
    we keep ignoring the situation and we
    continue as we are now, making scattered
  • 40:28 - 40:31
    attempts to change things that aren’t
    coordinated and are based on limited
  • 40:31 - 40:36
    information, nothing is going to change
    long-term. So I think we need to collect
  • 40:36 - 40:41
    all the information we can and figure out
    how to effectively combine it and use it
  • 40:41 - 40:46
    for concrete goals. And I think we need
    to do this with free software and open
  • 40:46 - 40:49
    data, because against such powerful
    adversaries they’re probably the best
  • 40:49 - 40:51
    hope we have.
  • 40:51 - 41:02
    applause
  • 41:02 - 41:06
    Herald: Thank you, thank you so much!
    Now we have the round of Q&A,
  • 41:06 - 41:12
    for anyone who would like to ask a question,
    please come forward to the mics on both sides
  • 41:12 - 41:17
    of this hall. We'll start
    taking questions from…
  • 41:17 - 41:18
    nods towards the first person asking
    …yeah.
  • 41:18 - 41:25
    Q: So I’d like to ask about documents
    which are scans, which are sometimes
  • 41:25 - 41:30
    released as official open source
    information. What kind of workflow do you
  • 41:30 - 41:36
    have, if you have any kind of
    workflow, for OCR on these?
  • 41:36 - 41:41
    M.C.: A serious (?) that depends on the
    document. There’s some open source
  • 41:41 - 41:47
    software called Tesseract that’s quite
    good, but it doesn’t always work in cases
  • 41:47 - 41:51
    where there needs to be more specialized
    parsing. I like to use something
  • 41:51 - 41:55
    called ABBYY FineReader, which is,
    unfortunately, not open source, and we are
  • 41:55 - 41:59
    looking for an alternative. We used it for the
    published Snowden documents, because we
  • 41:59 - 42:04
    needed to extract the classification
    headers and that wasn’t working so well with
  • 42:04 - 42:07
    Tesseract. But Tesseract
    works for most things.
  • 42:07 - 42:10
    listens to unrecorded comment
    from the audience
  • 42:10 - 42:15
    Yeah.
  • 42:15 - 42:20
    Herald: Thank you. Do we have a question
    from… [the internet]? Yeah, oui!
  • 42:20 - 42:24
    Signal Angel: Yes, rooty is asking on IRC:
    What would you recommend the NSA
  • 42:24 - 42:28
    develop towards a future
    of social usefulness?
  • 42:28 - 42:36
    E.g. what value will databases from
    2015, people's cell phone sensors, have in 2115?
  • 42:36 - 42:41
    Could you give the NSA, maybe the
    CEO there, useful work?
  • 42:41 - 42:43
    M.C.: Can you rephr… sorry?
  • 42:43 - 42:50
    Signal Angel: naively repeats first
    of the apparent Troll questions
  • 42:50 - 42:52
    M.C.: laughs
    Social Usefulness…
  • 42:52 - 42:56
    Probably the most useful thing they could
    do is stop collecting the data in the
  • 42:56 - 43:02
    first place, especially the data that’s
    being intercepted or illegally collected.
  • 43:02 - 43:07
    There’s probably some amount of useful
    tracking they could do, but I’m not sure
  • 43:07 - 43:10
    that the tactics they were using
    to collect the data at the time are the best
  • 43:10 - 43:13
    approach.
  • 43:13 - 43:16
    Herald: Thank you. So, next
    question from you, please!
  • 43:16 - 43:20
    Q: Hello, thanks for the talk, that
    was one of the best ones I’ve seen at this
  • 43:20 - 43:27
    congress. I was wondering what you think
    about the question you’re raising about
  • 43:27 - 43:31
    “we shouldn’t make the same mistakes”.
    Because I’m not totally sure that’s
  • 43:31 - 43:35
    possible because of things I’ve seen in
    other communities. All communities have
  • 43:35 - 43:41
    their extremists and they will abuse this
    data. And then that allows a political
  • 43:41 - 43:47
    attack on you, because they say you made
    that happen. It’s not true, but it will
  • 43:47 - 43:50
    convince people. So how do you protect
    against that?
  • 43:50 - 43:54
    M.C.: I think it’s hard to entirely
    protect against it because we can’t
  • 43:54 - 43:57
    control the actions of other people. But
    people could also go off and use this data
  • 43:57 - 44:02
    negatively by collecting it on their own,
    independently of us. I was actually quite
  • 44:02 - 44:05
    impressed: since we launched ICWATCH, I
    haven’t heard of anyone complaining of
  • 44:05 - 44:07
    threats that they’ve gotten from people…
  • 44:07 - 44:10
    People in the Intelligence Community:
    I haven’t heard of anyone in the
  • 44:10 - 44:12
    Intelligence Community complaining about
    threats that they’ve gotten as a result
  • 44:12 - 44:16
    of ICWATCH being launched. All of the
    complaints have been theoretical. The only
  • 44:16 - 44:19
    threats I’ve heard of resulting from
    ICWATCH are those from the Intelligence
  • 44:19 - 44:22
    Community to us. I haven’t heard of
    anything, so I’ve been very impressed with
  • 44:22 - 44:27
    the civility of the internet in that case.
    And I think that maybe, by framing it, and
  • 44:27 - 44:30
    actually bringing it down to the
    individual level, and making it clear that
  • 44:30 - 44:35
    these are people, that makes it a little
    bit less likely that people will go after
  • 44:35 - 44:38
    them in a vicious way.
  • 44:38 - 44:43
    Q: Have you thought of creating some kind of usage
    guidelines? I mean that's not gonna change what
  • 44:43 - 44:48
    anyone does. But if someone does something
    you can then say “That’s against our usage
  • 44:48 - 44:52
    guidelines” and it’s a political defence
    against someone accusing it…
  • 44:52 - 44:56
    M.C.: Yeah, I don’t think there’s any way
    that we can enforce something like that.
  • 44:56 - 45:00
    But we do try to be very careful with how
    we’re framing it, saying, like I did
  • 45:00 - 45:03
    throughout this talk, that these are
    people who are not evil people. They’re
  • 45:03 - 45:07
    normal people that you should look at as
    such. So I think we'll keep being very careful with
  • 45:07 - 45:09
    framing it, and we’ll be developing some
    sort of guidelines. That’s definitely a
  • 45:09 - 45:11
    good idea.
  • 45:11 - 45:14
    Herald: Thank you. Your question, please!
  • 45:14 - 45:20
    Q: Hi! First, thank you very much for
    this tool that makes it possible to fight
  • 45:20 - 45:28
    back, legally, against people who try
    to punish or, yeah…
  • 45:28 - 45:34
    What I have to say, or my question, is: I've
    worked for the last 3 1/2 years, let’s say,
  • 45:34 - 45:40
    in the field of IT forensics. And I worked
    with Maltego and stuff, so I know how much
  • 45:40 - 45:45
    work it is to collect data and
    bring it into good shape, so others
  • 45:45 - 45:57
    could read it or you can get a goal, or
    see a goal. And what I personally think
  • 45:57 - 46:05
    is very important: this could be very
    sensitive data for people, and my question
  • 46:05 - 46:13
    is: How do you make sure that this data,
    which you offer for download, is kept
  • 46:13 - 46:20
    safe? That’s the first question, and
    the second is: Did you think about
  • 46:20 - 46:28
    verification? So you are collecting a lot
    of data, and in a few years another person
  • 46:28 - 46:35
    wants to see if this data was correct. So
    do you fingerprint the sources, with an MD5 sum
  • 46:35 - 46:44
    or similar, so you can say “This fingerprint taken
    on this day at this time is correct”?
  • 46:44 - 46:51
    M.C.: For the first question: I don’t
    think there’s really… I’m not sure it needs to be
  • 46:51 - 46:56
    protected, because this is information that
    people posted publicly themselves. So they
  • 46:56 - 47:01
    sort of said that they don’t want it to be
    protected or secured because they’re
  • 47:01 - 47:07
    posting it on the public internet. So I’m
    not sure there’s really any reason to try
  • 47:07 - 47:12
    to protect it when it’s something that
    they’ve published very publicly.
  • 47:12 - 47:16
    And on the second one, for verification,
    that’s quite tricky with some of the data
  • 47:16 - 47:19
    especially around the Intelligence
    Community because all of these things
  • 47:19 - 47:22
    are secretive and it’s hard to confirm
    them. We can confirm them against each
  • 47:22 - 47:27
    other like now we have multiple résumé
    sites on ICWATCH, so sometimes we can find
  • 47:27 - 47:31
    the same person’s résumé on another site
    and compare over time, and we can go
  • 47:31 - 47:34
    find other profiles they have and try
    to combine as much data on the same person
  • 47:34 - 47:36
    as possible and track it over time.
  • 47:36 - 47:42
    Q: What I did: I made a fingerprint
    when I downloaded a website, I made a
  • 47:42 - 47:46
    fingerprint and then I can say OK, this
    is… yeah.
  • 47:46 - 47:49
    M.C.: Oh, for verifying the various versions
    collected, then. Yeah, I mean, that's a bit harder to
  • 47:49 - 47:55
    do absolutely, but we have
    the full text of the web pages saved, and
  • 47:55 - 48:01
    we have it all published on GitHub, so you
    can verify what was collected then. But, yeah.
  • 48:01 - 48:04
    Herald: We’ll take the questions
    from up there.
  • 48:04 - 48:10
    Jake Appelbaum: Hi, community extremist
    here… So I wanted to say something which
  • 48:10 - 48:13
    is that I think what Julian did for
    leaking documents you’re doing for
  • 48:13 - 48:18
    analysis, which is really great! Because
    transparency isn’t enough, you need action!
  • 48:18 - 48:21
    And so I just wanted to say that I hope
    that everyone can give
  • 48:21 - 48:28
    Transparency Toolkit a lot of material
    support. And maybe a round of applause!
  • 48:28 - 48:34
    applause
  • 48:34 - 48:38
    Definitely the best talk at the congress.
    And I had a couple of suggestions.
  • 48:38 - 48:42
    One of them is: I think it would be great
    if you could focus on American Domestic
  • 48:42 - 48:43
    Police Agencies.
    M.C.: Hmm-mhm…
  • 48:43 - 48:48
    Jake: In particular collecting the images
    of Police Academy Graduation photographs.
  • 48:48 - 48:53
    And to be able to move in the direction of
    facial recognition, so that we can find
  • 48:53 - 48:56
    Undercover Police Officers
    that are in our midst…
  • 48:56 - 49:02
    applause
  • 49:02 - 49:07
    And I think it would be great if you could
    create a FOIA wizard, essentially, ’cause
  • 49:07 - 49:11
    everybody likes wizards, and who doesn’t
    like UNIX… So it’d be great if you could
  • 49:11 - 49:14
    create a FOIA wizard where you could say:
    “I wanna know about these terms” and it
  • 49:14 - 49:19
    would just generate automatically (maybe
    by partnering with MuckRock, e.g.)
  • 49:19 - 49:23
    interesting things, where there’s a kind
    of “Wait!”. Where you realize there’s a lot
  • 49:23 - 49:27
    of people working on this classified
    program and it’s at this agency and they
  • 49:27 - 49:29
    have a contract with this company and
    these are the people involved and just
  • 49:29 - 49:34
    automatically generate those FOIAs and
    then get people to sort of sign up to put
  • 49:34 - 49:38
    their name down and sort of sponsor a
    little transparency and to say “Oh, that’s
  • 49:38 - 49:42
    the FOIA I wanna get behind, I’m gonna
    check on it, you know, once a week, I’m
  • 49:42 - 49:45
    gonna do this thing. Through MuckRock.”
    I think that would be a way to take this
  • 49:45 - 49:49
    information in a legal manner and to make
    it actionable. And I think there’s lots of
  • 49:49 - 49:54
    other interesting things you could do that
    are not about the law. But I leave that to
  • 49:54 - 49:57
    the imagination of other people. It should
    be legal but it doesn’t need to be through
  • 49:57 - 50:02
    legal channels like, say, FOIA. So thanks
    for the work that you’re doing, M.C. and
  • 50:02 - 50:06
    I hope that you will expand it to,
    basically, all of the pigs of the whole
  • 50:06 - 50:10
    world. And I would really encourage you
    to read Hannah Arendt’s “Eichmann in
  • 50:10 - 50:16
    Jerusalem”, because you described a
    fundamental thing: these people aren’t
  • 50:16 - 50:21
    evil. But actually, Evil itself doesn’t
    exist. These people are the Banality of
  • 50:21 - 50:26
    Evil. They’re people who have soccer
    practice, and they have a dog, and they
  • 50:26 - 50:30
    like to go home and fuck their wife, and
    they’re regular people who do drone
  • 50:30 - 50:32
    strikes.
  • 50:32 - 50:36
    applause
  • 50:36 - 50:40
    Herald: Thank you. We
    have a question on mic 1.
  • 50:40 - 50:47
    Q: How easy is it to add support for new
    databases or new sources of information?
  • 50:47 - 50:51
    M.C.: It depends on the source and how
    that site is structured. But generally
  • 50:51 - 50:55
    it’s not too difficult. Adding
    new sources does require
  • 50:55 - 51:00
    programming at this point. But it’s not
    particularly complex programming and we
  • 51:00 - 51:03
    have some libraries that make some
    parts of it easier, as well. And if you’re
  • 51:03 - 51:06
    interested in adding a data source we’re
    more than happy to help with that.
  • 51:06 - 51:11
    Q: Awesome! My favourite is the list of…
    the report of when people were denied
  • 51:11 - 51:16
    security clearance and why and if their
    appeal was then, like, removed.
  • 51:16 - 51:18
    M.C.: Yeah, that would
    be quite interesting!
  • 51:18 - 51:24
    Q: Okay!
  • 51:24 - 51:29
    Herald: If there are no further
    questions… one moment…
  • 51:29 - 51:34
    yeah, okay! Please!
  • 51:34 - 51:44
    Q: Yesterday it was said that we have to
    make sure that they know that we watch
  • 51:44 - 51:51
    them and make sure that they know that we
    watch them, because some day they will get
  • 51:51 - 51:58
    prosecuted, in some way. I think
    you are doing exactly this. So this is
  • 51:58 - 52:12
    brilliant. Are you already at the stage
    where you’re thinking you can start
  • 52:12 - 52:18
    concrete legal actions against some
    individuals you are getting
  • 52:18 - 52:25
    information about with your tools?
    M.C.: We’ve been working with some lawyers towards that.
  • 52:25 - 52:29
    We are looking to do more on this, so if
    you know… if you have any ideas for
  • 52:29 - 52:32
    particular situations where this may be
    applicable, or lawyers that we should
  • 52:32 - 52:37
    work with, let us know! But we’re working
    towards that and making some progress.
  • 52:37 - 52:42
    Q: Thanks!
  • 52:42 - 52:45
    Herald: Getting a question
    from up there, please!
  • 52:45 - 52:50
    Q: I just wanna say that you are a
    visionary who is more passionate than
  • 52:50 - 52:53
    anybody I have ever collaborated with
    and it’s a total honor.
  • 52:53 - 52:54
    applause
  • 52:54 - 52:57
    Herald: Thank you.
  • 52:57 - 53:03
    M.C.: Yeah, and just to everyone, that’s
    Brennan who also works on Transparency
  • 53:03 - 53:07
    Toolkit. He made the awesome UI for
    Harvester and LookingGlass that you saw
  • 53:07 - 53:09
    in the tabs of all this.
  • 53:09 - 53:15
    applause
  • 53:15 - 53:18
    Jake: If no one else is gonna ask a
    question, I’d like to ask a question which
  • 53:18 - 53:21
    I know the answer to but no one else
    in the room does. And I think it’s very
  • 53:21 - 53:25
    fascinating. I wonder if you could talk
    about lessons that you’ve learned from
  • 53:25 - 53:28
    studying the South African
    resistance to apartheid.
  • 53:28 - 53:30
    M.C. is laughing
    Jake: And maybe you could talk about the
  • 53:30 - 53:35
    things that drive you to work on these
    things. For example, what inspires you to pursue justice?
  • 53:35 - 53:39
    For example, experiences at MIT and maybe – I mean
    if you don’t want to talk about it, I’m
  • 53:39 - 53:43
    sorry for asking it. But if you do wanna
    talk about it I think you can inspire
  • 53:43 - 53:49
    everyone else here to raise their fist
    with you! In solidarity.
  • 53:49 - 53:57
    M.C.: Yeah… Okay… I guess it’s been
    nearly 3 years now, so maybe that’s okay
  • 53:57 - 54:06
    to talk about. 3 years ago there was this
    case at MIT… everyone has probably heard
  • 54:06 - 54:14
    of Aaron Swartz and he was being
    prosecuted for downloading documents from
  • 54:14 - 54:22
    JSTOR. And I was brought in trying to figure out
    MIT’s role in this situation, and if we
  • 54:22 - 54:26
    might be able to sway public opinion,
    with a few people in Boston. I think some of
  • 54:26 - 54:31
    them are in this room. And we were trying
    to help him. And eventually, partway into
  • 54:31 - 54:36
    the process, he became afraid and decided
    that it would be more risky for us to help
  • 54:36 - 54:39
    him, since the prosecutor might lash
    back, so we stopped. But one of the things
  • 54:39 - 54:46
    that I did in this process was, I sent out
    a survey to all of the professors at MIT
  • 54:46 - 54:54
    asking their opinion on his case. And
    whether they identified with his actions.
  • 54:54 - 54:59
    And I got a lot of response to this
    survey. Some were quite nice and were
  • 54:59 - 55:04
    quite supportive. Some were very vicious,
    saying that he should go to jail and that
  • 55:04 - 55:09
    he is a waste of humanity and he works at
    this Harvard Center for Ethics, so how is
  • 55:09 - 55:13
    this ethical? And things like that. They
    were quite horrible. And initially he had
  • 55:13 - 55:18
    access to this database and somehow over
    the next year, when we weren’t doing much,
  • 55:18 - 55:22
    he lost access to this database. And he
    emailed me asking for access again. And
  • 55:22 - 55:27
    back then I was on some stupid kick about
    research ethics and redaction and thought
  • 55:27 - 55:31
    that there’s no reason to… So I said
    something like “I cannot give you the answers
  • 55:31 - 55:35
    with the names”. That was just stupid, because
    the names are the most useful part of that
  • 55:35 - 55:42
    data. And I kind of abandoned him, along
    with a lot of other people in that. And I
  • 55:42 - 55:50
    feel like if I had given him the names
    that might have been something that could
  • 55:50 - 55:53
    be used to find supporters within MIT or
    people who were rallying against him. And
  • 55:53 - 55:56
    I don’t think it would have made a huge
    difference but it might have made just a
  • 55:56 - 56:02
    little bit. And that was one of the things
    that really showed me the power of data on
  • 56:02 - 56:06
    individuals and the role of individuals
    within institutions. And I feel like I
  • 56:06 - 56:11
    really failed there. So
    I don’t want to do that again.
  • 56:11 - 56:16
    applause
  • 56:16 - 56:21
    Herald: Thank you. Unfortunately, we need
    to wrap up because we are out of time.
  • 56:21 - 56:27
    Thank you for attending this very
    interesting and, in the end, quite touching
  • 56:27 - 56:28
    lecture.
  • 56:28 - 56:34
    postroll music
  • 56:34 - 56:38
    Subtitles created by c3subtitles.de
    in 2016. Join and help us do more!
Description:

Governments post reports and data about their operations. Journalists publish documents from whistleblowers. But there is a third type of open data that is often overlooked: the information people and companies post about themselves. People need jobs. Companies need to hire people. Secret prisons do not build themselves.

By making it feasible for anyone to collect public data online in bulk and exploring ways to effectively use this data for concrete objectives, we can build an independent, distributed system of accountability.

M. C. McGrath

Video Language:
English
Duration:
56:40