32C3 preroll music
M.C.: Hey! So, can you hear me OK? Yeah.
I am M.C. and I work on Transparency
Toolkit along with Brennan Novak
and Kevin Gallagher. Basically, what
we try to do is “Watch the Watchers”.
Back in May we released a database of
over 27.000 people in the Intelligence
Community called ICWATCH. And this is
people who are talking about their work on
classified programs on the public
internet. So we collected it using
search terms like the code words
mentioned in the Snowden documents.
And today we’re releasing
an update to ICWATCH
doubling the data in the database.
applause
And that’s already live, if
anyone wants to look at it.
For the people who aren’t familiar with
this project, the sorts of things
available and the research methods, I’d like
to go through an interesting example of
the sorts of things that can
be found in this database.
So this is Lauren Russell, and she works
at L-3, a major intelligence contractor.
But she started her career as an army
interrogator in Iraq. She says that
the information that she collected was
used to capture dozens of people.
But part of her job was also to assure
safe and humane treatment of hundreds
of detainees. So that’s good at least. But
then, a few years after that, she went and
worked for a different company called
Exelis in Afghanistan. And this job
was quite different. It involved finding
people to kill. So she says as part
of this work that she “utilized F3EA
methodology to conduct analysis on raw and
fused HUMINT, SIGINT, and COMINT helping
to create 125 Targeting Support Packets
then nominated to the Joint Priority
Effects List (JPEL) for kinetic targeting.”
So there’s a lot of not very obvious terms
and gibberish there. And this is a pretty
common problem when going through these
résumés. So I want to break down how you
would interpret that sentence. “Signals
Intelligence” is what the NSA does.
It’s collecting data from intercepted
communications. COMINT – Communications
Intelligence – is specifically Signals
Intelligence from communication data.
So what the NSA does
when they read your email.
HUMINT, Human Intelligence, is
Intelligence from human sources.
So things like data gained
from informers or from torture.
The “Joint Priority Effects List” is a list of
people the US military and its allies are
trying to kill and capture in Afghanistan.
F3EA stands for “Find, Fix, Finish,
Exploit and Analyze”. It’s a rapid
intelligence collection and analysis
methodology used for targeting. And
we recently found out in the Drone
Papers that this is often used for
drone targeting. And “Kinetic Targeting”
simply means attacking a moving target.
So looking at her profile again: she says
that she “utilized F3EA methodology
to conduct analysis on raw and fused
HUMINT, SIGINT and COMINT helping to
create 125 Targeting Support Packets
then nominated to the Joint Priority
Effects List (JPEL) for kinetic targeting.” Basically
what she means is that based on
intercepted communications and information
from human sources, possibly gained under
duress or from torture, she is deciding
who should be killed and captured.
The Intelligence Community has long
had an attitude of “Collect It All”.
And General [Keith B.] Alexander
started trying to collect all the data
that they could from every source.
One of the first projects to this end
was something called Real Time Regional
Gateway (RT-RG). It’s a massive project to
store, combine, search and analyze data
from many different sources at once.
Everything from intercepted communications
to data from drones to data from
interrogations to even mundane things like
traffic patterns and the price of potatoes.
They started this program in 2005.
The initial version was built by SAIC
for use in Iraq. And these days it’s
mostly used in Afghanistan.
It’s not just the US, though, because according
to documents published in “Der SPIEGEL”
last year Germany is the 3rd largest
contributor to RT-RG. These sorts
of collection and analysis tools are used
for some programs that you might have
heard of too, like CoTraveller – the
program the NSA has to figure out who is
going places with who else. And there is
a specific analytic tool that is part of
RT-RG called SIDEKICK that uses relative
velocities to calculate this from many
different data sources, so that they can
calculate that for people across networks.
Unfortunately, this is really
computationally intensive because they
need to pre-compute all of the travel
behaviour for all the pairs of selectors.
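To see why that gets expensive, here is a minimal sketch, not taken from any of the published documents: with n selectors there are n(n-1)/2 unordered pairs to score, so the work grows quadratically. The function names are made up for illustration.

```python
from itertools import combinations
from math import comb


def pair_count(n_selectors: int) -> int:
    """Number of unordered selector pairs that would need a precomputed score."""
    return comb(n_selectors, 2)


def all_pairs(selectors):
    """Enumerate every unordered pair; this is the quadratic step."""
    return list(combinations(selectors, 2))


# The pair count grows much faster than the number of selectors.
for n in (1_000, 1_000_000):
    print(n, "selectors ->", pair_count(n), "pairs")
```

Going from a thousand selectors to a million multiplies the number of pairs by roughly a million, which is why this kind of job needs distributed processing and storage.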
But it’s feasible for them to do
computationally intensive things given the way
that it’s built, because it’s built on
Hadoop and Accumulo for distributed data
processing and storage. So they’re quite
serious about this. The goals for RT-RG
are quite lofty. One of the creators, in
an interview with “Defence News”, described
their aim as being able to use intercepted
communications and integrate those
signals with geolocation. So that they can
instantly find people and target them.
Another counter-terrorism official told
the Wall Street Journal that RT-RG
literally allows them to predict the
future. They described it as the
strongest correlation tool ever. So their
goals of this seem to be two-fold: First
of all to be able to kill or smite any
potential enemies. And second, to be
omniscient. To know everything that’s
happening at once. And to correlate it and
use that to predict what will happen in the
future. And these goals sound a little bit beyond
what you would expect from someone
who is trying to simply protect people or
stop terrorism. It sounds more like
they’re trying to become some sort
of God. Who by collecting and analyzing
everything know everything that’s
happening everywhere and can just smite
any enemies from above. Instantly.
But the thing is they aren’t a God. There are
people working on these and they’re
normal people. And they have crazy
resources and they intercept
a lot of data. But they also use data
that’s freely available to anyone for
a lot of their work. Open Source
Intelligence. This is a pamphlet from
a startup called ZeroFox that uses data
from Social Media to track ISIS.
And tools like this are quite common.
There’s another tool called “LM Wisdom”
that’s made by Lockheed Martin. And
they have a wonderful promotion video
on their website explaining exactly how it
works – that I’d like to play.
with lowered voice:
Hopefully this’ll work…
audio/video starts Female Narrator:
Social Media content has the power
to incite organized movements
and sway political outcomes.
Person in Video: “It’s an opposition
terrorist organization in Iran.”
Female Narrator: Monitoring and analyzing
the massive and rapidly changing
open source intelligence data, or OSINT,
and turning it into actionable intelligence
for decision-makers is an imperative.
Lockheed Martin’s Wisdom software suite
offers an advanced capability to collect,
manage and analyze vast amounts
of open source data. Enabling analysts
to understand, measure and anticipate
real-world events through Social Media.
Person in Video: “Think of Wisdom as your
eyes and ears on the web. Wisdom is
that tool that would allow you to do this
at scale!”
Female Narrator: Wisdom’s advanced
Big Data collection capability and data
store automatically identify and harvest
online Social Networking data of
operational interest. As well as
socio-cultural data from standard online
open sources like newspaper feeds and
structured databases. Wisdom’s high-
performance analytic algorithms analyze
the content in near realtime distinguishing
noise from high-value information.
Capturing trends, sentiment and influence;
turning open source data into predictive,
actionable intelligence.
audio/video stops
M.C.: Yeah, so…
applause
…that’s what they’re doing. And they’re
not just using this to target terrorists.
It was recently revealed that they are
helping Walmart use this to find employees
that are organizing for better working
conditions and find the main organizers
and fire them. Using
data from Social Media.
So it’s used for Corporate purposes as
well. And LM Wisdom wasn’t even made
for surveillance in the first place.
I tracked down one of the people
who created it. And at that time he worked
for General Electric and was hoping to
make a… to help NBC make tools so
that they can figure out which sites
to partner with to make their videos go
viral. So it’s not just governments that
are using Open Source Intelligence because
there are no barriers to accessing it and
there are many applications. There are even
many people-search databases that
have information like people’s address,
and phone number, and relatives,
and how old they are. And these include
many, many people. Probably everyone
in the US. And they’re used by many people
for all sorts of purposes from private
detectives to people that are selling
advertisements. If this data is available
already and it’s used for everything from
figuring out who to kill to stopping unions
from organizing to trying to sell things
to people – why can’t we use it to
understand surveillance programs, too?
Why can’t we use it to understand human
rights abuses. Why not use it for
accountability? So we started to build
tools to do this and in the near future
we’d like to make it possible for anyone
to make something like ICWATCH or other
databases in less than a day and without
programming. Long-term goal is to build
software similar to what the Intelligence
Community has. Things similar to LM-Wisdom,
things similar to Real Time Regional Gateway.
So that people can collect all this
information in one place and analyze it.
I’d like to show a demo of some of the
tools that we’ve been working on. It’s
possible to just – this won’t work at all
but we’ll see. So this is Harvester. It’s
a tool for collecting data from online
sources in an automated fashion. You can
choose different data sources, say
“Indeed” – this is a résumé website – and
say you want to find anyone who mentioned
XKeyscore and for sake of timing let’s
just get people in Maryland. And “start
collecting”, and it might take a second
because it’s still a bit rough. But it
opens a browser, goes and finds the people
who mention XKeyscore in Maryland and it
goes and downloads all of their résumés
in one place… you can kind of see them
as they download because this is being
slowed down a bit right now. That just
works. XKeyscore is fairly small.
Something shouted from out of the audience
M.C.: laughs
applause
Takes a second to load,
still kind of rough…
Yeah, so we’re hoping to add many different
data sources, so that people can collect
data from sources online as well as just
take a pile of PDFs on their computer,
point at the directory and it will load
them and OCR them and people will be able
to search through them
in a searchable database.
So while this is loading why don’t I go
and walk through some of the rest of the
pipeline. So our goal is to have tools
for collecting data, loading it into
a database; and then tools for matching
data across various sources on the same
person or the same company. So it should
take someone’s résumés and Social Media
profiles and everything and link it
together and then also link that to the
companies they work(ed) for, the other
people they know, the locations they’ve
lived. As well as tools for extracting
things from data. So to be able to go
through a résumé, extract all the code
words mentioned, to be able to go through
a document and extract all the
companies mentioned and generating
entities that way. And tools for searching
through data in databases where you can
search for search queries and browse by
categories. And for viewing data in
network graphs and maps. Let’s see if this
is done… Right now it just shows the
raw JSON. The connection between tools
is a bit rough. But we should be able to
index the data and load it into a search
tool. Will take a second. Hopefully this
works. Ouh, it’s going! Yah… So it takes
a little bit. Index… And you can see…
The data will be at… It kind of circle
loaded into a subscriptions list…
So there’s a searchable database of all the
people who are working on XKeyscore
in Maryland!
applause, cheers from audience
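A minimal sketch of the extraction and search steps of that pipeline, assuming a hand-written list of code words and made-up résumé text. The function names and sample data are illustrative and are not the actual Transparency Toolkit code:

```python
import re
from collections import defaultdict

# Illustrative watch list: terms mentioned in the talk, list chosen for this example.
CODE_WORDS = ["XKeyscore", "PENNANTRACE", "Airhandler"]


def extract_code_words(text, code_words=CODE_WORDS):
    """Return the code words that appear in text as whole words, ignoring case."""
    found = []
    for word in code_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(word)
    return found


def build_index(documents):
    """Map each code word to the sorted ids of the documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in extract_code_words(text):
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Made-up documents standing in for collected résumés.
docs = {
    "resume-1": "Analyst with XKEYSCORE experience.",
    "resume-2": "Performed direction finding using PENNANTRACE and Airhandler.",
    "resume-3": "Pipe fitter with security clearance.",
}
index = build_index(docs)
print(index)
```

Browsing by category then amounts to looking up a code word in the index, and a real search tool would put a full-text engine behind the same idea.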
So I think that using Free
Software and open data is really key,
because we have far, far fewer resources
than the Intelligence Community. And we
don’t even have the resources that a
company like Lockheed Martin has. We can’t
internally build all of this software, or
hope that we will anticipate every future
use to be able to help people adapt to
that. Having people be able to take our
data, take our tools and adapt it to their
own situations is absolutely key to
actually ensuring that they’re useful. And
there are also a lot of open source tools
that the Intelligence Community has
released. Things like Accumulo, the thing
that’s used in Real Time Regional Gateway.
It was released by the NSA and made open
source. And Gaffer, which is a graph
database recently released by GCHQ.
So we can sort of take those and possibly
also build on those in some cases.
So we’re using the same tools.
chuckles
And it’s appropriate because our goal is
to enable people to collect and use
information in the same way that the
Intelligence Community can.
But, while I think that we should aim
to collect it all and collect all the
information that we can. I think we also
need to be careful to avoid a lot of the
mistakes that the Intelligence Community
has made. Because some of the effects are
quite bad and lead to people being killed
for no reason at all. And – it’s quite
absurd. And the main one of these,
I think, is de-humanizing people.
Torture techniques are specifically
designed to de-humanize people.
When people are looking at data that
they’ve intercepted, they’re not looking
at a person, they’re looking at meta-data,
they’re looking at numbers on a screen.
It’s not something that’s easy to find a
way around. When I was working on ICWATCH
I was grappling with this problem quite a
bit. So I decided to try to see who some
of these people are and try to put faces
to these issues. So I started going to
Intelligence conferences. Many of these
conferences are quite open and you can
just go in. And I wasn’t that out of place
either, I just told people that I made
tools to collect and analyze
Open Source Intelligence.
laughter and applause
There’re many people doing similar
things out there, too. Like I met the
ZeroFox people who were one of the examples
I showed earlier at one of these conferences.
They are actually very, very nice. And
there were also some people who were quite
interested in what I was doing. There was
one recruiter from Northrop-Grumman who
seemed somewhat interested in hiring me
and I looked her up later and found
a bunch of job listings where she was
trying to hire people who… to work on
programs related to XKeyscore. It wasn't
all good, I got kicked out of one conference.
I got some strange requests like there was
one guy who was trying to figure out how to
use open data to help venture capitalists
figure out what porn the founders of the
startups they funded watched. I’m not sure
that’s even possible. But it was really
weird and he was asking me for help and
I was like “I don’t think I can
help with that, sorry!”
laughter and applause
Of course there were some negative comments
on things like Manning and Snowden
and some confusion like there was someone
who is making insider threat detection
software, who was talking about how it
would stop a situation like when Snowden
leaked documents to Wikileaks and
things like that. So people don’t actually
know what’s going on. But generally most
of them were decent people and some of
them were quite nice, some of them were
quite funny. And some of them really
seemed to think that what they were doing
is saving lives. So they’re not evil people
who want to hurt others but they’re not
infallible either. They’re human beings.
And our strategy – looking at individuals
– scares a lot of people. But what you
have to realize is that institutions are
made up of people. It’s easier to just
look at the institution. It’s easier to
just look at an abstract program. Just
like it’s easier not to think of the
person who you just decided to kill in a
drone strike as a person. That’s why these
things continue to happen. I think that
there’s a lot of benefit to looking at
people as people, both to avoid some of
the problems the Intelligence Community
has as well as because people’s data trails
are part of the data trails of the
institutions. And if we’re only looking at
institutions we’re missing part of the
data trail the people leave.
Though, of course, no one person is
responsible for the wrong-doings of the
Intelligence Community. So we shouldn’t
demonize any one person. But…
these are the people who go to work every
day and perpetuate the actions of the
Intelligence Community. So I think everyone
involved is a little bit at fault.
And the other benefit of looking at people
as people is that we can start to
understand them. Because you have to
understand what their hopes are, what
their fears are. How they see the world.
What upsets them. And what might cause
them to change their behaviour. And from
that we can start to maybe come up with
alternatives. So let’s look at some of
these people and look at some of their
stories. This is Jason Epperson. He works
on Intelligence collection for Special
Operations. In his spare time he enjoys
coaching children’s sports. He currently
works at the US Special Ops Command
(USSOCOM) helping different agencies
collect data, share it, and figure out
what data they need, just generally
helping them integrate it. But he
started his career back in 1998, also
working on collecting data for Special
Operations. Then later, in 2004, he went
to work at the US Central Command in the
NSA cryptologic services group and he was
focused on tracking down high-value
targets and individuals. And he claimed
that as a result of his work, numerous
high-value individuals were captured
or killed. It is especially interesting
because he was working on this in 2007
when PRISM was launched and at the top
of his résumé he lists in his specialties
PRISM. It’s possible that’s a
different program, but based on his background it
might not be. So I think it probably is
actually PRISM.
Then after he was working there he went
and started working counter-radicalization
efforts – things like boosting the
capacity of Muslim Faith Leaders to win
hearts and minds and establishing
competing social networks to counter
Al Qaeda ideology and he’s very clear in
his job description that he’s not killing
people, he’s just helping allies of the US
figure out who to set Interpol notices for.
But the most interesting thing about him
isn’t any of his jobs. It’s this
publication that he has at the bottom of
his résumé called “An Examination of the
Effect of Government Data Mining on US
Citizens”. And this is clearly an area where
he has a lot of expertise. And he
presented this at a conference back in
2010. I still don’t have a copy yet. It’s
not easily available. I think it might be
possible to get either by buying it from
the company directly or by going to the
Library of Congress that seems to have
some copies of the conference proceedings.
That could be quite interesting. Both
because he was relatively high up, he was
in command of nearly 400 people back when
PRISM started and he was working with the
NSA. It’s possible that he had some role
early on in the program and this might
provide some clues. And then also the
little “data mining on US Citizens” bit
in the title is kind of interesting
because that’s supposed to be the last
protection – I think that’s kind of a superficial
protection because most US citizens
wouldn’t find it very comforting if the
Chinese Government said: “Oh yeah, we have
a mass surveillance program but we only
spy on people who aren’t Chinese citizens.”
That’s not really comforting to them, so I
don’t see why it would be. But it’s been
the one thing that people were repeating:
“We don’t collect it on US citizens”. And
just seeing that on the title of a paper
is like a tiny admission that maybe they
do. So some of these profiles tell other
interesting stories about people’s lives.
If you’ve seen any of my other talks, this
is someone you’ve heard me talk about
a lot. Solomon Varnado. He spent most of
his life in the military intelligence
community, focused on Signals Intelligence
and Geolocation. He took down his résumé
after ICWATCH launched. But I actually
recently found another résumé of his on
another website that has additional
information like on the side in the
military he ran diversity programs and a
sexual assault prevention program and
things like that. I first came across this
profile because he mentions a lot of
interesting code words. This is probably
the first known mention of XKeyscore back
in 2004/2005. But these aren’t the most
interesting part of his résumé. Later on
he… after he works on Intelligence
Collection Management – just Standard
Signals Intelligence Collection – he goes
and he works for L-3 Stratis. And there he
says that he identified, collected, and
performed direction finding
of specified target signals using
PENNANTRACE, DISPLAYVIEW and CEGS.
But I wasn't sure what “PENNANTRACE” was
so I found a definition
very conveniently located in
another résumé. That said it was an
airborne collection platform for PENNANTRACE.
That sounds like some sort of
Signals Intelligence collection platform.
And the other interesting thing about this
job is that he said that he called for
external review of intelligence management
processes which is not something I see
normally. And he was there for a fairly
short time, only a couple of months.
After staying at most of his other jobs
for over a year. And then at his next job
he was also there for
only a couple of months.
He was working at Pluribus International,
also on Drone Intelligence,
this time definitely Drone Intelligence,
on Predator drones because he
mentions Airhandler which we now know
more about thanks to the catalogue
released by The Intercept. It’s a
geo-processing system for geolocation
data from Predator drones.
And the update to ICWATCH
includes all the data on all of the words
mentioned in that catalogue. And then
he leaves the Intelligence Community
entirely after that job. And he goes and
works as a used car salesman at this used
car dealership. And it turns out he is
actually – found him on this other résumé
that I just found – He’s actually quite
a successful used car salesman.
He’s won a bunch of awards.
He’s one of the best
salesmen in the region. So he’s doing quite
well. And he won a bunch of awards
in the military too,
so it seems like
he’s very committed to what he
does. But still that’s quite a huge career
change and it sounds like maybe he was
starting to get upset with some of how
things are really being done and he
couldn’t figure out a way to fix it after
calling for external review
so he just left.
applause
And then, this is Michael Dial. Michael
Dial is a pipe fitter and a plumber. And
this is him with his family. He’s actually
a pipe fitter and a plumber. But he’s not
just any pipe fitter. He has security
clearance. And he goes and he fits pipes
in secure facilities. As you might expect
he does a lot of pipe fitting for naval
ships. He also does things like he goes to
embassies and other secret locations in
Afghanistan and Iraq, Ecuador, Serbia
and sets up their pipes. He also did some
pipe fitting in Djibouti at some sort of
Homeland Security facility which
coincidently is also where many of the
drone programs are run out of. So there’s
some interesting cases like that, where
there are people like Michael Dial who
aren’t involved in Intelligence at all,
directly. But the information in the
résumés still provides very interesting
useful details about where secret
facilities are located and other aspects
of the Intelligence Community. Because
secret facilities don’t just materialize
out of thin air. They need people to build
them, they need people to operate them.
So from tracking down these people we can
start to map them. And then there’re other
useful things like we can figure out which
companies clean the NSA. I’m sure that
has all sorts of useful applications.
This is Eleana Costa. He lives in D.C. and
he works for the DOD. And this is him at his
High School Graduation back in 1988. He
has been working in Military and
Intelligence for nearly 20 years. And back
in 2003, he worked on PsyOps programs.
Specifically he worked on PsyOps programs
in Paraguay, Colombia and Bolivia. And
these were in support of the DEA, the Drug
Enforcement Administration, and the CIA.
And there are a few other résumés in ICWATCH
that mention involvement in PsyOps in
Latin America for the DEA. It seems to be
quite an extensive thing especially since
I didn’t collect any data on this
specifically, and I had just suddenly a bunch
of people on the database on this, so:
maybe worth looking into a bit. And then
after that he went and he worked on
PsyOps programs in Iraq. So it’s kind of
interesting. Then he went and worked
at the DOD on Human Intelligence.
The other interesting thing about Kiliana
Costa is that he’s one of the people who
deleted his résumé after ICWATCH
launched and that was how I found him.
laughter and applause
So after ICWATCH launched a lot of people
were positively interested in it, but we
also got a lot of threats because… it’s
really absurd, because all we’re doing is
collecting information that people
explicitly, independently, willingly
posted online about their profession;
we’re not posting addresses or
anything like that. And making it more
searchable. Just like Google does.
But a lot of people in the Intelligence
Community contacted us and for the first
few weeks, we saw a new response
every day. Some of these were kind of
interesting and revealed some sort of
nonsensical mindsets of people in the
Intelligence Community. Like this guy.
This is Alexander Irinovitch. He sent me
a…, actually a nice email, a very nice
email. It was really nice. Saying that he
couldn’t understand why he was in ICWATCH
because he wasn’t involved in surveillance.
He was working at a private company that
had nothing to do with surveillance.
So I looked at his profile and I saw that
he was working at unit 8200, the Israeli
Intelligence unit which, okay, there is
mandatory military service, so not that
weird, though he was there for several
years, not just the mandatory portion,
and this is the Intelligence unit that
spies on Palestinians. And then I looked
at where he works now. And he works for a
company called Verint. According to their
website they make software for analyzing
data from wiretaps. So I think that has to
do with surveillance. I’m not sure why he
interpreted that as “nothing to do with
surveillance”. But it’s kind of an interesting
interpretation. I think it makes sense for him
to be in the database, but of course,
for any particular profile, there is
some noise. So it’s up to whoever
is looking at it to make the call
and do the research.
And sometimes other people who complained
also helped us find interesting details.
Like this guy, Joshua Lively. He’s one of
the people who reported us to the FBI for
domestic terrorism. He worked as a
linguist at this company. I looked at
his profile and he mentions a lot
of interesting code words in it.
Some of them didn’t make so much sense
at the time. This thing called ZB.
And then a few weeks later the Intercept
released this article on a thing called
Skynet. It uses machine learning
to analyze travel data from telecom
providers. And ZB is one of the databases
they use and he, coincidently, has a lot
of the databases that are used in this
listed in his skills. And as a linguist
he’s proficient in the language that’s used
in the region that’s mainly targeted
in this… So I’m not sure if he’s involved
in this particular program. But it seems
like he’s involved in something similar.
So it’s quite interesting. Generally there
are a lot of angry people in the
Intelligence Community. Some are nicer
than others and were just asking questions
being like “Can you please take my profile
down!”, some others were more afraid, some others
were more violent and sending things like
death threats. Our server started getting
hit pretty hard and ICWATCH kept going
down. We wanted to be sure that we weren’t
going to be compelled to take the data
down some way. And the easiest way not
to be compelled to take the data down is
to make it so you can’t really take the
data down yourself. And then people have
much less incentive to go after you.
So we moved ICWATCH to Wikileaks which has
been great, and they’ve been wonderful
helping with all this. So thank you,
Wikileaks!
applause
from the audience: You’re welcome!
M.C.: chuckles
laughter
As I mentioned earlier a lot of people are
taking down their résumés in response to
ICWATCH. Specifically 1.030 people have,
out of the original 27.000. And others have
edited them and made them private. So as
part of the update in addition to doubling
the number of résumés available we also
recollected all of the initial résumés
and you can go on the site and see which
ones are removed, which ones are made
private, which ones have been modified and
all of that is flagged so you can easily see
how that’s changed.
applause
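One way to produce those flags, sketched under assumptions: each collection run is a mapping from profile id to résumé text, with None standing for a profile that still exists but is no longer publicly readable. The representation and the status names are made up for illustration and are not how ICWATCH necessarily stores it.

```python
import hashlib


def fingerprint(text):
    """Stable hash of a résumé's text, so two collection runs can be compared cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def classify_changes(old, new):
    """Compare two snapshots (profile id -> text, or None if private) and label each profile."""
    status = {}
    for profile_id, old_text in old.items():
        if profile_id not in new:
            status[profile_id] = "removed"
        elif new[profile_id] is None:
            status[profile_id] = "private"
        elif fingerprint(new[profile_id]) != fingerprint(old_text):
            status[profile_id] = "modified"
        else:
            status[profile_id] = "unchanged"
    for profile_id in new:
        if profile_id not in old:
            status[profile_id] = "added"
    return status


# Made-up snapshots standing in for the original and the re-collected data.
before = {"a": "Analyst, 2011-2012", "b": "Linguist", "c": "Recruiter", "d": "Plumber"}
after = {"a": "Analyst, 2011-2015", "b": None, "d": "Plumber", "e": "Engineer"}
changes = classify_changes(before, after)
print(changes)
```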
And some of these revealed details that
people hadn’t posted… that many wish that
they hadn’t posted in the first place. But
they also provide useful updates on where
people are working. Because they let us
track people as they move from job to job.
E.g. there’s this guy, Michael Acosta,
from the original ICWATCH. From 2011
to 2012 he worked at Guantanamo. He
was primarily trying to find out about
potential attacks on Guantanamo itself.
He monitored various detainees and
collaborated with the Behavioural Science
Team and was trying to figure out if
detainees were planning some sort of coup,
I guess. And then he started working for
the Airforce. And here he was working on
Drone Intelligence and targeting and such
things like how he was responsible for
“the production, maintenance and upgrade of
DGS2 mission critical Intelligence
databases which include high value target
development folders” like the things used
for JPEL targeting, regional fairbriefs,
mission storyboards and mission target
logs with document FMV mission rollups.
But the most interesting thing on this
résumé isn’t any of those things.
It’s the thing that changed between the
original launch of ICWATCH and now.
And that’s that he moved and started
working for a different company.
He started working for this company
called… SOS International
as an All Source Analyst. He unfortunately
had to leave the position that he had
on the side coaching High School Baseball
which he seemed to really like.
And he must have liked it because right now
he’s looking for Baseball opportunities
in Germany. So he seems to be in Germany
working for this company called SOS
International that I never heard of
before. So I went on the website and they
have a list of the cities that they
operate in in Germany. These 6 cities,
along with Guantanamo and a number of
other sketchy locations. And based on
Michael Acosta’s past record of working at
Guantanamo and on Drone targeting and
things like that it sounds like this
company is probably doing something quite
sketchy. By tracking changes to where
people work we can start to find things
like this we might not otherwise think to
look at. That we might not otherwise know about
as interesting.
But it’s not just open data that we
collect. Because the same tools for
collecting and analyzing open data
are also useful for other data sets.
Like we made a search tool
in collaboration with Courage Foundation
for all of the published Snowden documents
that allows you to search the full text of
the documents, browse which code words
are in these documents, see documents that
mention particular countries, see the full
PDFs and articles. And we also made a…
when the Hacking Team data came out this
summer we mirrored the data and became one
of the primary mirrors of the data. We had a
torrent that was almost done downloading on a
server with a lot of space and figured that none
of the other people had that, so we put it
up. And that got a lot of traffic, it got
about 57 million hits in the first 2 days.
And soon we realized there was a problem
where our server charged a lot for
bandwidth and it cost us $48 every time
someone decided to download the 400GB
with wget. So that was interesting but
it’s been resolved now. It hopefully made
the data more accessible to people who
don’t have 400GB of hard drive space
available or enough internet connectivity
to download that. So then we’ve also made
a search tool for all of the Hacking Team
emails; that has a search interface that
lets you browse them like you would in a
normal email client with threading, and a
network graph so that you can see the
connections between senders and
recipients. The Intelligence Community
has a variety of collection disciplines:
SIGINT, OSINT, HUMINT, Measurement
and Signature Intelligence, Imagery
Intelligence. They have all these
different sources that they’re gathering
data from. I think that we should try to
duplicate this. Because there are a lot
of different sources that we can gather
data from as well, and we need to find
ways to better collect data from all these
sources and to fuse them together.
These are some of the ones that I’ve
been spending a lot of time looking at.
And there’s open source Intelligence
things like ICWATCH where you’re
collecting data from purely public
sources. But this is just part of the wider
ecosystem that we can draw on. This is
mostly information that people and
institutions make about themselves
publicly, either intentionally or
unintentionally. And it’s really difficult
to use because there’s a lot of it and it
needs to be collected and matched up and
pulled together in a browsable way for
people to be able to use it. So you can’t
really just manually go and use it at scale.
You can do it a little bit but not nearly
enough. And so we’re working on making
this easier to use. The other sort of data
is anonymously leaked documents,
documents that were (?) sent to
journalists, that they think should be
public, and these often pretty explicitly
reveal corruption, human rights abuses
or other issues. But this can also be used
to collect more data. Like we used the
published Snowden documents very heavily
to find code words that we could use to
collect the data in ICWATCH. And once we
start to collect data on secret things
that were recently not known at all, but
now are, and we can find data on that, we
can start to find data on unknown code
words and unknown things that we might not
otherwise recognize. And then there’s data
released by governments, from FOIA
requests or through open data initiatives.
This, of course, can be spun or things can
be held back. So it’s not ideal to use on
its own. But it can be used, like the other
2 types, in combination with the others.
You can use that to provide context, you
can use open source data to frame FOIA
requests and things like that. So the goal
of Transparency Toolkit is to make it
easier to collect all these types of data
in one place and to start to use this data
in the same ways that the Intelligence
Community uses the data collected from
all the various collection disciplines.
Except our goal isn’t to kill people or be
some sort of omniscient God-like being;
we just want to build some sort of
external structure of accountability.
To make it easier to uncover and understand
things like surveillance programs or human
rights abuses or corruption. And when we
can find the people and companies that are
involved in things like surveillance we
can start to map who’s doing what.
And we can start to request information
about specific contracts. And we know who
we can ask questions about particular
programs. And then we can start to use the
data to start legal cases against specific
companies. And we can start to take more
concrete actions than we would be able to,
otherwise, if we were dealing simply in
theory or in guesses as to
what’s going on.
So open source intelligence lets us
be more proactive and more direct with
our techniques. And it also lets us find
some of this information earlier, because
many of the programs mentioned in the
Snowden documents were mentioned first
in other open data sources. And if we
can start to figure out where these are
and start to figure out what they are,
then we know what data we’re missing and
we can start to go after it with FOIA
requests or trying to find it by other
means. But all of this is a really, really
big project and we can’t… this is not
going to work if it’s just us working on
it. We need to work with other people.
We need to work with activists who have
ideas of how they want to use the data.
We need to work with journalists that
collect the data and write stories about
it. We need to work with human rights
lawyers to help them with their research
and help them build legal cases based on the
findings. We need to work with NGOs and
human rights researchers who want to
collect and use open data in their work.
And we need more people going through
databases like ICWATCH. This doesn’t
require any special expertise. You gain
the knowledge that you need as you’re
going through them looking up terms. It’s
not easy but it can be quite interesting
once you combine all of these obscure
terms and it’s like “Oh, that’s what
they’re doing!” and oftentimes what
they’re doing is something entirely absurd
like reading all your email
or killing people.
And we also need software developers to
help develop software and help us figure
out how all of these tools should fit
together. So if anyone’s interested in
working with us to take on the
Intelligence Agencies of the world and
figure out what they’re doing please let
us know. I think it sounds a bit insane
and I know that they have far more
resources and far more experience, but if
we keep ignoring the situation and we
continue as we are now making scattered
attempts to change things that aren’t
coordinated, that are based on limited
information, nothing is going to change
longterm. So I think we need to collect
all the information we can and figure out
how to effectively combine it and use it
for concrete goals. And I think we need
to do this with free software and open
data, because against such powerful
adversaries they’re probably the best
hopes we have.
applause
Herald: Thank you, thank you so much!
Now we have the round of Q&A,
for anyone who’d like to ask a question,
please come forward to the mikes on both sides
of this Saal (Hall). We’ll start
taking questions from…
is nodding towards first person asking
…yeah.
Q: So I’d like to ask about documents
which are scans. Which are sometimes
released as official open source
information. What kind of workflow do you
have, if you have any kind of
workflow at all, for OCR on these?
M.C.: A serious (?) that depends on the
document. There’s some open source
software called Tesseract that’s quite
good, but it doesn’t always work in cases
where there needs to be more specialized
parsing. I like to use something that’s
called ABBYY (FineReader) which is,
unfortunately, not open source and we are
looking for an alternative. That was for the
published Snowden documents, because we
needed to extract the classification
headers and that wasn’t working so well with
Tesseract. But Tesseract
works for most things.
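A rough illustration of the header-extraction step mentioned in this answer: once an OCR engine (Tesseract or another) has produced plain text, the classification marking can be pulled out with a pattern match. This is only a sketch. It assumes markings of the form `TOP SECRET//SI//NOFORN`, the function name and sample page are hypothetical, and it is not the code Transparency Toolkit actually used.

```python
import re

# Assumed marking format: a classification level, optionally followed by
# "//"-separated control markings, e.g. "TOP SECRET//SI//NOFORN".
MARKING = re.compile(
    r"^\s*((?:TOP SECRET|SECRET|CONFIDENTIAL|UNCLASSIFIED)(?://[A-Z0-9 ,/-]+)*)\s*$",
    re.MULTILINE,
)

def extract_classification(page_text):
    """Return the first classification marking found in OCR'd text, or None."""
    match = MARKING.search(page_text)
    return match.group(1).strip() if match else None

# Hypothetical OCR output for one page.
sample_page = """TOP SECRET//SI//NOFORN
(hypothetical page body produced by an OCR engine)
"""
print(extract_classification(sample_page))
```

Real OCR output is noisier than this sample (misread slashes, stray characters), which is presumably why a general-purpose engine struggled with the headers; a production version would need fuzzier matching.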
listens to unrecorded comment
from the audience
Yeah.
Herald: Thank you. Do we have question
from… [the internet]? Yeah, oui!
Signal Angel: Yes, rooty is asking on IRC:
What would you recommend the NSA to
develop towards a future
of Social Usefulness!??
E.g. what value have databases from
2015, people cell phone sensors in 2115!??
Could you give the NSA, maybe
CEO there, useful work!??
M.C.: Can you rephr..-, sorry !??
Signal Angel: naively repeats first
of the apparent Troll questions
M.C.: laughs
Social Usefulness…
Probably the most useful thing they could
do is stop collecting the data in the
first place, especially the data that’s
being intercepted or illegally collected.
There’s probably some amounts of useful
tracking they could do, but I’m not sure
that’s the best approach, using the tactics
that they use to collect the data at this
time.
Herald: Thank you. So, next
question from you, please!
Question: Hello, thanks for the talk, that
was one of the best ones I’ve seen at this
congress. I was wondering what you think
about the question you’re raising about
“we shouldn’t make the same mistakes”.
Because I’m not totally sure that’s
possible because of things I’ve seen in
other communities. All communities have
their extremists and they will abuse this
data. And then that allows a political
attack on you, because they say you made
that happen. It’s not true, but it will sell (?)
to people. So how do you protect
against that?
M.C.: I think it’s hard to entirely
protect against it because we can’t
control the actions of other people. But
people could also go off and use this data
negatively by collecting it on their own,
independently of us. I was actually quite
impressed, after we launched ICWATCH, I
haven’t heard of anyone complaining of
threats that they’ve gotten from people…
People in the Intelligence Community:
I haven’t heard of anyone in the
Intelligence Community complaining about
threats that they’ve gotten as a result
of ICWATCH being launched. All of the
complaints have been theoretical. The only
threats I’ve heard of resulting from
ICWATCH are those from the Intelligence
Community to us. I haven’t heard of
anything, so I’ve been very impressed with
the civility of the internet in that case.
And I think that maybe, by framing it, and
actually bringing it down to the
individual level, and making it clear that
these are people, that makes it a little
bit less likely that people will go after
them in a vicious way.
Q: Have you thought of creating a kind of usage
guidelines? I mean that's not gonna change what
anyone does. But if someone does something
you can then say “That’s against our usage
guidelines” and it’s a political defence
against someone accusing it…
M.C.: Yeah, I don’t think there’s any way
that we can enforce something like that.
But we do try to be very careful with how
we’re framing it, in saying, like I
spent a long time in this talk saying, that these are
people that are not evil people. They’re
normal people that you should look at as
such. So I think being very careful about
framing it matters, and we’ll be developing some
sort of guidelines. That’s definitely a
good idea.
Herald: Thank you. Your question, please!
Troll: Hi! First, thank you very much for
this tool that makes it possible to fight
back, legally, against people who try
to punish or yeah…
What I have to say, or my question is: I
worked in the last 3 1/2 years, let’s say,
in the field of IT Forensics. And I worked
with Maltego and stuff, and so I know how
much work it is to collect data and
bring it into good condition, so others
can read it or you can reach a goal, or
see a goal. And what I personally think
is very important: this could be very
sensitive data to people and my question
is: How do you make sure that this data
which you offer for download will be kept
safe? That’s the first question, and
the second is: Did you think about
verifications? So you are collecting a lot
of data, and in a few years another person
wants to see if this data was correct. So
do you verify the sources, with an MD5 sum
or so, so you can say “This fingerprint taken
on this day and at this time is correct”?
M.C.: For the first question: I don’t
think there’s really… I’m not sure (?)
protected because this is a version that
people posted publicly themselves. So they
sort of said that they don’t want it to be
protected or secured because they’re
posting it on the public internet. So I’m
not sure there’s really any reason to try
to protect it when it’s something that
they’ve published very publicly.
And on the second one, for verification,
that’s quite tricky with some of the data
especially around the Intelligence
Community because all of these things
are secretive and it’s hard to confirm
them. We can confirm them against each
other like now we have multiple résumé
sites on ICWATCH, so sometimes we can find
the same person’s résumé on another site
and compare over time and we can go
find other profiles they have and try
to combine as much data on the same person
as is possible and have it over time.
Q: What I did: I made a fingerprint
when I downloaded a website, I made a
fingerprint and then I can say OK, this
is… yeah.
M.C.: For verifying what was actually
collected then. Yeah, I mean that's a bit harder to
absolutely do, but we do have all of
the full text of the web pages saved, and
we have it all published on GitHub so you
can verify what was collected then but, yeah.
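The fingerprinting idea raised in this exchange can be sketched in a few lines. The questioner mentions MD5; this sketch swaps in SHA-256 from Python's standard library because MD5 is known to be vulnerable to collisions. The function names, record layout, and URL are hypothetical and are not taken from the ICWATCH code. Note also that a digest you compute yourself only shows that your saved copy has not changed since collection; it does not prove what the site served.

```python
import hashlib
from datetime import datetime, timezone

def fingerprint(content: bytes, source_url: str) -> dict:
    """Record a SHA-256 digest of the saved bytes plus the collection time (UTC)."""
    return {
        "url": source_url,
        "sha256": hashlib.sha256(content).hexdigest(),
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }

def matches(content: bytes, record: dict) -> bool:
    """Check whether the given bytes still hash to the recorded digest."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

# Hypothetical saved page and source address.
page = b"<html>hypothetical saved page</html>"
record = fingerprint(page, "https://example.org/profile")

print(matches(page, record))         # unchanged copy
print(matches(page + b" ", record))  # copy altered by one byte
```

Publishing such records alongside the saved pages, for example in the same repository, would let a later reader confirm that the archived text is the text that was hashed at collection time.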
Herald: We’ll take the questions
from up there.
Jake Appelbaum: Hi, community extremist
here… So I wanted to say something which
is that I think what Julian did for
leaking documents you’re doing for
analysis. Which is really great! Because
transparency isn’t enough – you need action!
And so I just wanted to say that I hope
that everyone can give M.C. and
Transparency Toolkit a lot of material
support. And maybe a round of applause!
applause
Definitely the best talk at the congress
and I had a couple of suggestions. But
one of them is: I think it would be great
if you could focus on American Domestic
Police Agencies.
M.C.: Hmm-mhm…
Jake: In particular collecting the images
of Police Academy Graduation photographs.
And to be able to move in the direction of
facial recognition, so that we can find
Undercover Police Officers
that are in our midst…
applause
And I think it would be great if you could
create a FOIA wizard, essentially, ’cause
everybody likes wizards, and who doesn’t
like UNIX… So it’d be great if you could
create a FOIA wizard where you could say:
“I wanna know about these terms” and it
would just generate automatically – maybe
by partnering with MuckRock e.g. –
interesting things, where there’s a kind
of “Wait!”. Where you realize there’s a lot
of people working on this classified
program and it’s at this agency and they
have a contract with this company and
these are the people involved and just
automatically generate those FOIAs and
then get people to sort of sign up to put
their name down and sort of sponsor a
little transparency and to say “Oh, that’s
the FOIA I wanna get behind, I’m gonna
check on it, you know, once a week, I’m
gonna do this thing. Through MuckRock.”
I think that would be a way to take this
information in a legal manner and to make
it actionable. And I think there’s lots of
other interesting things you could do that
are not about the law. But I leave that to
the imagination of other people. It should
be legal but it doesn’t need to be through
legal channels like, say, FOIA. So thanks
for the work that you’re doing, M.C. and
I hope that you will expand it to,
basically, all of the pigs of the whole
world. And I would really encourage you
to read Hannah Arendt’s “Eichmann in
Jerusalem”, because you described a
fundamental thing: these people aren’t
evil. But actually, Evil itself doesn’t
exist. These people are the Banality of
Evil. They’re people who have soccer
practice, and they have a dog, and they
like to go home and fuck their wife, and
they’re regular people who do drone
strikes.
applause
Herald: Thank you. We
have a question on mike 1.
Q: How easy is it to add support for new
databases or new sources of information?
M.C.: It depends on the source and how
that site is structured. But generally
it’s not too difficult. Adding
proper new sources does require
programming at this point. But it’s not
particularly complex programming and we
have some libraries that make some
parts of it easier, as well. And if you’re
interested in adding a data source we’re
more than happy to help with that.
Q: Awesome! My favourite is the list of…
the report of when people were denied
security clearance and why and if their
appeal was then, like, removed.
M.C.: Yeah, that would
be quite interesting!
Q: Okay!
Herald: If there’s no further
questions… moment…
yeah, okay! Please!
Q: Yesterday it was said that we have to
make sure that they know that we
watch them. Because some day they will get
prosecuted. So, in some way, I think
you are doing exactly this. So this is
brilliant. Are you already in the stage
where you’re thinking you can start
concrete legal actions against some
individuals that you are getting
information on with your tools?
M.C.: We’ve been
working with some lawyers towards that.
We are looking to do more in this, so if
you know… if you have any ideas for
particular situations where this may be
applicable, or lawyers that we should
work with, let us know! But we’re working
towards that and making some progress.
Q: Thanks!
Herald: Getting a question
from up there, please!
Q: I just wanna say that you are a
visionary who is more passionate than
anybody I have ever collaborated with
and it’s a total honor.
applause
Herald: Thank you.
M.C.: Yeah, and just to everyone, that’s
Brennan who also works on Transparency
Toolkit. He made the awesome UI for
Harvester and Lookingglass that you saw
in the Tabs of all this.
applause
Jake: If no one else is gonna ask a
question, I’d like to ask a question which
I know the answer to but no one else
in the room does. And I think it’s very
fascinating. I wonder if you could talk
about lessons that you’ve learned from
studying about the South African
Resistance to Apartheid.
M.C. is laughing
Jake: And maybe you could talk about the
things that drive you to work on these
things. E.g. what inspires you to justice?
E.g. experiences at MIT and maybe – I mean
if you don’t want to talk about it, I’m
sorry for asking it. But if you do wanna
talk about it I think you can inspire
everyone else here to raise their fist
with you! In solidarity.
M.C.: Yeah… Okay… I guess it’s been
nearly 3 years now, so maybe that’s okay
to talk about. 3 years ago there was this
case at MIT… everyone has probably heard
of Aaron Swartz and he was being
prosecuted for downloading documents from
JSTOR. And I was brought in trying to figure out
MIT’s role in this situation, and if we
might be able to sway public opinion,
with a few people in Boston. I think some of
them are in this room. And we were trying
to help him. And eventually, part way into
the process, he became afraid and decided
that it would be more risky for us to help
him, with the prosecutor who might lash
back, so we stopped. But one of the things
that I did in this process was, I sent out
a survey to all of the professors at MIT
asking their opinion on his case. And
whether they identified with his actions.
And I got a lot of responses to this
survey. Some were quite nice and were
quite supportive. Some were very vicious,
saying that he should go to jail and that
he is a waste of humanity and he works at
this Harvard Center for Ethics, so how is
this ethical. And things like that. They
were quite horrible. And initially he had
access to this database and somehow over
the next year, when we weren’t doing much,
he lost access to this database. And he
emailed me asking for access again. And
back then I was on some stupid kick about
research ethics and redaction and thought
that there’s no reason to… I replied
something like “I cannot give you the answers
with the names”. That was just stupid because
the names are the most useful part of that
data. And I kind of abandoned him, along
with a lot of other people in that. And I
feel like if I had given him the names
that might have been something that could
be used to find supporters within MIT or
people who were rallying against him. And
I don’t think it would have made a huge
difference but it might have made just a
little bit. And that was one of the things
that really showed me the power of data on
individuals and the role of individuals
within institutions. And I feel like I
really failed there. So
I don’t want to do that again.
applause
Herald: Thank you. Unfortunately, we need
to wrap up because we are out of time.
Thank you for attending this very
interesting lecture and, quite touching
in the end.
postroll music
Subtitles created by c3subtitles.de
in 2016. Join and help us do more!