-
applause
-
Thank you very much, can you…
You can hear me? Yes!
-
I’ve been at this now 23 years. We
worked, with… My colleagues and I,
-
we worked in about 30 countries,
we’ve advised 9 Truth Commissions,
-
official Truth Commissions, 4 UN missions,
-
4 international criminal tribunals.
We have testified in 4 different cases
-
– 2 internationally, 2 domestically – and
we’ve advised dozens and dozens
-
of non-governmental Human Rights groups
around the world. The point of this stuff
-
is to figure out how to bring the
knowledge of the people who’ve suffered
-
human rights violations to bear
on demanding accountability
-
from the perpetrators. Our job is to
figure out how we can tell the truth.
-
It is one of the moral foundations of the
international Human Rights movement
-
that we speak Truth to Power. We
look in the face of the powerful
-
and we tell them what we believe
they have done that is wrong.
-
If that’s gonna work, we
have to speak the truth.
-
We have to be right, we
have to get the analysis right.
-
That’s not always easy and to get there,
-
there are sort of 3 themes that
I wanna try to touch in this talk.
-
Since the talk is pretty short I’m
really gonna touch on 2 of them, so
-
at the very end of the talk I’ll invite
people who’d like to talk more about
-
the specifically technical aspects of this
work, about classifiers, about clustering,
-
about statistical estimation, about
database techniques. People who wanna talk
-
about that I’d love to gather and we’ll
try to find a space. I’ve been fighting
-
with the Wiki for 2 days; I think
I’m probably not the only one.
-
We can gather, we can talk about
that stuff more in detail. So today,
-
in the next 25 minutes I’m
going to focus specifically on
-
the trial of General
José Efraín Ríos Montt
-
who ruled Guatemala from
March 1982 until August 1983.
-
That’s General Ríos, there in
the upper corner in the red tie.
-
During the government
of General Ríos Montt
-
tens of thousands of people were killed by
the army of Guatemala. And the question
-
that has been facing Guatemalans
since that time is:
-
“Did the pattern of killing
that the army committed
-
constitute acts of genocide?” Now
genocide is a very specific crime
-
in International Law. It does not
mean you killed a lot of people.
-
There are other war crimes for mass
killing. Genocide specifically means
-
that you picked out a particular group;
and to the exclusion of other groups
-
nearby them you focused
on eliminating that group.
-
That’s key because for a statistician
that gives us a hypothesis we can test
-
which is: “What is the relative risk,
what is the differential probability
-
of people in the target group being
killed relative to their neighbours
-
who are not in the target group?”
So without further ado,
-
let’s look at the relative risk of
being killed for indigenous people
-
in the 3 rural counties of
Chajul, Cotzal and Nebaj
-
relative to their
non-indigenous neighbours.
-
We have – and I’ll talk in a moment about
how we have this – we have information,
-
and evidence, and estimations of the
deaths of about 2150 indigenous people.
-
People killed by the army in the period
of the government of General Ríos.
-
The population, the total number of
people alive who were indigenous
-
in those counties in the census
of 1981 is about 39,000.
-
So the approximate crude mortality
rate due to homicide by the army
-
is 5.5% for indigenous people in
that period. Now that’s relative
-
to the homicide rate for non-indigenous
people in the same place
-
of approximately 0.7%. So what
we ask is: “What is the ratio
-
between those 2 numbers?” And
the ratio between those 2 numbers
-
is the relative risk. It’s approximately
8. We interpret that as: if you were
-
an indigenous person alive in
one of those 3 counties in 1982,
-
your probability of being killed
by the army was 8 times greater
-
than a person also living
in those 3 counties
-
who was not indigenous.
Eight times, 8 times!
-
To put that in relative terms: the
relative risk of being killed as
-
a Bosniac relative to a Serb
in Bosnia during the war in Bosnia
-
was a little less than 3. So the
relative risk for indigenous people
-
was more than twice, nearly 3 times,
as much as the relative risk
-
for Bosniacs in the Bosnian War.
It’s an astonishing level of focus.
-
It shows tremendous planning
and coherence, I believe.
-
So, again coming back to the statistical
conclusion, how do we come to that?
-
How do we find that information? How do we
make that conclusion? First, we’re only
-
looking at homicides committed by the
army. We’re not looking at homicides
-
committed by other parties, by
the guerrillas, by private actors.
-
We’re not looking at excess mortality,
the mortality that we might find
-
in conflict that is in excess of
normal peacetime mortality.
-
We’re not looking at any of that,
only homicide. And the percentage
-
relates the number of people killed by the
army with the population that was alive.
-
That’s crucial here. We’re looking at
rates and we’re comparing the rate
-
of the indigenous people shown in the
blue bar to non-indigenous people
-
shown in the green bar. The width of
the bars shows the relative populations
-
in each of those 2 communities. So clearly
there are many more indigenous people,
-
but a higher fraction of them are also
killed. The bars also show something else.
-
And that’s what I’ll focus on for the
rest of the talk. There are 2 sections
-
to each of the 2 bars, a dark section
on the bottom, a lighter section on top.
-
And what that indicates is what we know
in terms of being able to name people
-
with their first and last name, their
location and dates of death, and
-
what we must infer statistically. Now I’m
beginning to touch on the second theme
-
of my talk: Which is that when we are
studying mass violence and war crimes,
-
we cannot do statistical or pattern
analysis with raw information.
-
We must use the tools of mathematical
statistics to understand
-
what we don’t know! The information
which cannot be observed directly.
-
We have to estimate that in order to
control for the process of the production
-
of information. Information doesn’t just
fall out of the sky, the way it does
-
for industry. If I’m running an ISP I know
every packet that runs through my routers.
-
That’s not how the social world works. In
order to find information about killings
-
we have to hear about that killing from
someone, we have to investigate,
-
we have to find the human remains.
And if we can’t observe the killing
-
we won’t hear about it and many killings
are hidden. In my team we have a kind of
-
catch phrase: that the world… if a lawyer
is killed in a big city at high noon
-
the world knows about it before
dinner time. Every single time.
-
But when a rural peasant is killed 3 days’
walk from a road in the dead of night,
-
we’re unlikely to ever hear. And
technology is not changing this.
-
I’ll talk later about how technology is
actually making the problem worse.
-
So, let’s get back to Guatemala
and just conclude
-
that the little vertical bars, little
vertical lines at the top of each bar
-
indicate the confidence interval. Which is
similar to what lay people sometimes call
-
a margin of error. It is our level of
uncertainty about each of those estimates
-
and you’ll notice that the uncertainty
is much, much smaller than
-
the difference between the 2 bars. The
uncertainty does not affect our ability
-
to draw the conclusion that there
was a spectacular difference
-
in the mortality rates between the
people who were the hypothesized
-
target of genocide and those who were not.
-
Now the data: first we
had the census of 1981,
-
this was a crucial piece. I think there are
very interesting questions to ask
-
about why the Government of Guatemala
conducted a census on the eve of
-
committing a genocide. There is excellent
work done by historical demographers
-
about the use of censuses in mass
violence. It has been common
-
throughout history. Similarly,
or excuse me, in parallel
-
there were 4 very large
projects. First, the CIIDH
-
– a group of non-Governmental
Human Rights groups –
-
collected 1240 records of deaths
in this three-county region.
-
Next, the Catholic Church collected
a bit fewer than 800 deaths.
-
The truth commission – the Comisión
para el Esclarecimiento Histórico (CEH) –
-
conducted a really big research
project in the late 1990s and
-
of that we got information about a little
bit more than a thousand deaths.
-
And then the National Program for
Compensation is very, very large
-
and gave us about 4700
records of deaths.
-
Now, this is interesting
but this is not unique.
-
Many of the deaths are reported in common
across those data sources and so…
-
we think about this in terms of a Venn
diagram. We think of: how did these
-
different data sets intersect with each
other or collide with each other. And
-
we can diagram that as
these 3 white circles intersecting.
-
But as I mentioned earlier we’re also
interested in what we have not observed.
-
And this is crucial for us because
when we’re thinking about
-
how much information we have, we have to
distinguish between the world on the left,
-
in which our intersecting circles
cover about a third of the reality,
-
versus the world on the right where our
intersecting circles cover all of reality.
-
These are very different worlds; and the
reason they’re so different is not simply
-
because we want to know the magnitude,
not simply because we want to know
-
the total number of killings. That’s
important – but even more important:
-
we have to know that we’ve covered,
we’ve estimated in equal proportions
-
the two parties. We have to estimate in
equal proportions the number of deaths
-
of non-indigenous people and the
number of deaths of indigenous people.
-
Because if we don’t get those
estimates correct our comparison
-
of their mortality rates will be biased.
Our story will be wrong. We will fail
-
to speak Truth to Power. We can’t have
that. So what do we do? Algebra!
-
Algebra is our friend. So I’m gonna
give you just a tiny taste of how we
-
solve this problem and I’m going to
introduce a series of assumptions.
-
Those of you who would like to debate
those assumptions: I invite you to join me
-
after the talk and we will talk endlessly
and tediously about capture heterogeneity.
-
But in the short term,
-
we have a universe N of total killings in
a specific time/space/ethnicity/location.
-
And of that we have 2 projects A and B.
-
A captures some number of
deaths from the universe N,
-
and the probability with which a death is
captured by project A from the universe N
-
is by elementary probability theory the
number of deaths documented by A
-
divided by the unknown number
of deaths in the population N.
-
Similarly, the probability with which a
death from N is documented by project B
-
is B over N, and this is the cool part:
the probability with which a death
-
is documented by both A and B is M over N.
-
Now we can put the 2 databases together,
we can compare them. Let’s talk about
-
the use of random forest classifiers
and clustering to do that later.
-
But we can put the 2 databases together,
compare them, determine the deaths
-
that are in M – that is, in both
A and B – and divide M by N.
-
But, also by probability theory, the
probability that a death occurs in M
-
is equal to the product of
the individual probabilities.
-
The probability of any compound event, an
event made up of two independent events, is
-
equal to the product of the probabilities
of those two events, so M over N is equal to
-
A over N times B over N. Solve for N.
-
Multiply it through by N squared, divide
by M, and we have an estimate of N
-
which is equal to AB over M. Now, the
lights in my eyes, I can’t see, but I saw
-
a few light bulbs go off over people’s
heads. And when I showed this proof
-
to the judge in the trial of General Ríos
-
I saw a light bulb go on over her head.
-
It’s a beautiful thing,
it’s a beautiful thing.
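-
The two-list argument above can be written as a minimal Python sketch. It assumes, as the derivation does, that the two projects document deaths independently and that every death is equally likely to be documented; the function name and the counts are hypothetical, not figures from the case. This form is commonly known as the Lincoln-Petersen estimator.

```python
def two_system_estimate(a: int, b: int, m: int) -> float:
    """Estimate the total N from list sizes a and b and their overlap m.

    Follows M/N = (A/N) * (B/N), solved for N, under the assumption
    that the two lists capture deaths independently.
    """
    if m <= 0:
        raise ValueError("need at least one death documented by both lists")
    if m > min(a, b):
        raise ValueError("overlap cannot exceed either list")
    return a * b / m


# Hypothetical counts, for illustration only.
print(two_system_estimate(a=300, b=200, m=60))  # 1000.0
# Same list sizes, fewer matches: the estimated total is larger.
print(two_system_estimate(a=300, b=200, m=30))  # 2000.0
```

The second call shows the direction of the effect: holding the list sizes fixed, a smaller overlap implies a larger estimated population.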
-
applause
-
So we don’t do it in 2 systems because
that takes a lot of assumptions.
-
We do it in 4. You will recall that we
have 4 data sources. We organize
-
the data sources in this format
such that we have an inclusion
-
and an exclusion pattern in the table on
the left, which… for which we can define
-
the number of deaths which fall into
each of these intersecting patterns.
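-
One way to picture that table is to count, for every documented death, which of the four sources include it. The sketch below assumes the records have already been matched across sources so that each death has a single identifier; the identifiers and the short source labels are made up for illustration.

```python
from collections import Counter

# Hypothetical, already-matched victim identifiers per source.
sources = {
    "CIIDH": {1, 2, 3, 4, 5},
    "Church": {2, 3, 6},
    "CEH": {3, 4, 6, 7},
    "Compensation": {1, 3, 7, 8, 9},
}


def capture_patterns(sources):
    """Count deaths per inclusion/exclusion pattern (1 = in that source)."""
    everyone = set().union(*sources.values())
    return Counter(
        tuple(int(victim in records) for records in sources.values())
        for victim in everyone
    )


patterns = capture_patterns(sources)
names = list(sources)
for pattern, count in sorted(patterns.items(), reverse=True):
    print(dict(zip(names, pattern)), count)
```

The pattern in which no source records a death never shows up in such a table; that empty cell is the quantity the estimation has to fill in.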
-
And I’ll give you a very quick
metaphor here. The metaphor is:
-
imagine that you have 2 dark rooms and you
want to assess the size of those 2 rooms
-
– which room is larger? And the only
tool that you have to assess the size
-
of those rooms is a handful of little
rubber balls. The little rubber balls
-
have a property that when they hit each
other they make a sound. [makes CLICK sound]
-
So we throw the balls into the first
room and we listen, and we hear
-
[makes several CLICK sounds]. We
collect the balls, go to the second room,
-
throw them with equal force – imagining
a spherical cow of uniform density!
-
We throw the balls into the second
room with equal force and we hear
-
[makes one CLICK sound]
So which room is larger?
-
The second room, because we hear fewer
collisions, right? Well, the estimation,
-
the toy example I gave in the previous
slide is the mathematical formalization
-
of the intuition that fewer
collisions mean a larger space.
-
And so what we’re doing here is
laying out the pattern of collisions.
-
Not just the collisions, the pairwise
collisions, but the three-way and
-
four-way collisions. And that
allows us to make the estimate
-
that was shown in the bar graph of
the light part of each of the bars. So
-
we can come back to our conclusion and put
a confidence interval on the estimates.
-
And the confidence intervals are shown
there. Now I’m gonna move through this
-
somewhat more quickly to get to the end of
the talk but I wanna put up one more slide
-
that was used in the testimony
and that is that we divided time
-
into 16-month periods and
compared the 16-month period of
-
General Ríos’s governance – now it’s only
16 months ’cause we went April to July,
-
because it’s only a few days in August, a
few days in March, so we shaved those off,
-
okay… – 16-month period of General
Ríos’s Government and compared it
-
to several periods before and after. And
I think that the key observation here
-
is that the rate of killing
against indigenous people
-
is substantially higher under General
Ríos’s Government than under previous
-
or succeeding governments. But more
importantly the ratio between the two,
-
the relative risk of being killed as an
indigenous person, was at its peak
-
during the government of General Ríos.
-
Have we proven genocide? No.
-
This is evidence consistent with the
hypothesis that acts of genocide
-
were committed. The finding of genocide
is a legal finding, not so much
-
a scientific one. So as scientists,
our job is to provide evidence that
-
the finders of fact – the judges in this
case – can use in their determination.
-
This is evidence consistent
with that hypothesis.
-
Were this evidence otherwise, as
scientists we would say we would
-
reject the hypothesis that genocide was
committed. However, with this evidence
-
we find that the evidence,
the data is consistent with
-
the prosecution’s hypothesis.
-
So, it worked!
-
Ríos Montt was convicted on
genocide charges. applause
-
You can clap!
applause
-
applause
-
For a week!
mumbled, surprised laughter
-
Then the Constitutional Court intervened;
-
there are, I know, a couple of experts on
Guatemala here in the audience
-
who can tell you more about why that
happened and exactly what happened.
-
However, the Constitutional
Court ordered a new trial,
-
which is at this time scheduled
for the very beginning of 2015.
-
And I look forward to testifying again,
-
and again, and again, and again!
-
applause
-
Look, but I wanna come back to this point.
Because as a bunch of technologists…
-
– there are a lot of folks who really like
technology here, I really like it too!
-
Technology doesn’t get us to science
– you have to have science
-
to get you to science. Technology helps
you organize the data. It helps you do
-
all kinds of extremely great and cool
things without which we wouldn’t be able
-
to even do the science. But you
can’t have just technology!
-
You can’t just have a bunch of data
and make conclusions. That’s naive,
-
and you will get the wrong conclusions.
‘The point of rigorous statistics is
-
to be right’, and there is a little bit of
a caveat there – or to at least know
-
how uncertain you are. Statistics is often
called the ‘Science of Uncertainty’.
-
That is actually my favorite
definition of it. So,
-
I’m going to assume that we
care about getting it right.
-
No one laughed, that’s good.
-
Not everyone does, to my distress.
-
So if you only have some of the data
-
– and I will argue that we always
only have some of the data –
-
you need some kind of model that will tell
you the relationship between your data
-
and the real world.
Statisticians call that an inference.
-
In order to get from here to there
you’re gonna need some kind of
-
probability model that tells you
why your data is like the world,
-
or in what sense you have to tweak,
twiddle and do algebra with your data
-
to get from what you can
observe to what is actually true.
-
And statistics is about comparisons.
Yeah, we get a big number and
-
journalists love the big number; but
it’s really about these relationships
-
and patterns! So to get those
relationships and patterns,
-
in order for them to be right, in order
for our answer to be correct,
-
every one of the estimates we make
for every point in the pattern
-
has to be right. It’s a hard
problem. It’s a hard problem.
-
And what I worry about is that
we have come into this world
-
in which people throw the notion of Big
Data around as though the data allows us
-
to make an end-run around problems
of sampling and modeling. It doesn’t.
-
So as technologists, the reason I’m,
you know, ranting at you guys about it
-
is that it’s very tempting to have a lot
of data and think you have an answer!
-
And it’s even more tempting because
in an industry context you might be right.
-
Not so much in Human Rights, not so
much. Violence is a hidden process.
-
The people who commit violence have
an enormous commitment to hiding it,
-
distorting it, explaining it in different
ways. All of those things dramatically
-
affect the information that is produced
from the violence that we’re going to use
-
to do our analysis. So we usually
don’t know what we don’t know
-
in Human Rights data collection.
And that means that we don’t know
-
if what we don’t know is systematically
different from what we do know.
-
Maybe we know about all the lawyers
and we don’t know about the people
-
in the countryside. Maybe we know
about all the indigenous people
-
and not the non-indigenous people.
If that were true, the argument
-
that I just made would be merely
an artifact of the reporting process
-
rather than some true analysis. Now,
we did the estimations, which is why I believe
-
we can reject that critique, but that’s
what we have to worry about.
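-
A small numerical sketch shows why that critique matters. All numbers here are invented for illustration: two groups with a true relative risk of 5, documented at different coverage rates.

```python
def relative_risk(deaths_a, pop_a, deaths_b, pop_b):
    """Ratio of the mortality rate in group A to the rate in group B."""
    return (deaths_a / pop_a) / (deaths_b / pop_b)


# Invented "true" world: 1000 of 20,000 killed in A, 50 of 5,000 in B.
true_rr = relative_risk(1000, 20_000, 50, 5_000)

# Raw data with unequal coverage: 80% of A's deaths are documented,
# but only 20% of B's deaths.
raw_rr = relative_risk(1000 * 0.8, 20_000, 50 * 0.2, 5_000)

# Raw data with equal coverage (50% for both groups).
equal_rr = relative_risk(1000 * 0.5, 20_000, 50 * 0.5, 5_000)

print(f"true: {true_rr:.1f}  unequal coverage: {raw_rr:.1f}  "
      f"equal coverage: {equal_rr:.1f}")
```

In this toy setup the ratio computed from raw counts is four times the true one when coverage differs between the groups, and it is unchanged when coverage is equal, which is why the coverage of each group has to be estimated rather than assumed.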
-
And let’s go back to the Venn diagram
and say: which of these is accurate?
-
It’s not just for one of the
points in our pattern analysis.
-
The problem is that we’re
going to compare things.
-
As in Peru where we compared killings
committed by the Peruvian army against
-
killings committed by the Maoist guerrillas
of the Sendero Luminoso. And we found
-
there that in fact we knew very little
about what the Sendero Luminoso had done.
-
Whereas we knew almost everything
about what the Peruvian army had done.
-
This is called the coverage rate.
The rate between what we know and
-
what we don’t know. And
raw data, however big,
-
does not get us to patterns.
And here is a bunch of…
-
kinds of raw data that I’ve used
and that I really enjoy using.
-
You know – truth commission testimonies,
UN investigations, press articles,
-
SMS messages, crowdsourcing, NGO
documentation, social media feeds,
-
perpetrator records, government archives,
state agency registries – I know those
-
sound all the same but they actually
turn out to be slightly different.
-
Happy to talk in tedious detail! Refugee
Camp records, any non-random sample.
-
All of those are gonna take
some kind of probability model
-
and we don’t have that many
probability models to use. So
-
raw data is great for cases – but
it doesn’t get you to patterns.
-
And patterns – again – patterns are
the thing that allow us to do analysis.
-
They are the thing… the patterns are what
get us to something that we can use
-
to help prosecutors, advocates and the…
-
and the victims themselves.
-
I gave a version of this talk, a
much earlier version of this talk
-
several years ago in Medellín, Colombia.
I’ve worked a lot in Colombia,
-
it’s really… it’s a great place to
work. There’s really terrific
-
Victims Rights groups there.
And a woman from a township,
-
smaller than a county, near to Medellín
came up to me after the talk and she said:
-
“You know, a lot of people… you
know I’m a Human Rights activist,
-
my job is to collect data, I tell stories
about people who have suffered.
-
But there are people in my
village I know who have had
-
people in their families disappeared and
they’re never gonna talk about it, ever.
-
We’re never going to be able to use
their names, because they are afraid.”
-
We can’t name the victims. At
least we’d better count them.
-
So about that counting: there are
3 ways to do it right. You can have
-
a perfect census – you can have all the
data. Yeah it’s nice, good luck with that.
-
You can have a random sample
of the population - that’s hard!
-
Sometimes doable but very hard.
In my experience we rarely interview
-
victims of homicide, very rarely.
Laughing
-
And that means there’s a complicated
probability relationship between
-
the person you sampled, the interview
and the death that they talk to you about.
-
Or you can do some kind of posterior
modeling of the sampling process which is…
-
which is in essence what
I proposed in the earlier slide.
-
So what can we do with raw data,
guys? We can collect a bunch of…
-
We can say that a case exists. Ok
– that’s actually important! We can say:
-
“Something happened” with raw data. We can
say: “We know something about that case”.
-
We can say: “There were 100 victims
in that case or at least 100 victims
-
in that case”, if we can name 100 people.
-
But we can’t do comparisons: “This
is the biggest massacre this year”.
-
We don’t really know. Because we
don’t know about the massacres
-
we don’t know about. No patterns. Don’t
talk about the hot spot of violence.
-
No, we don’t know that. Happy to talk
more about that if we gather after,
-
but I wanna come to a close here with
the importance of getting it right.
-
I’ve talked about one case today. This
is another case, the case of this man:
-
Edgar Fernando García. Mr. García was
a student Labor leader in Guatemala
-
early in the 1980s. He left
his office in February 1984
-
– did not come home. People reported
later that they saw someone
-
shoving Mr. García into a
vehicle and driving away.
-
His widow became a very important
Human Rights activist in Guatemala
-
and now she’s a very important, and
in my opinion impressive politician.
-
And there’s her infant daughter. She
continued to struggle to find out
-
what had happened to
Mr. García for decades.
-
And in 2006 documents came to light
in the National Archives of the…
-
excuse me, the Historical Archives
of the national Police, showing that
-
the Police had carried out an operation
in the area of Mr. García’s office
-
and it was very likely that
they had disappeared him.
-
These 2 guys up here in the upper
right were Police officers in that area;
-
they were arrested, charged with the
disappearance of Mister García and
-
convicted. Part of the evidence used to
convict them was communications metadata
-
showing that documents
flowed through the archive.
-
I mean paper communications! We coded
it by hand. We went through and read
-
the ‘From’ and ‘To’ lines
from every Memo. And
-
they were convicted in 2010
and after that conviction
-
Mr. García’s infant daughter – now
a grown woman – was clearly joyful.
-
Justice brings closure to a family
that never knows when to start talking
-
about someone in the past tense.
Perhaps even more powerfully:
-
those guys’ grand boss, their boss's
boss, Colonel Héctor Bol de la Cruz,
-
this man here, was convicted
of Mr. García’s disappearance
-
in September this year [2013].
applause
-
applause
-
I don’t know if any of you have
ever been dissident students,
-
but if you’ve been dissident students
demonstrating in the street think about
-
how you would feel if your friends
and comrades were disappeared,
-
and take a long look at Colonel Bol
de la Cruz. Here is the rest of the stuff
-
that we will talk about if we gather
afterwards. Thank you very much
-
for your attention. I really
have enjoyed CCC.
-
applause
-
Subtitles created by c3subtitles.de
in the year 2016. Join and help us!