https:/.../30c3-5405-en-Data_Mining_for_Good_h264-iprod.mp4

  • 0:10 - 0:18
    applause
  • 0:18 - 0:23
    Thank you very much, can you…
    You can hear me? Yes!
  • 0:23 - 0:28
    I’ve been at this now 23 years. We
    worked, with… My colleagues and I,
  • 0:28 - 0:31
    we worked in about 30 countries,
    we’ve advised 9 Truth Commissions,
  • 0:31 - 0:36
    official Truth Commissions, 4 UN missions,
  • 0:36 - 0:40
    4 international criminal tribunals.
    We have testified in 4 different cases
  • 0:40 - 0:44
    – 2 internationally, 2 domestically – and
    we’ve advised dozens and dozens
  • 0:44 - 0:49
    of non-governmental Human Rights groups
    around the world. The point of this stuff
  • 0:49 - 0:54
    is to figure out how to bring the
    knowledge of the people who’ve suffered
  • 0:54 - 0:59
    human rights violations to bear,
    on demanding accountability
  • 0:59 - 1:05
    from the perpetrators. Our job is to
    figure out how we can tell the truth.
  • 1:05 - 1:09
    It is one of the moral foundations of the
    international Human Rights movement
  • 1:09 - 1:14
    that we speak Truth to Power. We
    look in the face of the powerful
  • 1:14 - 1:19
    and we tell them what we believe
    they have done that is wrong.
  • 1:19 - 1:24
    If that’s gonna work, we
    have to speak the truth.
  • 1:24 - 1:29
    We have to be right, we
    have to get the analysis right.
  • 1:29 - 1:34
    That’s not always easy and to get there,
  • 1:34 - 1:37
    there are sort of 3 themes that
    I wanna try to touch in this talk.
  • 1:37 - 1:40
    Since the talk is pretty short I’m
    really gonna touch on 2 of them, so
  • 1:40 - 1:44
    at the very end of the talk I’ll invite
    people who’d like to talk more about
  • 1:44 - 1:49
    the specifically technical aspects of this
    work, about classifiers, about clustering,
  • 1:49 - 1:54
    about statistical estimation, about
    database techniques. People who wanna talk
  • 1:54 - 1:57
    about that I’d love to gather and we’ll
    try to find a space. I’ve been fighting
  • 1:57 - 2:00
    with the Wiki for 2 days; I think
    I’m probably not the only one.
  • 2:00 - 2:05
    We can gather, we can talk about
    that stuff more in detail. So today,
  • 2:05 - 2:10
    in the next 25 minutes I’m
    going to focus specifically on
  • 2:10 - 2:15
    the trial of General
    José Efraín Ríos Montt
  • 2:15 - 2:20
    who ruled Guatemala from
    March 1982 until August 1983.
  • 2:20 - 2:25
    That’s General Ríos, there in
    the upper corner in the red tie.
  • 2:25 - 2:31
    During the government
    of General Ríos Montt
  • 2:31 - 2:36
    tens of thousands of people were killed by
    the army of Guatemala. And the question
  • 2:36 - 2:40
    that has been facing Guatemalans
    since that time is:
  • 2:40 - 2:44
    “Did the pattern of killing
    that the army committed
  • 2:44 - 2:50
    constitute acts of genocide?”. Now
    genocide is a very specific crime
  • 2:50 - 2:54
    in International Law. It does not
    mean you killed a lot of people.
  • 2:54 - 2:59
    There are other war crimes for mass
    killing. Genocide specifically means
  • 2:59 - 3:04
    that you picked out a particular group;
    and to the exclusion of other groups
  • 3:04 - 3:08
    nearby them you focused
    on eliminating that group.
  • 3:08 - 3:14
    That’s key because for a statistician
    that gives us a hypothesis we can test
  • 3:14 - 3:19
    which is: “What is the relative risk,
    what is the differential probability
  • 3:19 - 3:23
    of people in the target group being
    killed relative to their neighbours
  • 3:23 - 3:28
    who are not in the target group?”
    So without further ado,
  • 3:28 - 3:32
    let’s look at the relative risk of
    being killed for indigenous people
  • 3:32 - 3:37
    in the 3 rural counties of
    Chajul, Cotzal and Nebaj
  • 3:37 - 3:41
    relative to their
    non-indigenous neighbours.
  • 3:41 - 3:46
    We have – and I’ll talk in a moment about
    how we have this – we have information,
  • 3:46 - 3:51
    and evidence, and estimations of the
    deaths of about 2150 indigenous people.
  • 3:51 - 3:59
    People killed by the army in the period
    of the government of General Ríos.
  • 3:59 - 4:03
    The population, the total number of
    people alive who were indigenous
  • 4:03 - 4:07
    in those counties in the census
    of 1981 is about 39,000.
  • 4:07 - 4:14
    So the approximate crude mortality
    rate due to homicide by the army
  • 4:14 - 4:19
    is 5.5% for indigenous people in
    that period. Now that’s relative
  • 4:19 - 4:23
    to the homicide rate for non-indigenous
    people in the same place
  • 4:23 - 4:27
    of approximately 0.7%. So what
    we ask is: “What is the ratio
  • 4:27 - 4:31
    between those 2 numbers?” And
    the ratio between those 2 numbers
  • 4:31 - 4:36
    is the relative risk. It’s approximately
    8. We interpret that as: if you were
  • 4:36 - 4:41
    an indigenous person alive in
    one of those 3 counties in 1982,
  • 4:41 - 4:47
    your probability of being killed
    by the army was 8 times greater
  • 4:47 - 4:51
    than a person also living
    in those 3 counties
  • 4:51 - 4:56
    who was not indigenous.
    Eight times, 8 times!
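As a worked check of the arithmetic in this passage, here is a minimal Python sketch. The inputs are the rounded figures quoted in the talk (about 2150 deaths, a 1981 census population of about 39,000, and a non-indigenous rate of about 0.7%); the function and variable names are illustrative and do not come from the talk.

```python
def crude_rate(deaths, population):
    """Crude mortality rate: deaths divided by the population at risk."""
    return deaths / population

# Rounded figures quoted in the talk.
indigenous_rate = crude_rate(2150, 39_000)  # roughly 5.5%
non_indigenous_rate = 0.007                 # roughly 0.7%, quoted directly

# Relative risk: the ratio of the two rates.
relative_risk = indigenous_rate / non_indigenous_rate

print(f"indigenous rate: {indigenous_rate:.1%}")
print(f"relative risk:   {relative_risk:.1f}")
```

With these rounded inputs the ratio comes out a little under 8, which agrees with the "approximately 8" stated in the talk.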
  • 4:56 - 5:00
    To put that in relative terms: the
    probability… the relative risk of being
  • 5:00 - 5:05
    a Bosniac relative to being Serb
    in Bosnia during the war in Bosnia
  • 5:05 - 5:10
    was a little less than 3. So your
    relative risk of being indigenous
  • 5:10 - 5:13
    was more than twice, nearly 3 times,
    as much as your relative risk
  • 5:13 - 5:19
    of being Bosniac in the Bosnian War.
    It’s an astonishing level of focus.
  • 5:19 - 5:24
    It shows a tremendous planning
    and coherence, I believe.
  • 5:24 - 5:29
    So, again coming back to the statistical
    conclusion, how do we come to that?
  • 5:29 - 5:33
    How do we find that information? How do we
    make that conclusion? First, we’re only
  • 5:33 - 5:35
    looking at homicides committed by the
    army. We’re not looking at homicides
  • 5:35 - 5:39
    committed by other parties, by
    the guerrillas, by private actors.
  • 5:39 - 5:44
    We’re not looking at excess mortality,
    the mortality that we might find
  • 5:44 - 5:48
    in conflict that is in excess of
    normal peacetime mortality.
  • 5:48 - 5:51
    We’re not looking at any of that,
    only homicide. And the percentage
  • 5:51 - 5:55
    relates the number of people killed by the
    army with the population that was alive.
  • 5:55 - 5:59
    That’s crucial here. We’re looking at
    rates and we’re comparing the rate
  • 5:59 - 6:02
    of the indigenous people shown in the
    blue bar to non-indigenous people
  • 6:02 - 6:07
    shown in the green bar. The width of
    the bars shows the relative populations
  • 6:07 - 6:12
    in each of those 2 communities. So clearly
    there are many more indigenous people,
  • 6:12 - 6:15
    but a higher fraction of them are also
    killed. The bars also show something else.
  • 6:15 - 6:18
    And that’s what I’ll focus on for the
    rest of the talk. There are 2 sections
  • 6:18 - 6:22
    to each of the 2 bars, a dark section
    on the bottom, a lighter section on top.
  • 6:22 - 6:28
    And what that indicates is what we know
    in terms of being able to name people
  • 6:28 - 6:31
    with their first and last name, their
    location and dates of death, and
  • 6:31 - 6:36
    what we must infer statistically. Now I’m
    beginning to touch on the second theme
  • 6:36 - 6:41
    of my talk: Which is that when we are
    studying mass violence and war crimes,
  • 6:41 - 6:49
    we cannot do statistical or pattern
    analysis with raw information.
  • 6:49 - 6:52
    We must use the tools of mathematical
    statistics to understand
  • 6:52 - 6:56
    what we don’t know! The information
    which cannot be observed directly.
  • 6:56 - 7:01
    We have to estimate that in order to
    control for the process of the production
  • 7:01 - 7:05
    of information. Information doesn’t just
    fall out of the sky, the way it does
  • 7:05 - 7:10
    for industry. If I’m running an ISP I know
    every packet that runs through my routers.
  • 7:10 - 7:15
    That’s not how the social world works. In
    order to find information about killings
  • 7:15 - 7:18
    we have to hear about that killing from
    someone, we have to investigate,
  • 7:18 - 7:22
    we have to find the human remains.
    And if we can’t observe the killing
  • 7:22 - 7:28
    we won’t hear about it and many killings
    are hidden. In my team we have a kind of
  • 7:28 - 7:34
    catch phrase: that the world… if a lawyer
    is killed in a big city at high noon
  • 7:34 - 7:38
    the world knows about it before
    dinner time. Every single time.
  • 7:38 - 7:42
    But when a rural peasant is killed 3-days
    walk from a road in the dead of night,
  • 7:42 - 7:45
    we’re unlikely to ever hear. And
    technology is not changing this.
  • 7:45 - 7:49
    I’ll talk later about how technology is
    actually making the problem worse.
  • 7:49 - 7:53
    So, let’s get back to Guatemala
    and just conclude
  • 7:53 - 7:58
    that the little vertical bars, little
    vertical lines at the top of each bar
  • 7:58 - 8:03
    indicate the confidence interval. Which is
    similar to what lay people sometimes call
  • 8:03 - 8:07
    a margin of error. It is our level of
    uncertainty about each of those estimates
  • 8:07 - 8:11
    and you’ll notice that the uncertainty
    is much, much smaller than
  • 8:11 - 8:15
    the difference between the 2 bars. The
    uncertainty does not affect our ability
  • 8:15 - 8:18
    to draw the conclusion that there
    was a spectacular difference
  • 8:18 - 8:22
    in the mortality rates between the
    people who were the hypothesized
  • 8:22 - 8:27
    target of genocide and those who were not.
  • 8:27 - 8:31
    Now the data: first we
    had the census of 1981,
  • 8:31 - 8:35
    this was a crucial piece. I think there’s
    very interesting questions to ask
  • 8:35 - 8:40
    about why the Government of Guatemala
    conducted a census on the eve of
  • 8:40 - 8:45
    committing a genocide. There is excellent
    work done by historical demographers
  • 8:45 - 8:48
    about the use of censuses in mass
    violence. It has been common
  • 8:48 - 8:53
    throughout history. Similarly,
    or excuse me, in parallel
  • 8:53 - 8:57
    there were 4 very large
    projects. First, the CIIDH
  • 8:57 - 9:02
    – a group of non-Governmental
    Human Rights groups –
  • 9:02 - 9:07
    collected 1240 records of deaths
    in this three-county region.
  • 9:07 - 9:12
    Next, the Catholic Church collected
    a bit fewer than 800 deaths.
  • 9:12 - 9:17
    The truth commission – the Comisión
    para el Esclarecimiento Histórico (CEH) –
  • 9:17 - 9:22
    conducted a really big research
    project in the late 1990s and
  • 9:22 - 9:26
    of that we got information about a little
    bit more than a thousand deaths.
  • 9:26 - 9:30
    And then the National Program for
    Compensation is very, very large
  • 9:30 - 9:35
    and gave us about 4700
    records of deaths.
  • 9:35 - 9:41
    Now, this is interesting
    but this is not unique.
  • 9:41 - 9:46
    Many of the deaths are reported in common
    across those data sources and so…
  • 9:46 - 9:49
    we think about this in terms of a Venn
    diagram. We think of: how did these
  • 9:49 - 9:54
    different data sets intersect with each
    other or collide with each other. And
  • 9:54 - 9:59
    we can diagram that as in the sense
    of these 3 white circles intersecting.
  • 9:59 - 10:06
    But as I mentioned earlier we’re also
    interested in what we have not observed.
  • 10:06 - 10:09
    And this is crucial for us because
    when we’re thinking about
  • 10:09 - 10:13
    how much information we have, we have to
    distinguish between the world on the left,
  • 10:13 - 10:17
    in which our intersecting circles
    cover about a third of the reality,
  • 10:17 - 10:22
    versus the world on the right where our
    intersecting circles cover all of reality.
  • 10:22 - 10:26
    These are very different worlds; and the
    reason they’re so different is not simply
  • 10:26 - 10:30
    because we want to know the magnitude,
    not simply because we want to know
  • 10:30 - 10:34
    the total number of killings. That’s
    important – but even more important:
  • 10:34 - 10:40
    we have to know that we’ve covered,
    we’ve estimated in equal proportions
  • 10:40 - 10:44
    the two parties. We have to estimate in
    equal proportions the number of deaths
  • 10:44 - 10:48
    of non-indigenous people and the
    number of deaths of indigenous people.
  • 10:48 - 10:52
    Because if we don’t get those
    estimates correct our comparison
  • 10:52 - 10:56
    of their mortality rates will be biased.
    Our story will be wrong. We will fail
  • 10:56 - 11:02
    to speak Truth to Power. We can’t have
    that. So what do we do? Algebra!
  • 11:02 - 11:06
    Algebra is our friend. So I’m gonna
    give you just a tiny taste of how we
  • 11:06 - 11:10
    solve this problem and I’m going to
    introduce a series of assumptions.
  • 11:10 - 11:13
    Those of you who would like to debate
    those assumptions: I invite you to join me
  • 11:13 - 11:18
    after the talk and we will talk endlessly
    and tediously about capture heterogeneity.
  • 11:18 - 11:22
    But in the short term,
  • 11:22 - 11:28
    we have a universe N of total killings in
    a specific time/space/ethnicity/location.
  • 11:28 - 11:31
    And of that we have 2 projects A and B.
  • 11:31 - 11:35
    A captures some number of
    deaths from the universe N,
  • 11:35 - 11:40
    and the probability with which a death is
    captured by project A from the universe N
  • 11:40 - 11:45
    is by elementary probability theory the
    number of deaths documented by A
  • 11:45 - 11:49
    divided by the unknown number
    of deaths in the population N.
  • 11:49 - 11:53
    Similarly, the probability with which a
    death from N is documented by project B
  • 11:53 - 11:58
    is B over N, and this is the cool part:
    the probability with which a death
  • 11:58 - 12:02
    is documented by both A and B is M over N.
  • 12:02 - 12:06
    Now we can put the 2 databases together,
    we can compare them. Let’s talk about
  • 12:06 - 12:09
    the use of random forest classifiers
    and clustering to do that later.
  • 12:09 - 12:12
    But we can put the 2 databases together,
    compare them, determine the deaths
  • 12:12 - 12:17
    that are in M – that is, in both
    A and B – and divide M by N.
  • 12:17 - 12:23
    But, also by probability theory, the
    probability that a death occurs in M
  • 12:23 - 12:28
    is equal to the product of
    the individual probabilities.
  • 12:28 - 12:32
    The probability of any compound event, an
    event made up of two independent events, is
  • 12:32 - 12:36
    equal to the product of the probabilities of
    those two events, so M over N is equal to
  • 12:36 - 12:41
    A over N times B over N. Solve for N.
  • 12:41 - 12:45
    Multiply it through by N squared, divide
    by M, and we have an estimate of N
  • 12:45 - 12:49
    which is equal to AB over M. Now, the
    lights in my eyes, I can’t see, but I saw
  • 12:49 - 12:53
    a few light bulbs go off over people’s
    heads. And when I showed this proof
  • 12:53 - 12:57
    to the judge in the trial of General Ríos
  • 12:57 - 13:02
    I saw a light bulb go on over her head.
  • 13:02 - 13:04
    It’s a beautiful thing,
    it’s a beautiful thing.
  • 13:04 - 13:10
    applause
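The two-system algebra above (M/N = A/N times B/N, so N = AB/M) can be sketched in a few lines of Python. This is only an illustration under the same independence assumption the talk states; the record identifiers are hypothetical, not real data, and the function name is invented for this example.

```python
def two_system_estimate(list_a, list_b):
    """Estimate the total N from two lists, assuming independent capture.

    From M/N = (A/N) * (B/N) it follows that N = A * B / M.
    """
    set_a, set_b = set(list_a), set(list_b)
    a = len(set_a)
    b = len(set_b)
    m = len(set_a & set_b)  # deaths documented by both projects
    if m == 0:
        raise ValueError("no overlap between the lists; N cannot be estimated")
    return a * b / m

# Hypothetical victim identifiers, not real data.
project_a = range(0, 600)    # 600 records
project_b = range(480, 880)  # 400 records, 120 of them also in A

estimate = two_system_estimate(project_a, project_b)
print(estimate)
```

In this made-up case the two lists together name 880 distinct records, while the estimate of the total is 2000: the smaller the overlap relative to the list sizes, the larger the estimated universe.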
  • 13:10 - 13:13
    So we don’t do it in 2 systems because
    that takes a lot of assumptions.
  • 13:13 - 13:16
    We do it in 4. You will recall that we
    have 4 data sources. We organize
  • 13:16 - 13:22
    the data sources in this format
    such that we have an inclusion
  • 13:22 - 13:26
    and an exclusion pattern in the table on
    the left, which… for which we can define
  • 13:26 - 13:30
    the number of deaths which fall into
    each of these intersecting patterns.
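A rough sketch of how such an inclusion and exclusion table could be tabulated is shown below. The source labels follow the four projects named in the talk, but the record identifiers are hypothetical and the function is an assumption for illustration; it is not the team's actual code, and it presumes records have already been matched across sources.

```python
from collections import Counter

# Hypothetical matched record identifiers per source; not real data.
sources = {
    "CIIDH": {1, 2, 3, 4, 5},
    "Church": {2, 3, 6},
    "CEH": {3, 4, 6, 7},
    "Compensation": {1, 3, 5, 7, 8, 9},
}

def capture_patterns(sources):
    """Count deaths by inclusion/exclusion pattern across the sources.

    Each pattern is a tuple of 0/1 flags, one per source, in dict order.
    """
    names = list(sources)
    all_ids = set().union(*sources.values())
    return Counter(
        tuple(int(record in sources[name]) for name in names)
        for record in all_ids
    )

patterns = capture_patterns(sources)
for pattern, count in sorted(patterns.items(), reverse=True):
    print(pattern, count)
```

The all-zeros pattern never appears in such a table, because a death recorded by no source is by definition unobserved; that missing cell is what the estimation is trying to fill in.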
  • 13:30 - 13:34
    And I’ll give you a very quick
    metaphor here. The metaphor is:
  • 13:34 - 13:38
    imagine that you have 2 dark rooms and you
    want to assess the size of those 2 rooms
  • 13:38 - 13:42
    – which room is larger? And the only
    tool that you have to assess the size
  • 13:42 - 13:46
    of those rooms is a handful of little
    rubber balls. The little rubber balls
  • 13:46 - 13:50
    have a property that when they hit each
    other they make a sound. [makes CLICK sound]
  • 13:50 - 13:53
    So we throw the balls into the first
    room and we listen, and we hear
  • 13:53 - 13:57
    [makes several CLICK sounds]. We
    collect the balls, go to the second room,
  • 13:57 - 14:00
    throw them with equal force – imagining
    a spherical cow of uniform density!
  • 14:00 - 14:04
    We throw the balls into the second
    room with equal force and we hear
  • 14:04 - 14:08
    [makes one CLICK sound].
    So which room is larger?
  • 14:08 - 14:12
    The second room, because we hear fewer
    collisions, right? Well, the estimation,
  • 14:12 - 14:16
    the toy example I gave in the previous
    slide is the mathematical formalization
  • 14:16 - 14:20
    of the intuition that fewer
    collisions mean a larger space.
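The dark-room intuition can be tried out with a small simulation. This is a toy sketch under simplifying assumptions that are not from the talk: a room is modeled as a number of equally likely cells, a "click" is a pair of balls landing in the same cell, and all names and parameter values are hypothetical.

```python
import random

def count_collisions(room_size, balls, rng):
    """Throw `balls` into `room_size` cells; count pairs sharing a cell."""
    seen = {}
    collisions = 0
    for _ in range(balls):
        cell = rng.randrange(room_size)
        collisions += seen.get(cell, 0)  # one click per ball already there
        seen[cell] = seen.get(cell, 0) + 1
    return collisions

def mean_collisions(room_size, balls=20, trials=2000, seed=0):
    """Average number of collisions over many repeated throws."""
    rng = random.Random(seed)
    total = sum(count_collisions(room_size, balls, rng) for _ in range(trials))
    return total / trials

small_room = mean_collisions(room_size=50)
large_room = mean_collisions(room_size=500)
print(small_room, large_room)
```

Averaged over many throws, the room with ten times as many cells produces far fewer collisions, which is the relationship the estimator turns around to infer size from overlap.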
  • 14:20 - 14:23
    And so what we’re doing here is
    laying out the pattern of collisions.
  • 14:23 - 14:27
    Not just the collisions, the pairwise
    collisions, but the three-way and
  • 14:27 - 14:31
    four-way collisions. And that
    allows us to make the estimate
  • 14:31 - 14:37
    that was shown in the bar graph of
    the light part of each of the bars. So
  • 14:37 - 14:41
    we can come back to our conclusion and put
    a confidence interval on the estimates.
  • 14:41 - 14:46
    And the confidence intervals are shown
    there. Now I’m gonna move through this
  • 14:46 - 14:51
    somewhat more quickly to get to the end of
    the talk but I wanna put up one more slide
  • 14:51 - 14:56
    that was used in the testimony
    and that is that we divided time
  • 14:56 - 15:01
    into 16-month periods and
    compared the 16-month period of
  • 15:01 - 15:05
    General Ríos’s governance – now it’s only
    16 months ’cause we went April to July,
  • 15:05 - 15:08
    because it’s only a few days in August, a
    few days in March, so we shaved those off,
  • 15:08 - 15:12
    okay… – 16-month period of General
    Ríos’s Government and compared it
  • 15:12 - 15:17
    to several periods before and after. And
    I think that the key observation here
  • 15:17 - 15:22
    is that the rate of killing
    against indigenous people
  • 15:22 - 15:27
    is substantially higher under General
    Ríos’s Government than under previous
  • 15:27 - 15:33
    or succeeding governments. But more
    importantly the ratio between the two,
  • 15:33 - 15:38
    the relative risk of being killed as an
    indigenous person, was at its peak
  • 15:38 - 15:43
    during the government of General Ríos.
  • 15:43 - 15:47
    Have we proven genocide? No.
  • 15:47 - 15:50
    This is evidence consistent with the
    hypothesis that acts of genocide
  • 15:50 - 15:54
    were committed. The finding of genocide
    is a legal finding, not so much
  • 15:54 - 15:59
    a scientific one. So as scientists,
    our job is to provide evidence that
  • 15:59 - 16:03
    the finders of fact – the judges in this
    case – can use in their determination.
  • 16:03 - 16:05
    This is evidence consistent
    with that hypothesis.
  • 16:05 - 16:08
    Were this evidence otherwise, as
    scientists we would say we would
  • 16:08 - 16:11
    reject the hypothesis that genocide was
    committed. However, with this evidence
  • 16:11 - 16:15
    we find that the evidence,
    the data is consistent with
  • 16:15 - 16:18
    the prosecution’s hypothesis.
  • 16:18 - 16:25
    So, it worked!
  • 16:25 - 16:29
    Ríos Montt was convicted on
    genocide charges. applause
  • 16:29 - 16:31
    You can clap!
    applause
  • 16:31 - 16:36
    applause
  • 16:36 - 16:39
    For a week!
    mumbled, surprised laughter
  • 16:39 - 16:42
    Then the Constitutional Court intervened,
  • 16:42 - 16:45
    there are, I know, a couple of experts on
    Guatemala here in the audience
  • 16:45 - 16:48
    who can tell you more about why that
    happened and exactly what happened.
  • 16:48 - 16:53
    However, the Constitutional
    Court ordered a new trial,
  • 16:53 - 16:59
    which is at this time scheduled
    for the very beginning of 2015.
  • 16:59 - 17:03
    And I look forward to testifying again,
  • 17:03 - 17:07
    and again, and again, and again!
  • 17:07 - 17:13
    applause
  • 17:13 - 17:17
    Look, but I wanna come back to this point.
    Because as a bunch of technologists…
  • 17:17 - 17:22
    – there are a lot of folks who really like
    technology here, I really like it too!
  • 17:22 - 17:26
    Technology doesn’t get us to science
    – you have to have science
  • 17:26 - 17:29
    to get you to science. Technology helps
    you organize the data. It helps you do
  • 17:29 - 17:32
    all kinds of extremely great and cool
    things without which we wouldn’t be able
  • 17:32 - 17:36
    to even do the science. But you
    can’t have just technology!
  • 17:36 - 17:41
    You can’t just have a bunch of data
    and make conclusions. That’s naive,
  • 17:41 - 17:45
    and you will get the wrong conclusions.
    ‘The point of rigorous statistics is
  • 17:45 - 17:48
    to be right’, and there is a little bit of
    a caveat there – or to at least know
  • 17:48 - 17:52
    how uncertain you are. Statistics is often
    called the ‘Science of Uncertainty’.
  • 17:52 - 17:56
    That is actually my favorite
    definition of it. So,
  • 17:56 - 18:02
    I’m going to assume that we
    care about getting it right.
  • 18:02 - 18:05
    No one laughed, that’s good.
  • 18:05 - 18:09
    Not everyone does, to my distress.
  • 18:09 - 18:11
    So if you only have some of the data
  • 18:11 - 18:15
    – and I will argue that we always
    only have some of the data –
  • 18:15 - 18:20
    you need some kind of model that will tell
    you the relationship between your data
  • 18:20 - 18:24
    and the real world.
    Statisticians call that an inference.
  • 18:24 - 18:26
    In order to get from here to there
    you’re gonna need some kind of
  • 18:26 - 18:30
    probability model that tells you
    why your data is like the world,
  • 18:30 - 18:34
    or in what sense you have to tweak,
    twiddle and do algebra with your data
  • 18:34 - 18:39
    to get from what you can
    observe to what is actually true.
  • 18:39 - 18:43
    And statistics is about comparisons.
    Yeah, we get a big number and
  • 18:43 - 18:46
    journalists love the big number; but
    it’s really about these relationships
  • 18:46 - 18:51
    and patterns! So to get those
    relationships and patterns,
  • 18:51 - 18:54
    in order for them to be right, in order
    for our answer to be correct,
  • 18:54 - 18:57
    every one of the estimates we make
    for every point in the pattern
  • 18:57 - 19:02
    has to be right. It’s a hard
    problem. It’s a hard problem.
  • 19:02 - 19:05
    And what I worry about is that
    we have come into this world
  • 19:05 - 19:09
    in which people throw the notion of Big
    Data around as though the data allows us
  • 19:09 - 19:14
    to make an end-run around problems
    of sampling and modeling. It doesn’t.
  • 19:14 - 19:19
    So as technologists, the reason I’m,
    you know, ranting at you guys about it
  • 19:19 - 19:25
    is that it’s very tempting to have a lot
    of data and think you have an answer!
  • 19:25 - 19:31
    And it’s even more tempting because
    in an industry context you might be right.
  • 19:31 - 19:37
    Not so much in Human Rights, not so
    much. Violence is a hidden process.
  • 19:37 - 19:40
    The people who commit violence have
    an enormous commitment to hiding it,
  • 19:40 - 19:44
    distorting it, explaining it in different
    ways. All of those things dramatically
  • 19:44 - 19:48
    affect the information that is produced
    from the violence that we’re going to use
  • 19:48 - 19:54
    to do our analysis. So we usually
    don’t know what we don’t know
  • 19:54 - 19:58
    in Human Rights data collection.
    And that means that we don’t know
  • 19:58 - 20:04
    if what we don’t know is systematically
    different from what we do know.
  • 20:04 - 20:06
    Maybe we know about all the lawyers
    and we don’t know about the people
  • 20:06 - 20:10
    in the countryside. Maybe we know
    about all the indigenous people
  • 20:10 - 20:14
    and not the non-indigenous people.
    If that were true, the argument
  • 20:14 - 20:18
    that I just made would be merely
    an artifact of the reporting process
  • 20:18 - 20:22
    rather than some true analysis. Now
    we did the estimations, which is why I believe
  • 20:22 - 20:25
    we can reject that critique, but that’s
    what we have to worry about.
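To make that worry concrete, here is a minimal sketch of how unequal reporting alone can manufacture a pattern. Every number is hypothetical: two groups are given the same true death rate, and only the share of deaths that reaches the raw data (the assumed coverage) differs.

```python
def relative_risk(deaths_a, pop_a, deaths_b, pop_b):
    """Ratio of the death rate in group A to the death rate in group B."""
    return (deaths_a / pop_a) / (deaths_b / pop_b)

# Hypothetical scenario: both groups have the same true death rate.
pop_a = pop_b = 10_000
true_deaths_a = true_deaths_b = 200

# Assumed coverage: the share of true deaths that reach the raw data.
coverage_a, coverage_b = 0.9, 0.3
observed_a = true_deaths_a * coverage_a
observed_b = true_deaths_b * coverage_b

true_rr = relative_risk(true_deaths_a, pop_a, true_deaths_b, pop_b)
raw_rr = relative_risk(observed_a, pop_a, observed_b, pop_b)
print(true_rr, raw_rr)
```

In this invented case the true relative risk is 1, yet the raw counts suggest a ratio of about 3, purely as an artifact of who got reported. Estimating the unobserved deaths in each group separately is what guards against that.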
  • 20:25 - 20:29
    And let’s go back to the Venn diagram
    and say: which of these is accurate?
  • 20:29 - 20:33
    It’s not just for one of the
    points in our pattern analysis.
  • 20:33 - 20:36
    The problem is that we’re
    going to compare things.
  • 20:36 - 20:41
    As in Peru where we compared killings
    committed by the Peruvian army against
  • 20:41 - 20:45
    killings committed by the Maoist guerrillas
    of the Sendero Luminoso. And we found
  • 20:45 - 20:51
    there that in fact we knew very little
    about what the Sendero Luminoso had done.
  • 20:51 - 20:56
    Whereas we knew almost everything
    about what the Peruvian army had done.
  • 20:56 - 20:58
    This is called the coverage rate.
    The ratio between what we know and
  • 20:58 - 21:03
    what we don’t know. And
    raw data, however big,
  • 21:03 - 21:08
    does not get us to patterns.
    And here is a bunch of…
  • 21:08 - 21:12
    kinds of raw data that I’ve used
    and that I really enjoy using.
  • 21:12 - 21:14
    You know – truth commission testimonies,
    UN investigations, press articles,
  • 21:14 - 21:18
    SMS messages, crowdsourcing, NGO
    documentation, social media feeds,
  • 21:18 - 21:21
    perpetrator records, government archives,
    state agency registries – I know those
  • 21:21 - 21:24
    sound all the same but they actually
    turn out to be slightly different.
  • 21:24 - 21:28
    Happy to talk in tedious detail! Refugee
    Camp records, any non-random sample.
  • 21:28 - 21:32
    All of those are gonna take
    some kind of probability model
  • 21:32 - 21:36
    and we don’t have that many
    probability models to use. So
  • 21:36 - 21:40
    raw data is great for cases – but
    it doesn’t get you to patterns.
  • 21:40 - 21:45
    And patterns – again – patterns are
    the thing that allow us to do analysis.
  • 21:45 - 21:49
    They are the thing… the patterns are what
    get us to something that we can use
  • 21:49 - 21:54
    to help prosecutors, advocates and the…
  • 21:54 - 21:56
    and the victims themselves.
  • 21:56 - 22:01
    I gave a version of this talk, a
    much earlier version of this talk
  • 22:01 - 22:05
    several years ago in Medellín, Colombia.
    I’ve worked a lot in Colombia,
  • 22:05 - 22:08
    it’s really… it’s a great place to
    work. There’s really terrific
  • 22:08 - 22:14
    Victims Rights groups there.
    And a woman from a township,
  • 22:14 - 22:17
    smaller than a county, near to Medellín
    came up to me after the talk and she said:
  • 22:17 - 22:21
    “You know, a lot of people… you
    know I’m a Human Rights activist,
  • 22:21 - 22:25
    my job is to collect data, I tell stories
    about people who have suffered.
  • 22:25 - 22:28
    But there are people in my
    village I know who have had
  • 22:28 - 22:33
    people in their families disappeared and
    they’re never gonna talk about it, ever.
  • 22:33 - 22:38
    We’re never going to be able to use
    their names, because they are afraid.”
  • 22:38 - 22:45
    We can’t name the victims. At
    least we’d better count them.
  • 22:45 - 22:50
    So about that counting: there’s
    3 ways to do it right. You can have
  • 22:50 - 22:54
    a perfect census – you can have all the
    data. Yeah it’s nice, good luck with that.
  • 22:54 - 22:59
    You can have a random sample
    of the population - that’s hard!
  • 22:59 - 23:03
    Sometimes doable but very hard.
    In my experience we rarely interview
  • 23:03 - 23:07
    victims of homicide, very rarely.
    Laughing
  • 23:07 - 23:10
    And that means there’s a complicated
    probability relationship between
  • 23:10 - 23:14
    the person you sampled, the interview
    and the death that they talk to you about.
  • 23:14 - 23:17
    Or you can do some kind of posterior
    modeling of the sampling process which is…
  • 23:17 - 23:21
    which is in essence what
    I proposed in the earlier slide.
  • 23:21 - 23:25
    So what can we do with raw data,
    guys? We can collect a bunch of…
  • 23:25 - 23:29
    We can say that a case exists. Ok
    – that’s actually important! We can say:
  • 23:29 - 23:34
    “Something happened” with raw data. We can
    say: “We know something about that case”.
  • 23:34 - 23:38
    We can say: “There were 100 victims
    in that case or at least 100 victims
  • 23:38 - 23:42
    in that case”, if we can name 100 people.
  • 23:42 - 23:46
    But we can’t do comparisons: “This
    is the biggest massacre this year”.
  • 23:46 - 23:48
    We don’t really know. Because we
    don’t know about the massacres
  • 23:48 - 23:54
    we don’t know about. No patterns. Don’t
    talk about the hot spot of violence.
  • 23:54 - 23:59
    No, we don’t know that. Happy to talk
    more about that if we gather after,
  • 23:59 - 24:06
    but I wanna come to a close here with
    the importance of getting it right.
  • 24:06 - 24:11
    I’ve talked about one case today. This
    is another case, the case of this man:
  • 24:11 - 24:16
    Edgar Fernando García. Mr. García was
    a student Labor leader in Guatemala
  • 24:16 - 24:20
    early in the 1980s. He left
    his office in February 1984
  • 24:20 - 24:24
    – did not come home. People reported
    later that they saw someone
  • 24:24 - 24:29
    shoving Mr. García into a
    vehicle and driving away.
  • 24:29 - 24:34
    His widow became a very important
    Human Rights activist in Guatemala
  • 24:34 - 24:39
    and now she’s a very important, and
    in my opinion impressive politician.
  • 24:39 - 24:42
    And there’s her infant daughter. She
    continued to struggle to find out
  • 24:42 - 24:46
    what had happened to
    Mr. García for decades.
  • 24:46 - 24:50
    And in 2006 documents came to light
    in the National Archives of the…
  • 24:50 - 24:54
    excuse me, the Historical Archives
    of the national Police, showing that
  • 24:54 - 24:59
    the Police had carried out an operation
    in the area of Mr. García’s office
  • 24:59 - 25:02
    and it was very likely that
    they had disappeared him.
  • 25:02 - 25:07
    These 2 guys up here in the upper
    right were Police officers in that area;
  • 25:07 - 25:11
    they were arrested, charged with the
    disappearance of Mister García and
  • 25:11 - 25:16
    convicted. Part of the evidence used to
    convict them was communications metadata
  • 25:16 - 25:20
    showing that documents
    flowed through the archive.
  • 25:20 - 25:24
    I mean paper communications! We coded
    it by hand. We went through and read
  • 25:24 - 25:28
    the ‘From’ and ‘To’ lines
    from every Memo. And
  • 25:28 - 25:34
    they were convicted in 2010
    and after that conviction
  • 25:34 - 25:39
    Mr. García’s infant daughter – now
    a grown woman – was clearly joyful.
  • 25:39 - 25:43
    Justice brings closure to a family
    that never knows when to start talking
  • 25:43 - 25:48
    about someone in the past tense.
    Perhaps even more powerfully:
  • 25:48 - 25:52
    those guys’ grand boss, their boss's
    boss, Colonel Héctor Bol de la Cruz,
  • 25:52 - 25:58
    this man here, was convicted
    of Mr. García’s disappearance
  • 25:58 - 26:02
    in September this year [2013].
    applause
  • 26:02 - 26:08
    applause
  • 26:08 - 26:11
    I don’t know if any of you have
    ever been dissident students,
  • 26:11 - 26:15
    but if you’ve been dissident students
    demonstrating in the street think about
  • 26:15 - 26:19
    how you would feel if your friends
    and comrades were disappeared,
  • 26:19 - 26:23
    and take a long look at Colonel Bol
    de la Cruz. Here is the rest of the stuff
  • 26:23 - 26:26
    that we will talk about if we gather
    afterwards. Thank you very much
  • 26:26 - 26:29
    for your attention. I really
    have enjoyed CCC.
  • 26:29 - 26:36
    applause
  • 26:36 - 26:47
    Subtitles created by c3subtitles.de
    in the year 2016. Join and help us!
