Whitney Merrill: Predicting Crime in a Big Data World

  • 0:00 - 0:10
    32c3 preroll music
  • 0:10 - 0:14
    Angel: I introduce Whitney Merrill.
    She is an attorney in the US
  • 0:14 - 0:17
    and she just recently, actually
    last week, graduated
  • 0:17 - 0:21
    with her CS master's in Illinois.
  • 0:21 - 0:27
    applause
  • 0:27 - 0:30
    Angel: Without further ado:
    ‘Predicting Crime In A Big Data World’.
  • 0:30 - 0:33
    cautious applause
  • 0:33 - 0:37
    Whitney Merrill: Hi everyone.
    Thank you so much for coming.
  • 0:37 - 0:41
    I know it's been an exhausting Congress,
    so I appreciate you guys coming
  • 0:41 - 0:45
    to hear me talk about Big
    Data and Crime Prediction.
  • 0:45 - 0:49
    This is kind of a hobby of mine, I,
  • 0:49 - 0:53
    in my last semester at Illinois,
    decided to poke around
  • 0:53 - 0:57
    what´s currently happening, how these
    algorithms are being used and kind of
  • 0:57 - 1:00
    figure out what kind of information can be
    gathered. So, I have about 30 minutes
  • 1:00 - 1:05
    with you guys. I´m gonna do a broad
    overview of the types of programs.
  • 1:05 - 1:10
    I´m gonna talk about what Predictive
    Policing is, the data used,
  • 1:10 - 1:14
    similar systems in other areas
    where predictive algorithms are
  • 1:14 - 1:19
    trying to better society,
    current uses in policing.
  • 1:19 - 1:22
    I´m gonna talk a little bit about their
    effectiveness and then give you
  • 1:22 - 1:26
    some final thoughts. So, imagine,
  • 1:26 - 1:30
    in the very near future a Police
    officer is walking down the street
  • 1:30 - 1:34
    wearing a camera on her collar.
    In her ear is a feed of information
  • 1:34 - 1:39
    about the people and cars she passes
    alerting her to individuals and cars
  • 1:39 - 1:43
    that might fit a particular crime
    or profile for a criminal.
  • 1:43 - 1:48
    Early in the day she examined a
    map highlighting hotspots for crime.
  • 1:48 - 1:52
    In the area she´s been set to patrol
    the predictive policing software
  • 1:52 - 1:58
    indicates that there is an 82%
    chance of burglary at 2 pm,
  • 1:58 - 2:02
    and it´s currently 2:10 pm.
    As she passes one individual
  • 2:02 - 2:06
    her camera captures the
    individual´s face, runs it through
  • 2:06 - 2:10
    a coordinated Police database - all of the
    Police departments that use this database
  • 2:10 - 2:15
    are sharing information. Facial
    recognition software indicates that
  • 2:15 - 2:20
    the person is Bobby Burglar who was
    previously convicted of burglary,
  • 2:20 - 2:25
    was recently released and is now currently
    on parole. The voice in her ear whispers:
  • 2:25 - 2:30
    50 percent likely to commit a crime.
    Can she stop and search him?
  • 2:30 - 2:33
    Should she chat him up?
    Should she see how he acts?
  • 2:33 - 2:37
    Does she need additional information
    to stop and detain him?
  • 2:37 - 2:41
    And does it matter that he´s
    carrying a large duffle bag?
  • 2:41 - 2:46
    Did the algorithm take this into account
    or did it just look at his face?
  • 2:46 - 2:50
    What information was being
    collected at the time the algorithm
  • 2:50 - 2:55
    chose to say 50% to provide
    the final analysis?
  • 2:55 - 2:58
    So, another thought I´m gonna
    have you guys think about as I go
  • 2:58 - 3:02
    through this presentation, is this
    quote that is more favorable
  • 3:02 - 3:06
    towards Police algorithms, which is:
    “As people become data plots
  • 3:06 - 3:10
    and probability scores, law enforcement
    officials and politicians alike
  • 3:10 - 3:17
    can point and say: ‘Technology is void of
    the racist, profiling bias of humans.’”
  • 3:17 - 3:21
    Is that true? Well, they probably
    will point and say that,
  • 3:21 - 3:25
    but is it actually void of
    the racist, profiling bias of humans?
  • 3:25 - 3:28
    And I´m gonna talk about that as well.
  • 3:28 - 3:33
    So, Predictive Policing explained.
    Who and what?
  • 3:33 - 3:36
    First of all, Predictive Policing
    actually isn´t new. All we´re doing
  • 3:36 - 3:41
    is adding technology, doing better,
    faster aggregation of data.
  • 3:41 - 3:47
    Analysts in Police departments have been
    doing this by hand for decades.
  • 3:47 - 3:51
    These techniques are used to create
    profiles that accurately match
  • 3:51 - 3:56
    likely offenders with specific past
    crimes. So, there´s individual targeting
  • 3:56 - 3:59
    and then we have location-based
    targeting. For the location-based systems,
  • 3:59 - 4:05
    the goal is to help Police
    forces deploy their resources
  • 4:05 - 4:10
    in a correct manner, in an efficient
    manner. They can be as simple
  • 4:10 - 4:14
    as recommending that general crime
    may happen in a particular area,
  • 4:14 - 4:19
    or specifically, what type of crime will
    happen in a one-block-radius.
  • 4:19 - 4:24
    They take into account the time
    of day, the recent data collected
  • 4:24 - 4:30
    and when in the year it´s happening
    as well as weather etc.
  • 4:30 - 4:34
    So, another really quick thing worth
    going over, 'cause not everyone
  • 4:34 - 4:39
    is familiar with machine learning.
    This is a very basic breakdown
  • 4:39 - 4:43
    of training an algorithm on a data set.
  • 4:43 - 4:46
    You collect it from many different
    sources, you put it all together,
  • 4:46 - 4:51
    you clean it up, you split it into 3 sets:
    a training set, a validation set
  • 4:51 - 4:56
    and a test set. The training set is
    what is going to develop the rules
  • 4:56 - 5:01
    with which it's going to kind of
    determine the final outcome.
  • 5:01 - 5:05
    You´re gonna use a validation
    set to optimize it and finally
  • 5:05 - 5:10
    apply this to establish
    a confidence level.
  • 5:10 - 5:15
    There you´ll set a support level where
    you say you need a certain amount of data
  • 5:15 - 5:20
    to determine whether or not the
    algorithm has enough information
  • 5:20 - 5:24
    to kind of make a prediction.
    So, rules with a low support level
  • 5:24 - 5:29
    are less likely to be statistically
    significant and the confidence level
  • 5:29 - 5:34
    in the end is basically if there´s
    an 85% confidence level
  • 5:34 - 5:40
    that means there's an 85% chance that a
    suspect meeting the rule in question
  • 5:40 - 5:45
    is engaged in criminal conduct.
    So, what does this mean? Well,
  • 5:45 - 5:50
    it encourages collection and hoarding
    of data about crimes and individuals.
  • 5:50 - 5:53
    Because you want as much information
    as possible so that you detect
  • 5:53 - 5:56
    even the less likely scenarios.
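
To make the pipeline she sketches here concrete, below is a minimal Python illustration of splitting data into training, validation and test sets and then scoring a single rule by its support and confidence. The records, the rule and all of the numbers are invented for illustration; actual predictive-policing vendors do not publish their models, so this is only a sketch of the general idea she describes.

```python
import random

random.seed(0)

# Toy incident records: a couple of attributes plus whether a burglary
# was later confirmed. Everything here is synthetic.
records = [{"hour": random.randint(0, 23),
            "prior_arrests": random.randint(0, 5),
            "burglary": random.random() < 0.1}
           for _ in range(1000)]

# Split into training / validation / test sets (60 / 20 / 20).
random.shuffle(records)
n = len(records)
train = records[:int(0.6 * n)]
valid = records[int(0.6 * n):int(0.8 * n)]
test = records[int(0.8 * n):]

def rule(r):
    # A hypothetical learned rule: "late at night and at least one prior arrest".
    return r["hour"] >= 22 and r["prior_arrests"] >= 1

def support(data, rule):
    # Fraction of records the rule fires on; too low and the rule is
    # unlikely to be statistically significant.
    return sum(rule(r) for r in data) / len(data)

def confidence(data, rule):
    # Among the records the rule fires on, the fraction that really were
    # burglaries -- the "85% chance" style figure from the talk.
    hits = [r for r in data if rule(r)]
    return sum(r["burglary"] for r in hits) / len(hits) if hits else 0.0

# In practice rules are mined on the training set, thresholds tuned on the
# validation set, and the final numbers reported on the held-out test set.
print("support (train):      %.3f" % support(train, rule))
print("support (validation): %.3f" % support(valid, rule))
print("confidence (test):    %.3f" % confidence(test, rule))
```

Rules whose support falls below the chosen support level would be discarded as statistically insignificant; the confidence value is the "85% chance" figure she refers to.
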
  • 5:56 - 6:00
    Information sharing is also
    encouraged because it´s easier,
  • 6:00 - 6:04
    it´s done by third parties, or even
    what are called fourth parties
  • 6:04 - 6:08
    and shared amongst departments.
    And here, again, your criminal data
  • 6:08 - 6:11
    was being analyzed by analysts in Police
    departments for decades, but
  • 6:11 - 6:14
    the information sharing and the amount
    of information they could aggregate
  • 6:14 - 6:17
    was just significantly more difficult. So,
  • 6:17 - 6:21
    what are these Predictive Policing
    algorithms and software…
  • 6:21 - 6:25
    what are they doing? Are they
    determining guilt and innocence?
  • 6:25 - 6:29
    And, unlike a thoughtcrime, they
    are not saying this person is guilty,
  • 6:29 - 6:33
    this person is innocent. It´s creating
    a probability of whether or not
  • 6:33 - 6:38
    the person has likely committed
    a crime or will likely commit a crime.
  • 6:38 - 6:41
    And it can only say something
    about the future and the past.
  • 6:41 - 6:46
    This here is a picture from
    one particular piece of software
  • 6:46 - 6:50
    provided by HunchLab; and patterns
    emerge here from past crimes
  • 6:50 - 6:58
    that can profile criminal types and
    associations, detect crime patterns etc.
  • 6:58 - 7:02
    Generally, in these types of algorithms
    they are using unsupervised data,
  • 7:02 - 7:05
    that means someone is not going through
    and saying true-false, good-bad, good-bad.
  • 7:05 - 7:11
    There´s just 1) too much information and
    2) they´re trying to do clustering,
  • 7:11 - 7:15
    determine the things that are similar.
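
As a concrete illustration of the unsupervised, clustering-style approach she mentions, here is a small Python sketch that groups synthetic incident coordinates with k-means and reports the cluster centres as candidate hotspots. The coordinates are made up and k-means is only a stand-in; the vendors' actual methods are proprietary and almost certainly more elaborate.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)

# Synthetic incident locations scattered around three invented hotspots.
hotspots = np.array([[41.88, -87.63], [41.75, -87.60], [41.97, -87.70]])
incidents = np.vstack([c + rng.normal(scale=0.01, size=(200, 2)) for c in hotspots])

# Cluster the incidents; no labels are involved, only similarity of location.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(incidents)

for centre, count in zip(model.cluster_centers_, np.bincount(model.labels_)):
    print(f"hotspot near ({centre[0]:.3f}, {centre[1]:.3f}) "
          f"with {count} past incidents")
```
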
  • 7:15 - 7:20
    So, really quickly, I´m also gonna
    talk about the data that´s used.
  • 7:20 - 7:23
    There are several different types:
    Personal characteristics,
  • 7:23 - 7:28
    demographic information, activities
    of individuals, scientific data etc.
  • 7:28 - 7:33
    This comes from all sorts of sources,
    one that really shocked me,
  • 7:33 - 7:37
    and I'll talk about it a little bit more
    later, is that the radiation detectors
  • 7:37 - 7:41
    on New York City Police officers are
    constantly taking in data
  • 7:41 - 7:45
    and it´s so sensitive, it can detect if
    you´ve had a recent medical treatment
  • 7:45 - 7:49
    that involves radiation. Facial
    recognition and biometrics
  • 7:49 - 7:53
    are clear here and the third-party
    doctrine – which basically says
  • 7:53 - 7:57
    in the United States that you have no
    reasonable expectation of privacy in data
  • 7:57 - 8:01
    you share with third parties –
    facilitates easy collection
  • 8:01 - 8:06
    for Police officers and Government
    officials because they can go
  • 8:06 - 8:11
    and ask for the information
    without any sort of warrant.
  • 8:11 - 8:16
    For a really great overview: a friend of
    mine, Dia, did a talk here at CCC
  • 8:16 - 8:21
    on “The architecture of a street level
    panopticon”. She does a really great overview
  • 8:21 - 8:25
    of how this type of data is collected
    on the streets. Worth checking out
  • 8:25 - 8:29
    ´cause I´m gonna gloss over
    kind of the types of data.
  • 8:29 - 8:33
    There is in the United States what
    they call the Multistate Anti-Terrorism
  • 8:33 - 8:38
    Information Exchange Program which
    uses everything from credit history,
  • 8:38 - 8:42
    your concealed weapons permits,
    aircraft pilot licenses,
  • 8:42 - 8:47
    fishing licenses etc. that's searchable
    and shared amongst Police departments
  • 8:47 - 8:51
    and Government officials and this is just
    more information. So, if they can collect
  • 8:51 - 8:58
    it, they will aggregate it into a data
    base. So, what are the current uses?
  • 8:58 - 9:02
    There are many, many different
    companies currently
  • 9:02 - 9:05
    making software and marketing
    it to Police departments.
  • 9:05 - 9:08
    All of them are slightly different, have
    different features, but currently
  • 9:08 - 9:12
    it´s a competition to get clients,
    Police departments etc.
  • 9:12 - 9:16
    The more Police departments you have
    the more data sharing you can sell,
  • 9:16 - 9:21
    saying: “Oh, by enrolling you’ll now have
    x,y and z Police departments’ data
  • 9:21 - 9:27
    to access” etc. These here
    are Hitachi and HunchLab,
  • 9:27 - 9:31
    they both are hotspot targeting,
    it´s not individual targeting,
  • 9:31 - 9:35
    those are a lot rarer. And it´s actually
    being used in my home town,
  • 9:35 - 9:40
    which I´ll talk about in a little bit.
    Here, the appropriate tactics
  • 9:40 - 9:44
    are automatically displayed for officers
    when they´re entering mission areas.
  • 9:44 - 9:48
    So HunchLab will tell an officer:
    “Hey, you´re entering an area
  • 9:48 - 9:52
    where there´s gonna be burglary that you
    should keep an eye out, be aware”.
  • 9:52 - 9:58
    And this is updating in live time and
    they´re hoping it mitigates crime.
  • 9:58 - 10:01
    Here are 2 other ones, the Domain
    Awareness System was created
  • 10:01 - 10:05
    in New York City after 9/11
    in conjunction with Microsoft.
  • 10:05 - 10:10
    New York City actually makes
    money selling it to other cities
  • 10:10 - 10:16
    to use. CCTV camera feeds
    are collected, they can…
  • 10:16 - 10:21
    If they say there´s a man
    wearing a red shirt,
  • 10:21 - 10:24
    the software will look for people
    wearing red shirts and at least
  • 10:24 - 10:28
    alert Police departments to
    people that meet this description
  • 10:28 - 10:34
    walking in public in New York
    City. The other one is by IBM
  • 10:34 - 10:40
    and there are quite a few, you know, it´s
    just generally another hotspot targeting,
  • 10:40 - 10:46
    each have a few different features.
    Worth mentioning, too, is the Heat List.
  • 10:46 - 10:51
    This targeted individuals. I’m from the
    city of Chicago. I grew up in the city.
  • 10:51 - 10:55
    There are currently 420 names, when
    this came out about a year ago,
  • 10:55 - 11:00
    of individuals who are 500 times more
    likely than average to be involved
  • 11:00 - 11:05
    in violence. Individual names, passed
    around to each Police officer in Chicago.
  • 11:05 - 11:10
    They consider the rap sheet,
    disturbance calls, social network etc.
  • 11:10 - 11:16
    But one of the main things they considered
    in placing mainly young black individuals
  • 11:16 - 11:19
    on this list were known acquaintances
    and their arrest histories.
  • 11:19 - 11:23
    So if kids went to school or young
    teenagers went to school
  • 11:23 - 11:28
    with several people in a gang – and that
    individual may not even be involved
  • 11:28 - 11:32
    in a gang – they’re more likely to
    appear on the list. The list has been
  • 11:32 - 11:37
    heavily criticized for being racist,
    for not giving these children
  • 11:37 - 11:41
    or young individuals on the list
    a chance to change their history
  • 11:41 - 11:45
    because it’s being decided for them.
    They’re being told: “You are likely
  • 11:45 - 11:50
    to be a criminal, and we’re gonna
    watch you”. Officers in Chicago
  • 11:50 - 11:54
    visited these individuals; they would do a knock
    and announce, with a knock on the door
  • 11:54 - 11:58
    and say: “Hi, I’m here, like just
    checking up what are you up to”.
  • 11:58 - 12:02
    Which you don’t need any special
    suspicion to do. But it’s, you know,
  • 12:02 - 12:07
    kind of a harassment that
    might feed back
  • 12:07 - 12:11
    into the data collected.
  • 12:11 - 12:15
    This is PRECOBS. It’s currently
    used here in Hamburg.
  • 12:15 - 12:19
    They actually went to Chicago and
    visited the Chicago Police Department
  • 12:19 - 12:24
    to learn about Predictive Policing
    tactics in Chicago to implement it
  • 12:24 - 12:30
    throughout Germany, Hamburg and Berlin.
  • 12:30 - 12:34
    It’s used to generally
    forecast repeat-offenses.
  • 12:34 - 12:40
    Again, when training data sets you need
    enough data points to predict crime.
  • 12:40 - 12:44
    So crimes that are less likely to
    happen or happen very rarely:
  • 12:44 - 12:48
    much harder to predict. Crimes that
    aren’t reported: much harder to predict.
  • 12:48 - 12:52
    So a lot of these
    pieces of software
  • 12:52 - 12:58
    rely on algorithms that are hoping
    that there’s the same sort of picture,
  • 12:58 - 13:03
    that they can predict: where and when
    and what type of crime will happen.
  • 13:03 - 13:07
    PRECOBS is actually a play on the ‘precogs’
  • 13:07 - 13:11
    – from the movie ‘Minority Report’, if you’re
    familiar with it, it’s the 3 psychics
  • 13:11 - 13:15
    who predict crimes
    before they happen.
  • 13:15 - 13:19
    So there’re other, similar systems
    in the world that are being used
  • 13:19 - 13:23
    to predict whether or not
    something will happen.
  • 13:23 - 13:27
    The first one is ‘Disease and Diagnosis’.
    They found that algorithms are actually
  • 13:27 - 13:34
    better than doctors at predicting
    what disease an individual has.
  • 13:34 - 13:39
    It’s kind of shocking. The other is
    ‘Security Clearance’ in the US.
  • 13:39 - 13:44
    It allows access to classified documents.
    There’s no automatic access in the US.
  • 13:44 - 13:49
    So every person who wants to see
    some sort of secret cleared document
  • 13:49 - 13:53
    must go through this process.
    And it’s vetting individuals.
  • 13:53 - 13:57
    So it’s an opt-in process. But here
    they’re trying to predict who will
  • 13:57 - 14:01
    disclose information, who will
    break the clearance system;
  • 14:01 - 14:06
    and predict there… Here, the error rate,
    they’re probably much more comfortable
  • 14:06 - 14:09
    with a high error rate. Because they
    have so many people competing
  • 14:09 - 14:14
    for a particular job, to get
    clearance, that if they’re wrong,
  • 14:14 - 14:18
    that somebody probably won’t disclose
    information, they don’t care,
  • 14:18 - 14:22
    they’d just rather eliminate
    them than take the risk.
  • 14:22 - 14:28
    So I’m an attorney in the US. I have
    this urge to talk about US law.
  • 14:28 - 14:32
    It also seems to impact a lot
    of people internationally.
  • 14:32 - 14:36
    Here we’re talking about the targeting
    of individuals, not hotspots.
  • 14:36 - 14:41
    So targeting of individuals is
    not as widespread, currently.
  • 14:41 - 14:46
    However it’s happening in Chicago;
  • 14:46 - 14:49
    and other cities are considering
    implementing programs and there are grants
  • 14:49 - 14:54
    right now to encourage
    Police departments
  • 14:54 - 14:57
    to figure out target lists.
  • 14:57 - 15:01
    So in the US suspicion is based on
    the totality of the circumstances.
  • 15:01 - 15:05
    That’s the whole picture. The Police
    officer, the individual must look
  • 15:05 - 15:08
    at the whole picture of what’s happening
    before they can detain an individual.
  • 15:08 - 15:12
    It’s supposed to be a balanced
    assessment of relative weights, meaning
  • 15:12 - 15:16
    – you know – if you know that the
    person is a pastor, maybe their
  • 15:16 - 15:22
    pacing in front of a liquor
    store is not as suspicious
  • 15:22 - 15:26
    as somebody who’s been convicted
    of 3 burglaries. It has to be ‘based
  • 15:26 - 15:31
    on specific and articulable facts’. And
    the Police officers can use experience
  • 15:31 - 15:37
    and common sense to determine
    whether or not their suspicion…
  • 15:37 - 15:43
    Large amounts of networked data generally
    can provide individualized suspicion.
  • 15:43 - 15:48
    The principal components here… the
    events leading up to the stop-and-search
  • 15:48 - 15:52
    – what is the person doing right before
    they’re detained as well as the use
  • 15:52 - 15:58
    of historical facts known about that
    individual, the crime, the area
  • 15:58 - 16:02
    in which it’s happening etc.
    So it can rely on both things.
  • 16:02 - 16:07
    No court in the US has really put out
    a percentage for what Probable Cause
  • 16:07 - 16:11
    and Reasonable Suspicion are. So ‘Probable
    Cause’ is what you need to get a warrant
  • 16:11 - 16:15
    to search and seize an individual.
    ‘Reasonable Suspicion’ is needed
  • 16:15 - 16:20
    to do stop-and-frisk in the US – stop
    an individual and question them.
  • 16:20 - 16:24
    And this is a little bit different than
    what they call ‘Consensual Encounters’,
  • 16:24 - 16:28
    where a Police officer goes up to you and
    chats you up. ‘Reasonable Suspicion’
  • 16:28 - 16:32
    – you’re actually detained. But I had
    a law professor who basically said:
  • 16:32 - 16:36
    “30%..45% seem like a really good number
  • 16:36 - 16:39
    just to show how low it really is”. You
    don’t even need to be 50% sure
  • 16:39 - 16:42
    that somebody has committed a crime.
  • 16:42 - 16:47
    So, officers can draw from their own
    experience to determine ‘Probable Cause’.
  • 16:47 - 16:51
    And the UK has a similar
    ‘Reasonable Suspicion’ standard
  • 16:51 - 16:55
    which depends on the circumstances
    of each case. So,
  • 16:55 - 16:59
    I’m not as familiar with UK law but I
    believe that even some of the analysis around
  • 16:59 - 17:03
    ‘Reasonable Suspicion’ is similar.
  • 17:03 - 17:07
    Is this like a black box?
    So, I threw this slide in
  • 17:07 - 17:11
    for those who are interested
    in comparing this to US law.
  • 17:11 - 17:15
    Generally a dog sniff in the US
    falls under a particular set
  • 17:15 - 17:20
    of legal history which is: a
    dog can go up, sniff for drugs,
  • 17:20 - 17:24
    alert and that is completely okay.
  • 17:24 - 17:28
    And the Police officers can use that
    data to detain and further search
  • 17:28 - 17:34
    an individual. So is an algorithm similar
    to the dog which is kind of a black box?
  • 17:34 - 17:37
    Information goes in, it’s processed,
    information comes out and
  • 17:37 - 17:43
    a prediction is made.
    Police rely on the ‘Good Faith’
  • 17:43 - 17:49
    in ‘Totality of the Circumstances’
    to make their decision. So there’s
  • 17:49 - 17:54
    really no… if they’re
    relying on the algorithm
  • 17:54 - 17:57
    and think in that situation that
    everything’s okay we might reach
  • 17:57 - 18:02
    a level of ‘Reasonable Suspicion’ where
    the officer can now pat down
  • 18:02 - 18:08
    the person he’s stopped on the street
    or the algorithm has alerted to. So,
  • 18:08 - 18:13
    the big question is, you know, “Could the
    officer consult predictive software apps
  • 18:13 - 18:19
    in an individualized analysis? Could he
    say: ‘60% likely to commit a crime’?”
  • 18:19 - 18:24
    In my hypo: does that
    mean that the officer can,
  • 18:24 - 18:29
    without looking at anything
    else, detain that individual?
  • 18:29 - 18:34
    And the answer is “Probably not”. One:
    predictive Policing algorithms just
  • 18:34 - 18:38
    cannot take in the Totality of the
    Circumstances. They have to be
  • 18:38 - 18:43
    frequently updated, there are
    things that are happening that
  • 18:43 - 18:46
    the algorithm possibly could
    not have taken into account.
  • 18:46 - 18:49
    The problem here is
    that the algorithm itself,
  • 18:49 - 18:52
    the prediction itself becomes part
    of Totality of the Circumstances,
  • 18:52 - 18:56
    which I’m going to talk
    about a little bit more later.
  • 18:56 - 19:01
    But officers have to have Reasonable
    Suspicion before the stop occurs.
  • 19:01 - 19:05
    Retroactive justification
    is not sufficient. So,
  • 19:05 - 19:09
    the algorithm can’t just say:
    “60% likely, you detain the individual
  • 19:09 - 19:12
    and then figure out why you’ve
    detained the person”. It has to be
  • 19:12 - 19:17
    before the detention actually happens.
    And the suspicion must relate
  • 19:17 - 19:20
    to current criminal activity. The
    person must be doing something
  • 19:20 - 19:25
    to indicate criminal activity. Just
    the fact that an algorithm says,
  • 19:25 - 19:29
    based on these facts: “60%”,
    or even without articulating
  • 19:29 - 19:34
    why the algorithm has
    chosen that, isn’t enough.
  • 19:34 - 19:38
    Maybe you can see a gun-shaped
    bulge in the pocket etc.
  • 19:38 - 19:43
    So, effectiveness… the
    Totality of the Circumstances,
  • 19:43 - 19:47
    can the algorithms keep up?
    Generally, probably not.
  • 19:47 - 19:51
    Missing data, not capable of
    processing this data in real time.
  • 19:51 - 19:55
    There’s no idea… the
    algorithm doesn’t know,
  • 19:55 - 19:59
    and the Police officer probably
    doesn’t know all of the facts.
  • 19:59 - 20:03
    So the Police officer can take
    the algorithm into consideration
  • 20:03 - 20:08
    but the problem here is: Did the algorithm
    know that the individual was active
  • 20:08 - 20:13
    in the community, or was a politician, or
  • 20:13 - 20:17
    was a personal friend of the officer
    etc. It can’t just be relied upon.
  • 20:17 - 20:23
    What if the algorithm did take into
    account that the individual was a Pastor?
  • 20:23 - 20:26
    Now that information is counted twice
    and the balancing for the Totality
  • 20:26 - 20:34
    of the Circumstances is off. Humans
    here must be the final decider.
  • 20:34 - 20:38
    What are the problems?
    Well, there’s bad underlying data,
  • 20:38 - 20:42
    there’s no transparency into
    what kind of data is being used,
  • 20:42 - 20:46
    how it was collected, how old it
    is, how often it’s been updated,
  • 20:46 - 20:51
    whether or not it’s been verified. There
    could just be noise in the training data.
  • 20:51 - 20:57
    Honestly, the data is biased. It was
    collected by individuals in the US;
  • 20:57 - 21:01
    generally there’ve been
    several studies showing that
  • 21:01 - 21:05
    young, black individuals are
    stopped more often than whites.
  • 21:05 - 21:10
    And this is going to
    cause a collection bias.
  • 21:10 - 21:15
    It’s gonna be drastically disproportionate
    to the makeup of the population of cities;
  • 21:15 - 21:19
    and as more data has been collected on
    minorities, refugees in poor neighborhoods
  • 21:19 - 21:24
    it’s gonna feed back in and of course only
    have data on those groups and provide
  • 21:24 - 21:26
    feedback and say:
    “More crime is likely to
  • 21:26 - 21:28
    happen because that’s where the data
  • 21:28 - 21:32
    was collected”. So, what’s
    an acceptable error rate, well,
  • 21:32 - 21:38
    depends on the burden of proof. Harm
    is different for an opt-in system.
  • 21:38 - 21:41
    You know, what’s my harm if I don’t
    get clearance, or I don’t get the job;
  • 21:41 - 21:45
    but I’m opting in, I’m asking to
    being considered for employment.
  • 21:45 - 21:49
    In the US, what’s an error? If you
    search and find nothing, if you think
  • 21:49 - 21:54
    you have Reasonable Suspicion
    based on good faith,
  • 21:54 - 21:57
    both on the algorithm and what
    you witness, the US says that it’s
  • 21:57 - 22:01
    no 4th Amendment violation,
    even if nothing has happened.
  • 22:01 - 22:06
    Consider a very low
    false-positive error rate here.
  • 22:06 - 22:09
    In Big Data generally, and
    machine learning, it’s great!
  • 22:09 - 22:14
    Like 1% error is fantastic! But that’s
    pretty large for the number of individuals
  • 22:14 - 22:18
    stopped each day. Or who might
    be subject to these algorithms.
  • 22:18 - 22:22
    Because even though there’re only
    400 individuals on the list in Chicago
  • 22:22 - 22:25
    those individuals have been
    listed basically as targets
  • 22:25 - 22:29
    by the Chicago Police Department.
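
Her point that a "1% error" rate sounds great for machine learning but is large at street scale comes down to simple base-rate arithmetic. The sketch below uses entirely assumed numbers (stops screened per day, fraction of stops involving actual offenders) just to show how quickly the count of wrongly flagged people adds up.

```python
# All numbers below are assumptions chosen only to illustrate the arithmetic.
stops_per_day = 1000         # hypothetical number of stops screened by the system
false_positive_rate = 0.01   # the "1% error" that sounds fantastic in ML terms
offender_base_rate = 0.02    # assumed fraction of stops involving actual offenders

innocent_stops = stops_per_day * (1 - offender_base_rate)
wrongly_flagged = innocent_stops * false_positive_rate

print(f"innocent people wrongly flagged per day: ~{wrongly_flagged:.0f}")
print(f"per year: ~{wrongly_flagged * 365:.0f}")
```
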
  • 22:29 - 22:34
    Other problems include database errors.
    Exclusion of evidence in the US
  • 22:34 - 22:37
    only happens when there’s gross
    negligence or systematic misconduct.
  • 22:37 - 22:42
    That’s very difficult to prove, especially
    when a lot of people view these algorithms
  • 22:42 - 22:47
    as a black box. Data goes in,
    predictions come out, everyone’s happy.
  • 22:47 - 22:53
    You rely on and trust the
    quality of IBM, HunchLab etc.
  • 22:53 - 22:57
    to provide good software.
  • 22:57 - 23:01
    Finally, some more concerns I have
    include feedback loops, auditing of
  • 23:01 - 23:05
    and access to data and algorithms
    and the prediction thresholds.
  • 23:05 - 23:10
    How certain must a prediction be
    – before it’s reported to the Police –
  • 23:10 - 23:13
    that the person might commit a
    crime. Or that crime might happen
  • 23:13 - 23:18
    in the individual area. If Reasonable
    Suspicion is as low as 35%,
  • 23:18 - 23:24
    and Reasonable Suspicion in the US has
    been held at: That guy drives a car
  • 23:24 - 23:28
    that drug dealers like to drive,
    and he’s in the DEA database
  • 23:28 - 23:37
    as a possible drug dealer. That was
    enough to stop and search him.
  • 23:37 - 23:40
    So, are there Positives? Well, PredPol,
  • 23:40 - 23:45
    which is one of the services that
    provides Predictive Policing software,
  • 23:45 - 23:50
    says: “Since these cities have
    implemented it, crime has been dropping”.
  • 23:50 - 23:54
    In L.A. 13% reduction in
    crime, in one division.
  • 23:54 - 23:58
    There was even one day where
    they had no crime reported.
  • 23:58 - 24:05
    Santa Cruz – 25..29% reduction,
    -9% in assaults etc.
  • 24:05 - 24:10
    One: these are Police departments
    self-reporting these successes, so…
  • 24:10 - 24:15
    you know, take it for what it is
    and it’s reiterated by the people
  • 24:15 - 24:21
    selling the software. But perhaps
    it is actually reducing crime.
  • 24:21 - 24:24
    It’s kind of hard to tell because
    there’s a feedback loop.
  • 24:24 - 24:29
    Do we know that crime is really being
    reduced? Will it affect the data
  • 24:29 - 24:33
    that is collected in the future? It’s
    really hard to know. Because
  • 24:33 - 24:38
    if you send the Police officers into
    a community it’s more likely
  • 24:38 - 24:43
    that they’re going to affect that
    community and that data collection.
  • 24:43 - 24:47
    Will more crimes happen because they
    feel like the Police are harassing them?
  • 24:47 - 24:52
    It’s very likely and it’s a problem here.
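
The feedback loop she describes can be shown with a deliberately oversimplified simulation: two areas with identical true crime, where recorded crime depends on how many patrols are present, and next week's patrols follow this week's recorded numbers. Every parameter below is invented; the point is only that an initial imbalance in the data keeps confirming itself.

```python
# Two areas with the SAME true amount of crime, but a historically biased
# patrol allocation. Recorded crime depends on patrol presence, and future
# patrols are allocated in proportion to recorded (not true) crime.
true_crime = [50, 50]         # identical underlying crime in areas A and B
patrols = [14, 6]             # historical bias: area A is already over-patrolled
detection_per_patrol = 0.01   # assumed chance a patrol records any given crime

for week in range(1, 6):
    recorded = [true_crime[i] * min(1.0, patrols[i] * detection_per_patrol)
                for i in range(2)]
    total = sum(recorded)
    patrols = [round(20 * r / total) for r in recorded]  # data-driven allocation
    print(f"week {week}: recorded={recorded}, next week's patrols={patrols}")

# Even though both areas have identical true crime, the recorded data keeps
# "confirming" the original imbalance, so the allocation never corrects itself.
```
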
  • 24:52 - 24:57
    So, some final thoughts. Predictive
    Policing programs are not going anywhere.
  • 24:57 - 25:01
    They’re only just getting started.
  • 25:01 - 25:06
    And I think that more analysis, more
    transparency, more access to data
  • 25:06 - 25:11
    needs to happen around these algorithms.
    There needs to be regulation.
  • 25:11 - 25:16
    Currently, a very successful way in which
  • 25:16 - 25:19
    these companies get data is they
    buy from Third Party sources
  • 25:19 - 25:25
    and then sell it to Police departments. So
    perhaps PredPol might get information
  • 25:25 - 25:29
    from Google, Facebook, Social Media
    accounts; aggregate data themselves,
  • 25:29 - 25:32
    and then turn around and sell it to
    Police departments or provide access
  • 25:32 - 25:36
    to Police departments. And generally, the
    Courts are gonna have to begin to work out
  • 25:36 - 25:40
    how to handle this type of data.
    There’s no case law,
  • 25:40 - 25:45
    at least in the US, that really knows
    how to handle predictive algorithms
  • 25:45 - 25:49
    in determining what the analysis says.
    And so there really needs to be
  • 25:49 - 25:53
    a lot more research and
    thought put into this.
  • 25:53 - 25:56
    And one of the big things in order
    for this to actually be useful:
  • 25:56 - 26:02
    if this is a tactic that has been used
    by Police departments for decades,
  • 26:02 - 26:04
    we need to eliminate the bias in
    the data sets. Because right now
  • 26:04 - 26:09
    all that it’s doing is facilitating and
    continuing the bias set in the database.
  • 26:09 - 26:13
    And it’s incredibly difficult.
    It’s data collected by humans.
  • 26:13 - 26:18
    And it causes an initial selection bias,
    which is gonna have to stop
  • 26:18 - 26:21
    for it to be successful.
  • 26:21 - 26:26
    And perhaps these systems can cause
    implicit bias or confirmation bias,
  • 26:26 - 26:29
    e.g. Police are going to believe
    what they’ve been told.
  • 26:29 - 26:33
    So if a Police officer goes
    on duty to an area
  • 26:33 - 26:37
    and an algorithm says: “You’re
    70% likely to find a burglar
  • 26:37 - 26:41
    in this area”. Are they gonna find
    a burglar because they’ve been told:
  • 26:41 - 26:46
    “You might find a burglar”?
    And finally the US border.
  • 26:46 - 26:50
    There is no 4th Amendment
    protection at the US border.
  • 26:50 - 26:54
    It’s an exception to the warrant
    requirement. This means
  • 26:54 - 26:59
    no suspicion is needed to conduct
    a search. So this data is gonna feed into
  • 26:59 - 27:04
    the way they examine you when
    you cross the border.
  • 27:04 - 27:10
    And aggregate data can be used to
    refuse you entry into the US etc.
  • 27:10 - 27:14
    And I think that’s pretty much it.
    And so a few minutes for questions.
  • 27:14 - 27:24
    applause
    Thank you!
  • 27:24 - 27:27
    Herald: Thanks a lot for your talk,
    Whitney. We have about 4 minutes left
  • 27:27 - 27:32
    for questions. So please line up at
    the microphones and remember to
  • 27:32 - 27:38
    make short and easy questions.
  • 27:38 - 27:42
    Microphone No.2, please.
  • 27:42 - 27:54
    Question: Just a comment: if I want
    to run a crime organization, like,
  • 27:54 - 27:58
    I would target the PRECOBS
    here in Hamburg, maybe.
  • 27:58 - 28:01
    So I can take the crime to the scenes
  • 28:01 - 28:06
    where PRECOBS doesn’t expect it.
  • 28:06 - 28:09
    Whitney: Possibly. And I think this is
    a big problem in getting availability
  • 28:09 - 28:13
    of data; in that there’s a good argument
    for Police departments to say:
  • 28:13 - 28:17
    “We don’t want to tell you what
    our tactics are for Policing,
  • 28:17 - 28:19
    because it might move crime”.
  • 28:19 - 28:23
    Herald: Do we have questions from
    the internet? Yes, then please,
  • 28:23 - 28:27
    one question from the internet.
  • 28:27 - 28:30
    Signal Angel: Is there evidence that data
    like the use of encrypted messaging
  • 28:30 - 28:36
    systems, encrypted emails, VPN, TOR,
    with automated request to the ISP,
  • 28:36 - 28:42
    are used to obtain real names and
    collected to contribute to the scoring?
  • 28:42 - 28:46
    Whitney: I’m not sure if that’s
    being taken into account
  • 28:46 - 28:50
    by Predictive Policing algorithms,
    or by the software being used.
  • 28:50 - 28:55
    I know that Police departments do
    take those things into consideration.
  • 28:55 - 29:01
    And considering that in the US
    Totality of the Circumstances is
  • 29:01 - 29:05
    how you evaluate suspicion, they are gonna
    take all of those things into account
  • 29:05 - 29:09
    and they actually kind of
    have to take them into account.
  • 29:09 - 29:12
    Herald: Okay, microphone No.1, please.
  • 29:12 - 29:17
    Question: In your example you mentioned
    disease tracking, e.g. Google Flu Trends
  • 29:17 - 29:22
    is a good example of preventive Predictive
    Policing. Are there any examples
  • 29:22 - 29:28
    where – instead of increasing Policing
    in the lives of communities –
  • 29:28 - 29:34
    where sociologists or social workers
    are called to use predictive tools,
  • 29:34 - 29:36
    instead of more criminalization?
  • 29:36 - 29:41
    Whitney: I’m not aware if that’s…
    if Police departments are sending
  • 29:41 - 29:45
    social workers instead of Police officers.
    But that wouldn’t surprise me because
  • 29:45 - 29:50
    algorithms are being used to flag suspected child
    abuse. And in the US they’re gonna send
  • 29:50 - 29:53
    a social worker in response. So I would
    not be surprised if that’s also being
  • 29:53 - 29:57
    considered. Since that’s
    part of the resources.
  • 29:57 - 29:59
    Herald: OK, so if you have
    a really short question, then
  • 29:59 - 30:01
    microphone No.2, please.
    Last question.
  • 30:01 - 30:08
    Question: Okay, thank you for the
    talk. This talk as well as few others
  • 30:08 - 30:14
    brought the thought in the debate
    about the fine-tuning that is required
  • 30:14 - 30:20
    between false positives and
    preventing crimes or terror.
  • 30:20 - 30:24
    Now, it’s a different situation
    if the Policeman is predicting,
  • 30:24 - 30:28
    or a system is predicting somebody’s
    stealing a paper from someone;
  • 30:28 - 30:32
    or someone is creating a terror attack.
  • 30:32 - 30:38
    And the justification to prevent it
  • 30:38 - 30:43
    under the expense of false positive
    is different in these cases.
  • 30:43 - 30:49
    How do you make sure that the decision
    or the fine-tuning is not going to be
  • 30:49 - 30:54
    deep down in the algorithm
    and by the programmers,
  • 30:54 - 30:59
    but rather by the customer
    – the Policemen or the authorities?
  • 30:59 - 31:03
    Whitney: I can imagine that Police
    officers are using common sense in that,
  • 31:03 - 31:06
    and their knowledge about the situation
    and even what they’re being told
  • 31:06 - 31:10
    by the algorithm. You hope
    that they’re gonna take…
  • 31:10 - 31:14
    they probably are gonna take
    terrorism to a different level
  • 31:14 - 31:17
    than a common burglary or
    a stealing of a piece of paper
  • 31:17 - 31:22
    or a non-violent crime.
    And that fine-tuning
  • 31:22 - 31:26
    is probably on a Police department
  • 31:26 - 31:29
    by Police department basis.
  • 31:29 - 31:32
    Herald: Thank you! This was Whitney
    Merrill, give a warm round of applause, please!!
  • 31:32 - 31:40
    Whitney: Thank you!
    applause
  • 31:40 - 31:43
    postroll music
  • 31:43 - 31:52
    Subtitles created by c3subtitles.de
    in the year 2016. Join and help us!