0:00:00.310,0:00:10.240
32c3 preroll music
0:00:10.240,0:00:13.920
Angel: I introduce Whitney Merrill.[br]She is an attorney in the US
0:00:13.920,0:00:17.259
and she just recently, actually[br]last week, graduated
0:00:17.259,0:00:20.999
with her CS master’s in Illinois.
0:00:20.999,0:00:27.299
applause
0:00:27.299,0:00:30.249
Angel: Without further ado:[br]‘Predicting Crime In A Big Data World’.
0:00:30.249,0:00:32.870
cautious applause
0:00:32.870,0:00:36.920
Whitney Merrill: Hi everyone.[br]Thank you so much for coming.
0:00:36.920,0:00:40.950
I know it’s been an exhausting Congress,[br]so I appreciate you guys coming
0:00:40.950,0:00:45.300
to hear me talk about Big[br]Data and Crime Prediction.
0:00:45.300,0:00:48.820
This is kind of a hobby of mine, I,
0:00:48.820,0:00:53.030
in my last semester at Illinois,[br]decided to poke around
0:00:53.030,0:00:56.850
what’s currently happening, how these[br]algorithms are being used and kind of
0:00:56.850,0:01:00.390
figure out what kind of information can be[br]gathered. So, I have about 30 minutes
0:01:00.390,0:01:04.629
with you guys. I’m gonna do a broad[br]overview of the types of programs.
0:01:04.629,0:01:10.020
I’m gonna talk about what Predictive[br]Policing is, the data used,
0:01:10.020,0:01:13.600
similar systems in other areas[br]where predictive algorithms are
0:01:13.600,0:01:19.079
trying to better society,[br]current uses in policing.
0:01:19.079,0:01:22.119
I’m gonna talk a little bit about their[br]effectiveness and then give you
0:01:22.119,0:01:26.409
some final thoughts. So, imagine,
0:01:26.409,0:01:30.310
in the very near future a Police[br]officer is walking down the street
0:01:30.310,0:01:34.389
wearing a camera on her collar.[br]In her ear is a feed of information
0:01:34.389,0:01:38.819
about the people and cars she passes[br]alerting her to individuals and cars
0:01:38.819,0:01:43.259
that might fit a particular crime[br]or profile for a criminal.
0:01:43.259,0:01:47.619
Early in the day she examined a[br]map highlighting hotspots for crime.
0:01:47.619,0:01:52.459
In the area she’s been set to patrol[br]the predictive policing software
0:01:52.459,0:01:57.590
indicates that there is an 82%[br]chance of burglary at 2 pm,
0:01:57.590,0:02:01.539
and it’s currently 2:10 pm.[br]As she passes one individual
0:02:01.539,0:02:05.549
her camera captures the[br]individual’s face, runs it through
0:02:05.549,0:02:10.399
a coordinated Police database - all of the[br]Police departments that use this database
0:02:10.399,0:02:14.680
are sharing information. Facial[br]recognition software indicates that
0:02:14.680,0:02:19.580
the person is Bobby Burglar who was[br]previously convicted of burglary,
0:02:19.580,0:02:24.790
was recently released and is now currently[br]on parole. The voice in her ear whispers:
0:02:24.790,0:02:29.970
50 percent likely to commit a crime.[br]Can she stop and search him?
0:02:29.970,0:02:32.970
Should she chat him up?[br]Should she see how he acts?
0:02:32.970,0:02:37.150
Does she need additional information[br]to stop and detain him?
0:02:37.150,0:02:40.900
And does it matter that he’s[br]carrying a large duffle bag?
0:02:40.900,0:02:45.579
Did the algorithm take this into account[br]or did it just look at his face?
0:02:45.579,0:02:49.939
What information was being[br]collected at the time the algorithm
0:02:49.939,0:02:55.259
chose to say 50% to provide[br]the final analysis?
0:02:55.259,0:02:57.930
So, another thought I’m gonna[br]have you guys think about as I go
0:02:57.930,0:03:01.540
through this presentation, is this[br]quote that is more favorable
0:03:01.540,0:03:05.870
towards Police algorithms, which is:[br]“As people become data plots
0:03:05.870,0:03:10.209
and probability scores, law enforcement[br]officials and politicians alike
0:03:10.209,0:03:16.519
can point and say: ‘Technology is void of[br]the racist, profiling bias of humans.’”
0:03:16.519,0:03:21.459
Is that true? Well, they probably[br]will point and say that,
0:03:21.459,0:03:24.860
but is it actually void of[br]racist, profiling humans?
0:03:24.860,0:03:27.849
And I’m gonna talk about that as well.
0:03:27.849,0:03:32.759
So, Predictive Policing explained.[br]Who and what?
0:03:32.759,0:03:35.620
First of all, Predictive Policing[br]actually isn’t new. All we’re doing
0:03:35.620,0:03:41.469
is adding technology, doing better,[br]faster aggregation of data.
0:03:41.469,0:03:47.200
Analysts in Police departments have been[br]doing this by hand for decades.
0:03:47.200,0:03:50.950
These techniques are used to create[br]profiles that accurately match
0:03:50.950,0:03:55.530
likely offenders with specific past[br]crimes. So, there’s individual targeting
0:03:55.530,0:03:59.489
and then we have location-based[br]targeting. The location-based,
0:03:59.489,0:04:05.010
the goal is to help Police[br]forces deploy their resources
0:04:05.010,0:04:10.230
in a correct manner, in an efficient[br]manner. They can be as simple
0:04:10.230,0:04:13.950
as recommending that general crime[br]may happen in a particular area,
0:04:13.950,0:04:19.108
or specifically, what type of crime will[br]happen in a one-block radius.
0:04:19.108,0:04:24.050
They take into account the time[br]of day, the recent data collected
0:04:24.050,0:04:30.040
and when in the year it’s happening[br]as well as weather etc.
0:04:30.040,0:04:33.850
So, another really quick thing worth[br]going over, cause not everyone
0:04:33.850,0:04:39.090
is familiar with machine learning.[br]This is a very basic breakdown
0:04:39.090,0:04:43.069
of training an algorithm on a data set.
0:04:43.069,0:04:46.240
You collect it from many different[br]sources, you put it all together,
0:04:46.240,0:04:51.019
you clean it up, you split it into 3 sets:[br]a training set, a validation set
0:04:51.019,0:04:56.350
and a test set. The training set is[br]what is going to develop the rules
0:04:56.350,0:05:01.379
in which it’s going to kind of[br]determine the final outcome.
0:05:01.379,0:05:05.060
You’re gonna use a validation[br]set to optimize it and finally
0:05:05.060,0:05:09.729
apply the test set to establish[br]a confidence level.
0:05:09.729,0:05:15.349
There you’ll set a support level where[br]you say you need a certain amount of data
0:05:15.349,0:05:19.940
to determine whether or not the[br]algorithm has enough information
0:05:19.940,0:05:24.190
to kind of make a prediction.[br]So, rules with a low support level
0:05:24.190,0:05:28.759
are less likely to be statistically[br]significant and the confidence level
0:05:28.759,0:05:34.099
in the end is basically if there’s[br]an 85% confidence level
0:05:34.099,0:05:39.930
that means there’s an 85% chance that the[br]suspect, e.g. meeting the rule in question,
0:05:39.930,0:05:45.139
is engaged in criminal conduct.[br]So, what does this mean? Well,
0:05:45.139,0:05:49.590
it encourages collection and hoarding[br]of data about crimes and individuals.
0:05:49.590,0:05:52.720
Because you want as much information[br]as possible so that you detect
0:05:52.720,0:05:56.030
even the less likely scenarios.
0:05:56.030,0:05:59.729
Information sharing is also[br]encouraged because it’s easier,
0:05:59.729,0:06:04.090
it’s done by third parties, or even[br]what are called fourth parties
0:06:04.090,0:06:07.860
and shared amongst departments.[br]And here, your criminal data again
0:06:07.860,0:06:10.810
was being done by analysts in Police[br]departments for decades, but
0:06:10.810,0:06:13.660
the information sharing and the amount[br]of information they could aggregate
0:06:13.660,0:06:17.169
was just significantly more difficult. So,
0:06:17.169,0:06:21.410
what are these Predictive Policing[br]algorithms and software…
0:06:21.410,0:06:24.580
what are they doing? Are they[br]determining guilt and innocence?
0:06:24.580,0:06:29.289
And, unlike a thoughtcrime, they[br]are not saying this person is guilty,
0:06:29.289,0:06:33.289
this person is innocent. It’s creating[br]a probability of whether or not
0:06:33.289,0:06:37.800
the person has likely committed[br]a crime or will likely commit a crime.
0:06:37.800,0:06:41.030
And it can only say something[br]about the future and the past.
0:06:41.030,0:06:46.310
This here is a picture from[br]one particular piece of software
0:06:46.310,0:06:50.199
provided by HunchLab; and patterns[br]emerge here from past crimes
0:06:50.199,0:06:58.230
that can profile criminal types and[br]associations, detect crime patterns etc.
0:06:58.230,0:07:02.139
Generally in these types of algorithms[br]they are using unsupervised data,
0:07:02.139,0:07:05.479
that means someone is not going through[br]and saying true-false, good-bad, good-bad.
0:07:05.479,0:07:10.780
There’s just 1) too much information and[br]2) they’re trying to do clustering,
0:07:10.780,0:07:15.279
determine the things that are similar.
0:07:15.279,0:07:20.110
So, really quickly, I’m also gonna[br]talk about the data that’s used.
0:07:20.110,0:07:23.259
There are several different types:[br]Personal characteristics,
0:07:23.259,0:07:28.169
demographic information, activities[br]of individuals, scientific data etc.
0:07:28.169,0:07:32.690
This comes from all sorts of sources,[br]one that really shocked me, was,
0:07:32.690,0:07:36.860
and I’ll talk about it a little bit in the[br]future, but, is the radiation detectors
0:07:36.860,0:07:41.310
on New York City Police are[br]constantly taking in data
0:07:41.310,0:07:44.819
and it’s so sensitive, it can detect if[br]you’ve had a recent medical treatment
0:07:44.819,0:07:49.330
that involves radiation. Facial[br]recognition and biometrics
0:07:49.330,0:07:52.860
are clear here and the third-party[br]doctrine – which basically says
0:07:52.860,0:07:56.550
in the United States that you have no[br]reasonable expectation of privacy in data
0:07:56.550,0:08:01.490
you share with third parties –[br]facilitates easy collection
0:08:01.490,0:08:05.720
for Police officers and Government[br]officials because they can go
0:08:05.720,0:08:11.080
and ask for the information[br]without any sort of warrant.
0:08:11.080,0:08:16.259
For a really great overview: a friend of[br]mine, Dia, did a talk here at CCC
0:08:16.259,0:08:21.280
on “The architecture of a street level[br]panopticon”. Does a really great overview
0:08:21.280,0:08:25.199
of how this type of data is collected[br]on the streets. Worth checking out
0:08:25.199,0:08:29.490
’cause I’m gonna gloss over[br]kind of the types of data.
0:08:29.490,0:08:33.450
There is in the United States what[br]they call the Multistate Anti-Terrorism
0:08:33.450,0:08:38.279
Information Exchange Program which[br]uses everything from credit history,
0:08:38.279,0:08:42.029
your concealed weapons permits,[br]aircraft pilot licenses,
0:08:42.029,0:08:46.800
fishing licenses etc. that’s searchable[br]and shared amongst Police departments
0:08:46.800,0:08:50.530
and Government officials and this is just[br]more information. So, if they can collect
0:08:50.530,0:08:57.690
it, they will aggregate it into a[br]database. So, what are the current uses?
0:08:57.690,0:09:01.779
There are many, many different[br]companies currently
0:09:01.779,0:09:04.950
making software and marketing[br]it to Police departments.
0:09:04.950,0:09:08.470
All of them are slightly different, have[br]different features, but currently
0:09:08.470,0:09:12.260
it´s a competition to get clients,[br]Police departments etc.
0:09:12.260,0:09:15.829
The more Police departments you have[br]the more data sharing you can sell,
0:09:15.829,0:09:21.089
saying: “Oh, by enrolling you’ll now have[br]x,y and z Police departments’ data
0:09:21.089,0:09:27.040
to access” etc. These here[br]are Hitachi and HunchLab,
0:09:27.040,0:09:31.350
they both are hotspot targeting,[br]it’s not individual targeting,
0:09:31.350,0:09:35.140
those are a lot rarer. And it’s actually[br]being used in my home town,
0:09:35.140,0:09:39.550
which I’ll talk about in a little bit.[br]Here, the appropriate tactics
0:09:39.550,0:09:44.180
are automatically displayed for officers[br]when they’re entering mission areas.
0:09:44.180,0:09:47.920
So HunchLab will tell an officer:[br]“Hey, you’re entering an area
0:09:47.920,0:09:52.180
where there’s gonna be burglary that you[br]should keep an eye out, be aware”.
0:09:52.180,0:09:58.010
And this is updating in live time and[br]they’re hoping it mitigates crime.
0:09:58.010,0:10:01.240
Here are 2 other ones, the Domain[br]Awareness System was created
0:10:01.240,0:10:05.139
in New York City after 9/11[br]in conjunction with Microsoft.
0:10:05.139,0:10:10.000
New York City actually makes[br]money selling it to other cities
0:10:10.000,0:10:16.470
to use this. CCTV-cameras[br]are collected, they can…
0:10:16.470,0:10:21.029
If they say there’s a man[br]wearing a red shirt,
0:10:21.029,0:10:24.430
the software will look for people[br]wearing red shirts and at least
0:10:24.430,0:10:28.139
alert Police departments to[br]people that meet this description
0:10:28.139,0:10:34.389
walking in public in New York[br]City. The other one is by IBM
0:10:34.389,0:10:40.139
and there are quite a few, you know, it’s[br]just generally another hotspot targeting,
0:10:40.139,0:10:45.839
each have a few different features.[br]Worth mentioning, too, is the Heat List.
0:10:45.839,0:10:50.769
This targeted individuals. I’m from the[br]city of Chicago. I grew up in the city.
0:10:50.769,0:10:55.149
There are currently 420 names, when[br]this came out about a year ago,
0:10:55.149,0:10:59.920
of individuals who are 500 times more[br]likely than average to be involved
0:10:59.920,0:11:05.230
in violence. Individual names, passed[br]around to each Police officer in Chicago.
0:11:05.230,0:11:10.029
They consider the rap sheet,[br]disturbance calls, social network etc.
0:11:10.029,0:11:15.540
But one of the main things they considered[br]in placing mainly young black individuals
0:11:15.540,0:11:19.279
on this list were known acquaintances[br]and their arrest histories.
0:11:19.279,0:11:23.279
So if kids went to school or young[br]teenagers went to school
0:11:23.279,0:11:27.880
with several people in a gang – and that[br]individual may not even be involved
0:11:27.880,0:11:32.160
in a gang – they’re more likely to[br]appear on the list. The list has been
0:11:32.160,0:11:36.829
heavily criticized for being racist,[br]for not giving these children
0:11:36.829,0:11:40.660
or young individuals on the list[br]a chance to change their history
0:11:40.660,0:11:44.510
because it’s being decided for them.[br]They’re being told: “You are likely
0:11:44.510,0:11:49.850
to be a criminal, and we’re gonna[br]watch you”. Officers in Chicago
0:11:49.850,0:11:53.550
visited these individuals; they would do a knock[br]and announce, with a knock on the door
0:11:53.550,0:11:58.029
and say: “Hi, I’m here, like just[br]checking up what are you up to”.
0:11:58.029,0:12:02.480
Which you don’t need any special[br]suspicion to do. But it’s, you know,
0:12:02.480,0:12:06.860
kind of a harassment that[br]might cause a feedback,
0:12:06.860,0:12:11.310
back into the data collected.
0:12:11.310,0:12:15.209
This is PRECOBS. It’s currently[br]used here in Hamburg.
0:12:15.209,0:12:19.100
They actually went to Chicago and[br]visited the Chicago Police Department
0:12:19.100,0:12:24.170
to learn about Predictive Policing[br]tactics in Chicago to implement it
0:12:24.170,0:12:29.729
throughout Germany, Hamburg and Berlin.
0:12:29.729,0:12:33.620
It’s used to generally[br]forecast repeat-offenses.
0:12:33.620,0:12:39.930
Again, when training data sets you need[br]enough data points to predict crime.
0:12:39.930,0:12:43.699
So crimes that are less likely to[br]happen or happen very rarely:
0:12:43.699,0:12:48.120
much harder to predict. Crimes that[br]aren’t reported: much harder to predict.
0:12:48.120,0:12:52.480
So a lot of these software…[br]like pieces of software
0:12:52.480,0:12:58.290
rely on algorithms that are hoping[br]that there’s the same sort of picture,
0:12:58.290,0:13:03.070
that they can predict: where and when[br]and what type of crime will happen.
0:13:03.070,0:13:06.890
PRECOBS is actually a play on a term
0:13:06.890,0:13:11.240
from the movie ‘Minority Report’: if you’re[br]familiar with it, it’s the 3 psychics
0:13:11.240,0:13:15.370
who predict crimes[br]before they happen.
0:13:15.370,0:13:19.149
So there’re other, similar systems[br]in the world that are being used
0:13:19.149,0:13:22.949
to predict whether or not[br]something will happen.
0:13:22.949,0:13:27.360
The first one is ‘Disease and Diagnosis’.[br]They found that algorithms are actually
0:13:27.360,0:13:33.810
more likely than doctors to predict[br]what disease an individual has.
0:13:33.810,0:13:39.480
It’s kind of shocking. The other is[br]‘Security Clearance’ in the US.
0:13:39.480,0:13:44.240
It allows access to classified documents.[br]There’s no automatic access in the US.
0:13:44.240,0:13:48.750
So every person who wants to see[br]some sort of secret cleared document
0:13:48.750,0:13:53.089
must go through this process.[br]And it’s vetting individuals.
0:13:53.089,0:13:56.690
So it’s an opt-in process. But here[br]they’re trying to predict who will
0:13:56.690,0:14:00.550
disclose information, who will[br]break the clearance system;
0:14:00.550,0:14:05.810
and predict there… Here, the error rate,[br]they’re probably much more comfortable
0:14:05.810,0:14:09.360
with a high error rate. Because they[br]have so many people competing
0:14:09.360,0:14:13.699
for a particular job, to get[br]clearance, that if they’re wrong,
0:14:13.699,0:14:18.000
that somebody probably won’t disclose[br]information, they don’t care,
0:14:18.000,0:14:22.319
they’d just rather eliminate[br]them than take the risk.
0:14:22.319,0:14:27.509
So I’m an attorney in the US. I have[br]this urge to talk about US law.
0:14:27.509,0:14:32.089
It also seems to impact a lot[br]of people internationally.
0:14:32.089,0:14:36.360
Here we’re talking about the targeting[br]of individuals, not hotspots.
0:14:36.360,0:14:40.810
So targeting of individuals is[br]not as widespread, currently.
0:14:40.810,0:14:45.579
However it’s happening in Chicago;
0:14:45.579,0:14:49.259
and other cities are considering[br]implementing programs and there are grants
0:14:49.259,0:14:53.730
right now to encourage[br]Police departments
0:14:53.730,0:14:57.110
to figure out target lists.
0:14:57.110,0:15:00.699
So in the US suspicion is based on[br]the totality of the circumstances.
0:15:00.699,0:15:04.730
That’s the whole picture. The Police[br]officer, the individual must look
0:15:04.730,0:15:08.269
at the whole picture of what’s happening[br]before they can detain an individual.
0:15:08.269,0:15:11.920
It’s supposed to be a balanced[br]assessment of relative weights, meaning
0:15:11.920,0:15:16.399
– you know – if you know that the[br]person is a pastor maybe then
0:15:16.399,0:15:21.720
pacing in front of a liquor[br]store is not as suspicious
0:15:21.720,0:15:26.370
as somebody who’s been convicted[br]of 3 burglaries. It has to be ‘based
0:15:26.370,0:15:31.430
on specific and articulable facts’. And[br]the Police officers can use experience
0:15:31.430,0:15:37.470
and common sense to determine[br]whether or not their suspicion…
0:15:37.470,0:15:42.920
Large amounts of networked data generally[br]can provide individualized suspicion.
0:15:42.920,0:15:48.410
The principal components here… the[br]events leading up to the stop-and-search
0:15:48.410,0:15:52.319
– what is the person doing right before[br]they’re detained – as well as the use
0:15:52.319,0:15:57.709
of historical facts known about that[br]individual, the crime, the area
0:15:57.709,0:16:02.329
in which it’s happening etc.[br]So it can rely on both things.
0:16:02.329,0:16:06.819
No court in the US has really put out[br]a percentage as to what Probable Cause
0:16:06.819,0:16:11.089
and Reasonable Suspicion are. So ‘Probable[br]Cause’ is what you need to get a warrant
0:16:11.089,0:16:14.639
to search and seize an individual.[br]‘Reasonable Suspicion’ is needed
0:16:14.639,0:16:20.329
to do stop-and-frisk in the US – stop[br]an individual and question them.
0:16:20.329,0:16:24.100
And this is a little bit different than[br]what they call ‘Consensual Encounters’,
0:16:24.100,0:16:27.680
where a Police officer goes up to you and[br]chats you up. ‘Reasonable Suspicion’
0:16:27.680,0:16:32.029
– you’re actually detained. But I had[br]a law professor who basically said:
0:16:32.029,0:16:35.730
“30%..45% seem like a really good number
0:16:35.730,0:16:39.290
just to show how low it really is”. You[br]don’t even need to be 50% sure
0:16:39.290,0:16:42.180
that somebody has committed a crime.
0:16:42.180,0:16:47.459
So, officers can draw from their own[br]experience to determine ‘Probable Cause’.
0:16:47.459,0:16:51.350
And the UK has a similar[br]‘Reasonable Suspicion’ standard
0:16:51.350,0:16:55.010
which depends on the circumstances[br]of each case. So,
0:16:55.010,0:16:58.819
I’m not as familiar with UK law but I[br]believe that some of the analysis around
0:16:58.819,0:17:03.480
‘Reasonable Suspicion’ is similar.
0:17:03.480,0:17:07.339
Is this like a black box?[br]So, I threw this slide in
0:17:07.339,0:17:10.960
for those who are interested[br]in comparing this to US law.
0:17:10.960,0:17:15.280
Generally a dog sniff in the US[br]falls under a particular set
0:17:15.280,0:17:20.140
of legal history which is: a[br]dog can go up, sniff for drugs,
0:17:20.140,0:17:24.220
alert and that is completely okay.
0:17:24.220,0:17:28.099
And the Police officers can use that[br]data to detain and further search
0:17:28.099,0:17:33.520
an individual. So is an algorithm similar[br]to the dog which is kind of a black box?
0:17:33.520,0:17:37.030
Information goes in, it’s processed,[br]information comes out and
0:17:37.030,0:17:42.720
a prediction is made.[br]Police rely on the ‘Good Faith’
0:17:42.720,0:17:48.780
in ‘Totality of the Circumstances’[br]to make their decision. So there’s
0:17:48.780,0:17:53.970
really no… if they’re[br]relying on the algorithm
0:17:53.970,0:17:57.230
and think in that situation that[br]everything’s okay we might reach
0:17:57.230,0:18:01.980
a level of ‘Reasonable Suspicion’ where[br]the individual can now pat down
0:18:01.980,0:18:08.470
the person he’s decided on the street[br]or the algorithm has alerted to. So,
0:18:08.470,0:18:13.220
the big question is, you know, “Could the[br]officer consult predictive software apps
0:18:13.220,0:18:18.610
in any individual analysis? Could he[br]say: ‘60% likely to commit a crime’?”
0:18:18.610,0:18:24.180
In my hypo: Does that[br]mean that the officer can,
0:18:24.180,0:18:29.160
without looking at anything[br]else, detain that individual?
0:18:29.160,0:18:33.810
And the answer is “Probably not”. One:[br]predictive Policing algorithms just
0:18:33.810,0:18:37.770
cannot take in the Totality of the[br]Circumstances. They have to be
0:18:37.770,0:18:42.690
frequently updated, there are[br]things that are happening that
0:18:42.690,0:18:46.060
the algorithm possibly could[br]not have taken into account.
0:18:46.060,0:18:48.590
The problem here is[br]that the algorithm itself,
0:18:48.590,0:18:51.780
the prediction itself becomes part[br]of Totality of the Circumstances,
0:18:51.780,0:18:56.330
which I’m going to talk[br]about a little bit more later.
0:18:56.330,0:19:00.660
But officers have to have Reasonable[br]Suspicion before the stop occurs.
0:19:00.660,0:19:04.660
Retroactive justification[br]is not sufficient. So,
0:19:04.660,0:19:08.790
the algorithm can’t just say:[br]“60% likely, you detain the individual
0:19:08.790,0:19:12.130
and then figure out why you’ve[br]detained the person”. It has to be
0:19:12.130,0:19:16.570
before the detention actually happens.[br]And the suspicion must relate
0:19:16.570,0:19:19.990
to current criminal activity. The[br]person must be doing something
0:19:19.990,0:19:24.700
to indicate criminal activity. Just[br]the fact that an algorithm says,
0:19:24.700,0:19:29.440
based on these facts: “60%”,[br]or even without articulating
0:19:29.440,0:19:33.890
why the algorithm has[br]chosen that, isn’t enough.
0:19:33.890,0:19:38.380
Maybe you can see a gun[br]shaped bulge in the pocket etc.
0:19:38.380,0:19:43.160
So, effectiveness… the[br]Totality of the Circumstances,
0:19:43.160,0:19:46.720
can the algorithms keep up?[br]Generally, probably not.
0:19:46.720,0:19:50.560
Missing data, not capable of[br]processing this data in real time.
0:19:50.560,0:19:54.820
There’s no idea… the[br]algorithm doesn’t know,
0:19:54.820,0:19:58.950
and the Police officer probably[br]doesn’t know all of the facts.
0:19:58.950,0:20:03.260
So the Police officer can take[br]the algorithm into consideration
0:20:03.260,0:20:08.130
but the problem here is: Did the algorithm[br]know that the individual was active
0:20:08.130,0:20:12.670
in the community, or was a politician, or
0:20:12.670,0:20:17.450
that he was a personal friend of the officer[br]etc. It can’t just be relied upon.
0:20:17.450,0:20:22.640
What if the algorithm did take into[br]account that the individual was a Pastor?
0:20:22.640,0:20:26.180
Now that information is counted twice[br]and the balancing for the Totality
0:20:26.180,0:20:34.320
of the Circumstances is off. Humans[br]here must be the final decider.
0:20:34.320,0:20:38.040
What are the problems?[br]Well, there’s bad underlying data,
0:20:38.040,0:20:41.970
there’s no transparency into[br]what kind of data is being used,
0:20:41.970,0:20:45.720
how it was collected, how old it[br]is, how often it’s been updated,
0:20:45.720,0:20:51.010
whether or not it’s been verified. There[br]could just be noise in the training data.
0:20:51.010,0:20:57.240
Honestly, the data is biased. It was[br]collected by individuals in the US;
0:20:57.240,0:21:01.020
generally there’ve been[br]several studies done that
0:21:01.020,0:21:05.270
young black individuals are[br]stopped more often than whites.
0:21:05.270,0:21:09.800
And this is going to[br]cause a collection bias.
0:21:09.800,0:21:14.550
It’s gonna be drastically disproportionate[br]to the makeup of the population of cities;
0:21:14.550,0:21:19.440
and as more data has been collected on[br]minorities, refugees in poor neighborhoods
0:21:19.440,0:21:23.640
it’s gonna feed back in and of course only[br]have data on those groups and provide
0:21:23.640,0:21:26.410
feedback and say:[br]“More crime is likely to
0:21:26.410,0:21:27.770
happen because that’s where the data
0:21:27.770,0:21:32.250
was collected”. So, what’s[br]an acceptable error rate? Well, it
0:21:32.250,0:21:37.500
depends on the burden of proof. Harm[br]is different for an opt-in system.
0:21:37.500,0:21:40.840
You know, what’s my harm if I don’t[br]get clearance, or I don’t get the job;
0:21:40.840,0:21:45.160
but I’m opting in, I’m asking to[br]be considered for employment.
0:21:45.160,0:21:49.080
In the US, what’s an error? If you[br]search and find nothing, if you think
0:21:49.080,0:21:53.630
you have Reasonable Suspicion[br]based on good faith,
0:21:53.630,0:21:57.060
both on the algorithm and what[br]you witness, the US says that it’s
0:21:57.060,0:22:00.620
no 4th Amendment violation,[br]even if nothing has happened.
0:22:00.620,0:22:05.970
It’s very low error[br]false-positive rate here.
0:22:05.970,0:22:09.140
In Big Data, generally, and[br]machine-learning it’s great!
0:22:09.140,0:22:13.550
Like 1% error is fantastic! But that’s[br]pretty large for the number of individuals
0:22:13.550,0:22:17.930
stopped each day. Or who might[br]be subject to these algorithms.
0:22:17.930,0:22:21.950
Because even though there’re only[br]400 individuals on the list in Chicago
0:22:21.950,0:22:25.210
those individuals have been[br]listed basically as targets
0:22:25.210,0:22:28.870
by the Chicago Police Department.
0:22:28.870,0:22:33.700
Other problems include database errors.[br]Exclusion of evidence in the US
0:22:33.700,0:22:37.170
only happens when there’s gross[br]negligence or systematic misconduct.
0:22:37.170,0:22:42.150
That’s very difficult to prove, especially[br]when a lot of people view these algorithms
0:22:42.150,0:22:47.360
as a black box. Data goes in,[br]predictions come out, everyone’s happy.
0:22:47.360,0:22:53.100
You rely on and trust the[br]quality of IBM, HunchLab etc.
0:22:53.100,0:22:56.730
to provide good software.
0:22:56.730,0:23:01.000
Finally, some more concerns I have[br]include feedback loops, auditing,
0:23:01.000,0:23:04.810
and access to data and algorithms[br]and the prediction thresholds.
0:23:04.810,0:23:09.970
How certain must a prediction be[br]– before it’s reported to the Police –
0:23:09.970,0:23:13.230
that the person might commit a[br]crime. Or that crime might happen
0:23:13.230,0:23:18.460
in the individual area. If Reasonable[br]Suspicion is as low as 35%,
0:23:18.460,0:23:23.740
and Reasonable Suspicion in the US has[br]been held at: That guy drives a car
0:23:23.740,0:23:28.350
that drug dealers like to drive,[br]and he’s in the DEA database
0:23:28.350,0:23:36.550
as a possible drug dealer. That was[br]enough to stop and search him.
0:23:36.550,0:23:40.090
So, are there Positives? Well, PredPol,
0:23:40.090,0:23:44.800
which is one of the services that[br]provides Predictive Policing software,
0:23:44.800,0:23:49.650
says: “Since these cities have[br]implemented it, there’s been a drop in crime”.
0:23:49.650,0:23:54.030
In L.A. 13% reduction in[br]crime, in one division.
0:23:54.030,0:23:57.510
There was even one day where[br]they had no crime reported.
0:23:57.510,0:24:04.550
Santa Cruz – 25..29% reduction,[br]-9% in assaults etc.
0:24:04.550,0:24:10.030
One: these are Police departments[br]self-reporting these successes for…
0:24:10.030,0:24:14.670
you know, take it for what it is[br]and reiterated by the people
0:24:14.670,0:24:20.510
selling the software. But perhaps[br]it is actually reducing crime.
0:24:20.510,0:24:24.390
It’s kind of hard to tell because[br]there’s a feedback loop.
0:24:24.390,0:24:29.200
Do we know that crime is really being[br]reduced? Will it affect the data
0:24:29.200,0:24:33.170
that is collected in the future? It’s[br]really hard to know. Because
0:24:33.170,0:24:38.330
if you send the Police officers into[br]a community it’s more likely
0:24:38.330,0:24:42.580
that they’re going to affect that[br]community and that data collection.
0:24:42.580,0:24:46.940
Will more crimes happen because they[br]feel like the Police are harassing them?
0:24:46.940,0:24:52.020
It’s very likely and it’s a problem here.
0:24:52.020,0:24:56.930
So, some final thoughts. Predictive[br]Policing programs are not going anywhere.
0:24:56.930,0:25:01.430
They’re only at their very start.
0:25:01.430,0:25:06.030
And I think that more analysis, more[br]transparency, more access to data
0:25:06.030,0:25:10.560
needs to happen around these algorithms.[br]There needs to be regulation.
0:25:10.560,0:25:16.000
Currently, a very successful way in which
0:25:16.000,0:25:19.310
these companies get data is they[br]buy from Third Party sources
0:25:19.310,0:25:24.590
and then sell it to Police departments. So[br]perhaps PredPol might get information
0:25:24.590,0:25:28.780
from Google, Facebook, Social Media[br]accounts; aggregate data themselves,
0:25:28.780,0:25:31.890
and then turn around and sell it to[br]Police departments or provide access
0:25:31.890,0:25:36.110
to Police departments. And generally, the[br]Courts are gonna have to begin to work out
0:25:36.110,0:25:40.210
how to handle this type of data.[br]There’s no case law,
0:25:40.210,0:25:45.160
at least in the US, that really knows[br]how to handle predictive algorithms
0:25:45.160,0:25:48.900
in determining what the analysis says.[br]And so there really needs to be
0:25:48.900,0:25:52.600
a lot more research and[br]thought put into this.
0:25:52.600,0:25:56.480
And one of the big things in order[br]for this to actually be useful:
0:25:56.480,0:26:01.590
if this is a tactic that has been used[br]by Police departments for decades,
0:26:01.590,0:26:04.420
we need to eliminate the bias in[br]the data sets. Because right now
0:26:04.420,0:26:09.090
all that it’s doing is facilitating and[br]continuing bias, set in the database.
0:26:09.090,0:26:12.610
And it’s incredibly difficult.[br]It’s data collected by humans.
0:26:12.610,0:26:17.780
And it causes initial selection bias.[br]Which is gonna have to stop
0:26:17.780,0:26:21.380
for it to be successful.
0:26:21.380,0:26:25.930
And perhaps these systems can cause[br]implicit bias or confirmation bias,
0:26:25.930,0:26:29.030
e.g. Police are going to believe[br]what they’ve been told.
0:26:29.030,0:26:33.170
So if a Police officer goes[br]on duty to an area
0:26:33.170,0:26:36.660
and an algorithm says: “You’re[br]70% likely to find a burglar
0:26:36.660,0:26:40.840
in this area”. Are they gonna find[br]a burglar because they’ve been told:
0:26:40.840,0:26:45.930
“You might find a burglar”?[br]And finally the US border.
0:26:45.930,0:26:49.800
There is no 4th Amendment[br]protection at the US border.
0:26:49.800,0:26:53.740
It’s an exception to the warrant[br]requirement. This means
0:26:53.740,0:26:58.740
no suspicion is needed to conduct[br]a search. So this data is gonna go into
0:26:58.740,0:27:03.680
a way to examine you when[br]you cross the border.
0:27:03.680,0:27:09.960
And aggregate data can be used to[br]refuse you entry into the US etc.
0:27:09.960,0:27:13.690
And I think that’s pretty much it.[br]And so a few minutes for questions.
0:27:13.690,0:27:24.490
applause[br]Thank you!
0:27:24.490,0:27:27.460
Herald: Thanks a lot for your talk,[br]Whitney. We have about 4 minutes left
0:27:27.460,0:27:31.800
for questions. So please line up at[br]the microphones and remember to
0:27:31.800,0:27:37.740
make short and easy questions.
0:27:37.740,0:27:42.060
Microphone No.2, please.
0:27:42.060,0:27:53.740
Question: Just a comment: if I want[br]to run a crime organization, like,
0:27:53.740,0:27:57.760
I would target the PRECOBS[br]here in Hamburg, maybe.
0:27:57.760,0:28:01.170
So I can take the crime to the scenes
0:28:01.170,0:28:05.700
where the PRECOBS doesn’t suspect.
0:28:05.700,0:28:08.940
Whitney: Possibly. And I think this is[br]a big problem in getting availability
0:28:08.940,0:28:13.410
of data; in that there’s a good argument[br]for Police departments to say:
0:28:13.410,0:28:16.590
“We don’t want to tell you what[br]our tactics are for Policing,
0:28:16.590,0:28:19.490
because it might move crime”.
0:28:19.490,0:28:23.130
Herald: Do we have questions from[br]the internet? Yes, then please,
0:28:23.130,0:28:26.580
one question from the internet.
0:28:26.580,0:28:29.770
Signal Angel: Is there evidence that data[br]like the use of encrypted messaging
0:28:29.770,0:28:35.710
systems, encrypted emails, VPN, TOR,[br]with automated requests to the ISP,
0:28:35.710,0:28:41.980
are used to obtain real names and[br]collected to contribute to the scoring?
0:28:41.980,0:28:45.580
Whitney: I’m not sure if that’s[br]being taken into account
0:28:45.580,0:28:49.530
by Predictive Policing algorithms,[br]or by the software being used.
0:28:49.530,0:28:55.160
I know that Police departments do[br]take those things into consideration.
0:28:55.160,0:29:00.630
And considering that in the US[br]Totality of the Circumstances is
0:29:00.630,0:29:04.980
how you evaluate suspicion. They are gonna[br]take all of those things into account
0:29:04.980,0:29:09.150
and they actually kind of[br]have to take them into account.
0:29:09.150,0:29:11.830
Herald: Okay, microphone No.1, please.
0:29:11.830,0:29:16.790
Question: In your example you mentioned[br]disease tracking, e.g. Google Flu Trends
0:29:16.790,0:29:21.870
is a good example of preventive Predictive[br]Policing. Are there any examples
0:29:21.870,0:29:27.630
where – instead of increasing Policing[br]in the lives of communities –
0:29:27.630,0:29:34.260
where sociologists or social workers[br]are called to use predictive tools,
0:29:34.260,0:29:36.210
instead of more criminalization?
0:29:36.210,0:29:41.360
Whitney: I’m not aware if that’s…[br]if Police departments are sending
0:29:41.360,0:29:45.250
social workers instead of Police officers.[br]But that wouldn’t surprise me because
0:29:45.250,0:29:50.060
algorithms are being used to suspect child[br]abuse. And in the US they’re gonna send
0:29:50.060,0:29:53.230
a social worker in that regard. So I would[br]not be surprised if that’s also being
0:29:53.230,0:29:56.890
considered. Since that’s[br]part of the resources.
0:29:56.890,0:29:59.030
Herald: OK, so if you have[br]a really short question, then
0:29:59.030,0:30:01.470
microphone No.2, please.[br]Last question.
0:30:01.470,0:30:08.440
Question: Okay, thank you for the[br]talk. This talk as well as a few others
0:30:08.440,0:30:13.710
brought the thought in the debate[br]about the fine-tuning that is required
0:30:13.710,0:30:19.790
between false positives and[br]preventing crimes or terror.
0:30:19.790,0:30:24.250
Now, it’s a different situation[br]if the Policeman is predicting,
0:30:24.250,0:30:28.350
or a system is predicting somebody’s[br]stealing a paper from someone;
0:30:28.350,0:30:32.230
or someone is creating a terror attack.
0:30:32.230,0:30:38.030
And the justification to prevent it
0:30:38.030,0:30:42.980
at the expense of false positives[br]is different in these cases.
0:30:42.980,0:30:49.080
How do you make sure that the decision[br]or the fine-tuning is not going to be
0:30:49.080,0:30:53.570
deep down in the algorithm[br]and by the programmers,
0:30:53.570,0:30:58.650
but rather by the customer[br]– the Policemen or the authorities?
0:30:58.650,0:31:02.720
Whitney: I can imagine that Police[br]officers are using common sense in that,
0:31:02.720,0:31:06.220
and their knowledge about the situation[br]and even what they’re being told
0:31:06.220,0:31:10.450
by the algorithm. You hope[br]that they’re gonna take…
0:31:10.450,0:31:13.790
they probably are gonna take[br]terrorism to a different level
0:31:13.790,0:31:17.260
than a common burglary or[br]a stealing of a piece of paper
0:31:17.260,0:31:21.760
or a non-violent crime.[br]And that fine-tuning
0:31:21.760,0:31:26.160
is probably on a Police department
0:31:26.160,0:31:29.390
by Police department basis.
0:31:29.390,0:31:32.090
Herald: Thank you! This was Whitney[br]Merrill, give a warm round of applause, please!!
0:31:32.090,0:31:40.490
Whitney: Thank you![br]applause
0:31:40.490,0:31:42.510
postroll music
0:31:42.510,0:31:51.501
Subtitles created by c3subtitles.de[br]in the year 2016. Join and help us!