1
00:00:00,310 --> 00:00:10,240
32c3 preroll music
2
00:00:10,240 --> 00:00:13,920
Herald: I introduce Whitney Merrill.
She is an attorney in the US
3
00:00:13,920 --> 00:00:17,259
and she just recently, actually
last week, graduated
4
00:00:17,259 --> 00:00:20,999
with her CS master’s in Illinois.
5
00:00:20,999 --> 00:00:27,299
applause
6
00:00:27,299 --> 00:00:30,249
Herald: Without further ado:
‘Predicting Crime In A Big Data World’.
7
00:00:30,249 --> 00:00:32,870
cautious applause
8
00:00:32,870 --> 00:00:36,920
Whitney Merrill: Hi everyone.
Thank you so much for coming.
9
00:00:36,920 --> 00:00:40,950
I know it’s been an exhausting Congress,
so I appreciate you guys coming
10
00:00:40,950 --> 00:00:45,300
to hear me talk about Big
Data and Crime Prediction.
11
00:00:45,300 --> 00:00:48,820
This is kind of a hobby of mine, I,
12
00:00:48,820 --> 00:00:53,030
in my last semester at Illinois,
decided to poke around
13
00:00:53,030 --> 00:00:56,850
what´s currently happening, how these
algorithms are being used and kind of
14
00:00:56,850 --> 00:01:00,390
figure out what kind of information can be
gathered. So, I have about 30 minutes
15
00:01:00,390 --> 00:01:04,629
with you guys. I´m gonna do a broad
overview of the types of programs.
16
00:01:04,629 --> 00:01:10,020
I´m gonna talk about what Predictive
Policing is, the data used,
17
00:01:10,020 --> 00:01:13,600
similar systems in other areas
where predictive algorithms are
18
00:01:13,600 --> 00:01:19,079
trying to better society,
current uses in policing.
19
00:01:19,079 --> 00:01:22,119
I´m gonna talk a little bit about their
effectiveness and then give you
20
00:01:22,119 --> 00:01:26,409
some final thoughts. So, imagine,
21
00:01:26,409 --> 00:01:30,310
in the very near future a Police
officer is walking down the street
22
00:01:30,310 --> 00:01:34,389
wearing a camera on her collar.
In her ear is a feed of information
23
00:01:34,389 --> 00:01:38,819
about the people and cars she passes
alerting her to individuals and cars
24
00:01:38,819 --> 00:01:43,259
that might fit a particular crime
or profile for a criminal.
25
00:01:43,259 --> 00:01:47,619
Early in the day she examined a
map highlighting hotspots for crime.
26
00:01:47,619 --> 00:01:52,459
In the area she´s been set to patrol
the predictive policing software
27
00:01:52,459 --> 00:01:57,590
indicates that there is an 82%
chance of burglary at 2 pm,
28
00:01:57,590 --> 00:02:01,539
and it´s currently 2:10 pm.
As she passes one individual
29
00:02:01,539 --> 00:02:05,549
her camera captures the
individual´s face, runs it through
30
00:02:05,549 --> 00:02:10,399
a coordinated Police database - all of the
Police departments that use this database
31
00:02:10,399 --> 00:02:14,680
are sharing information. Facial
recognition software indicates that
32
00:02:14,680 --> 00:02:19,580
the person is Bobby Burglar who was
previously convicted of burglary,
33
00:02:19,580 --> 00:02:24,790
was recently released and is currently
on parole. The voice in her ear whispers:
34
00:02:24,790 --> 00:02:29,970
50 percent likely to commit a crime.
Can she stop and search him?
35
00:02:29,970 --> 00:02:32,970
Should she chat him up?
Should she see how he acts?
36
00:02:32,970 --> 00:02:37,150
Does she need additional information
to stop and detain him?
37
00:02:37,150 --> 00:02:40,900
And does it matter that he´s
carrying a large duffle bag?
38
00:02:40,900 --> 00:02:45,579
Did the algorithm take this into account
or did it just look at his face?
39
00:02:45,579 --> 00:02:49,939
What information was being
collected at the time the algorithm
40
00:02:49,939 --> 00:02:55,259
chose to say 50% to provide
the final analysis?
41
00:02:55,259 --> 00:02:57,930
So, another thought I´m gonna
have you guys think about as I go
42
00:02:57,930 --> 00:03:01,540
through this presentation, is this
quote that is more favorable
43
00:03:01,540 --> 00:03:05,870
towards Police algorithms, which is:
“As people become data plots
44
00:03:05,870 --> 00:03:10,209
and probability scores, law enforcement
officials and politicians alike
45
00:03:10,209 --> 00:03:16,519
can point and say: ‘Technology is void of
the racist, profiling bias of humans.’”
46
00:03:16,519 --> 00:03:21,459
Is that true? Well, they probably
will point and say that,
47
00:03:21,459 --> 00:03:24,860
but is it actually void of
the racist, profiling bias of humans?
48
00:03:24,860 --> 00:03:27,849
And I´m gonna talk about that as well.
49
00:03:27,849 --> 00:03:32,759
So, Predictive Policing explained.
Who and what?
50
00:03:32,759 --> 00:03:35,620
First of all, Predictive Policing
actually isn´t new. All we´re doing
51
00:03:35,620 --> 00:03:41,469
is adding technology, doing better,
faster aggregation of data.
52
00:03:41,469 --> 00:03:47,200
Analysts in Police departments have been
doing this by hand for decades.
53
00:03:47,200 --> 00:03:50,950
These techniques are used to create
profiles that accurately match
54
00:03:50,950 --> 00:03:55,530
likely offenders with specific past
crimes. So, there´s individual targeting
55
00:03:55,530 --> 00:03:59,489
and then we have location-based
targeting. With location-based targeting,
56
00:03:59,489 --> 00:04:05,010
the goal is to help Police
forces deploy their resources
57
00:04:05,010 --> 00:04:10,230
in a correct manner, in an efficient
manner. They can be as simple
58
00:04:10,230 --> 00:04:13,950
as recommending that general crime
may happen in a particular area,
59
00:04:13,950 --> 00:04:19,108
or specifically, what type of crime will
happen in a one-block-radius.
60
00:04:19,108 --> 00:04:24,050
They take into account the time
of day, the recent data collected
61
00:04:24,050 --> 00:04:30,040
and when in the year it´s happening
as well as weather etc.
62
00:04:30,040 --> 00:04:33,850
So, another really quick thing worth
going over, cause not everyone
63
00:04:33,850 --> 00:04:39,090
is familiar with machine learning.
This is a very basic breakdown
64
00:04:39,090 --> 00:04:43,069
of training an algorithm on a data set.
65
00:04:43,069 --> 00:04:46,240
You collect it from many different
sources, you put it all together,
66
00:04:46,240 --> 00:04:51,019
you clean it up, you split it into 3 sets:
a training set, a validation set
67
00:04:51,019 --> 00:04:56,350
and a test set. The training set is
what is going to develop the rules
68
00:04:56,350 --> 00:05:01,379
in which it´s going to kind of
determine the final outcome.
69
00:05:01,379 --> 00:05:05,060
You´re gonna use a validation
set to optimize it and finally
70
00:05:05,060 --> 00:05:09,729
apply this to establish
a confidence level.
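A minimal sketch of the training/validation/test workflow she describes, assuming scikit-learn and a generic tabular dataset; the file name and column names below are invented for illustration, not taken from any real policing product.

```python
# Hypothetical sketch of the split-train-validate-test pipeline described above.
# "incidents.csv" and its columns are assumptions made purely for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("incidents.csv").dropna()                  # aggregate, then clean up
X, y = df.drop(columns=["crime_occurred"]), df["crime_occurred"]

# Split into training (60%), validation (20%) and test (20%) sets.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

model = RandomForestClassifier().fit(X_train, y_train)      # "develop the rules"
print("validation score:", model.score(X_val, y_val))       # optimize against this set
print("held-out test score:", model.score(X_test, y_test))  # final confidence check
```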
71
00:05:09,729 --> 00:05:15,349
There you´ll set a support level where
you say you need a certain amount of data
72
00:05:15,349 --> 00:05:19,940
to determine whether or not the
algorithm has enough information
73
00:05:19,940 --> 00:05:24,190
to kind of make a prediction.
So, rules with a low support level
74
00:05:24,190 --> 00:05:28,759
are less likely to be statistically
significant and the confidence level
75
00:05:28,759 --> 00:05:34,099
in the end is basically if there´s
an 85% confidence level
76
00:05:34,099 --> 00:05:39,930
that means there’s an 85% chance that a
suspect meeting the rule in question
77
00:05:39,930 --> 00:05:45,139
is engaged in criminal conduct.
So, what does this mean? Well,
78
00:05:45,139 --> 00:05:49,590
it encourages collection and hoarding
of data about crimes and individuals.
79
00:05:49,590 --> 00:05:52,720
Because you want as much information
as possible so that you detect
80
00:05:52,720 --> 00:05:56,030
even the less likely scenarios.
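As a rough illustration of the support and confidence thresholds described above, here is a toy computation for a single rule; the records, field names, and cut-offs are invented for illustration, not drawn from any actual system.

```python
# Toy illustration of "support" and "confidence" for one rule.
records = [
    {"prior_burglary": True,  "near_hotspot": True,  "crime": True},
    {"prior_burglary": True,  "near_hotspot": True,  "crime": False},
    {"prior_burglary": False, "near_hotspot": True,  "crime": False},
    {"prior_burglary": True,  "near_hotspot": False, "crime": True},
]

def rule(r):
    # Antecedent of the rule: prior burglary AND currently near a predicted hotspot
    return r["prior_burglary"] and r["near_hotspot"]

matches = [r for r in records if rule(r)]
support = len(matches) / len(records)        # how much of the data backs the rule
confidence = sum(r["crime"] for r in matches) / len(matches) if matches else 0.0

MIN_SUPPORT, MIN_CONFIDENCE = 0.25, 0.85     # assumed thresholds
if support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE:
    print(f"rule fires with {confidence:.0%} confidence")
else:
    print(f"rule too weak: support {support:.0%}, confidence {confidence:.0%}")
```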
81
00:05:56,030 --> 00:05:59,729
Information sharing is also
encouraged because it´s easier,
82
00:05:59,729 --> 00:06:04,090
it´s done by third parties, or even
what are called fourth parties
83
00:06:04,090 --> 00:06:07,860
and shared amongst departments.
And here again, your criminal data analysis
84
00:06:07,860 --> 00:06:10,810
was being done by analysts in Police
departments for decades, but
85
00:06:10,810 --> 00:06:13,660
the information sharing and the amount
of information they could aggregate
86
00:06:13,660 --> 00:06:17,169
was just significantly more difficult. So,
87
00:06:17,169 --> 00:06:21,410
what are these Predictive Policing
algorithms and software…
88
00:06:21,410 --> 00:06:24,580
what are they doing? Are they
determining guilt and innocence?
89
00:06:24,580 --> 00:06:29,289
And, unlike a thoughtcrime, they
are not saying this person is guilty,
90
00:06:29,289 --> 00:06:33,289
this person is innocent. It´s creating
a probability of whether or not
91
00:06:33,289 --> 00:06:37,800
the person has likely committed
a crime or will likely commit a crime.
92
00:06:37,800 --> 00:06:41,030
And it can only say something
about the future and the past.
93
00:06:41,030 --> 00:06:46,310
This here is a picture from
one particular piece of software
94
00:06:46,310 --> 00:06:50,199
provided by HunchLab; and patterns
emerge here from past crimes
95
00:06:50,199 --> 00:06:58,230
that can profile criminal types and
associations, detect crime patterns etc.
96
00:06:58,230 --> 00:07:02,139
Generally, in these types of algorithms
they are using unsupervised learning,
97
00:07:02,139 --> 00:07:05,479
that means someone is not going through
and saying true-false, good-bad, good-bad.
98
00:07:05,479 --> 00:07:10,780
There´s just 1) too much information and
2) they´re trying to do clustering,
99
00:07:10,780 --> 00:07:15,279
determine the things that are similar.
100
00:07:15,279 --> 00:07:20,110
So, really quickly, I´m also gonna
talk about the data that´s used.
101
00:07:20,110 --> 00:07:23,259
There are several different types:
Personal characteristics,
102
00:07:23,259 --> 00:07:28,169
demographic information, activities
of individuals, scientific data etc.
103
00:07:28,169 --> 00:07:32,690
This comes from all sorts of sources,
one that really shocked me,
104
00:07:32,690 --> 00:07:36,860
and I’ll talk about it a little bit more
later, is the radiation detectors
105
00:07:36,860 --> 00:07:41,310
on New York City Police, which are
constantly taking in data
106
00:07:41,310 --> 00:07:44,819
and are so sensitive they can detect if
you´ve had a recent medical treatment
107
00:07:44,819 --> 00:07:49,330
that involves radiation. Facial
recognition and biometrics
108
00:07:49,330 --> 00:07:52,860
are clear here and the third-party
doctrine – which basically says
109
00:07:52,860 --> 00:07:56,550
in the United States that you have no
reasonable expectation of privacy in data
110
00:07:56,550 --> 00:08:01,490
you share with third parties –
facilitates easy collection
111
00:08:01,490 --> 00:08:05,720
for Police officers and Government
officials because they can go
112
00:08:05,720 --> 00:08:11,080
and ask for the information
without any sort of warrant.
113
00:08:11,080 --> 00:08:16,259
For a really great overview: a friend of
mine, Dia, did a talk here at CCC
114
00:08:16,259 --> 00:08:21,280
on “The architecture of a street level
panopticon”. It gives a really great overview
115
00:08:21,280 --> 00:08:25,199
of how this type of data is collected
on the streets. Worth checking out
116
00:08:25,199 --> 00:08:29,490
´cause I´m gonna gloss over
kind of the types of data.
117
00:08:29,490 --> 00:08:33,450
There is in the United States what
they call the Multistate Anti-Terrorism
118
00:08:33,450 --> 00:08:38,279
Information Exchange Program which
uses everything from credit history,
119
00:08:38,279 --> 00:08:42,029
your concealed weapons permits,
aircraft pilot licenses,
120
00:08:42,029 --> 00:08:46,800
fishing licences etc. that´s searchable
and shared amongst Police departments
121
00:08:46,800 --> 00:08:50,530
and Government officials and this is just
more information. So, if they can collect
122
00:08:50,530 --> 00:08:57,690
it, they will aggregate it into a
database. So, what are the current uses?
123
00:08:57,690 --> 00:09:01,779
There are many, many different
companies currently
124
00:09:01,779 --> 00:09:04,950
making software and marketing
it to Police departments.
125
00:09:04,950 --> 00:09:08,470
All of them are slightly different, have
different features, but currently
126
00:09:08,470 --> 00:09:12,260
it´s a competition to get clients,
Police departments etc.
127
00:09:12,260 --> 00:09:15,829
The more Police departments you have
the more data sharing you can sell,
128
00:09:15,829 --> 00:09:21,089
saying: “Oh, by enrolling you’ll now have
x,y and z Police departments’ data
129
00:09:21,089 --> 00:09:27,040
to access” etc. These here
are Hitachi and HunchLab,
130
00:09:27,040 --> 00:09:31,350
they both are hotspot targeting,
it´s not individual targeting,
131
00:09:31,350 --> 00:09:35,140
those are a lot rarer. And it´s actually
being used in my home town,
132
00:09:35,140 --> 00:09:39,550
which I´ll talk about in a little bit.
Here, the appropriate tactics
133
00:09:39,550 --> 00:09:44,180
are automatically displayed for officers
when they´re entering mission areas.
134
00:09:44,180 --> 00:09:47,920
So HunchLab will tell an officer:
“Hey, you´re entering an area
135
00:09:47,920 --> 00:09:52,180
where there´s gonna be burglary that you
should keep an eye out, be aware”.
136
00:09:52,180 --> 00:09:58,010
And this is updating in real time and
they´re hoping it mitigates crime.
137
00:09:58,010 --> 00:10:01,240
Here are 2 other ones, the Domain
Awareness System was created
138
00:10:01,240 --> 00:10:05,139
in New York City after 9/11
in conjunction with Microsoft.
139
00:10:05,139 --> 00:10:10,000
New York City actually makes
money selling it to other cities
140
00:10:10,000 --> 00:10:16,470
to use this. CCTV camera feeds
are collected, they can…
141
00:10:16,470 --> 00:10:21,029
If they say there´s a man
wearing a red shirt,
142
00:10:21,029 --> 00:10:24,430
the software will look for people
wearing red shirts and at least
143
00:10:24,430 --> 00:10:28,139
alert Police departments to
people that meet this description
144
00:10:28,139 --> 00:10:34,389
walking in public in New York
City. The other one is by IBM
145
00:10:34,389 --> 00:10:40,139
and there are quite a few, you know, it´s
just generally another hotspot targeting,
146
00:10:40,139 --> 00:10:45,839
each have a few different features.
Worth mentioning, too, is the Heat List.
147
00:10:45,839 --> 00:10:50,769
This targeted individuals. I’m from the
city of Chicago. I grew up in the city.
148
00:10:50,769 --> 00:10:55,149
There are currently 420 names, when
this came out about a year ago,
149
00:10:55,149 --> 00:10:59,920
of individuals who are 500 times more
likely than average to be involved
150
00:10:59,920 --> 00:11:05,230
in violence. Individual names, passed
around to each Police officer in Chicago.
151
00:11:05,230 --> 00:11:10,029
They consider the rap sheet,
disturbance calls, social network etc.
152
00:11:10,029 --> 00:11:15,540
But one of the main things they considered
in placing mainly young black individuals
153
00:11:15,540 --> 00:11:19,279
on this list were known acquaintances
and their arrest histories.
154
00:11:19,279 --> 00:11:23,279
So if kids went to school or young
teenagers went to school
155
00:11:23,279 --> 00:11:27,880
with several people in a gang – and that
individual may not even be involved
156
00:11:27,880 --> 00:11:32,160
in a gang – they’re more likely to
appear on the list. The list has been
157
00:11:32,160 --> 00:11:36,829
heavily criticized for being racist,
for not giving these children
158
00:11:36,829 --> 00:11:40,660
or young individuals on the list
a chance to change their history
159
00:11:40,660 --> 00:11:44,510
because it’s being decided for them.
They’re being told: “You are likely
160
00:11:44,510 --> 00:11:49,850
to be a criminal, and we’re gonna
watch you”. Officers in Chicago
161
00:11:49,850 --> 00:11:53,550
visited these individuals and would do a
knock-and-announce: knock on the door
162
00:11:53,550 --> 00:11:58,029
and say: “Hi, I’m here, like just
checking up what are you up to”.
163
00:11:58,029 --> 00:12:02,480
Which you don’t need any special
suspicion to do. But it’s, you know,
164
00:12:02,480 --> 00:12:06,860
kind of a harassment that
might feed
165
00:12:06,860 --> 00:12:11,310
back into the data collected.
166
00:12:11,310 --> 00:12:15,209
This is PRECOBS. It’s currently
used here in Hamburg.
167
00:12:15,209 --> 00:12:19,100
They actually went to Chicago and
visited the Chicago Police Department
168
00:12:19,100 --> 00:12:24,170
to learn about Predictive Policing
tactics in Chicago to implement it
169
00:12:24,170 --> 00:12:29,729
throughout Germany, Hamburg and Berlin.
170
00:12:29,729 --> 00:12:33,620
It’s used to generally
forecast repeat-offenses.
171
00:12:33,620 --> 00:12:39,930
Again, when training data sets you need
enough data points to predict crime.
172
00:12:39,930 --> 00:12:43,699
So crimes that are less likely to
happen or happen very rarely:
173
00:12:43,699 --> 00:12:48,120
much harder to predict. Crimes that
aren’t reported: much harder to predict.
174
00:12:48,120 --> 00:12:52,480
So a lot of these software…
like pieces of software
175
00:12:52,480 --> 00:12:58,290
rely on algorithms that are hoping
that there’s a same sort of picture,
176
00:12:58,290 --> 00:13:03,070
that they can predict: where and when
and what type of crime will happen.
177
00:13:03,070 --> 00:13:06,890
PRECOBS is actually a play on the ‘precogs’
178
00:13:06,890 --> 00:13:11,240
– the movie ‘Minority Report’, if you’re
familiar with it, it’s the 3 psychics
179
00:13:11,240 --> 00:13:15,370
who predict crimes
before they happen.
180
00:13:15,370 --> 00:13:19,149
So there’re other, similar systems
in the world that are being used
181
00:13:19,149 --> 00:13:22,949
to predict whether or not
something will happen.
182
00:13:22,949 --> 00:13:27,360
The first one is ‘Disease and Diagnosis’.
They found that algorithms are actually
183
00:13:27,360 --> 00:13:33,810
more likely than doctors to correctly predict
what disease an individual has.
184
00:13:33,810 --> 00:13:39,480
It’s kind of shocking. The other is
‘Security Clearance’ in the US.
185
00:13:39,480 --> 00:13:44,240
It allows access to classified documents.
There’s no automatic access in the US.
186
00:13:44,240 --> 00:13:48,750
So every person who wants to see
some sort of secret cleared document
187
00:13:48,750 --> 00:13:53,089
must go through this process.
And it’s vetting individuals.
188
00:13:53,089 --> 00:13:56,690
So it’s an opt-in process. But here
they’re trying to predict who will
189
00:13:56,690 --> 00:14:00,550
disclose information, who will
break the clearance system;
190
00:14:00,550 --> 00:14:05,810
and predict there… Here, the error rate,
they’re probably much more comfortable
191
00:14:05,810 --> 00:14:09,360
with a high error rate. Because they
have so many people competing
192
00:14:09,360 --> 00:14:13,699
for a particular job, to get
clearance, that if they’re wrong,
193
00:14:13,699 --> 00:14:18,000
that somebody probably won’t disclose
information, they don’t care,
194
00:14:18,000 --> 00:14:22,319
they’d just rather eliminate
them than take the risk.
195
00:14:22,319 --> 00:14:27,509
So I’m an attorney in the US. I have
this urge to talk about US law.
196
00:14:27,509 --> 00:14:32,089
It also seems to impact a lot
of people internationally.
197
00:14:32,089 --> 00:14:36,360
Here we’re talking about the targeting
of individuals, not hotspots.
198
00:14:36,360 --> 00:14:40,810
So targeting of individuals is
not as widespread, currently.
199
00:14:40,810 --> 00:14:45,579
However it’s happening in Chicago;
200
00:14:45,579 --> 00:14:49,259
and other cities are considering
implementing programs and there are grants
201
00:14:49,259 --> 00:14:53,730
right now to encourage
Police departments
202
00:14:53,730 --> 00:14:57,110
to figure out target lists.
203
00:14:57,110 --> 00:15:00,699
So in the US suspicion is based on
the totality of the circumstances.
204
00:15:00,699 --> 00:15:04,730
That’s the whole picture. The Police
officer, the individual must look
205
00:15:04,730 --> 00:15:08,269
at the whole picture of what’s happening
before they can detain an individual.
206
00:15:08,269 --> 00:15:11,920
It’s supposed to be a balanced
assessment of relative weights, meaning
207
00:15:11,920 --> 00:15:16,399
– you know – if you know that the
person is a pastor, maybe them
208
00:15:16,399 --> 00:15:21,720
pacing in front of a liquor
store is not as suspicious
209
00:15:21,720 --> 00:15:26,370
as somebody who’s been convicted
of 3 burglaries. It has to be ‘based
210
00:15:26,370 --> 00:15:31,430
on specific and articulable facts’. And
the Police officers can use experience
211
00:15:31,430 --> 00:15:37,470
and common sense to determine
whether or not their suspicion…
212
00:15:37,470 --> 00:15:42,920
Large amounts of networked data generally
can provide individualized suspicion.
213
00:15:42,920 --> 00:15:48,410
The principal components here… the
events leading up to the stop-and-search
214
00:15:48,410 --> 00:15:52,319
– what is the person doing right before
they’re detained as well as the use
215
00:15:52,319 --> 00:15:57,709
of historical facts known about that
individual, the crime, the area
216
00:15:57,709 --> 00:16:02,329
in which it’s happening etc.
So it can rely on both things.
217
00:16:02,329 --> 00:16:06,819
No court in the US has really put out
a percentage for what Probable Cause
218
00:16:06,819 --> 00:16:11,089
and Reasonable Suspicion mean. So ‘Probable
Cause’ is what you need to get a warrant
219
00:16:11,089 --> 00:16:14,639
to search and seize an individual.
‘Reasonable Suspicion’ is needed
220
00:16:14,639 --> 00:16:20,329
to do stop-and-frisk in the US – stop
an individual and question them.
221
00:16:20,329 --> 00:16:24,100
And this is a little bit different than
what they call ‘Consensual Encounters’,
222
00:16:24,100 --> 00:16:27,680
where a Police officer goes up to you and
chats you up. ‘Reasonable Suspicion’
223
00:16:27,680 --> 00:16:32,029
– you’re actually detained. But I had
a law professor who basically said:
224
00:16:32,029 --> 00:16:35,730
“30% to 45% seems like a really good number
225
00:16:35,730 --> 00:16:39,290
just to show how low it really is.” You
don’t even need to be 50% sure
226
00:16:39,290 --> 00:16:42,180
that somebody has committed a crime.
227
00:16:42,180 --> 00:16:47,459
So, officers can draw from their own
experience to determine ‘Probable Cause’.
228
00:16:47,459 --> 00:16:51,350
And the UK has a similar
‘Reasonable Suspicion’ standard
229
00:16:51,350 --> 00:16:55,010
which depends on the circumstances
of each case. So,
230
00:16:55,010 --> 00:16:58,819
I’m not as familiar with UK law but I
believe that even some of the analysis around
231
00:16:58,819 --> 00:17:03,480
‘Reasonable Suspicion’ is similar.
232
00:17:03,480 --> 00:17:07,339
Is this like a black box?
So, I threw this slide in
233
00:17:07,339 --> 00:17:10,960
for those who are interested
in comparing this to US law.
234
00:17:10,960 --> 00:17:15,280
Generally a dog sniff in the US
falls under a particular set
235
00:17:15,280 --> 00:17:20,140
of legal history which is: a
dog can go up, sniff for drugs,
236
00:17:20,140 --> 00:17:24,220
alert and that is completely okay.
237
00:17:24,220 --> 00:17:28,099
And the Police officers can use that
data to detain and further search
238
00:17:28,099 --> 00:17:33,520
an individual. So is an algorithm similar
to the dog which is kind of a black box?
239
00:17:33,520 --> 00:17:37,030
Information goes in, it’s processed,
information comes out and
240
00:17:37,030 --> 00:17:42,720
a prediction is made.
Police rely on the ‘Good Faith’
241
00:17:42,720 --> 00:17:48,780
in ‘Totality of the Circumstances’
to make their decision. So there’s
242
00:17:48,780 --> 00:17:53,970
really no… if they’re
relying on the algorithm
243
00:17:53,970 --> 00:17:57,230
and think in that situation that
everything’s okay we might reach
244
00:17:57,230 --> 00:18:01,980
a level of ‘Reasonable Suspicion’ where
the officer can now pat down
245
00:18:01,980 --> 00:18:08,470
the person he’s spotted on the street
or the algorithm has alerted to. So,
246
00:18:08,470 --> 00:18:13,220
the big question is, you know, “Could the
officer consult predictive software apps
247
00:18:13,220 --> 00:18:18,610
in any individual analysis? Could he
say: ‘60% likely to commit a crime’?”
248
00:18:18,610 --> 00:18:24,180
In my hypothetical: does that
mean that the officer can,
249
00:18:24,180 --> 00:18:29,160
without looking at anything
else, detain that individual?
250
00:18:29,160 --> 00:18:33,810
And the answer is “Probably not”. One:
predictive Policing algorithms just
251
00:18:33,810 --> 00:18:37,770
cannot take in the Totality of the
Circumstances. They have to be
252
00:18:37,770 --> 00:18:42,690
frequently updated, there are
things that are happening that
253
00:18:42,690 --> 00:18:46,060
the algorithm possibly could
not have taken into account.
254
00:18:46,060 --> 00:18:48,590
The problem here is
that the algorithm itself,
255
00:18:48,590 --> 00:18:51,780
the prediction itself becomes part
of Totality of the Circumstances,
256
00:18:51,780 --> 00:18:56,330
which I’m going to talk
about a little bit more later.
257
00:18:56,330 --> 00:19:00,660
But officers have to have Reasonable
Suspicion before the stop occurs.
258
00:19:00,660 --> 00:19:04,660
Retroactive justification
is not sufficient. So,
259
00:19:04,660 --> 00:19:08,790
the algorithm can’t just say:
“60% likely, you detain the individual
260
00:19:08,790 --> 00:19:12,130
and then figure out why you’ve
detained the person”. It has to be
261
00:19:12,130 --> 00:19:16,570
before the detention actually happens.
And the suspicion must relate
262
00:19:16,570 --> 00:19:19,990
to current criminal activity. The
person must be doing something
263
00:19:19,990 --> 00:19:24,700
to indicate criminal activity. Just
the fact that an algorithm says,
264
00:19:24,700 --> 00:19:29,440
based on these facts: “60%”,
or even without articulating
265
00:19:29,440 --> 00:19:33,890
why the algorithm has
chosen that, isn’t enough.
266
00:19:33,890 --> 00:19:38,380
Maybe you can see a gun-shaped
bulge in the pocket etc.
267
00:19:38,380 --> 00:19:43,160
So, effectiveness… the
Totality of the Circumstances,
268
00:19:43,160 --> 00:19:46,720
can the algorithms keep up?
Generally, probably not.
269
00:19:46,720 --> 00:19:50,560
Missing data, not capable of
processing this data in real time.
270
00:19:50,560 --> 00:19:54,820
There’s no idea… the
algorithm doesn’t know,
271
00:19:54,820 --> 00:19:58,950
and the Police officer probably
doesn’t know all of the facts.
272
00:19:58,950 --> 00:20:03,260
So the Police officer can take
the algorithm into consideration
273
00:20:03,260 --> 00:20:08,130
but the problem here is: Did the algorithm
know that the individual was active
274
00:20:08,130 --> 00:20:12,670
in the community, or was a politician, or
275
00:20:12,670 --> 00:20:17,450
was a personal friend of the officer
etc. It can’t just be relied upon.
276
00:20:17,450 --> 00:20:22,640
What if the algorithm did take into
account that the individual was a Pastor?
277
00:20:22,640 --> 00:20:26,180
Now that information is counted twice
and the balancing for the Totality
278
00:20:26,180 --> 00:20:34,320
of the Circumstances is off. Humans
here must be the final decider.
279
00:20:34,320 --> 00:20:38,040
What are the problems?
Well, there’s bad underlying data,
280
00:20:38,040 --> 00:20:41,970
there’s no transparency into
what kind of data is being used,
281
00:20:41,970 --> 00:20:45,720
how it was collected, how old it
is, how often it’s been updated,
282
00:20:45,720 --> 00:20:51,010
whether or not it’s been verified. There
could just be noise in the training data.
283
00:20:51,010 --> 00:20:57,240
Honestly, the data is biased. It was
collected by individuals in the US;
284
00:20:57,240 --> 00:21:01,020
generally there’ve been
several studies done that
285
00:21:01,020 --> 00:21:05,270
black, young individuals are
stopped more often than whites.
286
00:21:05,270 --> 00:21:09,800
And this is going to
cause a collection bias.
287
00:21:09,800 --> 00:21:14,550
It’s gonna be drastically disproportionate
to the makeup of the population of cities;
288
00:21:14,550 --> 00:21:19,440
and as more data has been collected on
minorities, refugees in poor neighborhoods
289
00:21:19,440 --> 00:21:23,640
it’s gonna feed back in and of course only
have data on those groups and provide
290
00:21:23,640 --> 00:21:26,410
feedback and say:
“More crime is likely to
291
00:21:26,410 --> 00:21:27,770
happen because that’s where the data
292
00:21:27,770 --> 00:21:32,250
was collected”. So, what’s
an acceptable error rate, well,
293
00:21:32,250 --> 00:21:37,500
depends on the burden of proof. Harm
is different for an opt-in system.
294
00:21:37,500 --> 00:21:40,840
You know, what’s my harm if I don’t
get clearance, or I don’t get the job;
295
00:21:40,840 --> 00:21:45,160
but I’m opting in, I’m asking to
be considered for employment.
296
00:21:45,160 --> 00:21:49,080
In the US, what’s an error? If you
search and find nothing, if you think
297
00:21:49,080 --> 00:21:53,630
you have Reasonable Suspicion
based on good faith,
298
00:21:53,630 --> 00:21:57,060
both on the algorithm and what
you witness, the US says that it’s
299
00:21:57,060 --> 00:22:00,620
no 4th Amendment violation,
even if nothing has happened.
300
00:22:00,620 --> 00:22:05,970
It’s a very low false-positive
error rate here.
301
00:22:05,970 --> 00:22:09,140
In Big Data, generally, and
machine-learning it’s great!
302
00:22:09,140 --> 00:22:13,550
Like 1% error is fantastic! But that’s
pretty large for the number of individuals
303
00:22:13,550 --> 00:22:17,930
stopped each day. Or who might
be subject to these algorithms.
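To make the scale of that point concrete, a back-of-the-envelope check; the stop volume is a purely hypothetical number, not a figure from the talk.

```python
# Hypothetical scale check: the stop count below is an assumption, not data from the talk.
stops_per_year = 250_000      # assumed number of algorithm-informed stops
false_positive_rate = 0.01    # the "1% error is fantastic" figure
print(int(stops_per_year * false_positive_rate), "people wrongly flagged per year")  # -> 2500
```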
304
00:22:17,930 --> 00:22:21,950
Because even though there’re only
400 individuals on the list in Chicago
305
00:22:21,950 --> 00:22:25,210
those individuals have been
listed basically as targets
306
00:22:25,210 --> 00:22:28,870
by the Chicago Police Department.
307
00:22:28,870 --> 00:22:33,700
Other problems include database errors.
Exclusion of evidence in the US
308
00:22:33,700 --> 00:22:37,170
only happens when there’s gross
negligence or systematic misconduct.
309
00:22:37,170 --> 00:22:42,150
That’s very difficult to prove, especially
when a lot of people view these algorithms
310
00:22:42,150 --> 00:22:47,360
as a black box. Data goes in,
predictions come out, everyone’s happy.
311
00:22:47,360 --> 00:22:53,100
You rely on and trust the
quality of IBM, HunchLab etc.
312
00:22:53,100 --> 00:22:56,730
to provide good software.
313
00:22:56,730 --> 00:23:01,000
Finally, some more concerns I have
include feedback loops, auditing
314
00:23:01,000 --> 00:23:04,810
and access to data and algorithms
and the prediction thresholds.
315
00:23:04,810 --> 00:23:09,970
How certain must a prediction be
– before it’s reported to the Police –
316
00:23:09,970 --> 00:23:13,230
that the person might commit a
crime. Or that crime might happen
317
00:23:13,230 --> 00:23:18,460
in the individual area. If Reasonable
Suspicion is as low as 35%,
318
00:23:18,460 --> 00:23:23,740
and Reasonable Suspicion in the US has
been held at: That guy drives a car
319
00:23:23,740 --> 00:23:28,350
that drug dealers like to drive,
and he’s in the DEA database
320
00:23:28,350 --> 00:23:36,550
as a possible drug dealer. That was
enough to stop and search him.
321
00:23:36,550 --> 00:23:40,090
So, are there Positives? Well, PredPol,
322
00:23:40,090 --> 00:23:44,800
which is one of the services that
provides Predictive Policing software,
323
00:23:44,800 --> 00:23:49,650
says: “Since these cities have
implemented it, there’s been dropping crime”.
324
00:23:49,650 --> 00:23:54,030
In L.A. 13% reduction in
crime, in one division.
325
00:23:54,030 --> 00:23:57,510
There was even one day where
they had no crime reported.
326
00:23:57,510 --> 00:24:04,550
Santa Cruz – 25 to 29% reduction,
9% fewer assaults etc.
327
00:24:04,550 --> 00:24:10,030
One: these are Police departments
self-reporting these successes, so…
328
00:24:10,030 --> 00:24:14,670
you know, take it for what it is –
and it’s reiterated by the people
329
00:24:14,670 --> 00:24:20,510
selling the software. But perhaps
it is actually reducing crime.
330
00:24:20,510 --> 00:24:24,390
It’s kind of hard to tell because
there’s a feedback loop.
331
00:24:24,390 --> 00:24:29,200
Do we know that crime is really being
reduced? Will it affect the data
332
00:24:29,200 --> 00:24:33,170
that is collected in the future? It’s
really hard to know. Because
333
00:24:33,170 --> 00:24:38,330
if you send the Police officers into
a community it’s more likely
334
00:24:38,330 --> 00:24:42,580
that they’re going to affect that
community and that data collection.
335
00:24:42,580 --> 00:24:46,940
Will more crimes happen because they
feel like the Police are harassing them?
336
00:24:46,940 --> 00:24:52,020
It’s very likely and it’s a problem here.
337
00:24:52,020 --> 00:24:56,930
So, some final thoughts. Predictive
Policing programs are not going anywhere.
338
00:24:56,930 --> 00:25:01,430
They’re only in their infancy.
339
00:25:01,430 --> 00:25:06,030
And I think that more analysis, more
transparency, more access to data
340
00:25:06,030 --> 00:25:10,560
needs to happen around these algorithms.
There needs to be regulation.
341
00:25:10,560 --> 00:25:16,000
Currently, a very successful way in which
342
00:25:16,000 --> 00:25:19,310
these companies get data is they
buy from Third Party sources
343
00:25:19,310 --> 00:25:24,590
and then sell it to Police departments. So
perhaps PredPol might get information
344
00:25:24,590 --> 00:25:28,780
from Google, Facebook, Social Media
accounts; aggregate data themselves,
345
00:25:28,780 --> 00:25:31,890
and then turn around and sell it to
Police departments or provide access
346
00:25:31,890 --> 00:25:36,110
to Police departments. And generally, the
Courts are gonna have to begin to work out
347
00:25:36,110 --> 00:25:40,210
how to handle this type of data.
There’s no case law,
348
00:25:40,210 --> 00:25:45,160
at least in the US, that really knows
how to handle predictive algorithms
349
00:25:45,160 --> 00:25:48,900
in determining what the analysis says.
And so there really needs to be
350
00:25:48,900 --> 00:25:52,600
a lot more research and
thought put into this.
351
00:25:52,600 --> 00:25:56,480
And one of the big things in order
for this to actually be useful:
352
00:25:56,480 --> 00:26:01,590
since this is a tactic that has been used
by Police departments for decades,
353
00:26:01,590 --> 00:26:04,420
we need to eliminate the bias in
the data sets. Because right now
354
00:26:04,420 --> 00:26:09,090
all that it’s doing is facilitating and
continuing the bias set in the database.
355
00:26:09,090 --> 00:26:12,610
And it’s incredibly difficult.
It’s data collected by humans.
356
00:26:12,610 --> 00:26:17,780
And it causes initial selection bias.
Which is gonna have to stop
357
00:26:17,780 --> 00:26:21,380
for it to be successful.
358
00:26:21,380 --> 00:26:25,930
And perhaps these systems can cause
implicit bias or confirmation bias,
359
00:26:25,930 --> 00:26:29,030
e.g. Police are going to believe
what they’ve been told.
360
00:26:29,030 --> 00:26:33,170
So if a Police officer goes
on duty to an area
361
00:26:33,170 --> 00:26:36,660
and an algorithm says: “You’re
70% likely to find a burglar
362
00:26:36,660 --> 00:26:40,840
in this area”. Are they gonna find
a burglar because they’ve been told:
363
00:26:40,840 --> 00:26:45,930
“You might find a burglar”?
And finally the US border.
364
00:26:45,930 --> 00:26:49,800
There is no 4th Amendment
protection at the US border.
365
00:26:49,800 --> 00:26:53,740
It’s an exception to the warrant
requirement. This means
366
00:26:53,740 --> 00:26:58,740
no suspicion is needed to conduct
a search. So this data is gonna feed into
367
00:26:58,740 --> 00:27:03,680
the way they examine you when
you cross the border.
368
00:27:03,680 --> 00:27:09,960
And aggregate data can be used to
refuse you entry into the US etc.
369
00:27:09,960 --> 00:27:13,690
And I think that’s pretty much it.
And so a few minutes for questions.
370
00:27:13,690 --> 00:27:24,490
applause
Thank you!
371
00:27:24,490 --> 00:27:27,460
Herald: Thanks a lot for your talk,
Whitney. We have about 4 minutes left
372
00:27:27,460 --> 00:27:31,800
for questions. So please line up at
the microphones and remember to
373
00:27:31,800 --> 00:27:37,740
make short and easy questions.
374
00:27:37,740 --> 00:27:42,060
Microphone No.2, please.
375
00:27:42,060 --> 00:27:53,740
Question: Just a comment: if I want
to run a crime organization, like,
376
00:27:53,740 --> 00:27:57,760
I would target the PRECOBS
here in Hamburg, maybe.
377
00:27:57,760 --> 00:28:01,170
So I can take the crime to the scenes
378
00:28:01,170 --> 00:28:05,700
where PRECOBS doesn’t suspect it.
379
00:28:05,700 --> 00:28:08,940
Whitney: Possibly. And I think this is
a big problem in getting availability
380
00:28:08,940 --> 00:28:13,410
of data; in that there’s a good argument
for Police departments to say:
381
00:28:13,410 --> 00:28:16,590
“We don’t want to tell you what
our tactics are for Policing,
382
00:28:16,590 --> 00:28:19,490
because it might move crime”.
383
00:28:19,490 --> 00:28:23,130
Herald: Do we have questions from
the internet? Yes, then please,
384
00:28:23,130 --> 00:28:26,580
one question from the internet.
385
00:28:26,580 --> 00:28:29,770
Signal Angel: Is there evidence that data
like the use of encrypted messaging
386
00:28:29,770 --> 00:28:35,710
systems, encrypted emails, VPN, TOR,
with automated requests to the ISP,
387
00:28:35,710 --> 00:28:41,980
are used to obtain real names and
collected to contribute to the scoring?
388
00:28:41,980 --> 00:28:45,580
Whitney: I’m not sure if that’s
being taken into account
389
00:28:45,580 --> 00:28:49,530
by Predictive Policing algorithms,
or by the software being used.
390
00:28:49,530 --> 00:28:55,160
I know that Police departments do
take those things into consideration.
391
00:28:55,160 --> 00:29:00,630
And considering that in the US
Totality of the Circumstances is
392
00:29:00,630 --> 00:29:04,980
how you evaluate suspicion. They are gonna
take all of those things into account
393
00:29:04,980 --> 00:29:09,150
and they actually kind of
have to take them into account.
394
00:29:09,150 --> 00:29:11,830
Herald: Okay, microphone No.1, please.
395
00:29:11,830 --> 00:29:16,790
Question: In your example you mentioned
disease tracking, e.g. Google Flu Trends
396
00:29:16,790 --> 00:29:21,870
is a good example of preventive Predictive
Policing. Are there any examples
397
00:29:21,870 --> 00:29:27,630
where – instead of increasing Policing
in the lives of communities –
398
00:29:27,630 --> 00:29:34,260
where sociologists or social workers
are called to use predictive tools,
399
00:29:34,260 --> 00:29:36,210
instead of more criminalization?
400
00:29:36,210 --> 00:29:41,360
Whitney: I’m not aware if that’s…
if Police departments are sending
401
00:29:41,360 --> 00:29:45,250
social workers instead of Police officers.
But that wouldn’t surprise me because
402
00:29:45,250 --> 00:29:50,060
algorithms are being used to flag suspected child
abuse. And in the US they’re gonna send
403
00:29:50,060 --> 00:29:53,230
a social worker in that regard. So I would
not be surprised if that’s also being
404
00:29:53,230 --> 00:29:56,890
considered. Since that’s
part of the resources.
405
00:29:56,890 --> 00:29:59,030
Herald: OK, so if you have
a really short question, then
406
00:29:59,030 --> 00:30:01,470
microphone No.2, please.
Last question.
407
00:30:01,470 --> 00:30:08,440
Question: Okay, thank you for the
talk. This talk as well as a few others
408
00:30:08,440 --> 00:30:13,710
brought up the thought in the debate
about the fine-tuning that is required
409
00:30:13,710 --> 00:30:19,790
between false positives and
preventing crimes or terror.
410
00:30:19,790 --> 00:30:24,250
Now, it’s a different situation
if the Policeman is predicting,
411
00:30:24,250 --> 00:30:28,350
or a system is predicting somebody’s
stealing a paper from someone;
412
00:30:28,350 --> 00:30:32,230
or someone is planning a terror attack.
413
00:30:32,230 --> 00:30:38,030
And the justification to prevent it
414
00:30:38,030 --> 00:30:42,980
at the expense of false positives
is different in these cases.
415
00:30:42,980 --> 00:30:49,080
How do you make sure that the decision
or the fine-tuning is not going to be
416
00:30:49,080 --> 00:30:53,570
deep down in the algorithm
and by the programmers,
417
00:30:53,570 --> 00:30:58,650
but rather by the customer
– the Policemen or the authorities?
418
00:30:58,650 --> 00:31:02,720
Whitney: I can imagine that Police
officers are using common sense in that,
419
00:31:02,720 --> 00:31:06,220
and their knowledge about the situation
and even what they’re being told
420
00:31:06,220 --> 00:31:10,450
by the algorithm. You hope
that they’re gonna take…
421
00:31:10,450 --> 00:31:13,790
they probably are gonna take
terrorism to a different level
422
00:31:13,790 --> 00:31:17,260
than a common burglary or
a stealing of a piece of paper
423
00:31:17,260 --> 00:31:21,760
or a non-violent crime.
And that fine-tuning
424
00:31:21,760 --> 00:31:26,160
is probably on a Police department
425
00:31:26,160 --> 00:31:29,390
by Police department basis.
426
00:31:29,390 --> 00:31:32,090
Herald: Thank you! This was Whitney
Merrill, give a warm round of applause, please!!
427
00:31:32,090 --> 00:31:40,490
Whitney: Thank you!
applause
428
00:31:40,490 --> 00:31:42,510
postroll music
429
00:31:42,510 --> 00:31:51,501
Subtitles created by c3subtitles.de
in the year 2016. Join and help us!