WEBVTT
00:00:00.310 --> 00:00:10.240
32c3 preroll music
00:00:10.240 --> 00:00:13.920
Angel: I introduce Whitney Merrill.
She is an attorney in the US
00:00:13.920 --> 00:00:17.259
and she just recently, actually
last week, graduated
00:00:17.259 --> 00:00:20.999
with her CS master’s in Illinois.
00:00:20.999 --> 00:00:27.299
applause
00:00:27.299 --> 00:00:30.249
Angel: Without further ado:
‘Predicting Crime In A Big Data World’.
00:00:30.249 --> 00:00:32.870
cautious applause
00:00:32.870 --> 00:00:36.920
Whitney Merrill: Hi everyone.
Thank you so much for coming.
00:00:36.920 --> 00:00:40.950
I know it’s been an exhausting Congress,
so I appreciate you guys coming
00:00:40.950 --> 00:00:45.300
to hear me talk about Big
Data and Crime Prediction.
00:00:45.300 --> 00:00:48.820
This is kind of a hobby of mine, I,
00:00:48.820 --> 00:00:53.030
in my last semester at Illinois,
decided to poke around
00:00:53.030 --> 00:00:56.850
what’s currently happening, how these
algorithms are being used and kind of
00:00:56.850 --> 00:01:00.390
figure out what kind of information can be
gathered. So, I have about 30 minutes
00:01:00.390 --> 00:01:04.629
with you guys. I’m gonna do a broad
overview of the types of programs.
00:01:04.629 --> 00:01:10.020
I’m gonna talk about what Predictive
Policing is, the data used,
00:01:10.020 --> 00:01:13.600
similar systems in other areas
where predictive algorithms are
00:01:13.600 --> 00:01:19.079
trying to better society,
current uses in policing.
00:01:19.079 --> 00:01:22.119
I’m gonna talk a little bit about their
effectiveness and then give you
00:01:22.119 --> 00:01:26.409
some final thoughts. So, imagine,
00:01:26.409 --> 00:01:30.310
in the very near future a Police
officer is walking down the street
00:01:30.310 --> 00:01:34.389
wearing a camera on her collar.
In her ear is a feed of information
00:01:34.389 --> 00:01:38.819
about the people and cars she passes
alerting her to individuals and cars
00:01:38.819 --> 00:01:43.259
that might fit a particular crime
or profile for a criminal.
00:01:43.259 --> 00:01:47.619
Early in the day she examined a
map highlighting hotspots for crime.
00:01:47.619 --> 00:01:52.459
In the area she’s been set to patrol
the predictive policing software
00:01:52.459 --> 00:01:57.590
indicates that there is an 82%
chance of burglary at 2 pm,
00:01:57.590 --> 00:02:01.539
and it’s currently 2:10 pm.
As she passes one individual
00:02:01.539 --> 00:02:05.549
her camera captures the
individual’s face, runs it through
00:02:05.549 --> 00:02:10.399
a coordinated Police database - all of the
Police departments that use this database
00:02:10.399 --> 00:02:14.680
are sharing information. Facial
recognition software indicates that
00:02:14.680 --> 00:02:19.580
the person is Bobby Burglar who was
previously convicted of burglary,
00:02:19.580 --> 00:02:24.790
was recently released and is now
on parole. The voice in her ear whispers:
00:02:24.790 --> 00:02:29.970
50 percent likely to commit a crime.
Can she stop and search him?
00:02:29.970 --> 00:02:32.970
Should she chat him up?
Should she see how he acts?
00:02:32.970 --> 00:02:37.150
Does she need additional information
to stop and detain him?
00:02:37.150 --> 00:02:40.900
And does it matter that he’s
carrying a large duffle bag?
00:02:40.900 --> 00:02:45.579
Did the algorithm take this into account
or did it just look at his face?
00:02:45.579 --> 00:02:49.939
What information was being
collected at the time the algorithm
00:02:49.939 --> 00:02:55.259
chose to say 50%, to provide
the final analysis?
00:02:55.259 --> 00:02:57.930
So, another thought I’m gonna
have you guys think about as I go
00:02:57.930 --> 00:03:01.540
through this presentation, is this
quote that is more favorable
00:03:01.540 --> 00:03:05.870
towards Police algorithms, which is:
“As people become data plots
00:03:05.870 --> 00:03:10.209
and probability scores, law enforcement
officials and politicians alike
00:03:10.209 --> 00:03:16.519
can point and say: ‘Technology is void of
the racist, profiling bias of humans.’”
00:03:16.519 --> 00:03:21.459
Is that true? Well, they probably
will point and say that,
00:03:21.459 --> 00:03:24.860
but is it actually void of the
racist, profiling bias of humans?
00:03:24.860 --> 00:03:27.849
And I’m gonna talk about that as well.
00:03:27.849 --> 00:03:32.759
So, Predictive Policing explained.
Who and what?
00:03:32.759 --> 00:03:35.620
First of all, Predictive Policing
actually isn’t new. All we’re doing
00:03:35.620 --> 00:03:41.469
is adding technology, doing better,
faster aggregation of data.
00:03:41.469 --> 00:03:47.200
Analysts in Police departments have been
doing this by hand for decades.
00:03:47.200 --> 00:03:50.950
These techniques are used to create
profiles that accurately match
00:03:50.950 --> 00:03:55.530
likely offenders with specific past
crimes. So, there’s individual targeting
00:03:55.530 --> 00:03:59.489
and then we have location-based
targeting. With location-based targeting,
00:03:59.489 --> 00:04:05.010
the goal is to help Police
forces deploy their resources
00:04:05.010 --> 00:04:10.230
in a correct manner, in an efficient
manner. They can be as simple
00:04:10.230 --> 00:04:13.950
as recommending that general crime
may happen in a particular area,
00:04:13.950 --> 00:04:19.108
or specifically, what type of crime will
happen in a one-block radius.
00:04:19.108 --> 00:04:24.050
They take into account the time
of day, the recent data collected
00:04:24.050 --> 00:04:30.040
and when in the year it’s happening
as well as weather etc.
00:04:30.040 --> 00:04:33.850
So, another really quick thing worth
going over, ’cause not everyone
00:04:33.850 --> 00:04:39.090
is familiar with machine learning.
This is a very basic breakdown
00:04:39.090 --> 00:04:43.069
of training an algorithm on a data set.
00:04:43.069 --> 00:04:46.240
You collect it from many different
sources, you put it all together,
00:04:46.240 --> 00:04:51.019
you clean it up, you split it into 3 sets:
a training set, a validation set
00:04:51.019 --> 00:04:56.350
and a test set. The training set is
what is going to develop the rules
00:04:56.350 --> 00:05:01.379
by which it’s going to kind of
determine the final outcome.
00:05:01.379 --> 00:05:05.060
You’re gonna use a validation
set to optimize it and finally
00:05:05.060 --> 00:05:09.729
apply this to establish
a confidence level.
00:05:09.729 --> 00:05:15.349
There you’ll set a support level where
you say you need a certain amount of data
00:05:15.349 --> 00:05:19.940
to determine whether or not the
algorithm has enough information
00:05:19.940 --> 00:05:24.190
to kind of make a prediction.
So, rules with a low support level
00:05:24.190 --> 00:05:28.759
are less likely to be statistically
significant and the confidence level
00:05:28.759 --> 00:05:34.099
in the end is basically: if there’s
an 85% confidence level,
00:05:34.099 --> 00:05:39.930
that means there’s an 85% chance that the
suspect meeting the rule in question
00:05:39.930 --> 00:05:45.139
is engaged in criminal conduct.
So, what does this mean? Well,
00:05:45.139 --> 00:05:49.590
it encourages collection and hoarding
of data about crimes and individuals.
00:05:49.590 --> 00:05:52.720
Because you want as much information
as possible so that you detect
00:05:52.720 --> 00:05:56.030
even the less likely scenarios.
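A minimal sketch in Python of the train/validation/test workflow and the support/confidence idea described above, assuming scikit-learn conventions; all data, features, rules and thresholds here are hypothetical, not from any real policing system:

```python
# Minimal sketch (not from the talk) of the train/validation/test workflow
# and the support/confidence idea described above. All data, features and
# thresholds are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))                    # e.g. time of day, weather, recent reports
y = (X[:, 0] + X[:, 1] > 1.2).astype(int)    # stand-in label: "crime occurred"

# Split into the three sets she mentions: training / validation / test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# The training set develops the rules; the validation set optimizes them.
best_model, best_score = None, -1.0
for depth in (2, 4, 8):
    model = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_model, best_score = model, score

# The held-out test set establishes the final confidence level.
print(f"test accuracy: {best_model.score(X_test, y_test):.2f}")

# Association-rule view of support and confidence: "support" is how much of
# the data matches a rule; "confidence" is how often matching records were
# actually labeled as crime. Rules with low support get filtered out.
rule = X[:, 0] > 0.8                          # hypothetical rule
print(f"support {rule.mean():.1%}, confidence {y[rule].mean():.1%}")
```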
00:05:56.030 --> 00:05:59.729
Information sharing is also
encouraged because it’s easier,
00:05:59.729 --> 00:06:04.090
it’s done by third parties, or even
what are called fourth parties
00:06:04.090 --> 00:06:07.860
and shared amongst departments.
And again, this criminal data analysis
00:06:07.860 --> 00:06:10.810
was being done by analysts in Police
departments for decades, but
00:06:10.810 --> 00:06:13.660
the information sharing and the amount
of information they could aggregate
00:06:13.660 --> 00:06:17.169
was just significantly more difficult. So,
00:06:17.169 --> 00:06:21.410
what are these Predictive Policing
algorithms and software…
00:06:21.410 --> 00:06:24.580
what are they doing? Are they
determining guilt and innocence?
00:06:24.580 --> 00:06:29.289
And, unlike a thoughtcrime, they
are not saying this person is guilty,
00:06:29.289 --> 00:06:33.289
this person is innocent. It’s creating
a probability of whether or not
00:06:33.289 --> 00:06:37.800
the person has likely committed
a crime or will likely commit a crime.
00:06:37.800 --> 00:06:41.030
And it can only say something
about the future and the past.
00:06:41.030 --> 00:06:46.310
This here is a picture from
one particular piece of software
00:06:46.310 --> 00:06:50.199
provided by HunchLab; and patterns
emerge here from past crimes
00:06:50.199 --> 00:06:58.230
that can profile criminal types and
associations, detect crime patterns etc.
00:06:58.230 --> 00:07:02.139
Generally, these types of algorithms
are using unsupervised data,
00:07:02.139 --> 00:07:05.479
that means someone is not going through
and saying true-false, good-bad, good-bad.
00:07:05.479 --> 00:07:10.780
There’s just 1) too much information and
2) they’re trying to do clustering,
00:07:10.780 --> 00:07:15.279
determine the things that are similar.
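As an aside, a minimal sketch of that clustering idea, assuming scikit-learn’s KMeans; the incident data here is random and purely illustrative:

```python
# Minimal sketch of unsupervised clustering: no true/false labels, the
# algorithm just groups similar records. Data is random, illustrative only.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
incidents = rng.random((500, 2))     # e.g. normalized location coordinates
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(incidents)
print(np.bincount(labels))           # size of each cluster of similar incidents
```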
00:07:15.279 --> 00:07:20.110
So, really quickly, I’m also gonna
talk about the data that’s used.
00:07:20.110 --> 00:07:23.259
There are several different types:
Personal characteristics,
00:07:23.259 --> 00:07:28.169
demographic information, activities
of individuals, scientific data etc.
00:07:28.169 --> 00:07:32.690
This comes from all sorts of sources.
One that really shocked me
00:07:32.690 --> 00:07:36.860
– and I’ll talk about it a little bit
more later – is the radiation detectors
00:07:36.860 --> 00:07:41.310
on New York City Police, which are
constantly taking in data
00:07:41.310 --> 00:07:44.819
and are so sensitive they can detect if
you’ve had a recent medical treatment
00:07:44.819 --> 00:07:49.330
that involves radiation. Facial
recognition and biometrics
00:07:49.330 --> 00:07:52.860
are clear here and the third-party
doctrine – which basically says
00:07:52.860 --> 00:07:56.550
in the United States that you have no
reasonable expectation of privacy in data
00:07:56.550 --> 00:08:01.490
you share with third parties –
facilitates easy collection
00:08:01.490 --> 00:08:05.720
for Police officers and Government
officials because they can go
00:08:05.720 --> 00:08:11.080
and ask for the information
without any sort of warrant.
00:08:11.080 --> 00:08:16.259
For a really great overview: a friend of
mine, Dia, did a talk here at CCC
00:08:16.259 --> 00:08:21.280
on “The architecture of a street level
panopticon”, which does a really great job
00:08:21.280 --> 00:08:25.199
of showing how this type of data is collected
on the streets. Worth checking out
00:08:25.199 --> 00:08:29.490
’cause I’m gonna gloss over
kind of the types of data.
00:08:29.490 --> 00:08:33.450
There is in the United States what
they call the Multistate Anti-Terrorism
00:08:33.450 --> 00:08:38.279
Information Exchange Program, which
uses everything from credit history,
00:08:38.279 --> 00:08:42.029
your concealed weapons permits,
aircraft pilot licenses,
00:08:42.029 --> 00:08:46.800
fishing licenses etc. that’s searchable
and shared amongst Police departments
00:08:46.800 --> 00:08:50.530
and Government officials, and this is just
more information. So, if they can collect
00:08:50.530 --> 00:08:57.690
it, they will aggregate it into a
database. So, what are the current uses?
00:08:57.690 --> 00:09:01.779
There are many, many different
companies currently
00:09:01.779 --> 00:09:04.950
making software and marketing
it to Police departments.
00:09:04.950 --> 00:09:08.470
All of them are slightly different, have
different features, but currently
00:09:08.470 --> 00:09:12.260
it’s a competition to get clients,
Police departments etc.
00:09:12.260 --> 00:09:15.829
The more Police departments you have
the more data sharing you can sell,
00:09:15.829 --> 00:09:21.089
saying: “Oh, by enrolling you’ll now have
x,y and z Police departments’ data
00:09:21.089 --> 00:09:27.040
to access” etc. These here
are Hitachi and HunchLab,
00:09:27.040 --> 00:09:31.350
they both are hotspot targeting,
not individual targeting –
00:09:31.350 --> 00:09:35.140
those are a lot rarer. And it’s actually
being used in my hometown,
00:09:35.140 --> 00:09:39.550
which I’ll talk about in a little bit.
Here, the appropriate tactics
00:09:39.550 --> 00:09:44.180
are automatically displayed for officers
when they´re entering mission areas.
00:09:44.180 --> 00:09:47.920
So HunchLab will tell an officer:
“Hey, you’re entering an area
00:09:47.920 --> 00:09:52.180
where there’s gonna be burglary, so you
should keep an eye out, be aware”.
00:09:52.180 --> 00:09:58.010
And this is updating in real time and
they’re hoping it mitigates crime.
00:09:58.010 --> 00:10:01.240
Here are 2 other ones, the Domain
Awareness System was created
00:10:01.240 --> 00:10:05.139
in New York City after 9/11
in conjunction with Microsoft.
00:10:05.139 --> 00:10:10.000
New York City actually makes
money selling it to other cities
00:10:05.139 --> 00:10:10.000
to use. CCTV camera feeds
are collected; they can…
00:10:16.470 --> 00:10:21.029
If they say there’s a man
wearing a red shirt,
00:10:21.029 --> 00:10:24.430
the software will look for people
wearing red shirts and at least
00:10:24.430 --> 00:10:28.139
alert Police departments to
people that meet this description
00:10:28.139 --> 00:10:34.389
walking in public in New York
City. The other one is by IBM
00:10:34.389 --> 00:10:40.139
and there are quite a few, you know, it’s
just generally another hotspot targeting,
00:10:40.139 --> 00:10:45.839
each have a few different features.
Worth mentioning, too, is the Heat List.
00:10:45.839 --> 00:10:50.769
This targeted individuals. I’m from the
city of Chicago. I grew up in the city.
00:10:50.769 --> 00:10:55.149
There were 420 names, when
this came out about a year ago,
00:10:55.149 --> 00:10:59.920
of individuals who are 500 times more
likely than average to be involved
00:10:59.920 --> 00:11:05.230
in violence. Individual names, passed
around to each Police officer in Chicago.
00:11:05.230 --> 00:11:10.029
They consider the rap sheet,
disturbance calls, social network etc.
00:11:10.029 --> 00:11:15.540
But one of the main things they considered
in placing mainly young black individuals
00:11:15.540 --> 00:11:19.279
on this list were known acquaintances
and their arrest histories.
00:11:19.279 --> 00:11:23.279
So if kids went to school or young
teenagers went to school
00:11:23.279 --> 00:11:27.880
with several people in a gang – and that
individual may not even be involved
00:11:27.880 --> 00:11:32.160
in a gang – they’re more likely to
appear on the list. The list has been
00:11:32.160 --> 00:11:36.829
heavily criticized for being racist,
for not giving these children
00:11:36.829 --> 00:11:40.660
or young individuals on the list
a chance to change their history
00:11:40.660 --> 00:11:44.510
because it’s being decided for them.
They’re being told: “You are likely
00:11:44.510 --> 00:11:49.850
to be a criminal, and we’re gonna
watch you”. Officers in Chicago
00:11:49.850 --> 00:11:53.550
visited these individuals – they’d do a knock-
and-announce, with a knock on the door,
00:11:53.550 --> 00:11:58.029
and say: “Hi, I’m here, like, just
checking up, what are you up to”.
00:11:58.029 --> 00:12:02.480
Which you don’t need any special
suspicion to do. But it’s, you know,
00:12:02.480 --> 00:12:06.860
kind of a harassment that
might cause a feedback,
00:12:06.860 --> 00:12:11.310
back into the data collected.
00:12:11.310 --> 00:12:15.209
This is PRECOBS. It’s currently
used here in Hamburg.
00:12:15.209 --> 00:12:19.100
They actually went to Chicago and
visited the Chicago Police Department
00:12:19.100 --> 00:12:24.170
to learn about Predictive Policing
tactics in Chicago to implement it
00:12:24.170 --> 00:12:29.729
throughout Germany, Hamburg and Berlin.
00:12:29.729 --> 00:12:33.620
It’s used to generally
forecast repeat-offenses.
00:12:33.620 --> 00:12:39.930
Again, when training data sets you need
enough data points to predict crime.
00:12:39.930 --> 00:12:43.699
So crimes that are less likely to
happen or happen very rarely:
00:12:43.699 --> 00:12:48.120
much harder to predict. Crimes that
aren’t reported: much harder to predict.
00:12:48.120 --> 00:12:52.480
So a lot of these software…
like pieces of software
00:12:52.480 --> 00:12:58.290
rely on algorithms that are hoping
that there’s the same sort of picture,
00:12:58.290 --> 00:13:03.070
that they can predict: where and when
and what type of crime will happen.
00:13:03.070 --> 00:13:06.890
PRECOBS is actually a pun on the ‘precogs’
00:13:06.890 --> 00:13:11.240
from the movie ‘Minority Report’ – if you’re
familiar with it, it’s the 3 psychics
00:13:11.240 --> 00:13:15.370
who predict crimes
before they happen.
00:13:15.370 --> 00:13:19.149
So there’re other, similar systems
in the world that are being used
00:13:19.149 --> 00:13:22.949
to predict whether or not
something will happen.
00:13:22.949 --> 00:13:27.360
The first one is ‘Disease and Diagnosis’.
They found that algorithms are actually
00:13:27.360 --> 00:13:33.810
more accurate than doctors at predicting
what disease an individual has.
00:13:33.810 --> 00:13:39.480
It’s kind of shocking. The other is
‘Security Clearance’ in the US.
00:13:39.480 --> 00:13:44.240
It allows access to classified documents.
There’s no automatic access in the US.
00:13:44.240 --> 00:13:48.750
So every person who wants to see
some sort of secret cleared document
00:13:48.750 --> 00:13:53.089
must go through this process.
And it’s vetting individuals.
00:13:53.089 --> 00:13:56.690
So it’s an opt-in process. But here
they’re trying to predict who will
00:13:56.690 --> 00:14:00.550
disclose information, who will
break the clearance system;
00:14:00.550 --> 00:14:05.810
and predict there… Here, the error rate,
they’re probably much more comfortable
00:14:05.810 --> 00:14:09.360
with a high error rate. Because they
have so many people competing
00:14:09.360 --> 00:14:13.699
for a particular job, to get
clearance, that if they’re wrong,
00:14:13.699 --> 00:14:18.000
that somebody probably won’t disclose
information, they don’t care,
00:14:18.000 --> 00:14:22.319
they’d just rather eliminate
them than take the risk.
00:14:22.319 --> 00:14:27.509
So I’m an attorney in the US. I have
this urge to talk about US law.
00:14:27.509 --> 00:14:32.089
It also seems to impact a lot
of people internationally.
00:14:32.089 --> 00:14:36.360
Here we’re talking about the targeting
of individuals, not hotspots.
00:14:36.360 --> 00:14:40.810
So targeting of individuals is
not as widespread, currently.
00:14:40.810 --> 00:14:45.579
However it’s happening in Chicago;
00:14:45.579 --> 00:14:49.259
and other cities are considering
implementing programs and there are grants
00:14:49.259 --> 00:14:53.730
right now to encourage
Police departments
00:14:53.730 --> 00:14:57.110
to figure out target lists.
00:14:57.110 --> 00:15:00.699
So in the US suspicion is based on
the totality of the circumstances.
00:15:00.699 --> 00:15:04.730
That’s the whole picture. The Police
officer, the individual must look
00:15:04.730 --> 00:15:08.269
at the whole picture of what’s happening
before they can detain an individual.
00:15:08.269 --> 00:15:11.920
It’s supposed to be a balanced
assessment of relative weights, meaning
00:15:11.920 --> 00:15:16.399
– you know – if you know that the
person is a pastor, then maybe
00:15:16.399 --> 00:15:21.720
pacing in front of a liquor
store is not as suspicious
00:15:21.720 --> 00:15:26.370
as somebody who’s been convicted
of 3 burglaries. It has to be ‘based
00:15:26.370 --> 00:15:31.430
on specific and articulable facts’. And
the Police officers can use experience
00:15:31.430 --> 00:15:37.470
and common sense to determine
whether or not their suspicion…
00:15:37.470 --> 00:15:42.920
Large amounts of networked data generally
can provide individualized suspicion.
00:15:42.920 --> 00:15:48.410
The principal components here… the
events leading up to the stop-and-search
00:15:48.410 --> 00:15:52.319
– what is the person doing right before
they’re detained as well as the use
00:15:52.319 --> 00:15:57.709
of historical facts known about that
individual, the crime, the area
00:15:57.709 --> 00:16:02.329
in which it’s happening etc.
So it can rely on both things.
00:16:02.329 --> 00:16:06.819
No court in the US has really put out
a percentage for what Probable Cause
00:16:06.819 --> 00:16:11.089
and Reasonable Suspicion are. ‘Probable
Cause’ is what you need to get a warrant
00:16:11.089 --> 00:16:14.639
to search and seize an individual.
‘Reasonable Suspicion’ is needed
00:16:14.639 --> 00:16:20.329
to do stop-and-frisk in the US – stop
an individual and question them.
00:16:20.329 --> 00:16:24.100
And this is a little bit different than
what they call ‘Consensual Encounters’,
00:16:24.100 --> 00:16:27.680
where a Police officer goes up to you and
chats you up. ‘Reasonable Suspicion’
00:16:27.680 --> 00:16:32.029
– you’re actually detained. But I had
a law professor who basically said:
00:16:32.029 --> 00:16:35.730
“30%–45% seems like a really good number
00:16:35.730 --> 00:16:39.290
just to show how low it really is”. You
don’t even need to be 50% sure
00:16:39.290 --> 00:16:42.180
that somebody has committed a crime.
00:16:42.180 --> 00:16:47.459
So, officers can draw from their own
experience to determine ‘Probable Cause’.
00:16:47.459 --> 00:16:51.350
And the UK has a similar
‘Reasonable Suspicion’ standard
00:16:51.350 --> 00:16:55.010
which depends on the circumstances
of each case. So,
00:16:55.010 --> 00:16:58.819
I’m not as familiar with UK law, but I
believe that some of the analysis around
00:16:58.819 --> 00:17:03.480
‘Reasonable Suspicion’ is similar.
00:17:03.480 --> 00:17:07.339
Is this like a black box?
So, I threw this slide in
00:17:07.339 --> 00:17:10.960
for those who are interested
in comparing this to US law.
00:17:10.960 --> 00:17:15.280
Generally a dog sniff in the US
falls under a particular set
00:17:15.280 --> 00:17:20.140
of legal history, which is: a
dog can go up, sniff for drugs,
00:17:20.140 --> 00:17:24.220
alert and that is completely okay.
00:17:24.220 --> 00:17:28.099
And the Police officers can use that
data to detain and further search
00:17:28.099 --> 00:17:33.520
an individual. So is an algorithm similar
to the dog which is kind of a black box?
00:17:33.520 --> 00:17:37.030
Information goes in, it’s processed,
information comes out and
00:17:37.030 --> 00:17:42.720
a prediction is made.
Police rely on the ‘Good Faith’
00:17:42.720 --> 00:17:48.780
in ‘Totality of the Circumstances’
to make their decision. So there’s
00:17:48.780 --> 00:17:53.970
really no… if they’re
relying on the algorithm
00:17:53.970 --> 00:17:57.230
and think in that situation that
everything’s okay we might reach
00:17:57.230 --> 00:18:01.980
a level of ‘Reasonable Suspicion’ where
the officer can now pat down
00:18:01.980 --> 00:18:08.470
the person he’s stopped on the street
or the one the algorithm has alerted to. So,
00:18:08.470 --> 00:18:13.220
the big question is, you know: could the
officer consult predictive software apps
00:18:13.220 --> 00:18:18.610
in any individual analysis? Could he
say: “60% likely to commit a crime”?
00:18:18.610 --> 00:18:24.180
In my hypothetical: does that
mean he can, for that person,
00:18:24.180 --> 00:18:29.160
without looking at anything
else, detain that individual?
00:18:29.160 --> 00:18:33.810
And the answer is “Probably not”. One:
predictive Policing algorithms just
00:18:33.810 --> 00:18:37.770
cannot take in the Totality of the
Circumstances. They have to be
00:18:37.770 --> 00:18:42.690
frequently updated, there are
things that are happening that
00:18:42.690 --> 00:18:46.060
the algorithm possibly could
not have taken into account.
00:18:46.060 --> 00:18:48.590
The problem here is
that the algorithm itself,
00:18:48.590 --> 00:18:51.780
the prediction itself becomes part
of Totality of the Circumstances,
00:18:51.780 --> 00:18:56.330
which I’m going to talk
about a little bit more later.
00:18:56.330 --> 00:19:00.660
But officers have to have Reasonable
Suspicion before the stop occurs.
00:19:00.660 --> 00:19:04.660
Retroactive justification
is not sufficient. So,
00:19:04.660 --> 00:19:08.790
the algorithm can’t just say:
“60% likely”, you detain the individual
00:19:08.790 --> 00:19:12.130
and then figure out why you’ve
detained the person. It has to be
00:19:12.130 --> 00:19:16.570
before the detention actually happens.
And the suspicion must relate
00:19:16.570 --> 00:19:19.990
to current criminal activity. The
person must be doing something
00:19:19.990 --> 00:19:24.700
to indicate criminal activity. Just
the fact that an algorithm says,
00:19:24.700 --> 00:19:29.440
based on these facts: “60%”,
or even without articulating
00:19:29.440 --> 00:19:33.890
why the algorithm has
chosen that, isn’t enough.
00:19:33.890 --> 00:19:38.380
Maybe you can see a gun-shaped
bulge in the pocket etc.
00:19:38.380 --> 00:19:43.160
So, effectiveness… the
Totality of the Circumstances,
00:19:43.160 --> 00:19:46.720
can the algorithms keep up?
Generally, probably not.
00:19:46.720 --> 00:19:50.560
Missing data, not capable of
processing this data in real time.
00:19:50.560 --> 00:19:54.820
There’s no idea… the
algorithm doesn’t know,
00:19:54.820 --> 00:19:58.950
and the Police officer probably
doesn’t know all of the facts.
00:19:58.950 --> 00:20:03.260
So the Police officer can take
the algorithm into consideration
00:20:03.260 --> 00:20:08.130
but the problem here is: Did the algorithm
know that the individual was active
00:20:08.130 --> 00:20:12.670
in the community, or was a politician, or
00:20:12.670 --> 00:20:17.450
was a personal friend of the officer
etc. It can’t just be relied upon.
00:20:17.450 --> 00:20:22.640
What if the algorithm did take into
account that the individual was a Pastor?
00:20:22.640 --> 00:20:26.180
Now that information is counted twice
and the balancing for the Totality
00:20:26.180 --> 00:20:34.320
of the Circumstances is off. Humans
here must be the final decider.
00:20:34.320 --> 00:20:38.040
What are the problems?
Well, there’s bad underlying data,
00:20:38.040 --> 00:20:41.970
there’s no transparency into
what kind of data is being used,
00:20:41.970 --> 00:20:45.720
how it was collected, how old it
is, how often it’s been updated,
00:20:45.720 --> 00:20:51.010
whether or not it’s been verified. There
could just be noise in the training data.
00:20:51.010 --> 00:20:57.240
Honestly, the data is biased. It was
collected by individuals in the US;
00:20:57.240 --> 00:21:01.020
generally, there’ve been
several studies showing that
00:21:01.020 --> 00:21:05.270
young black individuals are
stopped more often than whites.
00:21:05.270 --> 00:21:09.800
And this is going to
cause a collection bias.
00:21:09.800 --> 00:21:14.550
It’s gonna be drastically disproportionate
to the makeup of the population of cities;
00:21:14.550 --> 00:21:19.440
and as more data has been collected on
minorities, refugees in poor neighborhoods
00:21:19.440 --> 00:21:23.640
it’s gonna feed back in and of course only
have data on those groups and provide
00:21:23.640 --> 00:21:26.410
feedback and say:
“More crime is likely to
00:21:26.410 --> 00:21:27.770
happen because that’s where the data
00:21:27.770 --> 00:21:32.250
was collected”. So, what’s
an acceptable error rate, well,
00:21:32.250 --> 00:21:37.500
depends on the burden of proof. Harm
is different for an opt-in system.
00:21:37.500 --> 00:21:40.840
You know, what’s my harm if I don’t
get clearance, or I don’t get the job;
00:21:40.840 --> 00:21:45.160
but I’m opting in, I’m asking to
be considered for employment.
00:21:45.160 --> 00:21:49.080
In the US, what’s an error? If you
search and find nothing, if you think
00:21:49.080 --> 00:21:53.630
you have Reasonable Suspicion
based on good faith,
00:21:53.630 --> 00:21:57.060
both on the algorithm and what
you witness, the US says that it’s
00:21:57.060 --> 00:22:00.620
no 4th Amendment violation,
even if nothing has happened.
00:22:00.620 --> 00:22:05.970
So the legally recognized false-positive
error rate here is very low.
00:22:05.970 --> 00:22:09.140
In Big Data generally, and
machine learning, it’s great!
00:22:09.140 --> 00:22:13.550
Like 1% error is fantastic! But that’s
pretty large for the number of individuals
00:22:13.550 --> 00:22:17.930
stopped each day. Or who might
be subject to these algorithms.
00:22:17.930 --> 00:22:21.950
Because even though there’re only
400 individuals on the list in Chicago
00:22:21.950 --> 00:22:25.210
those individuals have been
listed basically as targets
00:22:25.210 --> 00:22:28.870
by the Chicago Police Department.
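A quick back-of-the-envelope sketch of that point; the daily stop count is a made-up illustrative number, not a figure from the talk:

```python
# Why a "fantastic" 1% false-positive rate is still a lot of people.
stops_per_day = 1000                # hypothetical stops screened per day
false_positive_rate = 0.01          # the "great" 1% error rate

per_day = stops_per_day * false_positive_rate
print(per_day)          # 10 innocent people wrongly flagged every day
print(per_day * 365)    # roughly 3,650 per year
```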
00:22:28.870 --> 00:22:33.700
Other problems include database errors.
Exclusion of evidence in the US
00:22:33.700 --> 00:22:37.170
only happens when there’s gross
negligence or systematic misconduct.
00:22:37.170 --> 00:22:42.150
That’s very difficult to prove, especially
when a lot of people view these algorithms
00:22:42.150 --> 00:22:47.360
as a big black box. Data goes in,
predictions come out, everyone’s happy.
00:22:47.360 --> 00:22:53.100
You rely on and trust the
quality of IBM, HunchLab etc.
00:22:53.100 --> 00:22:56.730
to provide good software.
00:22:56.730 --> 00:23:01.000
Finally, some more concerns I have
include the feedback loop, auditing
00:23:01.000 --> 00:23:04.810
and access to data and algorithms,
and the prediction thresholds.
00:23:04.810 --> 00:23:09.970
How certain must a prediction be
– before it’s reported to the Police –
00:23:09.970 --> 00:23:13.230
that the person might commit a
crime. Or that crime might happen
00:23:13.230 --> 00:23:18.460
in that individual area. If Reasonable
Suspicion is as low as 35% –
00:23:18.460 --> 00:23:23.740
and Reasonable Suspicion in the US has
been held at: that guy drives a car
00:23:23.740 --> 00:23:28.350
that drug dealers like to drive,
and he’s in the DEA database
00:23:28.350 --> 00:23:36.550
as a possible drug dealer. That was
enough to stop and search him.
00:23:36.550 --> 00:23:40.090
So, are there Positives? Well, PredPol,
00:23:40.090 --> 00:23:44.800
which is one of the services that
provides Predictive Policing software,
00:23:44.800 --> 00:23:49.650
says: “Since these cities have
implemented it, crime has been dropping”.
00:23:49.650 --> 00:23:54.030
In L.A. 13% reduction in
crime, in one division.
00:23:54.030 --> 00:23:57.510
There was even one day where
they had no crime reported.
00:23:57.510 --> 00:24:04.550
Santa Cruz – 25–29% reduction,
–9% in assaults etc.
00:24:04.550 --> 00:24:10.030
One: these are Police departments
self-reporting these successes, so…
00:24:10.030 --> 00:24:14.670
you know, take it for what it is,
and it’s reiterated by the people
00:24:14.670 --> 00:24:20.510
selling the software. But perhaps
it is actually reducing crime.
00:24:20.510 --> 00:24:24.390
It’s kind of hard to tell because
there’s a feedback loop.
00:24:24.390 --> 00:24:29.200
Do we know that crime is really being
reduced? Will it affect the data
00:24:29.200 --> 00:24:33.170
that is collected in the future? It’s
really hard to know. Because
00:24:33.170 --> 00:24:38.330
if you send the Police officers into
a community it’s more likely
00:24:38.330 --> 00:24:42.580
that they’re going to affect that
community and that data collection.
00:24:42.580 --> 00:24:46.940
Will more crimes happen because they
feel like the Police are harassing them?
00:24:46.940 --> 00:24:52.020
It’s very likely and it’s a problem here.
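A minimal simulation sketch of that feedback loop, with made-up numbers: two areas with identical true crime rates, where patrols follow past recorded data and only watched areas generate new records:

```python
# Feedback loop sketch: patrol allocation follows recorded crime, and new
# crime is only recorded where patrols are. Both areas are equally risky;
# the skew in the record is pure feedback, and it never self-corrects.
import numpy as np

rng = np.random.default_rng(2)
true_rate = np.array([0.3, 0.3])    # identical real crime rates
recorded = np.array([10.0, 5.0])    # area 0 starts with more recorded crime

for day in range(200):
    patrol = recorded / recorded.sum()            # patrols follow the data
    seen = rng.binomial(50, true_rate * patrol)   # you only record what you watch
    recorded += seen

# The recorded split stays skewed (about 2:1) and never converges to the
# true 50/50, even though the two areas are identical in reality.
print(recorded / recorded.sum())
```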
00:24:52.020 --> 00:24:56.930
So, some final thoughts. Predictive
Policing programs are not going anywhere.
00:24:56.930 --> 00:25:01.430
They’re only at their very start.
00:25:01.430 --> 00:25:06.030
And I think that more analysis, more
transparency, more access to data
00:25:06.030 --> 00:25:10.560
needs to happen around these algorithms.
There needs to be regulation.
00:25:10.560 --> 00:25:16.000
Currently, a very successful way in which
00:25:16.000 --> 00:25:19.310
these companies get data is to
buy it from third-party sources
00:25:19.310 --> 00:25:24.590
and then sell it to Police departments. So
perhaps PredPol might get information
00:25:24.590 --> 00:25:28.780
from Google, Facebook, Social Media
accounts; aggregate data themselves,
00:25:28.780 --> 00:25:31.890
and then turn around and sell it to
Police departments or provide access
00:25:31.890 --> 00:25:36.110
to Police departments. And generally, the
Courts are gonna have to begin to work out
00:25:36.110 --> 00:25:40.210
how to handle this type of data.
There’s no case law,
00:25:40.210 --> 00:25:45.160
at least in the US, that really knows
how to handle predictive algorithms
00:25:45.160 --> 00:25:48.900
in determining what the analysis says.
And so there really needs to be
00:25:48.900 --> 00:25:52.600
a lot more research and
thought put into this.
00:25:52.600 --> 00:25:56.480
And one of the big things in order
for this to actually be useful:
00:25:56.480 --> 00:26:01.590
if this is a tactic that had been used
by Police departments for decades,
00:26:01.590 --> 00:26:04.420
we need to eliminate the bias in
the data sets. Because right now
00:26:04.420 --> 00:26:09.090
all that it’s doing is facilitating and
continuing the bias set in the database.
00:26:09.090 --> 00:26:12.610
And it’s incredibly difficult.
It’s data collected by humans.
00:26:12.610 --> 00:26:17.780
And it causes an initial selection bias,
which is gonna have to stop
00:26:17.780 --> 00:26:21.380
for it to be successful.
00:26:21.380 --> 00:26:25.930
And perhaps these systems can cause
implicit bias or confirmation bias,
00:26:25.930 --> 00:26:29.030
e.g. Police are going to believe
what they’ve been told.
00:26:29.030 --> 00:26:33.170
So if a Police officer goes
on duty to an area
00:26:33.170 --> 00:26:36.660
and an algorithm says: “You’re
70% likely to find a burglar
00:26:36.660 --> 00:26:40.840
in this area”. Are they gonna find
a burglar because they’ve been told:
00:26:40.840 --> 00:26:45.930
“You might find a burglar”?
And finally the US border.
00:26:45.930 --> 00:26:49.800
There is no 4th Amendment
protection at the US border.
00:26:49.800 --> 00:26:53.740
It’s an exception to the warrant
requirement. This means
00:26:53.740 --> 00:26:58.740
no suspicion is needed to conduct
a search. So this data is gonna feed into
00:26:58.740 --> 00:27:03.680
a way to examine you when
you cross the border.
00:27:03.680 --> 00:27:09.960
And aggregate data can be used to
refuse you entry into the US etc.
00:27:09.960 --> 00:27:13.690
And I think that’s pretty much it.
And so a few minutes for questions.
00:27:13.690 --> 00:27:24.490
applause
Thank you!
00:27:24.490 --> 00:27:27.460
Herald: Thanks a lot for your talk,
Whitney. We have about 4 minutes left
00:27:27.460 --> 00:27:31.800
for questions. So please line up at
the microphones and remember to
00:27:31.800 --> 00:27:37.740
make short and easy questions.
00:27:37.740 --> 00:27:42.060
Microphone No.2, please.
00:27:42.060 --> 00:27:53.740
Question: Just a comment: if I want
to run a crime organization, like,
00:27:53.740 --> 00:27:57.760
I would target PRECOBS
here in Hamburg, maybe.
00:27:57.760 --> 00:28:01.170
So I can take the crime to the scenes
00:28:01.170 --> 00:28:05.700
where PRECOBS doesn’t expect it.
00:28:05.700 --> 00:28:08.940
Whitney: Possibly. And I think this is
a big problem in getting availability
00:28:08.940 --> 00:28:13.410
of data; in that there’s a good argument
for Police departments to say:
00:28:13.410 --> 00:28:16.590
“We don’t want to tell you what
our tactics are for Policing,
00:28:16.590 --> 00:28:19.490
because it might move crime”.
00:28:19.490 --> 00:28:23.130
Herald: Do we have questions from
the internet? Yes, then please,
00:28:23.130 --> 00:28:26.580
one question from the internet.
00:28:26.580 --> 00:28:29.770
Signal Angel: Is there evidence that data
like the use of encrypted messaging
00:28:29.770 --> 00:28:35.710
systems, encrypted emails, VPN, Tor,
with automated requests to the ISP,
00:28:35.710 --> 00:28:41.980
are used to obtain real names and
collected to contribute to the scoring?
00:28:41.980 --> 00:28:45.580
Whitney: I’m not sure if that’s
being taken into account
00:28:45.580 --> 00:28:49.530
by Predictive Policing algorithms,
or by the software being used.
00:28:49.530 --> 00:28:55.160
I know that Police departments do
take those things into consideration.
00:28:55.160 --> 00:29:00.630
And considering that in the US
Totality of the Circumstances is
00:29:00.630 --> 00:29:04.980
how you evaluate suspicion. They are gonna
take all of those things into account
00:29:04.980 --> 00:29:09.150
and they actually kind of
have to take them into account.
00:29:09.150 --> 00:29:11.830
Herald: Okay, microphone No.1, please.
00:29:11.830 --> 00:29:16.790
Question: In your example you mentioned
disease tracking, e.g. Google Flu Trends
00:29:16.790 --> 00:29:21.870
is a good example of preventive Predictive
Policing. Are there any examples
00:29:21.870 --> 00:29:27.630
where – instead of increasing Policing
in the lives of communities –
00:29:27.630 --> 00:29:34.260
where sociologists or social workers
are called to use predictive tools,
00:29:34.260 --> 00:29:36.210
instead of more criminalization?
00:29:36.210 --> 00:29:41.360
Whitney: I’m not aware if that’s…
if Police departments are sending
00:29:41.360 --> 00:29:45.250
social workers instead of Police officers.
But that wouldn’t surprise me because
00:29:45.250 --> 00:29:50.060
algorithms are being used to flag suspected
child abuse. And in the US they’re gonna send
00:29:50.060 --> 00:29:53.230
a social worker in response. So I would
not be surprised if that’s also being
00:29:53.230 --> 00:29:56.890
considered. Since that’s
part of the resources.
00:29:56.890 --> 00:29:59.030
Herald: OK, so if you have
a really short question, then
00:29:59.030 --> 00:30:01.470
microphone No.2, please.
Last question.
00:30:01.470 --> 00:30:08.440
Question: Okay, thank you for the
talk. This talk, as well as a few others,
00:30:08.440 --> 00:30:13.710
brought up the debate
about the fine-tuning that is required
00:30:13.710 --> 00:30:19.790
between false positives and
preventing crimes or terror.
00:30:19.790 --> 00:30:24.250
Now, it’s a different situation
if the Policeman is predicting,
00:30:24.250 --> 00:30:28.350
or a system is predicting somebody’s
stealing a paper from someone;
00:30:28.350 --> 00:30:32.230
or someone is planning a terror attack.
00:30:32.230 --> 00:30:38.030
And the justification to prevent it
00:30:38.030 --> 00:30:42.980
at the expense of false positives
is different in these cases.
00:30:42.980 --> 00:30:49.080
How do you make sure that the decision
or the fine-tuning is not going to be
00:30:49.080 --> 00:30:53.570
deep down in the algorithm
and by the programmers,
00:30:53.570 --> 00:30:58.650
but rather by the customer
– the Policemen or the authorities?
00:30:58.650 --> 00:31:02.720
Whitney: I can imagine that Police
officers are using common sense in that,
00:31:02.720 --> 00:31:06.220
and their knowledge about the situation
and even what they’re being told
00:31:06.220 --> 00:31:10.450
by the algorithm. You hope
that they’re gonna take…
00:31:10.450 --> 00:31:13.790
they probably are gonna take
terrorism to a different level
00:31:13.790 --> 00:31:17.260
than a common burglary or
a stealing of a piece of paper
00:31:17.260 --> 00:31:21.760
or a non-violent crime.
And that fine-tuning
00:31:21.760 --> 00:31:26.160
is probably on a Police department
00:31:26.160 --> 00:31:29.390
by Police department basis.
00:31:29.390 --> 00:31:32.090
Herald: Thank you! This was Whitney
Merrill, give a warm round of applause, please!
00:31:32.090 --> 00:31:40.490
Whitney: Thank you!
applause
00:31:40.490 --> 00:31:42.510
postroll music
00:31:42.510 --> 00:31:51.501
Subtitles created by c3subtitles.de
in the year 2016. Join and help us!