[32c3 preroll music]

Angel: I introduce Whitney Merrill. She is an attorney in the US, and just recently, actually last week, she received her CS master's in Illinois.

[applause]

Angel: Without further ado: 'Predicting Crime In A Big Data World'.

[cautious applause]

Whitney Merrill: Hi everyone. Thank you so much for coming. I know it's been an exhausting Congress, so I appreciate you coming to hear me talk about Big Data and crime prediction. This is kind of a hobby of mine; in my last semester at Illinois I decided to poke around at what's currently happening, how these algorithms are being used, and to figure out what kind of information can be gathered. I have about 30 minutes with you, so I'm going to give a broad overview of the types of programs. I'm going to talk about what Predictive Policing is, the data used, similar systems in other areas where predictive algorithms are trying to better society, and current uses in policing. I'll talk a little bit about their effectiveness and then give you some final thoughts.

So, imagine, in the very near future, a Police officer walking down the street wearing a camera on her collar. In her ear is a feed of information about the people and cars she passes, alerting her to individuals and cars that might fit a particular crime or a criminal profile. Earlier in the day she examined a map highlighting hotspots for crime. In the area she's been assigned to patrol, the predictive policing software indicates that there is an 82% chance of burglary at 2 pm, and it's currently 2:10 pm. As she passes one individual, her camera captures his face and runs it through a coordinated Police database; all of the Police departments that use this database share information. Facial recognition software indicates that the person is Bobby Burglar, who was previously convicted of burglary, was recently released, and is currently on parole. The voice in her ear whispers: 50 percent likely to commit a crime. Can she stop and search him? Should she chat him up and see how he acts? Does she need additional information to stop and detain him? Does it matter that he's carrying a large duffle bag? Did the algorithm take this into account, or did it just look at his face?
What information was being collected at the time the algorithm chose to say 50%, to provide that final analysis?

Another thought I'm going to have you think about as I go through this presentation is this quote, which is more favorable towards Police algorithms: "As people become data plots and probability scores, law enforcement officials and politicians alike can point and say: 'Technology is void of the racist, profiling bias of humans.'" Is that true? Well, they probably will point and say that, but is it actually void of the racist, profiling bias of humans? I'm going to talk about that as well.

So, Predictive Policing explained: who and what? First of all, Predictive Policing actually isn't new. All we're doing is adding technology, doing better, faster aggregation of data. Analysts in Police departments have been doing this by hand for decades. These techniques are used to create profiles that accurately match likely offenders with specific past crimes. So there's individual targeting, and then there's location-based targeting. With location-based targeting, the goal is to help Police forces deploy their resources in an efficient manner. The predictions can be as simple as recommending that general crime may happen in a particular area, or as specific as what type of crime will happen within a one-block radius. They take into account the time of day, the recent data collected, the time of year, the weather, etc.

Another quick thing worth going over, because not everyone is familiar with machine learning: this is a very basic breakdown of training an algorithm on a data set. You collect data from many different sources, put it all together, clean it up, and split it into three sets: a training set, a validation set and a test set. The training set is what develops the rules that will determine the final outcome. You use the validation set to optimize those rules, and finally you apply the test set to establish a confidence level. You'll also set a support level, where you say you need a certain amount of data to determine whether or not the algorithm has enough information to make a prediction. Rules with a low support level are less likely to be statistically significant. The confidence level, in the end, works like this: an 85% confidence level means there's an 85% chance that a suspect meeting the rule in question is engaged in criminal conduct.
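To make the support and confidence terminology concrete, here is a minimal Python sketch with invented toy records. The feature names, the rule and every number are hypothetical; this is not any vendor's actual system.

```python
# Toy sketch of association-rule "support" and "confidence" (invented data).

# Hypothetical incident records:
# (late_night, prior_burglary_conviction, burglary_observed)
records = [
    (True,  True,  True),
    (True,  True,  True),
    (True,  True,  True),
    (True,  True,  False),
    (True,  False, False),
    (False, True,  False),
    (False, False, False),
    (False, False, False),
]

# Rule under test: late_night AND prior_conviction -> burglary_observed
antecedent = [r for r in records if r[0] and r[1]]
matches = [r for r in antecedent if r[2]]

support = len(matches) / len(records)        # how much data backs the rule
confidence = len(matches) / len(antecedent)  # P(burglary | rule fires)

print(f"support={support:.2f} confidence={confidence:.2f}")
# -> support=0.38 confidence=0.75
# With only 8 records the support rests on a handful of matches, so the rule
# is unlikely to be statistically significant: the low-support problem above.
```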
So, what does this mean? Well, it encourages collection and hoarding of data about crimes and individuals, because you want as much information as possible so that you detect even the less likely scenarios. Information sharing is also encouraged, because it's easier: it's done by third parties, or even what are called fourth parties, and shared amongst departments. Again, this analysis was being done by analysts in Police departments for decades, but the information sharing and the amount of information they could aggregate was just significantly more limited.

So, these Predictive Policing algorithms and software, what are they doing? Are they determining guilt and innocence? Unlike a thoughtcrime, they are not saying this person is guilty, this person is innocent. They're producing a probability of whether the person has likely committed a crime or will likely commit a crime. And they can only say something about the future and the past. This here is a picture from one particular piece of software provided by HunchLab; patterns emerge here from past crimes that can profile criminal types and associations, detect crime patterns, etc. Generally, these types of algorithms use unsupervised data. That means someone is not going through and labeling true-false, good-bad, good-bad. There's 1) just too much information, and 2) they're trying to do clustering, to determine the things that are similar.
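As a rough illustration of that clustering idea, here is a short sketch that groups hypothetical incident coordinates into hotspots. Using k-means via scikit-learn is an assumption chosen for illustration; commercial predictive-policing products do not publicly document their exact methods.

```python
# Sketch: unsupervised clustering of invented incident coordinates into
# "hotspots". No point is ever labeled; structure is inferred from similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Fake (x, y) incident locations scattered around two invented hotspots
hotspot_a = rng.normal(loc=[2.0, 3.0], scale=0.3, size=(40, 2))
hotspot_b = rng.normal(loc=[7.0, 1.0], scale=0.3, size=(40, 2))
incidents = np.vstack([hotspot_a, hotspot_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(incidents)
print("estimated hotspot centers:")
print(kmeans.cluster_centers_)  # close to the invented centers (2,3) and (7,1)
```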
So, really quickly, I'm also going to talk about the data that's used. There are several different types: personal characteristics, demographic information, activities of individuals, scientific data, etc. This comes from all sorts of sources. One that really shocked me, and I'll talk about it a little bit later, is that the radiation detectors on New York City Police are constantly taking in data, and they're so sensitive they can detect whether you've had a recent medical treatment that involves radiation.

Facial recognition and biometrics clearly belong here, and the third-party doctrine, which basically says in the United States that you have no reasonable expectation of privacy in data you share with third parties, facilitates easy collection for Police officers and Government officials, because they can go and ask for the information without any sort of warrant. For a really great overview: a friend of mine, Dia, did a talk here at CCC on "The architecture of a street level panopticon". It gives a really great overview of how this type of data is collected on the streets. It's worth checking out, because I'm going to gloss over the types of data. There is in the United States what's called the Multistate Anti-Terrorism Information Exchange Program, which uses everything from credit history, concealed weapons permits, aircraft pilot licenses, fishing licenses, etc., all searchable and shared amongst Police departments and Government officials. And this is just more information: if they can collect it, they will aggregate it into a database.

So, what are the current uses? There are many, many different companies currently making software and marketing it to Police departments. All of them are slightly different and have different features, but currently it's a competition to get clients, Police departments, etc. The more Police departments you have, the more data sharing you can sell, saying: "Oh, by enrolling you'll now have x, y and z Police departments' data to access", etc. These here are Hitachi and HunchLab; both do hotspot targeting, not individual targeting, which is a lot rarer. This is actually being used in my home town, which I'll talk about in a little bit. Here, the appropriate tactics are automatically displayed for officers when they're entering mission areas. So HunchLab will tell an officer: "Hey, you're entering an area where there's going to be burglary, so keep an eye out, be aware". This is updating in real time, and they're hoping it mitigates crime. Here are two other ones: the Domain Awareness System was created in New York City after 9/11 in conjunction with Microsoft. New York City actually makes money by selling it to other cities.
CCTV camera feeds are collected, and if, say, there's a report of a man wearing a red shirt, the software will look for people wearing red shirts and alert Police to people matching this description walking in public in New York City. The other one is by IBM, and there are quite a few like it; it's generally another hotspot-targeting system, and each has a few different features.

Worth mentioning, too, is the Heat List. This targeted individuals. I'm from the city of Chicago; I grew up in the city. When this came out about a year ago, there were 420 names on it, of individuals who are supposedly 500 times more likely than average to be involved in violence. Individual names, passed around to each Police officer in Chicago. They consider the rap sheet, disturbance calls, social network, etc. But one of the main things they considered in placing mainly young black individuals on this list was known acquaintances and their arrest histories. So if kids or young teenagers went to school with several people in a gang, even if they themselves weren't involved in a gang, they were more likely to appear on the list. The list has been heavily criticized for being racist, and for not giving these children or young individuals a chance to change their history, because it's being decided for them. They're being told: "You are likely to be a criminal, and we're going to watch you". Officers in Chicago visited these individuals and would do a knock-and-announce, with a knock on the door and: "Hi, I'm here, just checking up, what are you up to?". You don't need any special suspicion to do that. But it's, you know, kind of harassment that might feed back into the data collected.

This is PRECOBS. It's currently used here in Hamburg. They actually went to Chicago and visited the Chicago Police Department to learn about Predictive Policing tactics in Chicago, to implement them throughout Germany, in Hamburg and Berlin. It's generally used to forecast repeat offenses. Again, when training data sets you need enough data points to predict crime. So crimes that happen very rarely are much harder to predict. Crimes that aren't reported: much harder to predict. So a lot of these pieces of software rely on algorithms that are hoping there's the same sort of picture, so that they can predict where and when and what type of crime will happen.
The name PRECOBS is actually a nod to the movie 'Minority Report', if you're familiar with it: the "precogs" are the 3 psychics who predict crimes before they happen.

There are other, similar systems in the world that are being used to predict whether or not something will happen. The first one is disease and diagnosis. They found that algorithms are actually more likely than doctors to predict correctly what disease an individual has. It's kind of shocking. The other is security clearance in the US, which allows access to classified documents. There's no automatic access in the US, so every person who wants to see some sort of cleared secret document must go through this process, which vets individuals. So it's an opt-in process. Here they're trying to predict who will disclose information, who will break the clearance system. And here they're probably much more comfortable with a high error rate, because they have so many people competing for a particular job, to get clearance. If they're wrong, and somebody probably wouldn't have disclosed information, they don't care; they would rather eliminate that person than take the risk.

So, I'm an attorney in the US, and I have this urge to talk about US law. It also seems to impact a lot of people internationally. Here we're talking about the targeting of individuals, not hotspots. Targeting of individuals is not as widespread currently. However, it's happening in Chicago, other cities are considering implementing programs, and there are grants right now to encourage Police departments to figure out target lists.

In the US, suspicion is based on the totality of the circumstances. That's the whole picture: the Police officer must look at the whole picture of what's happening before they can detain an individual. It's supposed to be a balanced assessment of relative weights, meaning, you know, if you know that the person pacing in front of a liquor store is a pastor, that's not as suspicious as somebody who's been convicted of 3 burglaries. It has to be "based on specific and articulable facts", and Police officers can use experience and common sense to determine whether their suspicion is reasonable. Large amounts of networked data generally can provide individualized suspicion.
The principal components here are the events leading up to the stop-and-search, i.e. what the person is doing right before they're detained, as well as the use of historical facts known about that individual, the crime, the area in which it's happening, etc. So it can rely on both things. No court in the US has really put a percentage on what Probable Cause and Reasonable Suspicion amount to. 'Probable Cause' is what you need to get a warrant, to search and seize an individual. 'Reasonable Suspicion' is needed to do a stop-and-frisk in the US, to stop an individual and question them. And this is a little bit different from what they call 'consensual encounters', where a Police officer goes up to you and chats you up; with Reasonable Suspicion you're actually detained. But I had a law professor who basically said: "30 to 45% seems like a really good number, just to show how low it really is". You don't even need to be 50% sure that somebody has committed a crime. So, officers can draw from their own experience to determine Probable Cause. The UK has a similar Reasonable Suspicion standard, which depends on the circumstances of each case. I'm not as familiar with UK law, but I believe the analysis around Reasonable Suspicion is similar.

Is this like a black box? I threw this slide in for those who are interested in comparing this to US law. A dog sniff in the US falls under a particular line of legal history, which is: a dog can go up, sniff for drugs, and alert, and that is completely okay. Police officers can use that alert to detain and further search an individual. So is an algorithm similar to the dog, which is kind of a black box? Information goes in, it's processed, information comes out, and a prediction is made. Police rely on good faith and the totality of the circumstances to make their decision. So if they're relying on the algorithm, and think that in that situation everything's okay, we might reach a level of Reasonable Suspicion where the officer can now pat down the person he's stopped on the street, the one the algorithm has alerted to. So the big question is, you know: could the officer consult predictive software in place of any individual analysis? Could he say, "60% likely to commit a crime"? In my hypothetical: does that mean the officer can, without looking at anything else, detain that individual? And the answer is: probably not.
One: Predictive Policing algorithms just cannot take in the totality of the circumstances. They have to be frequently updated, and there are things happening that the algorithm could not possibly have taken into account. The problem here is that the prediction itself becomes part of the totality of the circumstances, which I'm going to talk about a little bit more later. But officers have to have Reasonable Suspicion before the stop occurs; retroactive justification is not sufficient. So the algorithm can't just say "60% likely", and then you detain the individual and figure out afterwards why you've detained the person. It has to be before the detention actually happens. And the suspicion must relate to current criminal activity: the person must be doing something to indicate criminal activity. The mere fact that an algorithm says "60%" based on these facts, without even articulating why it has chosen that number, isn't enough. Maybe you can see a gun-shaped bulge in the pocket, etc.

So, effectiveness: can the algorithms keep up with the totality of the circumstances? Generally, probably not. There's missing data, and they're not capable of processing this data in real time. The algorithm doesn't know, and the Police officer probably doesn't know, all of the facts. So the Police officer can take the algorithm into consideration, but the problem here is: did the algorithm know that the individual was active in the community, or was a politician, or was a personal friend of the officer, etc.? It can't just be relied upon. And what if the algorithm did take into account that the individual was a pastor? Now that information is counted twice, and the balancing for the totality of the circumstances is off. Humans here must be the final decider.
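A toy calculation shows the double-counting problem. All numbers here are invented: treat each piece of evidence as a likelihood ratio applied to the current odds. If the algorithm already discounted for the pastor fact and the officer discounts for it again, the same evidence moves the estimate twice.

```python
# Invented numbers illustrating double-counted evidence in an odds framework.

def update(odds: float, likelihood_ratio: float) -> float:
    """Fold one piece of evidence into the current odds."""
    return odds * likelihood_ratio

def to_prob(odds: float) -> float:
    return odds / (1 + odds)

prior_odds = 1.0   # 50/50 before any evidence (hypothetical)
lr_pacing = 3.0    # pacing outside the liquor store: mildly incriminating
lr_pastor = 0.2    # being a pastor: strongly exculpatory

# Correct: each fact counted exactly once
once = update(update(prior_odds, lr_pacing), lr_pastor)

# Wrong: the algorithm used lr_pastor internally, the officer applies it again
twice = update(once, lr_pastor)

print(f"counted once:  P = {to_prob(once):.2f}")   # 0.38
print(f"counted twice: P = {to_prob(twice):.2f}")  # 0.11
# The same fact applied twice skews the totality-of-the-circumstances balance.
```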
So, what are the problems? Well, there's bad underlying data. There's no transparency into what kind of data is being used, how it was collected, how old it is, how often it's been updated, or whether it's been verified. There could just be noise in the training data. And honestly, the data is biased. It was collected by individuals in the US, and several studies have shown that young black individuals are stopped more often than whites. This causes a collection bias. It's drastically disproportionate to the makeup of the population of cities; and as more data is collected on minorities and refugees in poor neighborhoods, it feeds back in. Of course the system then only has data on those groups, and it provides feedback saying: "More crime is likely to happen here, because that's where the data was collected".

So, what's an acceptable error rate? Well, it depends on the burden of proof. Harm is different for an opt-in system: what's my harm if I don't get clearance, or don't get the job? I opted in; I asked to be considered for employment. In the US, what's an error? If you search and find nothing, but you thought you had Reasonable Suspicion based on good faith, both on the algorithm and on what you witnessed, the US says there's no 4th Amendment violation, even though nothing was found; the false positive barely counts as an error here. In Big Data and machine learning generally, a 1% error rate is fantastic! But that's pretty large given the number of individuals stopped each day, or who might be subject to these algorithms. Even though there are only about 400 individuals on the list in Chicago, those individuals have been listed basically as targets by the Chicago Police Department.
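To see why a 1% error rate is still large at street-stop scale, here is a back-of-the-envelope sketch with round, invented numbers:

```python
# Hypothetical round numbers: why a "great" 1% false-positive rate is a lot.

stops_per_day = 100_000       # invented number of stops/screenings per day
false_positive_rate = 0.01    # 1%: excellent by machine-learning standards

wrongly_flagged = stops_per_day * false_positive_rate
print(f"{wrongly_flagged:.0f} innocent people flagged per day")  # 1000

# Base-rate effect: if true offenders are rare among those stopped,
# most alerts are wrong even with a good detector.
base_rate = 0.005             # assume 0.5% of stops involve a real offender
true_positive_rate = 0.90     # assume a generous detection rate

hits = stops_per_day * base_rate * true_positive_rate
false_alarms = stops_per_day * (1 - base_rate) * false_positive_rate
print(f"P(real offender | flagged) = {hits / (hits + false_alarms):.2f}")
# -> about 0.31: roughly two of every three alerts point at innocent people.
```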
Other problems include database errors. Exclusion of evidence in the US only happens when there's gross negligence or systematic misconduct. That's very difficult to prove, especially when a lot of people view these algorithms as a big box: data goes in, predictions come out, everyone's happy. You rely on and trust the quality of IBM, HunchLab, etc. to provide good software.

Finally, some more concerns I have include feedback loops, auditing, access to data and algorithms, and prediction thresholds. How certain must a prediction be, before it's reported to the Police, that a person might commit a crime, or that crime might happen in a particular area? Remember that Reasonable Suspicion may be as low as 35%, and Reasonable Suspicion in the US has been upheld on facts like: that guy drives a car that drug dealers like to drive, and he's in the DEA database as a possible drug dealer. That was enough to stop and search him.

So, are there positives? Well, PredPol, which is one of the services that provides Predictive Policing software, says that crime has been dropping since these cities implemented it. In L.A., a 13% reduction in crime, in one division. There was even one day where they had no crime reported. Santa Cruz: a 25 to 29% reduction, 9% fewer assaults, etc. One: these are Police departments self-reporting these successes, reiterated by the people selling the software, so take it for what it is. But perhaps it is actually reducing crime. It's kind of hard to tell, because there's a feedback loop. Do we know that crime is really being reduced? Will it affect the data that is collected in the future? It's really hard to know, because if you send Police officers into a community, it's more likely that they're going to affect that community and that data collection. Will more crimes happen because people feel like the Police are harassing them? It's very likely, and it's a problem here.
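One way to see the feedback worry is a tiny simulation, with all parameters invented: two districts with identical true crime rates, patrols allocated according to past recorded crime, and crime only recorded where an officer is present to observe it.

```python
# Feedback-loop sketch (all parameters invented). Districts A and B have the
# SAME true crime rate, but recording depends on patrol presence, and patrols
# are deployed according to previously recorded crime.

true_rate = [50, 50]     # actual crimes per period in A and B
recorded = [60.0, 40.0]  # historical recorded counts: A merely *looks* hotter

for period in range(5):
    total = recorded[0] + recorded[1]
    patrol_share = [recorded[i] / total for i in range(2)]  # follow the data
    # Assumption: crimes are recorded in proportion to patrol presence.
    new = [true_rate[i] * patrol_share[i] for i in range(2)]
    recorded = [recorded[i] + new[i] for i in range(2)]
    print(f"period {period}: A={recorded[0]:.0f} B={recorded[1]:.0f}")

# The initial 60/40 disparity never corrects itself: A stays recorded as the
# "hot" district even though the underlying rates are identical, because the
# data keeps confirming the deployment that produced it.
```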
So, some final thoughts. Predictive Policing programs are not going anywhere; they're only in their infancy. And I think more analysis, more transparency, and more access to data need to happen around these algorithms. There needs to be regulation. Currently, a very successful way these companies get data is to buy it from third-party sources and then sell it to Police departments. So perhaps PredPol might get information from Google, Facebook, or social media accounts, aggregate the data themselves, and then turn around and sell it to Police departments or provide access to Police departments. And generally, the courts are going to have to begin to work out how to handle this type of data. There's no case law, at least in the US, that really settles how to handle predictive algorithms in determining what the analysis says. So there really needs to be a lot more research and thought put into this.

And one of the big things, in order for this to actually be useful: if this is a tactic that has been used by Police departments for decades, we need to eliminate the bias in the data sets. Because right now, all it's doing is facilitating and continuing the bias set in the database. And that's incredibly difficult: it's data collected by humans, and it carries an initial selection bias, which is going to have to stop for this to be successful. And perhaps these systems can cause implicit bias or confirmation bias, i.e. Police are going to believe what they've been told. So if a Police officer goes on duty in an area and an algorithm says "you're 70% likely to find a burglar in this area", are they going to find a burglar because they've been told "you might find a burglar"?

And finally, the US border. There is no 4th Amendment protection at the US border; it's an exception to the warrant requirement. This means no suspicion is needed to conduct a search. So this data is going to feed into the way you're examined when you cross the border, and aggregated data can be used to refuse you entry into the US, etc. And I think that's pretty much it. So, a few minutes for questions.

[applause]

Whitney: Thank you!

Herald: Thanks a lot for your talk, Whitney. We have about 4 minutes left for questions. So please line up at the microphones, and remember to ask short and easy questions. Microphone No. 2, please.

Question: Just a comment: if I wanted to run a crime organization, I would game PRECOBS here in Hamburg, maybe, so I could take the crime to scenes where PRECOBS doesn't expect it.

Whitney: Possibly. And I think this is a big problem in making data available: there's a good argument for Police departments to say, "We don't want to tell you what our Policing tactics are, because it might move crime".

Herald: Do we have questions from the internet? Yes, then please, one question from the internet.

Signal Angel: Is there evidence that data like the use of encrypted messaging systems, encrypted emails, VPNs, or Tor, with automated requests to the ISP, is used to obtain real names and collected to contribute to the scoring?

Whitney: I'm not sure whether that's being taken into account by Predictive Policing algorithms, or by the software being used. I know that Police departments do take those things into consideration. And considering that in the US the totality of the circumstances is how you evaluate suspicion, they are going to take all of those things into account; they actually kind of have to.

Herald: Okay, microphone No. 1, please.

Question: In your examples you mentioned disease tracking; e.g. Google Flu Trends is a good example of preventive prediction. Are there any examples where, instead of increasing Policing in the lives of communities, sociologists or social workers are called in to use predictive tools, instead of more criminalization?

Whitney: I'm not aware whether Police departments are sending social workers instead of Police officers.
But that wouldn't surprise me, because algorithms are being used to flag suspected child abuse, and in the US they'll send a social worker in response. So I would not be surprised if that's also being considered, since that's part of the resources.

Herald: OK, if you have a really short question, then microphone No. 2, please. Last question.

Question: Okay, thank you for the talk. This talk, as well as a few others, has brought up the debate about the fine-tuning that is required between false positives and preventing crime or terror. Now, it's a different situation if a Policeman, or a system, is predicting that somebody is stealing a piece of paper from someone, versus somebody planning a terror attack, and the justification for preventing it at the expense of false positives is different in these cases. How do you make sure that this fine-tuning is not decided deep down in the algorithm by the programmers, but rather by the customer, the Police or the authorities?

Whitney: I can imagine that Police officers are using common sense in that, along with their knowledge about the situation and even what they're being told by the algorithm. You hope, and they probably are going to, treat terrorism on a different level than a common burglary, or the stealing of a piece of paper, or a non-violent crime. And that fine-tuning is probably done on a Police-department-by-Police-department basis.

Herald: Thank you! This was Whitney Merrill; give her a warm round of applause, please!

Whitney: Thank you!

[applause]

[postroll music]

Subtitles created by c3subtitles.de in the year 2016. Join and help us!