WEBVTT 00:00:00.000 --> 00:00:08.895 Music 00:00:08.895 --> 00:00:20.040 Herald: Who among you is using Facebook? Twitter? Diaspora? 00:00:20.040 --> 00:00:27.630 concerned noise And all of that data you enter there 00:00:27.630 --> 00:00:34.240 gets to a server, gets into the hands of somebody who's using it, 00:00:34.240 --> 00:00:38.519 and the next talk is especially about that, 00:00:38.519 --> 00:00:43.879 because there are also intelligent machines and intelligent algorithms 00:00:43.879 --> 00:00:47.489 that try to make something out of that data. 00:00:47.489 --> 00:00:50.920 So the post-doc researcher Jennifer Helsby 00:00:50.920 --> 00:00:55.839 of the University of Chicago, who works at this 00:00:55.839 --> 00:00:59.370 intersection between policy and technology, 00:00:59.370 --> 00:01:04.709 will now ask you the question: To whom would we give that power? 00:01:04.709 --> 00:01:12.860 Dr. Helsby: Thanks. applause 00:01:12.860 --> 00:01:17.090 Okay, so, today I'm gonna do a brief tour of intelligent systems 00:01:17.090 --> 00:01:18.640 and how they're currently used, 00:01:18.640 --> 00:01:21.760 and then we're gonna look at some examples with respect 00:01:21.760 --> 00:01:23.710 to the properties that we might care about 00:01:23.710 --> 00:01:26.000 these systems having, and I'll talk a little bit about 00:01:26.000 --> 00:01:27.940 some of the work that's been done in academia 00:01:27.940 --> 00:01:28.680 on these topics. 00:01:28.680 --> 00:01:31.780 And then we'll talk about some promising paths forward. 00:01:31.780 --> 00:01:37.040 So, I wanna start with this: Kranzberg's First Law of Technology. 00:01:37.040 --> 00:01:40.420 So, technology is not good or bad, but it also isn't neutral. 00:01:40.420 --> 00:01:42.980 Technology shapes our world, and it can act as 00:01:42.980 --> 00:01:46.140 a liberating force-- or an oppressive and controlling force. 00:01:46.140 --> 00:01:49.730 So, in this talk, I'm gonna go through some of the aspects 00:01:49.730 --> 00:01:53.830 of intelligent systems that might be more controlling in nature. 00:01:53.830 --> 00:01:56.060 So, as we all know, 00:01:56.060 --> 00:01:59.770 because of the rapidly decreasing cost of storage and computation, 00:01:59.770 --> 00:02:02.170 along with the rise of new sensor technologies, 00:02:02.170 --> 00:02:05.510 data collection devices are being pushed into every 00:02:05.510 --> 00:02:08.329 aspect of our lives: in our homes, our cars, 00:02:08.329 --> 00:02:10.469 in our pockets, on our wrists. 00:02:10.469 --> 00:02:13.280 And data collection systems act as intermediaries 00:02:13.280 --> 00:02:15.230 for a huge amount of human communication. 00:02:15.230 --> 00:02:17.900 And much of this data sits in government 00:02:17.900 --> 00:02:19.860 and corporate databases. 00:02:19.860 --> 00:02:23.090 So, in order to make use of this data, 00:02:23.090 --> 00:02:27.280 we need to be able to make some inferences. 00:02:27.280 --> 00:02:30.280 So, one way of approaching this is I can hire 00:02:30.280 --> 00:02:32.310 a lot of humans, and I can have these humans 00:02:32.310 --> 00:02:34.990 manually examine the data, and they can acquire 00:02:34.990 --> 00:02:36.900 expert knowledge of the domain, and then 00:02:36.900 --> 00:02:38.510 perhaps they can make some decisions 00:02:38.510 --> 00:02:40.830 or at least some recommendations based on it. 00:02:40.830 --> 00:02:43.030 However, there are some problems with this. 00:02:43.030 --> 00:02:45.810 One is that it's slow, and thus expensive.
00:02:45.810 --> 00:02:48.060 It's also biased. We know that humans have 00:02:48.060 --> 00:02:50.700 all sorts of biases, both conscious and unconscious, 00:02:50.700 --> 00:02:53.390 and it would be nice to have a system that did not have 00:02:53.390 --> 00:02:54.959 these inaccuracies. 00:02:54.959 --> 00:02:57.069 It's also not very transparent: I might 00:02:57.069 --> 00:02:58.910 not really know the factors that led to 00:02:58.910 --> 00:03:00.930 some decisions being made. 00:03:00.930 --> 00:03:03.360 Even humans themselves often don't really understand 00:03:03.360 --> 00:03:05.360 why they came to a given decision, because 00:03:05.360 --> 00:03:08.130 decisions are partly emotional in nature. 00:03:08.130 --> 00:03:11.530 And, thus, these human decision-making systems 00:03:11.530 --> 00:03:13.170 are often difficult to audit. 00:03:13.170 --> 00:03:15.819 So, another way to proceed is maybe instead 00:03:15.819 --> 00:03:18.000 I study the system and the data carefully 00:03:18.000 --> 00:03:20.520 and I write down the best rules for making a decision, 00:03:20.520 --> 00:03:23.280 or I can have a machine dynamically figure out 00:03:23.280 --> 00:03:25.459 the best rules, as in machine learning. 00:03:25.459 --> 00:03:28.640 So, maybe this is a better approach. 00:03:28.640 --> 00:03:32.230 It's certainly fast, and thus cheap. 00:03:32.230 --> 00:03:34.290 And maybe I can construct the system in such a way 00:03:34.290 --> 00:03:37.090 that it doesn't have the biases that are inherent 00:03:37.090 --> 00:03:39.209 in human decision making. 00:03:39.209 --> 00:03:41.560 And, since I've written these rules down, 00:03:41.560 --> 00:03:42.819 or a computer has learned these rules, 00:03:42.819 --> 00:03:45.140 then I can just show them to somebody, right? 00:03:45.140 --> 00:03:46.819 And then they can audit it. 00:03:46.819 --> 00:03:49.020 So, more and more decision making is being 00:03:49.020 --> 00:03:50.750 done in this way. 00:03:50.750 --> 00:03:53.170 And so, in this model, we take data, 00:03:53.170 --> 00:03:55.709 we make an inference based on that data 00:03:55.709 --> 00:03:58.120 using these algorithms, and then 00:03:58.120 --> 00:03:59.420 we can take actions. 00:03:59.420 --> 00:04:01.860 And, when we take this more scientific approach 00:04:01.860 --> 00:04:04.200 to making decisions and optimizing for 00:04:04.200 --> 00:04:07.310 a desired outcome, we can take an experimental approach, 00:04:07.310 --> 00:04:10.080 so we can determine which actions are most effective 00:04:10.080 --> 00:04:12.310 in achieving a desired outcome. 00:04:12.310 --> 00:04:14.010 Maybe there are some types of communication 00:04:14.010 --> 00:04:16.750 styles that are most effective with certain people. 00:04:16.750 --> 00:04:19.510 I can perhaps deploy some individualized incentives 00:04:19.510 --> 00:04:22.060 to get the outcome that I desire. 00:04:22.060 --> 00:04:25.990 And maybe if I carefully experiment 00:04:25.990 --> 00:04:27.810 with the environment in which people make 00:04:27.810 --> 00:04:30.699 these decisions, perhaps even very small changes 00:04:30.699 --> 00:04:34.250 can introduce significant changes in people's behavior. 00:04:34.250 --> 00:04:37.320 So, through these mechanisms, and this experimental approach, 00:04:37.320 --> 00:04:39.840 I can maximize the probability that humans do 00:04:39.840 --> 00:04:42.020 what I want.
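To make the experimental approach just described concrete, here is a minimal sketch in Python of a randomized experiment over two hypothetical message styles, keeping whichever produces the higher measured response rate. The style names and response rates are invented for illustration, not taken from the talk.

import random

random.seed(0)

def simulate_response(style):
    # Hypothetical ground truth: style "B" nudges 5% more people.
    rate = {"A": 0.10, "B": 0.15}[style]
    return random.random() < rate

counts = {"A": [0, 0], "B": [0, 0]}  # style -> [responses, assignments]
for person in range(10000):
    style = random.choice(["A", "B"])  # randomized assignment
    counts[style][1] += 1
    counts[style][0] += simulate_response(style)

rates = {s: resp / n for s, (resp, n) in counts.items()}
best = max(rates, key=rates.get)
print(rates, "-> deploy style", best)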
00:04:42.020 --> 00:04:45.380 So, algorithmic decision making is being used 00:04:45.380 --> 00:04:47.270 in industry, and is used in lots of other areas, 00:04:47.270 --> 00:04:49.530 from astrophysics to medicine, and is now 00:04:49.530 --> 00:04:52.199 moving into new domains, including 00:04:52.199 --> 00:04:53.990 government applications. 00:04:53.990 --> 00:04:58.560 So, we have recommendation engines like Netflix, Yelp, SoundCloud, 00:04:58.560 --> 00:05:00.699 that direct our attention to what we should 00:05:00.699 --> 00:05:03.510 watch and listen to. 00:05:03.510 --> 00:05:07.919 Since 2009, Google has used personalized search results, 00:05:07.919 --> 00:05:12.840 even if you're not logged in to your Google account. 00:05:12.840 --> 00:05:15.389 And we also have algorithmic curation and filtering, 00:05:15.389 --> 00:05:17.530 as in the case of Facebook News Feed, 00:05:17.530 --> 00:05:19.870 Google News, Yahoo News, 00:05:19.870 --> 00:05:22.840 which shows you what news articles, for example, 00:05:22.840 --> 00:05:24.330 you should be looking at. 00:05:24.330 --> 00:05:25.650 And this is important, because a lot of people 00:05:25.650 --> 00:05:29.410 get news from these media. 00:05:29.410 --> 00:05:31.520 We even have algorithmic journalists! 00:05:31.520 --> 00:05:35.240 So, automatic systems generate articles 00:05:35.240 --> 00:05:36.880 about weather, traffic, or sports 00:05:36.880 --> 00:05:38.729 instead of a human. 00:05:38.729 --> 00:05:41.949 And, another application that's more recent 00:05:41.949 --> 00:05:43.570 is the use of predictive systems 00:05:43.570 --> 00:05:45.180 in political campaigns. 00:05:45.180 --> 00:05:47.370 So, political campaigns also now take this 00:05:47.370 --> 00:05:50.340 approach to predict on an individual basis 00:05:50.340 --> 00:05:53.300 which candidate voters are likely to vote for. 00:05:53.300 --> 00:05:55.500 And then they can target, on an individual basis, 00:05:55.500 --> 00:05:58.199 those that can be persuaded otherwise. 00:05:58.199 --> 00:06:00.830 And, finally, in the public sector, 00:06:00.830 --> 00:06:02.710 we're starting to use predictive systems 00:06:02.710 --> 00:06:06.320 in areas from policing, to health, to education and energy. 00:06:06.320 --> 00:06:08.979 So, there are some advantages to this. 00:06:08.979 --> 00:06:12.790 So, one thing is that we can automate 00:06:12.790 --> 00:06:15.759 aspects of our lives that we consider to be mundane, 00:06:15.759 --> 00:06:17.620 using systems that are intelligent 00:06:17.620 --> 00:06:19.580 and adaptive enough. 00:06:19.580 --> 00:06:21.680 We can make use of all the data 00:06:21.680 --> 00:06:23.990 and really get the pieces of information we 00:06:23.990 --> 00:06:25.830 really care about. 00:06:25.830 --> 00:06:29.650 We can spend money in the most effective way, 00:06:29.650 --> 00:06:32.110 and we can do this with this experimental 00:06:32.110 --> 00:06:34.210 approach to optimize actions to produce 00:06:34.210 --> 00:06:35.190 desired outcomes. 00:06:35.190 --> 00:06:37.300 So, we can embed intelligence 00:06:37.300 --> 00:06:39.520 into all of these mundane objects 00:06:39.520 --> 00:06:41.180 and enable them to make decisions for us, 00:06:41.180 --> 00:06:42.860 and so that's what we're doing more and more, 00:06:42.860 --> 00:06:45.210 and we can have an object that decides for us 00:06:45.210 --> 00:06:46.840 what temperature we should set our house to, 00:06:46.840 --> 00:06:49.009 what we should be doing, etc.
00:06:49.009 --> 00:06:52.400 So, there might be some implications here. 00:06:52.400 --> 00:06:55.680 We want these systems that do work on this data 00:06:55.680 --> 00:06:58.039 to increase the opportunities available to us. 00:06:58.039 --> 00:07:00.259 But it might be that there are some implications 00:07:00.259 --> 00:07:01.780 that we have not carefully thought through. 00:07:01.780 --> 00:07:03.430 This is a new area, and people are only 00:07:03.430 --> 00:07:05.940 starting to scratch the surface of what the 00:07:05.940 --> 00:07:07.289 problems might be. 00:07:07.289 --> 00:07:09.600 In some cases, they might narrow the options 00:07:09.600 --> 00:07:10.990 available to people, 00:07:10.990 --> 00:07:13.199 and this approach subjects people to 00:07:13.199 --> 00:07:15.620 suggestive messaging intended to nudge them 00:07:15.620 --> 00:07:17.169 to a desired outcome. 00:07:17.169 --> 00:07:19.320 Some people may have a problem with that. 00:07:19.320 --> 00:07:20.650 Values we care about are not gonna be 00:07:20.650 --> 00:07:23.860 baked into these systems by default. 00:07:23.860 --> 00:07:25.960 It's also the case that some algorithmic systems 00:07:25.960 --> 00:07:28.300 facilitate work that we do not like, 00:07:28.300 --> 00:07:30.199 for example, in the case of mass surveillance. 00:07:30.199 --> 00:07:32.130 And even the same systems, 00:07:32.130 --> 00:07:34.039 used by different people or organizations, 00:07:34.039 --> 00:07:36.110 have very different consequences. 00:07:36.110 --> 00:07:37.320 For example, if I can predict 00:07:37.320 --> 00:07:40.020 with high accuracy, based on say search queries, 00:07:40.020 --> 00:07:42.050 who's gonna be admitted to a hospital, 00:07:42.050 --> 00:07:43.750 some people would be interested in knowing that. 00:07:43.750 --> 00:07:46.120 You might be interested in having your doctor know that. 00:07:46.120 --> 00:07:47.919 But that same predictive model in the hands of 00:07:47.919 --> 00:07:50.569 an insurance company has a very different implication. 00:07:50.569 --> 00:07:53.389 So, the point here is that these systems 00:07:53.389 --> 00:07:55.860 structure and influence how humans interact 00:07:55.860 --> 00:07:58.360 with each other, how they interact with society, 00:07:58.360 --> 00:07:59.850 and how they interact with government. 00:07:59.850 --> 00:08:03.080 And if they constrain what people can do, 00:08:03.080 --> 00:08:05.069 we should really care about this. 00:08:05.069 --> 00:08:08.270 So now I'm gonna go to sort of an extreme case, 00:08:08.270 --> 00:08:11.930 just as an example, and that's the Chinese Social Credit System. 00:08:11.930 --> 00:08:14.169 And so this is probably one of the more 00:08:14.169 --> 00:08:17.259 ambitious uses of data: 00:08:17.259 --> 00:08:18.880 it's used to rank each citizen 00:08:18.880 --> 00:08:21.190 in China based on their behavior. 00:08:21.190 --> 00:08:24.210 So right now, there are various pilot systems 00:08:24.210 --> 00:08:27.660 deployed by various companies doing this in China. 00:08:27.660 --> 00:08:30.729 They're currently voluntary, and by 2020, 00:08:30.729 --> 00:08:32.630 one of these systems, 00:08:32.630 --> 00:08:34.679 or a combination of them, 00:08:34.679 --> 00:08:37.409 is gonna be decided on and made mandatory for everyone. 00:08:37.409 --> 00:08:40.950 And so, in this system, you have the citizens, 00:08:40.950 --> 00:08:44.380 and a huge range of data sources are used.
00:08:44.380 --> 00:08:46.820 So, some of the data sources are 00:08:46.820 --> 00:08:48.360 your financial data, 00:08:48.360 --> 00:08:50.020 your criminal history, 00:08:50.020 --> 00:08:52.320 how many points you have on your driver's license, 00:08:52.320 --> 00:08:55.360 medical information-- for example, if you take birth control pills, 00:08:55.360 --> 00:08:56.810 that's incorporated. 00:08:56.810 --> 00:08:59.830 Your purchase history-- for example, if you purchase games, 00:08:59.830 --> 00:09:02.430 you are down-ranked in the system. 00:09:02.430 --> 00:09:04.490 Some of the systems, not all of them, 00:09:04.490 --> 00:09:07.260 incorporate social media monitoring, 00:09:07.260 --> 00:09:09.200 which makes sense: if you're a state like China, 00:09:09.200 --> 00:09:11.270 you probably want to know about 00:09:11.270 --> 00:09:14.899 political statements that people are making on social media. 00:09:14.899 --> 00:09:18.020 And, one of the more interesting parts is 00:09:18.020 --> 00:09:22.160 social network analysis: looking at the relationships between people. 00:09:22.160 --> 00:09:24.270 So, if you have a close relationship with somebody 00:09:24.270 --> 00:09:26.180 and they have a low credit score, 00:09:26.180 --> 00:09:29.130 that can have implications on your credit score. 00:09:29.130 --> 00:09:34.440 So, the way that these scores are generated is secret. 00:09:34.440 --> 00:09:38.140 And, according to the call for these systems 00:09:38.140 --> 00:09:39.270 put out by the government, 00:09:39.270 --> 00:09:42.810 the goal is to "carry forward the sincerity and 00:09:42.810 --> 00:09:45.760 traditional virtues" and establish the idea of a 00:09:45.760 --> 00:09:47.520 "sincerity culture." 00:09:47.520 --> 00:09:49.440 But wait, it gets better: 00:09:49.440 --> 00:09:52.450 there's a portal that enables citizens 00:09:52.450 --> 00:09:55.040 to look up the citizen score of anyone. 00:09:55.040 --> 00:09:56.520 And many people like this system, 00:09:56.520 --> 00:09:58.320 they think it's a fun game. 00:09:58.320 --> 00:10:00.700 They boast about it on social media, 00:10:00.700 --> 00:10:03.610 they put their score in their dating profile, 00:10:03.610 --> 00:10:04.760 because if you're ranked highly you're 00:10:04.760 --> 00:10:06.589 part of an exclusive club. 00:10:06.589 --> 00:10:10.060 You can get VIP treatment at hotels and other companies. 00:10:10.060 --> 00:10:11.880 But the downside is that, if you're excluded 00:10:11.880 --> 00:10:15.540 from that club, your weak score may have other implications, 00:10:15.540 --> 00:10:20.120 like being unable to get access to credit, housing, jobs. 00:10:20.120 --> 00:10:23.399 There is some reporting that even travel visas 00:10:23.399 --> 00:10:27.000 might be restricted if your score is particularly low. 00:10:27.000 --> 00:10:31.160 So, a system like this, for a state, is really 00:10:31.160 --> 00:10:34.690 the optimal solution to the problem of the public. 00:10:34.690 --> 00:10:37.130 It constitutes a very subtle and insidious 00:10:37.130 --> 00:10:39.350 mechanism of social control. 00:10:39.350 --> 00:10:41.209 You don't need to spend a lot of money on 00:10:41.209 --> 00:10:43.800 police or prisons if you can set up a system 00:10:43.800 --> 00:10:45.820 where people discourage one another from 00:10:45.820 --> 00:10:48.930 anti-social acts like political action in exchange for 00:10:48.930 --> 00:10:51.430 a coupon for a free Uber ride.
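The scoring formulas in these pilots are secret, so nothing below is the real algorithm; it is only a toy Python sketch of the social-network effect just described, where a person's score is blended with the average score of their contacts, so a low-scoring friend drags a high scorer down. All names, scores, and the blending weight are invented.

scores = {"ana": 720, "bo": 650, "chen": 400}
friends = {"ana": ["bo"], "bo": ["ana", "chen"], "chen": ["bo"]}

def adjusted(person, weight=0.2):
    # Blend a person's own score with their contacts' average score.
    contacts = friends[person]
    avg = sum(scores[c] for c in contacts) / len(contacts)
    return (1 - weight) * scores[person] + weight * avg

for p in scores:
    print(p, round(adjusted(p)))  # "bo" is pulled down by "chen"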
00:10:51.430 --> 00:10:55.269 So, there are a lot of legitimate questions here: 00:10:55.269 --> 00:10:58.370 What protections does user data have in this scheme? 00:10:58.370 --> 00:11:01.279 Do any safeguards exist to prevent tampering? 00:11:01.279 --> 00:11:04.310 What mechanism, if any, is there to prevent 00:11:04.310 --> 00:11:08.810 false input data from creating erroneous inferences? 00:11:08.810 --> 00:11:10.420 Is there any way that people can fix 00:11:10.420 --> 00:11:12.540 their score once they're ranked poorly? 00:11:12.540 --> 00:11:13.899 Or does it end up becoming a 00:11:13.899 --> 00:11:15.720 self-fulfilling prophecy? 00:11:15.720 --> 00:11:17.850 Your weak score means you have less access 00:11:17.850 --> 00:11:21.620 to jobs and credit, and now you will have 00:11:21.620 --> 00:11:24.709 limited access to opportunity. 00:11:24.709 --> 00:11:27.110 So, let's take a step back. 00:11:27.110 --> 00:11:28.470 So, what do we want? 00:11:28.470 --> 00:11:31.540 So, we probably don't want that, 00:11:31.540 --> 00:11:33.570 but as advocates we really wanna 00:11:33.570 --> 00:11:36.130 understand what questions we should be asking 00:11:36.130 --> 00:11:37.510 of these systems. Right now there's 00:11:37.510 --> 00:11:39.570 very little oversight, 00:11:39.570 --> 00:11:41.420 and we wanna make sure that we don't 00:11:41.420 --> 00:11:44.029 sort of sleepwalk our way to a situation 00:11:44.029 --> 00:11:46.649 where we've lost even more power 00:11:46.649 --> 00:11:49.740 to these centralized systems of control. 00:11:49.740 --> 00:11:52.209 And if you're an implementer, we wanna understand 00:11:52.209 --> 00:11:53.709 what we can be doing better. 00:11:53.709 --> 00:11:56.019 Are there better ways that we can be implementing 00:11:56.019 --> 00:11:57.640 these systems? 00:11:57.640 --> 00:11:59.430 Are there values that, as humans, 00:11:59.430 --> 00:12:01.060 we care about, that we should make sure 00:12:01.060 --> 00:12:02.420 these systems have? 00:12:02.420 --> 00:12:05.550 So, the first thing that most people in the room 00:12:05.550 --> 00:12:07.820 might think about is privacy. 00:12:07.820 --> 00:12:10.510 Which is, of course, of the utmost importance. 00:12:10.510 --> 00:12:12.920 We need privacy, and there is a good discussion 00:12:12.920 --> 00:12:15.680 on the importance of protecting user data where possible. 00:12:15.680 --> 00:12:18.420 So, in this talk, I'm gonna focus on the other aspects of 00:12:18.420 --> 00:12:19.470 algorithmic decision making 00:12:19.470 --> 00:12:21.190 that I think have gotten less attention. 00:12:21.190 --> 00:12:25.140 Because it's not just privacy that we need to worry about here. 00:12:25.140 --> 00:12:28.519 We also want systems that are fair and equitable. 00:12:28.519 --> 00:12:30.240 We want transparent systems; 00:12:30.240 --> 00:12:35.110 we don't want opaque decisions to be made about us, 00:12:35.110 --> 00:12:36.510 decisions that might have serious impacts 00:12:36.510 --> 00:12:37.779 on our lives. 00:12:37.779 --> 00:12:40.490 And we need some accountability mechanisms. 00:12:40.490 --> 00:12:41.890 So, for the rest of this talk, 00:12:41.890 --> 00:12:43.230 we're gonna go through each one of these things 00:12:43.230 --> 00:12:45.230 and look at some examples. 00:12:45.230 --> 00:12:47.709 So, the first thing is fairness.
00:12:47.709 --> 00:12:50.450 And so, as I said in the beginning, this is one area 00:12:50.450 --> 00:12:52.690 where there might be an advantage 00:12:52.690 --> 00:12:55.079 to making decisions by machine, 00:12:55.079 --> 00:12:56.740 especially in areas where there have 00:12:56.740 --> 00:12:59.410 historically been fairness issues with 00:12:59.410 --> 00:13:02.350 decision making, such as law enforcement. 00:13:02.350 --> 00:13:05.839 So, this is one way that police departments 00:13:05.839 --> 00:13:08.360 use predictive models. 00:13:08.360 --> 00:13:10.540 The idea here is police would like to 00:13:10.540 --> 00:13:13.450 allocate resources in a more effective way, 00:13:13.450 --> 00:13:15.050 and they would also like to enable 00:13:15.050 --> 00:13:16.640 proactive policing. 00:13:16.640 --> 00:13:20.110 So, if you can predict where crimes are going to occur, 00:13:20.110 --> 00:13:22.149 or who is going to commit crimes, 00:13:22.149 --> 00:13:24.870 then you can put cops in those places, 00:13:24.870 --> 00:13:27.769 or perhaps have them follow these people, 00:13:27.769 --> 00:13:29.300 and then the crimes will not occur. 00:13:29.300 --> 00:13:31.370 So, it's sort of the pre-crime approach. 00:13:31.370 --> 00:13:34.649 So, there are a few ways of going about this. 00:13:34.649 --> 00:13:37.920 One way is doing this individual-level prediction. 00:13:37.920 --> 00:13:41.089 So you take each citizen and estimate the risk 00:13:41.089 --> 00:13:43.769 that they will participate, say, in violence, 00:13:43.769 --> 00:13:45.279 based on some data. 00:13:45.279 --> 00:13:46.779 And then you can flag those people that are 00:13:46.779 --> 00:13:49.199 considered particularly violent. 00:13:49.199 --> 00:13:51.519 So, this is currently done. 00:13:51.519 --> 00:13:52.589 This is done in the U.S. 00:13:52.589 --> 00:13:56.120 It's done in Chicago, by the Chicago Police Department. 00:13:56.120 --> 00:13:58.350 And they maintain a heat list of individuals 00:13:58.350 --> 00:14:00.790 that are considered most likely to commit, 00:14:00.790 --> 00:14:03.529 or be the victim of, violence. 00:14:03.529 --> 00:14:06.700 And this is done using data that the police maintain. 00:14:06.700 --> 00:14:09.589 So, the features that are used in this predictive model 00:14:09.589 --> 00:14:12.209 include things that are derived from 00:14:12.209 --> 00:14:14.610 individuals' criminal history. 00:14:14.610 --> 00:14:16.810 So, for example, have they been involved in 00:14:16.810 --> 00:14:18.350 gun violence in the past? 00:14:18.350 --> 00:14:21.450 Do they have narcotics arrests? And so on. 00:14:21.450 --> 00:14:22.860 But another thing that's incorporated 00:14:22.860 --> 00:14:25.060 in the Chicago Police Department model is 00:14:25.060 --> 00:14:28.300 information derived from social network analysis. 00:14:28.300 --> 00:14:30.630 So, who you interact with, 00:14:30.630 --> 00:14:32.279 as noted in police data. 00:14:32.279 --> 00:14:34.899 So, for example, your co-arrestees. 00:14:34.899 --> 00:14:36.440 When officers conduct field interviews, 00:14:36.440 --> 00:14:38.240 who are people interacting with? 00:14:38.240 --> 00:14:42.940 And then this is all incorporated into this risk score.
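The Chicago model's actual features and weights have not been published, so the following is only a generic sketch of what an individual-level risk score of this kind can look like: a weighted sum over criminal-history and co-arrest-network features, squashed into a probability. Every feature name and weight here is an invented assumption.

import math

WEIGHTS = {  # hypothetical feature weights, not the CPD's
    "prior_gun_arrests": 1.2,
    "narcotics_arrests": 0.4,
    "coarrestees_on_list": 0.9,
}
BIAS = -3.0

def risk(person):
    # Logistic model: weighted feature sum squashed into [0, 1].
    z = BIAS + sum(w * person.get(f, 0) for f, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

person = {"prior_gun_arrests": 1, "coarrestees_on_list": 2}
print(round(risk(person), 2))  # flag if above some chosen threshold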
00:14:42.940 --> 00:14:44.639 So another way to proceed, 00:14:44.639 --> 00:14:47.070 which is the method that most companies 00:14:47.070 --> 00:14:49.579 that sell products like this to the police have taken, 00:14:49.579 --> 00:14:51.459 is instead predicting which areas 00:14:51.459 --> 00:14:53.810 are likely to have crimes committed in them. 00:14:53.810 --> 00:14:56.690 So, take my city, I put a grid down, 00:14:56.690 --> 00:14:58.180 and then I use crime statistics, 00:14:58.180 --> 00:15:00.430 and maybe some ancillary data sources, 00:15:00.430 --> 00:15:01.790 to determine which areas have 00:15:01.790 --> 00:15:04.709 the highest risk of crimes occurring in them, 00:15:04.709 --> 00:15:06.329 and I can flag those areas and send 00:15:06.329 --> 00:15:08.470 police officers to them. 00:15:08.470 --> 00:15:10.950 So now, let's look at some of the tools 00:15:10.950 --> 00:15:14.010 that are used for this geographic-level prediction. 00:15:14.010 --> 00:15:19.040 So, here are 3 companies that sell these 00:15:19.040 --> 00:15:22.910 geographic-level predictive policing systems. 00:15:22.910 --> 00:15:25.639 So, PredPol has a system that uses 00:15:25.639 --> 00:15:27.200 primarily crime statistics: 00:15:27.200 --> 00:15:30.209 only the time, place, and type of crime, 00:15:30.209 --> 00:15:33.040 to predict where crimes will occur. 00:15:33.040 --> 00:15:35.970 HunchLab uses a wider range of data sources, 00:15:35.970 --> 00:15:37.260 including, for example, weather, 00:15:37.260 --> 00:15:39.720 and then Hitachi is a newer system 00:15:39.720 --> 00:15:42.100 that has a predictive crime analytics tool 00:15:42.100 --> 00:15:44.779 that also incorporates social media-- 00:15:44.779 --> 00:15:47.850 the first one, to my knowledge, to do so. 00:15:47.850 --> 00:15:49.399 And these systems are in use 00:15:49.399 --> 00:15:52.820 in 50+ cities in the U.S. 00:15:52.820 --> 00:15:56.540 So, why do police departments buy this? 00:15:56.540 --> 00:15:57.760 Some police departments are interested in 00:15:57.760 --> 00:16:00.500 buying systems like this because they're marketed 00:16:00.500 --> 00:16:02.660 as impartial systems, 00:16:02.660 --> 00:16:06.199 so it's a way to police in an unbiased way. 00:16:06.199 --> 00:16:08.040 And so, these companies make 00:16:08.040 --> 00:16:08.670 statements like this-- 00:16:08.670 --> 00:16:10.800 by the way, the references will all be at the end, 00:16:10.800 --> 00:16:12.560 and they'll be on the slides-- 00:16:12.560 --> 00:16:13.370 so, for example, 00:16:13.370 --> 00:16:16.110 the predictive crime analytics from Hitachi 00:16:16.110 --> 00:16:17.610 claims that the system is anonymous, 00:16:17.610 --> 00:16:19.350 because it shows you an area, 00:16:19.350 --> 00:16:23.060 it doesn't tell you to look for a particular person. 00:16:23.060 --> 00:16:25.699 And PredPol reassures people that 00:16:25.699 --> 00:16:29.560 it eliminates any civil liberties or profiling concerns. 00:16:29.560 --> 00:16:32.269 And HunchLab notes that the system 00:16:32.269 --> 00:16:35.170 fairly represents priorities for public safety 00:16:35.170 --> 00:16:38.769 and is unbiased by race or ethnicity, for example. 00:16:38.769 --> 00:16:43.529 So, let's take a minute to describe in more detail 00:16:43.529 --> 00:16:48.100 what we mean when we talk about fairness. 00:16:48.100 --> 00:16:51.300 So, when we talk about fairness, 00:16:51.300 --> 00:16:52.740 we mean a few things.
00:16:52.740 --> 00:16:56.070 So, one is fairness with respect to individuals: 00:16:56.070 --> 00:16:58.040 so if I'm very similar to somebody, 00:16:58.040 --> 00:17:00.170 and we go through some process, 00:17:00.170 --> 00:17:03.430 and there are two very different outcomes to that process, 00:17:03.430 --> 00:17:05.679 we would consider that to be unfair. 00:17:05.679 --> 00:17:07.929 So, we want similar people to be treated 00:17:07.929 --> 00:17:09.539 in a similar way. 00:17:09.539 --> 00:17:13.079 But, there are certain protected attributes 00:17:13.079 --> 00:17:15.199 that we wouldn't want someone 00:17:15.199 --> 00:17:17.099 to discriminate based on. 00:17:17.099 --> 00:17:20.069 And so, there's this other property, group fairness. 00:17:20.069 --> 00:17:22.249 So, we can look at the statistical parity 00:17:22.249 --> 00:17:25.439 between groups, based on gender, race, etc., 00:17:25.439 --> 00:17:28.049 and see if they're treated in a similar way. 00:17:28.049 --> 00:17:30.409 And we might not expect that in some cases, 00:17:30.409 --> 00:17:32.429 for example if the base rates in each group 00:17:32.429 --> 00:17:34.659 are very different. 00:17:34.659 --> 00:17:36.889 And then there's also fairness in errors. 00:17:36.889 --> 00:17:40.080 All predictive systems are gonna make errors, 00:17:40.080 --> 00:17:42.989 and if the errors are concentrated, 00:17:42.989 --> 00:17:46.399 then that may also represent unfairness. 00:17:46.399 --> 00:17:50.149 And so this concern arose recently with Facebook, 00:17:50.149 --> 00:17:52.289 because people with Native American names 00:17:52.289 --> 00:17:54.389 had their profiles flagged as fraudulent 00:17:54.389 --> 00:17:58.759 far more often than those with white American names. 00:17:58.759 --> 00:18:00.559 So these are the sorts of things that we worry about, 00:18:00.559 --> 00:18:02.190 and each of these can be expressed as metrics 00:18:02.190 --> 00:18:04.239 (there's a small sketch of two of them below), and if you're interested you should 00:18:04.239 --> 00:18:06.159 check those 2 papers out. 00:18:06.159 --> 00:18:10.639 So, how can potential issues with predictive policing 00:18:10.639 --> 00:18:13.850 have implications for these principles? 00:18:13.850 --> 00:18:18.559 So, one problem is the training data that's used. 00:18:18.559 --> 00:18:21.059 Some of these systems only use crime statistics; 00:18:21.059 --> 00:18:23.600 all of them use crime statistics 00:18:23.600 --> 00:18:25.619 in some way. 00:18:25.619 --> 00:18:31.419 So, one problem is that crime databases 00:18:31.419 --> 00:18:34.830 contain only crimes that've been detected. 00:18:34.830 --> 00:18:38.629 Right? So, the police are only gonna detect 00:18:38.629 --> 00:18:41.009 crimes that they know are happening, 00:18:41.009 --> 00:18:44.109 either through patrol and their own investigation, 00:18:44.109 --> 00:18:46.320 or because they've been alerted to crime, 00:18:46.320 --> 00:18:48.789 for example by a citizen calling the police. 00:18:48.789 --> 00:18:52.179 So, a citizen has to feel like they can call the police, 00:18:52.179 --> 00:18:54.019 like that's a good idea. 00:18:54.019 --> 00:18:58.789 So, some crimes suffer from this problem less than others: 00:18:58.789 --> 00:19:02.249 gun violence, for example, is much easier to detect 00:19:02.249 --> 00:19:03.639 than fraud, 00:19:03.639 --> 00:19:07.509 which is very difficult to detect.
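Here is the small sketch referenced above: on invented decisions, it computes two of the group-fairness metrics just described, statistical parity (how often each group is flagged) and fairness in errors (the false-positive rate in each group). The records and group labels are made up for illustration.

records = [  # (group, truly_positive, flagged) -- made-up data
    ("g1", 0, 1), ("g1", 0, 0), ("g1", 1, 1), ("g1", 0, 1),
    ("g2", 0, 0), ("g2", 1, 1), ("g2", 0, 0), ("g2", 0, 1),
]

def rates(group):
    rows = [r for r in records if r[0] == group]
    flag_rate = sum(f for _, _, f in rows) / len(rows)
    negatives = [f for _, y, f in rows if y == 0]
    fpr = sum(negatives) / len(negatives)  # false-positive rate
    return flag_rate, fpr

(p1, e1), (p2, e2) = rates("g1"), rates("g2")
print("parity gap:", abs(p1 - p2), "FPR gap:", abs(e1 - e2))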
00:19:07.509 --> 00:19:11.940 Now the racial profiling aspect of this might come in 00:19:11.940 --> 00:19:15.590 because of biased policing in the past. 00:19:15.590 --> 00:19:19.999 So, for example, for marijuana arrests, 00:19:19.999 --> 00:19:22.619 black people are arrested in the U.S. at rates 00:19:22.619 --> 00:19:25.119 4 times that of white people, 00:19:25.119 --> 00:19:27.960 even though usage rates in these 2 groups 00:19:27.960 --> 00:19:31.389 are at statistical parity, to within a few percent. 00:19:31.389 --> 00:19:35.820 So, this is where problems can arise. 00:19:35.820 --> 00:19:37.159 So, let's go back to this 00:19:37.159 --> 00:19:38.749 geographic-level predictive policing. 00:19:38.749 --> 00:19:42.460 So the danger here is that, unless this system 00:19:42.460 --> 00:19:44.299 is very carefully constructed, 00:19:44.299 --> 00:19:47.090 this sort of crime area ranking might 00:19:47.090 --> 00:19:49.019 again become a self-fulfilling prophecy. 00:19:49.019 --> 00:19:51.460 If you send police officers to these areas, 00:19:51.460 --> 00:19:53.220 you further scrutinize them, 00:19:53.220 --> 00:19:55.659 and then again you're only detecting a subset 00:19:55.659 --> 00:19:57.979 of crimes, and the cycle continues. 00:19:57.979 --> 00:20:02.139 So, one obvious issue is that 00:20:02.139 --> 00:20:07.599 this statement about geographic-based crime prediction 00:20:07.599 --> 00:20:10.229 being anonymous is not true, 00:20:10.229 --> 00:20:13.159 because race and location are very strongly 00:20:13.159 --> 00:20:14.840 correlated in the U.S. 00:20:14.840 --> 00:20:16.609 And this is something that machine-learning systems 00:20:16.609 --> 00:20:20.049 can potentially learn. 00:20:20.049 --> 00:20:23.039 Another issue is that, for example, 00:20:23.039 --> 00:20:25.580 for individual fairness: maybe my home 00:20:25.580 --> 00:20:27.599 sits within one of these boxes. 00:20:27.599 --> 00:20:29.950 Some of these boxes in these systems are very small-- 00:20:29.950 --> 00:20:33.399 for example, PredPol's are 500ft x 500ft, 00:20:33.399 --> 00:20:36.349 so it's maybe only a few houses. 00:20:36.349 --> 00:20:39.149 So, the implications of this system are that 00:20:39.149 --> 00:20:40.849 you have police officers maybe sitting 00:20:40.849 --> 00:20:42.979 in a police cruiser outside your home, 00:20:42.979 --> 00:20:45.450 and a few doors down, someone 00:20:45.450 --> 00:20:46.799 may not be within that box, 00:20:46.799 --> 00:20:48.159 and doesn't have this. 00:20:48.159 --> 00:20:51.399 So, that may represent unfairness. 00:20:51.399 --> 00:20:54.929 So, there are real questions here, 00:20:54.929 --> 00:20:57.720 especially because there's no opt-out. 00:20:57.720 --> 00:21:00.059 There's no way to opt out of this system: 00:21:00.059 --> 00:21:02.239 if you live in a city that has this, 00:21:02.239 --> 00:21:04.909 then you have to deal with it. 00:21:04.909 --> 00:21:07.229 So, it's quite difficult to find out 00:21:07.229 --> 00:21:09.879 what's really going on, 00:21:09.879 --> 00:21:11.169 because the algorithm is secret. 00:21:11.169 --> 00:21:13.049 And, in most cases, we don't know 00:21:13.049 --> 00:21:14.789 the full details of the inputs. 00:21:14.789 --> 00:21:16.679 We have some idea about what features are used, 00:21:16.679 --> 00:21:17.970 but that's about it. 00:21:17.970 --> 00:21:19.509 We also don't know the output.
00:21:19.509 --> 00:21:21.899 That would mean knowing police allocation, 00:21:21.899 --> 00:21:23.179 police strategies. 00:21:23.179 --> 00:21:26.299 And in order to nail down what's really going on here, 00:21:26.299 --> 00:21:28.609 in order to verify the validity of 00:21:28.609 --> 00:21:30.009 these companies' claims, 00:21:30.009 --> 00:21:33.799 it may be necessary to have a 3rd party come in, 00:21:33.799 --> 00:21:35.629 examine the inputs and outputs of the system, 00:21:35.629 --> 00:21:37.590 and say concretely what's going on. 00:21:37.590 --> 00:21:39.460 And if everything is fine and dandy, 00:21:39.460 --> 00:21:40.929 then this shouldn't be a problem. 00:21:40.929 --> 00:21:43.619 So, that's potentially one role that 00:21:43.619 --> 00:21:44.769 advocates can play. 00:21:44.769 --> 00:21:46.720 Maybe we should start pushing for audits 00:21:46.720 --> 00:21:48.820 of systems that are used in this way. 00:21:48.820 --> 00:21:50.970 These could have serious implications 00:21:50.970 --> 00:21:52.679 for people's lives. 00:21:52.679 --> 00:21:55.249 So, we'll return to this idea a little bit later, 00:21:55.249 --> 00:21:58.210 but for now this leads us nicely to transparency. 00:21:58.210 --> 00:21:59.419 So, we wanna know 00:21:59.419 --> 00:22:01.929 what these systems are doing. 00:22:01.929 --> 00:22:04.729 But it's very hard, for the reasons described earlier, 00:22:04.729 --> 00:22:06.139 and even in the case of something like 00:22:06.139 --> 00:22:09.849 trying to understand Google's search algorithm, 00:22:09.849 --> 00:22:11.679 it's difficult because it's personalized. 00:22:11.679 --> 00:22:13.529 So, by construction, each user is 00:22:13.529 --> 00:22:15.320 only seeing one endpoint. 00:22:15.320 --> 00:22:18.169 So, it's a very isolating system. 00:22:18.169 --> 00:22:20.349 What do other people see? 00:22:20.349 --> 00:22:22.409 And one reason it's difficult to make 00:22:22.409 --> 00:22:24.099 some of these systems transparent 00:22:24.099 --> 00:22:26.679 is because of, simply, the complexity 00:22:26.679 --> 00:22:27.950 of the algorithms. 00:22:27.950 --> 00:22:30.309 So, an algorithm can become so complex that 00:22:30.309 --> 00:22:31.669 it's difficult to comprehend, 00:22:31.669 --> 00:22:33.289 even for the designer of the system, 00:22:33.289 --> 00:22:35.509 or the implementer of the system. 00:22:35.509 --> 00:22:38.419 The designer might know that this algorithm 00:22:38.419 --> 00:22:42.889 maximizes some metric-- say, accuracy-- 00:22:42.889 --> 00:22:44.570 but they may not always have a solid 00:22:44.570 --> 00:22:46.779 understanding of what the algorithm is doing 00:22:46.779 --> 00:22:48.330 for all inputs, 00:22:48.330 --> 00:22:50.970 certainly with respect to fairness. 00:22:50.970 --> 00:22:55.759 So, in some cases, it might not be appropriate to use 00:22:55.759 --> 00:22:57.379 an extremely complex model. 00:22:57.379 --> 00:22:59.529 It might be better to use a simpler system 00:22:59.529 --> 00:23:02.910 with human-interpretable features. 00:23:02.910 --> 00:23:04.749 Another issue that arises 00:23:04.749 --> 00:23:07.559 from the opacity of these systems 00:23:07.559 --> 00:23:09.409 and the centralized control 00:23:09.409 --> 00:23:11.860 is that it makes them very influential, 00:23:11.860 --> 00:23:13.950 and thus an excellent target 00:23:13.950 --> 00:23:16.210 for manipulation or tampering.
00:23:16.210 --> 00:23:18.479 So, this might be tampering that is done 00:23:18.479 --> 00:23:21.950 by an organization that controls the system, 00:23:21.950 --> 00:23:23.769 or an insider at one of the organizations, 00:23:23.769 --> 00:23:27.139 or anyone who's able to compromise their security. 00:23:27.139 --> 00:23:30.249 So, there's some interesting academic work 00:23:30.249 --> 00:23:32.099 that looked at the possibility of 00:23:32.099 --> 00:23:34.159 slightly modifying search rankings 00:23:34.159 --> 00:23:36.619 to shift people's political views. 00:23:36.619 --> 00:23:39.009 So, people are most likely to 00:23:39.009 --> 00:23:41.330 click on the top search results-- 00:23:41.330 --> 00:23:44.429 90% of clicks go to the first page of search results-- 00:23:44.429 --> 00:23:46.719 so perhaps by reshuffling things a little bit, 00:23:46.719 --> 00:23:48.729 or maybe dropping some search results, 00:23:48.729 --> 00:23:50.269 you can influence people's views 00:23:50.269 --> 00:23:51.679 in a coherent way, 00:23:51.679 --> 00:23:53.090 and maybe you can make it so subtle 00:23:53.090 --> 00:23:55.749 that no one is able to notice. 00:23:55.749 --> 00:23:57.249 So in this academic study, 00:23:57.249 --> 00:24:00.349 they did an experiment 00:24:00.349 --> 00:24:02.070 in the 2014 Indian election. 00:24:02.070 --> 00:24:04.219 So they used real voters, 00:24:04.219 --> 00:24:06.450 and they kept the size of the experiment small enough 00:24:06.450 --> 00:24:08.190 that it was not going to influence the outcome 00:24:08.190 --> 00:24:10.090 of the election. 00:24:10.090 --> 00:24:12.139 So the researchers took people, 00:24:12.139 --> 00:24:14.229 they determined their political leaning, 00:24:14.229 --> 00:24:17.429 and they segmented them into control and treatment groups, 00:24:17.429 --> 00:24:19.269 where the treatment was manipulation 00:24:19.269 --> 00:24:21.210 of the search ranking results. 00:24:21.210 --> 00:24:24.409 And then they had these people browse the web. 00:24:24.409 --> 00:24:25.969 And what they found is that 00:24:25.969 --> 00:24:28.229 this mechanism is very effective at shifting 00:24:28.229 --> 00:24:30.429 people's voter preferences. 00:24:30.429 --> 00:24:33.649 So, in this study, they were able to introduce 00:24:33.649 --> 00:24:36.849 a 20% shift in voter preferences. 00:24:36.849 --> 00:24:39.299 Even alerting users to the fact that this 00:24:39.299 --> 00:24:41.729 was going to be done-- telling them 00:24:41.729 --> 00:24:44.049 "we are going to manipulate your search results, 00:24:44.049 --> 00:24:45.729 really pay attention"-- 00:24:45.729 --> 00:24:49.099 totally failed to decrease 00:24:49.099 --> 00:24:50.859 the magnitude of the effect. 00:24:50.859 --> 00:24:55.109 So, the margins of error in many elections 00:24:55.109 --> 00:24:57.669 are incredibly small, 00:24:57.669 --> 00:24:59.929 and the authors estimate that this shift 00:24:59.929 --> 00:25:02.009 could change the outcome of about 00:25:02.009 --> 00:25:07.109 25% of elections worldwide, if this were done. 00:25:07.109 --> 00:25:10.919 And the bias is so small that no one can tell. 00:25:10.919 --> 00:25:14.279 So, all humans, no matter how smart 00:25:14.279 --> 00:25:17.109 and resistant to manipulation we think we are, 00:25:17.109 --> 00:25:21.909 all of us are subject to this sort of manipulation, 00:25:21.909 --> 00:25:24.320 and we really can't tell.
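This is not the researchers' code, but a toy simulation of the mechanism their experiment exploits: clicks concentrate on the top-ranked results, so quietly promoting one side's pages shifts what people end up reading. The click probabilities and page labels are assumptions for illustration.

import random

random.seed(1)
CLICK_PROB = [0.35, 0.25, 0.15, 0.10, 0.05]  # assumed position bias

def sampled_click(ranking):
    # Walk down the ranking; click each position with its probability.
    for page, p in zip(ranking, CLICK_PROB):
        if random.random() < p:
            return page
    return None  # no click this session

neutral = ["A1", "B1", "A2", "B2", "A3"]  # candidates interleaved
biased = ["A1", "A2", "A3", "B1", "B2"]   # candidate A moved up

def share_of_A(ranking, sessions=100000):
    clicks = [sampled_click(ranking) for _ in range(sessions)]
    hits = [c for c in clicks if c]
    return sum(c.startswith("A") for c in hits) / len(hits)

print("neutral:", round(share_of_A(neutral), 2))
print("biased: ", round(share_of_A(biased), 2))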
00:25:24.320 --> 00:25:27.129 So, I'm not saying that this is occurring, 00:25:27.129 --> 00:25:31.389 but right now there is no regulation to stop this, 00:25:31.389 --> 00:25:34.409 there is no way we could reliably detect this, 00:25:34.409 --> 00:25:37.210 so there's a huge amount of power here. 00:25:37.210 --> 00:25:39.779 So, something to think about. 00:25:39.779 --> 00:25:42.710 But it's not only corporations that are interested 00:25:42.710 --> 00:25:47.269 in this sort of behavioral manipulation. 00:25:47.269 --> 00:25:51.119 In 2010, UK Prime Minister David Cameron 00:25:51.119 --> 00:25:54.969 created the UK Behavioural Insights Team, 00:25:54.969 --> 00:25:57.269 which is informally called the Nudge Unit. 00:25:57.269 --> 00:26:01.489 And so what they do is they use behavioral science 00:26:01.489 --> 00:26:04.769 and this predictive analytics approach, 00:26:04.769 --> 00:26:06.119 with experimentation, 00:26:06.119 --> 00:26:07.940 to have people make better decisions 00:26:07.940 --> 00:26:09.690 for themselves and society-- 00:26:09.690 --> 00:26:11.989 as determined by the UK government. 00:26:11.989 --> 00:26:14.269 And as of a few months ago, 00:26:14.269 --> 00:26:16.849 after an executive order signed by Obama 00:26:16.849 --> 00:26:19.349 in September, the United States now has 00:26:19.349 --> 00:26:21.429 its own Nudge Unit. 00:26:21.429 --> 00:26:24.009 So, to be clear, I don't think that this is 00:26:24.009 --> 00:26:25.539 some sort of malicious plot. 00:26:25.539 --> 00:26:27.440 I think that there can be huge value 00:26:27.440 --> 00:26:29.489 in these sorts of initiatives, 00:26:29.489 --> 00:26:31.330 positively impacting people's lives, 00:26:31.330 --> 00:26:34.179 but when this sort of behavioral manipulation 00:26:34.179 --> 00:26:37.289 is being done, in part openly, 00:26:37.289 --> 00:26:39.460 oversight is pretty important, 00:26:39.460 --> 00:26:41.700 and we really need to consider 00:26:41.700 --> 00:26:46.090 what these systems are optimizing for. 00:26:46.090 --> 00:26:47.849 And that's something that we might 00:26:47.849 --> 00:26:52.090 not always know, or at least understand. 00:26:52.090 --> 00:26:54.450 For industry, for example, 00:26:54.450 --> 00:26:57.679 we do have a pretty good understanding there: 00:26:57.679 --> 00:26:59.809 industry cares about optimizing for 00:26:59.809 --> 00:27:01.960 the time spent on the website. 00:27:01.960 --> 00:27:04.929 Facebook wants you to spend more time on Facebook, 00:27:04.929 --> 00:27:06.950 they want you to click on ads, 00:27:06.950 --> 00:27:09.109 click on newsfeed items, 00:27:09.109 --> 00:27:11.299 they want you to like things. 00:27:11.299 --> 00:27:14.309 And, fundamentally: profit. 00:27:14.309 --> 00:27:17.599 So, already this has some serious implications, 00:27:17.599 --> 00:27:19.690 and this had pretty serious implications 00:27:19.690 --> 00:27:22.190 in the last 10 years, in media for example. 00:27:22.190 --> 00:27:25.119 Optimizing for click-through rate in journalism 00:27:25.119 --> 00:27:26.629 has produced a race to the bottom 00:27:26.629 --> 00:27:28.039 in terms of quality. 00:27:28.039 --> 00:27:30.919 And another issue is that optimizing 00:27:30.919 --> 00:27:34.589 for what people like might not always be 00:27:34.589 --> 00:27:35.839 the best approach.
00:27:35.839 --> 00:27:38.859 So, Facebook officials have said publicly 00:27:38.859 --> 00:27:41.279 that Facebook's goal is to make you happy: 00:27:41.279 --> 00:27:43.149 they want you to open that newsfeed 00:27:43.149 --> 00:27:45.080 and just feel great. 00:27:45.080 --> 00:27:47.379 But, there's an issue there, right? 00:27:47.379 --> 00:27:50.169 Because a lot of people-- 00:27:50.169 --> 00:27:52.369 about 40% of people, according to Pew Research-- 00:27:52.369 --> 00:27:54.599 get their news from Facebook. 00:27:54.599 --> 00:27:58.460 So, if people don't want to see 00:27:58.460 --> 00:28:01.239 war and corpses, because it makes them feel sad, 00:28:01.239 --> 00:28:04.179 then this is not a system that is gonna optimize 00:28:04.179 --> 00:28:07.149 for an informed population. 00:28:07.149 --> 00:28:09.359 It's not gonna produce a population that is 00:28:09.359 --> 00:28:11.469 ready to engage in civic life. 00:28:11.469 --> 00:28:13.059 It's gonna produce an amused population 00:28:13.059 --> 00:28:16.809 whose time is occupied by cat pictures. 00:28:16.809 --> 00:28:19.159 So, in politics, we have a similar 00:28:19.159 --> 00:28:21.269 optimization problem that's occurring. 00:28:21.269 --> 00:28:23.769 So, these political campaigns that use 00:28:23.769 --> 00:28:26.769 these predictive systems 00:28:26.769 --> 00:28:28.669 are optimizing for votes for the desired candidate, 00:28:28.669 --> 00:28:30.200 of course. 00:28:30.200 --> 00:28:33.499 So, instead of a political campaign being 00:28:33.499 --> 00:28:36.139 --well, maybe this is a naive view, but-- 00:28:36.139 --> 00:28:38.070 being an open discussion of the issues 00:28:38.070 --> 00:28:39.830 facing the country, 00:28:39.830 --> 00:28:43.200 it becomes this micro-targeted persuasion game, 00:28:43.200 --> 00:28:44.669 and the people that get targeted 00:28:44.669 --> 00:28:47.349 are a very small subset of all people, 00:28:47.349 --> 00:28:49.399 and it's only gonna be people that are, 00:28:49.399 --> 00:28:51.409 you know, on the edge, maybe disinterested-- 00:28:51.409 --> 00:28:54.399 those are the people that are gonna get attention 00:28:54.399 --> 00:28:58.839 from political candidates. 00:28:58.839 --> 00:29:01.869 In policy, as with these Nudge Units, 00:29:01.869 --> 00:29:03.539 they're being used to enable 00:29:03.539 --> 00:29:06.109 better use of government services. 00:29:06.109 --> 00:29:07.419 There are some good projects that have 00:29:07.419 --> 00:29:09.419 come out of this: 00:29:09.419 --> 00:29:11.409 increasing voter registration, 00:29:11.409 --> 00:29:12.739 improving health outcomes, 00:29:12.739 --> 00:29:14.419 improving education outcomes. 00:29:14.419 --> 00:29:16.419 But some of these predictive systems 00:29:16.419 --> 00:29:18.229 that we're starting to see in government 00:29:18.229 --> 00:29:20.700 are optimizing for compliance, 00:29:20.700 --> 00:29:23.669 as is the case with predictive policing. 00:29:23.669 --> 00:29:25.460 So this is something that we need to 00:29:25.460 --> 00:29:28.649 watch carefully. 00:29:28.649 --> 00:29:30.119 I think this is a nice quote that 00:29:30.119 --> 00:29:33.339 sort of describes the problem. 00:29:33.339 --> 00:29:35.200 In some ways we might be narrowing 00:29:35.200 --> 00:29:38.259 our horizons, and the danger is that 00:29:38.259 --> 00:29:41.989 these tools are separating people.
00:29:41.989 --> 00:29:43.570 And this is particularly bad 00:29:43.570 --> 00:29:45.940 for political action, because political action 00:29:45.940 --> 00:29:49.879 requires people to have shared experiences, 00:29:49.879 --> 00:29:53.799 so that they are able to act collectively 00:29:53.799 --> 00:29:57.629 to exert pressure to fix problems. 00:29:57.629 --> 00:30:00.810 So, finally: accountability. 00:30:00.810 --> 00:30:03.399 So, we need some oversight mechanisms, 00:30:03.399 --> 00:30:06.519 for example, in the case of errors-- 00:30:06.519 --> 00:30:08.219 so this is particularly important for 00:30:08.219 --> 00:30:10.849 civil or bureaucratic systems. 00:30:10.849 --> 00:30:14.330 So, when an algorithm produces some decision, 00:30:14.330 --> 00:30:16.549 we don't always want humans to just 00:30:16.549 --> 00:30:18.039 defer to the machine, 00:30:18.039 --> 00:30:21.859 and that might represent one of the problems. 00:30:21.859 --> 00:30:25.419 So, there are starting to be some cases 00:30:25.419 --> 00:30:28.039 of computer algorithms yielding a decision, 00:30:28.039 --> 00:30:30.409 and then humans being unable to correct 00:30:30.409 --> 00:30:31.799 an obvious error. 00:30:31.799 --> 00:30:35.190 So there's this case in Georgia, in the United States, 00:30:35.190 --> 00:30:37.259 where 2 young people went to 00:30:37.259 --> 00:30:38.529 the Department of Motor Vehicles-- 00:30:38.529 --> 00:30:39.749 they're twins-- and they went 00:30:39.749 --> 00:30:42.099 to get their driver's licenses. 00:30:42.099 --> 00:30:44.979 However, they were both flagged by 00:30:44.979 --> 00:30:47.489 a fraud algorithm that uses facial recognition 00:30:47.489 --> 00:30:48.809 to look for similar faces, 00:30:48.809 --> 00:30:50.919 and I guess the people that designed the system 00:30:50.919 --> 00:30:54.549 didn't think of the possibility of twins. 00:30:54.549 --> 00:30:58.489 Yeah. So, they just left 00:30:58.489 --> 00:30:59.889 without their driver's licenses. 00:30:59.889 --> 00:31:01.889 The people in the Department of Motor Vehicles 00:31:01.889 --> 00:31:03.809 were unable to correct this. 00:31:03.809 --> 00:31:06.820 So, this is one implication-- 00:31:06.820 --> 00:31:08.579 it's like something out of Kafka. 00:31:08.579 --> 00:31:11.529 But there are also cases of errors being made, 00:31:11.529 --> 00:31:13.879 and people not noticing until 00:31:13.879 --> 00:31:15.909 after actions have been taken, 00:31:15.909 --> 00:31:17.570 some of them very serious-- 00:31:17.570 --> 00:31:19.129 because people simply deferred 00:31:19.129 --> 00:31:20.619 to the machine. 00:31:20.619 --> 00:31:23.309 So, this is an example from San Francisco. 00:31:23.309 --> 00:31:26.679 So, an ALPR-- an Automated License Plate Reader-- 00:31:26.679 --> 00:31:29.429 is a device that uses image recognition 00:31:29.429 --> 00:31:32.099 to detect and read license plates, 00:31:32.099 --> 00:31:34.339 and usually to compare license plates 00:31:34.339 --> 00:31:37.159 with a known list of plates of interest. 00:31:37.159 --> 00:31:39.799 And, so, San Francisco uses these, 00:31:39.799 --> 00:31:42.179 and they're mounted on police cars. 00:31:42.179 --> 00:31:46.659 So, in this case, a San Francisco ALPR 00:31:46.659 --> 00:31:48.879 got a hit on a car, 00:31:48.879 --> 00:31:53.029 and it was the car of a 47-year-old woman 00:31:53.029 --> 00:31:54.839 with no criminal history.
00:31:54.839 --> 00:31:56.029 And so it was a false hit, 00:31:56.029 --> 00:31:58.099 because it was a blurry image, 00:31:58.099 --> 00:31:59.709 and it matched erroneously with 00:31:59.709 --> 00:32:00.909 one of the plates of interest 00:32:00.909 --> 00:32:03.479 that happened to be a stolen vehicle. 00:32:03.479 --> 00:32:06.869 So, they conducted a traffic stop on her, 00:32:06.869 --> 00:32:09.330 and they took her out of the vehicle, 00:32:09.330 --> 00:32:11.049 they searched her and the vehicle, 00:32:11.049 --> 00:32:12.659 she got a pat-down, 00:32:12.659 --> 00:32:14.849 and they had her kneel 00:32:14.849 --> 00:32:17.780 at gunpoint, in the street. 00:32:17.780 --> 00:32:20.989 So, how much oversight should be present 00:32:20.989 --> 00:32:23.999 depends on the implications of the system. 00:32:23.999 --> 00:32:25.279 It's certainly the case that 00:32:25.279 --> 00:32:26.910 for some of these decision-making systems, 00:32:26.910 --> 00:32:29.219 an error might not be that important, 00:32:29.219 --> 00:32:31.149 it could be relatively harmless, 00:32:31.149 --> 00:32:33.559 but in this case, an error in this algorithmic decision 00:32:33.559 --> 00:32:36.259 led to this totally innocent person 00:32:36.259 --> 00:32:40.019 literally having a gun pointed at her. 00:32:40.019 --> 00:32:44.019 So, that brings us to: we need some way of 00:32:44.019 --> 00:32:45.419 getting some information about 00:32:45.419 --> 00:32:47.249 what is going on here. 00:32:47.249 --> 00:32:50.179 We don't wanna have to wait for these events 00:32:50.179 --> 00:32:52.580 before we are able to determine 00:32:52.580 --> 00:32:54.409 some information about the system. 00:32:54.409 --> 00:32:56.139 So, auditing is one option: 00:32:56.139 --> 00:32:58.109 to independently verify the statements 00:32:58.109 --> 00:33:00.809 of companies, in situations where we have 00:33:00.809 --> 00:33:02.939 inputs and outputs. 00:33:02.939 --> 00:33:05.200 So, for example, this could be done with 00:33:05.200 --> 00:33:07.489 Google, Facebook. 00:33:07.489 --> 00:33:09.190 If you have the inputs of a system-- 00:33:09.190 --> 00:33:10.649 say you have test accounts, 00:33:10.649 --> 00:33:11.729 or real accounts-- 00:33:11.729 --> 00:33:14.359 maybe you can collect people's information together. 00:33:14.359 --> 00:33:15.830 So that was something that was done 00:33:15.830 --> 00:33:18.759 during the 2012 Obama campaign 00:33:18.759 --> 00:33:20.249 by ProPublica. 00:33:20.249 --> 00:33:21.269 People noticed that they were getting 00:33:21.269 --> 00:33:24.739 different emails from the Obama campaign, 00:33:24.739 --> 00:33:26.009 and were interested to see 00:33:26.009 --> 00:33:28.209 based on what factors 00:33:28.209 --> 00:33:29.749 the emails were changing. 00:33:29.749 --> 00:33:32.659 So, I think about 200 people submitted emails, 00:33:32.659 --> 00:33:34.940 and they were able to determine some information 00:33:34.940 --> 00:33:38.809 about what the emails were being varied based on. 00:33:38.809 --> 00:33:40.859 So there have been some successful 00:33:40.859 --> 00:33:43.080 attempts at this. 00:33:43.080 --> 00:33:45.919 So, compare inputs and then look at 00:33:45.919 --> 00:33:48.709 why one item was shown to one user 00:33:48.709 --> 00:33:50.289 and not another, and see if there are 00:33:50.289 --> 00:33:51.879 any statistical differences.
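As a sketch of that comparison step, assume you have counted how often an item was shown to two pools of accounts; a standard two-proportion z-test then says whether the difference is larger than chance. The counts below are invented for illustration.

import math

def two_proportion_z(shown_a, n_a, shown_b, n_b):
    # z-score for the difference between two observed proportions.
    p_a, p_b = shown_a / n_a, shown_b / n_b
    pooled = (shown_a + shown_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(shown_a=120, n_a=500, shown_b=75, n_b=500)
print(round(z, 2), "-> |z| > 1.96 suggests a real difference")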
00:33:51.879 --> 00:33:56.279 So, there are some potential legal issues 00:33:56.279 --> 00:33:57.749 with the test accounts, so that's something 00:33:57.749 --> 00:34:01.499 to think about-- I'm not a lawyer. 00:34:01.499 --> 00:34:03.919 So, for example, if you wanna examine 00:34:03.919 --> 00:34:06.269 ad-targeting algorithms, 00:34:06.269 --> 00:34:07.969 one way to proceed is to construct 00:34:07.969 --> 00:34:10.589 a browsing profile, and then examine 00:34:10.589 --> 00:34:12.989 what ads are served back to you. 00:34:12.989 --> 00:34:14.119 And so this is something that 00:34:14.119 --> 00:34:16.250 academic researchers have looked at, 00:34:16.250 --> 00:34:17.489 because, at the time at least, 00:34:17.489 --> 00:34:20.879 you didn't need to make an account to do this. 00:34:20.879 --> 00:34:24.768 So, this was a study that was presented at 00:34:24.768 --> 00:34:27.799 Privacy Enhancing Technologies last year, 00:34:27.799 --> 00:34:31.149 and in this study, the researchers 00:34:31.149 --> 00:34:33.179 generate some browsing profiles 00:34:33.179 --> 00:34:35.909 that differ only by one characteristic, 00:34:35.909 --> 00:34:37.690 so they're basically identical in every way 00:34:37.690 --> 00:34:39.049 except for one thing, 00:34:39.049 --> 00:34:42.359 and that is denoted by Treatment 1 and 2. 00:34:42.359 --> 00:34:44.460 So this is a randomized, controlled trial, 00:34:44.460 --> 00:34:46.389 but I left out the randomization part 00:34:46.389 --> 00:34:48.220 for simplicity. 00:34:48.220 --> 00:34:54.799 So, in one study, they applied a treatment of gender. 00:34:54.799 --> 00:34:56.799 So, they had the browsing profiles 00:34:56.799 --> 00:34:59.319 in Treatment 1 be male browsing profiles, 00:34:59.319 --> 00:35:02.029 and the browsing profiles in Treatment 2 be female. 00:35:02.029 --> 00:35:04.430 And they wanted to see: is there any difference 00:35:04.430 --> 00:35:06.079 in the way that ads are targeted 00:35:06.079 --> 00:35:08.710 if browsing profiles are effectively identical 00:35:08.710 --> 00:35:11.019 except for gender? 00:35:11.019 --> 00:35:14.710 So, it turns out that there was. 00:35:14.710 --> 00:35:19.180 So, a 3rd-party site was showing Google ads 00:35:19.180 --> 00:35:21.289 for senior executive positions 00:35:21.289 --> 00:35:23.980 at a rate 6 times higher for the fake men 00:35:23.980 --> 00:35:27.059 than for the fake women in this study. 00:35:27.059 --> 00:35:30.109 So, this sort of auditing is not going to 00:35:30.109 --> 00:35:32.779 be able to determine everything 00:35:32.779 --> 00:35:34.930 that algorithms are doing, but it can 00:35:34.930 --> 00:35:36.519 sometimes uncover interesting, 00:35:36.519 --> 00:35:40.900 at least statistical, differences. 00:35:40.900 --> 00:35:47.099 So, this leads us to the fundamental issue: 00:35:47.099 --> 00:35:49.180 Right now, we're really not in control 00:35:49.180 --> 00:35:50.510 of some of these systems, 00:35:50.510 --> 00:35:54.480 and we really need these predictive systems 00:35:54.480 --> 00:35:56.119 to be controlled by us, 00:35:56.119 --> 00:35:57.819 in order for them not to be used 00:35:57.819 --> 00:36:00.109 as a system of control. 00:36:00.109 --> 00:36:03.220 So there are some technologies that I'd like 00:36:03.220 --> 00:36:06.890 to point you all to. 00:36:06.890 --> 00:36:08.319 We need tools in the digital commons 00:36:08.319 --> 00:36:11.160 that can help address some of these concerns.
00:36:11.160 --> 00:36:13.349 So, the first thing is that of course 00:36:13.349 --> 00:36:14.730 we know that minimizing the amount of 00:36:14.730 --> 00:36:17.069 data available can help in some contexts, 00:36:17.069 --> 00:36:18.980 which we can do by making systems 00:36:18.980 --> 00:36:22.779 that are private by design, and by default. 00:36:22.779 --> 00:36:24.549 Another thing is that these audit tools 00:36:24.549 --> 00:36:25.890 might be useful. 00:36:25.890 --> 00:36:30.720 And, so, these 2 nice examples in academia... 00:36:30.720 --> 00:36:34.359 the ad experiment that I just showed was done 00:36:34.359 --> 00:36:36.120 using AdFisher. 00:36:36.120 --> 00:36:38.200 So, these are 2 toolkits that you can use 00:36:38.200 --> 00:36:41.440 to start doing this sort of auditing. 00:36:41.440 --> 00:36:44.579 Another technology that is generally useful, 00:36:44.579 --> 00:36:46.700 but particularly in the case of prediction 00:36:46.700 --> 00:36:48.789 it's useful to maintain access to 00:36:48.789 --> 00:36:50.289 as many sites as possible, 00:36:50.289 --> 00:36:52.589 through anonymity systems like Tor, 00:36:52.589 --> 00:36:54.319 because it's impossible to personalize 00:36:54.319 --> 00:36:55.650 when everyone looks the same. 00:36:55.650 --> 00:36:59.130 So this is a very important technology. 00:36:59.130 --> 00:37:01.519 Something that doesn't really exist, 00:37:01.519 --> 00:37:03.630 but that I think is pretty important, 00:37:03.630 --> 00:37:05.829 is having some tool to view the landscape. 00:37:05.829 --> 00:37:08.160 So, as we know from these few studies 00:37:08.160 --> 00:37:10.440 that have been done, 00:37:10.440 --> 00:37:12.059 different people are not seeing the internet 00:37:12.059 --> 00:37:12.950 in the same way. 00:37:12.950 --> 00:37:15.730 This is one reason why we don't like censorship. 00:37:15.730 --> 00:37:17.880 But, rich and poor people, 00:37:17.880 --> 00:37:19.659 from academic research we know that 00:37:19.659 --> 00:37:23.790 there is widespread price discrimination on the internet, 00:37:23.790 --> 00:37:25.650 so rich and poor people see a different view 00:37:25.650 --> 00:37:26.970 of the Internet, 00:37:26.970 --> 00:37:28.400 men and women see a different view 00:37:28.400 --> 00:37:29.940 of the Internet. 00:37:29.940 --> 00:37:31.200 We wanna know how different people 00:37:31.200 --> 00:37:32.450 see the same site, 00:37:32.450 --> 00:37:34.329 and this could be the beginning of 00:37:34.329 --> 00:37:36.329 a defense system for this sort of 00:37:36.329 --> 00:37:41.730 manipulation/tampering that I showed earlier. 00:37:41.730 --> 00:37:45.549 Another interesting approach is obfuscation: 00:37:45.549 --> 00:37:46.980 injecting noise into the system. 00:37:46.980 --> 00:37:49.190 So there's an interesting browser extension 00:37:49.190 --> 00:37:51.720 called AdNauseam, that's for Firefox, 00:37:51.720 --> 00:37:54.579 which clicks on every single ad you're served, 00:37:54.579 --> 00:37:55.680 to inject noise. 00:37:55.680 --> 00:37:57.019 So that's, I think, an interesting approach 00:37:57.019 --> 00:38:00.170 that people haven't looked at too much. 
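A toy model of why the noise-injection idea works; this is my own illustration, not how AdNauseam is implemented. A tracker that profiles a user by counting topic visits loses most of its signal once uniformly random visits are mixed in:

```python
import numpy as np

rng = np.random.default_rng(1)
topics = ["health", "politics", "shopping", "travel", "tech"]

# A user whose real browsing is heavily concentrated on one topic.
real_visits = rng.choice(topics, size=100, p=[0.7, 0.1, 0.1, 0.05, 0.05])

def inferred_profile(visits):
    """The 'tracker': normalize visit counts into an interest profile."""
    return {t: float(np.mean(visits == t)) for t in topics}

print("without noise:", inferred_profile(real_visits))

# Obfuscation: inject uniformly random visits, here 4x the real traffic.
noise = rng.choice(topics, size=400)
obfuscated = np.concatenate([real_visits, noise])

# The dominant topic's share collapses toward uniform, so there is
# much less personalization signal left to exploit.
print("with noise:   ", inferred_profile(obfuscated))
```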
00:38:00.170 --> 00:38:03.780 So in terms of policy, 00:38:03.780 --> 00:38:06.530 Facebook and Google, these internet giants, 00:38:06.530 --> 00:38:08.829 have billions of users, 00:38:08.829 --> 00:38:12.220 and sometimes they like to call themselves 00:38:12.220 --> 00:38:13.769 new public utilities, 00:38:13.769 --> 00:38:15.000 and if that's the case then 00:38:15.000 --> 00:38:17.549 it might be necessary to subject them 00:38:17.549 --> 00:38:20.539 to additional regulation. 00:38:20.539 --> 00:38:21.990 Another problem that's come up, 00:38:21.990 --> 00:38:23.539 for example with some of the studies 00:38:23.539 --> 00:38:24.900 that Facebook has done, 00:38:24.900 --> 00:38:29.039 is sometimes a lack of ethics review. 00:38:29.039 --> 00:38:31.059 So, for example, in academia, 00:38:31.059 --> 00:38:33.859 if you're gonna do research involving humans, 00:38:33.859 --> 00:38:35.390 there's an Institutional Review Board 00:38:35.390 --> 00:38:36.970 that you go to that verifies that 00:38:36.970 --> 00:38:39.140 you're doing things in an ethical manner. 00:38:39.140 --> 00:38:40.910 And some companies do have internal 00:38:40.910 --> 00:38:43.029 review processes like this, but it might 00:38:43.029 --> 00:38:45.119 be important to have an independent 00:38:45.119 --> 00:38:48.200 ethics board that does this sort of thing. 00:38:48.200 --> 00:38:50.849 And we really need 3rd-party auditing. 00:38:50.849 --> 00:38:54.519 So, for example, some companies 00:38:54.519 --> 00:38:56.220 don't want auditing to be done 00:38:56.220 --> 00:38:59.190 because of IP concerns, 00:38:59.190 --> 00:39:00.579 and if that's the concern 00:39:00.579 --> 00:39:03.180 maybe having a set of people 00:39:03.180 --> 00:39:05.680 that are not paid by the company 00:39:05.680 --> 00:39:07.200 to check how some of these systems 00:39:07.200 --> 00:39:08.640 are being implemented, 00:39:08.640 --> 00:39:11.240 could help give us confidence that 00:39:11.240 --> 00:39:16.979 things are being done in a reasonable way. 00:39:16.979 --> 00:39:20.269 So, in closing, 00:39:20.269 --> 00:39:23.180 algorithmic decision making is here, 00:39:23.180 --> 00:39:26.140 and it's barreling forward at a very fast rate, 00:39:26.140 --> 00:39:27.890 and we need to figure out what 00:39:27.890 --> 00:39:30.410 the guide rails should be, 00:39:30.410 --> 00:39:31.380 and how to install them 00:39:31.380 --> 00:39:33.119 to handle some of the potential threats. 00:39:33.119 --> 00:39:35.470 There's a huge amount of power here. 00:39:35.470 --> 00:39:37.910 We need more openness in these systems. 00:39:37.910 --> 00:39:39.589 And, right now, 00:39:39.589 --> 00:39:41.559 with the intelligent systems that do exist, 00:39:41.559 --> 00:39:43.920 we don't know what's occurring really, 00:39:43.920 --> 00:39:46.510 and we need to watch carefully 00:39:46.510 --> 00:39:49.099 where and how these systems are being used. 00:39:49.099 --> 00:39:50.690 And I think this community has 00:39:50.690 --> 00:39:53.940 an important role to play in this fight, 00:39:53.940 --> 00:39:55.730 to study what's being done, 00:39:55.730 --> 00:39:57.160 to show people what's being done, 00:39:57.160 --> 00:39:58.670 to raise the debate and advocate, 00:39:58.670 --> 00:40:01.200 and, where necessary, to resist. 00:40:01.200 --> 00:40:03.339 Thanks. 00:40:03.339 --> 00:40:13.129 applause 00:40:13.129 --> 00:40:17.519 Herald: So, let's have a question and answer. 00:40:17.519 --> 00:40:19.080 Microphone 2, please. 
00:40:19.080 --> 00:40:20.199 Mic 2: Hi there. 00:40:20.199 --> 00:40:23.259 Thanks for the talk. 00:40:23.259 --> 00:40:26.230 Since this pre-crime software has also 00:40:26.230 --> 00:40:27.359 arrived here in Germany 00:40:27.359 --> 00:40:29.680 with the start of the so-called CopWatch system 00:40:29.680 --> 00:40:32.779 in southern Germany, in Bavaria and Nuremberg especially, 00:40:32.779 --> 00:40:35.420 where they try to predict burglary crime 00:40:35.420 --> 00:40:37.460 using that criminal-record 00:40:37.460 --> 00:40:40.170 geographical analysis, like you explained, 00:40:40.170 --> 00:40:43.380 this leads me to a 2-fold question: 00:40:43.380 --> 00:40:47.900 first, have you heard of any research 00:40:47.900 --> 00:40:49.760 that measures the effectiveness 00:40:49.760 --> 00:40:53.690 of such measures, at all? 00:40:53.690 --> 00:40:57.040 And, second: 00:40:57.040 --> 00:41:00.599 What do you think of the game theory 00:41:00.599 --> 00:41:02.690 if the thieves or the bad guys 00:41:02.690 --> 00:41:07.619 know the system, and when they game the system, 00:41:07.619 --> 00:41:09.980 they will probably win, 00:41:09.980 --> 00:41:11.640 since one police officer in an interview said 00:41:11.640 --> 00:41:14.019 this system is used to reduce 00:41:14.019 --> 00:41:16.460 the personal costs of policing, 00:41:16.460 --> 00:41:19.460 so they just send the guys where the red flags are, 00:41:19.460 --> 00:41:22.290 and the others take the day off. 00:41:22.290 --> 00:41:24.360 Dr. Helsby: Yup. 00:41:24.360 --> 00:41:27.150 Um, so, with respect to 00:41:27.150 --> 00:41:30.990 testing the effectiveness of predictive policing, 00:41:30.990 --> 00:41:31.990 the companies, 00:41:31.990 --> 00:41:33.910 some of them do randomized, controlled trials 00:41:33.910 --> 00:41:35.240 and claim a reduction in crime. 00:41:35.240 --> 00:41:38.349 The best independent study that I've seen 00:41:38.349 --> 00:41:40.680 is by the RAND Corporation, 00:41:40.680 --> 00:41:43.120 which did a study in, I think, 00:41:43.120 --> 00:41:44.920 Shreveport, Louisiana, 00:41:44.920 --> 00:41:47.589 and in their report they claim 00:41:47.589 --> 00:41:50.190 that there was no statistically significant 00:41:50.190 --> 00:41:52.900 difference, they didn't find any reduction. 00:41:52.900 --> 00:41:54.099 And it was specifically looking at 00:41:54.099 --> 00:41:56.730 property crime, which I think you mentioned. 00:41:56.730 --> 00:41:59.480 So, I think right now there's sort of 00:41:59.480 --> 00:42:01.069 conflicting reports between 00:42:01.069 --> 00:42:06.180 the independent auditors and these company claims. 00:42:06.180 --> 00:42:09.289 So there definitely needs to be more study. 00:42:09.289 --> 00:42:12.240 And then, the 2nd thing...sorry, remind me what it was? 00:42:12.240 --> 00:42:15.189 Mic 2: What about the guys gaming the system? 00:42:15.189 --> 00:42:16.949 Dr. Helsby: Oh, yeah. 00:42:16.949 --> 00:42:18.900 I think it's a legitimate concern. 00:42:18.900 --> 00:42:22.480 Like, if all the outputs were just immediately public, 00:42:22.480 --> 00:42:24.599 then, yes, everyone knows the location 00:42:24.599 --> 00:42:26.549 of all police officers, 00:42:26.549 --> 00:42:29.009 and I imagine that people would have 00:42:29.009 --> 00:42:30.779 a problem with that. 00:42:30.779 --> 00:42:32.679 Yup. 00:42:32.679 --> 00:42:35.990 Herald: Microphone #4, please. 00:42:35.990 --> 00:42:39.369 Mic 4: Yeah, this is not actually a question, 00:42:39.369 --> 00:42:40.779 but just a comment. 
00:42:40.779 --> 00:42:42.970 I've enjoyed your talk very much, 00:42:42.970 --> 00:42:47.789 in particular after watching 00:42:47.789 --> 00:42:52.270 the talk in Hall 1 earlier in the afternoon. 00:42:52.270 --> 00:42:55.730 The "Say Hi to Your New Boss", about 00:42:55.730 --> 00:42:59.609 algorithms that are trained with big data, 00:42:59.609 --> 00:43:02.390 and finally make decisions. 00:43:02.390 --> 00:43:08.210 And I think these 2 talks are kind of complementary, 00:43:08.210 --> 00:43:11.309 and if people are interested in the topic 00:43:11.309 --> 00:43:14.710 they might want to check out the other talk 00:43:14.710 --> 00:43:16.259 and watch it later, because these 00:43:16.259 --> 00:43:17.319 fit very well together. 00:43:17.319 --> 00:43:19.589 Dr. Helsby: Yeah, it was a great talk. 00:43:19.589 --> 00:43:22.130 Herald: Microphone #2, please. 00:43:22.130 --> 00:43:25.049 Mic 2: Um, yeah, you mentioned 00:43:25.049 --> 00:43:27.319 the need to have some kind of 3rd-party auditing 00:43:27.319 --> 00:43:30.900 or some kind of way to 00:43:30.900 --> 00:43:31.930 peek into these algorithms 00:43:31.930 --> 00:43:33.079 and to see what they're doing, 00:43:33.079 --> 00:43:34.420 and to see if they're being fair. 00:43:34.420 --> 00:43:36.199 Can you talk a little bit more about that? 00:43:36.199 --> 00:43:38.059 Like, going forward, 00:43:38.059 --> 00:43:40.690 some kind of regulatory structures 00:43:40.690 --> 00:43:44.200 would probably have to emerge 00:43:44.200 --> 00:43:47.200 to analyze and to look at 00:43:47.200 --> 00:43:49.339 these black boxes that are just sort of 00:43:49.339 --> 00:43:51.309 popping up everywhere and, you know, 00:43:51.309 --> 00:43:52.939 controlling more and more of the things 00:43:52.939 --> 00:43:56.150 in our lives, and important decisions. 00:43:56.150 --> 00:43:58.539 So, just, what kind of discussions 00:43:58.539 --> 00:43:59.460 are there for that? 00:43:59.460 --> 00:44:01.809 And what kind of possibility is there for that? 00:44:01.809 --> 00:44:04.900 And, I'm sure that companies would be 00:44:04.900 --> 00:44:08.000 very, very resistant to 00:44:08.000 --> 00:44:09.890 any kind of attempt to look into 00:44:09.890 --> 00:44:13.890 algorithms, and to... 00:44:13.890 --> 00:44:15.070 Dr. Helsby: Yeah, I mean, definitely 00:44:15.070 --> 00:44:18.069 companies would be very resistant to 00:44:18.069 --> 00:44:19.670 having people look into their algorithms. 00:44:19.670 --> 00:44:22.190 So, if you wanna do a very rigorous 00:44:22.190 --> 00:44:23.339 audit of what's going on 00:44:23.339 --> 00:44:25.660 then it's probably necessary to have 00:44:25.660 --> 00:44:26.589 a few people come in 00:44:26.589 --> 00:44:28.900 and sign NDAs, and then 00:44:28.900 --> 00:44:31.039 look through the systems. 00:44:31.039 --> 00:44:33.140 So, that's one way to proceed. 
00:44:33.140 --> 00:44:35.049 But, another way to proceed that-- 00:44:35.049 --> 00:44:38.720 so, these academic researchers have done 00:44:38.720 --> 00:44:40.009 a few experiments 00:44:40.009 --> 00:44:42.809 and found some interesting things, 00:44:42.809 --> 00:44:45.500 and that's sort of all the attempts at auditing 00:44:45.500 --> 00:44:46.450 that we've seen: 00:44:46.450 --> 00:44:48.490 there was 1 attempt in 2012, for the Obama campaign, 00:44:48.490 --> 00:44:49.910 but there's really not been any 00:44:49.910 --> 00:44:51.500 sort of systematic attempt-- 00:44:51.500 --> 00:44:52.589 you know, like, in censorship 00:44:52.589 --> 00:44:54.539 we see a systematic attempt to 00:44:54.539 --> 00:44:56.779 do measurement as often as possible, 00:44:56.779 --> 00:44:58.240 check what's going on, 00:44:58.240 --> 00:44:59.339 and that itself, you know, 00:44:59.339 --> 00:45:00.900 can act as an oversight mechanism. 00:45:00.900 --> 00:45:01.880 But, right now, 00:45:01.880 --> 00:45:03.900 I think many of these companies 00:45:03.900 --> 00:45:05.259 realize no one is watching, 00:45:05.259 --> 00:45:07.160 so there's no real push to have 00:45:07.160 --> 00:45:10.440 people verify: are you being fair when you 00:45:10.440 --> 00:45:11.539 implement this system? 00:45:11.539 --> 00:45:12.969 Because no one's really checking. 00:45:12.969 --> 00:45:13.980 Mic 2: Do you think that, 00:45:13.980 --> 00:45:15.339 at some point, it would be like 00:45:15.339 --> 00:45:19.059 an FDA or SEC, to give some American examples... 00:45:19.059 --> 00:45:21.490 an actual government regulatory agency 00:45:21.490 --> 00:45:24.960 that has the power and ability to 00:45:24.960 --> 00:45:27.930 not just sort of look and try to 00:45:27.930 --> 00:45:31.710 reverse engineer some of these algorithms, 00:45:31.710 --> 00:45:33.920 but actually peek in there and make sure 00:45:33.920 --> 00:45:36.420 that things are fair, because it seems like 00:45:36.420 --> 00:45:38.240 there's just-- it's so important now 00:45:38.240 --> 00:45:41.769 that, again, it could be the difference between 00:45:41.769 --> 00:45:42.930 life and death, between 00:45:42.930 --> 00:45:44.589 getting a job, not getting a job, 00:45:44.589 --> 00:45:46.130 being pulled over, not being pulled over, 00:45:46.130 --> 00:45:48.069 being racially profiled, not racially profiled, 00:45:48.069 --> 00:45:49.410 things like that. Dr. Helsby: Right. 00:45:49.410 --> 00:45:50.430 Mic 2: Is it moving in that direction? 00:45:50.430 --> 00:45:52.249 Or is it way too early for it? 00:45:52.249 --> 00:45:55.110 Dr. Helsby: I mean, so some people have... 00:45:55.110 --> 00:45:56.859 someone has called for, like, 00:45:56.859 --> 00:45:59.079 a Federal Search Commission, 00:45:59.079 --> 00:46:00.930 or like a Federal Algorithms Commission, 00:46:00.930 --> 00:46:03.200 that would do this sort of oversight work, 00:46:03.200 --> 00:46:06.130 but it's in such early stages right now 00:46:06.130 --> 00:46:09.970 that there's no real push for that. 00:46:09.970 --> 00:46:13.330 But I think it's a good idea. 00:46:13.330 --> 00:46:15.729 Herald: And again, #2 please. 00:46:15.729 --> 00:46:17.059 Mic 2: Thank you again for your talk. 
00:46:17.059 --> 00:46:19.309 I was just curious if you can point 00:46:19.309 --> 00:46:20.440 to any examples of 00:46:20.440 --> 00:46:22.619 either current producers or consumers 00:46:22.619 --> 00:46:24.029 of these algorithmic systems 00:46:24.029 --> 00:46:26.390 who are actively and publicly trying 00:46:26.390 --> 00:46:27.720 to do so in a responsible manner 00:46:27.720 --> 00:46:29.720 by describing what they're trying to do 00:46:29.720 --> 00:46:31.380 and how they're going about it? 00:46:31.380 --> 00:46:37.210 Dr. Helsby: So, yeah, there are some companies, 00:46:37.210 --> 00:46:39.000 for example, like DataKind, 00:46:39.000 --> 00:46:42.710 that try to deploy algorithmic systems 00:46:42.710 --> 00:46:44.640 in as responsible a way as possible, 00:46:44.640 --> 00:46:47.250 for like public policy. 00:46:47.250 --> 00:46:49.549 Like, I actually also implement systems 00:46:49.549 --> 00:46:51.750 for public policy in a transparent way. 00:46:51.750 --> 00:46:54.329 Like, all the code is in GitHub, etc. 00:46:54.329 --> 00:47:00.020 And, to give credit to 00:47:00.020 --> 00:47:01.990 Google and these giants, 00:47:01.990 --> 00:47:06.109 it is also the case that they're trying to implement 00:47:06.109 --> 00:47:08.170 transparency systems that help you understand. 00:47:08.170 --> 00:47:09.289 This has been done with respect to 00:47:09.289 --> 00:47:12.329 how your data is being collected, 00:47:12.329 --> 00:47:14.579 but for example if you go on Amazon.com 00:47:14.579 --> 00:47:17.890 you can see a recommendation has been made, 00:47:17.890 --> 00:47:19.420 and that is pretty transparent. 00:47:19.420 --> 00:47:21.480 You can see "this item was recommended to me," 00:47:21.480 --> 00:47:25.039 so you know that prediction is being used in this case, 00:47:25.039 --> 00:47:27.089 and it will say why prediction is being used: 00:47:27.089 --> 00:47:29.230 because you purchased some item. 00:47:29.230 --> 00:47:30.380 And Google has a similar thing, 00:47:30.380 --> 00:47:32.420 if you go to like Google Ad Settings, 00:47:32.420 --> 00:47:35.249 you can even turn off personalization of ads 00:47:35.249 --> 00:47:36.380 if you want, 00:47:36.380 --> 00:47:38.119 and you can also see some of the inferences 00:47:38.119 --> 00:47:39.400 that have been learned about you. 00:47:39.400 --> 00:47:40.819 A subset of the inferences that have been 00:47:40.819 --> 00:47:41.700 learned about you. 00:47:41.700 --> 00:47:43.940 So, like, what interests... 00:47:43.940 --> 00:47:47.869 Herald: A question from the internet, please? 00:47:47.869 --> 00:47:50.930 Signal Angel: Yes, billetQ is asking 00:47:50.930 --> 00:47:54.479 how do you avoid biases in machine learning? 00:47:54.479 --> 00:47:57.380 I assume an analysis system, for example, 00:47:57.380 --> 00:48:00.420 could be biased against women and minorities, 00:48:00.420 --> 00:48:04.960 if used for hiring decisions based on known data. 00:48:04.960 --> 00:48:06.499 Dr. Helsby: Yeah, so one thing is to 00:48:06.499 --> 00:48:08.529 just explicitly check. 00:48:08.529 --> 00:48:12.199 So, you can check to see how 00:48:12.199 --> 00:48:14.309 positive outcomes are being distributed 00:48:14.309 --> 00:48:16.779 among those protected classes. 
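A minimal sketch of the explicit check described above, assuming you have a system's decisions and a protected attribute for each person; the data and the 0.8 rule-of-thumb threshold (borrowed from the US four-fifths guideline) are illustrative:

```python
import numpy as np

# Hypothetical audit data: 1 = positive outcome (e.g. shortlisted for a job).
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
group     = np.array(["m", "m", "m", "m", "m", "m",
                      "f", "f", "f", "f", "f", "f"])

# Demographic parity check: compare positive-outcome rates across groups.
rates = {}
for g in np.unique(group):
    rates[g] = decisions[group == g].mean()
    print(f"group {g}: positive rate = {rates[g]:.2f}")

ratio = min(rates.values()) / max(rates.values())
print(f"disparate impact ratio = {ratio:.2f}")
if ratio < 0.8:
    print("Flag for review: outcomes look skewed across protected classes.")
```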
00:48:16.779 --> 00:48:19.210 You could also incorporate these sorts of 00:48:19.210 --> 00:48:21.440 fairness constraints in the function 00:48:21.440 --> 00:48:24.069 that you optimize when you train the system, 00:48:24.069 --> 00:48:25.950 and so, if you're interested in reading more 00:48:25.950 --> 00:48:28.960 about this, the 2 papers-- 00:48:28.960 --> 00:48:31.909 let me go to References-- 00:48:31.909 --> 00:48:32.730 there's a good paper called 00:48:32.730 --> 00:48:35.339 Fairness Through Awareness that describes 00:48:35.339 --> 00:48:37.499 how to go about doing this, 00:48:37.499 --> 00:48:39.579 so I recommend this person read that. 00:48:39.579 --> 00:48:40.970 It's good. 00:48:40.970 --> 00:48:43.400 Herald: Microphone 2, please. 00:48:43.400 --> 00:48:45.400 Mic 2: Thanks again for your talk. 00:48:45.400 --> 00:48:49.649 Umm, hello? 00:48:49.649 --> 00:48:50.999 Okay. 00:48:50.999 --> 00:48:52.960 Umm, I see of course a problem with 00:48:52.960 --> 00:48:54.619 all the black boxes that you describe 00:48:54.619 --> 00:48:57.069 with regard to the crime systems, 00:48:57.069 --> 00:48:59.569 but when we look at the advertising systems 00:48:59.569 --> 00:49:02.169 in many cases they are very networked. 00:49:02.169 --> 00:49:04.160 There are many different systems collaborating 00:49:04.160 --> 00:49:07.109 and exchanging data via open APIs: 00:49:07.109 --> 00:49:08.720 RESTful APIs, and various 00:49:08.720 --> 00:49:11.720 demand-side platforms and audience-exchange platforms, 00:49:11.720 --> 00:49:12.539 and everything. 00:49:12.539 --> 00:49:15.420 So, can that help to at least 00:49:15.420 --> 00:49:22.160 increase awareness on where targeting, personalization 00:49:22.160 --> 00:49:23.679 might be happening? 00:49:23.679 --> 00:49:26.190 I mean, I'm looking at systems like 00:49:26.190 --> 00:49:29.539 BuiltWith, that surface what kind of 00:49:29.539 --> 00:49:31.380 JavaScript libraries are used elsewhere. 00:49:31.380 --> 00:49:32.999 So, is that something that could help 00:49:32.999 --> 00:49:35.670 at least to give a better awareness 00:49:35.670 --> 00:49:38.690 and listing all the points where 00:49:38.690 --> 00:49:41.409 you might be targeted... 00:49:41.409 --> 00:49:43.070 Dr. Helsby: So, like, with respect to 00:49:43.070 --> 00:49:46.460 advertising, the fact that there is behind the scenes 00:49:46.460 --> 00:49:48.450 this like complicated auction process 00:49:48.450 --> 00:49:50.650 that's occurring, just makes things 00:49:50.650 --> 00:49:51.819 a lot more complicated. 00:49:51.819 --> 00:49:54.170 So, for example, I said briefly 00:49:54.170 --> 00:49:57.269 that they found that there's this statistical difference 00:49:57.269 --> 00:49:59.099 between how men and women are treated, 00:49:59.099 --> 00:50:01.339 but it doesn't necessarily mean that 00:50:01.339 --> 00:50:03.640 "Oh, the algorithm is definitely biased." 00:50:03.640 --> 00:50:06.369 It could be because of this auction process, 00:50:06.369 --> 00:50:10.569 it could be that women are considered 00:50:10.569 --> 00:50:12.630 more valuable when it comes to advertising, 00:50:12.630 --> 00:50:15.099 and so these executive ads are getting 00:50:15.099 --> 00:50:17.160 outbid by some other ads, 00:50:17.160 --> 00:50:18.890 and so there's a lot of potential 00:50:18.890 --> 00:50:20.490 causes for that. 00:50:20.490 --> 00:50:22.829 So, I think it just makes things a lot more complicated. 00:50:22.829 --> 00:50:25.910 I don't know if it helps with the bias at all. 
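One rough sketch of the fairness-constraint idea mentioned at the start of this answer: a soft demographic-parity penalty added to a logistic-regression loss. This is my own illustration of the general technique, not the specific construction from the Fairness Through Awareness paper, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: features X, labels y, and a protected attribute.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
protected = rng.random(n) < 0.5  # True = group A, False = group B

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr, lam = 0.1, 5.0  # lam trades accuracy against the fairness penalty

for _ in range(500):
    p = sigmoid(X @ w)
    # Gradient of the ordinary log-loss.
    grad = X.T @ (p - y) / n
    # Soft demographic-parity penalty: lam * (mean score gap)^2 between
    # the two groups, differentiated with respect to w.
    gap = p[protected].mean() - p[~protected].mean()
    d_gap = (X[protected].T @ (p[protected] * (1 - p[protected])) / protected.sum()
             - X[~protected].T @ (p[~protected] * (1 - p[~protected])) / (~protected).sum())
    grad += lam * 2.0 * gap * d_gap
    w -= lr * grad

p = sigmoid(X @ w)
print(f"score gap between groups: {p[protected].mean() - p[~protected].mean():+.3f}")
```

Raising lam pushes the groups' mean scores together at some cost in accuracy, which is exactly the trade-off such constraints make explicit.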
00:50:25.910 --> 00:50:27.410 Mic 2: Well, the question was more 00:50:27.410 --> 00:50:30.299 in a direction... can it help to surface 00:50:30.299 --> 00:50:32.499 and make people aware of that fact? 00:50:32.499 --> 00:50:34.930 I mean, I can talk to my kids probably, 00:50:34.930 --> 00:50:36.259 and they will probably understand, 00:50:36.259 --> 00:50:38.420 but I can't explain that to my grandma, 00:50:38.420 --> 00:50:43.150 who's also, umm, looking at an iPad. 00:50:43.150 --> 00:50:44.289 Dr. Helsby: So, the fact that 00:50:44.289 --> 00:50:45.690 the systems are... 00:50:45.690 --> 00:50:48.509 I don't know if I understand. 00:50:48.509 --> 00:50:50.529 Mic 2: OK. I think that the main problem 00:50:50.529 --> 00:50:53.710 is that we are behind the industry efforts 00:50:53.710 --> 00:50:57.179 that target us, and many people 00:50:57.179 --> 00:51:00.579 do know, but a lot more people don't know, 00:51:00.579 --> 00:51:03.160 and making them aware of the fact 00:51:03.160 --> 00:51:07.269 that they are a target, in a way, 00:51:07.269 --> 00:51:10.990 is something that can only be shown 00:51:10.990 --> 00:51:14.779 by a 3rd party that has that data at its disposal, 00:51:14.779 --> 00:51:16.339 and makes audits in a way-- 00:51:16.339 --> 00:51:17.929 maybe in an automated way. 00:51:17.929 --> 00:51:19.170 Dr. Helsby: Right. 00:51:19.170 --> 00:51:21.410 Yeah, I think it certainly could help with advocacy 00:51:21.410 --> 00:51:23.059 if that's the point, yeah. 00:51:23.059 --> 00:51:26.079 Herald: Another question from the internet, please. 00:51:26.079 --> 00:51:29.319 Signal Angel: Yes, on IRC they are asking 00:51:29.319 --> 00:51:31.440 if we know that prediction in some cases 00:51:31.440 --> 00:51:34.460 provides an influence that cannot be controlled. 00:51:34.460 --> 00:51:38.480 So, r4v5 would like to know from you 00:51:38.480 --> 00:51:41.519 if there are some cases or areas where 00:51:41.519 --> 00:51:45.060 machine learning simply shouldn't go? 00:51:45.060 --> 00:51:48.349 Dr. Helsby: Umm, so I think... 00:51:48.349 --> 00:51:52.559 I mean, yes, I think that it is the case 00:51:52.559 --> 00:51:54.650 that in some cases machine learning 00:51:54.650 --> 00:51:56.180 might not be appropriate. 00:51:56.180 --> 00:51:58.359 For example, if you use machine learning 00:51:58.359 --> 00:52:00.970 to decide who should be searched. 00:52:00.970 --> 00:52:02.619 I don't think it should be the case that 00:52:02.619 --> 00:52:03.809 machine learning algorithms should 00:52:03.809 --> 00:52:05.440 ever be used to determine 00:52:05.440 --> 00:52:08.430 probable cause, or something like that. 00:52:08.430 --> 00:52:12.339 So, if it's just one piece of evidence 00:52:12.339 --> 00:52:13.299 that you consider, 00:52:13.299 --> 00:52:14.990 and there's human oversight always, 00:52:14.990 --> 00:52:18.519 maybe it's fine, but 00:52:18.519 --> 00:52:20.839 we should be very suspicious and hesitant 00:52:20.839 --> 00:52:22.119 in certain contexts where 00:52:22.119 --> 00:52:24.529 the ramifications are very serious. 00:52:24.529 --> 00:52:27.259 Like the No Fly List, and so on. 00:52:27.259 --> 00:52:29.200 Herald: And #2 again. 00:52:29.200 --> 00:52:30.809 Mic 2: A second question 00:52:30.809 --> 00:52:33.509 that just occurred to me, if you don't mind. 
00:52:33.509 --> 00:52:35.339 Umm, until the advent of 00:52:35.339 --> 00:52:36.559 algorithmic systems, 00:52:36.559 --> 00:52:40.470 when there've been cases of serious harm 00:52:40.470 --> 00:52:42.799 that have resulted for individuals or groups, 00:52:42.799 --> 00:52:44.579 and it's been demonstrated that 00:52:44.579 --> 00:52:46.029 it's occurred because of 00:52:46.029 --> 00:52:49.400 an individual or a system of people 00:52:49.400 --> 00:52:53.019 being systematically biased, then often 00:52:53.019 --> 00:52:55.130 one of the actions that's taken is 00:52:55.130 --> 00:52:56.869 pressure's applied, and then 00:52:56.869 --> 00:52:59.660 people are required to change, 00:52:59.660 --> 00:53:01.049 and hopefully be held responsible, 00:53:01.049 --> 00:53:02.910 and then change the way that they do things 00:53:02.910 --> 00:53:06.400 to try to remove bias from that system. 00:53:06.400 --> 00:53:07.839 What's the current thinking about 00:53:07.839 --> 00:53:10.299 how we can go about doing that 00:53:10.299 --> 00:53:12.599 when the systems that are doing that 00:53:12.599 --> 00:53:13.650 are algorithmic? 00:53:13.650 --> 00:53:15.999 Is it just going to be human oversight, 00:53:15.999 --> 00:53:16.910 and humans are gonna have to be 00:53:16.910 --> 00:53:18.379 held responsible for the oversight? 00:53:18.379 --> 00:53:20.890 Dr. Helsby: So, in terms of bias, 00:53:20.890 --> 00:53:22.569 if we're concerned about bias towards 00:53:22.569 --> 00:53:24.019 particular types of people, 00:53:24.019 --> 00:53:25.710 that's something that we can optimize for. 00:53:25.710 --> 00:53:28.839 So, we can train systems that are unbiased 00:53:28.839 --> 00:53:30.019 in this way. 00:53:30.019 --> 00:53:32.109 So that's one way to deal with it. 00:53:32.109 --> 00:53:34.039 But there's always gonna be errors, 00:53:34.039 --> 00:53:35.420 so that's sort of a separate issue 00:53:35.420 --> 00:53:37.509 from the bias, and in the case 00:53:37.509 --> 00:53:39.180 where there are errors, 00:53:39.180 --> 00:53:40.539 there must be oversight. 00:53:40.539 --> 00:53:45.079 So, one way that one could improve 00:53:45.079 --> 00:53:46.410 the way that this is done 00:53:46.410 --> 00:53:48.160 is by making sure that you're 00:53:48.160 --> 00:53:50.799 keeping track of confidence of decisions. 00:53:50.799 --> 00:53:54.039 So, if you have a low-confidence prediction, 00:53:54.039 --> 00:53:56.259 then maybe a human should come in and check things. 00:53:56.259 --> 00:53:58.809 So, that might be one way to proceed. 00:54:02.099 --> 00:54:03.990 Herald: So, there are no more questions. 00:54:03.990 --> 00:54:06.199 I close this talk now, 00:54:06.199 --> 00:54:08.239 and thank you very much 00:54:08.239 --> 00:54:09.410 and a big applause to 00:54:09.410 --> 00:54:11.780 Jennifer Helsby! 00:54:11.780 --> 00:54:16.310 roaring applause 00:54:16.310 --> 00:54:28.000 subtitles created by c3subtitles.de Join, and help us!
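A minimal sketch of the confidence-based oversight described in that last answer, assuming a binary classifier that outputs a calibrated probability; the threshold and function names are hypothetical:

```python
def route_decision(probability, threshold=0.9):
    """Act automatically only on high-confidence predictions;
    everything in the uncertain middle goes to a human reviewer."""
    if probability >= threshold or probability <= 1.0 - threshold:
        return "automated decision"
    return "flag for human review"

# Hypothetical calibrated outputs from some predictive system.
for p in [0.98, 0.55, 0.12, 0.81]:
    print(f"p = {p:.2f} -> {route_decision(p)}")
```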