0:00:00.000,0:00:08.895 music 0:00:08.895,0:00:20.040 Herald: Who of you is using Facebook? Twitter? [br]Diaspora? 0:00:20.040,0:00:27.630 concerned noise And all of that data[br]you enter there 0:00:27.630,0:00:34.240 gets to server, gets into the hand of somebody[br]who's using it 0:00:34.240,0:00:38.519 and the next talk[br]is especially about that, 0:00:38.519,0:00:43.879 because there's also intelligent machines[br]and intelligent algorithms 0:00:43.879,0:00:47.489 that try to make something[br]out of that data. 0:00:47.489,0:00:50.920 So the post-doc researcher Jennifer Helsby 0:00:50.920,0:00:55.839 of the University of Chicago,[br]who works in this 0:00:55.839,0:00:59.370 intersection between policy and [br]technology, 0:00:59.370,0:01:04.709 will now ask you the question:[br]To who would we give that power? 0:01:04.709,0:01:12.860 Dr. Helsby: Thanks.[br]applause 0:01:12.860,0:01:17.090 Okay, so, today I'm gonna do a brief tour[br]of intelligent systems 0:01:17.090,0:01:18.640 and how they're currently used 0:01:18.640,0:01:21.760 and then we're gonna look at some examples[br]with respect 0:01:21.760,0:01:23.710 to the properties that we might care about 0:01:23.710,0:01:26.000 these systems having,[br]and I'll talk a little bit about 0:01:26.000,0:01:27.940 some of the work that's been done in academia 0:01:27.940,0:01:28.680 on these topics. 0:01:28.680,0:01:31.780 And then we'll talk about some[br]promising paths forward. 0:01:31.780,0:01:37.040 So, I wanna start with this:[br]Kranzberg's First Law of Technology 0:01:37.040,0:01:40.420 So, it's not good or bad,[br]but it also isn't neutral. 0:01:40.420,0:01:42.980 Technology shapes our world,[br]and it can act as 0:01:42.980,0:01:46.140 a liberating force-- or an oppressive and[br]controlling force. 0:01:46.140,0:01:49.730 So, in this talk, I'm gonna go[br]towards some of the aspects 0:01:49.730,0:01:53.830 of intelligent systems that might be more[br]controlling in nature. 
0:01:53.830,0:01:56.060 So, as we all know, 0:01:56.060,0:01:59.770 because of the rapidly decreasing cost[br]of storage and computation, 0:01:59.770,0:02:02.170 along with the rise of new sensor technologies, 0:02:02.170,0:02:05.510 data collection devices[br]are being pushed into every 0:02:05.510,0:02:08.329 aspect of our lives: in our homes, our cars, 0:02:08.329,0:02:10.469 in our pockets, on our wrists. 0:02:10.469,0:02:13.280 And data collection systems act as intermediaries 0:02:13.280,0:02:15.230 for a huge amount of human communication. 0:02:15.230,0:02:17.900 And much of this data sits in government 0:02:17.900,0:02:19.860 and corporate databases. 0:02:19.860,0:02:23.090 So, in order to make use of this data, 0:02:23.090,0:02:27.280 we need to be able to make some inferences. 0:02:27.280,0:02:30.280 So, one way of approaching this is I can hire 0:02:30.280,0:02:32.310 a lot of humans, and I can have these humans 0:02:32.310,0:02:34.990 manually examine the data, and they can acquire 0:02:34.990,0:02:36.900 expert knowledge of the domain, and then 0:02:36.900,0:02:38.510 perhaps they can make some decisions 0:02:38.510,0:02:40.830 or at least some recommendations[br]based on it. 0:02:40.830,0:02:43.030 However, there's some problems with this. 0:02:43.030,0:02:45.810 One is that it's slow, and thus expensive. 0:02:45.810,0:02:48.060 It's also biased. We know that humans have 0:02:48.060,0:02:50.700 all sorts of biases, both conscious and unconscious, 0:02:50.700,0:02:53.390 and it would be nice to have a system[br]that did not have 0:02:53.390,0:02:54.959 these inaccuracies. 0:02:54.959,0:02:57.069 It's also not very transparent: I might 0:02:57.069,0:02:58.910 not really know the factors that led to 0:02:58.910,0:03:00.930 some decisions being made. 0:03:00.930,0:03:03.360 Even humans themselves[br]often don't really understand 0:03:03.360,0:03:05.360 why they came to a given decision, because 0:03:05.360,0:03:08.130 of their being emotional in nature. 
0:03:08.130,0:03:11.530 And, thus, these human decision making systems 0:03:11.530,0:03:13.170 are often difficult to audit. 0:03:13.170,0:03:15.819 So, another way to proceed is maybe instead 0:03:15.819,0:03:18.000 I study the system and the data carefully 0:03:18.000,0:03:20.520 and I write down the best rules[br]for making a decision 0:03:20.520,0:03:23.280 or, I can have a machine[br]dynamically figure out 0:03:23.280,0:03:25.459 the best rules, as in machine learning. 0:03:25.459,0:03:28.640 So, maybe this is a better approach. 0:03:28.640,0:03:32.230 It's certainly fast, and thus cheap. 0:03:32.230,0:03:34.290 And maybe I can construct[br]the system in such a way 0:03:34.290,0:03:37.090 that it doesn't have the biases that are inherent 0:03:37.090,0:03:39.209 in human decision making. 0:03:39.209,0:03:41.560 And, since I've written these rules down, 0:03:41.560,0:03:42.819 or a computer has learned these rules, 0:03:42.819,0:03:45.140 then I can just show them to somebody, right? 0:03:45.140,0:03:46.819 And then they can audit it. 0:03:46.819,0:03:49.020 So, more and more decision making is being 0:03:49.020,0:03:50.750 done in this way. 0:03:50.750,0:03:53.170 And so, in this model, we take data 0:03:53.170,0:03:55.709 we make an inference based on that data 0:03:55.709,0:03:58.120 using these algorithms, and then 0:03:58.120,0:03:59.420 we can take actions. 0:03:59.420,0:04:01.860 And, when we take this more scientific approach 0:04:01.860,0:04:04.200 to making decisions and optimizing for 0:04:04.200,0:04:07.310 a desired outcome,[br]we can take an experimental approach 0:04:07.310,0:04:10.080 so we can determine[br]which actions are most effective 0:04:10.080,0:04:12.310 in achieving a desired outcome. 0:04:12.310,0:04:14.010 Maybe there are some types of communication 0:04:14.010,0:04:16.750 styles that are most effective[br]with certain people. 
0:04:16.750,0:04:19.510 I can perhaps deploy some individualized incentives 0:04:19.510,0:04:22.060 to get the outcome that I desire. 0:04:22.060,0:04:25.990 And, maybe even if I carefully design an experiment 0:04:25.990,0:04:27.810 with the environment in which people make 0:04:27.810,0:04:30.699 these decisions, perhaps even very small changes 0:04:30.699,0:04:34.250 can introduce significant changes[br]in people's behavior. 0:04:34.250,0:04:37.320 So, through these mechanisms,[br]and this experimental approach, 0:04:37.320,0:04:39.840 I can maximize the probability[br]that humans do 0:04:39.840,0:04:42.020 what I want. 0:04:42.020,0:04:45.380 So, algorithmic decision making is being used 0:04:45.380,0:04:47.270 in industry, and is used[br]in lots of other areas, 0:04:47.270,0:04:49.530 from astrophysics to medicine, and is now 0:04:49.530,0:04:52.199 moving into new domains, including 0:04:52.199,0:04:53.990 government applications. 0:04:53.990,0:04:58.560 So, we have recommendation engines like[br]Netflix, Yelp, SoundCloud, 0:04:58.560,0:05:00.699 that direct our attention to what we should 0:05:00.699,0:05:03.510 watch and listen to. 0:05:03.510,0:05:07.919 Since 2009, Google uses[br]personalized search results, 0:05:07.919,0:05:12.840 including if you're not logged in[br]into your Google account. 0:05:12.840,0:05:15.389 And we also have algorithmic curation and filtering, 0:05:15.389,0:05:17.530 as in the case of Facebook News Feed, 0:05:17.530,0:05:19.870 Google News, Yahoo News, 0:05:19.870,0:05:22.840 which shows you what news articles, for example, 0:05:22.840,0:05:24.330 you should be looking at. 0:05:24.330,0:05:25.650 And this is important, because a lot of people 0:05:25.650,0:05:29.410 get news from these media. 0:05:29.410,0:05:31.520 We even have algorithmic journalists! 0:05:31.520,0:05:35.240 So, automatic systems generate articles 0:05:35.240,0:05:36.880 about weather, traffic, or sports 0:05:36.880,0:05:38.729 instead of a human. 
0:05:38.729,0:05:41.949 And, another application that's more recent 0:05:41.949,0:05:43.570 is the use of predictive systems 0:05:43.570,0:05:45.180 in political campaigns. 0:05:45.180,0:05:47.370 So, political campaigns also now take this 0:05:47.370,0:05:50.340 approach to predict on an individual basis 0:05:50.340,0:05:53.300 which candidate voters[br]are likely to vote for. 0:05:53.300,0:05:55.500 And then they can target,[br]on an individual basis, 0:05:55.500,0:05:58.199 those that can be persuaded otherwise. 0:05:58.199,0:06:00.830 And, finally, in the public sector, 0:06:00.830,0:06:02.710 we're starting to use predictive systems 0:06:02.710,0:06:06.320 in areas from policing, to health,[br]to education and energy. 0:06:06.320,0:06:08.979 So, there are some advantages to this. 0:06:08.979,0:06:12.790 So, one thing is that we can automate 0:06:12.790,0:06:15.759 aspects of our lives[br]that we consider to be mundane 0:06:15.759,0:06:17.620 using systems that are intelligent 0:06:17.620,0:06:19.580 and adaptive enough. 0:06:19.580,0:06:21.680 We can make use of all the data 0:06:21.680,0:06:23.990 and really get the pieces of information we 0:06:23.990,0:06:25.830 really care about. 0:06:25.830,0:06:29.650 We can spend money in the most effective way, 0:06:29.650,0:06:32.110 and we can do this with this experimental 0:06:32.110,0:06:34.210 approach to optimize actions to produce 0:06:34.210,0:06:35.190 desired outcomes. 0:06:35.190,0:06:37.300 So, we can embed intelligence 0:06:37.300,0:06:39.520 into all of these mundane objects 0:06:39.520,0:06:41.180 and enable them to make decisions for us, 0:06:41.180,0:06:42.860 and so that's what we're doing more and more, 0:06:42.860,0:06:45.210 and we can have an object[br]that decides for us 0:06:45.210,0:06:46.840 what temperature we should set our house, 0:06:46.840,0:06:49.009 what we should be doing, etc. 0:06:49.009,0:06:52.400 So, there might be some implications here. 
0:06:52.400,0:06:55.680 We want these systems[br]that do work on this data 0:06:55.680,0:06:58.039 to increase the opportunities[br]available to us. 0:06:58.039,0:07:00.259 But it might be that there are some implications 0:07:00.259,0:07:01.780 that we have not carefully thought through. 0:07:01.780,0:07:03.430 This is a new area, and people are only 0:07:03.430,0:07:05.940 starting to scratch the surface of what the 0:07:05.940,0:07:07.289 problems might be. 0:07:07.289,0:07:09.600 In some cases, they might narrow the options 0:07:09.600,0:07:10.990 available to people, 0:07:10.990,0:07:13.199 and this approach subjects people to 0:07:13.199,0:07:15.620 suggestive messaging intended to nudge them 0:07:15.620,0:07:17.169 to a desired outcome. 0:07:17.169,0:07:19.320 Some people may have a problem with that. 0:07:19.320,0:07:20.650 Values we care about are not gonna be 0:07:20.650,0:07:23.860 baked into these systems by default. 0:07:23.860,0:07:25.960 It's also the case that some algorithmic systems 0:07:25.960,0:07:28.300 facilitate work that we do not like. 0:07:28.300,0:07:30.199 For example, in the case of mass surveillance. 0:07:30.199,0:07:32.130 And even the same systems, 0:07:32.130,0:07:34.039 used by different people or organizations, 0:07:34.039,0:07:36.110 have very different consequences. 0:07:36.110,0:07:37.320 For example, if I can predict 0:07:37.320,0:07:40.020 with high accuracy, based on say search queries, 0:07:40.020,0:07:42.050 who's gonna be admitted to a hospital, 0:07:42.050,0:07:43.750 some people would be interested[br]in knowing that. 0:07:43.750,0:07:46.120 You might be interested[br]in having your doctor know that. 0:07:46.120,0:07:47.919 But that same predictive model[br]in the hands of 0:07:47.919,0:07:50.569 an insurance company[br]has a very different implication. 
0:07:50.569,0:07:53.389 So, the point here is that these systems 0:07:53.389,0:07:55.860 structure and influence how humans interact 0:07:55.860,0:07:58.360 with each other, how they interact with society, 0:07:58.360,0:07:59.850 and how they interact with government. 0:07:59.850,0:08:03.080 And if they constrain what people can do, 0:08:03.080,0:08:05.069 we should really care about this. 0:08:05.069,0:08:08.270 So now I'm gonna go to[br]sort of an extreme case, 0:08:08.270,0:08:11.930 just as an example, and that's this[br]Chinese Social Credit System. 0:08:11.930,0:08:14.169 And so this is probably one of the more 0:08:14.169,0:08:17.259 ambitious uses of data, 0:08:17.259,0:08:18.880 that is used to rank each citizen 0:08:18.880,0:08:21.190 based on their behavior, in China. 0:08:21.190,0:08:24.210 So right now, there are various pilot systems 0:08:24.210,0:08:27.660 deployed by various companies doing this in[br]China. 0:08:27.660,0:08:30.729 They're currently voluntary, and by 2020 0:08:30.729,0:08:32.630 this system is gonna be decided on, 0:08:32.630,0:08:34.679 or a combination of the systems, 0:08:34.679,0:08:37.409 that is gonna be mandatory for everyone. 0:08:37.409,0:08:40.950 And so, in this system, there are some citizens, 0:08:40.950,0:08:44.380 and a huge range of data sources are used. 0:08:44.380,0:08:46.820 So, some of the data sources are 0:08:46.820,0:08:48.360 your financial data, 0:08:48.360,0:08:50.020 your criminal history, 0:08:50.020,0:08:52.320 how many points you have[br]on your driver's license, 0:08:52.320,0:08:55.360 medical information-- for example,[br]if you take birth control pills, 0:08:55.360,0:08:56.810 that's incorporated. 0:08:56.810,0:08:59.830 Your purchase history-- for example,[br]if you purchase games, 0:08:59.830,0:09:02.430 you are down-ranked in the system. 
0:09:02.430,0:09:04.490 Some of the systems, not all of them, 0:09:04.490,0:09:07.260 incorporate social media monitoring, 0:09:07.260,0:09:09.200 which makes sense if you're a state like China, 0:09:09.200,0:09:11.270 you probably want to know about 0:09:11.270,0:09:14.899 political statements that people[br]are saying on social media. 0:09:14.899,0:09:18.020 And, one of the more interesting parts is 0:09:18.020,0:09:22.160 social network analysis:[br]looking at the relationships between people. 0:09:22.160,0:09:24.270 So, if you have a close relationship with[br]somebody 0:09:24.270,0:09:26.180 and they have a low credit score, 0:09:26.180,0:09:29.130 that can have implications on your credit[br]score. 0:09:29.130,0:09:34.440 So, the way that these scores[br]are generated is secret. 0:09:34.440,0:09:38.140 And, according to the call for these systems 0:09:38.140,0:09:39.270 put out by the government, 0:09:39.270,0:09:42.810 the goal is to[br]"carry forward the sincerity and 0:09:42.810,0:09:45.760 traditional virtues" and[br]establish the idea of a 0:09:45.760,0:09:47.520 "sincerity culture." 0:09:47.520,0:09:49.440 But wait, it gets better: 0:09:49.440,0:09:52.450 so, there's a portal that enables citizens 0:09:52.450,0:09:55.040 to look up the citizen score of anyone. 0:09:55.040,0:09:56.520 And many people like this system, 0:09:56.520,0:09:58.320 they think it's a fun game. 0:09:58.320,0:10:00.700 They boast about it on social media, 0:10:00.700,0:10:03.610 they put their score in their dating profile, 0:10:03.610,0:10:04.760 because if you're ranked highly you're 0:10:04.760,0:10:06.589 part of an exclusive club. 0:10:06.589,0:10:10.060 You can get VIP treatment[br]at hotels and other companies. 0:10:10.060,0:10:11.880 But the downside is that, if you're excluded 0:10:11.880,0:10:15.540 from that club, your weak score[br]may have other implications, 0:10:15.540,0:10:20.120 like being unable to get access[br]to credit, housing, jobs. 
0:10:20.120,0:10:23.399 There is some reporting that even travel visas 0:10:23.399,0:10:27.000 might be restricted[br]if your score is particularly low. 0:10:27.000,0:10:31.160 So, a system like this, for a state, is really 0:10:31.160,0:10:34.690 the optimal solution[br]to the problem of the public. 0:10:34.690,0:10:37.130 It constitutes a very subtle and insidious 0:10:37.130,0:10:39.350 mechanism of social control. 0:10:39.350,0:10:41.209 You don't need to spend a lot of money on 0:10:41.209,0:10:43.800 police or prisons if you can set up a system 0:10:43.800,0:10:45.820 where people discourage one another from 0:10:45.820,0:10:48.930 anti-social acts like political action[br]in exchange for 0:10:48.930,0:10:51.430 a coupon for a free Uber ride. 0:10:51.430,0:10:55.269 So, there are a lot of[br]legitimate questions here: 0:10:55.269,0:10:58.370 What protections does[br]user data have in this scheme? 0:10:58.370,0:11:01.279 Do any safeguards exist to prevent tampering? 0:11:01.279,0:11:04.310 What mechanism, if any, is there to prevent 0:11:04.310,0:11:08.810 false input data from creating erroneous inferences? 0:11:08.810,0:11:10.420 Is there any way that people can fix 0:11:10.420,0:11:12.540 their score once they're ranked poorly? 0:11:12.540,0:11:13.899 Or does it end up becoming a 0:11:13.899,0:11:15.720 self-fulfilling prophecy? 0:11:15.720,0:11:17.850 Your weak score means you have less access 0:11:17.850,0:11:21.620 to jobs and credit, and now you will have 0:11:21.620,0:11:24.709 limited access to opportunity. 0:11:24.709,0:11:27.110 So, let's take a step back. 0:11:27.110,0:11:28.470 So, what do we want? 0:11:28.470,0:11:31.540 So, we probably don't want that, 0:11:31.540,0:11:33.570 but as advocates we really wanna 0:11:33.570,0:11:36.130 understand what questions we should be asking 0:11:36.130,0:11:37.510 of these systems. 
Right now there's 0:11:37.510,0:11:39.570 very little oversight, 0:11:39.570,0:11:41.420 and we wanna make sure that we don't 0:11:41.420,0:11:44.029 sort of sleepwalk our way to a situation 0:11:44.029,0:11:46.649 where we've lost even more power 0:11:46.649,0:11:49.740 to these centralized systems of control. 0:11:49.740,0:11:52.209 And if you're an implementer, we wanna understand 0:11:52.209,0:11:53.709 what can we be doing better. 0:11:53.709,0:11:56.019 Are there better ways that we can be implementing 0:11:56.019,0:11:57.640 these systems? 0:11:57.640,0:11:59.430 Are there values that, as humans, 0:11:59.430,0:12:01.060 we care about that we should make sure 0:12:01.060,0:12:02.420 these systems have? 0:12:02.420,0:12:05.550 So, the first thing[br]that most people in the room 0:12:05.550,0:12:07.820 might think about is privacy. 0:12:07.820,0:12:10.510 Which is, of course, of the utmost importance. 0:12:10.510,0:12:12.920 We need privacy, and there is a good discussion 0:12:12.920,0:12:15.680 on the importance of protecting[br]user data where possible. 0:12:15.680,0:12:18.420 So, in this talk, I'm gonna focus[br]on the other aspects of 0:12:18.420,0:12:19.470 algorithmic decision making, 0:12:19.470,0:12:21.190 that I think have got less attention. 0:12:21.190,0:12:25.140 Because it's not just privacy[br]that we need to worry about here. 0:12:25.140,0:12:28.519 We also want systems that are fair and equitable. 0:12:28.519,0:12:30.240 We want transparent systems, 0:12:30.240,0:12:35.110 we don't want opaque decisions[br]to be made about us, 0:12:35.110,0:12:36.510 decisions that might have serious impacts 0:12:36.510,0:12:37.779 on our lives. 0:12:37.779,0:12:40.490 And we need some accountability mechanisms. 0:12:40.490,0:12:41.890 So, for the rest of this talk 0:12:41.890,0:12:43.230 we're gonna go through each one of these things 0:12:43.230,0:12:45.230 and look at some examples. 0:12:45.230,0:12:47.709 So, the first thing is fairness. 
0:12:47.709,0:12:50.450 And so, as I said in the beginning,[br]this is one area 0:12:50.450,0:12:52.690 where there might be an advantage 0:12:52.690,0:12:55.079 to making decisions by machine, 0:12:55.079,0:12:56.740 especially in areas where there have 0:12:56.740,0:12:59.410 historically been fairness issues with 0:12:59.410,0:13:02.350 decision making, such as law enforcement. 0:13:02.350,0:13:05.839 So, this is one way that police departments 0:13:05.839,0:13:08.360 use predictive models. 0:13:08.360,0:13:10.540 The idea here is police would like to 0:13:10.540,0:13:13.450 allocate resources in a more effective way, 0:13:13.450,0:13:15.050 and they would also like to enable 0:13:15.050,0:13:16.640 proactive policing. 0:13:16.640,0:13:20.110 So, if you can predict where crimes[br]are going to occur, 0:13:20.110,0:13:22.149 or who is going to commit crimes, 0:13:22.149,0:13:24.870 then you can put cops in those places, 0:13:24.870,0:13:27.769 or perhaps following these people, 0:13:27.769,0:13:29.300 and then the crimes will not occur. 0:13:29.300,0:13:31.370 So, it's sort of the pre-crime approach. 0:13:31.370,0:13:34.649 So, there are a few ways of going about this. 0:13:34.649,0:13:37.920 One way is doing this individual-level prediction. 0:13:37.920,0:13:41.089 So you take each citizen[br]and estimate the risk 0:13:41.089,0:13:43.769 that each citizen will participate,[br]say, in violence 0:13:43.769,0:13:45.279 based on some data. 0:13:45.279,0:13:46.779 And then you can flag those people that are 0:13:46.779,0:13:49.199 considered particularly violent. 0:13:49.199,0:13:51.519 So, this is currently done. 0:13:51.519,0:13:52.589 This is done in the U.S. 0:13:52.589,0:13:56.120 It's done in Chicago,[br]by the Chicago Police Department. 0:13:56.120,0:13:58.350 And they maintain a heat list of individuals 0:13:58.350,0:14:00.790 that are considered most likely to commit, 0:14:00.790,0:14:03.529 or be the victim of, violence. 
0:14:03.529,0:14:06.700 And this is done using data[br]that the police maintain. 0:14:06.700,0:14:09.589 So, the features that are used[br]in this predictive model 0:14:09.589,0:14:12.209 include things that are derived from 0:14:12.209,0:14:14.610 individuals' criminal history. 0:14:14.610,0:14:16.810 So, for example, have they been involved in 0:14:16.810,0:14:18.350 gun violence in the past? 0:14:18.350,0:14:21.450 Do they have narcotics arrests? And so on. 0:14:21.450,0:14:22.860 But another thing that's incorporated 0:14:22.860,0:14:25.060 in the Chicago Police Department model is 0:14:25.060,0:14:28.300 information derived from[br]social media network analysis. 0:14:28.300,0:14:30.630 So, who you interact with, 0:14:30.630,0:14:32.279 as noted in police data. 0:14:32.279,0:14:34.899 So, for example, your co-arrestees. 0:14:34.899,0:14:36.440 When officers conduct field interviews, 0:14:36.440,0:14:38.240 who are people interacting with? 0:14:38.240,0:14:42.940 And then this is all incorporated[br]into this risk score. 0:14:42.940,0:14:44.639 So another way to proceed, 0:14:44.639,0:14:47.070 which is the method that most companies 0:14:47.070,0:14:49.579 that sell products like this[br]to the police have taken, 0:14:49.579,0:14:51.459 is instead predicting which areas 0:14:51.459,0:14:53.810 are likely to have crimes committed in them. 0:14:53.810,0:14:56.690 So, take my city, I put a grid down, 0:14:56.690,0:14:58.180 and then I use crime statistics 0:14:58.180,0:15:00.430 and maybe some ancillary data sources, 0:15:00.430,0:15:01.790 to determine which areas have 0:15:01.790,0:15:04.709 the highest risk of crimes occurring in them, 0:15:04.709,0:15:06.329 and I can flag those areas and send 0:15:06.329,0:15:08.470 police officers to them. 0:15:08.470,0:15:10.950 So now, let's look at some of the tools 0:15:10.950,0:15:14.010 that are used for this geographic-level prediction. 
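The grid idea described here can be illustrated with a deliberately naive sketch: bin past incident locations into square cells and rank the cells by count. This is not any vendor's method (the real algorithms are not public); the function names `cell_of` and `rank_cells`, the coordinates, and the 500-unit cell size are all assumptions made up for illustration.

```python
from collections import Counter


def cell_of(x, y, cell_size):
    """Map a coordinate to the (column, row) index of its grid cell."""
    return (int(x // cell_size), int(y // cell_size))


def rank_cells(incidents, cell_size, top_k):
    """Count recorded incidents per cell and return the top_k busiest cells.

    Only *recorded* incidents are counted, so the ranking inherits
    whatever gaps or biases exist in the underlying records.
    """
    counts = Counter(cell_of(x, y, cell_size) for x, y in incidents)
    return counts.most_common(top_k)


# Hypothetical incident coordinates (arbitrary units), invented for this sketch.
incidents = [
    (120.0, 80.0), (450.0, 30.0),
    (510.0, 20.0), (530.0, 470.0), (990.0, 10.0), (700.0, 100.0),
]

hotspots = rank_cells(incidents, cell_size=500.0, top_k=1)
print(hotspots)
```

Even this toy version shows why the choice of input data matters: the ranking can only reflect incidents that made it into the records in the first place.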
0:15:14.010,0:15:19.040 So, here are 3 companies that sell these 0:15:19.040,0:15:22.910 geographic-level predictive policing systems. 0:15:22.910,0:15:25.639 So, PredPol has a system that uses 0:15:25.639,0:15:27.200 primarily crime statistics: 0:15:27.200,0:15:30.209 only the time, place, and type of crime 0:15:30.209,0:15:33.040 to predict where crimes will occur. 0:15:33.040,0:15:35.970 HunchLab uses a wider range of data sources 0:15:35.970,0:15:37.260 including, for example, weather 0:15:37.260,0:15:39.720 and then Hitachi is a newer system 0:15:39.720,0:15:42.100 that has a predictive crime analytics tool 0:15:42.100,0:15:44.779 that also incorporates social media. 0:15:44.779,0:15:47.850 The first one, to my knowledge, to do so. 0:15:47.850,0:15:49.399 And these systems are in use 0:15:49.399,0:15:52.820 in 50+ cities in the U.S. 0:15:52.820,0:15:56.540 So, why do police departments buy this? 0:15:56.540,0:15:57.760 Some police departments are interested in 0:15:57.760,0:16:00.500 buying systems like this, because they're marketed 0:16:00.500,0:16:02.660 as impartial systems, 0:16:02.660,0:16:06.199 so it's a way to police in an unbiased way. 0:16:06.199,0:16:08.040 And so, these companies make 0:16:08.040,0:16:08.670 statements like this-- 0:16:08.670,0:16:10.800 by the way, the references[br]will all be at the end, 0:16:10.800,0:16:12.560 and they'll be on the slides-- 0:16:12.560,0:16:13.370 So, for example 0:16:13.370,0:16:16.110 the predictive crime analytics from Hitachi 0:16:16.110,0:16:17.610 claims that the system is anonymous, 0:16:17.610,0:16:19.350 because it shows you an area, 0:16:19.350,0:16:23.060 it doesn't show you[br]to look for a particular person. 0:16:23.060,0:16:25.699 And PredPol reassures people that 0:16:25.699,0:16:29.560 it eliminates any liberties or profiling concerns. 
0:16:29.560,0:16:32.269 And HunchLab notes that the system 0:16:32.269,0:16:35.170 fairly represents priorities for public safety 0:16:35.170,0:16:38.769 and is unbiased by race[br]or ethnicity, for example. 0:16:38.769,0:16:43.529 So, let's take a minute[br]to describe in more detail 0:16:43.529,0:16:48.100 what we mean when we talk about fairness. 0:16:48.100,0:16:51.300 So, when we talk about fairness, 0:16:51.300,0:16:52.740 we mean a few things. 0:16:52.740,0:16:56.070 So, one is fairness with respect to individuals: 0:16:56.070,0:16:58.040 so if I'm very similar to somebody 0:16:58.040,0:17:00.170 and we go through some process 0:17:00.170,0:17:03.430 and there is two very different[br]outcomes to that process 0:17:03.430,0:17:05.679 we would consider that to be unfair. 0:17:05.679,0:17:07.929 So, we want similar people to be treated 0:17:07.929,0:17:09.539 in a similar way. 0:17:09.539,0:17:13.079 But, there are certain protected attributes 0:17:13.079,0:17:15.199 that we wouldn't want someone 0:17:15.199,0:17:17.099 to discriminate based on. 0:17:17.099,0:17:20.069 And so, there's this other property,[br]Group Fairness. 0:17:20.069,0:17:22.249 So, we can look at the statistical parity 0:17:22.249,0:17:25.439 between groups, based on gender, race, etc. 0:17:25.439,0:17:28.049 and see if they're treated in a similar way. 0:17:28.049,0:17:30.409 And we might not expect that in some cases, 0:17:30.409,0:17:32.429 for example if the base rates in each group 0:17:32.429,0:17:34.659 are very different. 0:17:34.659,0:17:36.889 And then there's also Fairness in Errors. 0:17:36.889,0:17:40.080 All predictive systems are gonna make errors, 0:17:40.080,0:17:42.989 and if the errors are concentrated, 0:17:42.989,0:17:46.399 then that may also represent unfairness. 
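Two of the notions above, group fairness (statistical parity) and fairness in errors, can be written down as simple rates. The following is a minimal sketch on invented toy data, not a standard library API: the record fields `group`, `flagged`, and `actual`, and all three function names, are assumptions chosen for illustration.

```python
def positive_rate(records, group):
    """Share of a group's members that the system flagged."""
    rows = [r for r in records if r["group"] == group]
    return sum(r["flagged"] for r in rows) / len(rows)


def false_positive_rate(records, group):
    """Share of a group's true negatives that were flagged anyway."""
    rows = [r for r in records if r["group"] == group and not r["actual"]]
    return sum(r["flagged"] for r in rows) / len(rows)


def statistical_parity_gap(records, group_a, group_b):
    """Difference in flag rates between two groups (0 means parity)."""
    return positive_rate(records, group_a) - positive_rate(records, group_b)


# Invented toy records: 'flagged' is the system's decision,
# 'actual' is the ground truth the system tries to predict.
records = [
    {"group": "A", "flagged": 1, "actual": 1},
    {"group": "A", "flagged": 1, "actual": 0},
    {"group": "A", "flagged": 0, "actual": 0},
    {"group": "A", "flagged": 0, "actual": 0},
    {"group": "B", "flagged": 1, "actual": 1},
    {"group": "B", "flagged": 0, "actual": 0},
    {"group": "B", "flagged": 0, "actual": 0},
    {"group": "B", "flagged": 0, "actual": 0},
]

print(statistical_parity_gap(records, "A", "B"))
print(false_positive_rate(records, "A"), false_positive_rate(records, "B"))
```

In this toy data, group A is flagged more often than group B, and all of the wrongly flagged people are in group A, which is the "concentrated errors" situation. As noted above, a nonzero parity gap is not automatically unfair if the base rates really differ between groups, so these numbers are a starting point for questions rather than a verdict.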
0:17:46.399,0:17:50.149 And so this concern arose recently with Facebook 0:17:50.149,0:17:52.289 because people with Native American names 0:17:52.289,0:17:54.389 had their profiles flagged as fraudulent 0:17:54.389,0:17:58.759 far more often than those[br]with White American names. 0:17:58.759,0:18:00.559 So these are the sorts of things[br]that we worry about 0:18:00.559,0:18:02.190 and each of these are metrics, 0:18:02.190,0:18:04.239 and if you're interested more you should 0:18:04.239,0:18:06.159 check those 2 papers out. 0:18:06.159,0:18:10.639 So, how can potential issues[br]with predictive policing 0:18:10.639,0:18:13.850 have implications for these principles? 0:18:13.850,0:18:18.559 So, one problem is[br]the training data that's used. 0:18:18.559,0:18:21.059 Some of these systems only use crime statistics, 0:18:21.059,0:18:23.600 other systems-- all of them use crime statistics 0:18:23.600,0:18:25.619 in some way. 0:18:25.619,0:18:31.419 So, one problem is that crime databases 0:18:31.419,0:18:34.830 contain only crimes that've been detected. 0:18:34.830,0:18:38.629 Right? So, the police are only gonna detect 0:18:38.629,0:18:41.009 crimes that they know are happening, 0:18:41.009,0:18:44.109 either through patrol and their own investigation 0:18:44.109,0:18:46.320 or because they've been alerted to crime, 0:18:46.320,0:18:48.789 for example by a citizen calling the police. 0:18:48.789,0:18:52.179 So, a citizen has to feel like[br]they can call the police, 0:18:52.179,0:18:54.019 like that's a good idea. 0:18:54.019,0:18:58.789 So, some crimes suffer[br]from this problem less than others: 0:18:58.789,0:19:02.249 for example, gun violence[br]is much easier to detect 0:19:02.249,0:19:03.639 relative to fraud, for example, 0:19:03.639,0:19:07.509 which is very difficult to detect. 0:19:07.509,0:19:11.940 Now the racial profiling aspect[br]of this might come in 0:19:11.940,0:19:15.590 because of biased policing in the past. 
0:19:15.590,0:19:19.999 So, for example, for marijuana arrests, 0:19:19.999,0:19:22.619 black people are arrested in the U.S. at rates 0:19:22.619,0:19:25.119 4 times that of white people, 0:19:25.119,0:19:27.960 even though there is statistical parity 0:19:27.960,0:19:31.389 with these 2 groups, to within a few percent. 0:19:31.389,0:19:35.820 So, this is where problems can arise. 0:19:35.820,0:19:37.159 So, let's go back to this 0:19:37.159,0:19:38.749 geographic-level predictive policing. 0:19:38.749,0:19:42.460 So the danger here is that, unless this system 0:19:42.460,0:19:44.299 is very carefully constructed, 0:19:44.299,0:19:47.090 this sort of crime area ranking might 0:19:47.090,0:19:49.019 again become a self-fulfilling prophecy. 0:19:49.019,0:19:51.460 If you send police officers to these areas, 0:19:51.460,0:19:53.220 you further scrutinize them, 0:19:53.220,0:19:55.659 and then again you're only detecting a subset 0:19:55.659,0:19:57.979 of crimes, and the cycle continues. 0:19:57.979,0:20:02.139 So, one obvious issue is that 0:20:02.139,0:20:07.599 this statement about geographic-based[br]crime prediction 0:20:07.599,0:20:10.229 being anonymous is not true, 0:20:10.229,0:20:13.159 because race and location are very strongly 0:20:13.159,0:20:14.840 correlated in the U.S. 0:20:14.840,0:20:16.609 And this is something that machine-learning[br]systems 0:20:16.609,0:20:20.049 can potentially learn. 0:20:20.049,0:20:23.039 Another issue is that, for example, 0:20:23.039,0:20:25.580 for individual fairness, one of my homes 0:20:25.580,0:20:27.599 sits within one of these boxes. 0:20:27.599,0:20:29.950 Some of these boxes[br]in these systems are very small, 0:20:29.950,0:20:33.399 for example PredPol is 500ft x 500ft, 0:20:33.399,0:20:36.349 so it's maybe only a few houses. 
0:20:36.349,0:20:39.149 So, the implications of this system are that 0:20:39.149,0:20:40.849 you have police officers maybe sitting 0:20:40.849,0:20:42.979 in a police cruiser outside your home 0:20:42.979,0:20:45.450 and a few doors down someone 0:20:45.450,0:20:46.799 may not be within that box, 0:20:46.799,0:20:48.159 and doesn't have this. 0:20:48.159,0:20:51.399 So, that may represent unfairness. 0:20:51.399,0:20:54.929 So, there are real questions here, 0:20:54.929,0:20:57.720 especially because there's no opt-out. 0:20:57.720,0:21:00.059 There's no way to opt-out of this system: 0:21:00.059,0:21:02.239 if you live in a city that has this, 0:21:02.239,0:21:04.909 then you have to deal with it. 0:21:04.909,0:21:07.229 So, it's quite difficult to find out 0:21:07.229,0:21:09.879 what's really going on 0:21:09.879,0:21:11.169 because the algorithm is secret. 0:21:11.169,0:21:13.049 And, in most cases, we don't know 0:21:13.049,0:21:14.789 the full details of the inputs. 0:21:14.789,0:21:16.679 We have some idea[br]about what features are used, 0:21:16.679,0:21:17.970 but that's about it. 0:21:17.970,0:21:19.509 We also don't know the output. 0:21:19.509,0:21:21.899 That would be knowing police allocation, 0:21:21.899,0:21:23.179 police strategies, 0:21:23.179,0:21:26.299 and in order to nail down[br]what's really going on here 0:21:26.299,0:21:28.609 in order to verify the validity of 0:21:28.609,0:21:30.009 these companies' claims, 0:21:30.009,0:21:33.799 it may be necessary[br]to have a 3rd party come in, 0:21:33.799,0:21:35.629 examine the inputs and outputs of the system, 0:21:35.629,0:21:37.590 and say concretely what's going on. 0:21:37.590,0:21:39.460 And if everything is fine and dandy 0:21:39.460,0:21:40.929 then this shouldn't be a problem. 0:21:40.929,0:21:43.619 So, that's potentially one role that 0:21:43.619,0:21:44.769 advocates can play. 
0:21:44.769,0:21:46.720 Maybe we should start pushing for audits 0:21:46.720,0:21:48.820 of systems that are used in this way. 0:21:48.820,0:21:50.970 These could have serious implications 0:21:50.970,0:21:52.679 for people's lives. 0:21:52.679,0:21:55.249 So, we'll return[br]to this idea a little bit later, 0:21:55.249,0:21:58.210 but for now this leads us[br]nicely to Transparency. 0:21:58.210,0:21:59.419 So, we wanna know 0:21:59.419,0:22:01.929 what these systems are doing. 0:22:01.929,0:22:04.729 But it's very hard,[br]for the reasons described earlier, 0:22:04.729,0:22:06.139 but even in the case of something like 0:22:06.139,0:22:09.849 trying to understand Google's search algorithm, 0:22:09.849,0:22:11.679 it's difficult because it's personalized. 0:22:11.679,0:22:13.529 So, by construction, each user is 0:22:13.529,0:22:15.320 only seeing one endpoint. 0:22:15.320,0:22:18.169 So, it's a very isolating system. 0:22:18.169,0:22:20.349 What do other people see? 0:22:20.349,0:22:22.409 And one reason it's difficult to make 0:22:22.409,0:22:24.099 some of these systems transparent 0:22:24.099,0:22:26.679 is because of, simply, the complexity 0:22:26.679,0:22:27.950 of the algorithms. 0:22:27.950,0:22:30.309 So, an algorithm can become so complex that 0:22:30.309,0:22:31.669 it's difficult to comprehend, 0:22:31.669,0:22:33.289 even for the designer of the system, 0:22:33.289,0:22:35.509 or the implementer of the system. 0:22:35.509,0:22:38.419 The designer might know that this algorithm 0:22:38.419,0:22:42.889 maximizes some metric-- say, accuracy, 0:22:42.889,0:22:44.570 but they may not always have a solid 0:22:44.570,0:22:46.779 understanding of what the algorithm is doing 0:22:46.779,0:22:48.330 for all inputs. 0:22:48.330,0:22:50.970 Certainly with respect to fairness. 0:22:50.970,0:22:55.759 So, in some cases,[br]it might not be appropriate to use 0:22:55.759,0:22:57.379 an extremely complex model. 
0:22:57.379,0:22:59.529 It might be better to use a simpler system 0:22:59.529,0:23:02.910 with human-interpretable features. 0:23:02.910,0:23:04.749 Another issue that arises 0:23:04.749,0:23:07.559 from the opacity of these systems 0:23:07.559,0:23:09.409 and the centralized control 0:23:09.409,0:23:11.860 is that it makes them very influential. 0:23:11.860,0:23:13.950 And thus, an excellent target 0:23:13.950,0:23:16.210 for manipulation or tampering. 0:23:16.210,0:23:18.479 So, this might be tampering that is done 0:23:18.479,0:23:21.950 from an organization that controls the system, 0:23:21.950,0:23:23.769 or an insider at one of the organizations, 0:23:23.769,0:23:27.139 or anyone who's able to compromise their security. 0:23:27.139,0:23:30.249 So, this is an interesting academic work 0:23:30.249,0:23:32.099 that looked at the possibility of 0:23:32.099,0:23:34.159 slightly modifying search rankings 0:23:34.159,0:23:36.619 to shift people's political views. 0:23:36.619,0:23:39.009 So, since people are most likely to 0:23:39.009,0:23:41.330 click on the top search results, 0:23:41.330,0:23:44.429 so 90% of clicks go to the[br]first page of search results, 0:23:44.429,0:23:46.719 then perhaps by reshuffling[br]things a little bit, 0:23:46.719,0:23:48.729 or maybe dropping some search results, 0:23:48.729,0:23:50.269 you can influence people's views 0:23:50.269,0:23:51.679 in a coherent way, 0:23:51.679,0:23:53.090 and maybe you can make it so subtle 0:23:53.090,0:23:55.749 that no one is able to notice. 0:23:55.749,0:23:57.249 So in this academic study, 0:23:57.249,0:24:00.349 they did an experiment 0:24:00.349,0:24:02.070 in the 2014 Indian election. 0:24:02.070,0:24:04.219 So they used real voters, 0:24:04.219,0:24:06.450 and they kept the size[br]of the experiment small enough 0:24:06.450,0:24:08.190 that it was not going to influence the outcome 0:24:08.190,0:24:10.090 of the election. 
0:24:10.090,0:24:12.139 So the researchers took people, 0:24:12.139,0:24:14.229 they determined their political leaning, 0:24:14.229,0:24:17.429 and they segmented them into[br]control and treatment groups, 0:24:17.429,0:24:19.269 where the treatment was manipulation 0:24:19.269,0:24:21.210 of the search ranking results. 0:24:21.210,0:24:24.409 And then they had these people[br]browse the web. 0:24:24.409,0:24:25.969 And what they found is that 0:24:25.969,0:24:28.229 this mechanism is very effective at shifting 0:24:28.229,0:24:30.429 people's voter preferences. 0:24:30.429,0:24:33.649 So, in this study, they were able to introduce 0:24:33.649,0:24:36.849 a 20% shift in voter preferences. 0:24:36.849,0:24:39.299 Even alerting users to the fact that this 0:24:39.299,0:24:41.729 was going to be done, telling them 0:24:41.729,0:24:44.049 "we are going to manipulate your search results," 0:24:44.049,0:24:45.729 "really pay attention," 0:24:45.729,0:24:49.099 they were totally unable to decrease 0:24:49.099,0:24:50.859 the magnitude of the effect. 0:24:50.859,0:24:55.109 So, the margins of error in many elections 0:24:55.109,0:24:57.669 are incredibly small, 0:24:57.669,0:24:59.929 and the authors estimate that this shift 0:24:59.929,0:25:02.009 could change the outcome of about 0:25:02.009,0:25:07.109 25% of elections worldwide, if this were done. 0:25:07.109,0:25:10.919 And the bias is so small that no one can tell. 0:25:10.919,0:25:14.279 So, all humans, no matter how smart 0:25:14.279,0:25:17.109 and resistant to manipulation[br]we think we are, 0:25:17.109,0:25:21.909 all of us are subject to this sort of manipulation, 0:25:21.909,0:25:24.320 and we really can't tell. 0:25:24.320,0:25:27.129 So, I'm not saying that this is occurring, 0:25:27.129,0:25:31.389 but right now there is no[br]regulation to stop this, 0:25:31.389,0:25:34.409 there is no way we could reliably detect this, 0:25:34.409,0:25:37.210 so there's a huge amount of power here. 
0:25:37.210,0:25:39.779 So, something to think about. 0:25:39.779,0:25:42.710 But it's not only corporations that are interested 0:25:42.710,0:25:47.269 in this sort of behavioral manipulation. 0:25:47.269,0:25:51.119 In 2010, UK Prime Minister David Cameron 0:25:51.119,0:25:54.969 created this UK Behavioural Insights Team, 0:25:54.969,0:25:57.269 which is informally called the Nudge Unit. 0:25:57.269,0:26:01.489 And so what they do is[br]they use behavioral science 0:26:01.489,0:26:04.769 and this predictive analytics approach, 0:26:04.769,0:26:06.119 with experimentation, 0:26:06.119,0:26:07.940 to have people make better decisions 0:26:07.940,0:26:09.690 for themselves and society-- 0:26:09.690,0:26:11.989 as determined by the UK government. 0:26:11.989,0:26:14.269 And as of a few months ago, 0:26:14.269,0:26:16.849 after an executive order signed by Obama 0:26:16.849,0:26:19.349 in September, the United States now has 0:26:19.349,0:26:21.429 its own Nudge Unit. 0:26:21.429,0:26:24.009 So, to be clear, I don't think that this is 0:26:24.009,0:26:25.539 some sort of malicious plot. 0:26:25.539,0:26:27.440 I think that there can be huge value 0:26:27.440,0:26:29.489 in these sorts of initiatives, 0:26:29.489,0:26:31.330 positively impacting people's lives, 0:26:31.330,0:26:34.179 but when this sort of behavioral manipulation 0:26:34.179,0:26:37.289 is being done, in part openly, 0:26:37.289,0:26:39.460 oversight is pretty important, 0:26:39.460,0:26:41.700 and we really need to consider 0:26:41.700,0:26:46.090 what these systems are optimizing for. 
0:26:46.090,0:26:47.849 And that's something that we might 0:26:47.849,0:26:52.090 not always know, or at least understand, 0:26:52.090,0:26:54.450 so for example, for industry, 0:26:54.450,0:26:57.679 we do have a pretty good understanding there: 0:26:57.679,0:26:59.809 industry cares about optimizing for 0:26:59.809,0:27:01.960 the time spent on the website, 0:27:01.960,0:27:04.929 Facebook wants you to spend more time on Facebook, 0:27:04.929,0:27:06.950 they want you to click on ads, 0:27:06.950,0:27:09.109 click on newsfeed items, 0:27:09.109,0:27:11.299 they want you to like things. 0:27:11.299,0:27:14.309 And, fundamentally: profit. 0:27:14.309,0:27:17.599 So, already this has some serious implications, 0:27:17.599,0:27:19.690 and this had pretty serious implications 0:27:19.690,0:27:22.190 in the last 10 years, in media for example. 0:27:22.190,0:27:25.119 The optimizing for click-through rate in journalism 0:27:25.119,0:27:26.629 has produced a race to the bottom 0:27:26.629,0:27:28.039 in terms of quality. 0:27:28.039,0:27:30.919 And another issue is that optimizing 0:27:30.919,0:27:34.589 for what people like might not always be 0:27:34.589,0:27:35.839 the best approach. 0:27:35.839,0:27:38.859 So, Facebook officials have spoken publicly 0:27:38.859,0:27:41.279 about how Facebook's goal is to make you happy, 0:27:41.279,0:27:43.149 they want you to open that newsfeed 0:27:43.149,0:27:45.080 and just feel great. 0:27:45.080,0:27:47.379 But, there's an issue there, right? 0:27:47.379,0:27:50.169 Because people get their news, 0:27:50.169,0:27:52.369 like 40% of people according to Pew Research, 0:27:52.369,0:27:54.599 get their news from Facebook. 0:27:54.599,0:27:58.460 So, if people don't want to see 0:27:58.460,0:28:01.239 war and corpses,[br]because it makes them feel sad, 0:28:01.239,0:28:04.179 so this is not a system that is gonna optimize 0:28:04.179,0:28:07.149 for an informed population. 
0:28:07.149,0:28:09.359 It's not gonna produce a population that is 0:28:09.359,0:28:11.469 ready to engage in civic life. 0:28:11.469,0:28:13.059 It's gonna produce an amused population 0:28:13.059,0:28:16.809 whose time is occupied by cat pictures. 0:28:16.809,0:28:19.159 So, in politics, we have a similar 0:28:19.159,0:28:21.269 optimization problem that's occurring. 0:28:21.269,0:28:23.769 So, these political campaigns that use 0:28:23.769,0:28:26.769 these predictive systems, 0:28:26.769,0:28:28.669 are optimizing for votes for the desired candidate, 0:28:28.669,0:28:30.200 of course. 0:28:30.200,0:28:33.499 So, instead of a political campaign being 0:28:33.499,0:28:36.139 --well, maybe this is a naive view, but-- 0:28:36.139,0:28:38.070 being an open discussion of the issues 0:28:38.070,0:28:39.830 facing the country, 0:28:39.830,0:28:43.200 it becomes this micro-targeted[br]persuasion game, 0:28:43.200,0:28:44.669 and the people that get targeted 0:28:44.669,0:28:47.349 are a very small subset of all people, 0:28:47.349,0:28:49.399 and it's only gonna be people that are 0:28:49.399,0:28:51.409 you know, on the edge, maybe disinterested, 0:28:51.409,0:28:54.399 those are the people that are gonna get attention 0:28:54.399,0:28:58.839 from political candidates. 0:28:58.839,0:29:01.869 In policy, as with these Nudge Units, 0:29:01.869,0:29:03.539 they're being used to enable 0:29:03.539,0:29:06.109 better use of government services. 0:29:06.109,0:29:07.419 There are some good projects that have 0:29:07.419,0:29:09.419 come out of this: 0:29:09.419,0:29:11.409 increasing voter registration, 0:29:11.409,0:29:12.739 improving health outcomes, 0:29:12.739,0:29:14.419 improving education outcomes. 0:29:14.419,0:29:16.419 But some of these predictive systems 0:29:16.419,0:29:18.229 that we're starting to see in government 0:29:18.229,0:29:20.700 are optimizing for compliance, 0:29:20.700,0:29:23.669 as is the case with predictive policing. 
0:29:23.669,0:29:25.460 So this is something that we need to 0:29:25.460,0:29:28.649 watch carefully. 0:29:28.649,0:29:30.119 I think this is a nice quote that 0:29:30.119,0:29:33.339 sort of describes the problem. 0:29:33.339,0:29:35.200 In some ways we might be narrowing 0:29:35.200,0:29:38.259 our horizon, and the danger is that 0:29:38.259,0:29:41.989 these tools are separating people. 0:29:41.989,0:29:43.570 And this is particularly bad 0:29:43.570,0:29:45.940 for political action, because political action 0:29:45.940,0:29:49.879 requires people to have shared experience, 0:29:49.879,0:29:53.799 and thus are able to collectively act 0:29:53.799,0:29:57.629 to exert pressure to fix problems. 0:29:57.629,0:30:00.810 So, finally: accountability. 0:30:00.810,0:30:03.399 So, we need some oversight mechanisms. 0:30:03.399,0:30:06.519 For example, in the case of errors-- 0:30:06.519,0:30:08.219 so this is particularly important for 0:30:08.219,0:30:10.849 civil or bureaucratic systems. 0:30:10.849,0:30:14.330 So, when an algorithm produces some decision, 0:30:14.330,0:30:16.549 we don't always want humans to just 0:30:16.549,0:30:18.039 defer to the machine, 0:30:18.039,0:30:21.859 and that might represent one of the problems. 0:30:21.859,0:30:25.419 So, there are starting to be some cases 0:30:25.419,0:30:28.039 of computer algorithms yielding a decision, 0:30:28.039,0:30:30.409 and then humans being unable to correct 0:30:30.409,0:30:31.799 an obvious error. 0:30:31.799,0:30:35.190 So there's this case in Georgia,[br]in the United States, 0:30:35.190,0:30:37.259 where 2 young people went to 0:30:37.259,0:30:38.529 the Department of Motor Vehicles, 0:30:38.529,0:30:39.749 they're twins, and they went 0:30:39.749,0:30:42.099 to get their driver's license. 
0:30:42.099,0:30:44.979 However, they were both flagged by 0:30:44.979,0:30:47.489 a fraud algorithm that uses facial recognition 0:30:47.489,0:30:48.809 to look for similar faces, 0:30:48.809,0:30:50.919 and I guess the people that designed the system 0:30:50.919,0:30:54.549 didn't think of the possibility of twins. 0:30:54.549,0:30:58.489 Yeah.[br]So, they just left 0:30:58.489,0:30:59.889 without their driver's licenses. 0:30:59.889,0:31:01.889 The people in the Department of Motor Vehicles 0:31:01.889,0:31:03.809 were unable to correct this. 0:31:03.809,0:31:06.820 So, this is one implication-- 0:31:06.820,0:31:08.579 it's like something out of Kafka. 0:31:08.579,0:31:11.529 But there are also cases of errors being made, 0:31:11.529,0:31:13.879 and people not noticing until 0:31:13.879,0:31:15.909 after actions have been taken, 0:31:15.909,0:31:17.570 some of them very serious-- 0:31:17.570,0:31:19.129 because people simply deferred 0:31:19.129,0:31:20.619 to the machine. 0:31:20.619,0:31:23.309 So, this is an example from San Francisco. 0:31:23.309,0:31:26.679 So, an ALPR-- an Automated License Plate Reader-- 0:31:26.679,0:31:29.429 is a device that uses image recognition 0:31:29.429,0:31:32.099 to detect and read license plates, 0:31:32.099,0:31:34.339 and usually to compare license plates 0:31:34.339,0:31:37.159 with a known list of plates of interest. 0:31:37.159,0:31:39.799 And, so, San Francisco uses these 0:31:39.799,0:31:42.179 and they're mounted on police cars. 0:31:42.179,0:31:46.659 So, in this case, San Francisco ALPR 0:31:46.659,0:31:48.879 got a hit on a car, 0:31:48.879,0:31:53.029 and it was the car of a 47-year-old woman, 0:31:53.029,0:31:54.839 with no criminal history. 0:31:54.839,0:31:56.029 And so it was a false hit 0:31:56.029,0:31:58.099 because it was a blurry image, 0:31:58.099,0:31:59.709 and it matched erroneously with 0:31:59.709,0:32:00.909 one of the plates of interest 0:32:00.909,0:32:03.479 that happened to be a stolen vehicle. 
0:32:03.479,0:32:06.869 So, they conducted a traffic stop on her, 0:32:06.869,0:32:09.330 and they take her out of the vehicle, 0:32:09.330,0:32:11.049 they search her and the vehicle, 0:32:11.049,0:32:12.659 she gets a pat-down, 0:32:12.659,0:32:14.849 and they have her kneel 0:32:14.849,0:32:17.780 at gunpoint, in the street. 0:32:17.780,0:32:20.989 So, how much oversight should be present 0:32:20.989,0:32:23.999 depends on the implications of the system. 0:32:23.999,0:32:25.279 It's certainly the case that 0:32:25.279,0:32:26.910 for some of these decision-making systems, 0:32:26.910,0:32:29.219 an error might not be that important, 0:32:29.219,0:32:31.149 it could be relatively harmless, 0:32:31.149,0:32:33.559 but in this case,[br]an error in this algorithmic decision 0:32:33.559,0:32:36.259 led to this totally innocent person 0:32:36.259,0:32:40.019 literally having a gun pointed at her. 0:32:40.019,0:32:44.019 So, that brings us to: we need some way of 0:32:44.019,0:32:45.419 getting some information about 0:32:45.419,0:32:47.249 what is going on here. 0:32:47.249,0:32:50.179 We don't wanna have to wait for these events 0:32:50.179,0:32:52.580 before we are able to determine 0:32:52.580,0:32:54.409 some information about the system. 0:32:54.409,0:32:56.139 So, auditing is one option: 0:32:56.139,0:32:58.109 to independently verify the statements 0:32:58.109,0:33:00.809 of companies, in situations where we have 0:33:00.809,0:33:02.939 inputs and outputs. 0:33:02.939,0:33:05.200 So, for example, this could be done with 0:33:05.200,0:33:07.489 Google, Facebook. 0:33:07.489,0:33:09.190 If you have the inputs of a system, 0:33:09.190,0:33:10.649 say you have test accounts, 0:33:10.649,0:33:11.729 or real accounts, 0:33:11.729,0:33:14.359 maybe you can collect[br]people's information together. 0:33:14.359,0:33:15.830 So that was something that was done 0:33:15.830,0:33:18.759 during the 2012 Obama campaign 0:33:18.759,0:33:20.249 by ProPublica. 
0:33:20.249,0:33:21.269 People noticed that they were getting 0:33:21.269,0:33:24.739 different emails from the Obama campaign, 0:33:24.739,0:33:26.009 and were interested to see 0:33:26.009,0:33:28.209 based on what factors 0:33:28.209,0:33:29.749 the emails were changing. 0:33:29.749,0:33:32.659 So, I think about 200 people submitted emails 0:33:32.659,0:33:34.940 and they were able to determine some information 0:33:34.940,0:33:38.809 about what the emails[br]were being varied based on. 0:33:38.809,0:33:40.859 So there have been some successful 0:33:40.859,0:33:43.080 attempts at this. 0:33:43.080,0:33:45.919 So, compare inputs and then look at 0:33:45.919,0:33:48.709 why one item was shown to one user 0:33:48.709,0:33:50.289 and not another, and see if there's 0:33:50.289,0:33:51.879 any statistical differences. 0:33:51.879,0:33:56.279 So, there's some potential legal issues 0:33:56.279,0:33:57.749 with the test accounts, so that's something 0:33:57.749,0:34:01.499 to think about-- I'm not a lawyer. 0:34:01.499,0:34:03.919 So, for example, if you wanna examine 0:34:03.919,0:34:06.269 ad-targeting algorithms, 0:34:06.269,0:34:07.969 one way to proceed is to construct 0:34:07.969,0:34:10.589 a browsing profile, and then examine 0:34:10.589,0:34:12.989 what ads are served back to you. 0:34:12.989,0:34:14.119 And so this is something that 0:34:14.119,0:34:16.250 academic researchers have looked at, 0:34:16.250,0:34:17.489 because, at the time at least, 0:34:17.489,0:34:20.879 you didn't need to make an account to do this. 0:34:20.879,0:34:24.768 So, this was a study that was presented at 0:34:24.768,0:34:27.799 Privacy Enhancing Technologies last year, 0:34:27.799,0:34:31.149 and in this study, the researchers 0:34:31.149,0:34:33.179 generate some browsing profiles 0:34:33.179,0:34:35.909 that differ only by one characteristic, 0:34:35.909,0:34:37.690 so they're basically identical in every way 0:34:37.690,0:34:39.049 except for one thing. 
0:34:39.049,0:34:42.359 And that is denoted by Treatment 1 and 2. 0:34:42.359,0:34:44.460 So this is a randomized, controlled trial, 0:34:44.460,0:34:46.389 but I left out the randomization part 0:34:46.389,0:34:48.220 for simplicity. 0:34:48.220,0:34:54.799 So, in one study,[br]they applied a treatment of gender. 0:34:54.799,0:34:56.799 So, they had the browsing profiles 0:34:56.799,0:34:59.319 in Treatment 1 be male browsing profiles, 0:34:59.319,0:35:02.029 and the browsing profiles in Treatment 2[br]be female. 0:35:02.029,0:35:04.430 And they wanted to see: is there any difference 0:35:04.430,0:35:06.079 in the way that ads are targeted 0:35:06.079,0:35:08.710 if browsing profiles are effectively identical 0:35:08.710,0:35:11.019 except for gender? 0:35:11.019,0:35:14.710 So, it turns out that there was. 0:35:14.710,0:35:19.180 So, a 3rd-party site was showing Google ads 0:35:19.180,0:35:21.289 for senior executive positions 0:35:21.289,0:35:23.980 at a rate 6 times higher to the fake men 0:35:23.980,0:35:27.059 than for the fake women in this study. 0:35:27.059,0:35:30.109 So, this sort of auditing is not going to 0:35:30.109,0:35:32.779 be able to determine everything 0:35:32.779,0:35:34.930 that algorithms are doing, but they can 0:35:34.930,0:35:36.519 sometimes uncover interesting, 0:35:36.519,0:35:40.900 at least statistical differences. 0:35:40.900,0:35:47.099 So, this leads us to the fundamental issue: 0:35:47.099,0:35:49.180 Right now, we're really not in control 0:35:49.180,0:35:50.510 of some of these systems, 0:35:50.510,0:35:54.480 and we really need these predictive systems 0:35:54.480,0:35:56.119 to be controlled by us, 0:35:56.119,0:35:57.819 in order for them not to be used 0:35:57.819,0:36:00.109 as a system of control. 0:36:00.109,0:36:03.220 So there are some technologies that I'd like 0:36:03.220,0:36:06.890 to point you all to. 
0:36:06.890,0:36:08.319 We need tools in the digital commons 0:36:08.319,0:36:11.160 that can help address some of these concerns. 0:36:11.160,0:36:13.349 So, the first thing is that of course 0:36:13.349,0:36:14.730 we know that minimizing the amount of 0:36:14.730,0:36:17.069 data available can help in some contexts, 0:36:17.069,0:36:18.980 which we can do by making systems 0:36:18.980,0:36:22.779 that are private by design, and by default. 0:36:22.779,0:36:24.549 Another thing is that these audit tools 0:36:24.549,0:36:25.890 might be useful. 0:36:25.890,0:36:30.720 And, so, these 2 nice examples in academia... 0:36:30.720,0:36:34.359 the ad experiment that I just showed was done 0:36:34.359,0:36:36.120 using AdFisher. 0:36:36.120,0:36:38.200 So, these are 2 toolkits that you can use 0:36:38.200,0:36:41.440 to start doing this sort of auditing. 0:36:41.440,0:36:44.579 Another technology that is generally useful, 0:36:44.579,0:36:46.700 but particularly in the case of prediction 0:36:46.700,0:36:48.789 it's useful to maintain access to 0:36:48.789,0:36:50.289 as many sites as possible, 0:36:50.289,0:36:52.589 through anonymity systems like Tor, 0:36:52.589,0:36:54.319 because it's impossible to personalize 0:36:54.319,0:36:55.650 when everyone looks the same. 0:36:55.650,0:36:59.130 So this is a very important technology. 0:36:59.130,0:37:01.519 Something that doesn't really exist, 0:37:01.519,0:37:03.630 but that I think is pretty important, 0:37:03.630,0:37:05.829 is having some tool to view the landscape. 0:37:05.829,0:37:08.160 So, as we know from these few studies 0:37:08.160,0:37:10.440 that have been done, 0:37:10.440,0:37:12.059 different people are not seeing the internet 0:37:12.059,0:37:12.950 in the same way. 0:37:12.950,0:37:15.730 This is one reason why we don't like censorship. 
0:37:15.730,0:37:17.880 But, rich and poor people, 0:37:17.880,0:37:19.659 from academic research we know that 0:37:19.659,0:37:23.790 there is widespread price discrimination[br]on the internet, 0:37:23.790,0:37:25.650 so rich and poor people see a different view 0:37:25.650,0:37:26.970 of the Internet, 0:37:26.970,0:37:28.400 men and women see a different view 0:37:28.400,0:37:29.940 of the Internet. 0:37:29.940,0:37:31.200 We wanna know how different people 0:37:31.200,0:37:32.450 see the same site, 0:37:32.450,0:37:34.329 and this could be the beginning of 0:37:34.329,0:37:36.329 a defense system for this sort of 0:37:36.329,0:37:41.730 manipulation/tampering that I showed earlier. 0:37:41.730,0:37:45.549 Another interesting approach is obfuscation: 0:37:45.549,0:37:46.980 injecting noise into the system. 0:37:46.980,0:37:49.190 So there's an interesting browser extension 0:37:49.190,0:37:51.720 called AdNauseam, that's for Firefox, 0:37:51.720,0:37:54.579 which clicks on every single ad you're served, 0:37:54.579,0:37:55.680 to inject noise. 0:37:55.680,0:37:57.019 So that's, I think, an interesting approach 0:37:57.019,0:38:00.170 that people haven't looked at too much. 0:38:00.170,0:38:03.780 So in terms of policy, 0:38:03.780,0:38:06.530 Facebook and Google, these internet giants, 0:38:06.530,0:38:08.829 have billions of users, 0:38:08.829,0:38:12.220 and sometimes they like to call themselves 0:38:12.220,0:38:13.769 new public utilities, 0:38:13.769,0:38:15.000 and if that's the case then 0:38:15.000,0:38:17.549 it might be necessary to subject them 0:38:17.549,0:38:20.539 to additional regulation. 0:38:20.539,0:38:21.990 Another problem that's come up, 0:38:21.990,0:38:23.539 for example with some of the studies 0:38:23.539,0:38:24.900 that Facebook has done, 0:38:24.900,0:38:29.039 is sometimes a lack of ethics review. 
0:38:29.039,0:38:31.059 So, for example, in academia, 0:38:31.059,0:38:33.859 if you're gonna do research involving humans, 0:38:33.859,0:38:35.390 there's an Institutional Review Board 0:38:35.390,0:38:36.970 that you go to that verifies that 0:38:36.970,0:38:39.140 you're doing things in an ethical manner. 0:38:39.140,0:38:40.910 And some companies do have internal 0:38:40.910,0:38:43.029 review processes like this, but it might 0:38:43.029,0:38:45.119 be important to have an independent 0:38:45.119,0:38:48.200 ethics board that does this sort of thing. 0:38:48.200,0:38:50.849 And we really need 3rd-party auditing. 0:38:50.849,0:38:54.519 So, for example, some companies 0:38:54.519,0:38:56.220 don't want auditing to be done 0:38:56.220,0:38:59.190 because of IP concerns, 0:38:59.190,0:39:00.579 and if that's the concern 0:39:00.579,0:39:03.180 maybe having a set of people 0:39:03.180,0:39:05.680 that are not paid by the company 0:39:05.680,0:39:07.200 to check how some of these systems 0:39:07.200,0:39:08.640 are being implemented, 0:39:08.640,0:39:11.240 could help give us confidence that 0:39:11.240,0:39:16.979 things are being done in a reasonable way. 0:39:16.979,0:39:20.269 So, in closing, 0:39:20.269,0:39:23.180 algorithmic decision making is here, 0:39:23.180,0:39:26.140 and it's barreling forward[br]at a very fast rate, 0:39:26.140,0:39:27.890 and we need to figure out what 0:39:27.890,0:39:30.410 the guide rails should be, 0:39:30.410,0:39:31.380 and how to install them 0:39:31.380,0:39:33.119 to handle some of the potential threats. 0:39:33.119,0:39:35.470 There's a huge amount of power here. 0:39:35.470,0:39:37.910 We need more openness in these systems. 0:39:37.910,0:39:39.589 And, right now, 0:39:39.589,0:39:41.559 with the intelligent systems that do exist, 0:39:41.559,0:39:43.920 we don't know what's occurring really, 0:39:43.920,0:39:46.510 and we need to watch carefully 0:39:46.510,0:39:49.099 where and how these systems are being used. 
0:39:49.099,0:39:50.690 And I think this community has 0:39:50.690,0:39:53.940 an important role to play in this fight, 0:39:53.940,0:39:55.730 to study what's being done, 0:39:55.730,0:39:57.160 to show people what's being done, 0:39:57.160,0:39:58.670 to raise the debate and advocate, 0:39:58.670,0:40:01.200 and, where necessary, to resist. 0:40:01.200,0:40:03.339 Thanks. 0:40:03.339,0:40:13.129 applause 0:40:13.129,0:40:17.519 Herald: So, let's have a question and answer. 0:40:17.519,0:40:19.080 Microphone 2, please. 0:40:19.080,0:40:20.199 Mic 2: Hi there. 0:40:20.199,0:40:23.259 Thanks for the talk. 0:40:23.259,0:40:26.230 Since these pre-crime softwares also 0:40:26.230,0:40:27.359 arrived here in Germany 0:40:27.359,0:40:29.680 with the start of the so-called CopWatch system 0:40:29.680,0:40:32.779 in southern Germany,[br]and Bavaria and Nuremberg especially, 0:40:32.779,0:40:35.420 where they try to predict burglary crime 0:40:35.420,0:40:37.460 using that criminal record 0:40:37.460,0:40:40.170 geographical analysis, like you explained, 0:40:40.170,0:40:43.380 leads me to a 2-fold question: 0:40:43.380,0:40:47.900 first, have you heard of any research 0:40:47.900,0:40:49.760 that measures the effectiveness 0:40:49.760,0:40:53.690 of such measures, at all? 0:40:53.690,0:40:57.040 And, second: 0:40:57.040,0:41:00.599 What do you think of the game theory 0:41:00.599,0:41:02.690 if the thieves or the bad guys 0:41:02.690,0:41:07.619 know the system, and when they[br]game the system, 0:41:07.619,0:41:09.980 they will probably win, 0:41:09.980,0:41:11.640 since one police officer in an interview said 0:41:11.640,0:41:14.019 this system is used to reduce 0:41:14.019,0:41:16.460 the personal costs of policing, 0:41:16.460,0:41:19.460 so they just send the guys[br]where the red flags are, 0:41:19.460,0:41:22.290 and the others take the day off. 0:41:22.290,0:41:24.360 Dr. Helsby: Yup. 
0:41:24.360,0:41:27.150 Um, so, with respect to 0:41:27.150,0:41:30.990 testing the effectiveness of predictive policing, 0:41:30.990,0:41:31.990 the companies, 0:41:31.990,0:41:33.910 some of them do randomized, controlled trials 0:41:33.910,0:41:35.240 and claim a reduction in policing. 0:41:35.240,0:41:38.349 The best independent study that I've seen 0:41:38.349,0:41:40.680 is by this RAND Corporation 0:41:40.680,0:41:43.120 that did a study in, I think, 0:41:43.120,0:41:44.920 Shreveport, Louisiana, 0:41:44.920,0:41:47.589 and in their report they claim 0:41:47.589,0:41:50.190 that there was no statistically significant 0:41:50.190,0:41:52.900 difference, they didn't find any reduction. 0:41:52.900,0:41:54.099 And it was specifically looking at 0:41:54.099,0:41:56.730 property crime, which I think you mentioned. 0:41:56.730,0:41:59.480 So, I think right now there's sort of 0:41:59.480,0:42:01.069 conflicting reports between 0:42:01.069,0:42:06.180 the independent auditors[br]and these company claims. 0:42:06.180,0:42:09.289 So there definitely needs to be more study. 0:42:09.289,0:42:12.240 And then, the 2nd thing...sorry,[br]remind me what it was? 0:42:12.240,0:42:15.189 Mic 2: What about the guys gaming the system? 0:42:15.189,0:42:16.949 Dr. Helsby: Oh, yeah. 0:42:16.949,0:42:18.900 I think it's a legitimate concern. 0:42:18.900,0:42:22.480 Like, if all the outputs[br]were just immediately public, 0:42:22.480,0:42:24.599 then, yes, everyone knows the location 0:42:24.599,0:42:26.549 of all police officers, 0:42:26.549,0:42:29.009 and I imagine that people would have 0:42:29.009,0:42:30.779 a problem with that. 0:42:30.779,0:42:32.679 Yup. 0:42:32.679,0:42:35.990 Herald: Microphone #4, please. 0:42:35.990,0:42:39.369 Mic 4: Yeah, this is not actually a question, 0:42:39.369,0:42:40.779 but just a comment. 
0:42:40.779,0:42:42.970 I've enjoyed your talk very much, 0:42:42.970,0:42:47.789 in particular after watching 0:42:47.789,0:42:52.270 the talk in Hall 1 earlier in the afternoon. 0:42:52.270,0:42:55.730 The "Say Hi to Your New Boss", about 0:42:55.730,0:42:59.609 algorithms that are trained with big data, 0:42:59.609,0:43:02.390 and finally make decisions. 0:43:02.390,0:43:08.210 And I think these 2 talks are kind of complementary, 0:43:08.210,0:43:11.309 and if people are interested in the topic 0:43:11.309,0:43:14.710 they might want to check out the other talk 0:43:14.710,0:43:16.259 and watch it later, because these 0:43:16.259,0:43:17.319 fit very well together. 0:43:17.319,0:43:19.589 Dr. Helsby: Yeah, it was a great talk. 0:43:19.589,0:43:22.130 Herald: Microphone #2, please. 0:43:22.130,0:43:25.049 Mic 2: Um, yeah, you mentioned 0:43:25.049,0:43:27.319 the need to have some kind of 3rd-party auditing 0:43:27.319,0:43:30.900 or some kind of way to 0:43:30.900,0:43:31.930 peek into these algorithms 0:43:31.930,0:43:33.079 and to see what they're doing, 0:43:33.079,0:43:34.420 and to see if they're being fair. 0:43:34.420,0:43:36.199 Can you talk a little bit more about that? 0:43:36.199,0:43:38.059 Like, going forward, 0:43:38.059,0:43:40.690 some kind of regulatory structures 0:43:40.690,0:43:44.200 would probably have to emerge 0:43:44.200,0:43:47.200 to analyze and to look at 0:43:47.200,0:43:49.339 these black boxes that are just sort of 0:43:49.339,0:43:51.309 popping up everywhere and, you know, 0:43:51.309,0:43:52.939 controlling more and more of the things 0:43:52.939,0:43:56.150 in our lives, and important decisions. 0:43:56.150,0:43:58.539 So, just, what kind of discussions 0:43:58.539,0:43:59.460 are there for that? 0:43:59.460,0:44:01.809 And what kind of possibility[br]is there for that? 
0:44:01.809,0:44:04.900 And, I'm sure that companies would be 0:44:04.900,0:44:08.000 very, very resistant to 0:44:08.000,0:44:09.890 any kind of attempt to look into 0:44:09.890,0:44:13.890 algorithms, and to... 0:44:13.890,0:44:15.070 Dr. Helsby: Yeah, I mean, definitely 0:44:15.070,0:44:18.069 companies would be very resistant to 0:44:18.069,0:44:19.670 having people look into their algorithms. 0:44:19.670,0:44:22.190 So, if you wanna do a very rigorous 0:44:22.190,0:44:23.339 audit of what's going on 0:44:23.339,0:44:25.660 then it's probably necessary to have 0:44:25.660,0:44:26.589 a few people come in 0:44:26.589,0:44:28.900 and sign NDAs, and then 0:44:28.900,0:44:31.039 look through the systems. 0:44:31.039,0:44:33.140 So, that's one way to proceed. 0:44:33.140,0:44:35.049 But, another way to proceed that-- 0:44:35.049,0:44:38.720 so, these academic researchers have done 0:44:38.720,0:44:40.009 a few experiments 0:44:40.009,0:44:42.809 and found some interesting things, 0:44:42.809,0:44:45.500 and that's sort of all the attempts at auditing 0:44:45.500,0:44:46.450 that we've seen: 0:44:46.450,0:44:48.490 there was 1 attempt in 2012[br]for the Obama campaign, 0:44:48.490,0:44:49.910 but there's really not been any 0:44:49.910,0:44:51.500 sort of systematic attempt-- 0:44:51.500,0:44:52.589 you know, like, in censorship 0:44:52.589,0:44:54.539 we see a systematic attempt to 0:44:54.539,0:44:56.779 do measurement as often as possible, 0:44:56.779,0:44:58.240 check what's going on, 0:44:58.240,0:44:59.339 and that itself, you know, 0:44:59.339,0:45:00.900 can act as an oversight mechanism. 0:45:00.900,0:45:01.880 But, right now, 0:45:01.880,0:45:03.900 I think many of these companies 0:45:03.900,0:45:05.259 realize no one is watching, 0:45:05.259,0:45:07.160 so there's no real push to have 0:45:07.160,0:45:10.440 people verify: are you being fair when you 0:45:10.440,0:45:11.539 implement this system? 0:45:11.539,0:45:12.969 Because no one's really checking. 
0:45:12.969,0:45:13.980 Mic 2: Do you think that, 0:45:13.980,0:45:15.339 at some point, it would be like 0:45:15.339,0:45:19.059 an FDA or SEC, to give some American examples... 0:45:19.059,0:45:21.490 an actual government regulatory agency 0:45:21.490,0:45:24.960 that has the power and ability to 0:45:24.960,0:45:27.930 not just sort of look and try to 0:45:27.930,0:45:31.710 reverse engineer some of these algorithms, 0:45:31.710,0:45:33.920 but actually peek in there and make sure 0:45:33.920,0:45:36.420 that things are fair, because it seems like 0:45:36.420,0:45:38.240 there's just-- it's so important now 0:45:38.240,0:45:41.769 that, again, it could be the difference between 0:45:41.769,0:45:42.930 life and death, between 0:45:42.930,0:45:44.589 getting a job, not getting a job, 0:45:44.589,0:45:46.130 being pulled over,[br]not being pulled over, 0:45:46.130,0:45:48.069 being racially profiled,[br]not racially profiled, 0:45:48.069,0:45:49.410 things like that.[br]Dr. Helsby: Right. 0:45:49.410,0:45:50.430 Mic 2: Is it moving in that direction? 0:45:50.430,0:45:52.249 Or is it way too early for it? 0:45:52.249,0:45:55.110 Dr. Helsby: I mean, so some people have... 0:45:55.110,0:45:56.859 someone has called for, like, 0:45:56.859,0:45:59.079 a Federal Search Commission, 0:45:59.079,0:46:00.930 or like a Federal Algorithms Commission, 0:46:00.930,0:46:03.200 that would do this sort of oversight work, 0:46:03.200,0:46:06.130 but it's in such early stages right now 0:46:06.130,0:46:09.970 that there's no real push for that. 0:46:09.970,0:46:13.330 But I think it's a good idea. 0:46:13.330,0:46:15.729 Herald: And again, #2 please. 0:46:15.729,0:46:17.059 Mic 2: Thank you again for your talk. 
0:46:17.059,0:46:19.309 I was just curious if you can point 0:46:19.309,0:46:20.440 to any examples of 0:46:20.440,0:46:22.619 either current producers or consumers 0:46:22.619,0:46:24.029 of these algorithmic systems 0:46:24.029,0:46:26.390 who are actively and publicly trying 0:46:26.390,0:46:27.720 to do so in a responsible manner 0:46:27.720,0:46:29.720 by describing what they're trying to do 0:46:29.720,0:46:31.380 and how they're going about it? 0:46:31.380,0:46:37.210 Dr. Helsby: So, yeah, there are some companies, 0:46:37.210,0:46:39.000 for example, like DataKind, 0:46:39.000,0:46:42.710 that try to deploy algorithmic systems 0:46:42.710,0:46:44.640 in as responsible a way as possible, 0:46:44.640,0:46:47.250 for like public policy. 0:46:47.250,0:46:49.549 Like, I actually also implement systems 0:46:49.549,0:46:51.750 for public policy in a transparent way. 0:46:51.750,0:46:54.329 Like, all the code is in GitHub, etc. 0:46:54.329,0:47:00.020 And it is also the case to give credit to 0:47:00.020,0:47:01.990 Google, and these giants, 0:47:01.990,0:47:06.109 they're trying to implement transparency systems 0:47:06.109,0:47:08.170 that help you understand. 0:47:08.170,0:47:09.289 This has been done with respect to 0:47:09.289,0:47:12.329 how your data is being collected, 0:47:12.329,0:47:14.579 but for example if you go on Amazon.com 0:47:14.579,0:47:17.890 you can see a recommendation has been made, 0:47:17.890,0:47:19.420 and that is pretty transparent. 0:47:19.420,0:47:21.480 You can see "this item[br]was recommended to me," 0:47:21.480,0:47:25.039 so you know that prediction[br]is being used in this case, 0:47:25.039,0:47:27.089 and it will say why prediction is being used: 0:47:27.089,0:47:29.230 because you purchased some item. 
0:47:29.230,0:47:30.380 And Google has a similar thing, 0:47:30.380,0:47:32.420 if you go to like Google Ad Settings, 0:47:32.420,0:47:35.249 you can even turn off personalization of ads 0:47:35.249,0:47:36.380 if you want, 0:47:36.380,0:47:38.119 and you can also see some of the inferences 0:47:38.119,0:47:39.400 that have been learned about you. 0:47:39.400,0:47:40.819 A subset of the inferences that have been 0:47:40.819,0:47:41.700 learned about you. 0:47:41.700,0:47:43.940 So, like, what interests... 0:47:43.940,0:47:47.869 Herald: A question from the internet, please? 0:47:47.869,0:47:50.930 Signal Angel: Yes, billetQ is asking 0:47:50.930,0:47:54.479 how do you avoid biases in machine learning? 0:47:54.479,0:47:57.380 I assume an analysis system, for example, 0:47:57.380,0:48:00.420 could be biased against women and minorities, 0:48:00.420,0:48:04.960 if used for hiring decisions[br]based on known data. 0:48:04.960,0:48:06.499 Dr. Helsby: Yeah, so one thing is to 0:48:06.499,0:48:08.529 just explicitly check. 0:48:08.529,0:48:12.199 So, you can check to see how 0:48:12.199,0:48:14.309 positive outcomes are being distributed 0:48:14.309,0:48:16.779 among those protected classes. 0:48:16.779,0:48:19.210 You could also incorporate these sort of 0:48:19.210,0:48:21.440 fairness constraints in the function 0:48:21.440,0:48:24.069 that you optimize when you train the system, 0:48:24.069,0:48:25.950 and so, if you're interested in reading more 0:48:25.950,0:48:28.960 about this, the 2 papers-- 0:48:28.960,0:48:31.909 let me go to References-- 0:48:31.909,0:48:32.730 there's a good paper called 0:48:32.730,0:48:35.339 Fairness Through Awareness that describes 0:48:35.339,0:48:37.499 how to go about doing this, 0:48:37.499,0:48:39.579 so I recommend this person read that. 0:48:39.579,0:48:40.970 It's good. 0:48:40.970,0:48:43.400 Herald: Microphone 2, please. 0:48:43.400,0:48:45.400 Mic 2: Thanks again for your talk. 0:48:45.400,0:48:49.649 Umm, hello? 
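[Editorial sketch, not part of the talk: the "explicitly check" step in Dr. Helsby's answer to the question from the internet can be illustrated roughly as follows. This is a minimal sketch that assumes decisions are available as (group, outcome) pairs. The function names `positive_rate_by_group` and `parity_gap` and the sample data are hypothetical, invented for illustration, and the comparison of positive-outcome rates shown here is only one possible check, not necessarily the method of the paper she cites.]

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Return the share of positive outcomes for each group.

    `records` is an iterable of (group, outcome) pairs, where outcome is
    True for a positive decision (for example, "invite to interview").
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        if outcome:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in positive rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Made-up decisions, for illustration only.
decisions = [
    ("women", True), ("women", False), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", False),
]
rates = positive_rate_by_group(decisions)
gap = parity_gap(rates)
print(rates, gap)
```

[A large gap does not by itself prove that the system is biased, but it flags where a closer look is needed.]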
0:48:49.649,0:48:50.999 Okay. 0:48:50.999,0:48:52.960 Umm, I see of course a problem with 0:48:52.960,0:48:54.619 all the black boxes that you describe 0:48:54.619,0:48:57.069 with regards for the crime systems, 0:48:57.069,0:48:59.569 but when we look at the advertising systems 0:48:59.569,0:49:02.169 in many cases they are very networked. 0:49:02.169,0:49:04.160 There are many different systems collaborating 0:49:04.160,0:49:07.109 and exchanging data via open APIs: 0:49:07.109,0:49:08.720 RESTful APIs, and various 0:49:08.720,0:49:11.720 demand-side platforms[br]and audience-exchange platforms, 0:49:11.720,0:49:12.539 and everything. 0:49:12.539,0:49:15.420 So, can that help to at least 0:49:15.420,0:49:22.160 increase awareness on where targeting, personalization 0:49:22.160,0:49:23.679 might be happening? 0:49:23.679,0:49:26.190 I mean, I'm looking at systems like 0:49:26.190,0:49:29.539 BuiltWith, that surface what kind of 0:49:29.539,0:49:31.380 JavaScript libraries are used elsewhere. 0:49:31.380,0:49:32.999 So, is that something that could help 0:49:32.999,0:49:35.670 at least to give a better awareness 0:49:35.670,0:49:38.690 and listing all the points where 0:49:38.690,0:49:41.409 you might be targeted... 0:49:41.409,0:49:43.070 Dr. Helsby: So, like, with respect to 0:49:43.070,0:49:46.460 advertising, the fact that[br]there is behind the scenes 0:49:46.460,0:49:48.450 this like complicated auction process 0:49:48.450,0:49:50.650 that's occurring, just makes things 0:49:50.650,0:49:51.819 a lot more complicated. 0:49:51.819,0:49:54.170 So, for example, I said briefly 0:49:54.170,0:49:57.269 that they found that there's this[br]statistical difference 0:49:57.269,0:49:59.099 between how men and women are treated, 0:49:59.099,0:50:01.339 but it doesn't necessarily mean that 0:50:01.339,0:50:03.640 "Oh, the algorithm is definitely biased." 
0:50:03.640,0:50:06.369 It could be because of this auction process, 0:50:06.369,0:50:10.569 it could be that women are considered 0:50:10.569,0:50:12.630 more valuable when it comes to advertising, 0:50:12.630,0:50:15.099 and so these executive ads are getting 0:50:15.099,0:50:17.160 outbid by some other ads, 0:50:17.160,0:50:18.890 and so there's a lot of potential 0:50:18.890,0:50:20.490 causes for that. 0:50:20.490,0:50:22.829 So, I think it just makes things[br]a lot more complicated. 0:50:22.829,0:50:25.910 I don't know if it helps[br]with the bias at all. 0:50:25.910,0:50:27.410 Mic 2: Well, the question was more 0:50:27.410,0:50:30.299 a direction... can it help to surface 0:50:30.299,0:50:32.499 and make people aware of that fact? 0:50:32.499,0:50:34.930 I mean, I can talk to my kids probably, 0:50:34.930,0:50:36.259 and they will probably understand, 0:50:36.259,0:50:38.420 but I can't explain that to my grandma, 0:50:38.420,0:50:43.150 who's also, umm, looking at an iPad. 0:50:43.150,0:50:44.289 Dr. Helsby: So, the fact that 0:50:44.289,0:50:45.690 the systems are... 0:50:45.690,0:50:48.509 I don't know if I understand. 0:50:48.509,0:50:50.529 Mic 2: OK. I think that the main problem 0:50:50.529,0:50:53.710 is that we are behind the industry efforts 0:50:53.710,0:50:57.179 to being targeted at, and many people 0:50:57.179,0:51:00.579 do know, but a lot more people don't know, 0:51:00.579,0:51:03.160 and making them aware of the fact 0:51:03.160,0:51:07.269 that they are a target, in a way, 0:51:07.269,0:51:10.990 is something that can only be shown 0:51:10.990,0:51:14.779 by a 3rd party that disposed that data, 0:51:14.779,0:51:16.339 and make audits in a way-- 0:51:16.339,0:51:17.929 maybe in an automated way. 0:51:17.929,0:51:19.170 Dr. Helsby: Right. 0:51:19.170,0:51:21.410 Yeah, I think it certainly[br]could help with advocacy 0:51:21.410,0:51:23.059 if that's the point, yeah. 
0:51:23.059,0:51:26.079 Herald: Another question[br]from the internet, please. 0:51:26.079,0:51:29.319 Signal Angel: Yes, on IRC they are asking 0:51:29.319,0:51:31.440 if we know that prediction in some cases 0:51:31.440,0:51:34.460 provides an influence that cannot be controlled. 0:51:34.460,0:51:38.480 So, r4v5 would like to know from you 0:51:38.480,0:51:41.519 if there are some cases or areas where 0:51:41.519,0:51:45.060 machine learning simply shouldn't go? 0:51:45.060,0:51:48.349 Dr. Helsby: Umm, so I think... 0:51:48.349,0:51:52.559 I mean, yes, I think that it is the case 0:51:52.559,0:51:54.650 that in some cases machine learning 0:51:54.650,0:51:56.180 might not be appropriate. 0:51:56.180,0:51:58.359 For example, if you use machine learning 0:51:58.359,0:52:00.970 to decide who should be searched. 0:52:00.970,0:52:02.619 I don't think it should be the case that 0:52:02.619,0:52:03.809 machine learning algorithms should 0:52:03.809,0:52:05.440 ever be used to determine 0:52:05.440,0:52:08.430 probable cause, or something like that. 0:52:08.430,0:52:12.339 So, if it's just one piece of evidence 0:52:12.339,0:52:13.299 that you consider, 0:52:13.299,0:52:14.990 and there's human oversight always, 0:52:14.990,0:52:18.519 maybe it's fine, but 0:52:18.519,0:52:20.839 we should be very suspicious and hesitant 0:52:20.839,0:52:22.119 in certain contexts where 0:52:22.119,0:52:24.529 the ramifications are very serious. 0:52:24.529,0:52:27.259 Like the No Fly List, and so on. 0:52:27.259,0:52:29.200 Herald: And #2 again. 0:52:29.200,0:52:30.809 Mic 2: A second question 0:52:30.809,0:52:33.509 that just occurred to me, if you don't mind. 
0:52:33.509,0:52:35.339 Umm, until the advent of 0:52:35.339,0:52:36.559 algorithmic systems, 0:52:36.559,0:52:40.470 when there've been cases of serious harm 0:52:40.470,0:52:42.799 that's been resulted in individuals or groups, 0:52:42.799,0:52:44.579 and it's been demonstrated that 0:52:44.579,0:52:46.029 it's occurred because of 0:52:46.029,0:52:49.400 an individual or a system of people 0:52:49.400,0:52:53.019 being systematically biased, then often 0:52:53.019,0:52:55.130 one of the actions that's taken is 0:52:55.130,0:52:56.869 pressure's applied, and then 0:52:56.869,0:52:59.660 people are required to change, 0:52:59.660,0:53:01.049 and hopefully be held responsible, 0:53:01.049,0:53:02.910 and then change the way that they do things 0:53:02.910,0:53:06.400 to try to remove bias from that system. 0:53:06.400,0:53:07.839 What's the current thinking about 0:53:07.839,0:53:10.299 how we can go about doing that 0:53:10.299,0:53:12.599 when the systems that are doing that 0:53:12.599,0:53:13.650 are algorithmic? 0:53:13.650,0:53:15.999 Is it just going to be human oversight, 0:53:15.999,0:53:16.910 and humans are gonna have to be 0:53:16.910,0:53:18.379 held responsible for the oversight? 0:53:18.379,0:53:20.890 Dr. Helsby: So, in terms of bias, 0:53:20.890,0:53:22.569 if we're concerned about bias towards 0:53:22.569,0:53:24.019 particular types of people, 0:53:24.019,0:53:25.710 that's something that we can optimize for. 0:53:25.710,0:53:28.839 So, we can train systems that are unbiased 0:53:28.839,0:53:30.019 in this way. 0:53:30.019,0:53:32.109 So that's one way to deal with it. 0:53:32.109,0:53:34.039 But there's always gonna be errors, 0:53:34.039,0:53:35.420 so that's sort of a separate issue 0:53:35.420,0:53:37.509 from the bias, and in the case 0:53:37.509,0:53:39.180 where there are errors, 0:53:39.180,0:53:40.539 there must be oversight. 
0:53:40.539,0:53:45.079 So, one way that one could improve 0:53:45.079,0:53:46.410 the way that this is done 0:53:46.410,0:53:48.160 is by making sure that you're 0:53:48.160,0:53:50.799 keeping track of confidence of decisions. 0:53:50.799,0:53:54.039 So, if you have a low confidence prediction, 0:53:54.039,0:53:56.259 then maybe a human[br]should come in and check things. 0:53:56.259,0:53:58.809 So, that might be one way to proceed. 0:54:02.099,0:54:03.990 Herald: So, there's no more question. 0:54:03.990,0:54:06.199 I close this talk now, 0:54:06.199,0:54:08.239 and thank you very much 0:54:08.239,0:54:09.410 and a big applause to 0:54:09.410,0:54:11.780 Jennifer Helsby! 0:54:11.780,0:54:16.310 roaring applause 0:54:16.310,0:54:28.000 subtitles created by c3subtitles.de[br]Join, and help us!
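[Editorial sketch, not part of the talk: Dr. Helsby's final suggestion, keeping track of the confidence of decisions and bringing in a human when a prediction has low confidence, might look roughly like this. The sketch assumes a binary classifier that outputs a probability for the positive class. The function name `route_decision`, the 0.8 threshold, and the "human_review" and "automated" labels are hypothetical choices made for illustration.]

```python
def route_decision(probability, threshold=0.8):
    """Decide whether a binary prediction is confident enough to automate.

    `probability` is the model's estimated probability of the positive
    class. Confidence is taken as the probability of the more likely class.
    Returns a (route, label, confidence) tuple.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    confidence = max(probability, 1.0 - probability)
    label = probability >= 0.5
    if confidence < threshold:
        # Low confidence: hand the case to a person instead of acting on it.
        return ("human_review", label, confidence)
    return ("automated", label, confidence)


# Example: a borderline prediction goes to a human, a clear one does not.
print(route_decision(0.55))
print(route_decision(0.95))
```

[Where the threshold should sit depends on how serious the consequences of an error are, and in the high-stakes contexts discussed in the Q&A one might route every case to a human regardless of confidence.]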