1
00:00:00,000 --> 00:00:08,895
Music
2
00:00:08,895 --> 00:00:20,040
Herald: How many of you are using Facebook? Twitter?
Diaspora?
3
00:00:20,040 --> 00:00:27,630
concerned noise
And all of that data
you enter there
4
00:00:27,630 --> 00:00:34,240
gets to a server, gets into the hands of somebody
who's using it
5
00:00:34,240 --> 00:00:38,519
and the next talk
is especially about that,
6
00:00:38,519 --> 00:00:43,879
because there's also intelligent machines
and intelligent algorithms
7
00:00:43,879 --> 00:00:47,489
that try to make something
out of that data.
8
00:00:47,489 --> 00:00:50,920
So the post-doc researcher Jennifer Helsby
9
00:00:50,920 --> 00:00:55,839
of the University of Chicago,
who works at this
10
00:00:55,839 --> 00:00:59,370
intersection between policy and
technology,
11
00:00:59,370 --> 00:01:04,709
will now ask you the question:
To whom would we give that power?
12
00:01:04,709 --> 00:01:12,860
Dr. Helsby: Thanks.
applause
13
00:01:12,860 --> 00:01:17,090
Okay, so, today I'm gonna do a brief tour
of intelligent systems
14
00:01:17,090 --> 00:01:18,640
and how they're currently used
15
00:01:18,640 --> 00:01:21,760
and then we're gonna look at some examples
with respect
16
00:01:21,760 --> 00:01:23,710
to the properties that we might care about
17
00:01:23,710 --> 00:01:26,000
these systems having,
and I'll talk a little bit about
18
00:01:26,000 --> 00:01:27,940
some of the work that's been done in academia
19
00:01:27,940 --> 00:01:28,680
on these topics.
20
00:01:28,680 --> 00:01:31,780
And then we'll talk about some
promising paths forward.
21
00:01:31,780 --> 00:01:37,040
So, I wanna start with this:
Kranzberg's First Law of Technology
22
00:01:37,040 --> 00:01:40,420
So, it's not good or bad,
but it also isn't neutral.
23
00:01:40,420 --> 00:01:42,980
Technology shapes our world,
and it can act as
24
00:01:42,980 --> 00:01:46,140
a liberating force-- or an oppressive and
controlling force.
25
00:01:46,140 --> 00:01:49,730
So, in this talk, I'm gonna go
towards some of the aspects
26
00:01:49,730 --> 00:01:53,830
of intelligent systems that might be more
controlling in nature.
27
00:01:53,830 --> 00:01:56,060
So, as we all know,
28
00:01:56,060 --> 00:01:59,770
because of the rapidly decreasing cost
of storage and computation,
29
00:01:59,770 --> 00:02:02,170
along with the rise of new sensor technologies,
30
00:02:02,170 --> 00:02:05,510
data collection devices
are being pushed into every
31
00:02:05,510 --> 00:02:08,329
aspect of our lives: in our homes, our cars,
32
00:02:08,329 --> 00:02:10,469
in our pockets, on our wrists.
33
00:02:10,469 --> 00:02:13,280
And data collection systems act as intermediaries
34
00:02:13,280 --> 00:02:15,230
for a huge amount of human communication.
35
00:02:15,230 --> 00:02:17,900
And much of this data sits in government
36
00:02:17,900 --> 00:02:19,860
and corporate databases.
37
00:02:19,860 --> 00:02:23,090
So, in order to make use of this data,
38
00:02:23,090 --> 00:02:27,280
we need to be able to make some inferences.
39
00:02:27,280 --> 00:02:30,280
So, one way of approaching this is I can hire
40
00:02:30,280 --> 00:02:32,310
a lot of humans, and I can have these humans
41
00:02:32,310 --> 00:02:34,990
manually examine the data, and they can acquire
42
00:02:34,990 --> 00:02:36,900
expert knowledge of the domain, and then
43
00:02:36,900 --> 00:02:38,510
perhaps they can make some decisions
44
00:02:38,510 --> 00:02:40,830
or at least some recommendations
based on it.
45
00:02:40,830 --> 00:02:43,030
However, there's some problems with this.
46
00:02:43,030 --> 00:02:45,810
One is that it's slow, and thus expensive.
47
00:02:45,810 --> 00:02:48,060
It's also biased. We know that humans have
48
00:02:48,060 --> 00:02:50,700
all sorts of biases, both conscious and unconscious,
49
00:02:50,700 --> 00:02:53,390
and it would be nice to have a system
that did not have
50
00:02:53,390 --> 00:02:54,959
these inaccuracies.
51
00:02:54,959 --> 00:02:57,069
It's also not very transparent: I might
52
00:02:57,069 --> 00:02:58,910
not really know the factors that led to
53
00:02:58,910 --> 00:03:00,930
some decisions being made.
54
00:03:00,930 --> 00:03:03,360
Even humans themselves
often don't really understand
55
00:03:03,360 --> 00:03:05,360
why they came to a given decision, because
56
00:03:05,360 --> 00:03:08,130
decisions are often emotional in nature.
57
00:03:08,130 --> 00:03:11,530
And, thus, these human decision making systems
58
00:03:11,530 --> 00:03:13,170
are often difficult to audit.
59
00:03:13,170 --> 00:03:15,819
So, another way to proceed is maybe instead
60
00:03:15,819 --> 00:03:18,000
I study the system and the data carefully
61
00:03:18,000 --> 00:03:20,520
and I write down the best rules
for making a decision
62
00:03:20,520 --> 00:03:23,280
or, I can have a machine
dynamically figure out
63
00:03:23,280 --> 00:03:25,459
the best rules, as in machine learning.
64
00:03:25,459 --> 00:03:28,640
So, maybe this is a better approach.
65
00:03:28,640 --> 00:03:32,230
It's certainly fast, and thus cheap.
66
00:03:32,230 --> 00:03:34,290
And maybe I can construct
the system in such a way
67
00:03:34,290 --> 00:03:37,090
that it doesn't have the biases that are inherent
68
00:03:37,090 --> 00:03:39,209
in human decision making.
69
00:03:39,209 --> 00:03:41,560
And, since I've written these rules down,
70
00:03:41,560 --> 00:03:42,819
or a computer has learned these rules,
71
00:03:42,819 --> 00:03:45,140
then I can just show them to somebody, right?
72
00:03:45,140 --> 00:03:46,819
And then they can audit it.
73
00:03:46,819 --> 00:03:49,020
So, more and more decision making is being
74
00:03:49,020 --> 00:03:50,750
done in this way.
75
00:03:50,750 --> 00:03:53,170
And so, in this model, we take data
76
00:03:53,170 --> 00:03:55,709
we make an inference based on that data
77
00:03:55,709 --> 00:03:58,120
using these algorithms, and then
78
00:03:58,120 --> 00:03:59,420
we can take actions.
79
00:03:59,420 --> 00:04:01,860
And, when we take this more scientific approach
80
00:04:01,860 --> 00:04:04,200
to making decisions and optimizing for
81
00:04:04,200 --> 00:04:07,310
a desired outcome,
we can take an experimental approach
82
00:04:07,310 --> 00:04:10,080
so we can determine
which actions are most effective
83
00:04:10,080 --> 00:04:12,310
in achieving a desired outcome.
84
00:04:12,310 --> 00:04:14,010
Maybe there are some types of communication
85
00:04:14,010 --> 00:04:16,750
styles that are most effective
with certain people.
86
00:04:16,750 --> 00:04:19,510
I can perhaps deploy some individualized incentives
87
00:04:19,510 --> 00:04:22,060
to get the outcome that I desire.
88
00:04:22,060 --> 00:04:25,990
And, maybe even if I carefully design an experiment
89
00:04:25,990 --> 00:04:27,810
with the environment in which people make
90
00:04:27,810 --> 00:04:30,699
these decisions, perhaps even very small changes
91
00:04:30,699 --> 00:04:34,250
can introduce significant changes
in peoples' behavior.
92
00:04:34,250 --> 00:04:37,320
So, through these mechanisms,
and this experimental approach,
93
00:04:37,320 --> 00:04:39,840
I can maximize the probability
that humans do
94
00:04:39,840 --> 00:04:42,020
what I want.
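To make that data, inference, action loop concrete, here is a small illustrative Python sketch; the population, response probabilities, and message variants are all invented, and it only shows the structure: randomize people across message variants, observe outcomes, and keep whichever variant best produces the desired outcome.

import random

random.seed(0)

# Invented population: each person has a hidden probability of responding
# to message variant "a" or "b" (stand-ins for real collected data).
population = [{"p_a": random.random(), "p_b": random.random()} for _ in range(1000)]

def act(person, variant):
    # The "action": deliver a message variant and observe whether the
    # desired outcome (a click, a vote, a purchase) occurred.
    return random.random() < person["p_" + variant]

# The experimental approach: randomize people across variants, record the
# outcomes, and keep whichever variant produced the desired outcome most often.
results = {"a": [], "b": []}
for person in population:
    variant = random.choice(["a", "b"])
    results[variant].append(act(person, variant))

best = max(results, key=lambda v: sum(results[v]) / len(results[v]))
print("most effective variant:", best)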
95
00:04:42,020 --> 00:04:45,380
So, algorithmic decision making is being used
96
00:04:45,380 --> 00:04:47,270
in industry, and is used
in lots of other areas,
97
00:04:47,270 --> 00:04:49,530
from astrophysics to medicine, and is now
98
00:04:49,530 --> 00:04:52,199
moving into new domains, including
99
00:04:52,199 --> 00:04:53,990
government applications.
100
00:04:53,990 --> 00:04:58,560
So, we have recommendation engines like
Netflix, Yelp, SoundCloud,
101
00:04:58,560 --> 00:05:00,699
that direct our attention to what we should
102
00:05:00,699 --> 00:05:03,510
watch and listen to.
103
00:05:03,510 --> 00:05:07,919
Since 2009, Google has used
personalized search results,
104
00:05:07,919 --> 00:05:12,840
even if you're not logged in
to your Google account.
105
00:05:12,840 --> 00:05:15,389
And we also have algorithmic curation and filtering,
106
00:05:15,389 --> 00:05:17,530
as in the case of Facebook News Feed,
107
00:05:17,530 --> 00:05:19,870
Google News, Yahoo News,
108
00:05:19,870 --> 00:05:22,840
which shows you what news articles, for example,
109
00:05:22,840 --> 00:05:24,330
you should be looking at.
110
00:05:24,330 --> 00:05:25,650
And this is important, because a lot of people
111
00:05:25,650 --> 00:05:29,410
get news from these media.
112
00:05:29,410 --> 00:05:31,520
We even have algorithmic journalists!
113
00:05:31,520 --> 00:05:35,240
So, automatic systems generate articles
114
00:05:35,240 --> 00:05:36,880
about weather, traffic, or sports
115
00:05:36,880 --> 00:05:38,729
instead of a human.
116
00:05:38,729 --> 00:05:41,949
And, another application that's more recent
117
00:05:41,949 --> 00:05:43,570
is the use of predictive systems
118
00:05:43,570 --> 00:05:45,180
in political campaigns.
119
00:05:45,180 --> 00:05:47,370
So, political campaigns also now take this
120
00:05:47,370 --> 00:05:50,340
approach to predict on an individual basis
121
00:05:50,340 --> 00:05:53,300
which candidate voters
are likely to vote for.
122
00:05:53,300 --> 00:05:55,500
And then they can target,
on an individual basis,
123
00:05:55,500 --> 00:05:58,199
those that can be persuaded otherwise.
124
00:05:58,199 --> 00:06:00,830
And, finally, in the public sector,
125
00:06:00,830 --> 00:06:02,710
we're starting to use predictive systems
126
00:06:02,710 --> 00:06:06,320
in areas from policing, to health,
to education and energy.
127
00:06:06,320 --> 00:06:08,979
So, there are some advantages to this.
128
00:06:08,979 --> 00:06:12,790
So, one thing is that we can automate
129
00:06:12,790 --> 00:06:15,759
aspects of our lives
that we consider to be mundane
130
00:06:15,759 --> 00:06:17,620
using systems that are intelligent
131
00:06:17,620 --> 00:06:19,580
and adaptive enough.
132
00:06:19,580 --> 00:06:21,680
We can make use of all the data
133
00:06:21,680 --> 00:06:23,990
and really get the pieces of information we
134
00:06:23,990 --> 00:06:25,830
really care about.
135
00:06:25,830 --> 00:06:29,650
We can spend money in the most effective way,
136
00:06:29,650 --> 00:06:32,110
and we can do this with this experimental
137
00:06:32,110 --> 00:06:34,210
approach to optimize actions to produce
138
00:06:34,210 --> 00:06:35,190
desired outcomes.
139
00:06:35,190 --> 00:06:37,300
So, we can embed intelligence
140
00:06:37,300 --> 00:06:39,520
into all of these mundane objects
141
00:06:39,520 --> 00:06:41,180
and enable them to make decisions for us,
142
00:06:41,180 --> 00:06:42,860
and so that's what we're doing more and more,
143
00:06:42,860 --> 00:06:45,210
and we can have an object
that decides for us
144
00:06:45,210 --> 00:06:46,840
what temperature we should set our house to,
145
00:06:46,840 --> 00:06:49,009
what we should be doing, etc.
146
00:06:49,009 --> 00:06:52,400
So, there might be some implications here.
147
00:06:52,400 --> 00:06:55,680
We want these systems
that do work on this data
148
00:06:55,680 --> 00:06:58,039
to increase the opportunities
available to us.
149
00:06:58,039 --> 00:07:00,259
But it might be that there are some implications
150
00:07:00,259 --> 00:07:01,780
that we have not carefully thought through.
151
00:07:01,780 --> 00:07:03,430
This is a new area, and people are only
152
00:07:03,430 --> 00:07:05,940
starting to scratch the surface of what the
153
00:07:05,940 --> 00:07:07,289
problems might be.
154
00:07:07,289 --> 00:07:09,600
In some cases, they might narrow the options
155
00:07:09,600 --> 00:07:10,990
available to people,
156
00:07:10,990 --> 00:07:13,199
and this approach subjects people to
157
00:07:13,199 --> 00:07:15,620
suggestive messaging intended to nudge them
158
00:07:15,620 --> 00:07:17,169
to a desired outcome.
159
00:07:17,169 --> 00:07:19,320
Some people may have a problem with that.
160
00:07:19,320 --> 00:07:20,650
Values we care about are not gonna be
161
00:07:20,650 --> 00:07:23,860
baked into these systems by default.
162
00:07:23,860 --> 00:07:25,960
It's also the case that some algorithmic systems
163
00:07:25,960 --> 00:07:28,300
facilitate work that we do not like.
164
00:07:28,300 --> 00:07:30,199
For example, in the case of mass surveillance.
165
00:07:30,199 --> 00:07:32,130
And even the same systems,
166
00:07:32,130 --> 00:07:34,039
used by different people or organizations,
167
00:07:34,039 --> 00:07:36,110
have very different consequences.
168
00:07:36,110 --> 00:07:37,320
For example, if I can predict
169
00:07:37,320 --> 00:07:40,020
with high accuracy, based on say search queries,
170
00:07:40,020 --> 00:07:42,050
who's gonna be admitted to a hospital,
171
00:07:42,050 --> 00:07:43,750
some people would be interested
in knowing that.
172
00:07:43,750 --> 00:07:46,120
You might be interested
in having your doctor know that.
173
00:07:46,120 --> 00:07:47,919
But that same predictive model
in the hands of
174
00:07:47,919 --> 00:07:50,569
an insurance company
has a very different implication.
175
00:07:50,569 --> 00:07:53,389
So, the point here is that these systems
176
00:07:53,389 --> 00:07:55,860
structure and influence how humans interact
177
00:07:55,860 --> 00:07:58,360
with each other, how they interact with society,
178
00:07:58,360 --> 00:07:59,850
and how they interact with government.
179
00:07:59,850 --> 00:08:03,080
And if they constrain what people can do,
180
00:08:03,080 --> 00:08:05,069
we should really care about this.
181
00:08:05,069 --> 00:08:08,270
So now I'm gonna go to
sort of an extreme case,
182
00:08:08,270 --> 00:08:11,930
just as an example, and that's this
Chinese Social Credit System.
183
00:08:11,930 --> 00:08:14,169
And so this is probably one of the more
184
00:08:14,169 --> 00:08:17,259
ambitious uses of data,
185
00:08:17,259 --> 00:08:18,880
that is used to rank each citizen
186
00:08:18,880 --> 00:08:21,190
based on their behavior, in China.
187
00:08:21,190 --> 00:08:24,210
So right now, there are various pilot systems
188
00:08:24,210 --> 00:08:27,660
deployed by various companies doing this in
China.
189
00:08:27,660 --> 00:08:30,729
They're currently voluntary, and by 2020
190
00:08:30,729 --> 00:08:32,630
one of these systems is gonna be decided on,
191
00:08:32,630 --> 00:08:34,679
or a combination of the systems,
192
00:08:34,679 --> 00:08:37,409
and it's gonna be mandatory for everyone.
193
00:08:37,409 --> 00:08:40,950
And so, in this system, there are some citizens,
194
00:08:40,950 --> 00:08:44,380
and a huge range of data sources are used.
195
00:08:44,380 --> 00:08:46,820
So, some of the data sources are
196
00:08:46,820 --> 00:08:48,360
your financial data,
197
00:08:48,360 --> 00:08:50,020
your criminal history,
198
00:08:50,020 --> 00:08:52,320
how many points you have
on your driver's license,
199
00:08:52,320 --> 00:08:55,360
medical information-- for example,
if you take birth control pills,
200
00:08:55,360 --> 00:08:56,810
that's incorporated.
201
00:08:56,810 --> 00:08:59,830
Your purchase history-- for example,
if you purchase games,
202
00:08:59,830 --> 00:09:02,430
you are down-ranked in the system.
203
00:09:02,430 --> 00:09:04,490
Some of the systems, not all of them,
204
00:09:04,490 --> 00:09:07,260
incorporate social media monitoring,
205
00:09:07,260 --> 00:09:09,200
which makes sense: if you're a state like China,
206
00:09:09,200 --> 00:09:11,270
you probably want to know about
207
00:09:11,270 --> 00:09:14,899
political statements that people
are making on social media.
208
00:09:14,899 --> 00:09:18,020
And, one of the more interesting parts is
209
00:09:18,020 --> 00:09:22,160
social network analysis:
looking at the relationships between people.
210
00:09:22,160 --> 00:09:24,270
So, if you have a close relationship with
somebody
211
00:09:24,270 --> 00:09:26,180
and they have a low credit score,
212
00:09:26,180 --> 00:09:29,130
that can have implications on your credit
score.
213
00:09:29,130 --> 00:09:34,440
So, the way that these scores
are generated is secret.
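Since the real scoring is secret, the following Python toy is purely hypothetical, with invented features and weights; it only makes concrete how financial, criminal, purchase, and social-network data could be folded into a single number, including the contact effect described above.

def citizen_score(person, contacts):
    # Purely hypothetical weights; the actual scoring rules are not public.
    score = 600
    score += 2 * person["on_time_payments"]     # financial data
    score -= 50 * person["criminal_records"]    # criminal history
    score -= 5 * person["license_points"]       # driving record
    score -= 10 * person["games_purchased"]     # purchase history (down-ranked, as above)
    # Social network analysis: close contacts with low scores drag yours down.
    score -= 20 * sum(1 for c in contacts if c["score"] < 500)
    return score

me = {"on_time_payments": 24, "criminal_records": 0,
      "license_points": 2, "games_purchased": 3}
print(citizen_score(me, [{"score": 480}, {"score": 650}]))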
214
00:09:34,440 --> 00:09:38,140
And, according to the call for these systems
215
00:09:38,140 --> 00:09:39,270
put out by the government,
216
00:09:39,270 --> 00:09:42,810
the goal is to
"carry forward the sincerity and
217
00:09:42,810 --> 00:09:45,760
traditional virtues" and
establish the idea of a
218
00:09:45,760 --> 00:09:47,520
"sincerity culture."
219
00:09:47,520 --> 00:09:49,440
But wait, it gets better:
220
00:09:49,440 --> 00:09:52,450
so, there's a portal that enables citizens
221
00:09:52,450 --> 00:09:55,040
to look up the citizen score of anyone.
222
00:09:55,040 --> 00:09:56,520
And many people like this system,
223
00:09:56,520 --> 00:09:58,320
they think it's a fun game.
224
00:09:58,320 --> 00:10:00,700
They boast about it on social media,
225
00:10:00,700 --> 00:10:03,610
they put their score in their dating profile,
226
00:10:03,610 --> 00:10:04,760
because if you're ranked highly you're
227
00:10:04,760 --> 00:10:06,589
part of an exclusive club.
228
00:10:06,589 --> 00:10:10,060
You can get VIP treatment
at hotels and other companies.
229
00:10:10,060 --> 00:10:11,880
But the downside is that, if you're excluded
230
00:10:11,880 --> 00:10:15,540
from that club, your weak score
may have other implications,
231
00:10:15,540 --> 00:10:20,120
like being unable to get access
to credit, housing, jobs.
232
00:10:20,120 --> 00:10:23,399
There is some reporting that even travel visas
233
00:10:23,399 --> 00:10:27,000
might be restricted
if your score is particularly low.
234
00:10:27,000 --> 00:10:31,160
So, a system like this, for a state, is really
235
00:10:31,160 --> 00:10:34,690
the optimal solution
to the problem of the public.
236
00:10:34,690 --> 00:10:37,130
It constitutes a very subtle and insidious
237
00:10:37,130 --> 00:10:39,350
mechanism of social control.
238
00:10:39,350 --> 00:10:41,209
You don't need to spend a lot of money on
239
00:10:41,209 --> 00:10:43,800
police or prisons if you can set up a system
240
00:10:43,800 --> 00:10:45,820
where people discourage one another from
241
00:10:45,820 --> 00:10:48,930
anti-social acts like political action
in exchange for
242
00:10:48,930 --> 00:10:51,430
a coupon for a free Uber ride.
243
00:10:51,430 --> 00:10:55,269
So, there are a lot of
legitimate questions here:
244
00:10:55,269 --> 00:10:58,370
What protections does
user data have in this scheme?
245
00:10:58,370 --> 00:11:01,279
Do any safeguards exist to prevent tampering?
246
00:11:01,279 --> 00:11:04,310
What mechanism, if any, is there to prevent
247
00:11:04,310 --> 00:11:08,810
false input data from creating erroneous inferences?
248
00:11:08,810 --> 00:11:10,420
Is there any way that people can fix
249
00:11:10,420 --> 00:11:12,540
their score once they're ranked poorly?
250
00:11:12,540 --> 00:11:13,899
Or does it end up becoming a
251
00:11:13,899 --> 00:11:15,720
self-fulfilling prophecy?
252
00:11:15,720 --> 00:11:17,850
Your weak score means you have less access
253
00:11:17,850 --> 00:11:21,620
to jobs and credit, and now you will have
254
00:11:21,620 --> 00:11:24,709
limited access to opportunity.
255
00:11:24,709 --> 00:11:27,110
So, let's take a step back.
256
00:11:27,110 --> 00:11:28,470
So, what do we want?
257
00:11:28,470 --> 00:11:31,540
So, we probably don't want that,
258
00:11:31,540 --> 00:11:33,570
but as advocates we really wanna
259
00:11:33,570 --> 00:11:36,130
understand what questions we should be asking
260
00:11:36,130 --> 00:11:37,510
of these systems. Right now there's
261
00:11:37,510 --> 00:11:39,570
very little oversight,
262
00:11:39,570 --> 00:11:41,420
and we wanna make sure that we don't
263
00:11:41,420 --> 00:11:44,029
sort of sleepwalk our way to a situation
264
00:11:44,029 --> 00:11:46,649
where we've lost even more power
265
00:11:46,649 --> 00:11:49,740
to these centralized systems of control.
266
00:11:49,740 --> 00:11:52,209
And if you're an implementer, we wanna understand
267
00:11:52,209 --> 00:11:53,709
what can we be doing better.
268
00:11:53,709 --> 00:11:56,019
Are there better ways that we can be implementing
269
00:11:56,019 --> 00:11:57,640
these systems?
270
00:11:57,640 --> 00:11:59,430
Are there values that, as humans,
271
00:11:59,430 --> 00:12:01,060
we care about that we should make sure
272
00:12:01,060 --> 00:12:02,420
these systems have?
273
00:12:02,420 --> 00:12:05,550
So, the first thing
that most people in the room
274
00:12:05,550 --> 00:12:07,820
might think about is privacy.
275
00:12:07,820 --> 00:12:10,510
Which is, of course, of the utmost importance.
276
00:12:10,510 --> 00:12:12,920
We need privacy, and there is a good discussion
277
00:12:12,920 --> 00:12:15,680
on the importance of protecting
user data where possible.
278
00:12:15,680 --> 00:12:18,420
So, in this talk, I'm gonna focus
on the other aspects of
279
00:12:18,420 --> 00:12:19,470
algorithmic decision making,
280
00:12:19,470 --> 00:12:21,190
that I think have got less attention.
281
00:12:21,190 --> 00:12:25,140
Because it's not just privacy
that we need to worry about here.
282
00:12:25,140 --> 00:12:28,519
We also want systems that are fair and equitable.
283
00:12:28,519 --> 00:12:30,240
We want transparent systems,
284
00:12:30,240 --> 00:12:35,110
we don't want opaque decisions
to be made about us,
285
00:12:35,110 --> 00:12:36,510
decisions that might have serious impacts
286
00:12:36,510 --> 00:12:37,779
on our lives.
287
00:12:37,779 --> 00:12:40,490
And we need some accountability mechanisms.
288
00:12:40,490 --> 00:12:41,890
So, for the rest of this talk
289
00:12:41,890 --> 00:12:43,230
we're gonna go through each one of these things
290
00:12:43,230 --> 00:12:45,230
and look at some examples.
291
00:12:45,230 --> 00:12:47,709
So, the first thing is fairness.
292
00:12:47,709 --> 00:12:50,450
And so, as I said in the beginning,
this is one area
293
00:12:50,450 --> 00:12:52,690
where there might be an advantage
294
00:12:52,690 --> 00:12:55,079
to making decisions by machine,
295
00:12:55,079 --> 00:12:56,740
especially in areas where there have
296
00:12:56,740 --> 00:12:59,410
historically been fairness issues with
297
00:12:59,410 --> 00:13:02,350
decision making, such as law enforcement.
298
00:13:02,350 --> 00:13:05,839
So, this is one way that police departments
299
00:13:05,839 --> 00:13:08,360
use predictive models.
300
00:13:08,360 --> 00:13:10,540
The idea here is police would like to
301
00:13:10,540 --> 00:13:13,450
allocate resources in a more effective way,
302
00:13:13,450 --> 00:13:15,050
and they would also like to enable
303
00:13:15,050 --> 00:13:16,640
proactive policing.
304
00:13:16,640 --> 00:13:20,110
So, if you can predict where crimes
are going to occur,
305
00:13:20,110 --> 00:13:22,149
or who is going to commit crimes,
306
00:13:22,149 --> 00:13:24,870
then you can put cops in those places,
307
00:13:24,870 --> 00:13:27,769
or perhaps following these people,
308
00:13:27,769 --> 00:13:29,300
and then the crimes will not occur.
309
00:13:29,300 --> 00:13:31,370
So, it's sort of the pre-crime approach.
310
00:13:31,370 --> 00:13:34,649
So, there are a few ways of going about this.
311
00:13:34,649 --> 00:13:37,920
One way is doing this individual-level prediction.
312
00:13:37,920 --> 00:13:41,089
So you take each citizen
and estimate the risk
313
00:13:41,089 --> 00:13:43,769
that each citizen will participate,
say, in violence
314
00:13:43,769 --> 00:13:45,279
based on some data.
315
00:13:45,279 --> 00:13:46,779
And then you can flag those people that are
316
00:13:46,779 --> 00:13:49,199
considered particularly violent.
317
00:13:49,199 --> 00:13:51,519
So, this is currently done.
318
00:13:51,519 --> 00:13:52,589
This is done in the U.S.
319
00:13:52,589 --> 00:13:56,120
It's done in Chicago,
by the Chicago Police Department.
320
00:13:56,120 --> 00:13:58,350
And they maintain a heat list of individuals
321
00:13:58,350 --> 00:14:00,790
that are considered most likely to commit,
322
00:14:00,790 --> 00:14:03,529
or be the victim of, violence.
323
00:14:03,529 --> 00:14:06,700
And this is done using data
that the police maintain.
324
00:14:06,700 --> 00:14:09,589
So, the features that are used
in this predictive model
325
00:14:09,589 --> 00:14:12,209
include things that are derived from
326
00:14:12,209 --> 00:14:14,610
individuals' criminal history.
327
00:14:14,610 --> 00:14:16,810
So, for example, have they been involved in
328
00:14:16,810 --> 00:14:18,350
gun violence in the past?
329
00:14:18,350 --> 00:14:21,450
Do they have narcotics arrests? And so on.
330
00:14:21,450 --> 00:14:22,860
But another thing that's incorporated
331
00:14:22,860 --> 00:14:25,060
in the Chicago Police Department model is
332
00:14:25,060 --> 00:14:28,300
information derived from
social media network analysis.
333
00:14:28,300 --> 00:14:30,630
So, who you interact with,
334
00:14:30,630 --> 00:14:32,279
as noted in police data.
335
00:14:32,279 --> 00:14:34,899
So, for example, your co-arrestees.
336
00:14:34,899 --> 00:14:36,440
When officers conduct field interviews,
337
00:14:36,440 --> 00:14:38,240
who are people interacting with?
338
00:14:38,240 --> 00:14:42,940
And then this is all incorporated
into this risk score.
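As a purely hypothetical Python sketch (the actual Chicago model and its weights are not public; everything here is invented), an individual-level risk score of this shape might combine criminal-history features with the scores of co-arrestees:

def risk_score(person, co_arrestees):
    # Invented features and weights, only to show the structure of such a score.
    score = 0.0
    if person["prior_gun_violence"]:            # criminal history
        score += 3.0
    score += 1.0 * person["narcotics_arrests"]
    # Social network analysis: the risk of people you were arrested with, or
    # were seen with in field interviews, feeds back into your own score.
    if co_arrestees:
        score += 0.5 * sum(p["risk"] for p in co_arrestees) / len(co_arrestees)
    return score

person = {"prior_gun_violence": True, "narcotics_arrests": 2}
print(risk_score(person, [{"risk": 4.0}, {"risk": 1.0}]))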
339
00:14:42,940 --> 00:14:44,639
So another way to proceed,
340
00:14:44,639 --> 00:14:47,070
which is the method that most companies
341
00:14:47,070 --> 00:14:49,579
that sell products like this
to the police have taken,
342
00:14:49,579 --> 00:14:51,459
is instead predicting which areas
343
00:14:51,459 --> 00:14:53,810
are likely to have crimes committed in them.
344
00:14:53,810 --> 00:14:56,690
So, take my city, I put a grid down,
345
00:14:56,690 --> 00:14:58,180
and then I use crime statistics
346
00:14:58,180 --> 00:15:00,430
and maybe some ancillary data sources,
347
00:15:00,430 --> 00:15:01,790
to determine which areas have
348
00:15:01,790 --> 00:15:04,709
the highest risk of crimes occurring in them,
349
00:15:04,709 --> 00:15:06,329
and I can flag those areas and send
350
00:15:06,329 --> 00:15:08,470
police officers to them.
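A toy Python sketch of that geographic approach, with invented coordinates and counts; the commercial products use more elaborate statistical models, but the overall structure is similar:

from collections import Counter

# Each recorded incident is assigned to a grid cell (grid_x, grid_y).
incidents = [(3, 7), (3, 7), (3, 8), (0, 1), (3, 7), (5, 5)]

# Rank cells by recorded crime and flag the top ones as hot spots
# where patrols will be sent.
counts = Counter(incidents)
hotspots = [cell for cell, _ in counts.most_common(2)]
print("send patrols to cells:", hotspots)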
351
00:15:08,470 --> 00:15:10,950
So now, let's look at some of the tools
352
00:15:10,950 --> 00:15:14,010
that are used for this geographic-level prediction.
353
00:15:14,010 --> 00:15:19,040
So, here are 3 companies that sell these
354
00:15:19,040 --> 00:15:22,910
geographic-level predictive policing systems.
355
00:15:22,910 --> 00:15:25,639
So, PredPol has a system that uses
356
00:15:25,639 --> 00:15:27,200
primarily crime statistics:
357
00:15:27,200 --> 00:15:30,209
only the time, place, and type of crime
358
00:15:30,209 --> 00:15:33,040
to predict where crimes will occur.
359
00:15:33,040 --> 00:15:35,970
HunchLab uses a wider range of data sources
360
00:15:35,970 --> 00:15:37,260
including, for example, weather
361
00:15:37,260 --> 00:15:39,720
and then Hitachi is a newer system
362
00:15:39,720 --> 00:15:42,100
that has a predictive crime analytics tool
363
00:15:42,100 --> 00:15:44,779
that also incorporates social media.
364
00:15:44,779 --> 00:15:47,850
The first one, to my knowledge, to do so.
365
00:15:47,850 --> 00:15:49,399
And these systems are in use
366
00:15:49,399 --> 00:15:52,820
in 50+ cities in the U.S.
367
00:15:52,820 --> 00:15:56,540
So, why do police departments buy this?
368
00:15:56,540 --> 00:15:57,760
Some police departments are interested in
369
00:15:57,760 --> 00:16:00,500
buying systems like this, because they're marketed
370
00:16:00,500 --> 00:16:02,660
as impartial systems,
371
00:16:02,660 --> 00:16:06,199
so it's a way to police in an unbiased way.
372
00:16:06,199 --> 00:16:08,040
And so, these companies make
373
00:16:08,040 --> 00:16:08,670
statements like this--
374
00:16:08,670 --> 00:16:10,800
by the way, the references
will all be at the end,
375
00:16:10,800 --> 00:16:12,560
and they'll be on the slides--
376
00:16:12,560 --> 00:16:13,370
So, for example
377
00:16:13,370 --> 00:16:16,110
the predictive crime analytics from Hitachi
378
00:16:16,110 --> 00:16:17,610
claims that the system is anonymous,
379
00:16:17,610 --> 00:16:19,350
because it shows you an area,
380
00:16:19,350 --> 00:16:23,060
it doesn't tell you
to look for a particular person.
381
00:16:23,060 --> 00:16:25,699
and PredPol reassures people that
382
00:16:25,699 --> 00:16:29,560
it eliminates any civil liberties or profiling concerns.
383
00:16:29,560 --> 00:16:32,269
And HunchLab notes that the system
384
00:16:32,269 --> 00:16:35,170
fairly represents priorities for public safety
385
00:16:35,170 --> 00:16:38,769
and is unbiased by race
or ethnicity, for example.
386
00:16:38,769 --> 00:16:43,529
So, let's take a minute
to describe in more detail
387
00:16:43,529 --> 00:16:48,100
what we mean when we talk about fairness.
388
00:16:48,100 --> 00:16:51,300
So, when we talk about fairness,
389
00:16:51,300 --> 00:16:52,740
we mean a few things.
390
00:16:52,740 --> 00:16:56,070
So, one is fairness with respect to individuals:
391
00:16:56,070 --> 00:16:58,040
so if I'm very similar to somebody
392
00:16:58,040 --> 00:17:00,170
and we go through some process
393
00:17:00,170 --> 00:17:03,430
and there are two very different
outcomes to that process
394
00:17:03,430 --> 00:17:05,679
we would consider that to be unfair.
395
00:17:05,679 --> 00:17:07,929
So, we want similar people to be treated
396
00:17:07,929 --> 00:17:09,539
in a similar way.
397
00:17:09,539 --> 00:17:13,079
But, there are certain protected attributes
398
00:17:13,079 --> 00:17:15,199
that we wouldn't want someone
399
00:17:15,199 --> 00:17:17,099
to discriminate based on.
400
00:17:17,099 --> 00:17:20,069
And so, there's this other property,
Group Fairness.
401
00:17:20,069 --> 00:17:22,249
So, we can look at the statistical parity
402
00:17:22,249 --> 00:17:25,439
between groups, based on gender, race, etc.
403
00:17:25,439 --> 00:17:28,049
and see if they're treated in a similar way.
404
00:17:28,049 --> 00:17:30,409
And we might not expect that in some cases,
405
00:17:30,409 --> 00:17:32,429
for example if the base rates in each group
406
00:17:32,429 --> 00:17:34,659
are very different.
407
00:17:34,659 --> 00:17:36,889
And then there's also Fairness in Errors.
408
00:17:36,889 --> 00:17:40,080
All predictive systems are gonna make errors,
409
00:17:40,080 --> 00:17:42,989
and if the errors are concentrated,
410
00:17:42,989 --> 00:17:46,399
then that may also represent unfairness.
411
00:17:46,399 --> 00:17:50,149
And so this concern arose recently with Facebook
412
00:17:50,149 --> 00:17:52,289
because people with Native American names
413
00:17:52,289 --> 00:17:54,389
had their profiles flagged as fraudulent
414
00:17:54,389 --> 00:17:58,759
far more often than those
with White American names.
415
00:17:58,759 --> 00:18:00,559
So these are the sorts of things
that we worry about
416
00:18:00,559 --> 00:18:02,190
and each of these corresponds to a metric,
417
00:18:02,190 --> 00:18:04,239
and if you're interested in more, you should
418
00:18:04,239 --> 00:18:06,159
check those 2 papers out.
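To make those notions a little more concrete, here is a rough Python sketch on made-up decisions and labels of how group fairness and fairness in errors can be checked; individual fairness would additionally need a similarity measure between people, which is omitted here.

# Made-up data: the model's decision for each person, their group,
# and what actually happened (used for checking errors).
group   = ["a", "a", "b", "b", "b", "a"]
decided = [1,   0,   1,   0,   0,   1]
actual  = [1,   0,   0,   0,   1,   1]

def rate(values):
    return sum(values) / len(values)

# Group fairness / statistical parity: do groups get the positive decision
# at similar rates?
for g in sorted(set(group)):
    print(g, "positive rate:",
          rate([d for d, gg in zip(decided, group) if gg == g]))

# Fairness in errors: are the mistakes concentrated in one group?
for g in sorted(set(group)):
    print(g, "error rate:",
          rate([d != a for d, a, gg in zip(decided, actual, group) if gg == g]))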
419
00:18:06,159 --> 00:18:10,639
So, how can potential issues
with predictive policing
420
00:18:10,639 --> 00:18:13,850
have implications for these principles?
421
00:18:13,850 --> 00:18:18,559
So, one problem is
the training data that's used.
422
00:18:18,559 --> 00:18:21,059
Some of these systems only use crime statistics,
423
00:18:21,059 --> 00:18:23,600
other systems-- all of them use crime statistics
424
00:18:23,600 --> 00:18:25,619
in some way.
425
00:18:25,619 --> 00:18:31,419
So, one problem is that crime databases
426
00:18:31,419 --> 00:18:34,830
contain only crimes that've been detected.
427
00:18:34,830 --> 00:18:38,629
Right? So, the police are only gonna detect
428
00:18:38,629 --> 00:18:41,009
crimes that they know are happening,
429
00:18:41,009 --> 00:18:44,109
either through patrol and their own investigation
430
00:18:44,109 --> 00:18:46,320
or because they've been alerted to crime,
431
00:18:46,320 --> 00:18:48,789
for example by a citizen calling the police.
432
00:18:48,789 --> 00:18:52,179
So, a citizen has to feel like
they can call the police,
433
00:18:52,179 --> 00:18:54,019
like that's a good idea.
434
00:18:54,019 --> 00:18:58,789
So, some crimes suffer
from this problem less than others:
435
00:18:58,789 --> 00:19:02,249
for example, gun violence
is much easier to detect
436
00:19:02,249 --> 00:19:03,639
relative to fraud,
437
00:19:03,639 --> 00:19:07,509
which is very difficult to detect.
438
00:19:07,509 --> 00:19:11,940
Now the racial profiling aspect
of this might come in
439
00:19:11,940 --> 00:19:15,590
because of biased policing in the past.
440
00:19:15,590 --> 00:19:19,999
So, for example, for marijuana arrests,
441
00:19:19,999 --> 00:19:22,619
black people are arrested in the U.S. at rates
442
00:19:22,619 --> 00:19:25,119
4 times that of white people,
443
00:19:25,119 --> 00:19:27,960
even though usage rates in these two groups
444
00:19:27,960 --> 00:19:31,389
are the same to within a few percent.
445
00:19:31,389 --> 00:19:35,820
So, this is where problems can arise.
446
00:19:35,820 --> 00:19:37,159
So, let's go back to this
447
00:19:37,159 --> 00:19:38,749
geographic-level predictive policing.
448
00:19:38,749 --> 00:19:42,460
So the danger here is that, unless this system
449
00:19:42,460 --> 00:19:44,299
is very carefully constructed,
450
00:19:44,299 --> 00:19:47,090
this sort of crime area ranking might
451
00:19:47,090 --> 00:19:49,019
again become a self-fulfilling prophecy.
452
00:19:49,019 --> 00:19:51,460
If you send police officers to these areas,
453
00:19:51,460 --> 00:19:53,220
you further scrutinize them,
454
00:19:53,220 --> 00:19:55,659
and then again you're only detecting a subset
455
00:19:55,659 --> 00:19:57,979
of crimes, and the cycle continues.
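A tiny simulation, with invented numbers, of why that loop matters: two areas with identical true crime rates, where patrols go to whichever area has the most recorded crime and detect more of what happens there, so the recorded counts steadily diverge.

import random

random.seed(1)

true_rate = 10                        # true crimes per period, the same in both areas
recorded = {"north": 5, "south": 4}   # a small initial difference in the database

for period in range(20):
    patrolled = max(recorded, key=recorded.get)         # patrol the "hotter" area
    for area in recorded:
        detection = 0.9 if area == patrolled else 0.3   # patrols detect more crime
        recorded[area] += sum(random.random() < detection for _ in range(true_rate))

print(recorded)   # the initially "hotter" area now looks far worse in the data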
456
00:19:57,979 --> 00:20:02,139
So, one obvious issue is that
457
00:20:02,139 --> 00:20:07,599
this statement about geographic-based
crime prediction
458
00:20:07,599 --> 00:20:10,229
being anonymous is not true,
459
00:20:10,229 --> 00:20:13,159
because race and location are very strongly
460
00:20:13,159 --> 00:20:14,840
correlated in the U.S.
461
00:20:14,840 --> 00:20:16,609
And this is something that machine-learning
systems
462
00:20:16,609 --> 00:20:20,049
can potentially learn.
463
00:20:20,049 --> 00:20:23,039
Another issue is that, for example,
464
00:20:23,039 --> 00:20:25,580
for individual fairness, say my home
465
00:20:25,580 --> 00:20:27,599
sits within one of these boxes.
466
00:20:27,599 --> 00:20:29,950
Some of these boxes
in these systems are very small,
467
00:20:29,950 --> 00:20:33,399
for example PredPol is 500ft x 500ft,
468
00:20:33,399 --> 00:20:36,349
so it's maybe only a few houses.
469
00:20:36,349 --> 00:20:39,149
So, the implications of this system are that
470
00:20:39,149 --> 00:20:40,849
you have police officers maybe sitting
471
00:20:40,849 --> 00:20:42,979
in a police cruiser outside your home
472
00:20:42,979 --> 00:20:45,450
and a few doors down someone
473
00:20:45,450 --> 00:20:46,799
may not be within that box,
474
00:20:46,799 --> 00:20:48,159
and doesn't have this.
475
00:20:48,159 --> 00:20:51,399
So, that may represent unfairness.
476
00:20:51,399 --> 00:20:54,929
So, there are real questions here,
477
00:20:54,929 --> 00:20:57,720
especially because there's no opt-out.
478
00:20:57,720 --> 00:21:00,059
There's no way to opt-out of this system:
479
00:21:00,059 --> 00:21:02,239
if you live in a city that has this,
480
00:21:02,239 --> 00:21:04,909
then you have to deal with it.
481
00:21:04,909 --> 00:21:07,229
So, it's quite difficult to find out
482
00:21:07,229 --> 00:21:09,879
what's really going on
483
00:21:09,879 --> 00:21:11,169
because the algorithm is secret.
484
00:21:11,169 --> 00:21:13,049
And, in most cases, we don't know
485
00:21:13,049 --> 00:21:14,789
the full details of the inputs.
486
00:21:14,789 --> 00:21:16,679
We have some idea
about what features are used,
487
00:21:16,679 --> 00:21:17,970
but that's about it.
488
00:21:17,970 --> 00:21:19,509
We also don't know the output.
489
00:21:19,509 --> 00:21:21,899
That would be knowing police allocation,
490
00:21:21,899 --> 00:21:23,179
police strategies,
491
00:21:23,179 --> 00:21:26,299
and in order to nail down
what's really going on here
492
00:21:26,299 --> 00:21:28,609
and to verify the validity of
493
00:21:28,609 --> 00:21:30,009
these companies' claims,
494
00:21:30,009 --> 00:21:33,799
it may be necessary
to have a 3rd party come in,
495
00:21:33,799 --> 00:21:35,629
examine the inputs and outputs of the system,
496
00:21:35,629 --> 00:21:37,590
and say concretely what's going on.
497
00:21:37,590 --> 00:21:39,460
And if everything is fine and dandy
498
00:21:39,460 --> 00:21:40,929
then this shouldn't be a problem.
499
00:21:40,929 --> 00:21:43,619
So, that's potentially one role that
500
00:21:43,619 --> 00:21:44,769
advocates can play.
501
00:21:44,769 --> 00:21:46,720
Maybe we should start pushing for audits
502
00:21:46,720 --> 00:21:48,820
of systems that are used in this way.
503
00:21:48,820 --> 00:21:50,970
These could have serious implications
504
00:21:50,970 --> 00:21:52,679
for peoples' lives.
505
00:21:52,679 --> 00:21:55,249
So, we'll return
to this idea a little bit later,
506
00:21:55,249 --> 00:21:58,210
but for now this leads us
nicely to Transparency.
507
00:21:58,210 --> 00:21:59,419
So, we wanna know
508
00:21:59,419 --> 00:22:01,929
what these systems are doing.
509
00:22:01,929 --> 00:22:04,729
But it's very hard,
for the reasons described earlier,
510
00:22:04,729 --> 00:22:06,139
but even in the case of something like
511
00:22:06,139 --> 00:22:09,849
trying to understand Google's search algorithm,
512
00:22:09,849 --> 00:22:11,679
it's difficult because it's personalized.
513
00:22:11,679 --> 00:22:13,529
So, by construction, each user is
514
00:22:13,529 --> 00:22:15,320
only seeing one endpoint.
515
00:22:15,320 --> 00:22:18,169
So, it's a very isolating system.
516
00:22:18,169 --> 00:22:20,349
What do other people see?
517
00:22:20,349 --> 00:22:22,409
And one reason it's difficult to make
518
00:22:22,409 --> 00:22:24,099
some of these systems transparent
519
00:22:24,099 --> 00:22:26,679
is because of, simply, the complexity
520
00:22:26,679 --> 00:22:27,950
of the algorithms.
521
00:22:27,950 --> 00:22:30,309
So, an algorithm can become so complex that
522
00:22:30,309 --> 00:22:31,669
it's difficult to comprehend,
523
00:22:31,669 --> 00:22:33,289
even for the designer of the system,
524
00:22:33,289 --> 00:22:35,509
or the implementer of the system.
525
00:22:35,509 --> 00:22:38,419
The designer might know that this algorithm
526
00:22:38,419 --> 00:22:42,889
maximizes some metric-- say, accuracy,
527
00:22:42,889 --> 00:22:44,570
but they may not always have a solid
528
00:22:44,570 --> 00:22:46,779
understanding of what the algorithm is doing
529
00:22:46,779 --> 00:22:48,330
for all inputs.
530
00:22:48,330 --> 00:22:50,970
Certainly with respect to fairness.
531
00:22:50,970 --> 00:22:55,759
So, in some cases,
it might not be appropriate to use
532
00:22:55,759 --> 00:22:57,379
an extremely complex model.
533
00:22:57,379 --> 00:22:59,529
It might be better to use a simpler system
534
00:22:59,529 --> 00:23:02,910
with human-interpretable features.
535
00:23:02,910 --> 00:23:04,749
Another issue that arises
536
00:23:04,749 --> 00:23:07,559
from the opacity of these systems
537
00:23:07,559 --> 00:23:09,409
and the centralized control
538
00:23:09,409 --> 00:23:11,860
is that it makes them very influential.
539
00:23:11,860 --> 00:23:13,950
And thus, an excellent target
540
00:23:13,950 --> 00:23:16,210
for manipulation or tampering.
541
00:23:16,210 --> 00:23:18,479
So, this might be tampering that is done
542
00:23:18,479 --> 00:23:21,950
from an organization that controls the system,
543
00:23:21,950 --> 00:23:23,769
or an insider at one of the organizations,
544
00:23:23,769 --> 00:23:27,139
or anyone who's able to compromise their security.
545
00:23:27,139 --> 00:23:30,249
So, this is an interesting academic work
546
00:23:30,249 --> 00:23:32,099
that looked at the possibility of
547
00:23:32,099 --> 00:23:34,159
slightly modifying search rankings
548
00:23:34,159 --> 00:23:36,619
to shift people's political views.
549
00:23:36,619 --> 00:23:39,009
So, since people are most likely to
550
00:23:39,009 --> 00:23:41,330
click on the top search results,
551
00:23:41,330 --> 00:23:44,429
so 90% of clicks go to the
first page of search results,
552
00:23:44,429 --> 00:23:46,719
then perhaps by reshuffling
things a little bit,
553
00:23:46,719 --> 00:23:48,729
or maybe dropping some search results,
554
00:23:48,729 --> 00:23:50,269
you can influence people's views
555
00:23:50,269 --> 00:23:51,679
in a coherent way,
556
00:23:51,679 --> 00:23:53,090
and maybe you can make it so subtle
557
00:23:53,090 --> 00:23:55,749
that no one is able to notice.
558
00:23:55,749 --> 00:23:57,249
So in this academic study,
559
00:23:57,249 --> 00:24:00,349
they did an experiment
560
00:24:00,349 --> 00:24:02,070
in the 2014 Indian election.
561
00:24:02,070 --> 00:24:04,219
So they used real voters,
562
00:24:04,219 --> 00:24:06,450
and they kept the size
of the experiment small enough
563
00:24:06,450 --> 00:24:08,190
that it was not going to influence the outcome
564
00:24:08,190 --> 00:24:10,090
of the election.
565
00:24:10,090 --> 00:24:12,139
So the researchers took people,
566
00:24:12,139 --> 00:24:14,229
they determined their political leaning,
567
00:24:14,229 --> 00:24:17,429
and they segmented them into
control and treatment groups,
568
00:24:17,429 --> 00:24:19,269
where the treatment was manipulation
569
00:24:19,269 --> 00:24:21,210
of the search ranking results,
570
00:24:21,210 --> 00:24:24,409
And then they had these people
browse the web.
571
00:24:24,409 --> 00:24:25,969
And what they found, is that
572
00:24:25,969 --> 00:24:28,229
this mechanism is very effective at shifting
573
00:24:28,229 --> 00:24:30,429
people's voter preferences.
574
00:24:30,429 --> 00:24:33,649
So, in this study, they were able to introduce
575
00:24:33,649 --> 00:24:36,849
a 20% shift in voter preferences.
576
00:24:36,849 --> 00:24:39,299
Even alerting users to the fact that this
577
00:24:39,299 --> 00:24:41,729
was going to be done, telling them
578
00:24:41,729 --> 00:24:44,049
"we are going to manipulate your search results,"
579
00:24:44,049 --> 00:24:45,729
"really pay attention,"
580
00:24:45,729 --> 00:24:49,099
did nothing to decrease
581
00:24:49,099 --> 00:24:50,859
the magnitude of the effect.
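The effect itself can be measured in a straightforward way; roughly, and with invented numbers rather than the study's data: compare the share of people preferring the boosted candidate in the group that saw manipulated rankings against the control group.

control   = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]   # 1 = prefers the boosted candidate
treatment = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]   # same question, after manipulated rankings

def share(prefs):
    return sum(prefs) / len(prefs)

print("shift toward the boosted candidate:", share(treatment) - share(control))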
582
00:24:50,859 --> 00:24:55,109
So, the margins of error in many elections
583
00:24:55,109 --> 00:24:57,669
are incredibly small,
584
00:24:57,669 --> 00:24:59,929
and the authors estimate that this shift
585
00:24:59,929 --> 00:25:02,009
could change the outcome of about
586
00:25:02,009 --> 00:25:07,109
25% of elections worldwide, if this were done.
587
00:25:07,109 --> 00:25:10,919
And the bias is so small that no one can tell.
588
00:25:10,919 --> 00:25:14,279
So, all humans, no matter how smart
589
00:25:14,279 --> 00:25:17,109
and resistant to manipulation
we think we are,
590
00:25:17,109 --> 00:25:21,909
all of us are subject to this sort of manipulation,
591
00:25:21,909 --> 00:25:24,320
and we really can't tell.
592
00:25:24,320 --> 00:25:27,129
So, I'm not saying that this is occurring,
593
00:25:27,129 --> 00:25:31,389
but right now there is no
regulation to stop this,
594
00:25:31,389 --> 00:25:34,409
there is no way we could reliably detect this,
595
00:25:34,409 --> 00:25:37,210
so there's a huge amount of power here.
596
00:25:37,210 --> 00:25:39,779
So, something to think about.
597
00:25:39,779 --> 00:25:42,710
But it's not only corporations that are interested
598
00:25:42,710 --> 00:25:47,269
in this sort of behavioral manipulation.
599
00:25:47,269 --> 00:25:51,119
In 2010, UK Prime Minister David Cameron
600
00:25:51,119 --> 00:25:54,969
created this UK Behavioural Insights Team,
601
00:25:54,969 --> 00:25:57,269
which is informally called the Nudge Unit.
602
00:25:57,269 --> 00:26:01,489
And so what they do is
they use behavioral science
603
00:26:01,489 --> 00:26:04,769
and this predictive analytics approach,
604
00:26:04,769 --> 00:26:06,119
with experimentation,
605
00:26:06,119 --> 00:26:07,940
to have people make better decisions
606
00:26:07,940 --> 00:26:09,690
for themselves and society--
607
00:26:09,690 --> 00:26:11,989
as determined by the UK government.
608
00:26:11,989 --> 00:26:14,269
And as of a few months ago,
609
00:26:14,269 --> 00:26:16,849
after an executive order signed by Obama
610
00:26:16,849 --> 00:26:19,349
in September, the United States now has
611
00:26:19,349 --> 00:26:21,429
its own Nudge Unit.
612
00:26:21,429 --> 00:26:24,009
So, to be clear, I don't think that this is
613
00:26:24,009 --> 00:26:25,539
some sort of malicious plot.
614
00:26:25,539 --> 00:26:27,440
I think that there can be huge value
615
00:26:27,440 --> 00:26:29,489
in these sorts of initiatives,
616
00:26:29,489 --> 00:26:31,330
positively impacting people's lives,
617
00:26:31,330 --> 00:26:34,179
but when this sort of behavioral manipulation
618
00:26:34,179 --> 00:26:37,289
is being done, in part openly,
619
00:26:37,289 --> 00:26:39,460
oversight is pretty important,
620
00:26:39,460 --> 00:26:41,700
and we really need to consider
621
00:26:41,700 --> 00:26:46,090
what these systems are optimizing for.
622
00:26:46,090 --> 00:26:47,849
And that's something that we might
623
00:26:47,849 --> 00:26:52,090
not always know, or at least understand,
624
00:26:52,090 --> 00:26:54,450
so for example, for industry,
625
00:26:54,450 --> 00:26:57,679
we do have a pretty good understanding there:
626
00:26:57,679 --> 00:26:59,809
industry cares about optimizing for
627
00:26:59,809 --> 00:27:01,960
the time spent on the website,
628
00:27:01,960 --> 00:27:04,929
Facebook wants you to spend more time on Facebook,
629
00:27:04,929 --> 00:27:06,950
they want you to click on ads,
630
00:27:06,950 --> 00:27:09,109
click on newsfeed items,
631
00:27:09,109 --> 00:27:11,299
they want you to like things.
632
00:27:11,299 --> 00:27:14,309
And, fundamentally: profit.
633
00:27:14,309 --> 00:27:17,599
So, already this has some serious implications,
634
00:27:17,599 --> 00:27:19,690
and this had pretty serious implications
635
00:27:19,690 --> 00:27:22,190
in the last 10 years, in media for example.
636
00:27:22,190 --> 00:27:25,119
Optimizing for click-through rate in journalism
637
00:27:25,119 --> 00:27:26,629
has produced a race to the bottom
638
00:27:26,629 --> 00:27:28,039
in terms of quality.
639
00:27:28,039 --> 00:27:30,919
And another issue is that optimizing
640
00:27:30,919 --> 00:27:34,589
for what people like might not always be
641
00:27:34,589 --> 00:27:35,839
the best approach.
642
00:27:35,839 --> 00:27:38,859
So, Facebook officials have spoken publicly
643
00:27:38,859 --> 00:27:41,279
about how Facebook's goal is to make you happy,
644
00:27:41,279 --> 00:27:43,149
they want you to open that newsfeed
645
00:27:43,149 --> 00:27:45,080
and just feel great.
646
00:27:45,080 --> 00:27:47,379
But, there's an issue there, right?
647
00:27:47,379 --> 00:27:50,169
Because people get their news,
648
00:27:50,169 --> 00:27:52,369
like 40% of people according to Pew Research,
649
00:27:52,369 --> 00:27:54,599
get their news from Facebook.
650
00:27:54,599 --> 00:27:58,460
So, if people don't want to see
651
00:27:58,460 --> 00:28:01,239
war and corpses,
because it makes them feel sad,
652
00:28:01,239 --> 00:28:04,179
then this is not a system that is gonna optimize
653
00:28:04,179 --> 00:28:07,149
for an informed population.
654
00:28:07,149 --> 00:28:09,359
It's not gonna produce a population that is
655
00:28:09,359 --> 00:28:11,469
ready to engage in civic life.
656
00:28:11,469 --> 00:28:13,059
It's gonna produce an amused population
657
00:28:13,059 --> 00:28:16,809
whose time is occupied by cat pictures.
658
00:28:16,809 --> 00:28:19,159
So, in politics, we have a similar
659
00:28:19,159 --> 00:28:21,269
optimization problem that's occurring.
660
00:28:21,269 --> 00:28:23,769
So, these political campaigns that use
661
00:28:23,769 --> 00:28:26,769
these predictive systems,
662
00:28:26,769 --> 00:28:28,669
are optimizing for votes for the desired candidate,
663
00:28:28,669 --> 00:28:30,200
of course.
664
00:28:30,200 --> 00:28:33,499
So, instead of a political campaign being
665
00:28:33,499 --> 00:28:36,139
--well, maybe this is a naive view, but--
666
00:28:36,139 --> 00:28:38,070
being an open discussion of the issues
667
00:28:38,070 --> 00:28:39,830
facing the country,
668
00:28:39,830 --> 00:28:43,200
it becomes this micro-targeted
persuasion game,
669
00:28:43,200 --> 00:28:44,669
and the people that get targeted
670
00:28:44,669 --> 00:28:47,349
are a very small subset of all people,
671
00:28:47,349 --> 00:28:49,399
and it's only gonna be people that are
672
00:28:49,399 --> 00:28:51,409
you know, on the edge, maybe disinterested,
673
00:28:51,409 --> 00:28:54,399
those are the people that are gonna get attention
674
00:28:54,399 --> 00:28:58,839
from political candidates.
675
00:28:58,839 --> 00:29:01,869
In policy, as with these Nudge Units,
676
00:29:01,869 --> 00:29:03,539
they're being used to enable
677
00:29:03,539 --> 00:29:06,109
better use of government services.
678
00:29:06,109 --> 00:29:07,419
There are some good projects that have
679
00:29:07,419 --> 00:29:09,419
come out of this:
680
00:29:09,419 --> 00:29:11,409
increasing voter registration,
681
00:29:11,409 --> 00:29:12,739
improving health outcomes,
682
00:29:12,739 --> 00:29:14,419
improving education outcomes.
683
00:29:14,419 --> 00:29:16,419
But some of these predictive systems
684
00:29:16,419 --> 00:29:18,229
that we're starting to see in government
685
00:29:18,229 --> 00:29:20,700
are optimizing for compliance,
686
00:29:20,700 --> 00:29:23,669
as is the case with predictive policing.
687
00:29:23,669 --> 00:29:25,460
So this is something that we need to
688
00:29:25,460 --> 00:29:28,649
watch carefully.
689
00:29:28,649 --> 00:29:30,119
I think this is a nice quote that
690
00:29:30,119 --> 00:29:33,339
sort of describes the problem.
691
00:29:33,339 --> 00:29:35,200
In some ways we might be narrowing
692
00:29:35,200 --> 00:29:38,259
our horizon, and the danger is that
693
00:29:38,259 --> 00:29:41,989
these tools are separating people.
694
00:29:41,989 --> 00:29:43,570
And this is particularly bad
695
00:29:43,570 --> 00:29:45,940
for political action, because political action
696
00:29:45,940 --> 00:29:49,879
requires people to have shared experience,
697
00:29:49,879 --> 00:29:53,799
and thus be able to collectively act
698
00:29:53,799 --> 00:29:57,629
to exert pressure to fix problems.
699
00:29:57,629 --> 00:30:00,810
So, finally: accountability.
700
00:30:00,810 --> 00:30:03,399
So, we need some oversight mechanisms.
701
00:30:03,399 --> 00:30:06,519
For example, in the case of errors--
702
00:30:06,519 --> 00:30:08,219
so this is particularly important for
703
00:30:08,219 --> 00:30:10,849
civil or bureaucratic systems.
704
00:30:10,849 --> 00:30:14,330
So, when an algorithm produces some decision,
705
00:30:14,330 --> 00:30:16,549
we don't always want humans to just
706
00:30:16,549 --> 00:30:18,039
defer to the machine,
707
00:30:18,039 --> 00:30:21,859
and that might represent one of the problems.
708
00:30:21,859 --> 00:30:25,419
So, there are starting to be some cases
709
00:30:25,419 --> 00:30:28,039
of computer algorithms yielding a decision,
710
00:30:28,039 --> 00:30:30,409
and then humans being unable to correct
711
00:30:30,409 --> 00:30:31,799
an obvious error.
712
00:30:31,799 --> 00:30:35,190
So there's this case in Georgia,
in the United States,
713
00:30:35,190 --> 00:30:37,259
where 2 young people went to
714
00:30:37,259 --> 00:30:38,529
the Department of Motor Vehicles,
715
00:30:38,529 --> 00:30:39,749
they're twins, and they went
716
00:30:39,749 --> 00:30:42,099
to get their driver's license.
717
00:30:42,099 --> 00:30:44,979
However, they were both flagged by
718
00:30:44,979 --> 00:30:47,489
a fraud algorithm that uses facial recognition
719
00:30:47,489 --> 00:30:48,809
to look for similar faces,
720
00:30:48,809 --> 00:30:50,919
and I guess the people that designed the system
721
00:30:50,919 --> 00:30:54,549
didn't think of the possibility of twins.
722
00:30:54,549 --> 00:30:58,489
Yeah.
So, they just left
723
00:30:58,489 --> 00:30:59,889
without their driver's licenses.
724
00:30:59,889 --> 00:31:01,889
The people in the Department of Motor Vehicles
725
00:31:01,889 --> 00:31:03,809
were unable to correct this.
726
00:31:03,809 --> 00:31:06,820
So, this is one implication--
727
00:31:06,820 --> 00:31:08,579
it's like something out of Kafka.
728
00:31:08,579 --> 00:31:11,529
But there are also cases of errors being made,
729
00:31:11,529 --> 00:31:13,879
and people not noticing until
730
00:31:13,879 --> 00:31:15,909
after actions have been taken,
731
00:31:15,909 --> 00:31:17,570
some of them very serious--
732
00:31:17,570 --> 00:31:19,129
because people simply deferred
733
00:31:19,129 --> 00:31:20,619
to the machine.
734
00:31:20,619 --> 00:31:23,309
So, this is an example from San Francisco.
735
00:31:23,309 --> 00:31:26,679
So, an ALPR-- an Automated License Plate Reader--
736
00:31:26,679 --> 00:31:29,429
is a device that uses image recognition
737
00:31:29,429 --> 00:31:32,099
to detect and read license plates,
738
00:31:32,099 --> 00:31:34,339
and usually to compare license plates
739
00:31:34,339 --> 00:31:37,159
with a known list of plates of interest.
740
00:31:37,159 --> 00:31:39,799
And, so, San Francisco uses these
741
00:31:39,799 --> 00:31:42,179
and they're mounted on police cars.
742
00:31:42,179 --> 00:31:46,659
So, in this case, a San Francisco ALPR
743
00:31:46,659 --> 00:31:48,879
got a hit on a car,
744
00:31:48,879 --> 00:31:53,029
and it was the car of a 47-year-old woman,
745
00:31:53,029 --> 00:31:54,839
with no criminal history.
746
00:31:54,839 --> 00:31:56,029
And so it was a false hit
747
00:31:56,029 --> 00:31:58,099
because it was a blurry image,
748
00:31:58,099 --> 00:31:59,709
and it matched erroneously with
749
00:31:59,709 --> 00:32:00,909
one of the plates of interest
750
00:32:00,909 --> 00:32:03,479
that happened to be a stolen vehicle.
751
00:32:03,479 --> 00:32:06,869
So, they conducted a traffic stop on her,
752
00:32:06,869 --> 00:32:09,330
and they take her out of the vehicle,
753
00:32:09,330 --> 00:32:11,049
they search her and the vehicle,
754
00:32:11,049 --> 00:32:12,659
she gets a pat-down,
755
00:32:12,659 --> 00:32:14,849
and they have her kneel
756
00:32:14,849 --> 00:32:17,780
at gunpoint, in the street.
757
00:32:17,780 --> 00:32:20,989
So, how much oversight should be present
758
00:32:20,989 --> 00:32:23,999
depends on the implications of the system.
759
00:32:23,999 --> 00:32:25,279
It's certainly the case that
760
00:32:25,279 --> 00:32:26,910
for some of these decision-making systems,
761
00:32:26,910 --> 00:32:29,219
an error might not be that important,
762
00:32:29,219 --> 00:32:31,149
it could be relatively harmless,
763
00:32:31,149 --> 00:32:33,559
but in this case,
an error in this algorithmic decision
764
00:32:33,559 --> 00:32:36,259
led to this totally innocent person
765
00:32:36,259 --> 00:32:40,019
literally having a gun pointed at her.
766
00:32:40,019 --> 00:32:44,019
So, that brings us to: we need some way of
767
00:32:44,019 --> 00:32:45,419
getting some information about
768
00:32:45,419 --> 00:32:47,249
what is going on here.
769
00:32:47,249 --> 00:32:50,179
We don't wanna have to wait for these events
770
00:32:50,179 --> 00:32:52,580
before we are able to determine
771
00:32:52,580 --> 00:32:54,409
some information about the system.
772
00:32:54,409 --> 00:32:56,139
So, auditing is one option:
773
00:32:56,139 --> 00:32:58,109
to independently verify the statements
774
00:32:58,109 --> 00:33:00,809
of companies, in situations where we have
775
00:33:00,809 --> 00:33:02,939
inputs and outputs.
776
00:33:02,939 --> 00:33:05,200
So, for example, this could be done with
777
00:33:05,200 --> 00:33:07,489
Google, Facebook.
778
00:33:07,489 --> 00:33:09,190
If you have the inputs of a system,
779
00:33:09,190 --> 00:33:10,649
say you have test accounts,
780
00:33:10,649 --> 00:33:11,729
or real accounts,
781
00:33:11,729 --> 00:33:14,359
maybe you can collect
people's information together.
782
00:33:14,359 --> 00:33:15,830
So that was something that was done
783
00:33:15,830 --> 00:33:18,759
during the 2012 Obama campaign
784
00:33:18,759 --> 00:33:20,249
by ProPublica.
785
00:33:20,249 --> 00:33:21,269
People noticed that they were getting
786
00:33:21,269 --> 00:33:24,739
different emails from the Obama campaign,
787
00:33:24,739 --> 00:33:26,009
and were interested to see
788
00:33:26,009 --> 00:33:28,209
based on what factors
789
00:33:28,209 --> 00:33:29,749
the emails were changing.
790
00:33:29,749 --> 00:33:32,659
So, I think about 200 people submitted emails
791
00:33:32,659 --> 00:33:34,940
and they were able to determine some information
792
00:33:34,940 --> 00:33:38,809
about what the emails
were being varied based on.
793
00:33:38,809 --> 00:33:40,859
So there have been some successful
794
00:33:40,859 --> 00:33:43,080
attempts at this.
795
00:33:43,080 --> 00:33:45,919
So, compare inputs and then look at
796
00:33:45,919 --> 00:33:48,709
why one item was shown to one user
797
00:33:48,709 --> 00:33:50,289
and not another, and see if there's
798
00:33:50,289 --> 00:33:51,879
any statistical differences.
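[Editor's note: one concrete form of this kind of input/output audit is a contingency-table test over the collected accounts, as in the email-collection example above. The counts below are made up for illustration, and scipy is assumed to be available; this is a sketch of the statistical step, not the exact method used in that effort.]

```python
# Minimal sketch of an input/output audit: did group A and group B receive a given
# item (an ad, an email variant) at statistically different rates? Counts are invented.
from scipy.stats import chi2_contingency

#              shown   not shown
table = [[  90,   10 ],   # group A accounts
         [  55,   45 ]]   # group B accounts

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
# A small p-value suggests the item is not shown independently of group membership --
# a starting point for asking why, not proof of intent.
```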
799
00:33:51,879 --> 00:33:56,279
So, there's some potential legal issues
800
00:33:56,279 --> 00:33:57,749
with the test accounts, so that's something
801
00:33:57,749 --> 00:34:01,499
to think about-- I'm not a lawyer.
802
00:34:01,499 --> 00:34:03,919
So, for example, if you wanna examine
803
00:34:03,919 --> 00:34:06,269
ad-targeting algorithms,
804
00:34:06,269 --> 00:34:07,969
one way to proceed is to construct
805
00:34:07,969 --> 00:34:10,589
a browsing profile, and then examine
806
00:34:10,589 --> 00:34:12,989
what ads are served back to you.
807
00:34:12,989 --> 00:34:14,119
And so this is something that
808
00:34:14,119 --> 00:34:16,250
academic researchers have looked at,
809
00:34:16,250 --> 00:34:17,489
because, at the time at least,
810
00:34:17,489 --> 00:34:20,879
you didn't need to make an account to do this.
811
00:34:20,879 --> 00:34:24,768
So, this was a study that was presented at
812
00:34:24,768 --> 00:34:27,799
Privacy Enhancing Technologies last year,
813
00:34:27,799 --> 00:34:31,149
and in this study, the researchers
814
00:34:31,149 --> 00:34:33,179
generate some browsing profiles
815
00:34:33,179 --> 00:34:35,909
that differ only by one characteristic,
816
00:34:35,909 --> 00:34:37,690
so they're basically identical in every way
817
00:34:37,690 --> 00:34:39,049
except for one thing.
818
00:34:39,049 --> 00:34:42,359
And that is denoted by Treatment 1 and 2.
819
00:34:42,359 --> 00:34:44,460
So this is a randomized, controlled trial,
820
00:34:44,460 --> 00:34:46,389
but I left out the randomization part
821
00:34:46,389 --> 00:34:48,220
for simplicity.
822
00:34:48,220 --> 00:34:54,799
So, in one study,
they applied a treatment of gender.
823
00:34:54,799 --> 00:34:56,799
So, they had the browsing profiles
824
00:34:56,799 --> 00:34:59,319
in Treatment 1 be male browsing profiles,
825
00:34:59,319 --> 00:35:02,029
and the browsing profiles in Treatment 2
be female.
826
00:35:02,029 --> 00:35:04,430
And they wanted to see: is there any difference
827
00:35:04,430 --> 00:35:06,079
in the way that ads are targeted
828
00:35:06,079 --> 00:35:08,710
if browsing profiles are effectively identical
829
00:35:08,710 --> 00:35:11,019
except for gender?
830
00:35:11,019 --> 00:35:14,710
So, it turns out that there was.
831
00:35:14,710 --> 00:35:19,180
So, a 3rd-party site was showing Google ads
832
00:35:19,180 --> 00:35:21,289
for senior executive positions
833
00:35:21,289 --> 00:35:23,980
at a rate 6 times higher to the fake men
834
00:35:23,980 --> 00:35:27,059
than for the fake women in this study.
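[Editor's note: the treatment-based design described here boils down to creating profiles that differ in one attribute, recording the ads each profile is served, and testing whether the difference could be chance. A minimal sketch with invented counts follows; it uses a plain permutation test and is not the AdFisher pipeline itself.]

```python
# Minimal sketch of a treatment-based audit: two groups of otherwise-identical profiles,
# differing only in the declared gender, and a permutation test on how many
# "senior executive" ads each profile saw. All counts are invented for illustration.
import random

treatment_1 = [6, 7, 5, 8, 6]   # executive-ad impressions per profile in Treatment 1
treatment_2 = [1, 0, 2, 1, 1]   # executive-ad impressions per profile in Treatment 2

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(treatment_1, treatment_2)
pooled = treatment_1 + treatment_2

exceed = 0
n_perm = 10_000
for _ in range(n_perm):
    random.shuffle(pooled)                       # break any real group structure
    a, b = pooled[:len(treatment_1)], pooled[len(treatment_1):]
    if abs(mean_diff(a, b)) >= abs(observed):
        exceed += 1

print(f"observed difference={observed:.2f}, permutation p={exceed / n_perm:.4f}")
```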
835
00:35:27,059 --> 00:35:30,109
So, this sort of auditing is not going to
836
00:35:30,109 --> 00:35:32,779
be able to determine everything
837
00:35:32,779 --> 00:35:34,930
that algorithms are doing, but they can
838
00:35:34,930 --> 00:35:36,519
sometimes uncover interesting,
839
00:35:36,519 --> 00:35:40,900
at least statistical differences.
840
00:35:40,900 --> 00:35:47,099
So, this leads us to the fundamental issue:
841
00:35:47,099 --> 00:35:49,180
Right now, we're really not in control
842
00:35:49,180 --> 00:35:50,510
of some of these systems,
843
00:35:50,510 --> 00:35:54,480
and we really need these predictive systems
844
00:35:54,480 --> 00:35:56,119
to be controlled by us,
845
00:35:56,119 --> 00:35:57,819
in order for them not to be used
846
00:35:57,819 --> 00:36:00,109
as a system of control.
847
00:36:00,109 --> 00:36:03,220
So there are some technologies that I'd like
848
00:36:03,220 --> 00:36:06,890
to point you all to.
849
00:36:06,890 --> 00:36:08,319
We need tools in the digital commons
850
00:36:08,319 --> 00:36:11,160
that can help address some of these concerns.
851
00:36:11,160 --> 00:36:13,349
So, the first thing is that of course
852
00:36:13,349 --> 00:36:14,730
we know that minimizing the amount of
853
00:36:14,730 --> 00:36:17,069
data available can help in some contexts,
854
00:36:17,069 --> 00:36:18,980
which we can do by making systems
855
00:36:18,980 --> 00:36:22,779
that are private by design, and by default.
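[Editor's note: as a small illustration of data minimization at the collection layer, here is a sketch that keeps only what an application needs and coarsens the rest before storage. All field names are invented, and a plain hash is not strong anonymization; this only illustrates the minimization idea.]

```python
# Minimal sketch of data minimization before storage: keep only required fields,
# coarsen identifying ones, and drop everything else. Field names are invented.
import hashlib

def minimize(record: dict) -> dict:
    return {
        # pseudonymize the identifier instead of storing it directly
        # (note: a bare hash is linkable and guessable; real systems need more)
        "user": hashlib.sha256(record["user_id"].encode()).hexdigest()[:16],
        # coarsen location to city level rather than exact coordinates
        "city": record.get("city", "unknown"),
        # keep only the single field the feature actually needs
        "item_viewed": record["item_viewed"],
        # age, precise GPS, device fingerprint, ... are simply not stored
    }

print(minimize({"user_id": "alice", "city": "Hamburg", "item_viewed": "shoes",
                "gps": (53.55, 9.99), "age": 47}))
```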
856
00:36:22,779 --> 00:36:24,549
Another thing is that these audit tools
857
00:36:24,549 --> 00:36:25,890
might be useful.
858
00:36:25,890 --> 00:36:30,720
And, so, these 2 nice examples in academia...
859
00:36:30,720 --> 00:36:34,359
the ad experiment that I just showed was done
860
00:36:34,359 --> 00:36:36,120
using AdFisher.
861
00:36:36,120 --> 00:36:38,200
So, these are 2 toolkits that you can use
862
00:36:38,200 --> 00:36:41,440
to start doing this sort of auditing.
863
00:36:41,440 --> 00:36:44,579
Another technology that is generally useful,
864
00:36:44,579 --> 00:36:46,700
but particularly in the case of prediction
865
00:36:46,700 --> 00:36:48,789
it's useful to maintain access to
866
00:36:48,789 --> 00:36:50,289
as many sites as possible,
867
00:36:50,289 --> 00:36:52,589
through anonymity systems like Tor,
868
00:36:52,589 --> 00:36:54,319
because it's impossible to personalize
869
00:36:54,319 --> 00:36:55,650
when everyone looks the same.
870
00:36:55,650 --> 00:36:59,130
So this is a very important technology.
871
00:36:59,130 --> 00:37:01,519
Something that doesn't really exist,
872
00:37:01,519 --> 00:37:03,630
but that I think is pretty important,
873
00:37:03,630 --> 00:37:05,829
is having some tool to view the landscape.
874
00:37:05,829 --> 00:37:08,160
So, as we know from these few studies
875
00:37:08,160 --> 00:37:10,440
that have been done,
876
00:37:10,440 --> 00:37:12,059
different people are not seeing the internet
877
00:37:12,059 --> 00:37:12,950
in the same way.
878
00:37:12,950 --> 00:37:15,730
This is one reason why we don't like censorship.
879
00:37:15,730 --> 00:37:17,880
But, rich and poor people,
880
00:37:17,880 --> 00:37:19,659
from academic research we know that
881
00:37:19,659 --> 00:37:23,790
there is widespread price discrimination
on the internet,
882
00:37:23,790 --> 00:37:25,650
so rich and poor people see a different view
883
00:37:25,650 --> 00:37:26,970
of the Internet,
884
00:37:26,970 --> 00:37:28,400
men and women see a different view
885
00:37:28,400 --> 00:37:29,940
of the Internet.
886
00:37:29,940 --> 00:37:31,200
We wanna know how different people
887
00:37:31,200 --> 00:37:32,450
see the same site,
888
00:37:32,450 --> 00:37:34,329
and this could be the beginning of
889
00:37:34,329 --> 00:37:36,329
a defense system for this sort of
890
00:37:36,329 --> 00:37:41,730
manipulation/tampering that I showed earlier.
891
00:37:41,730 --> 00:37:45,549
Another interesting approach is obfuscation:
892
00:37:45,549 --> 00:37:46,980
injecting noise into the system.
893
00:37:46,980 --> 00:37:49,190
So there's an interesting browser extension
894
00:37:49,190 --> 00:37:51,720
called AdNauseam, that's for Firefox,
895
00:37:51,720 --> 00:37:54,579
which clicks on every single ad you're served,
896
00:37:54,579 --> 00:37:55,680
to inject noise.
897
00:37:55,680 --> 00:37:57,019
So that's, I think, an interesting approach
898
00:37:57,019 --> 00:38:00,170
that people haven't looked at too much.
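[Editor's note: in code terms, the obfuscation idea described here is just adding decoy interactions so the genuine ones are harder to single out. The sketch below is purely illustrative, with hypothetical URLs and a placeholder request function; it is not the extension's actual logic.]

```python
# Minimal sketch of obfuscation by noise injection: alongside each real interaction,
# emit decoy interactions chosen at random, so a profiler cannot tell which clicks
# reflect genuine interest. URLs are hypothetical.
import random

DECOY_POOL = [
    "https://example.com/ads/travel",
    "https://example.com/ads/cars",
    "https://example.com/ads/finance",
    "https://example.com/ads/gardening",
]

def visit(url: str) -> None:
    # Placeholder for issuing the request (e.g., via a headless browser).
    print("visiting", url)

def obfuscated_click(real_url: str, n_decoys: int = 3) -> None:
    actions = [real_url] + random.sample(DECOY_POOL, n_decoys)
    random.shuffle(actions)          # hide which request was the real one
    for url in actions:
        visit(url)

obfuscated_click("https://example.com/ads/shoes")
```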
899
00:38:00,170 --> 00:38:03,780
So in terms of policy,
900
00:38:03,780 --> 00:38:06,530
Facebook and Google, these internet giants,
901
00:38:06,530 --> 00:38:08,829
have billions of users,
902
00:38:08,829 --> 00:38:12,220
and sometimes they like to call themselves
903
00:38:12,220 --> 00:38:13,769
new public utilities,
904
00:38:13,769 --> 00:38:15,000
and if that's the case then
905
00:38:15,000 --> 00:38:17,549
it might be necessary to subject them
906
00:38:17,549 --> 00:38:20,539
to additional regulation.
907
00:38:20,539 --> 00:38:21,990
Another problem that's come up,
908
00:38:21,990 --> 00:38:23,539
for example with some of the studies
909
00:38:23,539 --> 00:38:24,900
that Facebook has done,
910
00:38:24,900 --> 00:38:29,039
is sometimes a lack of ethics review.
911
00:38:29,039 --> 00:38:31,059
So, for example, in academia,
912
00:38:31,059 --> 00:38:33,859
if you're gonna do research involving humans,
913
00:38:33,859 --> 00:38:35,390
there's an Institutional Review Board
914
00:38:35,390 --> 00:38:36,970
that you go to that verifies that
915
00:38:36,970 --> 00:38:39,140
you're doing things in an ethical manner.
916
00:38:39,140 --> 00:38:40,910
And some companies do have internal
917
00:38:40,910 --> 00:38:43,029
review processes like this, but it might
918
00:38:43,029 --> 00:38:45,119
be important to have an independent
919
00:38:45,119 --> 00:38:48,200
ethics board that does this sort of thing.
920
00:38:48,200 --> 00:38:50,849
And we really need 3rd-party auditing.
921
00:38:50,849 --> 00:38:54,519
So, for example, some companies
922
00:38:54,519 --> 00:38:56,220
don't want auditing to be done
923
00:38:56,220 --> 00:38:59,190
because of IP concerns,
924
00:38:59,190 --> 00:39:00,579
and if that's the concern
925
00:39:00,579 --> 00:39:03,180
maybe having a set of people
926
00:39:03,180 --> 00:39:05,680
that are not paid by the company
927
00:39:05,680 --> 00:39:07,200
to check how some of these systems
928
00:39:07,200 --> 00:39:08,640
are being implemented,
929
00:39:08,640 --> 00:39:11,240
could help give us confidence that
930
00:39:11,240 --> 00:39:16,979
things are being done in a reasonable way.
931
00:39:16,979 --> 00:39:20,269
So, in closing,
932
00:39:20,269 --> 00:39:23,180
algorithmic decision making is here,
933
00:39:23,180 --> 00:39:26,140
and it's barreling forward
at a very fast rate,
934
00:39:26,140 --> 00:39:27,890
and we need to figure out what
935
00:39:27,890 --> 00:39:30,410
the guide rails should be,
936
00:39:30,410 --> 00:39:31,380
and how to install them
937
00:39:31,380 --> 00:39:33,119
to handle some of the potential threats.
938
00:39:33,119 --> 00:39:35,470
There's a huge amount of power here.
939
00:39:35,470 --> 00:39:37,910
We need more openness in these systems.
940
00:39:37,910 --> 00:39:39,589
And, right now,
941
00:39:39,589 --> 00:39:41,559
with the intelligent systems that do exist,
942
00:39:41,559 --> 00:39:43,920
we don't know what's occurring really,
943
00:39:43,920 --> 00:39:46,510
and we need to watch carefully
944
00:39:46,510 --> 00:39:49,099
where and how these systems are being used.
945
00:39:49,099 --> 00:39:50,690
And I think this community has
946
00:39:50,690 --> 00:39:53,940
an important role to play in this fight,
947
00:39:53,940 --> 00:39:55,730
to study what's being done,
948
00:39:55,730 --> 00:39:57,160
to show people what's being done,
949
00:39:57,160 --> 00:39:58,670
to raise the debate and advocate,
950
00:39:58,670 --> 00:40:01,200
and, where necessary, to resist.
951
00:40:01,200 --> 00:40:03,339
Thanks.
952
00:40:03,339 --> 00:40:13,129
applause
953
00:40:13,129 --> 00:40:17,519
Herald: So, let's have a question and answer.
954
00:40:17,519 --> 00:40:19,080
Microphone 2, please.
955
00:40:19,080 --> 00:40:20,199
Mic 2: Hi there.
956
00:40:20,199 --> 00:40:23,259
Thanks for the talk.
957
00:40:23,259 --> 00:40:26,230
Since these pre-crime software systems also
958
00:40:26,230 --> 00:40:27,359
arrived here in Germany
959
00:40:27,359 --> 00:40:29,680
with the start of the so-called CopWatch system
960
00:40:29,680 --> 00:40:32,779
in southern Germany,
in Bavaria and Nuremberg especially,
961
00:40:32,779 --> 00:40:35,420
where they try to predict burglary crime
962
00:40:35,420 --> 00:40:37,460
using that criminal-record and
963
00:40:37,460 --> 00:40:40,170
geographical analysis, like you explained,
964
00:40:40,170 --> 00:40:43,380
this leads me to a 2-fold question:
965
00:40:43,380 --> 00:40:47,900
first, have you heard of any research
966
00:40:47,900 --> 00:40:49,760
that measures the effectiveness
967
00:40:49,760 --> 00:40:53,690
of such measures, at all?
968
00:40:53,690 --> 00:40:57,040
And, second:
969
00:40:57,040 --> 00:41:00,599
What do you think of the game theory
970
00:41:00,599 --> 00:41:02,690
if the thieves or the bad guys
971
00:41:02,690 --> 00:41:07,619
know the system, and when they
game the system,
972
00:41:07,619 --> 00:41:09,980
they will probably win,
973
00:41:09,980 --> 00:41:11,640
since one police officer in an interview said
974
00:41:11,640 --> 00:41:14,019
this system is used to reduce
975
00:41:14,019 --> 00:41:16,460
the personnel costs of policing,
976
00:41:16,460 --> 00:41:19,460
so they just send the guys
where the red flags are,
977
00:41:19,460 --> 00:41:22,290
and the others take the day off.
978
00:41:22,290 --> 00:41:24,360
Dr. Helsby: Yup.
979
00:41:24,360 --> 00:41:27,150
Um, so, with respect to
980
00:41:27,150 --> 00:41:30,990
testing the effectiveness of predictive policing,
981
00:41:30,990 --> 00:41:31,990
the companies,
982
00:41:31,990 --> 00:41:33,910
some of them do randomized, controlled trials
983
00:41:33,910 --> 00:41:35,240
and claim a reduction in policing.
984
00:41:35,240 --> 00:41:38,349
The best independent study that I've seen
985
00:41:38,349 --> 00:41:40,680
is by the RAND Corporation
986
00:41:40,680 --> 00:41:43,120
that did a study in, I think,
987
00:41:43,120 --> 00:41:44,920
Shreveport, Louisiana,
988
00:41:44,920 --> 00:41:47,589
and in their report they claim
989
00:41:47,589 --> 00:41:50,190
that there was no statistically significant
990
00:41:50,190 --> 00:41:52,900
difference, they didn't find any reduction.
991
00:41:52,900 --> 00:41:54,099
And it was specifically looking at
992
00:41:54,099 --> 00:41:56,730
property crime, which I think you mentioned.
993
00:41:56,730 --> 00:41:59,480
So, I think right now there's sort of
994
00:41:59,480 --> 00:42:01,069
conflicting reports between
995
00:42:01,069 --> 00:42:06,180
the independent auditors
and these company claims.
996
00:42:06,180 --> 00:42:09,289
So there definitely needs to be more study.
997
00:42:09,289 --> 00:42:12,240
And then, the 2nd thing...sorry,
remind me what it was?
998
00:42:12,240 --> 00:42:15,189
Mic 2: What about the guys gaming the system?
999
00:42:15,189 --> 00:42:16,949
Dr. Helsby: Oh, yeah.
1000
00:42:16,949 --> 00:42:18,900
I think it's a legitimate concern.
1001
00:42:18,900 --> 00:42:22,480
Like, if all the outputs
were just immediately public,
1002
00:42:22,480 --> 00:42:24,599
then, yes, everyone knows the location
1003
00:42:24,599 --> 00:42:26,549
of all police officers,
1004
00:42:26,549 --> 00:42:29,009
and I imagine that people would have
1005
00:42:29,009 --> 00:42:30,779
a problem with that.
1006
00:42:30,779 --> 00:42:32,679
Yup.
1007
00:42:32,679 --> 00:42:35,990
Herald: Microphone #4, please.
1008
00:42:35,990 --> 00:42:39,369
Mic 4: Yeah, this is not actually a question,
1009
00:42:39,369 --> 00:42:40,779
but just a comment.
1010
00:42:40,779 --> 00:42:42,970
I've enjoyed your talk very much,
1011
00:42:42,970 --> 00:42:47,789
in particular after watching
1012
00:42:47,789 --> 00:42:52,270
the talk in Hall 1 earlier in the afternoon.
1013
00:42:52,270 --> 00:42:55,730
The "Say Hi to Your New Boss", about
1014
00:42:55,730 --> 00:42:59,609
algorithms that are trained with big data,
1015
00:42:59,609 --> 00:43:02,390
and finally make decisions.
1016
00:43:02,390 --> 00:43:08,210
And I think these 2 talks are kind of complementary,
1017
00:43:08,210 --> 00:43:11,309
and if people are interested in the topic
1018
00:43:11,309 --> 00:43:14,710
they might want to check out the other talk
1019
00:43:14,710 --> 00:43:16,259
and watch it later, because these
1020
00:43:16,259 --> 00:43:17,319
fit very well together.
1021
00:43:17,319 --> 00:43:19,589
Dr. Helsby: Yeah, it was a great talk.
1022
00:43:19,589 --> 00:43:22,130
Herald: Microphone #2, please.
1023
00:43:22,130 --> 00:43:25,049
Mic 2: Um, yeah, you mentioned
1024
00:43:25,049 --> 00:43:27,319
the need to have some kind of 3rd-party auditing
1025
00:43:27,319 --> 00:43:30,900
or some kind of way to
1026
00:43:30,900 --> 00:43:31,930
peek into these algorithms
1027
00:43:31,930 --> 00:43:33,079
and to see what they're doing,
1028
00:43:33,079 --> 00:43:34,420
and to see if they're being fair.
1029
00:43:34,420 --> 00:43:36,199
Can you talk a little bit more about that?
1030
00:43:36,199 --> 00:43:38,059
Like, going forward,
1031
00:43:38,059 --> 00:43:40,690
some kind of regulatory structures
1032
00:43:40,690 --> 00:43:44,200
would probably have to emerge
1033
00:43:44,200 --> 00:43:47,200
to analyze and to look at
1034
00:43:47,200 --> 00:43:49,339
these black boxes that are just sort of
1035
00:43:49,339 --> 00:43:51,309
popping up everywhere and, you know,
1036
00:43:51,309 --> 00:43:52,939
controlling more and more of the things
1037
00:43:52,939 --> 00:43:56,150
in our lives, and important decisions.
1038
00:43:56,150 --> 00:43:58,539
So, just, what kind of discussions
1039
00:43:58,539 --> 00:43:59,460
are there for that?
1040
00:43:59,460 --> 00:44:01,809
And what kind of possibility
is there for that?
1041
00:44:01,809 --> 00:44:04,900
And, I'm sure that companies would be
1042
00:44:04,900 --> 00:44:08,000
very, very resistant to
1043
00:44:08,000 --> 00:44:09,890
any kind of attempt to look into
1044
00:44:09,890 --> 00:44:13,890
algorithms, and to...
1045
00:44:13,890 --> 00:44:15,070
Dr. Helsby: Yeah, I mean, definitely
1046
00:44:15,070 --> 00:44:18,069
companies would be very resistant to
1047
00:44:18,069 --> 00:44:19,670
having people look into their algorithms.
1048
00:44:19,670 --> 00:44:22,190
So, if you wanna do a very rigorous
1049
00:44:22,190 --> 00:44:23,339
audit of what's going on
1050
00:44:23,339 --> 00:44:25,660
then it's probably necessary to have
1051
00:44:25,660 --> 00:44:26,589
a few people come in
1052
00:44:26,589 --> 00:44:28,900
and sign NDAs, and then
1053
00:44:28,900 --> 00:44:31,039
look through the systems.
1054
00:44:31,039 --> 00:44:33,140
So, that's one way to proceed.
1055
00:44:33,140 --> 00:44:35,049
But, another way to proceed that--
1056
00:44:35,049 --> 00:44:38,720
so, these academic researchers have done
1057
00:44:38,720 --> 00:44:40,009
a few experiments
1058
00:44:40,009 --> 00:44:42,809
and found some interesting things,
1059
00:44:42,809 --> 00:44:45,500
and that's sort of all the attempts at auditing
1060
00:44:45,500 --> 00:44:46,450
that we've seen:
1061
00:44:46,450 --> 00:44:48,490
there was 1 attempt in 2012
for the Obama campaign,
1062
00:44:48,490 --> 00:44:49,910
but there's really not been any
1063
00:44:49,910 --> 00:44:51,500
sort of systematic attempt--
1064
00:44:51,500 --> 00:44:52,589
you know, like, in censorship
1065
00:44:52,589 --> 00:44:54,539
we see a systematic attempt to
1066
00:44:54,539 --> 00:44:56,779
do measurement as often as possible,
1067
00:44:56,779 --> 00:44:58,240
check what's going on,
1068
00:44:58,240 --> 00:44:59,339
and that itself, you know,
1069
00:44:59,339 --> 00:45:00,900
can act as an oversight mechanism.
1070
00:45:00,900 --> 00:45:01,880
But, right now,
1071
00:45:01,880 --> 00:45:03,900
I think many of these companies
1072
00:45:03,900 --> 00:45:05,259
realize no one is watching,
1073
00:45:05,259 --> 00:45:07,160
so there's no real push to have
1074
00:45:07,160 --> 00:45:10,440
people verify: are you being fair when you
1075
00:45:10,440 --> 00:45:11,539
implement this system?
1076
00:45:11,539 --> 00:45:12,969
Because no one's really checking.
1077
00:45:12,969 --> 00:45:13,980
Mic 2: Do you think that,
1078
00:45:13,980 --> 00:45:15,339
at some point, it would be like
1079
00:45:15,339 --> 00:45:19,059
an FDA or SEC, to give some American examples...
1080
00:45:19,059 --> 00:45:21,490
an actual government regulatory agency
1081
00:45:21,490 --> 00:45:24,960
that has the power and ability to
1082
00:45:24,960 --> 00:45:27,930
not just sort of look and try to
1083
00:45:27,930 --> 00:45:31,710
reverse engineer some of these algorithms,
1084
00:45:31,710 --> 00:45:33,920
but actually peek in there and make sure
1085
00:45:33,920 --> 00:45:36,420
that things are fair, because it seems like
1086
00:45:36,420 --> 00:45:38,240
there's just-- it's so important now
1087
00:45:38,240 --> 00:45:41,769
that, again, it could be the difference between
1088
00:45:41,769 --> 00:45:42,930
life and death, between
1089
00:45:42,930 --> 00:45:44,589
getting a job, not getting a job,
1090
00:45:44,589 --> 00:45:46,130
being pulled over,
not being pulled over,
1091
00:45:46,130 --> 00:45:48,069
being racially profiled,
not racially profiled,
1092
00:45:48,069 --> 00:45:49,410
things like that.
Dr. Helsby: Right.
1093
00:45:49,410 --> 00:45:50,430
Mic 2: Is it moving in that direction?
1094
00:45:50,430 --> 00:45:52,249
Or is it way too early for it?
1095
00:45:52,249 --> 00:45:55,110
Dr. Helsby: I mean, so some people have...
1096
00:45:55,110 --> 00:45:56,859
someone has called for, like,
1097
00:45:56,859 --> 00:45:59,079
a Federal Search Commission,
1098
00:45:59,079 --> 00:46:00,930
or like a Federal Algorithms Commission,
1099
00:46:00,930 --> 00:46:03,200
that would do this sort of oversight work,
1100
00:46:03,200 --> 00:46:06,130
but it's in such early stages right now
1101
00:46:06,130 --> 00:46:09,970
that there's no real push for that.
1102
00:46:09,970 --> 00:46:13,330
But I think it's a good idea.
1103
00:46:13,330 --> 00:46:15,729
Herald: And again, #2 please.
1104
00:46:15,729 --> 00:46:17,059
Mic 2: Thank you again for your talk.
1105
00:46:17,059 --> 00:46:19,309
I was just curious if you can point
1106
00:46:19,309 --> 00:46:20,440
to any examples of
1107
00:46:20,440 --> 00:46:22,619
either current producers or consumers
1108
00:46:22,619 --> 00:46:24,029
of these algorithmic systems
1109
00:46:24,029 --> 00:46:26,390
who are actively and publicly trying
1110
00:46:26,390 --> 00:46:27,720
to do so in a responsible manner
1111
00:46:27,720 --> 00:46:29,720
by describing what they're trying to do
1112
00:46:29,720 --> 00:46:31,380
and how they're going about it?
1113
00:46:31,380 --> 00:46:37,210
Dr. Helsby: So, yeah, there are some companies,
1114
00:46:37,210 --> 00:46:39,000
for example, like DataKind,
1115
00:46:39,000 --> 00:46:42,710
that try to deploy algorithmic systems
1116
00:46:42,710 --> 00:46:44,640
in as responsible a way as possible,
1117
00:46:44,640 --> 00:46:47,250
for like public policy.
1118
00:46:47,250 --> 00:46:49,549
Like, I actually also implement systems
1119
00:46:49,549 --> 00:46:51,750
for public policy in a transparent way.
1120
00:46:51,750 --> 00:46:54,329
Like, all the code is in GitHub, etc.
1121
00:46:54,329 --> 00:47:00,020
And, to give credit to
1122
00:47:00,020 --> 00:47:01,990
Google and these giants, it is also the case that
1123
00:47:01,990 --> 00:47:06,109
they're trying to implement transparency systems
1124
00:47:06,109 --> 00:47:08,170
that help you understand.
1125
00:47:08,170 --> 00:47:09,289
This has been done with respect to
1126
00:47:09,289 --> 00:47:12,329
how your data is being collected,
1127
00:47:12,329 --> 00:47:14,579
but for example if you go on Amazon.com
1128
00:47:14,579 --> 00:47:17,890
you can see a recommendation has been made,
1129
00:47:17,890 --> 00:47:19,420
and that is pretty transparent.
1130
00:47:19,420 --> 00:47:21,480
You can see "this item
was recommended to me,"
1131
00:47:21,480 --> 00:47:25,039
so you know that prediction
is being used in this case,
1132
00:47:25,039 --> 00:47:27,089
and it will say why prediction is being used:
1133
00:47:27,089 --> 00:47:29,230
because you purchased some item.
1134
00:47:29,230 --> 00:47:30,380
And Google has a similar thing,
1135
00:47:30,380 --> 00:47:32,420
if you go to like Google Ad Settings,
1136
00:47:32,420 --> 00:47:35,249
you can even turn off personalization of ads
1137
00:47:35,249 --> 00:47:36,380
if you want,
1138
00:47:36,380 --> 00:47:38,119
and you can also see some of the inferences
1139
00:47:38,119 --> 00:47:39,400
that have been learned about you.
1140
00:47:39,400 --> 00:47:40,819
A subset of the inferences that have been
1141
00:47:40,819 --> 00:47:41,700
learned about you.
1142
00:47:41,700 --> 00:47:43,940
So, like, what interests...
1143
00:47:43,940 --> 00:47:47,869
Herald: A question from the internet, please?
1144
00:47:47,869 --> 00:47:50,930
Signal Angel: Yes, billetQ is asking
1145
00:47:50,930 --> 00:47:54,479
how do you avoid biases in machine learning?
1146
00:47:54,479 --> 00:47:57,380
I assume an analysis system, for example,
1147
00:47:57,380 --> 00:48:00,420
could be biased against women and minorities,
1148
00:48:00,420 --> 00:48:04,960
if used for hiring decisions
based on known data.
1149
00:48:04,960 --> 00:48:06,499
Dr. Helsby: Yeah, so one thing is to
1150
00:48:06,499 --> 00:48:08,529
just explicitly check.
1151
00:48:08,529 --> 00:48:12,199
So, you can check to see how
1152
00:48:12,199 --> 00:48:14,309
positive outcomes are being distributed
1153
00:48:14,309 --> 00:48:16,779
among those protected classes.
1154
00:48:16,779 --> 00:48:19,210
You could also incorporate these sort of
1155
00:48:19,210 --> 00:48:21,440
fairness constraints in the function
1156
00:48:21,440 --> 00:48:24,069
that you optimize when you train the system,
1157
00:48:24,069 --> 00:48:25,950
and so, if you're interested in reading more
1158
00:48:25,950 --> 00:48:28,960
about this, the 2 papers--
1159
00:48:28,960 --> 00:48:31,909
let me go to References--
1160
00:48:31,909 --> 00:48:32,730
there's a good paper called
1161
00:48:32,730 --> 00:48:35,339
Fairness Through Awareness that describes
1162
00:48:35,339 --> 00:48:37,499
how to go about doing this,
1163
00:48:37,499 --> 00:48:39,579
so I recommend this person read that.
1164
00:48:39,579 --> 00:48:40,970
It's good.
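[Editor's note: both suggestions in this answer -- explicitly checking how positive outcomes are distributed across protected classes, and folding a fairness term into the training objective -- can be stated in a few lines. The sketch below assumes a binary classifier and uses a demographic-parity-style penalty for illustration; it is not the mechanism from the cited Fairness Through Awareness paper.]

```python
# Minimal sketch: (1) audit how positive predictions are distributed across a protected
# attribute, (2) add a demographic-parity-style penalty to a logistic-regression loss.
# Inputs are assumed to be NumPy arrays; purely illustrative.
import numpy as np

def positive_rate_by_group(y_pred, group):
    # y_pred: 0/1 predictions; group: protected-attribute label per example
    return {g: y_pred[group == g].mean() for g in np.unique(group)}
    # large gaps between the groups' rates flag a potential disparity

def fair_logistic_loss(w, X, y, group, lam=1.0):
    p = 1.0 / (1.0 + np.exp(-X @ w))                     # predicted probabilities
    log_loss = -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
    rates = [p[group == g].mean() for g in np.unique(group)]
    fairness_penalty = max(rates) - min(rates)           # gap in mean predicted rate
    return log_loss + lam * fairness_penalty             # trade accuracy against the gap
```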
1165
00:48:40,970 --> 00:48:43,400
Herald: Microphone 2, please.
1166
00:48:43,400 --> 00:48:45,400
Mic2: Thanks again for your talk.
1167
00:48:45,400 --> 00:48:49,649
Umm, hello?
1168
00:48:49,649 --> 00:48:50,999
Okay.
1169
00:48:50,999 --> 00:48:52,960
Umm, I see of course a problem with
1170
00:48:52,960 --> 00:48:54,619
all the black boxes that you describe
1171
00:48:54,619 --> 00:48:57,069
with regard to the crime systems,
1172
00:48:57,069 --> 00:48:59,569
but when we look at the advertising systems
1173
00:48:59,569 --> 00:49:02,169
in many cases they are very networked.
1174
00:49:02,169 --> 00:49:04,160
There are many different systems collaborating
1175
00:49:04,160 --> 00:49:07,109
and exchanging data via open APIs:
1176
00:49:07,109 --> 00:49:08,720
RESTful APIs, and various
1177
00:49:08,720 --> 00:49:11,720
demand-side platforms
and audience-exchange platforms,
1178
00:49:11,720 --> 00:49:12,539
and everything.
1179
00:49:12,539 --> 00:49:15,420
So, can that help to at least
1180
00:49:15,420 --> 00:49:22,160
increase awareness of where targeting and personalization
1181
00:49:22,160 --> 00:49:23,679
might be happening?
1182
00:49:23,679 --> 00:49:26,190
I mean, I'm looking at systems like
1183
00:49:26,190 --> 00:49:29,539
BuiltWith, that surfaces what kind of
1184
00:49:29,539 --> 00:49:31,380
JavaScript libraries are used elsewhere.
1185
00:49:31,380 --> 00:49:32,999
So, is that something that could help
1186
00:49:32,999 --> 00:49:35,670
at least to give a better awareness
1187
00:49:35,670 --> 00:49:38,690
and list all the points where
1188
00:49:38,690 --> 00:49:41,409
you might be targeted...
1189
00:49:41,409 --> 00:49:43,070
Dr. Helsby: So, like, with respect to
1190
00:49:43,070 --> 00:49:46,460
advertising, the fact that
there is behind the scenes
1191
00:49:46,460 --> 00:49:48,450
this like complicated auction process
1192
00:49:48,450 --> 00:49:50,650
that's occurring, just makes things
1193
00:49:50,650 --> 00:49:51,819
a lot more complicated.
1194
00:49:51,819 --> 00:49:54,170
So, for example, I said briefly
1195
00:49:54,170 --> 00:49:57,269
that they found that there's this
statistical difference
1196
00:49:57,269 --> 00:49:59,099
between how men and women are treated,
1197
00:49:59,099 --> 00:50:01,339
but it doesn't necessarily mean that
1198
00:50:01,339 --> 00:50:03,640
"Oh, the algorithm is definitely biased."
1199
00:50:03,640 --> 00:50:06,369
It could be because of this auction process,
1200
00:50:06,369 --> 00:50:10,569
it could be that women are considered
1201
00:50:10,569 --> 00:50:12,630
more valuable when it comes to advertising,
1202
00:50:12,630 --> 00:50:15,099
and so these executive ads are getting
1203
00:50:15,099 --> 00:50:17,160
outbid by some other ads,
1204
00:50:17,160 --> 00:50:18,890
and so there's a lot of potential
1205
00:50:18,890 --> 00:50:20,490
causes for that.
1206
00:50:20,490 --> 00:50:22,829
So, I think it just makes things
a lot more complicated.
1207
00:50:22,829 --> 00:50:25,910
I don't know if it helps
with the bias at all.
1208
00:50:25,910 --> 00:50:27,410
Mic 2: Well, the question was more
1209
00:50:27,410 --> 00:50:30,299
a direction... can it help to surface
1210
00:50:30,299 --> 00:50:32,499
and make people aware of that fact?
1211
00:50:32,499 --> 00:50:34,930
I mean, I can talk to my kids probably,
1212
00:50:34,930 --> 00:50:36,259
and they will probably understand,
1213
00:50:36,259 --> 00:50:38,420
but I can't explain that to my grandma,
1214
00:50:38,420 --> 00:50:43,150
who's also, umm, looking at an iPad.
1215
00:50:43,150 --> 00:50:44,289
Dr. Helsby: So, the fact that
1216
00:50:44,289 --> 00:50:45,690
the systems are...
1217
00:50:45,690 --> 00:50:48,509
I don't know if I understand.
1218
00:50:48,509 --> 00:50:50,529
Mic 2: OK. I think that the main problem
1219
00:50:50,529 --> 00:50:53,710
is that we are behind the industry efforts
1220
00:50:53,710 --> 00:50:57,179
at targeting us, and many people
1221
00:50:57,179 --> 00:51:00,579
do know, but a lot more people don't know,
1222
00:51:00,579 --> 00:51:03,160
and making them aware of the fact
1223
00:51:03,160 --> 00:51:07,269
that they are a target, in a way,
1224
00:51:07,269 --> 00:51:10,990
is something that can only be shown
1225
00:51:10,990 --> 00:51:14,779
by a 3rd party that has that data at its disposal,
1226
00:51:14,779 --> 00:51:16,339
and makes audits in a way--
1227
00:51:16,339 --> 00:51:17,929
maybe in an automated way.
1228
00:51:17,929 --> 00:51:19,170
Dr. Helsby: Right.
1229
00:51:19,170 --> 00:51:21,410
Yeah, I think it certainly
could help with advocacy
1230
00:51:21,410 --> 00:51:23,059
if that's the point, yeah.
1231
00:51:23,059 --> 00:51:26,079
Herald: Another question
from the internet, please.
1232
00:51:26,079 --> 00:51:29,319
Signal Angel: Yes, on IRC they are asking
1233
00:51:29,319 --> 00:51:31,440
if we know that prediction in some cases
1234
00:51:31,440 --> 00:51:34,460
provides an influence that cannot be controlled.
1235
00:51:34,460 --> 00:51:38,480
So, r4v5 would like to know from you
1236
00:51:38,480 --> 00:51:41,519
if there are some cases or areas where
1237
00:51:41,519 --> 00:51:45,060
machine learning simply shouldn't go?
1238
00:51:45,060 --> 00:51:48,349
Dr. Helsby: Umm, so I think...
1239
00:51:48,349 --> 00:51:52,559
I mean, yes, I think that it is the case
1240
00:51:52,559 --> 00:51:54,650
that in some cases machine learning
1241
00:51:54,650 --> 00:51:56,180
might not be appropriate.
1242
00:51:56,180 --> 00:51:58,359
For example, if you use machine learning
1243
00:51:58,359 --> 00:52:00,970
to decide who should be searched.
1244
00:52:00,970 --> 00:52:02,619
I don't think it should be the case that
1245
00:52:02,619 --> 00:52:03,809
machine learning algorithms should
1246
00:52:03,809 --> 00:52:05,440
ever be used to determine
1247
00:52:05,440 --> 00:52:08,430
probable cause, or something like that.
1248
00:52:08,430 --> 00:52:12,339
So, if it's just one piece of evidence
1249
00:52:12,339 --> 00:52:13,299
that you consider,
1250
00:52:13,299 --> 00:52:14,990
and there's human oversight always,
1251
00:52:14,990 --> 00:52:18,519
maybe it's fine, but
1252
00:52:18,519 --> 00:52:20,839
we should be very suspicious and hesitant
1253
00:52:20,839 --> 00:52:22,119
in certain contexts where
1254
00:52:22,119 --> 00:52:24,529
the ramifications are very serious.
1255
00:52:24,529 --> 00:52:27,259
Like the No Fly List, and so on.
1256
00:52:27,259 --> 00:52:29,200
Herald: And #2 again.
1257
00:52:29,200 --> 00:52:30,809
Mic 2: A second question
1258
00:52:30,809 --> 00:52:33,509
that just occurred to me, if you don't mind.
1259
00:52:33,509 --> 00:52:35,339
Umm, until the advent of
1260
00:52:35,339 --> 00:52:36,559
algorithmic systems,
1261
00:52:36,559 --> 00:52:40,470
when there've been cases of serious harm
1262
00:52:40,470 --> 00:52:42,799
that's resulted to individuals or groups,
1263
00:52:42,799 --> 00:52:44,579
and it's been demonstrated that
1264
00:52:44,579 --> 00:52:46,029
it's occurred because of
1265
00:52:46,029 --> 00:52:49,400
an individual or a system of people
1266
00:52:49,400 --> 00:52:53,019
being systematically biased, then often
1267
00:52:53,019 --> 00:52:55,130
one of the actions that's taken is
1268
00:52:55,130 --> 00:52:56,869
pressure's applied, and then
1269
00:52:56,869 --> 00:52:59,660
people are required to change,
1270
00:52:59,660 --> 00:53:01,049
and hopefully be held responsible,
1271
00:53:01,049 --> 00:53:02,910
and then change the way that they do things
1272
00:53:02,910 --> 00:53:06,400
to try to remove bias from that system.
1273
00:53:06,400 --> 00:53:07,839
What's the current thinking about
1274
00:53:07,839 --> 00:53:10,299
how we can go about doing that
1275
00:53:10,299 --> 00:53:12,599
when the systems that are doing that
1276
00:53:12,599 --> 00:53:13,650
are algorithmic?
1277
00:53:13,650 --> 00:53:15,999
Is it just going to be human oversight,
1278
00:53:15,999 --> 00:53:16,910
and humans are gonna have to be
1279
00:53:16,910 --> 00:53:18,379
held responsible for the oversight?
1280
00:53:18,379 --> 00:53:20,890
Dr. Helsby: So, in terms of bias,
1281
00:53:20,890 --> 00:53:22,569
if we're concerned about bias towards
1282
00:53:22,569 --> 00:53:24,019
particular types of people,
1283
00:53:24,019 --> 00:53:25,710
that's something that we can optimize for.
1284
00:53:25,710 --> 00:53:28,839
So, we can train systems that are unbiased
1285
00:53:28,839 --> 00:53:30,019
in this way.
1286
00:53:30,019 --> 00:53:32,109
So that's one way to deal with it.
1287
00:53:32,109 --> 00:53:34,039
But there's always gonna be errors,
1288
00:53:34,039 --> 00:53:35,420
so that's sort of a separate issue
1289
00:53:35,420 --> 00:53:37,509
from the bias, and in the case
1290
00:53:37,509 --> 00:53:39,180
where there are errors,
1291
00:53:39,180 --> 00:53:40,539
there must be oversight.
1292
00:53:40,539 --> 00:53:45,079
So, one way that one could improve
1293
00:53:45,079 --> 00:53:46,410
the way that this is done
1294
00:53:46,410 --> 00:53:48,160
is by making sure that you're
1295
00:53:48,160 --> 00:53:50,799
keeping track of confidence of decisions.
1296
00:53:50,799 --> 00:53:54,039
So, if you have a low confidence prediction,
1297
00:53:54,039 --> 00:53:56,259
then maybe a human
should come in and check things.
1298
00:53:56,259 --> 00:53:58,809
So, that might be one way to proceed.
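[Editor's note: one plausible way to operationalize "low confidence goes to a human" is a simple threshold on the model's predicted probability. The threshold value below is an assumption for illustration, not a recommendation.]

```python
# Minimal sketch of confidence-based routing: act automatically only when the model
# is confident; otherwise queue the case for human review.
def route_decision(prob_positive: float, threshold: float = 0.9) -> str:
    if prob_positive >= threshold:
        return "auto: positive"
    if prob_positive <= 1 - threshold:
        return "auto: negative"
    return "escalate to human reviewer"

for p in (0.97, 0.55, 0.04):
    print(p, "->", route_decision(p))
```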
1299
00:54:02,099 --> 00:54:03,990
Herald: So, there are no more questions.
1300
00:54:03,990 --> 00:54:06,199
I close this talk now,
1301
00:54:06,199 --> 00:54:08,239
and thank you very much
1302
00:54:08,239 --> 00:54:09,410
and a big applause to
1303
00:54:09,410 --> 00:54:11,780
Jennifer Helsby!
1304
00:54:11,780 --> 00:54:16,310
roaring applause
1305
00:54:16,310 --> 00:54:28,000
subtitles created by c3subtitles.de
Join, and help us!