0:00:00.000,0:00:08.895
<i>Music</i>

0:00:08.895,0:00:20.040
Herald: Who of you is using Facebook? Twitter? [br]Diaspora?

0:00:20.040,0:00:27.630
<i>concerned noise</i> And all of that data[br]you enter there

0:00:27.630,0:00:34.240
gets to a server, gets into the hands of somebody[br]who's using it

0:00:34.240,0:00:38.519
and the next talk[br]is especially about that,

0:00:38.519,0:00:43.879
because there's also intelligent machines[br]and intelligent algorithms

0:00:43.879,0:00:47.489
that try to make something[br]out of that data.

0:00:47.489,0:00:50.920
So the post-doc researcher Jennifer Helsby

0:00:50.920,0:00:55.839
of the University of Chicago,[br]who works in this

0:00:55.839,0:00:59.370
intersection between policy and[br]technology,

0:00:59.370,0:01:04.709
will now ask you the question:[br]To whom would we give that power?

0:01:04.709,0:01:12.860
Dr. Helsby: Thanks.[br]<i>applause</i>

0:01:12.860,0:01:17.090
Okay, so, today I'm gonna do a brief tour[br]of intelligent systems

0:01:17.090,0:01:18.640
and how they're currently used

0:01:18.640,0:01:21.760
and then we're gonna look at some examples[br]with respect

0:01:21.760,0:01:23.710
to the properties that we might care about

0:01:23.710,0:01:26.000
these systems having,[br]and I'll talk a little bit about

0:01:26.000,0:01:27.940
some of the work that's been done in academia

0:01:27.940,0:01:28.680
on these topics.

0:01:28.680,0:01:31.780
And then we'll talk about some[br]promising paths forward.

0:01:31.780,0:01:37.040
So, I wanna start with this:[br]Kranzberg's First Law of Technology

0:01:37.040,0:01:40.420
So, it's not good or bad,[br]but it also isn't neutral.

0:01:40.420,0:01:42.980
Technology shapes our world,[br]and it can act as

0:01:42.980,0:01:46.140
a liberating force-- or an oppressive and[br]controlling force.

0:01:46.140,0:01:49.730
So, in this talk, I'm gonna go[br]towards some of the aspects

0:01:49.730,0:01:53.830
of intelligent systems that might be more[br]controlling in nature.

0:01:53.830,0:01:56.060
So, as we all know,

0:01:56.060,0:01:59.770
because of the rapidly decreasing cost[br]of storage and computation,

0:01:59.770,0:02:02.170
along with the rise of new sensor technologies,

0:02:02.170,0:02:05.510
data collection devices[br]are being pushed into every

0:02:05.510,0:02:08.329
aspect of our lives: in our homes, our cars,

0:02:08.329,0:02:10.469
in our pockets, on our wrists.

0:02:10.469,0:02:13.280
And data collection systems act as intermediaries

0:02:13.280,0:02:15.230
for a huge amount of human communication.

0:02:15.230,0:02:17.900
And much of this data sits in government

0:02:17.900,0:02:19.860
and corporate databases.

0:02:19.860,0:02:23.090
So, in order to make use of this data,

0:02:23.090,0:02:27.280
we need to be able to make some inferences.

0:02:27.280,0:02:30.280
So, one way of approaching this is I can hire

0:02:30.280,0:02:32.310
a lot of humans, and I can have these humans

0:02:32.310,0:02:34.990
manually examine the data, and they can acquire

0:02:34.990,0:02:36.900
expert knowledge of the domain, and then

0:02:36.900,0:02:38.510
perhaps they can make some decisions

0:02:38.510,0:02:40.830
or at least some recommendations[br]based on it.

0:02:40.830,0:02:43.030
However, there's some problems with this.

0:02:43.030,0:02:45.810
One is that it's slow, and thus expensive.

0:02:45.810,0:02:48.060
It's also biased. We know that humans have

0:02:48.060,0:02:50.700
all sorts of biases, both conscious and unconscious,

0:02:50.700,0:02:53.390
and it would be nice to have a system[br]that did not have

0:02:53.390,0:02:54.959
these inaccuracies.

0:02:54.959,0:02:57.069
It's also not very transparent: I might

0:02:57.069,0:02:58.910
not really know the factors that led to

0:02:58.910,0:03:00.930
some decisions being made.

0:03:00.930,0:03:03.360
Even humans themselves[br]often don't really understand

0:03:03.360,0:03:05.360
why they came to a given decision, because

0:03:05.360,0:03:08.130
of their being emotional in nature.

0:03:08.130,0:03:11.530
And, thus, these human decision making systems

0:03:11.530,0:03:13.170
are often difficult to audit.

0:03:13.170,0:03:15.819
So, another way to proceed is maybe instead

0:03:15.819,0:03:18.000
I study the system and the data carefully

0:03:18.000,0:03:20.520
and I write down the best rules[br]for making a decision

0:03:20.520,0:03:23.280
or, I can have a machine[br]dynamically figure out

0:03:23.280,0:03:25.459
the best rules, as in machine learning.

0:03:25.459,0:03:28.640
So, maybe this is a better approach.

0:03:28.640,0:03:32.230
It's certainly fast, and thus cheap.

0:03:32.230,0:03:34.290
And maybe I can construct[br]the system in such a way

0:03:34.290,0:03:37.090
that it doesn't have the biases that are inherent

0:03:37.090,0:03:39.209
in human decision making.

0:03:39.209,0:03:41.560
And, since I've written these rules down,

0:03:41.560,0:03:42.819
or a computer has learned these rules,

0:03:42.819,0:03:45.140
then I can just show them to somebody, right?

0:03:45.140,0:03:46.819
And then they can audit it.

0:03:46.819,0:03:49.020
So, more and more decision making is being

0:03:49.020,0:03:50.750
done in this way.

0:03:50.750,0:03:53.170
And so, in this model, we take data,

0:03:53.170,0:03:55.709
we make an inference based on that data

0:03:55.709,0:03:58.120
using these algorithms, and then

0:03:58.120,0:03:59.420
we can take actions.

0:03:59.420,0:04:01.860
And, when we take this more scientific approach

0:04:01.860,0:04:04.200
to making decisions and optimizing for

0:04:04.200,0:04:07.310
a desired outcome,[br]we can take an experimental approach

0:04:07.310,0:04:10.080
so we can determine[br]which actions are most effective

0:04:10.080,0:04:12.310
in achieving a desired outcome.

0:04:12.310,0:04:14.010
Maybe there are some types of communication

0:04:14.010,0:04:16.750
styles that are most effective[br]with certain people.

0:04:16.750,0:04:19.510
I can perhaps deploy some individualized incentives

0:04:19.510,0:04:22.060
to get the outcome that I desire.

0:04:22.060,0:04:25.990
And, maybe even if I carefully design an experiment

0:04:25.990,0:04:27.810
with the environment in which people make

0:04:27.810,0:04:30.699
these decisions, perhaps even very small changes

0:04:30.699,0:04:34.250
can introduce significant changes[br]in people's behavior.

0:04:34.250,0:04:37.320
So, through these mechanisms,[br]and this experimental approach,

0:04:37.320,0:04:39.840
I can maximize the probability[br]that humans do

0:04:39.840,0:04:42.020
what I want.

0:04:42.020,0:04:45.380
So, algorithmic decision making is being used

0:04:45.380,0:04:47.270
in industry, and is used[br]in lots of other areas,

0:04:47.270,0:04:49.530
from astrophysics to medicine, and is now

0:04:49.530,0:04:52.199
moving into new domains, including

0:04:52.199,0:04:53.990
government applications.

0:04:53.990,0:04:58.560
So, we have recommendation engines like[br]Netflix, Yelp, SoundCloud,

0:04:58.560,0:05:00.699
that direct our attention to what we should

0:05:00.699,0:05:03.510
watch and listen to.

0:05:03.510,0:05:07.919
Since 2009, Google uses[br]personalized search results,

0:05:07.919,0:05:12.840
including if you're not logged in[br]to your Google account.

0:05:12.840,0:05:15.389
And we also have algorithmic curation and filtering,

0:05:15.389,0:05:17.530
as in the case of Facebook News Feed,

0:05:17.530,0:05:19.870
Google News, Yahoo News,

0:05:19.870,0:05:22.840
which shows you what news articles, for example,

0:05:22.840,0:05:24.330
you should be looking at.

0:05:24.330,0:05:25.650
And this is important, because a lot of people

0:05:25.650,0:05:29.410
get news from these media.

0:05:29.410,0:05:31.520
We even have algorithmic journalists!

0:05:31.520,0:05:35.240
So, automatic systems generate articles

0:05:35.240,0:05:36.880
about weather, traffic, or sports

0:05:36.880,0:05:38.729
instead of a human.

0:05:38.729,0:05:41.949
And, another application that's more recent

0:05:41.949,0:05:43.570
is the use of predictive systems

0:05:43.570,0:05:45.180
in political campaigns.

0:05:45.180,0:05:47.370
So, political campaigns also now take this

0:05:47.370,0:05:50.340
approach to predict on an individual basis

0:05:50.340,0:05:53.300
which candidate voters[br]are likely to vote for.

0:05:53.300,0:05:55.500
And then they can target,[br]on an individual basis,

0:05:55.500,0:05:58.199
those that can be persuaded otherwise.

0:05:58.199,0:06:00.830
And, finally, in the public sector,

0:06:00.830,0:06:02.710
we're starting to use predictive systems

0:06:02.710,0:06:06.320
in areas from policing, to health,[br]to education and energy.

0:06:06.320,0:06:08.979
So, there are some advantages to this.

0:06:08.979,0:06:12.790
So, one thing is that we can automate

0:06:12.790,0:06:15.759
aspects of our lives[br]that we consider to be mundane

0:06:15.759,0:06:17.620
using systems that are intelligent

0:06:17.620,0:06:19.580
and adaptive enough.

0:06:19.580,0:06:21.680
We can make use of all the data

0:06:21.680,0:06:23.990
and really get the pieces of information we

0:06:23.990,0:06:25.830
really care about.

0:06:25.830,0:06:29.650
We can spend money in the most effective way,

0:06:29.650,0:06:32.110
and we can do this with this experimental

0:06:32.110,0:06:34.210
approach to optimize actions to produce

0:06:34.210,0:06:35.190
desired outcomes.

0:06:35.190,0:06:37.300
So, we can embed intelligence

0:06:37.300,0:06:39.520
into all of these mundane objects

0:06:39.520,0:06:41.180
and enable them to make decisions for us,

0:06:41.180,0:06:42.860
and so that's what we're doing more and more,

0:06:42.860,0:06:45.210
and we can have an object[br]that decides for us

0:06:45.210,0:06:46.840
what temperature we should set our house,

0:06:46.840,0:06:49.009
what we should be doing, etc.

0:06:49.009,0:06:52.400
So, there might be some implications here.

0:06:52.400,0:06:55.680
We want these systems[br]that do work on this data

0:06:55.680,0:06:58.039
to increase the opportunities[br]available to us.

0:06:58.039,0:07:00.259
But it might be that there are some implications

0:07:00.259,0:07:01.780
that we have not carefully thought through.

0:07:01.780,0:07:03.430
This is a new area, and people are only

0:07:03.430,0:07:05.940
starting to scratch the surface of what the

0:07:05.940,0:07:07.289
problems might be.

0:07:07.289,0:07:09.600
In some cases, they might narrow the options

0:07:09.600,0:07:10.990
available to people,

0:07:10.990,0:07:13.199
and this approach subjects people to

0:07:13.199,0:07:15.620
suggestive messaging intended to nudge them

0:07:15.620,0:07:17.169
to a desired outcome.

0:07:17.169,0:07:19.320
Some people may have a problem with that.

0:07:19.320,0:07:20.650
Values we care about are not gonna be

0:07:20.650,0:07:23.860
baked into these systems by default.

0:07:23.860,0:07:25.960
It's also the case that some algorithmic systems

0:07:25.960,0:07:28.300
facilitate work that we do not like.

0:07:28.300,0:07:30.199
For example, in the case of mass surveillance.

0:07:30.199,0:07:32.130
And even the same systems,

0:07:32.130,0:07:34.039
used by different people or organizations,

0:07:34.039,0:07:36.110
have very different consequences.

0:07:36.110,0:07:37.320
For example, if I can predict

0:07:37.320,0:07:40.020
with high accuracy, based on say search queries,

0:07:40.020,0:07:42.050
who's gonna be admitted to a hospital,

0:07:42.050,0:07:43.750
some people would be interested[br]in knowing that.

0:07:43.750,0:07:46.120
You might be interested[br]in having your doctor know that.

0:07:46.120,0:07:47.919
But that same predictive model[br]in the hands of

0:07:47.919,0:07:50.569
an insurance company[br]has a very different implication.

0:07:50.569,0:07:53.389
So, the point here is that these systems

0:07:53.389,0:07:55.860
structure and influence how humans interact

0:07:55.860,0:07:58.360
with each other, how they interact with society,

0:07:58.360,0:07:59.850
and how they interact with government.

0:07:59.850,0:08:03.080
And if they constrain what people can do,

0:08:03.080,0:08:05.069
we should really care about this.

0:08:05.069,0:08:08.270
So now I'm gonna go to[br]sort of an extreme case,

0:08:08.270,0:08:11.930
just as an example, and that's this[br]Chinese Social Credit System.

0:08:11.930,0:08:14.169
And so this is probably one of the more

0:08:14.169,0:08:17.259
ambitious uses of data,

0:08:17.259,0:08:18.880
that is used to rank each citizen

0:08:18.880,0:08:21.190
based on their behavior, in China.

0:08:21.190,0:08:24.210
So right now, there are various pilot systems

0:08:24.210,0:08:27.660
deployed by various companies doing this in[br]China.

0:08:27.660,0:08:30.729
They're currently voluntary, and by 2020

0:08:30.729,0:08:32.630
this system is gonna be decided on,

0:08:32.630,0:08:34.679
or a combination of the systems,

0:08:34.679,0:08:37.409
that is gonna be mandatory for everyone.

0:08:37.409,0:08:40.950
And so, in this system, there are some citizens,

0:08:40.950,0:08:44.380
and a huge range of data sources are used.

0:08:44.380,0:08:46.820
So, some of the data sources are

0:08:46.820,0:08:48.360
your financial data,

0:08:48.360,0:08:50.020
your criminal history,

0:08:50.020,0:08:52.320
how many points you have[br]on your driver's license,

0:08:52.320,0:08:55.360
medical information-- for example,[br]if you take birth control pills,

0:08:55.360,0:08:56.810
that's incorporated.

0:08:56.810,0:08:59.830
Your purchase history-- for example,[br]if you purchase games,

0:08:59.830,0:09:02.430
you are down-ranked in the system.

0:09:02.430,0:09:04.490
Some of the systems, not all of them,

0:09:04.490,0:09:07.260
incorporate social media monitoring,

0:09:07.260,0:09:09.200
which makes sense if you're a state like China,

0:09:09.200,0:09:11.270
you probably want to know about

0:09:11.270,0:09:14.899
political statements that people[br]are saying on social media.

0:09:14.899,0:09:18.020
And, one of the more interesting parts is

0:09:18.020,0:09:22.160
social network analysis:[br]looking at the relationships between people.

0:09:22.160,0:09:24.270
So, if you have a close relationship with[br]somebody

0:09:24.270,0:09:26.180
and they have a low credit score,

0:09:26.180,0:09:29.130
that can have implications on your credit[br]score.

0:09:29.130,0:09:34.440
So, the way that these scores[br]are generated is secret.

0:09:34.440,0:09:38.140
And, according to the call for these systems

0:09:38.140,0:09:39.270
put out by the government,

0:09:39.270,0:09:42.810
the goal is to[br]"carry forward the sincerity and

0:09:42.810,0:09:45.760
traditional virtues" and[br]establish the idea of a

0:09:45.760,0:09:47.520
"sincerity culture."

0:09:47.520,0:09:49.440
But wait, it gets better:

0:09:49.440,0:09:52.450
so, there's a portal that enables citizens

0:09:52.450,0:09:55.040
to look up the citizen score of anyone.

0:09:55.040,0:09:56.520
And many people like this system,

0:09:56.520,0:09:58.320
they think it's a fun game.

0:09:58.320,0:10:00.700
They boast about it on social media,

0:10:00.700,0:10:03.610
they put their score in their dating profile,

0:10:03.610,0:10:04.760
because if you're ranked highly you're

0:10:04.760,0:10:06.589
part of an exclusive club.

0:10:06.589,0:10:10.060
You can get VIP treatment[br]at hotels and other companies.

0:10:10.060,0:10:11.880
But the downside is that, if you're excluded

0:10:11.880,0:10:15.540
from that club, your weak score[br]may have other implications,

0:10:15.540,0:10:20.120
like being unable to get access[br]to credit, housing, jobs.

0:10:20.120,0:10:23.399
There is some reporting that even travel visas

0:10:23.399,0:10:27.000
might be restricted[br]if your score is particularly low.

0:10:27.000,0:10:31.160
So, a system like this, for a state, is really

0:10:31.160,0:10:34.690
the optimal solution[br]to the problem of the public.

0:10:34.690,0:10:37.130
It constitutes a very subtle and insidious

0:10:37.130,0:10:39.350
mechanism of social control.

0:10:39.350,0:10:41.209
You don't need to spend a lot of money on

0:10:41.209,0:10:43.800
police or prisons if you can set up a system

0:10:43.800,0:10:45.820
where people discourage one another from

0:10:45.820,0:10:48.930
anti-social acts like political action[br]in exchange for

0:10:48.930,0:10:51.430
a coupon for a free Uber ride.

0:10:51.430,0:10:55.269
So, there are a lot of[br]legitimate questions here:

0:10:55.269,0:10:58.370
What protections does[br]user data have in this scheme?

0:10:58.370,0:11:01.279
Do any safeguards exist to prevent tampering?

0:11:01.279,0:11:04.310
What mechanism, if any, is there to prevent

0:11:04.310,0:11:08.810
false input data from creating erroneous inferences?

0:11:08.810,0:11:10.420
Is there any way that people can fix

0:11:10.420,0:11:12.540
their score once they're ranked poorly?

0:11:12.540,0:11:13.899
Or does it end up becoming a

0:11:13.899,0:11:15.720
self-fulfilling prophecy?

0:11:15.720,0:11:17.850
Your weak score means you have less access

0:11:17.850,0:11:21.620
to jobs and credit, and now you will have

0:11:21.620,0:11:24.709
limited access to opportunity.

0:11:24.709,0:11:27.110
So, let's take a step back.

0:11:27.110,0:11:28.470
So, what do we want?

0:11:28.470,0:11:31.540
So, we probably don't want that,

0:11:31.540,0:11:33.570
but as advocates we really wanna

0:11:33.570,0:11:36.130
understand what questions we should be asking

0:11:36.130,0:11:37.510
of these systems. Right now there's

0:11:37.510,0:11:39.570
very little oversight,

0:11:39.570,0:11:41.420
and we wanna make sure that we don't

0:11:41.420,0:11:44.029
sort of sleepwalk our way to a situation

0:11:44.029,0:11:46.649
where we've lost even more power

0:11:46.649,0:11:49.740
to these centralized systems of control.

0:11:49.740,0:11:52.209
And if you're an implementer, we wanna understand

0:11:52.209,0:11:53.709
what can we be doing better.

0:11:53.709,0:11:56.019
Are there better ways that we can be implementing

0:11:56.019,0:11:57.640
these systems?

0:11:57.640,0:11:59.430
Are there values that, as humans,

0:11:59.430,0:12:01.060
we care about that we should make sure

0:12:01.060,0:12:02.420
these systems have?

0:12:02.420,0:12:05.550
So, the first thing[br]that most people in the room

0:12:05.550,0:12:07.820
might think about is privacy.

0:12:07.820,0:12:10.510
Which is, of course, of the utmost importance.

0:12:10.510,0:12:12.920
We need privacy, and there is a good discussion

0:12:12.920,0:12:15.680
on the importance of protecting[br]user data where possible.

0:12:15.680,0:12:18.420
So, in this talk, I'm gonna focus[br]on the other aspects of

0:12:18.420,0:12:19.470
algorithmic decision making,

0:12:19.470,0:12:21.190
that I think have got less attention.

0:12:21.190,0:12:25.140
Because it's not just privacy[br]that we need to worry about here.

0:12:25.140,0:12:28.519
We also want systems that are fair and equitable.

0:12:28.519,0:12:30.240
We want transparent systems,

0:12:30.240,0:12:35.110
we don't want opaque decisions[br]to be made about us,

0:12:35.110,0:12:36.510
decisions that might have serious impacts

0:12:36.510,0:12:37.779
on our lives.

0:12:37.779,0:12:40.490
And we need some accountability mechanisms.

0:12:40.490,0:12:41.890
So, for the rest of this talk

0:12:41.890,0:12:43.230
we're gonna go through each one of these things

0:12:43.230,0:12:45.230
and look at some examples.

0:12:45.230,0:12:47.709
So, the first thing is fairness.

0:12:47.709,0:12:50.450
And so, as I said in the beginning,[br]this is one area

0:12:50.450,0:12:52.690
where there might be an advantage

0:12:52.690,0:12:55.079
to making decisions by machine,

0:12:55.079,0:12:56.740
especially in areas where there have

0:12:56.740,0:12:59.410
historically been fairness issues with

0:12:59.410,0:13:02.350
decision making, such as law enforcement.

0:13:02.350,0:13:05.839
So, this is one way that police departments

0:13:05.839,0:13:08.360
use predictive models.

0:13:08.360,0:13:10.540
The idea here is police would like to

0:13:10.540,0:13:13.450
allocate resources in a more effective way,

0:13:13.450,0:13:15.050
and they would also like to enable

0:13:15.050,0:13:16.640
proactive policing.

0:13:16.640,0:13:20.110
So, if you can predict where crimes[br]are going to occur,

0:13:20.110,0:13:22.149
or who is going to commit crimes,

0:13:22.149,0:13:24.870
then you can put cops in those places,

0:13:24.870,0:13:27.769
or perhaps following these people,

0:13:27.769,0:13:29.300
and then the crimes will not occur.

0:13:29.300,0:13:31.370
So, it's sort of the pre-crime approach.

0:13:31.370,0:13:34.649
So, there are a few ways of going about this.

0:13:34.649,0:13:37.920
One way is doing this individual-level prediction.

0:13:37.920,0:13:41.089
So you take each citizen[br]and estimate the risk

0:13:41.089,0:13:43.769
that each citizen will participate,[br]say, in violence

0:13:43.769,0:13:45.279
based on some data.

0:13:45.279,0:13:46.779
And then you can flag those people that are

0:13:46.779,0:13:49.199
considered particularly violent.

0:13:49.199,0:13:51.519
So, this is currently done.

0:13:51.519,0:13:52.589
This is done in the U.S.

0:13:52.589,0:13:56.120
It's done in Chicago,[br]by the Chicago Police Department.

0:13:56.120,0:13:58.350
And they maintain a heat list of individuals

0:13:58.350,0:14:00.790
that are considered most likely to commit,

0:14:00.790,0:14:03.529
or be the victim of, violence.

0:14:03.529,0:14:06.700
And this is done using data[br]that the police maintain.

0:14:06.700,0:14:09.589
So, the features that are used[br]in this predictive model

0:14:09.589,0:14:12.209
include things that are derived from

0:14:12.209,0:14:14.610
individuals' criminal history.

0:14:14.610,0:14:16.810
So, for example, have they been involved in

0:14:16.810,0:14:18.350
gun violence in the past?

0:14:18.350,0:14:21.450
Do they have narcotics arrests? And so on.

0:14:21.450,0:14:22.860
But another thing that's incorporated

0:14:22.860,0:14:25.060
in the Chicago Police Department model is

0:14:25.060,0:14:28.300
information derived from[br]social network analysis.

0:14:28.300,0:14:30.630
So, who you interact with,

0:14:30.630,0:14:32.279
as noted in police data.

0:14:32.279,0:14:34.899
So, for example, your co-arrestees.

0:14:34.899,0:14:36.440
When officers conduct field interviews,

0:14:36.440,0:14:38.240
who are people interacting with?

0:14:38.240,0:14:42.940
And then this is all incorporated[br]into this risk score.

0:14:42.940,0:14:44.639
So another way to proceed,

0:14:44.639,0:14:47.070
which is the method that most companies

0:14:47.070,0:14:49.579
that sell products like this[br]to the police have taken,

0:14:49.579,0:14:51.459
is instead predicting which areas

0:14:51.459,0:14:53.810
are likely to have crimes committed in them.

0:14:53.810,0:14:56.690
So, take my city, I put a grid down,

0:14:56.690,0:14:58.180
and then I use crime statistics

0:14:58.180,0:15:00.430
and maybe some ancillary data sources,

0:15:00.430,0:15:01.790
to determine which areas have

0:15:01.790,0:15:04.709
the highest risk of crimes occurring in them,

0:15:04.709,0:15:06.329
and I can flag those areas and send

0:15:06.329,0:15:08.470
police officers to them.

0:15:08.470,0:15:10.950
So now, let's look at some of the tools

0:15:10.950,0:15:14.010
that are used for this geographic-level prediction.

0:15:14.010,0:15:19.040
So, here are 3 companies that sell these

0:15:19.040,0:15:22.910
geographic-level predictive policing systems.

0:15:22.910,0:15:25.639
So, PredPol has a system that uses

0:15:25.639,0:15:27.200
primarily crime statistics:

0:15:27.200,0:15:30.209
only the time, place, and type of crime

0:15:30.209,0:15:33.040
to predict where crimes will occur.

0:15:33.040,0:15:35.970
HunchLab uses a wider range of data sources

0:15:35.970,0:15:37.260
including, for example, weather

0:15:37.260,0:15:39.720
and then Hitachi is a newer system

0:15:39.720,0:15:42.100
that has a predictive crime analytics tool

0:15:42.100,0:15:44.779
that also incorporates social media.

0:15:44.779,0:15:47.850
The first one, to my knowledge, to do so.

0:15:47.850,0:15:49.399
And these systems are in use

0:15:49.399,0:15:52.820
in 50+ cities in the U.S.

0:15:52.820,0:15:56.540
So, why do police departments buy this?

0:15:56.540,0:15:57.760
Some police departments are interested in

0:15:57.760,0:16:00.500
buying systems like this, because they're marketed

0:16:00.500,0:16:02.660
as impartial systems,

0:16:02.660,0:16:06.199
so it's a way to police in an unbiased way.

0:16:06.199,0:16:08.040
And so, these companies make

0:16:08.040,0:16:08.670
statements like this--

0:16:08.670,0:16:10.800
by the way, the references[br]will all be at the end,

0:16:10.800,0:16:12.560
and they'll be on the slides--

0:16:12.560,0:16:13.370
So, for example

0:16:13.370,0:16:16.110
the predictive crime analytics from Hitachi

0:16:16.110,0:16:17.610
claims that the system is anonymous,

0:16:17.610,0:16:19.350
because it shows you an area,

0:16:19.350,0:16:23.060
it doesn't show you[br]to look for a particular person.

0:16:23.060,0:16:25.699
And PredPol reassures people that

0:16:25.699,0:16:29.560
it eliminates any liberties or profiling concerns.

0:16:29.560,0:16:32.269
And HunchLab notes that the system

0:16:32.269,0:16:35.170
fairly represents priorities for public safety

0:16:35.170,0:16:38.769
and is unbiased by race[br]or ethnicity, for example.

0:16:38.769,0:16:43.529
So, let's take a minute[br]to describe in more detail

0:16:43.529,0:16:48.100
what we mean when we talk about fairness.

0:16:48.100,0:16:51.300
So, when we talk about fairness,

0:16:51.300,0:16:52.740
we mean a few things.

0:16:52.740,0:16:56.070
So, one is fairness with respect to individuals:

0:16:56.070,0:16:58.040
so if I'm very similar to somebody

0:16:58.040,0:17:00.170
and we go through some process

0:17:00.170,0:17:03.430
and there are two very different[br]outcomes to that process

0:17:03.430,0:17:05.679
we would consider that to be unfair.

0:17:05.679,0:17:07.929
So, we want similar people to be treated

0:17:07.929,0:17:09.539
in a similar way.

0:17:09.539,0:17:13.079
But, there are certain protected attributes

0:17:13.079,0:17:15.199
that we wouldn't want someone

0:17:15.199,0:17:17.099
to discriminate based on.

0:17:17.099,0:17:20.069
And so, there's this other property,[br]Group Fairness.

0:17:20.069,0:17:22.249
So, we can look at the statistical parity

0:17:22.249,0:17:25.439
between groups, based on gender, race, etc.

0:17:25.439,0:17:28.049
and see if they're treated in a similar way.

0:17:28.049,0:17:30.409
And we might not expect that in some cases,

0:17:30.409,0:17:32.429
for example if the base rates in each group

0:17:32.429,0:17:34.659
are very different.

0:17:34.659,0:17:36.889
And then there's also Fairness in Errors.

0:17:36.889,0:17:40.080
All predictive systems are gonna make errors,

0:17:40.080,0:17:42.989
and if the errors are concentrated,

0:17:42.989,0:17:46.399
then that may also represent unfairness.

0:17:46.399,0:17:50.149
And so this concern arose recently with Facebook

0:17:50.149,0:17:52.289
because people with Native American names

0:17:52.289,0:17:54.389
had their profiles flagged as fraudulent

0:17:54.389,0:17:58.759
far more often than those[br]with White American names.

0:17:58.759,0:18:00.559
So these are the sorts of things[br]that we worry about

0:18:00.559,0:18:02.190
and each of these are metrics,

0:18:02.190,0:18:04.239
and if you're interested more you should

0:18:04.239,0:18:06.159
check those 2 papers out.

0:18:06.159,0:18:10.639
So, how can potential issues[br]with predictive policing

0:18:10.639,0:18:13.850
have implications for these principles?

0:18:13.850,0:18:18.559
So, one problem is[br]the training data that's used.

0:18:18.559,0:18:21.059
Some of these systems only use crime statistics,

0:18:21.059,0:18:23.600
other systems-- all of them use crime statistics

0:18:23.600,0:18:25.619
in some way.

0:18:25.619,0:18:31.419
So, one problem is that crime databases

0:18:31.419,0:18:34.830
contain only crimes that've been detected.

0:18:34.830,0:18:38.629
Right? So, the police are only gonna detect

0:18:38.629,0:18:41.009
crimes that they know are happening,

0:18:41.009,0:18:44.109
either through patrol and their own investigation

0:18:44.109,0:18:46.320
or because they've been alerted to crime,

0:18:46.320,0:18:48.789
for example by a citizen calling the police.

0:18:48.789,0:18:52.179
So, a citizen has to feel like[br]they <i>can</i> call the police,

0:18:52.179,0:18:54.019
like that's a good idea.

0:18:54.019,0:18:58.789
So, some crimes suffer[br]from this problem less than others:

0:18:58.789,0:19:02.249
for example, gun violence[br]is much easier to detect

0:19:02.249,0:19:03.639
relative to fraud, for example,

0:19:03.639,0:19:07.509
which is very difficult to detect.

0:19:07.509,0:19:11.940
Now the racial profiling aspect[br]of this might come in

0:19:11.940,0:19:15.590
because of biased policing in the past.

0:19:15.590,0:19:19.999
So, for example, for marijuana arrests,

0:19:19.999,0:19:22.619
black people are arrested in the U.S. at rates

0:19:22.619,0:19:25.119
4 times that of white people,

0:19:25.119,0:19:27.960
even though there is statistical parity

0:19:27.960,0:19:31.389
with these 2 groups, to within a few percent.

0:19:31.389,0:19:35.820
So, this is where problems can arise.

0:19:35.820,0:19:37.159
So, let's go back to this

0:19:37.159,0:19:38.749
geographic-level predictive policing.

0:19:38.749,0:19:42.460
So the danger here is that, unless this system

0:19:42.460,0:19:44.299
is very carefully constructed,

0:19:44.299,0:19:47.090
this sort of crime area ranking might

0:19:47.090,0:19:49.019
again become a self-fulfilling prophecy.

0:19:49.019,0:19:51.460
If you send police officers to these areas,

0:19:51.460,0:19:53.220
you further scrutinize them,

0:19:53.220,0:19:55.659
and then again you're only detecting a subset

0:19:55.659,0:19:57.979
of crimes, and the cycle continues.

0:19:57.979,0:20:02.139
So, one obvious issue is that

0:20:02.139,0:20:07.599
this statement about geographic-based[br]crime prediction

0:20:07.599,0:20:10.229
being anonymous is not true,

0:20:10.229,0:20:13.159
because race and location are very strongly

0:20:13.159,0:20:14.840
correlated in the U.S.

0:20:14.840,0:20:16.609
And this is something that machine-learning[br]systems

0:20:16.609,0:20:20.049
can potentially learn.

0:20:20.049,0:20:23.039
Another issue is that, for example,

0:20:23.039,0:20:25.580
for individual fairness, one of my homes

0:20:25.580,0:20:27.599
sits within one of these boxes.

0:20:27.599,0:20:29.950
Some of these boxes[br]in these systems are very small,

0:20:29.950,0:20:33.399
for example PredPol is 500ft x 500ft,

0:20:33.399,0:20:36.349
so it's maybe only a few houses.

0:20:36.349,0:20:39.149
So, the implications of this system are that

0:20:39.149,0:20:40.849
you have police officers maybe sitting

0:20:40.849,0:20:42.979
in a police cruiser outside your home

0:20:42.979,0:20:45.450
and a few doors down someone

0:20:45.450,0:20:46.799
may not be within that box,

0:20:46.799,0:20:48.159
and doesn't have this.

0:20:48.159,0:20:51.399
So, that may represent unfairness.

0:20:51.399,0:20:54.929
So, there are real questions here,

0:20:54.929,0:20:57.720
especially because there's no opt-out.

0:20:57.720,0:21:00.059
There's no way to opt out of this system:

0:21:00.059,0:21:02.239
if you live in a city that has this,

0:21:02.239,0:21:04.909
then you have to deal with it.

0:21:04.909,0:21:07.229
So, it's quite difficult to find out

0:21:07.229,0:21:09.879
what's really going on

0:21:09.879,0:21:11.169
because the algorithm is secret.

0:21:11.169,0:21:13.049
And, in most cases, we don't know

0:21:13.049,0:21:14.789
the full details of the inputs.

0:21:14.789,0:21:16.679
We have some idea[br]about what features are used,

0:21:16.679,0:21:17.970
but that's about it.

0:21:17.970,0:21:19.509
We also don't know the output.

0:21:19.509,0:21:21.899
That would be knowing police allocation,

0:21:21.899,0:21:23.179
police strategies,

0:21:23.179,0:21:26.299
and in order to nail down[br]what's really going on here

0:21:26.299,0:21:28.609
and to verify the validity of

0:21:28.609,0:21:30.009
these companies' claims,

0:21:30.009,0:21:33.799
it may be necessary[br]to have a 3rd party come in,

0:21:33.799,0:21:35.629
examine the inputs and outputs of the system,

0:21:35.629,0:21:37.590
and say concretely what's going on.

0:21:37.590,0:21:39.460
And if everything is fine and dandy

0:21:39.460,0:21:40.929
then this shouldn't be a problem.

0:21:40.929,0:21:43.619
So, that's potentially one role that

0:21:43.619,0:21:44.769
advocates can play.

0:21:44.769,0:21:46.720
Maybe we should start pushing for audits

0:21:46.720,0:21:48.820
of systems that are used in this way.

0:21:48.820,0:21:50.970
These could have serious implications

0:21:50.970,0:21:52.679
for people's lives.

0:21:52.679,0:21:55.249
So, we'll return[br]to this idea a little bit later,

0:21:55.249,0:21:58.210
but for now this leads us[br]nicely to Transparency.

0:21:58.210,0:21:59.419
So, we wanna know

0:21:59.419,0:22:01.929
what these systems are doing.

0:22:01.929,0:22:04.729
But it's very hard,[br]for the reasons described earlier,

0:22:04.729,0:22:06.139
but even in the case of something like

0:22:06.139,0:22:09.849
trying to understand Google's search algorithm,

0:22:09.849,0:22:11.679
it's difficult because it's personalized.

0:22:11.679,0:22:13.529
So, by construction, each user is

0:22:13.529,0:22:15.320
only seeing one endpoint.

0:22:15.320,0:22:18.169
So, it's a very isolating system.

0:22:18.169,0:22:20.349
What do other people see?

0:22:20.349,0:22:22.409
And one reason it's difficult to make

0:22:22.409,0:22:24.099
some of these systems transparent

0:22:24.099,0:22:26.679
is because of, simply, the complexity

0:22:26.679,0:22:27.950
of the algorithms.

0:22:27.950,0:22:30.309
So, an algorithm can become so complex that

0:22:30.309,0:22:31.669
it's difficult to comprehend,

0:22:31.669,0:22:33.289
even for the designer of the system,

0:22:33.289,0:22:35.509
or the implementer of the system.

0:22:35.509,0:22:38.419
The designer might know that this algorithm

0:22:38.419,0:22:42.889
maximizes some metric-- say, accuracy,

0:22:42.889,0:22:44.570
but they may not always have a solid

0:22:44.570,0:22:46.779
understanding of what the algorithm is doing

0:22:46.779,0:22:48.330
for all inputs.

0:22:48.330,0:22:50.970
Certainly with respect to fairness.

0:22:50.970,0:22:55.759
So, in some cases,[br]it might not be appropriate to use

0:22:55.759,0:22:57.379
an extremely complex model.

0:22:57.379,0:22:59.529
It might be better to use a simpler system

0:22:59.529,0:23:02.910
with human-interpretable features.

0:23:02.910,0:23:04.749
Another issue that arises

0:23:04.749,0:23:07.559
from the opacity of these systems

0:23:07.559,0:23:09.409
and the centralized control

0:23:09.409,0:23:11.860
is that it makes them very influential.

0:23:11.860,0:23:13.950
And thus, an excellent target

0:23:13.950,0:23:16.210
for manipulation or tampering.

0:23:16.210,0:23:18.479
So, this might be tampering that is done

0:23:18.479,0:23:21.950
from an organization that controls the system,

0:23:21.950,0:23:23.769
or an insider at one of the organizations,

0:23:23.769,0:23:27.139
or anyone who's able to compromise their security.

0:23:27.139,0:23:30.249
So, this is an interesting academic work

0:23:30.249,0:23:32.099
that looked at the possibility of

0:23:32.099,0:23:34.159
slightly modifying search rankings

0:23:34.159,0:23:36.619
to shift people's political views.

0:23:36.619,0:23:39.009
So, since people are most likely to

0:23:39.009,0:23:41.330
click on the top search results,

0:23:41.330,0:23:44.429
so 90% of clicks go to the[br]first page of search results,

0:23:44.429,0:23:46.719
then perhaps by reshuffling[br]things a little bit,

0:23:46.719,0:23:48.729
or maybe dropping some search results,

0:23:48.729,0:23:50.269
you can influence people's views

0:23:50.269,0:23:51.679
in a coherent way,

0:23:51.679,0:23:53.090
and maybe you can make it so subtle

0:23:53.090,0:23:55.749
that no one is able to notice.

0:23:55.749,0:23:57.249
So in this academic study,

0:23:57.249,0:24:00.349
they did an experiment

0:24:00.349,0:24:02.070
in the 2014 Indian election.

0:24:02.070,0:24:04.219
So they used real voters,

0:24:04.219,0:24:06.450
and they kept the size[br]of the experiment small enough

0:24:06.450,0:24:08.190
that it was not going to influence the outcome

0:24:08.190,0:24:10.090
of the election.

0:24:10.090,0:24:12.139
So the researchers took people,

0:24:12.139,0:24:14.229
they determined their political leaning,

0:24:14.229,0:24:17.429
and they segmented them into[br]control and treatment groups,

0:24:17.429,0:24:19.269
where the treatment was manipulation

0:24:19.269,0:24:21.210
of the search ranking results.

0:24:21.210,0:24:24.409
And then they had these people[br]browse the web.

0:24:24.409,0:24:25.969
And what they found is that

0:24:25.969,0:24:28.229
this mechanism is very effective at shifting

0:24:28.229,0:24:30.429
people's voter preferences.

0:24:30.429,0:24:33.649
So, in this study, they were able to introduce

0:24:33.649,0:24:36.849
a 20% shift in voter preferences.

0:24:36.849,0:24:39.299
Even alerting users to the fact that this

0:24:39.299,0:24:41.729
was going to be done, telling them

0:24:41.729,0:24:44.049
"we are going to manipulate your search results,"

0:24:44.049,0:24:45.729
"really pay attention,"

0:24:45.729,0:24:49.099
they were totally unable to decrease

0:24:49.099,0:24:50.859
the magnitude of the effect.

0:24:50.859,0:24:55.109
So, the margins of error in many elections

0:24:55.109,0:24:57.669
are incredibly small,

0:24:57.669,0:24:59.929
and the authors estimate that this shift

0:24:59.929,0:25:02.009
could change the outcome of about

0:25:02.009,0:25:07.109
25% of elections worldwide, if this were done.

0:25:07.109,0:25:10.919
And the bias is so small that no one can tell.

0:25:10.919,0:25:14.279
So, all humans, no matter how smart

0:25:14.279,0:25:17.109
and resistant to manipulation[br]we think we are,

0:25:17.109,0:25:21.909
all of us are subject to this sort of manipulation,

0:25:21.909,0:25:24.320
and we really can't tell.

0:25:24.320,0:25:27.129
So, I'm not saying that this is occurring,

0:25:27.129,0:25:31.389
but right now there is no[br]regulation to stop this,

0:25:31.389,0:25:34.409
there is no way we could reliably detect this,

0:25:34.409,0:25:37.210
so there's a huge amount of power here.

0:25:37.210,0:25:39.779
So, something to think about.

0:25:39.779,0:25:42.710
But it's not only corporations that are interested

0:25:42.710,0:25:47.269
in this sort of behavioral manipulation.

0:25:47.269,0:25:51.119
In 2010, UK Prime Minister David Cameron

0:25:51.119,0:25:54.969
created this UK Behavioural Insights Team,

0:25:54.969,0:25:57.269
which is informally called the Nudge Unit.

0:25:57.269,0:26:01.489
And so what they do is[br]they use behavioral science

0:26:01.489,0:26:04.769
and this predictive analytics approach,

0:26:04.769,0:26:06.119
with experimentation,

0:26:06.119,0:26:07.940
to have people make better decisions

0:26:07.940,0:26:09.690
for themselves and society--

0:26:09.690,0:26:11.989
as determined by the UK government.

0:26:11.989,0:26:14.269
And as of a few months ago,

0:26:14.269,0:26:16.849
after an executive order signed by Obama

0:26:16.849,0:26:19.349
in September, the United States now has

0:26:19.349,0:26:21.429
its own Nudge Unit.

0:26:21.429,0:26:24.009
So, to be clear, I don't think that this is

0:26:24.009,0:26:25.539
some sort of malicious plot.

0:26:25.539,0:26:27.440
I think that there <i>can</i> be huge value

0:26:27.440,0:26:29.489
in these sorts of initiatives,

0:26:29.489,0:26:31.330
positively impacting people's lives,

0:26:31.330,0:26:34.179
but when this sort of behavioral manipulation

0:26:34.179,0:26:37.289
is being done, in part openly,

0:26:37.289,0:26:39.460
oversight is pretty important,

0:26:39.460,0:26:41.700
and we really need to consider

0:26:41.700,0:26:46.090
what these systems are optimizing for.

0:26:46.090,0:26:47.849
And that's something that we might

0:26:47.849,0:26:52.090
not always know, or at least understand,

0:26:52.090,0:26:54.450
so for example, for industry,

0:26:54.450,0:26:57.679
we do have a pretty good understanding there:

0:26:57.679,0:26:59.809
industry cares about optimizing for

0:26:59.809,0:27:01.960
the time spent on the website,

0:27:01.960,0:27:04.929
Facebook wants you to spend more time on Facebook,

0:27:04.929,0:27:06.950
they want you to click on ads,

0:27:06.950,0:27:09.109
click on newsfeed items,

0:27:09.109,0:27:11.299
they want you to like things.

0:27:11.299,0:27:14.309
And, fundamentally: profit.

0:27:14.309,0:27:17.599
So, already this has some serious implications,

0:27:17.599,0:27:19.690
and this had pretty serious implications

0:27:19.690,0:27:22.190
in the last 10 years, in media for example.

0:27:22.190,0:27:25.119
The optimizing for click-through rate in journalism

0:27:25.119,0:27:26.629
has produced a race to the bottom

0:27:26.629,0:27:28.039
in terms of quality.

0:27:28.039,0:27:30.919
And another issue is that optimizing

0:27:30.919,0:27:34.589
for what people like might not always be

0:27:34.589,0:27:35.839
the best approach.

0:27:35.839,0:27:38.859
So, Facebook officials have spoken publicly

0:27:38.859,0:27:41.279
about how Facebook's goal is to make you happy,

0:27:41.279,0:27:43.149
they want you to open that newsfeed

0:27:43.149,0:27:45.080
and just feel great.

0:27:45.080,0:27:47.379
But, there's an issue there, right?

0:27:47.379,0:27:50.169
Because people get their news,

0:27:50.169,0:27:52.369
like 40% of people according to Pew Research,

0:27:52.369,0:27:54.599
get their news from Facebook.

0:27:54.599,0:27:58.460
So, if people don't want to see

0:27:58.460,0:28:01.239
war and corpses,[br]because it makes them feel sad,

0:28:01.239,0:28:04.179
then this is not a system that is gonna optimize

0:28:04.179,0:28:07.149
for an informed population.

0:28:07.149,0:28:09.359
It's not gonna produce a population that is

0:28:09.359,0:28:11.469
ready to engage in civic life.

0:28:11.469,0:28:13.059
It's gonna produce an amused population

0:28:13.059,0:28:16.809
whose time is occupied by cat pictures.

0:28:16.809,0:28:19.159
So, in politics, we have a similar

0:28:19.159,0:28:21.269
optimization problem that's occurring.

0:28:21.269,0:28:23.769
So, these political campaigns that use

0:28:23.769,0:28:26.769
these predictive systems,

0:28:26.769,0:28:28.669
are optimizing for votes for the desired candidate,

0:28:28.669,0:28:30.200
of course.

0:28:30.200,0:28:33.499
So, instead of a political campaign being

0:28:33.499,0:28:36.139
--well, maybe this is a naive view, but--

0:28:36.139,0:28:38.070
being an open discussion of the issues

0:28:38.070,0:28:39.830
facing the country,

0:28:39.830,0:28:43.200
it becomes this micro-targeted[br]persuasion game,

0:28:43.200,0:28:44.669
and the people that get targeted

0:28:44.669,0:28:47.349
are a very small subset of all people,

0:28:47.349,0:28:49.399
and it's only gonna be people that are

0:28:49.399,0:28:51.409
you know, on the edge, maybe disinterested,

0:28:51.409,0:28:54.399
those are the people that are gonna get attention

0:28:54.399,0:28:58.839
from political candidates.

0:28:58.839,0:29:01.869
In policy, as with these Nudge Units,

0:29:01.869,0:29:03.539
they're being used to enable

0:29:03.539,0:29:06.109
better use of government services.

0:29:06.109,0:29:07.419
There are some good projects that have

0:29:07.419,0:29:09.419
come out of this:

0:29:09.419,0:29:11.409
increasing voter registration,

0:29:11.409,0:29:12.739
improving health outcomes,

0:29:12.739,0:29:14.419
improving education outcomes.

0:29:14.419,0:29:16.419
But some of these predictive systems

0:29:16.419,0:29:18.229
that we're starting to see in government

0:29:18.229,0:29:20.700
are optimizing for compliance,

0:29:20.700,0:29:23.669
as is the case with predictive policing.

0:29:23.669,0:29:25.460
So this is something that we need to

0:29:25.460,0:29:28.649
watch carefully.

0:29:28.649,0:29:30.119
I think this is a nice quote that

0:29:30.119,0:29:33.339
sort of describes the problem.

0:29:33.339,0:29:35.200
In some ways we might be narrowing

0:29:35.200,0:29:38.259
our horizon, and the danger is that

0:29:38.259,0:29:41.989
these tools are separating people.

0:29:41.989,0:29:43.570
And this is particularly bad

0:29:43.570,0:29:45.940
for political action, because political action

0:29:45.940,0:29:49.879
requires people to have shared experience,

0:29:49.879,0:29:53.799
and thus be able to collectively act

0:29:53.799,0:29:57.629
to exert pressure to fix problems.

0:29:57.629,0:30:00.810
So, finally: accountability.

0:30:00.810,0:30:03.399
So, we need some oversight mechanisms.

0:30:03.399,0:30:06.519
For example, in the case of errors--

0:30:06.519,0:30:08.219
so this is particularly important for

0:30:08.219,0:30:10.849
civil or bureaucratic systems.

0:30:10.849,0:30:14.330
So, when an algorithm produces some decision,

0:30:14.330,0:30:16.549
we don't always want humans to just

0:30:16.549,0:30:18.039
defer to the machine,

0:30:18.039,0:30:21.859
and that might represent one of the problems.

0:30:21.859,0:30:25.419
So, there are starting to be some cases

0:30:25.419,0:30:28.039
of computer algorithms yielding a decision,

0:30:28.039,0:30:30.409
and then humans being unable to correct

0:30:30.409,0:30:31.799
an obvious error.

0:30:31.799,0:30:35.190
So there's this case in Georgia,[br]in the United States,

0:30:35.190,0:30:37.259
where 2 young people went to

0:30:37.259,0:30:38.529
the Department of Motor Vehicles,

0:30:38.529,0:30:39.749
they're twins, and they went

0:30:39.749,0:30:42.099
to get their driver's license.

0:30:42.099,0:30:44.979
However, they were both flagged by

0:30:44.979,0:30:47.489
a fraud algorithm that uses facial recognition

0:30:47.489,0:30:48.809
to look for similar faces,

0:30:48.809,0:30:50.919
and I guess the people that designed the system

0:30:50.919,0:30:54.549
didn't think of the possibility of twins.

0:30:54.549,0:30:58.489
Yeah.[br]So, they just left

0:30:58.489,0:30:59.889
without their driver's licenses.

0:30:59.889,0:31:01.889
The people in the Department of Motor Vehicles

0:31:01.889,0:31:03.809
were unable to correct this.

0:31:03.809,0:31:06.820
So, this is one implication--

0:31:06.820,0:31:08.579
it's like something out of Kafka.

0:31:08.579,0:31:11.529
But there are also cases of errors being made,

0:31:11.529,0:31:13.879
and people not noticing until

0:31:13.879,0:31:15.909
after actions have been taken,

0:31:15.909,0:31:17.570
some of them very serious--

0:31:17.570,0:31:19.129
because people simply deferred

0:31:19.129,0:31:20.619
to the machine.

0:31:20.619,0:31:23.309
So, this is an example from San Francisco.

0:31:23.309,0:31:26.679
So, an ALPR-- an Automated License Plate Reader--

0:31:26.679,0:31:29.429
is a device that uses image recognition

0:31:29.429,0:31:32.099
to detect and read license plates,

0:31:32.099,0:31:34.339
and usually to compare license plates

0:31:34.339,0:31:37.159
with a known list of plates of interest.

0:31:37.159,0:31:39.799
And, so, San Francisco uses these

0:31:39.799,0:31:42.179
and they're mounted on police cars.

0:31:42.179,0:31:46.659
So, in this case, a San Francisco ALPR

0:31:46.659,0:31:48.879
got a hit on a car,

0:31:48.879,0:31:53.029
and it was the car of a 47-year-old woman,

0:31:53.029,0:31:54.839
with no criminal history.

0:31:54.839,0:31:56.029
And so it was a false hit

0:31:56.029,0:31:58.099
because it was a blurry image,

0:31:58.099,0:31:59.709
and it matched erroneously with

0:31:59.709,0:32:00.909
one of the plates of interest

0:32:00.909,0:32:03.479
that happened to be a stolen vehicle.

0:32:03.479,0:32:06.869
So, they conducted a traffic stop on her,

0:32:06.869,0:32:09.330
and they take her out of the vehicle,

0:32:09.330,0:32:11.049
they search her and the vehicle,

0:32:11.049,0:32:12.659
she gets a pat-down,

0:32:12.659,0:32:14.849
and they have her kneel

0:32:14.849,0:32:17.780
at gunpoint, in the street.

0:32:17.780,0:32:20.989
So, how much oversight should be present

0:32:20.989,0:32:23.999
depends on the implications of the system.

0:32:23.999,0:32:25.279
It's certainly the case that

0:32:25.279,0:32:26.910
for some of these decision-making systems,

0:32:26.910,0:32:29.219
an error might not be that important,

0:32:29.219,0:32:31.149
it could be relatively harmless,

0:32:31.149,0:32:33.559
but in this case,[br]an error in this algorithmic decision

0:32:33.559,0:32:36.259
led to this totally innocent person

0:32:36.259,0:32:40.019
literally having a gun pointed at her.

0:32:40.019,0:32:44.019
So, that brings us to: we need some way of

0:32:44.019,0:32:45.419
getting some information about

0:32:45.419,0:32:47.249
what is going on here.

0:32:47.249,0:32:50.179
We don't wanna have to wait for these events

0:32:50.179,0:32:52.580
before we are able to determine

0:32:52.580,0:32:54.409
some information about the system.

0:32:54.409,0:32:56.139
So, auditing is one option:

0:32:56.139,0:32:58.109
to independently verify the statements

0:32:58.109,0:33:00.809
of companies, in situations where we have

0:33:00.809,0:33:02.939
inputs and outputs.

0:33:02.939,0:33:05.200
So, for example, this could be done with

0:33:05.200,0:33:07.489
Google, Facebook.

0:33:07.489,0:33:09.190
If you have the inputs of a system,

0:33:09.190,0:33:10.649
say you have test accounts,

0:33:10.649,0:33:11.729
or real accounts,

0:33:11.729,0:33:14.359
maybe you can collect[br]people's information together.

0:33:14.359,0:33:15.830
So that was something that was done

0:33:15.830,0:33:18.759
during the 2012 Obama campaign

0:33:18.759,0:33:20.249
by ProPublica.

0:33:20.249,0:33:21.269
People noticed that they were getting

0:33:21.269,0:33:24.739
different emails from the Obama campaign,

0:33:24.739,0:33:26.009
and were interested to see

0:33:26.009,0:33:28.209
based on what factors

0:33:28.209,0:33:29.749
the emails were changing.

0:33:29.749,0:33:32.659
So, I think about 200 people submitted emails

0:33:32.659,0:33:34.940
and they were able to determine some information

0:33:34.940,0:33:38.809
about what the emails[br]were being varied based on.

0:33:38.809,0:33:40.859
So there have been some successful

0:33:40.859,0:33:43.080
attempts at this.

0:33:43.080,0:33:45.919
So, compare inputs and then look at

0:33:45.919,0:33:48.709
why one item was shown to one user

0:33:48.709,0:33:50.289
and not another, and see if there are

0:33:50.289,0:33:51.879
any statistical differences.

0:33:51.879,0:33:56.279
So, there are some potential legal issues

0:33:56.279,0:33:57.749
with the test accounts, so that's something

0:33:57.749,0:34:01.499
to think about-- I'm not a lawyer.

0:34:01.499,0:34:03.919
So, for example, if you wanna examine

0:34:03.919,0:34:06.269
ad-targeting algorithms,

0:34:06.269,0:34:07.969
one way to proceed is to construct

0:34:07.969,0:34:10.589
a browsing profile, and then examine

0:34:10.589,0:34:12.989
what ads are served back to you.

0:34:12.989,0:34:14.119
And so this is something that

0:34:14.119,0:34:16.250
academic researchers have looked at,

0:34:16.250,0:34:17.489
because, at the time at least,

0:34:17.489,0:34:20.879
you didn't need to make an account to do this.

0:34:20.879,0:34:24.768
So, this was a study that was presented at

0:34:24.768,0:34:27.799
Privacy Enhancing Technologies last year,

0:34:27.799,0:34:31.149
and in this study, the researchers

0:34:31.149,0:34:33.179
generate some browsing profiles

0:34:33.179,0:34:35.909
that differ only by one characteristic,

0:34:35.909,0:34:37.690
so they're basically identical in every way

0:34:37.690,0:34:39.049
except for one thing.

0:34:39.049,0:34:42.359
And that is denoted by Treatment 1 and 2.

0:34:42.359,0:34:44.460
So this is a randomized, controlled trial,

0:34:44.460,0:34:46.389
but I left out the randomization part

0:34:46.389,0:34:48.220
for simplicity.

0:34:48.220,0:34:54.799
So, in one study,[br]they applied a treatment of gender.

0:34:54.799,0:34:56.799
So, they had the browsing profiles

0:34:56.799,0:34:59.319
in Treatment 1 be male browsing profiles,

0:34:59.319,0:35:02.029
and the browsing profiles in Treatment 2[br]be female.

0:35:02.029,0:35:04.430
And they wanted to see: is there any difference

0:35:04.430,0:35:06.079
in the way that ads are targeted

0:35:06.079,0:35:08.710
if browsing profiles are effectively identical

0:35:08.710,0:35:11.019
except for gender?

0:35:11.019,0:35:14.710
So, it turns out that there <i>was</i>.

0:35:14.710,0:35:19.180
So, a 3rd-party site was showing Google ads

0:35:19.180,0:35:21.289
for senior executive positions

0:35:21.289,0:35:23.980
at a rate 6 times higher to the fake men

0:35:23.980,0:35:27.059
than for the fake women in this study.

0:35:27.059,0:35:30.109
So, this sort of auditing is not going to

0:35:30.109,0:35:32.779
be able to determine everything

0:35:32.779,0:35:34.930
that algorithms are doing, but it can

0:35:34.930,0:35:36.519
sometimes uncover interesting,

0:35:36.519,0:35:40.900
at least statistical differences.

0:35:40.900,0:35:47.099
So, this leads us to the fundamental issue:

0:35:47.099,0:35:49.180
Right now, we're really not in control

0:35:49.180,0:35:50.510
of some of these systems,

0:35:50.510,0:35:54.480
and we really need these predictive systems

0:35:54.480,0:35:56.119
to be controlled by us,

0:35:56.119,0:35:57.819
in order for them not to be used

0:35:57.819,0:36:00.109
as a system of control.

0:36:00.109,0:36:03.220
So there are some technologies that I'd like

0:36:03.220,0:36:06.890
to point you all to.

0:36:06.890,0:36:08.319
We need tools in the digital commons

0:36:08.319,0:36:11.160
that can help address some of these concerns.

0:36:11.160,0:36:13.349
So, the first thing is that of course

0:36:13.349,0:36:14.730
we know that minimizing the amount of

0:36:14.730,0:36:17.069
data available can help in some contexts,

0:36:17.069,0:36:18.980
which we can do by making systems

0:36:18.980,0:36:22.779
that are private by design, and by default.

0:36:22.779,0:36:24.549
Another thing is that these audit tools

0:36:24.549,0:36:25.890
might be useful.

0:36:25.890,0:36:30.720
And, so, there are these 2 nice examples in academia...

0:36:30.720,0:36:34.359
the ad experiment that I just showed was done

0:36:34.359,0:36:36.120
using AdFisher.

0:36:36.120,0:36:38.200
So, these are 2 toolkits that you can use

0:36:38.200,0:36:41.440
to start doing this sort of auditing.

0:36:41.440,0:36:44.579
Another technology that is generally useful,

0:36:44.579,0:36:46.700
but particularly in the case of prediction

0:36:46.700,0:36:48.789
it's useful to maintain access to

0:36:48.789,0:36:50.289
as many sites as possible,

0:36:50.289,0:36:52.589
through anonymity systems like Tor,

0:36:52.589,0:36:54.319
because it's impossible to personalize

0:36:54.319,0:36:55.650
when everyone looks the same.

0:36:55.650,0:36:59.130
So this is a very important technology.

0:36:59.130,0:37:01.519
Something that doesn't really exist,

0:37:01.519,0:37:03.630
but that I think is pretty important,

0:37:03.630,0:37:05.829
is having some tool to view the landscape.

0:37:05.829,0:37:08.160
So, as we know from these few studies

0:37:08.160,0:37:10.440
that have been done,

0:37:10.440,0:37:12.059
different people are not seeing the internet

0:37:12.059,0:37:12.950
in the same way.

0:37:12.950,0:37:15.730
This is one reason why we don't like censorship.

0:37:15.730,0:37:17.880
But, rich and poor people,

0:37:17.880,0:37:19.659
from academic research we know that

0:37:19.659,0:37:23.790
there is widespread price discrimination[br]on the internet,

0:37:23.790,0:37:25.650
so rich and poor people see a different view

0:37:25.650,0:37:26.970
of the Internet,

0:37:26.970,0:37:28.400
men and women see a different view

0:37:28.400,0:37:29.940
of the Internet.

0:37:29.940,0:37:31.200
We wanna know how different people

0:37:31.200,0:37:32.450
see the same site,

0:37:32.450,0:37:34.329
and this could be the beginning of

0:37:34.329,0:37:36.329
a defense system for this sort of

0:37:36.329,0:37:41.730
manipulation/tampering that I showed earlier.

0:37:41.730,0:37:45.549
Another interesting approach is obfuscation:

0:37:45.549,0:37:46.980
injecting noise into the system.

0:37:46.980,0:37:49.190
So there's an interesting browser extension

0:37:49.190,0:37:51.720
called AdNauseam, that's for Firefox,

0:37:51.720,0:37:54.579
which clicks on every single ad you're served,

0:37:54.579,0:37:55.680
to inject noise.

0:37:55.680,0:37:57.019
So that's, I think, an interesting approach

0:37:57.019,0:38:00.170
that people haven't looked at too much.

0:38:00.170,0:38:03.780
So in terms of policy,

0:38:03.780,0:38:06.530
Facebook and Google, these internet giants,

0:38:06.530,0:38:08.829
have billions of users,

0:38:08.829,0:38:12.220
and sometimes they like to call themselves

0:38:12.220,0:38:13.769
new public utilities,

0:38:13.769,0:38:15.000
and if that's the case then

0:38:15.000,0:38:17.549
it might be necessary to subject them

0:38:17.549,0:38:20.539
to additional regulation.

0:38:20.539,0:38:21.990
Another problem that's come up,

0:38:21.990,0:38:23.539
for example with some of the studies

0:38:23.539,0:38:24.900
that Facebook has done,

0:38:24.900,0:38:29.039
is sometimes a lack of ethics review.

0:38:29.039,0:38:31.059
So, for example, in academia,

0:38:31.059,0:38:33.859
if you're gonna do research involving humans,

0:38:33.859,0:38:35.390
there's an Institutional Review Board

0:38:35.390,0:38:36.970
that you go to that verifies that

0:38:36.970,0:38:39.140
you're doing things in an ethical manner.

0:38:39.140,0:38:40.910
And some companies do have internal

0:38:40.910,0:38:43.029
review processes like this, but it might

0:38:43.029,0:38:45.119
be important to have an independent

0:38:45.119,0:38:48.200
ethics board that does this sort of thing.

0:38:48.200,0:38:50.849
And we <i>really</i> need 3rd-party auditing.

0:38:50.849,0:38:54.519
So, for example, some companies

0:38:54.519,0:38:56.220
don't want auditing to be done

0:38:56.220,0:38:59.190
because of IP concerns,

0:38:59.190,0:39:00.579
and if that's the concern

0:39:00.579,0:39:03.180
maybe having a set of people

0:39:03.180,0:39:05.680
that are not paid by the company

0:39:05.680,0:39:07.200
to check how some of these systems

0:39:07.200,0:39:08.640
are being implemented,

0:39:08.640,0:39:11.240
could help give us confidence that

0:39:11.240,0:39:16.979
things are being done in a reasonable way.

0:39:16.979,0:39:20.269
So, in closing,

0:39:20.269,0:39:23.180
algorithmic decision making is here,

0:39:23.180,0:39:26.140
and it's barreling forward[br]at a very fast rate,

0:39:26.140,0:39:27.890
and we need to figure out what

0:39:27.890,0:39:30.410
the guide rails should be,

0:39:30.410,0:39:31.380
and how to install them

0:39:31.380,0:39:33.119
to handle some of the potential threats.

0:39:33.119,0:39:35.470
There's a huge amount of power here.

0:39:35.470,0:39:37.910
We need more openness in these systems.

0:39:37.910,0:39:39.589
And, right now,

0:39:39.589,0:39:41.559
with the intelligent systems that do exist,

0:39:41.559,0:39:43.920
we don't know what's occurring really,

0:39:43.920,0:39:46.510
and we need to watch carefully

0:39:46.510,0:39:49.099
where and how these systems are being used.

0:39:49.099,0:39:50.690
And I think this community has

0:39:50.690,0:39:53.940
an important role to play in this fight,

0:39:53.940,0:39:55.730
to study what's being done,

0:39:55.730,0:39:57.160
to show people what's being done,

0:39:57.160,0:39:58.670
to raise the debate and advocate,

0:39:58.670,0:40:01.200
and, where necessary, to resist.

0:40:01.200,0:40:03.339
Thanks.

0:40:03.339,0:40:13.129
<i>applause</i>

0:40:13.129,0:40:17.519
Herald: So, let's have a question and answer.

0:40:17.519,0:40:19.080
Microphone 2, please.

0:40:19.080,0:40:20.199
Mic 2: Hi there.

0:40:20.199,0:40:23.259
Thanks for the talk.

0:40:23.259,0:40:26.230
Since this pre-crime software has also

0:40:26.230,0:40:27.359
arrived here in Germany

0:40:27.359,0:40:29.680
with the start of the so-called Precobs system

0:40:29.680,0:40:32.779
in southern Germany,[br]in Bavaria and Nuremberg especially,

0:40:32.779,0:40:35.420
where they try to predict burglary crime

0:40:35.420,0:40:37.460
using that criminal record

0:40:37.460,0:40:40.170
geographical analysis, like you explained,

0:40:40.170,0:40:43.380
leads me to a 2-fold question:

0:40:43.380,0:40:47.900
first, have you heard of any research

0:40:47.900,0:40:49.760
that measures the effectiveness

0:40:49.760,0:40:53.690
of such measures, at all?

0:40:53.690,0:40:57.040
And, second:

0:40:57.040,0:41:00.599
What do you think of the game theory

0:41:00.599,0:41:02.690
if the thieves or the bad guys

0:41:02.690,0:41:07.619
know the system, and when they[br]game the system,

0:41:07.619,0:41:09.980
they will probably win,

0:41:09.980,0:41:11.640
since one police officer in an interview said

0:41:11.640,0:41:14.019
this system is used to reduce

0:41:14.019,0:41:16.460
the personal costs of policing,

0:41:16.460,0:41:19.460
so they just send the guys[br]where the red flags are,

0:41:19.460,0:41:22.290
and the others take the day off.

0:41:22.290,0:41:24.360
Dr. Helsby: Yup.

0:41:24.360,0:41:27.150
Um, so, with respect to

0:41:27.150,0:41:30.990
testing the effectiveness of predictive policing,

0:41:30.990,0:41:31.990
the companies,

0:41:31.990,0:41:33.910
some of them do randomized, controlled trials

0:41:33.910,0:41:35.240
and claim a reduction in crime.

0:41:35.240,0:41:38.349
The best independent study that I've seen

0:41:38.349,0:41:40.680
is by this RAND Corporation

0:41:40.680,0:41:43.120
that did a study in, I think,

0:41:43.120,0:41:44.920
Shreveport, Louisiana,

0:41:44.920,0:41:47.589
and in their report they claim

0:41:47.589,0:41:50.190
that there was no statistically significant

0:41:50.190,0:41:52.900
difference, they didn't find any reduction.

0:41:52.900,0:41:54.099
And it <i>was</i> specifically looking at

0:41:54.099,0:41:56.730
property crime, which I think you mentioned.

0:41:56.730,0:41:59.480
So, I think right now there's sort of

0:41:59.480,0:42:01.069
conflicting reports between

0:42:01.069,0:42:06.180
the independent auditors[br]and these company claims.

0:42:06.180,0:42:09.289
So there definitely needs to be more study.

0:42:09.289,0:42:12.240
And then, the 2nd thing...sorry,[br]remind me what it was?

0:42:12.240,0:42:15.189
Mic 2: What about the guys gaming the system?

0:42:15.189,0:42:16.949
Dr. Helsby: Oh, yeah.

0:42:16.949,0:42:18.900
I think it's a legitimate concern.

0:42:18.900,0:42:22.480
Like, if all the outputs[br]were just immediately public,

0:42:22.480,0:42:24.599
then, yes, everyone knows the location

0:42:24.599,0:42:26.549
of all police officers,

0:42:26.549,0:42:29.009
and I imagine that people would have

0:42:29.009,0:42:30.779
a problem with that.

0:42:30.779,0:42:32.679
Yup.

0:42:32.679,0:42:35.990
Herald: Microphone #4, please.

0:42:35.990,0:42:39.369
Mic 4: Yeah, this is not actually a question,

0:42:39.369,0:42:40.779
but just a comment.

0:42:40.779,0:42:42.970
I've enjoyed your talk very much,

0:42:42.970,0:42:47.789
in particular after watching

0:42:47.789,0:42:52.270
the talk in Hall 1 earlier in the afternoon.

0:42:52.270,0:42:55.730
The "Say Hi to Your New Boss", about

0:42:55.730,0:42:59.609
algorithms that are trained with big data,

0:42:59.609,0:43:02.390
and finally make decisions.

0:43:02.390,0:43:08.210
And I think these 2 talks are kind of complementary,

0:43:08.210,0:43:11.309
and if people are interested in the topic

0:43:11.309,0:43:14.710
they might want to check out the other talk

0:43:14.710,0:43:16.259
and watch it later, because these

0:43:16.259,0:43:17.319
fit very well together.

0:43:17.319,0:43:19.589
Dr. Helsby: Yeah, it was a great talk.

0:43:19.589,0:43:22.130
Herald: Microphone #2, please.

0:43:22.130,0:43:25.049
Mic 2: Um, yeah, you mentioned

0:43:25.049,0:43:27.319
the need to have some kind of 3rd-party auditing

0:43:27.319,0:43:30.900
or some kind of way to

0:43:30.900,0:43:31.930
peek into these algorithms

0:43:31.930,0:43:33.079
and to see what they're doing,

0:43:33.079,0:43:34.420
and to see if they're being fair.

0:43:34.420,0:43:36.199
Can you talk a little bit more about that?

0:43:36.199,0:43:38.059
Like, going forward,

0:43:38.059,0:43:40.690
some kind of regulatory structures

0:43:40.690,0:43:44.200
would probably have to emerge

0:43:44.200,0:43:47.200
to analyze and to look at

0:43:47.200,0:43:49.339
these black boxes that are just sort of

0:43:49.339,0:43:51.309
popping up everywhere and, you know,

0:43:51.309,0:43:52.939
controlling more and more of the things

0:43:52.939,0:43:56.150
in our lives, and important decisions.

0:43:56.150,0:43:58.539
So, just, what kind of discussions

0:43:58.539,0:43:59.460
are there for that?

0:43:59.460,0:44:01.809
And what kind of possibility[br]is there for that?

0:44:01.809,0:44:04.900
And, I'm sure that companies would be

0:44:04.900,0:44:08.000
very, very resistant to

0:44:08.000,0:44:09.890
any kind of attempt to look into

0:44:09.890,0:44:13.890
algorithms, and to...

0:44:13.890,0:44:15.070
Dr. Helsby: Yeah, I mean, definitely

0:44:15.070,0:44:18.069
companies would be very resistant to

0:44:18.069,0:44:19.670
having people look into their algorithms.

0:44:19.670,0:44:22.190
So, if you wanna do a very rigorous

0:44:22.190,0:44:23.339
audit of what's going on

0:44:23.339,0:44:25.660
then it's probably necessary to have

0:44:25.660,0:44:26.589
a few people come in

0:44:26.589,0:44:28.900
and sign NDAs, and then

0:44:28.900,0:44:31.039
look through the systems.

0:44:31.039,0:44:33.140
So, that's one way to proceed.

0:44:33.140,0:44:35.049
But, another way to proceed that--

0:44:35.049,0:44:38.720
so, these academic researchers have done

0:44:38.720,0:44:40.009
a few experiments

0:44:40.009,0:44:42.809
and found some interesting things,

0:44:42.809,0:44:45.500
and that's sort of all the attempts at auditing

0:44:45.500,0:44:46.450
that we've seen:

0:44:46.450,0:44:48.490
there was 1 attempt in 2012[br]for the Obama campaign,

0:44:48.490,0:44:49.910
but there's really not been any

0:44:49.910,0:44:51.500
sort of systematic attempt--

0:44:51.500,0:44:52.589
you know, like, in censorship

0:44:52.589,0:44:54.539
we see a systematic attempt to

0:44:54.539,0:44:56.779
do measurement as often as possible,

0:44:56.779,0:44:58.240
check what's going on,

0:44:58.240,0:44:59.339
and that itself, you know,

0:44:59.339,0:45:00.900
can act as an oversight mechanism.

0:45:00.900,0:45:01.880
But, right now,

0:45:01.880,0:45:03.900
I think many of these companies

0:45:03.900,0:45:05.259
realize no one is watching,

0:45:05.259,0:45:07.160
so there's no real push to have

0:45:07.160,0:45:10.440
people verify: are you being fair when you

0:45:10.440,0:45:11.539
implement this system?

0:45:11.539,0:45:12.969
Because no one's really checking.

0:45:12.969,0:45:13.980
Mic 2: Do you think that,

0:45:13.980,0:45:15.339
at some point, it would be like

0:45:15.339,0:45:19.059
an FDA or SEC, to give some American examples...

0:45:19.059,0:45:21.490
an actual government regulatory agency

0:45:21.490,0:45:24.960
that has the power and ability to

0:45:24.960,0:45:27.930
not just sort of look and try to

0:45:27.930,0:45:31.710
reverse engineer some of these algorithms,

0:45:31.710,0:45:33.920
but actually peek in there and make sure

0:45:33.920,0:45:36.420
that things are fair, because it seems like

0:45:36.420,0:45:38.240
there's just-- it's so important now

0:45:38.240,0:45:41.769
that, again, it could be the difference between

0:45:41.769,0:45:42.930
life and death, between

0:45:42.930,0:45:44.589
getting a job, not getting a job,

0:45:44.589,0:45:46.130
being pulled over,[br]not being pulled over,

0:45:46.130,0:45:48.069
being racially profiled,[br]not racially profiled,

0:45:48.069,0:45:49.410
things like that.[br]Dr. Helsby: Right.

0:45:49.410,0:45:50.430
Mic 2: Is it moving in that direction?

0:45:50.430,0:45:52.249
Or is it way too early for it?

0:45:52.249,0:45:55.110
Dr. Helsby: I mean, so some people have...

0:45:55.110,0:45:56.859
someone has called for, like,

0:45:56.859,0:45:59.079
a Federal Search Commission,

0:45:59.079,0:46:00.930
or like a Federal Algorithms Commission,

0:46:00.930,0:46:03.200
that would do this sort of oversight work,

0:46:03.200,0:46:06.130
but it's in such early stages right now

0:46:06.130,0:46:09.970
that there's no real push for that.

0:46:09.970,0:46:13.330
But I think it's a good idea.

0:46:13.330,0:46:15.729
Herald: And again, #2 please.

0:46:15.729,0:46:17.059
Mic 2: Thank you again for your talk.

0:46:17.059,0:46:19.309
I was just curious if you can point

0:46:19.309,0:46:20.440
to any examples of

0:46:20.440,0:46:22.619
either current producers or consumers

0:46:22.619,0:46:24.029
of these algorithmic systems

0:46:24.029,0:46:26.390
who are actively and publicly trying

0:46:26.390,0:46:27.720
to do so in a responsible manner

0:46:27.720,0:46:29.720
by describing what they're trying to do

0:46:29.720,0:46:31.380
and how they're going about it?

0:46:31.380,0:46:37.210
Dr. Helsby: So, yeah, there are some companies,

0:46:37.210,0:46:39.000
for example, like DataKind,

0:46:39.000,0:46:42.710
that try to deploy algorithmic systems

0:46:42.710,0:46:44.640
in as responsible a way as possible,

0:46:44.640,0:46:47.250
for like public policy.

0:46:47.250,0:46:49.549
Like, I actually also implement systems

0:46:49.549,0:46:51.750
for public policy in a transparent way.

0:46:51.750,0:46:54.329
Like, all the code is in GitHub, etc.

0:46:54.329,0:47:00.020
And it is also the case to give credit to

0:47:00.020,0:47:01.990
Google, and these giants,

0:47:01.990,0:47:06.109
they're trying to implement transparency systems

0:47:06.109,0:47:08.170
that help you understand.

0:47:08.170,0:47:09.289
This has been done with respect to

0:47:09.289,0:47:12.329
how your data is being collected,

0:47:12.329,0:47:14.579
but for example if you go on Amazon.com

0:47:14.579,0:47:17.890
you can see a recommendation has been made,

0:47:17.890,0:47:19.420
and that is pretty transparent.

0:47:19.420,0:47:21.480
You can see "this item[br]was recommended to me,"

0:47:21.480,0:47:25.039
so you know that prediction[br]is being used in this case,

0:47:25.039,0:47:27.089
and it will say why prediction is being used:

0:47:27.089,0:47:29.230
because you purchased some item.

0:47:29.230,0:47:30.380
And Google has a similar thing,

0:47:30.380,0:47:32.420
if you go to like Google Ad Settings,

0:47:32.420,0:47:35.249
you can even turn off personalization of ads

0:47:35.249,0:47:36.380
if you want,

0:47:36.380,0:47:38.119
and you can also see some of the inferences

0:47:38.119,0:47:39.400
that have been learned about you.

0:47:39.400,0:47:40.819
A subset of the inferences that have been

0:47:40.819,0:47:41.700
learned about you.

0:47:41.700,0:47:43.940
So, like, what interests...

0:47:43.940,0:47:47.869
Herald: A question from the internet, please?

0:47:47.869,0:47:50.930
Signal Angel: Yes, billetQ is asking

0:47:50.930,0:47:54.479
how do you avoid biases in machine learning?

0:47:54.479,0:47:57.380
I assume an analysis system, for example,

0:47:57.380,0:48:00.420
could be biased against women and minorities,

0:48:00.420,0:48:04.960
if used for hiring decisions[br]based on known data.

0:48:04.960,0:48:06.499
Dr. Helsby: Yeah, so one thing is to

0:48:06.499,0:48:08.529
just explicitly check.

0:48:08.529,0:48:12.199
So, you can check to see how

0:48:12.199,0:48:14.309
positive outcomes are being distributed

0:48:14.309,0:48:16.779
among those protected classes.

0:48:16.779,0:48:19.210
You could also incorporate these sort of

0:48:19.210,0:48:21.440
fairness constraints in the function

0:48:21.440,0:48:24.069
that you optimize when you train the system,

0:48:24.069,0:48:25.950
and so, if you're interested in reading more

0:48:25.950,0:48:28.960
about this, the 2 papers--

0:48:28.960,0:48:31.909
let me go to References--

0:48:31.909,0:48:32.730
there's a good paper called

0:48:32.730,0:48:35.339
Fairness Through Awareness that describes

0:48:35.339,0:48:37.499
how to go about doing this,

0:48:37.499,0:48:39.579
so I recommend this person read that.

0:48:39.579,0:48:40.970
It's good.
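
[Editor's sketch] The explicit check described in this answer can be illustrated in a few lines of Python. This is a minimal sketch of a statistical-parity check, not the method from the paper mentioned above and not code shown in the talk. The function names, the group labels "a" and "b", and the sample data are all hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(groups, decisions):
    """Return the fraction of positive decisions (1) for each group label."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += decision
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Largest difference in positive-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical data: a group label and a 0/1 decision per applicant.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, decisions)
gap = parity_gap(rates)
print(rates, gap)
```

A large gap does not by itself prove the system is biased, but it flags where to look more closely. One way to act on the second suggestion in the answer would be to add a penalty proportional to this gap to the training objective; that is an assumption about how such a constraint might be expressed, not a description of any specific system.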

0:48:40.970,0:48:43.400
Herald: Microphone 2, please.

0:48:43.400,0:48:45.400
Mic 2: Thanks again for your talk.

0:48:45.400,0:48:49.649
Umm, hello?

0:48:49.649,0:48:50.999
Okay.

0:48:50.999,0:48:52.960
Umm, I see of course a problem with

0:48:52.960,0:48:54.619
all the black boxes that you describe

0:48:54.619,0:48:57.069
with regard to the crime systems,

0:48:57.069,0:48:59.569
but when we look at the advertising systems

0:48:59.569,0:49:02.169
in many cases they are very networked.

0:49:02.169,0:49:04.160
There are many different systems collaborating

0:49:04.160,0:49:07.109
and exchanging data via open APIs:

0:49:07.109,0:49:08.720
RESTful APIs, and various

0:49:08.720,0:49:11.720
demand-side platforms[br]and audience-exchange platforms,

0:49:11.720,0:49:12.539
and everything.

0:49:12.539,0:49:15.420
So, can that help to at least

0:49:15.420,0:49:22.160
increase awareness on where targeting, personalization

0:49:22.160,0:49:23.679
might be happening?

0:49:23.679,0:49:26.190
I mean, I'm looking at systems like

0:49:26.190,0:49:29.539
BuiltWith, that surface what kind of

0:49:29.539,0:49:31.380
JavaScript libraries are used elsewhere.

0:49:31.380,0:49:32.999
So, is that something that could help

0:49:32.999,0:49:35.670
at least to give a better awareness

0:49:35.670,0:49:38.690
and listing all the points where

0:49:38.690,0:49:41.409
you might be targeted...

0:49:41.409,0:49:43.070
Dr. Helsby: So, like, with respect to

0:49:43.070,0:49:46.460
advertising, the fact that[br]there is behind the scenes

0:49:46.460,0:49:48.450
this like complicated auction process

0:49:48.450,0:49:50.650
that's occurring, just makes things

0:49:50.650,0:49:51.819
a lot more complicated.

0:49:51.819,0:49:54.170
So, for example, I said briefly

0:49:54.170,0:49:57.269
that they found that there's this[br]statistical difference

0:49:57.269,0:49:59.099
between how men and women are treated,

0:49:59.099,0:50:01.339
but it doesn't necessarily mean that

0:50:01.339,0:50:03.640
"Oh, the algorithm is definitely biased."

0:50:03.640,0:50:06.369
It could be because of this auction process,

0:50:06.369,0:50:10.569
it could be that women are considered

0:50:10.569,0:50:12.630
more valuable when it comes to advertising,

0:50:12.630,0:50:15.099
and so these executive ads are getting

0:50:15.099,0:50:17.160
outbid by some other ads,

0:50:17.160,0:50:18.890
and so there's a lot of potential

0:50:18.890,0:50:20.490
causes for that.

0:50:20.490,0:50:22.829
So, I think it just makes things[br]a lot more complicated.

0:50:22.829,0:50:25.910
I don't know if it helps[br]with the bias at all.

0:50:25.910,0:50:27.410
Mic 2: Well, the question was more

0:50:27.410,0:50:30.299
a direction... can it help to surface

0:50:30.299,0:50:32.499
and make people aware of that fact?

0:50:32.499,0:50:34.930
I mean, I can talk to my kids probably,

0:50:34.930,0:50:36.259
and they will probably understand,

0:50:36.259,0:50:38.420
but I can't explain that to my grandma,

0:50:38.420,0:50:43.150
who's also, umm, looking at an iPad.

0:50:43.150,0:50:44.289
Dr. Helsby: So, the fact that

0:50:44.289,0:50:45.690
the systems are...

0:50:45.690,0:50:48.509
I don't know if I understand.

0:50:48.509,0:50:50.529
Mic 2: OK. I think that the main problem

0:50:50.529,0:50:53.710
is that we are behind the industry efforts

0:50:53.710,0:50:57.179
to being targeted at, and many people

0:50:57.179,0:51:00.579
do know, but a lot more people don't know,

0:51:00.579,0:51:03.160
and making them aware of the fact

0:51:03.160,0:51:07.269
that they are a target, in a way,

0:51:07.269,0:51:10.990
is something that can only be shown

0:51:10.990,0:51:14.779
by a 3rd party that disposed that data,

0:51:14.779,0:51:16.339
and make audits in a way--

0:51:16.339,0:51:17.929
maybe in an automated way.

0:51:17.929,0:51:19.170
Dr. Helsby: Right.

0:51:19.170,0:51:21.410
Yeah, I think it certainly[br]could help with advocacy

0:51:21.410,0:51:23.059
if that's the point, yeah.

0:51:23.059,0:51:26.079
Herald: Another question[br]from the internet, please.

0:51:26.079,0:51:29.319
Signal Angel: Yes, on IRC they are asking

0:51:29.319,0:51:31.440
if we know that prediction in some cases

0:51:31.440,0:51:34.460
provides an influence that cannot be controlled.

0:51:34.460,0:51:38.480
So, r4v5 would like to know from you

0:51:38.480,0:51:41.519
if there are some cases or areas where

0:51:41.519,0:51:45.060
machine learning simply shouldn't go?

0:51:45.060,0:51:48.349
Dr. Helsby: Umm, so I think...

0:51:48.349,0:51:52.559
I mean, yes, I think that it is the case

0:51:52.559,0:51:54.650
that in some cases machine learning

0:51:54.650,0:51:56.180
might not be appropriate.

0:51:56.180,0:51:58.359
For example, if you use machine learning

0:51:58.359,0:52:00.970
to decide who should be searched.

0:52:00.970,0:52:02.619
I don't think it should be the case that

0:52:02.619,0:52:03.809
machine learning algorithms should

0:52:03.809,0:52:05.440
ever be used to determine

0:52:05.440,0:52:08.430
probable cause, or something like that.

0:52:08.430,0:52:12.339
So, if it's just one piece of evidence

0:52:12.339,0:52:13.299
that you consider,

0:52:13.299,0:52:14.990
and there's human oversight always,

0:52:14.990,0:52:18.519
<i>maybe</i> it's fine, but

0:52:18.519,0:52:20.839
we should be very suspicious and hesitant

0:52:20.839,0:52:22.119
in certain contexts where

0:52:22.119,0:52:24.529
the ramifications are very serious.

0:52:24.529,0:52:27.259
Like the No Fly List, and so on.

0:52:27.259,0:52:29.200
Herald: And #2 again.

0:52:29.200,0:52:30.809
Mic 2: A second question

0:52:30.809,0:52:33.509
that just occurred to me, if you don't mind.

0:52:33.509,0:52:35.339
Umm, until the advent of

0:52:35.339,0:52:36.559
algorithmic systems,

0:52:36.559,0:52:40.470
when there've been cases of serious harm

0:52:40.470,0:52:42.799
that's been resulted in individuals or groups,

0:52:42.799,0:52:44.579
and it's been demonstrated that

0:52:44.579,0:52:46.029
it's occurred because of

0:52:46.029,0:52:49.400
an individual or a system of people

0:52:49.400,0:52:53.019
being systematically biased, then often

0:52:53.019,0:52:55.130
one of the actions that's taken is

0:52:55.130,0:52:56.869
pressure's applied, and then

0:52:56.869,0:52:59.660
people are required to change,

0:52:59.660,0:53:01.049
and hopefully be held responsible,

0:53:01.049,0:53:02.910
and then change the way that they do things

0:53:02.910,0:53:06.400
to try to remove bias from that system.

0:53:06.400,0:53:07.839
What's the current thinking about

0:53:07.839,0:53:10.299
how we can go about doing that

0:53:10.299,0:53:12.599
when the systems that are doing that

0:53:12.599,0:53:13.650
are algorithmic?

0:53:13.650,0:53:15.999
Is it just going to be human oversight,

0:53:15.999,0:53:16.910
and humans are gonna have to be

0:53:16.910,0:53:18.379
held responsible for the oversight?

0:53:18.379,0:53:20.890
Dr. Helsby: So, in terms of bias,

0:53:20.890,0:53:22.569
if we're concerned about bias towards

0:53:22.569,0:53:24.019
particular types of people,

0:53:24.019,0:53:25.710
that's something that we can optimize for.

0:53:25.710,0:53:28.839
So, we can train systems that are unbiased

0:53:28.839,0:53:30.019
in this way.

0:53:30.019,0:53:32.109
So that's one way to deal with it.

0:53:32.109,0:53:34.039
But there's always gonna be errors,

0:53:34.039,0:53:35.420
so that's sort of a separate issue

0:53:35.420,0:53:37.509
from the bias, and in the case

0:53:37.509,0:53:39.180
where there are errors,

0:53:39.180,0:53:40.539
there must be oversight.

0:53:40.539,0:53:45.079
So, one way that one could improve

0:53:45.079,0:53:46.410
the way that this is done

0:53:46.410,0:53:48.160
is by making sure that you're

0:53:48.160,0:53:50.799
keeping track of confidence of decisions.

0:53:50.799,0:53:54.039
So, if you have a low confidence prediction,

0:53:54.039,0:53:56.259
then maybe a human[br]should come in and check things.

0:53:56.259,0:53:58.809
So, that might be one way to proceed.
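
[Editor's sketch] The idea of tracking confidence and handing low-confidence predictions to a person can be sketched as follows. This is a minimal illustration under stated assumptions: the model is assumed to output a probability for the positive class, and the function name, the 0.8 threshold, and the return labels are hypothetical rather than taken from the talk.

```python
def route_prediction(probability, threshold=0.8):
    """Decide whether a prediction is acted on or sent to a human.

    probability: the model's estimated probability of the positive class.
    threshold: minimum confidence required to skip human review (assumed value).
    """
    # Confidence is how far the model leans toward either class.
    confidence = max(probability, 1.0 - probability)
    if confidence < threshold:
        return "human_review"
    return "positive" if probability >= 0.5 else "negative"


for p in (0.95, 0.6, 0.05):
    print(p, route_prediction(p))
```

The threshold sets the trade-off: raising it sends more cases to human reviewers, lowering it lets more decisions through automatically. This only helps if the model's probabilities are reasonably well calibrated, which would need to be checked separately.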

0:54:02.099,0:54:03.990
Herald: So, there are no more questions.

0:54:03.990,0:54:06.199
I close this talk now,

0:54:06.199,0:54:08.239
and thank you very much

0:54:08.239,0:54:09.410
and a big applause to

0:54:09.410,0:54:11.780
Jennifer Helsby!

0:54:11.780,0:54:16.310
<i>roaring applause</i>

0:54:16.310,0:54:28.000
subtitles created by c3subtitles.de[br]Join, and help us!