0:00:00.000,0:00:08.895
Music
0:00:08.895,0:00:20.040
Herald: How many of you are using Facebook? Twitter? [br]Diaspora?
0:00:20.040,0:00:27.630
concerned noise And all of that data[br]you enter there
0:00:27.630,0:00:34.240
gets to a server, gets into the hands of somebody[br]who's using it
0:00:34.240,0:00:38.519
and the next talk[br]is especially about that,
0:00:38.519,0:00:43.879
because there are also intelligent machines[br]and intelligent algorithms
0:00:43.879,0:00:47.489
that try to make something[br]out of that data.
0:00:47.489,0:00:50.920
So the post-doc researcher Jennifer Helsby
0:00:50.920,0:00:55.839
of the University of Chicago,[br]who works at this
0:00:55.839,0:00:59.370
intersection between policy and[br]technology,
0:00:59.370,0:01:04.709
will now ask you the question:[br]To whom would we give that power?
0:01:04.709,0:01:12.860
Dr. Helsby: Thanks.[br]applause
0:01:12.860,0:01:17.090
Okay, so, today I'm gonna do a brief tour[br]of intelligent systems
0:01:17.090,0:01:18.640
and how they're currently used
0:01:18.640,0:01:21.760
and then we're gonna look at some examples[br]with respect
0:01:21.760,0:01:23.710
to the properties that we might care about
0:01:23.710,0:01:26.000
these systems having,[br]and I'll talk a little bit about
0:01:26.000,0:01:27.940
some of the work that's been done in academia
0:01:27.940,0:01:28.680
on these topics.
0:01:28.680,0:01:31.780
And then we'll talk about some[br]promising paths forward.
0:01:31.780,0:01:37.040
So, I wanna start with this:[br]Kranzberg's First Law of Technology
0:01:37.040,0:01:40.420
So, it's not good or bad,[br]but it also isn't neutral.
0:01:40.420,0:01:42.980
Technology shapes our world,[br]and it can act as
0:01:42.980,0:01:46.140
a liberating force-- or an oppressive and[br]controlling force.
0:01:46.140,0:01:49.730
So, in this talk, I'm gonna focus[br]on some of the aspects
0:01:49.730,0:01:53.830
of intelligent systems that might be more[br]controlling in nature.
0:01:53.830,0:01:56.060
So, as we all know,
0:01:56.060,0:01:59.770
because of the rapidly decreasing cost[br]of storage and computation,
0:01:59.770,0:02:02.170
along with the rise of new sensor technologies,
0:02:02.170,0:02:05.510
data collection devices[br]are being pushed into every
0:02:05.510,0:02:08.329
aspect of our lives: in our homes, our cars,
0:02:08.329,0:02:10.469
in our pockets, on our wrists.
0:02:10.469,0:02:13.280
And data collection systems act as intermediaries
0:02:13.280,0:02:15.230
for a huge amount of human communication.
0:02:15.230,0:02:17.900
And much of this data sits in government
0:02:17.900,0:02:19.860
and corporate databases.
0:02:19.860,0:02:23.090
So, in order to make use of this data,
0:02:23.090,0:02:27.280
we need to be able to make some inferences.
0:02:27.280,0:02:30.280
So, one way of approaching this is I can hire
0:02:30.280,0:02:32.310
a lot of humans, and I can have these humans
0:02:32.310,0:02:34.990
manually examine the data, and they can acquire
0:02:34.990,0:02:36.900
expert knowledge of the domain, and then
0:02:36.900,0:02:38.510
perhaps they can make some decisions
0:02:38.510,0:02:40.830
or at least some recommendations[br]based on it.
0:02:40.830,0:02:43.030
However, there are some problems with this.
0:02:43.030,0:02:45.810
One is that it's slow, and thus expensive.
0:02:45.810,0:02:48.060
It's also biased. We know that humans have
0:02:48.060,0:02:50.700
all sorts of biases, both conscious and unconscious,
0:02:50.700,0:02:53.390
and it would be nice to have a system[br]that did not have
0:02:53.390,0:02:54.959
these inaccuracies.
0:02:54.959,0:02:57.069
It's also not very transparent: I might
0:02:57.069,0:02:58.910
not really know the factors that led to
0:02:58.910,0:03:00.930
some decisions being made.
0:03:00.930,0:03:03.360
Even humans themselves[br]often don't really understand
0:03:03.360,0:03:05.360
why they came to a given decision, because
0:03:05.360,0:03:08.130
of their emotional nature.
0:03:08.130,0:03:11.530
And, thus, these human decision making systems
0:03:11.530,0:03:13.170
are often difficult to audit.
0:03:13.170,0:03:15.819
So, another way to proceed is maybe instead
0:03:15.819,0:03:18.000
I study the system and the data carefully
0:03:18.000,0:03:20.520
and I write down the best rules[br]for making a decision
0:03:20.520,0:03:23.280
or I can have a machine[br]dynamically figure out
0:03:23.280,0:03:25.459
the best rules, as in machine learning.
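To make the contrast concrete, here is a minimal sketch of a hand-written rule next to one whose threshold is learned from labeled data. The single-feature threshold rule (a "decision stump"), the function names, and the toy data are illustrative assumptions, not anything from the talk.

```python
# Toy contrast between a rule a person writes down and a rule learned from data.
# Data, names, and the single-threshold rule form are illustrative assumptions.

def hand_written_rule(x: float) -> int:
    """A rule an analyst might write after studying the data by hand."""
    return 1 if x >= 5.0 else 0


def learn_threshold(xs: list[float], ys: list[int]) -> float:
    """Return the threshold t for the rule 'predict 1 if x >= t' that makes
    the fewest mistakes on the training data (exhaustive search over the
    observed values; ties keep the smallest t)."""
    best_t, best_errors = xs[0], len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


xs = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1, 1]
print(learn_threshold(xs, ys))  # 4.0
```

Here the learned rule makes no mistakes on the toy data while the hand-written one misclassifies the point at 4.0, and either way the result is a single number someone could inspect. Real learned models are usually far larger than one threshold.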
0:03:25.459,0:03:28.640
So, maybe this is a better approach.
0:03:28.640,0:03:32.230
It's certainly fast, and thus cheap.
0:03:32.230,0:03:34.290
And maybe I can construct[br]the system in such a way
0:03:34.290,0:03:37.090
that it doesn't have the biases that are inherent
0:03:37.090,0:03:39.209
in human decision making.
0:03:39.209,0:03:41.560
And, since I've written these rules down,
0:03:41.560,0:03:42.819
or a computer has learned these rules,
0:03:42.819,0:03:45.140
then I can just show them to somebody, right?
0:03:45.140,0:03:46.819
And then they can audit it.
0:03:46.819,0:03:49.020
So, more and more decision making is being
0:03:49.020,0:03:50.750
done in this way.
0:03:50.750,0:03:53.170
And so, in this model, we take data,
0:03:53.170,0:03:55.709
we make an inference based on that data
0:03:55.709,0:03:58.120
using these algorithms, and then
0:03:58.120,0:03:59.420
we can take actions.
0:03:59.420,0:04:01.860
And, when we take this more scientific approach
0:04:01.860,0:04:04.200
to making decisions and optimizing for
0:04:04.200,0:04:07.310
a desired outcome,[br]we can take an experimental approach
0:04:07.310,0:04:10.080
so we can determine[br]which actions are most effective
0:04:10.080,0:04:12.310
in achieving a desired outcome.
0:04:12.310,0:04:14.010
Maybe there are some types of communication
0:04:14.010,0:04:16.750
styles that are most effective[br]with certain people.
0:04:16.750,0:04:19.510
I can perhaps deploy some individualized incentives
0:04:19.510,0:04:22.060
to get the outcome that I desire.
0:04:22.060,0:04:25.990
And maybe, if I carefully design an experiment
0:04:25.990,0:04:27.810
with the environment in which people make
0:04:27.810,0:04:30.699
these decisions, even very small changes
0:04:30.699,0:04:34.250
can introduce significant changes[br]in people's behavior.
0:04:34.250,0:04:37.320
So, through these mechanisms,[br]and this experimental approach,
0:04:37.320,0:04:39.840
I can maximize the probability[br]that humans do
0:04:39.840,0:04:42.020
what I want.
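A minimal sketch of that experimental loop, assuming each message variant was shown to a randomly assigned group of people: keep whichever variant produced the desired action most often. The variant names and counts are made-up illustrative numbers.

```python
# Toy version of the experimental approach: each message variant was shown to
# a randomly assigned group, and we keep the one that produced the desired
# action most often. Variant names and counts are made-up illustrative numbers.

def response_rate(acted: int, shown: int) -> float:
    """Fraction of people shown a variant who then took the desired action."""
    return acted / shown


def best_variant(results: dict[str, tuple[int, int]]) -> str:
    """results maps variant -> (number who acted, number shown)."""
    return max(results, key=lambda v: response_rate(*results[v]))


results = {
    "neutral_message": (120, 1000),
    "social_pressure_message": (180, 1000),
    "deadline_message": (150, 1000),
}
print(best_variant(results))  # social_pressure_message
```

A real deployment would also check that the difference is statistically significant before acting on it, and could repeat the experiment per audience segment to get the individualized incentives described above.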
0:04:42.020,0:04:45.380
So, algorithmic decision making is being used
0:04:45.380,0:04:47.270
in industry, and is used[br]in lots of other areas,
0:04:47.270,0:04:49.530
from astrophysics to medicine, and is now
0:04:49.530,0:04:52.199
moving into new domains, including
0:04:52.199,0:04:53.990
government applications.
0:04:53.990,0:04:58.560
So, we have recommendation engines like[br]Netflix, Yelp, SoundCloud,
0:04:58.560,0:05:00.699
that direct our attention to what we should
0:05:00.699,0:05:03.510
watch and listen to.
0:05:03.510,0:05:07.919
Since 2009, Google has used[br]personalized search results,
0:05:07.919,0:05:12.840
even if you're not logged in[br]to your Google account.
0:05:12.840,0:05:15.389
And we also have algorithmic curation and filtering,
0:05:15.389,0:05:17.530
as in the case of Facebook News Feed,
0:05:17.530,0:05:19.870
Google News, Yahoo News,
0:05:19.870,0:05:22.840
which shows you what news articles, for example,
0:05:22.840,0:05:24.330
you should be looking at.
0:05:24.330,0:05:25.650
And this is important, because a lot of people
0:05:25.650,0:05:29.410
get news from these media.
0:05:29.410,0:05:31.520
We even have algorithmic journalists!
0:05:31.520,0:05:35.240
So, automatic systems generate articles
0:05:35.240,0:05:36.880
about weather, traffic, or sports
0:05:36.880,0:05:38.729
instead of a human.
0:05:38.729,0:05:41.949
And, another application that's more recent
0:05:41.949,0:05:43.570
is the use of predictive systems
0:05:43.570,0:05:45.180
in political campaigns.
0:05:45.180,0:05:47.370
So, political campaigns also now take this
0:05:47.370,0:05:50.340
approach to predict on an individual basis
0:05:50.340,0:05:53.300
which candidate voters[br]are likely to vote for.
0:05:53.300,0:05:55.500
And then they can target,[br]on an individual basis,
0:05:55.500,0:05:58.199
those that can be persuaded otherwise.
0:05:58.199,0:06:00.830
And, finally, in the public sector,
0:06:00.830,0:06:02.710
we're starting to use predictive systems
0:06:02.710,0:06:06.320
in areas from policing, to health,[br]to education and energy.
0:06:06.320,0:06:08.979
So, there are some advantages to this.
0:06:08.979,0:06:12.790
So, one thing is that we can automate
0:06:12.790,0:06:15.759
aspects of our lives[br]that we consider to be mundane
0:06:15.759,0:06:17.620
using systems that are intelligent
0:06:17.620,0:06:19.580
and adaptive enough.
0:06:19.580,0:06:21.680
We can make use of all the data
0:06:21.680,0:06:23.990
and really get the pieces of information we
0:06:23.990,0:06:25.830
really care about.
0:06:25.830,0:06:29.650
We can spend money in the most effective way,
0:06:29.650,0:06:32.110
and we can do this with this experimental
0:06:32.110,0:06:34.210
approach to optimize actions to produce
0:06:34.210,0:06:35.190
desired outcomes.
0:06:35.190,0:06:37.300
So, we can embed intelligence
0:06:37.300,0:06:39.520
into all of these mundane objects
0:06:39.520,0:06:41.180
and enable them to make decisions for us,
0:06:41.180,0:06:42.860
and so that's what we're doing more and more,
0:06:42.860,0:06:45.210
and we can have an object[br]that decides for us
0:06:45.210,0:06:46.840
what temperature we should set our house to,
0:06:46.840,0:06:49.009
what we should be doing, etc.
0:06:49.009,0:06:52.400
So, there might be some implications here.
0:06:52.400,0:06:55.680
We want these systems[br]that do work on this data
0:06:55.680,0:06:58.039
to increase the opportunities[br]available to us.
0:06:58.039,0:07:00.259
But it might be that there are some implications
0:07:00.259,0:07:01.780
that we have not carefully thought through.
0:07:01.780,0:07:03.430
This is a new area, and people are only
0:07:03.430,0:07:05.940
starting to scratch the surface of what the
0:07:05.940,0:07:07.289
problems might be.
0:07:07.289,0:07:09.600
In some cases, they might narrow the options
0:07:09.600,0:07:10.990
available to people,
0:07:10.990,0:07:13.199
and this approach subjects people to
0:07:13.199,0:07:15.620
suggestive messaging intended to nudge them
0:07:15.620,0:07:17.169
to a desired outcome.
0:07:17.169,0:07:19.320
Some people may have a problem with that.
0:07:19.320,0:07:20.650
Values we care about are not gonna be
0:07:20.650,0:07:23.860
baked into these systems by default.
0:07:23.860,0:07:25.960
It's also the case that some algorithmic systems
0:07:25.960,0:07:28.300
facilitate work that we do not like.
0:07:28.300,0:07:30.199
For example, in the case of mass surveillance.
0:07:30.199,0:07:32.130
And even the same systems,
0:07:32.130,0:07:34.039
used by different people or organizations,
0:07:34.039,0:07:36.110
can have very different consequences.
0:07:36.110,0:07:37.320
For example, if I can predict
0:07:37.320,0:07:40.020
with high accuracy, based on say search queries,
0:07:40.020,0:07:42.050
who's gonna be admitted to a hospital,
0:07:42.050,0:07:43.750
some people would be interested[br]in knowing that.
0:07:43.750,0:07:46.120
You might be interested[br]in having your doctor know that.
0:07:46.120,0:07:47.919
But that same predictive model[br]in the hands of
0:07:47.919,0:07:50.569
an insurance company[br]has a very different implication.
0:07:50.569,0:07:53.389
So, the point here is that these systems
0:07:53.389,0:07:55.860
structure and influence how humans interact
0:07:55.860,0:07:58.360
with each other, how they interact with society,
0:07:58.360,0:07:59.850
and how they interact with government.
0:07:59.850,0:08:03.080
And if they constrain what people can do,
0:08:03.080,0:08:05.069
we should really care about this.
0:08:05.069,0:08:08.270
So now I'm gonna go to[br]sort of an extreme case,
0:08:08.270,0:08:11.930
just as an example, and that's this[br]Chinese Social Credit System.
0:08:11.930,0:08:14.169
And so this is probably one of the more
0:08:14.169,0:08:17.259
ambitious uses of data:
0:08:17.259,0:08:18.880
it is used to rank each citizen
0:08:18.880,0:08:21.190
based on their behavior, in China.
0:08:21.190,0:08:24.210
So right now, there are various pilot systems
0:08:24.210,0:08:27.660
deployed by various companies doing this in[br]China.
0:08:27.660,0:08:30.729
They're currently voluntary, and by 2020
0:08:30.729,0:08:32.630
one of these systems is gonna be chosen,
0:08:32.630,0:08:34.679
or a combination of the systems,
0:08:34.679,0:08:37.409
and it is gonna be mandatory for everyone.
0:08:37.409,0:08:40.950
And so, in this system, citizens are ranked,
0:08:40.950,0:08:44.380
and a huge range of data sources are used.
0:08:44.380,0:08:46.820
So, some of the data sources are
0:08:46.820,0:08:48.360
your financial data,
0:08:48.360,0:08:50.020
your criminal history,
0:08:50.020,0:08:52.320
how many points you have[br]on your driver's license,
0:08:52.320,0:08:55.360
medical information-- for example,[br]if you take birth control pills,
0:08:55.360,0:08:56.810
that's incorporated.
0:08:56.810,0:08:59.830
Your purchase history-- for example,[br]if you purchase games,
0:08:59.830,0:09:02.430
you are down-ranked in the system.
0:09:02.430,0:09:04.490
Some of the systems, not all of them,
0:09:04.490,0:09:07.260
incorporate social media monitoring,
0:09:07.260,0:09:09.200
which makes sense if you're a state like China,
0:09:09.200,0:09:11.270
you probably want to know about
0:09:11.270,0:09:14.899
political statements that people[br]are saying on social media.
0:09:14.899,0:09:18.020
And, one of the more interesting parts is
0:09:18.020,0:09:22.160
social network analysis:[br]looking at the relationships between people.
0:09:22.160,0:09:24.270
So, if you have a close relationship with[br]somebody
0:09:24.270,0:09:26.180
and they have a low credit score,
0:09:26.180,0:09:29.130
that can have implications on your credit[br]score.
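How the network term actually enters these scores is not public, so the sketch below is purely hypothetical: one simple way a contact's low score could drag down yours is to blend your own score with the average of your close contacts' scores. The 0.3 weight, the names, and the score values are all assumptions.

```python
# Purely hypothetical network term: the real scoring formulas are secret.
# A person's score is blended with the mean score of their close contacts.

def blended_score(person: str, own: dict[str, float],
                  contacts: dict[str, list[str]], weight: float = 0.3) -> float:
    """weight is an assumed share of the final score that comes from contacts."""
    friends = contacts.get(person, [])
    if not friends:
        return own[person]
    friend_avg = sum(own[f] for f in friends) / len(friends)
    return (1 - weight) * own[person] + weight * friend_avg


own = {"alice": 800.0, "bob": 800.0, "carol": 400.0, "dave": 800.0}
contacts = {"alice": ["dave"], "bob": ["carol"]}
print(blended_score("alice", own, contacts))  # 800.0
print(blended_score("bob", own, contacts))    # 680.0
```

Alice and Bob have identical records of their own; Bob's score drops only because of who he is close to.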
0:09:29.130,0:09:34.440
So, the way that these scores[br]are generated is secret.
0:09:34.440,0:09:38.140
And, according to the call for these systems
0:09:38.140,0:09:39.270
put out by the government,
0:09:39.270,0:09:42.810
the goal is to[br]"carry forward the sincerity and
0:09:42.810,0:09:45.760
traditional virtues" and[br]establish the idea of a
0:09:45.760,0:09:47.520
"sincerity culture."
0:09:47.520,0:09:49.440
But wait, it gets better:
0:09:49.440,0:09:52.450
so, there's a portal that enables citizens
0:09:52.450,0:09:55.040
to look up the citizen score of anyone.
0:09:55.040,0:09:56.520
And many people like this system,
0:09:56.520,0:09:58.320
they think it's a fun game.
0:09:58.320,0:10:00.700
They boast about it on social media,
0:10:00.700,0:10:03.610
they put their score in their dating profile,
0:10:03.610,0:10:04.760
because if you're ranked highly you're
0:10:04.760,0:10:06.589
part of an exclusive club.
0:10:06.589,0:10:10.060
You can get VIP treatment[br]at hotels and other companies.
0:10:10.060,0:10:11.880
But the downside is that, if you're excluded
0:10:11.880,0:10:15.540
from that club, your weak score[br]may have other implications,
0:10:15.540,0:10:20.120
like being unable to get access[br]to credit, housing, jobs.
0:10:20.120,0:10:23.399
There is some reporting that even travel visas
0:10:23.399,0:10:27.000
might be restricted[br]if your score is particularly low.
0:10:27.000,0:10:31.160
So, a system like this, for a state, is really
0:10:31.160,0:10:34.690
the optimal solution[br]to the problem of the public.
0:10:34.690,0:10:37.130
It constitutes a very subtle and insidious
0:10:37.130,0:10:39.350
mechanism of social control.
0:10:39.350,0:10:41.209
You don't need to spend a lot of money on
0:10:41.209,0:10:43.800
police or prisons if you can set up a system
0:10:43.800,0:10:45.820
where people discourage one another from
0:10:45.820,0:10:48.930
anti-social acts like political action[br]in exchange for
0:10:48.930,0:10:51.430
a coupon for a free Uber ride.
0:10:51.430,0:10:55.269
So, there are a lot of[br]legitimate questions here:
0:10:55.269,0:10:58.370
What protections does[br]user data have in this scheme?
0:10:58.370,0:11:01.279
Do any safeguards exist to prevent tampering?
0:11:01.279,0:11:04.310
What mechanism, if any, is there to prevent
0:11:04.310,0:11:08.810
false input data from creating erroneous inferences?
0:11:08.810,0:11:10.420
Is there any way that people can fix
0:11:10.420,0:11:12.540
their score once they're ranked poorly?
0:11:12.540,0:11:13.899
Or does it end up becoming a
0:11:13.899,0:11:15.720
self-fulfilling prophecy?
0:11:15.720,0:11:17.850
Your weak score means you have less access
0:11:17.850,0:11:21.620
to jobs and credit, and now you will have
0:11:21.620,0:11:24.709
limited access to opportunity.
0:11:24.709,0:11:27.110
So, let's take a step back.
0:11:27.110,0:11:28.470
So, what do we want?
0:11:28.470,0:11:31.540
So, we probably don't want that,
0:11:31.540,0:11:33.570
but as advocates we really wanna
0:11:33.570,0:11:36.130
understand what questions we should be asking
0:11:36.130,0:11:37.510
of these systems. Right now there's
0:11:37.510,0:11:39.570
very little oversight,
0:11:39.570,0:11:41.420
and we wanna make sure that we don't
0:11:41.420,0:11:44.029
sort of sleepwalk our way to a situation
0:11:44.029,0:11:46.649
where we've lost even more power
0:11:46.649,0:11:49.740
to these centralized systems of control.
0:11:49.740,0:11:52.209
And if you're an implementer, we wanna understand
0:11:52.209,0:11:53.709
what can we be doing better.
0:11:53.709,0:11:56.019
Are there better ways that we can be implementing
0:11:56.019,0:11:57.640
these systems?
0:11:57.640,0:11:59.430
Are there values that, as humans,
0:11:59.430,0:12:01.060
we care about that we should make sure
0:12:01.060,0:12:02.420
these systems have?
0:12:02.420,0:12:05.550
So, the first thing[br]that most people in the room
0:12:05.550,0:12:07.820
might think about is privacy.
0:12:07.820,0:12:10.510
Which is, of course, of the utmost importance.
0:12:10.510,0:12:12.920
We need privacy, and there is a good discussion
0:12:12.920,0:12:15.680
on the importance of protecting[br]user data where possible.
0:12:15.680,0:12:18.420
So, in this talk, I'm gonna focus[br]on the other aspects of
0:12:18.420,0:12:19.470
algorithmic decision making,
0:12:19.470,0:12:21.190
that I think have got less attention.
0:12:21.190,0:12:25.140
Because it's not just privacy[br]that we need to worry about here.
0:12:25.140,0:12:28.519
We also want systems that are fair and equitable.
0:12:28.519,0:12:30.240
We want transparent systems,
0:12:30.240,0:12:35.110
we don't want opaque decisions[br]to be made about us,
0:12:35.110,0:12:36.510
decisions that might have serious impacts
0:12:36.510,0:12:37.779
on our lives.
0:12:37.779,0:12:40.490
And we need some accountability mechanisms.
0:12:40.490,0:12:41.890
So, for the rest of this talk
0:12:41.890,0:12:43.230
we're gonna go through each one of these things
0:12:43.230,0:12:45.230
and look at some examples.
0:12:45.230,0:12:47.709
So, the first thing is fairness.
0:12:47.709,0:12:50.450
And so, as I said in the beginning,[br]this is one area
0:12:50.450,0:12:52.690
where there might be an advantage
0:12:52.690,0:12:55.079
to making decisions by machine,
0:12:55.079,0:12:56.740
especially in areas where there have
0:12:56.740,0:12:59.410
historically been fairness issues with
0:12:59.410,0:13:02.350
decision making, such as law enforcement.
0:13:02.350,0:13:05.839
So, this is one way that police departments
0:13:05.839,0:13:08.360
use predictive models.
0:13:08.360,0:13:10.540
The idea here is police would like to
0:13:10.540,0:13:13.450
allocate resources in a more effective way,
0:13:13.450,0:13:15.050
and they would also like to enable
0:13:15.050,0:13:16.640
proactive policing.
0:13:16.640,0:13:20.110
So, if you can predict where crimes[br]are going to occur,
0:13:20.110,0:13:22.149
or who is going to commit crimes,
0:13:22.149,0:13:24.870
then you can put cops in those places,
0:13:24.870,0:13:27.769
or perhaps have them follow these people,
0:13:27.769,0:13:29.300
and then the crimes will not occur.
0:13:29.300,0:13:31.370
So, it's sort of the pre-crime approach.
0:13:31.370,0:13:34.649
So, there are a few ways of going about this.
0:13:34.649,0:13:37.920
One way is doing this individual-level prediction.
0:13:37.920,0:13:41.089
So you take each citizen[br]and estimate the risk
0:13:41.089,0:13:43.769
that each citizen will participate,[br]say, in violence
0:13:43.769,0:13:45.279
based on some data.
0:13:45.279,0:13:46.779
And then you can flag those people that are
0:13:46.779,0:13:49.199
considered particularly high-risk.
0:13:49.199,0:13:51.519
So, this is currently done.
0:13:51.519,0:13:52.589
This is done in the U.S.
0:13:52.589,0:13:56.120
It's done in Chicago,[br]by the Chicago Police Department.
0:13:56.120,0:13:58.350
And they maintain a heat list of individuals
0:13:58.350,0:14:00.790
that are considered most likely to commit,
0:14:00.790,0:14:03.529
or be the victim of, violence.
0:14:03.529,0:14:06.700
And this is done using data[br]that the police maintain.
0:14:06.700,0:14:09.589
So, the features that are used[br]in this predictive model
0:14:09.589,0:14:12.209
include things that are derived from
0:14:12.209,0:14:14.610
individuals' criminal history.
0:14:14.610,0:14:16.810
So, for example, have they been involved in
0:14:16.810,0:14:18.350
gun violence in the past?
0:14:18.350,0:14:21.450
Do they have narcotics arrests? And so on.
0:14:21.450,0:14:22.860
But another thing that's incorporated
0:14:22.860,0:14:25.060
in the Chicago Police Department model is
0:14:25.060,0:14:28.300
information derived from[br]social network analysis.
0:14:28.300,0:14:30.630
So, who you interact with,
0:14:30.630,0:14:32.279
as noted in police data.
0:14:32.279,0:14:34.899
So, for example, your co-arrestees.
0:14:34.899,0:14:36.440
When officers conduct field interviews,
0:14:36.440,0:14:38.240
who are people interacting with?
0:14:38.240,0:14:42.940
And then this is all incorporated[br]into this risk score.
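A hypothetical sketch of how features like these could be combined into an individual risk score and a flag list. A logistic model is assumed here for illustration; the feature names only mirror the kinds of inputs described, and the weights, bias, and cut-off are invented, not the department's actual model.

```python
import math

# Hypothetical individual-level risk model. The feature names mirror the kinds
# of inputs described in the talk; the weights, bias, and threshold are made up.
WEIGHTS = {
    "past_gun_violence": 1.2,      # from criminal history
    "narcotics_arrests": 0.4,      # from criminal history
    "flagged_co_arrestees": 0.8,   # network-derived: contacts noted in police data
}
BIAS = -3.0


def risk_score(features: dict[str, float]) -> float:
    """Logistic model: squashes a weighted sum of features into (0, 1)."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def flag(people: dict[str, dict[str, float]], threshold: float = 0.5) -> list[str]:
    """Names of everyone whose score meets the (assumed) cut-off, sorted."""
    return sorted(p for p, f in people.items() if risk_score(f) >= threshold)


people = {
    "A": {"past_gun_violence": 1, "narcotics_arrests": 2, "flagged_co_arrestees": 1},
    "B": {"past_gun_violence": 1, "narcotics_arrests": 0, "flagged_co_arrestees": 3},
    "C": {"past_gun_violence": 0, "narcotics_arrests": 0, "flagged_co_arrestees": 0},
}
print(flag(people))  # ['B']
```

A and B have the same gun-violence history, and A has more arrests of their own, yet only B is flagged, mainly because of the network-derived feature.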
0:14:42.940,0:14:44.639
So another way to proceed,
0:14:44.639,0:14:47.070
which is the method that most companies
0:14:47.070,0:14:49.579
that sell products like this[br]to the police have taken,
0:14:49.579,0:14:51.459
is instead predicting which areas
0:14:51.459,0:14:53.810
are likely to have crimes committed in them.
0:14:53.810,0:14:56.690
So, take my city, I put a grid down,
0:14:56.690,0:14:58.180
and then I use crime statistics
0:14:58.180,0:15:00.430
and maybe some ancillary data sources,
0:15:00.430,0:15:01.790
to determine which areas have
0:15:01.790,0:15:04.709
the highest risk of crimes occurring in them,
0:15:04.709,0:15:06.329
and I can flag those areas and send
0:15:06.329,0:15:08.470
police officers to them.
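Commercial products use more elaborate models than this; the minimal sketch below only shows the basic "put a grid down, rank the cells" idea, using raw counts of recorded incidents. The 500 ft cell size echoes the figure given for PredPol later in the talk; the coordinates and incidents are made up.

```python
from collections import Counter

CELL_SIZE_FT = 500.0  # assumed cell size in feet


def cell_of(x_ft: float, y_ft: float, size: float = CELL_SIZE_FT) -> tuple[int, int]:
    """Map a location (in feet from some city origin) to its grid cell."""
    return (int(x_ft // size), int(y_ft // size))


def top_cells(incidents: list[tuple[float, float]], k: int) -> list[tuple[int, int]]:
    """Count recorded incidents per cell and return the k busiest cells."""
    counts = Counter(cell_of(x, y) for x, y in incidents)
    return [cell for cell, _ in counts.most_common(k)]


incidents = [(100, 120), (300, 50), (480, 490),   # three in cell (0, 0)
             (600, 20), (900, 400),               # two in cell (1, 0)
             (1200, 1700)]                        # one in cell (2, 3)
print(top_cells(incidents, 2))  # [(0, 0), (1, 0)]
```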
0:15:08.470,0:15:10.950
So now, let's look at some of the tools
0:15:10.950,0:15:14.010
that are used for this geographic-level prediction.
0:15:14.010,0:15:19.040
So, here are 3 companies that sell these
0:15:19.040,0:15:22.910
geographic-level predictive policing systems.
0:15:22.910,0:15:25.639
So, PredPol has a system that uses
0:15:25.639,0:15:27.200
primarily crime statistics:
0:15:27.200,0:15:30.209
only the time, place, and type of crime
0:15:30.209,0:15:33.040
to predict where crimes will occur.
0:15:33.040,0:15:35.970
HunchLab uses a wider range of data sources
0:15:35.970,0:15:37.260
including, for example, weather,
0:15:37.260,0:15:39.720
and then Hitachi is a newer vendor
0:15:39.720,0:15:42.100
with a predictive crime analytics tool
0:15:42.100,0:15:44.779
that also incorporates social media.
0:15:44.779,0:15:47.850
The first one, to my knowledge, to do so.
0:15:47.850,0:15:49.399
And these systems are in use
0:15:49.399,0:15:52.820
in 50+ cities in the U.S.
0:15:52.820,0:15:56.540
So, why do police departments buy this?
0:15:56.540,0:15:57.760
Some police departments are interested in
0:15:57.760,0:16:00.500
buying systems like this, because they're marketed
0:16:00.500,0:16:02.660
as impartial systems,
0:16:02.660,0:16:06.199
so it's a way to police in an unbiased way.
0:16:06.199,0:16:08.040
And so, these companies make
0:16:08.040,0:16:08.670
statements like this--
0:16:08.670,0:16:10.800
by the way, the references[br]will all be at the end,
0:16:10.800,0:16:12.560
and they'll be on the slides--
0:16:12.560,0:16:13.370
So, for example
0:16:13.370,0:16:16.110
the predictive crime analytics from Hitachi
0:16:16.110,0:16:17.610
claims that the system is anonymous,
0:16:17.610,0:16:19.350
because it shows you an area,
0:16:19.350,0:16:23.060
it doesn't tell you[br]to look for a particular person.
0:16:23.060,0:16:25.699
And PredPol reassures people that
0:16:25.699,0:16:29.560
it eliminates any liberties or profiling concerns.
0:16:29.560,0:16:32.269
And HunchLab notes that the system
0:16:32.269,0:16:35.170
fairly represents priorities for public safety
0:16:35.170,0:16:38.769
and is unbiased by race[br]or ethnicity, for example.
0:16:38.769,0:16:43.529
So, let's take a minute[br]to describe in more detail
0:16:43.529,0:16:48.100
what we mean when we talk about fairness.
0:16:48.100,0:16:51.300
So, when we talk about fairness,
0:16:51.300,0:16:52.740
we mean a few things.
0:16:52.740,0:16:56.070
So, one is fairness with respect to individuals:
0:16:56.070,0:16:58.040
so if I'm very similar to somebody
0:16:58.040,0:17:00.170
and we go through some process
0:17:00.170,0:17:03.430
and there are two very different[br]outcomes to that process
0:17:03.430,0:17:05.679
we would consider that to be unfair.
0:17:05.679,0:17:07.929
So, we want similar people to be treated
0:17:07.929,0:17:09.539
in a similar way.
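One common way to formalize "similar people should be treated similarly" is a Lipschitz-style condition, as in Dwork et al.'s "Fairness Through Awareness": the gap between two people's scores should be no larger than the distance between them. The sketch checks that condition on toy data; using L1 distance as the similarity metric is an assumption, and choosing that metric is the hard part in practice.

```python
from itertools import combinations

# Check a Lipschitz-style individual-fairness condition on toy data:
# |score(a) - score(b)| <= lipschitz * distance(a, b) for every pair.
# The L1 distance used here is an assumed similarity metric.

def distance(u: list[float], v: list[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def violations(features: dict[str, list[float]], scores: dict[str, float],
               lipschitz: float = 1.0) -> list[tuple[str, str]]:
    """Pairs of people whose scores differ by more than their distance allows."""
    bad = []
    for a, b in combinations(sorted(features), 2):
        if abs(scores[a] - scores[b]) > lipschitz * distance(features[a], features[b]):
            bad.append((a, b))
    return bad


features = {"p1": [0.2, 0.5], "p2": [0.2, 0.5], "p3": [0.9, 0.1]}
scores = {"p1": 0.1, "p2": 0.8, "p3": 0.9}
print(violations(features, scores))  # [('p1', 'p2')]
```

p1 and p2 have identical features but very different scores, which is exactly the outcome described above as unfair.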
0:17:09.539,0:17:13.079
But, there are certain protected attributes
0:17:13.079,0:17:15.199
that we wouldn't want someone
0:17:15.199,0:17:17.099
to discriminate based on.
0:17:17.099,0:17:20.069
And so, there's this other property,[br]Group Fairness.
0:17:20.069,0:17:22.249
So, we can look at the statistical parity
0:17:22.249,0:17:25.439
between groups, based on gender, race, etc.
0:17:25.439,0:17:28.049
and see if they're treated in a similar way.
0:17:28.049,0:17:30.409
And we might not expect that in some cases,
0:17:30.409,0:17:32.429
for example if the base rates in each group
0:17:32.429,0:17:34.659
are very different.
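Group fairness in the sense of statistical parity can be measured by comparing the rate of positive decisions across groups. A minimal sketch on toy data; the group labels and decisions are invented.

```python
# Statistical parity: compare the rate of positive decisions across groups.
# Group labels and decisions are toy data.

def positive_rate(decisions: list[int]) -> float:
    return sum(decisions) / len(decisions)


def statistical_parity_difference(by_group: dict[str, list[int]]) -> float:
    """Largest gap in positive-decision rate between any two groups (0 = parity)."""
    rates = [positive_rate(d) for d in by_group.values()]
    return max(rates) - min(rates)


by_group = {"group_a": [1, 1, 0, 0], "group_b": [1, 0, 0, 0]}
print(statistical_parity_difference(by_group))  # 0.25
```

As noted above, a nonzero gap is not automatically evidence of unfairness when base rates differ between groups; the metric tells you where to look, not what the answer is.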
0:17:34.659,0:17:36.889
And then there's also Fairness in Errors.
0:17:36.889,0:17:40.080
All predictive systems are gonna make errors,
0:17:40.080,0:17:42.989
and if the errors are concentrated,
0:17:42.989,0:17:46.399
then that may also represent unfairness.
0:17:46.399,0:17:50.149
And so this concern arose recently with Facebook
0:17:50.149,0:17:52.289
because people with Native American names
0:17:52.289,0:17:54.389
had their profiles flagged as fraudulent
0:17:54.389,0:17:58.759
far more often than those[br]with White American names.
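Fairness in errors can be checked by computing an error rate separately for each group, for example the false positive rate: the share of genuine profiles wrongly flagged as fake. The numbers below are invented for illustration and are not Facebook's data.

```python
# False positive rate per group: among profiles that are actually genuine,
# what fraction did the system flag as fake? Toy data only.

def false_positive_rate(flagged: list[bool], is_fake: list[bool]) -> float:
    genuine_flags = [f for f, fake in zip(flagged, is_fake) if not fake]
    return sum(genuine_flags) / len(genuine_flags)


groups = {
    "group_a": ([True, True, False, False], [False, False, False, False]),
    "group_b": ([True, False, False, False], [True, False, False, False]),
}
for name, (flagged, is_fake) in groups.items():
    print(name, false_positive_rate(flagged, is_fake))
# group_a 0.5
# group_b 0.0
```

Even a system with good overall accuracy can concentrate its mistakes on one group like this, which is why the error rates are worth reporting per group.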
0:17:58.759,0:18:00.559
So these are the sorts of things[br]that we worry about
0:18:00.559,0:18:02.190
and each of these is a metric,
0:18:02.190,0:18:04.239
and if you're interested in more, you should
0:18:04.239,0:18:06.159
check those 2 papers out.
0:18:06.159,0:18:10.639
So, how can potential issues[br]with predictive policing
0:18:10.639,0:18:13.850
have implications for these principles?
0:18:13.850,0:18:18.559
So, one problem is[br]the training data that's used.
0:18:18.559,0:18:21.059
Some of these systems only use crime statistics,
0:18:21.059,0:18:23.600
other systems-- all of them use crime statistics
0:18:23.600,0:18:25.619
in some way.
0:18:25.619,0:18:31.419
So, one problem is that crime databases
0:18:31.419,0:18:34.830
contain only crimes that've been detected.
0:18:34.830,0:18:38.629
Right? So, the police are only gonna detect
0:18:38.629,0:18:41.009
crimes that they know are happening,
0:18:41.009,0:18:44.109
either through patrol and their own investigation
0:18:44.109,0:18:46.320
or because they've been alerted to crime,
0:18:46.320,0:18:48.789
for example by a citizen calling the police.
0:18:48.789,0:18:52.179
So, a citizen has to feel like[br]they can call the police,
0:18:52.179,0:18:54.019
like that's a good idea.
0:18:54.019,0:18:58.789
So, some crimes suffer[br]from this problem less than others:
0:18:58.789,0:19:02.249
for example, gun violence[br]is much easier to detect
0:19:02.249,0:19:03.639
relative to fraud,
0:19:03.639,0:19:07.509
which is very difficult to detect.
0:19:07.509,0:19:11.940
Now the racial profiling aspect[br]of this might come in
0:19:11.940,0:19:15.590
because of biased policing in the past.
0:19:15.590,0:19:19.999
So, for example, for marijuana arrests,
0:19:19.999,0:19:22.619
black people are arrested in the U.S. at rates
0:19:22.619,0:19:25.119
4 times that of white people,
0:19:25.119,0:19:27.960
even though marijuana use is at statistical parity
0:19:27.960,0:19:31.389
between these 2 groups, to within a few percent.
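A short bit of arithmetic shows why this matters for training data: recorded counts are the product of how often an offense happens and how likely it is to be detected, and a model trained only on the records cannot tell those two factors apart. The population, rate, and detection probabilities below are assumed numbers, chosen so that the recorded gap matches the 4x arrest ratio mentioned above.

```python
# Recorded incidents depend on both how often something happens and how often
# it is detected. All numbers are illustrative assumptions.

def recorded_incidents(population: int, true_rate: float, detection_prob: float) -> float:
    return population * true_rate * detection_prob


# Same population size and same true rate in both groups...
group_a = recorded_incidents(10_000, 0.12, 0.02)
# ...but detection is assumed to be four times as likely in group B.
group_b = recorded_incidents(10_000, 0.12, 0.08)
print(round(group_b / group_a, 6))  # 4.0
```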
0:19:31.389,0:19:35.820
So, this is where problems can arise.
0:19:35.820,0:19:37.159
So, let's go back to this
0:19:37.159,0:19:38.749
geographic-level predictive policing.
0:19:38.749,0:19:42.460
So the danger here is that, unless this system
0:19:42.460,0:19:44.299
is very carefully constructed,
0:19:44.299,0:19:47.090
this sort of crime area ranking might
0:19:47.090,0:19:49.019
again become a self-fulfilling prophecy.
0:19:49.019,0:19:51.460
If you send police officers to these areas,
0:19:51.460,0:19:53.220
you further scrutinize them,
0:19:53.220,0:19:55.659
and then again you're only detecting a subset
0:19:55.659,0:19:57.979
of crimes, and the cycle continues.
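A toy simulation of that cycle, under simple assumptions: two areas have the same true amount of crime, police are sent each day to whichever area has more recorded crime, and crime is far more likely to be recorded where police are. The detection probabilities and starting counts are invented for illustration.

```python
# Toy feedback loop: two areas with the SAME true amount of crime.
# Each day police go to whichever area has more recorded crime, and crime is
# far more likely to be recorded where police are. Numbers are assumptions.

def simulate(days: int, true_crimes_per_day: float = 10.0,
             detect_patrolled: float = 0.6, detect_other: float = 0.1) -> list[float]:
    recorded = [11.0, 10.0]  # area 0 starts with slightly more recorded crime
    for _ in range(days):
        target = 0 if recorded[0] >= recorded[1] else 1  # patrol the "hotter" area
        for area in (0, 1):
            p = detect_patrolled if area == target else detect_other
            recorded[area] += true_crimes_per_day * p
    return recorded


print(simulate(10))  # [71.0, 20.0]
```

A one-incident head start becomes a more than threefold gap in the records even though the two areas are identical, and that gap is what the next round of predictions would be trained on.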
0:19:57.979,0:20:02.139
So, one obvious issue is that
0:20:02.139,0:20:07.599
this statement about geographic-based[br]crime prediction
0:20:07.599,0:20:10.229
being anonymous is not true,
0:20:10.229,0:20:13.159
because race and location are very strongly
0:20:13.159,0:20:14.840
correlated in the U.S.
0:20:14.840,0:20:16.609
And this is something that machine-learning[br]systems
0:20:16.609,0:20:20.049
can potentially learn.
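A minimal sketch of location acting as a proxy: a model that never sees race can still produce racially skewed outcomes when neighborhoods are segregated. The neighborhoods, group sizes, and the flagged set are hypothetical.

```python
# Location as a proxy: a model that never sees race can still produce racially
# skewed outcomes when neighborhoods are segregated. Toy numbers only.

residents = {                       # hypothetical residents per neighborhood
    "north": {"group_a": 900, "group_b": 100},
    "south": {"group_a": 100, "group_b": 900},
}
flagged = {"south"}                 # output of a location-only model


def share_flagged(group: str) -> float:
    """Fraction of a group's residents who live in a flagged neighborhood."""
    total = sum(counts[group] for counts in residents.values())
    in_flagged = sum(residents[hood][group] for hood in flagged)
    return in_flagged / total


print(share_flagged("group_a"), share_flagged("group_b"))  # 0.1 0.9
```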
0:20:20.049,0:20:23.039
Another issue is that, for example,
0:20:23.039,0:20:25.580
for individual fairness, one of my homes
0:20:25.580,0:20:27.599
sits within one of these boxes.
0:20:27.599,0:20:29.950
Some of these boxes[br]in these systems are very small,
0:20:29.950,0:20:33.399
for example, PredPol's boxes are 500 ft x 500 ft,
0:20:33.399,0:20:36.349
so it's maybe only a few houses.
0:20:36.349,0:20:39.149
So, the implications of this system are that
0:20:39.149,0:20:40.849
you have police officers maybe sitting
0:20:40.849,0:20:42.979
in a police cruiser outside your home
0:20:42.979,0:20:45.450
and a few doors down someone
0:20:45.450,0:20:46.799
may not be within that box,
0:20:46.799,0:20:48.159
and doesn't have this.
0:20:48.159,0:20:51.399
So, that may represent unfairness.
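The boundary effect is easy to see with the same kind of grid mapping: two homes a few doors apart can land in different cells, so only one of them sits inside a flagged box. The 500 ft cell size follows the PredPol figure above; the coordinates and the flagged cell are made up.

```python
# Two homes a few doors apart can land in different grid cells, so only one of
# them falls inside a flagged box. Coordinates (in feet) are made up.

CELL_SIZE_FT = 500.0


def cell_of(x_ft: float, y_ft: float, size: float = CELL_SIZE_FT) -> tuple[int, int]:
    return (int(x_ft // size), int(y_ft // size))


flagged_cells = {(0, 0)}
my_home = (470.0, 200.0)
neighbor = (530.0, 200.0)   # 60 ft away
for home in (my_home, neighbor):
    print(home, cell_of(*home) in flagged_cells)
# (470.0, 200.0) True
# (530.0, 200.0) False
```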
0:20:51.399,0:20:54.929
So, there are real questions here,
0:20:54.929,0:20:57.720
especially because there's no opt-out.
0:20:57.720,0:21:00.059
There's no way to opt-out of this system:
0:21:00.059,0:21:02.239
if you live in a city that has this,
0:21:02.239,0:21:04.909
then you have to deal with it.
0:21:04.909,0:21:07.229
So, it's quite difficult to find out
0:21:07.229,0:21:09.879
what's really going on
0:21:09.879,0:21:11.169
because the algorithm is secret.
0:21:11.169,0:21:13.049
And, in most cases, we don't know
0:21:13.049,0:21:14.789
the full details of the inputs.
0:21:14.789,0:21:16.679
We have some idea[br]about what features are used,
0:21:16.679,0:21:17.970
but that's about it.
0:21:17.970,0:21:19.509
We also don't know the output.
0:21:19.509,0:21:21.899
That would be knowing police allocation,
0:21:21.899,0:21:23.179
police strategies,
0:21:23.179,0:21:26.299
and in order to nail down[br]what's really going on here
0:21:26.299,0:21:28.609
in order to verify the validity of
0:21:28.609,0:21:30.009
these companies' claims,
0:21:30.009,0:21:33.799
it may be necessary[br]to have a 3rd party come in,
0:21:33.799,0:21:35.629
examine the inputs and outputs of the system,
0:21:35.629,0:21:37.590
and say concretely what's going on.
0:21:37.590,0:21:39.460
And if everything is fine and dandy
0:21:39.460,0:21:40.929
then this shouldn't be a problem.
0:21:40.929,0:21:43.619
So, that's potentially one role that
0:21:43.619,0:21:44.769
advocates can play.
0:21:44.769,0:21:46.720
Maybe we should start pushing for audits
0:21:46.720,0:21:48.820
of systems that are used in this way.
0:21:48.820,0:21:50.970
These could have serious implications
0:21:50.970,0:21:52.679
for people's lives.
0:21:52.679,0:21:55.249
So, we'll return[br]to this idea a little bit later,
0:21:55.249,0:21:58.210
but for now this leads us[br]nicely to Transparency.
0:21:58.210,0:21:59.419
So, we wanna know
0:21:59.419,0:22:01.929
what these systems are doing.
0:22:01.929,0:22:04.729
But it's very hard,[br]for the reasons described earlier,
0:22:04.729,0:22:06.139
and even in the case of something like
0:22:06.139,0:22:09.849
trying to understand Google's search algorithm,
0:22:09.849,0:22:11.679
it's difficult because it's personalized.
0:22:11.679,0:22:13.529
So, by construction, each user is
0:22:13.529,0:22:15.320
only seeing one endpoint.
0:22:15.320,0:22:18.169
So, it's a very isolating system.
0:22:18.169,0:22:20.349
What do other people see?
0:22:20.349,0:22:22.409
And one reason it's difficult to make
0:22:22.409,0:22:24.099
some of these systems transparent
0:22:24.099,0:22:26.679
is because of, simply, the complexity
0:22:26.679,0:22:27.950
of the algorithms.
0:22:27.950,0:22:30.309
So, an algorithm can become so complex that
0:22:30.309,0:22:31.669
it's difficult to comprehend,
0:22:31.669,0:22:33.289
even for the designer of the system,
0:22:33.289,0:22:35.509
or the implementer of the system.
0:22:35.509,0:22:38.419
The designer might know that this algorithm
0:22:38.419,0:22:42.889
maximizes some metric-- say, accuracy,
0:22:42.889,0:22:44.570
but they may not always have a solid
0:22:44.570,0:22:46.779
understanding of what the algorithm is doing
0:22:46.779,0:22:48.330
for all inputs.
0:22:48.330,0:22:50.970
Certainly not with respect to fairness.
0:22:50.970,0:22:55.759
So, in some cases,[br]it might not be appropriate to use
0:22:55.759,0:22:57.379
an extremely complex model.
0:22:57.379,0:22:59.529
It might be better to use a simpler system
0:22:59.529,0:23:02.910
with human-interpretable features.
0:23:02.910,0:23:04.749
Another issue that arises
0:23:04.749,0:23:07.559
from the opacity of these systems
0:23:07.559,0:23:09.409
and the centralized control
0:23:09.409,0:23:11.860
is that it makes them very influential.
0:23:11.860,0:23:13.950
And thus, an excellent target
0:23:13.950,0:23:16.210
for manipulation or tampering.
0:23:16.210,0:23:18.479
So, this might be tampering that is done
0:23:18.479,0:23:21.950
by an organization that controls the system,
0:23:21.950,0:23:23.769
or an insider at one of the organizations,
0:23:23.769,0:23:27.139
or anyone who's able to compromise their security.
0:23:27.139,0:23:30.249
So, this is an interesting piece of academic work
0:23:30.249,0:23:32.099
that looked at the possibility of
0:23:32.099,0:23:34.159
slightly modifying search rankings
0:23:34.159,0:23:36.619
to shift people's political views.
0:23:36.619,0:23:39.009
So, since people are most likely to
0:23:39.009,0:23:41.330
click on the top search results,
0:23:41.330,0:23:44.429
and 90% of clicks go to the[br]first page of search results,
0:23:44.429,0:23:46.719
then perhaps by reshuffling[br]things a little bit,
0:23:46.719,0:23:48.729
or maybe dropping some search results,
0:23:48.729,0:23:50.269
you can influence people's views
0:23:50.269,0:23:51.679
in a coherent way,
0:23:51.679,0:23:53.090
and maybe you can make it so subtle
0:23:53.090,0:23:55.749
that no one is able to notice.
0:23:55.749,0:23:57.249
So in this academic study,
0:23:57.249,0:24:00.349
they did an experiment
0:24:00.349,0:24:02.070
in the 2014 Indian election.
0:24:02.070,0:24:04.219
So they used real voters,
0:24:04.219,0:24:06.450
and they kept the size[br]of the experiment small enough
0:24:06.450,0:24:08.190
that it was not going to influence the outcome
0:24:08.190,0:24:10.090
of the election.
0:24:10.090,0:24:12.139
So the researchers took people,
0:24:12.139,0:24:14.229
they determined their political leaning,
0:24:14.229,0:24:17.429
and they segmented them into[br]control and treatment groups,
0:24:17.429,0:24:19.269
where the treatment was manipulation
0:24:19.269,0:24:21.210
of the search ranking results.
0:24:21.210,0:24:24.409
And then they had these people[br]browse the web.
0:24:24.409,0:24:25.969
And what they found is that
0:24:25.969,0:24:28.229
this mechanism is very effective at shifting
0:24:28.229,0:24:30.429
people's voter preferences.
0:24:30.429,0:24:33.649
So, in this study, they were able to introduce
0:24:33.649,0:24:36.849
a 20% shift in voter preferences.
0:24:36.849,0:24:39.299
Even when they alerted users to the fact that this
0:24:39.299,0:24:41.729
was going to be done, telling them
0:24:41.729,0:24:44.049
"we are going to manipulate your search results,"
0:24:44.049,0:24:45.729
"really pay attention,"
0:24:45.729,0:24:49.099
they were totally unable to decrease
0:24:49.099,0:24:50.859
the magnitude of the effect.
0:24:50.859,0:24:55.109
So, the margins of error in many elections
0:24:55.109,0:24:57.669
are incredibly small,
0:24:57.669,0:24:59.929
and the authors estimate that this shift
0:24:59.929,0:25:02.009
could change the outcome of about
0:25:02.009,0:25:07.109
25% of elections worldwide, if this were done.
0:25:07.109,0:25:10.919
And the bias is so small that no one can tell.
0:25:10.919,0:25:14.279
So, no matter how smart
0:25:14.279,0:25:17.109
and resistant to manipulation[br]we think we are,
0:25:17.109,0:25:21.909
all of us are subject to this sort of manipulation,
0:25:21.909,0:25:24.320
and we really can't tell.
0:25:24.320,0:25:27.129
So, I'm not saying that this is occurring,
0:25:27.129,0:25:31.389
but right now there is no[br]regulation to stop this,
0:25:31.389,0:25:34.409
there is no way we could reliably detect this,
0:25:34.409,0:25:37.210
so there's a huge amount of power here.
0:25:37.210,0:25:39.779
So, something to think about.
0:25:39.779,0:25:42.710
But it's not only corporations that are interested
0:25:42.710,0:25:47.269
in this sort of behavioral manipulation.
0:25:47.269,0:25:51.119
In 2010, UK Prime Minister David Cameron
0:25:51.119,0:25:54.969
created the UK Behavioural Insights Team,
0:25:54.969,0:25:57.269
which is informally called the Nudge Unit.
0:25:57.269,0:26:01.489
And so what they do is[br]they use behavioral science
0:26:01.489,0:26:04.769
and this predictive analytics approach,
0:26:04.769,0:26:06.119
with experimentation,
0:26:06.119,0:26:07.940
to have people make better decisions
0:26:07.940,0:26:09.690
for themselves and society--
0:26:09.690,0:26:11.989
as determined by the UK government.
0:26:11.989,0:26:14.269
And as of a few months ago,
0:26:14.269,0:26:16.849
after an executive order signed by Obama
0:26:16.849,0:26:19.349
in September, the United States now has
0:26:19.349,0:26:21.429
its own Nudge Unit.
0:26:21.429,0:26:24.009
So, to be clear, I don't think that this is
0:26:24.009,0:26:25.539
some sort of malicious plot.
0:26:25.539,0:26:27.440
I think that there can be huge value
0:26:27.440,0:26:29.489
in these sorts of initiatives,
0:26:29.489,0:26:31.330
positively impacting people's lives,
0:26:31.330,0:26:34.179
but when this sort of behavioral manipulation
0:26:34.179,0:26:37.289
is being done, in part openly,
0:26:37.289,0:26:39.460
oversight is pretty important,
0:26:39.460,0:26:41.700
and we really need to consider
0:26:41.700,0:26:46.090
what these systems are optimizing for.
0:26:46.090,0:26:47.849
And that's something that we might
0:26:47.849,0:26:52.090
not always know, or at least understand,
0:26:52.090,0:26:54.450
but for industry, for example,
0:26:54.450,0:26:57.679
we do have a pretty good understanding there:
0:26:57.679,0:26:59.809
industry cares about optimizing for
0:26:59.809,0:27:01.960
the time spent on the website,
0:27:01.960,0:27:04.929
Facebook wants you to spend more time on Facebook,
0:27:04.929,0:27:06.950
they want you to click on ads,
0:27:06.950,0:27:09.109
click on newsfeed items,
0:27:09.109,0:27:11.299
they want you to like things.
0:27:11.299,0:27:14.309
And, fundamentally: profit.
0:27:14.309,0:27:17.599
So, already this has some serious implications,
0:27:17.599,0:27:19.690
and it has had pretty serious implications
0:27:19.690,0:27:22.190
over the last 10 years, in media for example.
0:27:22.190,0:27:25.119
Optimizing for click-through rate in journalism
0:27:25.119,0:27:26.629
has produced a race to the bottom
0:27:26.629,0:27:28.039
in terms of quality.
0:27:28.039,0:27:30.919
And another issue is that optimizing
0:27:30.919,0:27:34.589
for what people like might not always be
0:27:34.589,0:27:35.839
the best approach.
0:27:35.839,0:27:38.859
So, Facebook officials have spoken publicly
0:27:38.859,0:27:41.279
about how Facebook's goal is to make you happy,
0:27:41.279,0:27:43.149
they want you to open that newsfeed
0:27:43.149,0:27:45.080
and just feel great.
0:27:45.080,0:27:47.379
But, there's an issue there, right?
0:27:47.379,0:27:50.169
Because many people,
0:27:50.169,0:27:52.369
like 40% of people according to Pew Research,
0:27:52.369,0:27:54.599
get their news from Facebook.
0:27:54.599,0:27:58.460
So, if people don't want to see
0:27:58.460,0:28:01.239
war and corpses,[br]because it makes them feel sad,
0:28:01.239,0:28:04.179
then this is not a system that is gonna optimize
0:28:04.179,0:28:07.149
for an informed population.
0:28:07.149,0:28:09.359
It's not gonna produce a population that is
0:28:09.359,0:28:11.469
ready to engage in civic life.
0:28:11.469,0:28:13.059
It's gonna produce an amused population
0:28:13.059,0:28:16.809
whose time is occupied by cat pictures.
0:28:16.809,0:28:19.159
So, in politics, we have a similar
0:28:19.159,0:28:21.269
optimization problem that's occurring.
0:28:21.269,0:28:23.769
So, these political campaigns that use
0:28:23.769,0:28:26.769
these predictive systems,
0:28:26.769,0:28:28.669
are optimizing for votes for the desired candidate,
0:28:28.669,0:28:30.200
of course.
0:28:30.200,0:28:33.499
So, instead of a political campaign being
0:28:33.499,0:28:36.139
--well, maybe this is a naive view, but--
0:28:36.139,0:28:38.070
being an open discussion of the issues
0:28:38.070,0:28:39.830
facing the country,
0:28:39.830,0:28:43.200
it becomes this micro-targeted[br]persuasion game,
0:28:43.200,0:28:44.669
and the people that get targeted
0:28:44.669,0:28:47.349
are a very small subset of all people,
0:28:47.349,0:28:49.399
and it's only gonna be people that are
0:28:49.399,0:28:51.409
you know, on the edge, maybe uninterested,
0:28:51.409,0:28:54.399
those are the people that are gonna get attention
0:28:54.399,0:28:58.839
from political candidates.
0:28:58.839,0:29:01.869
In policy, as with these Nudge Units,
0:29:01.869,0:29:03.539
they're being used to enable
0:29:03.539,0:29:06.109
better use of government services.
0:29:06.109,0:29:07.419
There are some good projects that have
0:29:07.419,0:29:09.419
come out of this:
0:29:09.419,0:29:11.409
increasing voter registration,
0:29:11.409,0:29:12.739
improving health outcomes,
0:29:12.739,0:29:14.419
improving education outcomes.
0:29:14.419,0:29:16.419
But some of these predictive systems
0:29:16.419,0:29:18.229
that we're starting to see in government
0:29:18.229,0:29:20.700
are optimizing for compliance,
0:29:20.700,0:29:23.669
as is the case with predictive policing.
0:29:23.669,0:29:25.460
So this is something that we need to
0:29:25.460,0:29:28.649
watch carefully.
0:29:28.649,0:29:30.119
I think this is a nice quote that
0:29:30.119,0:29:33.339
sort of describes the problem.
0:29:33.339,0:29:35.200
In some ways we might be narrowing
0:29:35.200,0:29:38.259
our horizon, and the danger is that
0:29:38.259,0:29:41.989
these tools are separating people.
0:29:41.989,0:29:43.570
And this is particularly bad
0:29:43.570,0:29:45.940
for political action, because political action
0:29:45.940,0:29:49.879
requires people to have shared experience,
0:29:49.879,0:29:53.799
and thus to be able to act collectively
0:29:53.799,0:29:57.629
to exert pressure to fix problems.
0:29:57.629,0:30:00.810
So, finally: accountability.
0:30:00.810,0:30:03.399
So, we need some oversight mechanisms.
0:30:03.399,0:30:06.519
For example, in the case of errors--
0:30:06.519,0:30:08.219
so this is particularly important for
0:30:08.219,0:30:10.849
civil or bureaucratic systems.
0:30:10.849,0:30:14.330
So, when an algorithm produces some decision,
0:30:14.330,0:30:16.549
we don't always want humans to just
0:30:16.549,0:30:18.039
defer to the machine,
0:30:18.039,0:30:21.859
and that might represent one of the problems.
0:30:21.859,0:30:25.419
So, there are starting to be some cases
0:30:25.419,0:30:28.039
of computer algorithms yielding a decision,
0:30:28.039,0:30:30.409
and then humans being unable to correct
0:30:30.409,0:30:31.799
an obvious error.
0:30:31.799,0:30:35.190
So there's this case in Georgia,[br]in the United States,
0:30:35.190,0:30:37.259
where 2 young people went to
0:30:37.259,0:30:38.529
the Department of Motor Vehicles,
0:30:38.529,0:30:39.749
they're twins, and they went
0:30:39.749,0:30:42.099
to get their driver's licenses.
0:30:42.099,0:30:44.979
However, they were both flagged by
0:30:44.979,0:30:47.489
a fraud algorithm that uses facial recognition
0:30:47.489,0:30:48.809
to look for similar faces,
0:30:48.809,0:30:50.919
and I guess the people that designed the system
0:30:50.919,0:30:54.549
didn't think of the possibility of twins.
0:30:54.549,0:30:58.489
Yeah.[br]So, they just left
0:30:58.489,0:30:59.889
without their driver's licenses.
0:30:59.889,0:31:01.889
The people in the Department of Motor Vehicles
0:31:01.889,0:31:03.809
were unable to correct this.
0:31:03.809,0:31:06.820
So, this is one implication--
0:31:06.820,0:31:08.579
it's like something out of Kafka.
0:31:08.579,0:31:11.529
But there are also cases of errors being made,
0:31:11.529,0:31:13.879
and people not noticing until
0:31:13.879,0:31:15.909
after actions have been taken,
0:31:15.909,0:31:17.570
some of them very serious--
0:31:17.570,0:31:19.129
because people simply deferred
0:31:19.129,0:31:20.619
to the machine.
0:31:20.619,0:31:23.309
So, this is an example from San Francisco.
0:31:23.309,0:31:26.679
So, an ALPR-- an Automated License Plate Reader--
0:31:26.679,0:31:29.429
is a device that uses image recognition
0:31:29.429,0:31:32.099
to detect and read license plates,
0:31:32.099,0:31:34.339
and usually to compare license plates
0:31:34.339,0:31:37.159
with a known list of plates of interest.
0:31:37.159,0:31:39.799
And, so, San Francisco uses these
0:31:39.799,0:31:42.179
and they're mounted on police cars.
0:31:42.179,0:31:46.659
So, in this case, a San Francisco ALPR
0:31:46.659,0:31:48.879
got a hit on a car,
0:31:48.879,0:31:53.029
and it was the car of a 47-year-old woman,
0:31:53.029,0:31:54.839
with no criminal history.
0:31:54.839,0:31:56.029
And so it was a false hit
0:31:56.029,0:31:58.099
because it was a blurry image,
0:31:58.099,0:31:59.709
and it matched erroneously with
0:31:59.709,0:32:00.909
one of the plates of interest
0:32:00.909,0:32:03.479
that happened to be a stolen vehicle.
0:32:03.479,0:32:06.869
So, they conduct a traffic stop on her,
0:32:06.869,0:32:09.330
and they take her out of the vehicle,
0:32:09.330,0:32:11.049
they search her and the vehicle,
0:32:11.049,0:32:12.659
she gets a pat-down,
0:32:12.659,0:32:14.849
and they have her kneel
0:32:14.849,0:32:17.780
at gunpoint, in the street.
0:32:17.780,0:32:20.989
So, how much oversight should be present
0:32:20.989,0:32:23.999
depends on the implications of the system.
0:32:23.999,0:32:25.279
It's certainly the case that
0:32:25.279,0:32:26.910
for some of these decision-making systems,
0:32:26.910,0:32:29.219
an error might not be that important,
0:32:29.219,0:32:31.149
it could be relatively harmless,
0:32:31.149,0:32:33.559
but in this case,[br]an error in this algorithmic decision
0:32:33.559,0:32:36.259
led to this totally innocent person
0:32:36.259,0:32:40.019
literally having a gun pointed at her.
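A minimal sketch of how tolerant plate matching can turn a blurry read into a false hit, and where a human check belongs. It assumes the reader compares each OCR'd plate against a hot list and tolerates one mismatched character to cope with misreads; that design, the `max_mismatch` parameter, and the plate strings are hypothetical, since the talk does not describe how the San Francisco system matched plates.

```python
# Hypothetical hot list containing the plate of a stolen vehicle.
HOT_LIST = {"7ABC123"}


def mismatches(a: str, b: str) -> int:
    """Count position-wise character differences, plus any length difference."""
    return sum(ca != cb for ca, cb in zip(a, b)) + abs(len(a) - len(b))


def check_plate(ocr_read: str, max_mismatch: int = 1) -> str:
    """Classify an OCR'd plate against the hot list.

    Allowing max_mismatch > 0 tolerates OCR errors, but it also admits false
    hits on blurry images, so fuzzy hits are marked as needing human review.
    """
    best = min(mismatches(ocr_read, plate) for plate in HOT_LIST)
    if best == 0:
        return "exact hit: confirm plate and vehicle before acting"
    if best <= max_mismatch:
        return "fuzzy hit: unverified, an officer must compare the plate by eye"
    return "no hit"


# A blurry image where the final "3" is misread as "8".
print(check_plate("7ABC128"))
```

The point is not the matching rule itself but that any nonzero tolerance produces hits a human has to check before acting, rather than simply deferring to the machine.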
0:32:40.019,0:32:44.019
So, that brings us to: we need some way of
0:32:44.019,0:32:45.419
getting some information about
0:32:45.419,0:32:47.249
what is going on here.
0:32:47.249,0:32:50.179
We don't wanna have to wait for these events
0:32:50.179,0:32:52.580
before we are able to determine
0:32:52.580,0:32:54.409
some information about the system.
0:32:54.409,0:32:56.139
So, auditing is one option:
0:32:56.139,0:32:58.109
to independently verify the statements
0:32:58.109,0:33:00.809
of companies, in situations where we have
0:33:00.809,0:33:02.939
inputs and outputs.
0:33:02.939,0:33:05.200
So, for example, this could be done with
0:33:05.200,0:33:07.489
Google, Facebook.
0:33:07.489,0:33:09.190
If you have the inputs of a system,
0:33:09.190,0:33:10.649
say you have test accounts,
0:33:10.649,0:33:11.729
or real accounts,
0:33:11.729,0:33:14.359
maybe you can collect[br]people's information together.
0:33:14.359,0:33:15.830
So that was something that was done
0:33:15.830,0:33:18.759
during the 2012 Obama campaign
0:33:18.759,0:33:20.249
by ProPublica.
0:33:20.249,0:33:21.269
People noticed that they were getting
0:33:21.269,0:33:24.739
different emails from the Obama campaign,
0:33:24.739,0:33:26.009
and were interested to see
0:33:26.009,0:33:28.209
based on what factors
0:33:28.209,0:33:29.749
the emails were changing.
0:33:29.749,0:33:32.659
So, I think about 200 people submitted emails
0:33:32.659,0:33:34.940
and they were able to determine some information
0:33:34.940,0:33:38.809
about what the emails[br]were being varied based on.
0:33:38.809,0:33:40.859
So there have been some successful
0:33:40.859,0:33:43.080
attempts at this.
0:33:43.080,0:33:45.919
So, compare inputs and then look at
0:33:45.919,0:33:48.709
why one item was shown to one user
0:33:48.709,0:33:50.289
and not another, and see if there are
0:33:50.289,0:33:51.879
any statistical differences.
0:33:51.879,0:33:56.279
So, there are some potential legal issues
0:33:56.279,0:33:57.749
with the test accounts, so that's something
0:33:57.749,0:34:01.499
to think about-- I'm not a lawyer.
0:34:01.499,0:34:03.919
So, for example, if you wanna examine
0:34:03.919,0:34:06.269
ad-targeting algorithms,
0:34:06.269,0:34:07.969
one way to proceed is to construct
0:34:07.969,0:34:10.589
a browsing profile, and then examine
0:34:10.589,0:34:12.989
what ads are served back to you.
0:34:12.989,0:34:14.119
And so this is something that
0:34:14.119,0:34:16.250
academic researchers have looked at,
0:34:16.250,0:34:17.489
because, at the time at least,
0:34:17.489,0:34:20.879
you didn't need to make an account to do this.
0:34:20.879,0:34:24.768
So, this was a study that was presented at
0:34:24.768,0:34:27.799
Privacy Enhancing Technologies last year,
0:34:27.799,0:34:31.149
and in this study, the researchers
0:34:31.149,0:34:33.179
generated some browsing profiles
0:34:33.179,0:34:35.909
that differ only by one characteristic,
0:34:35.909,0:34:37.690
so they're basically identical in every way
0:34:37.690,0:34:39.049
except for one thing.
0:34:39.049,0:34:42.359
And that is denoted by Treatment 1 and 2.
0:34:42.359,0:34:44.460
So this is a randomized, controlled trial,
0:34:44.460,0:34:46.389
but I left out the randomization part
0:34:46.389,0:34:48.220
for simplicity.
0:34:48.220,0:34:54.799
So, in one study,[br]they applied a treatment of gender.
0:34:54.799,0:34:56.799
So, they had the browsing profiles
0:34:56.799,0:34:59.319
in Treatment 1 be male browsing profiles,
0:34:59.319,0:35:02.029
and the browsing profiles in Treatment 2[br]be female.
0:35:02.029,0:35:04.430
And they wanted to see: is there any difference
0:35:04.430,0:35:06.079
in the way that ads are targeted
0:35:06.079,0:35:08.710
if browsing profiles are effectively identical
0:35:08.710,0:35:11.019
except for gender?
0:35:11.019,0:35:14.710
So, it turns out that there was.
0:35:14.710,0:35:19.180
So, a 3rd-party site was showing Google ads
0:35:19.180,0:35:21.289
for senior executive positions
0:35:21.289,0:35:23.980
at a rate 6 times higher to the fake men
0:35:23.980,0:35:27.059
than to the fake women in this study.
0:35:27.059,0:35:30.109
So, this sort of auditing is not going to
0:35:30.109,0:35:32.779
be able to determine everything
0:35:32.779,0:35:34.930
that algorithms are doing, but it can
0:35:34.930,0:35:36.519
sometimes uncover interesting differences,
0:35:36.519,0:35:40.900
at least statistical ones.
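One standard way to ask whether a difference between two treatment groups is statistically meaningful is a two-sample permutation test. The sketch below is a simplified stand-in, not AdFisher's actual analysis. The per-profile ad counts are made up for illustration and are not the study's data.

```python
import random


def permutation_test(group_a, group_b, n_rounds=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Shuffles the pooled observations, re-splits them into groups of the
    original sizes, and returns an estimate of how often a random labeling
    gives a mean difference at least as large as the observed one.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            extreme += 1
    # Add-one smoothing keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_rounds + 1)


# Made-up counts: how often each browsing profile saw a given job ad.
treatment_1 = [6, 5, 7, 4, 6, 8, 5, 6]  # e.g. profiles labelled male
treatment_2 = [1, 0, 2, 1, 1, 0, 1, 2]  # e.g. profiles labelled female

print(permutation_test(treatment_1, treatment_2))
```

Because profiles are assigned to treatments at random, a small p-value says the gap is unlikely to be chance. It still says nothing about *why* the algorithm behaves that way, which matches the limits of this kind of auditing.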
0:35:40.900,0:35:47.099
So, this leads us to the fundamental issue:
0:35:47.099,0:35:49.180
Right now, we're really not in control
0:35:49.180,0:35:50.510
of some of these systems,
0:35:50.510,0:35:54.480
and we really need these predictive systems
0:35:54.480,0:35:56.119
to be controlled by us,
0:35:56.119,0:35:57.819
in order for them not to be used
0:35:57.819,0:36:00.109
as a system of control.
0:36:00.109,0:36:03.220
So there are some technologies that I'd like
0:36:03.220,0:36:06.890
to point you all to.
0:36:06.890,0:36:08.319
We need tools in the digital commons
0:36:08.319,0:36:11.160
that can help address some of these concerns.
0:36:11.160,0:36:13.349
So, the first thing is that of course
0:36:13.349,0:36:14.730
we know that minimizing the amount of
0:36:14.730,0:36:17.069
data available can help in some contexts,
0:36:17.069,0:36:18.980
which we can do by making systems
0:36:18.980,0:36:22.779
that are private by design, and by default.
0:36:22.779,0:36:24.549
Another thing is that these audit tools
0:36:24.549,0:36:25.890
might be useful.
0:36:25.890,0:36:30.720
And, so, there are 2 nice examples from academia...
0:36:30.720,0:36:34.359
the ad experiment that I just showed was done
0:36:34.359,0:36:36.120
using AdFisher.
0:36:36.120,0:36:38.200
So, these are 2 toolkits that you can use
0:36:38.200,0:36:41.440
to start doing this sort of auditing.
0:36:41.440,0:36:44.579
Another technology that is generally useful,
0:36:44.579,0:36:46.700
and particularly useful in the case of prediction,
0:36:46.700,0:36:48.789
is anonymity: it's important to maintain access to
0:36:48.789,0:36:50.289
as many sites as possible
0:36:50.289,0:36:52.589
through anonymity systems like Tor,
0:36:52.589,0:36:54.319
because it's impossible to personalize
0:36:54.319,0:36:55.650
when everyone looks the same.
0:36:55.650,0:36:59.130
So this is a very important technology.
0:36:59.130,0:37:01.519
Something that doesn't really exist,
0:37:01.519,0:37:03.630
but that I think is pretty important,
0:37:03.630,0:37:05.829
is having some tool to view the landscape.
0:37:05.829,0:37:08.160
So, as we know from these few studies
0:37:08.160,0:37:10.440
that have been done,
0:37:10.440,0:37:12.059
different people are not seeing the internet
0:37:12.059,0:37:12.950
in the same way.
0:37:12.950,0:37:15.730
This is one reason why we don't like censorship.
0:37:15.730,0:37:17.880
But beyond censorship,
0:37:17.880,0:37:19.659
from academic research we know that
0:37:19.659,0:37:23.790
there is widespread price discrimination[br]on the internet,
0:37:23.790,0:37:25.650
so rich and poor people see a different view
0:37:25.650,0:37:26.970
of the Internet,
0:37:26.970,0:37:28.400
men and women see a different view
0:37:28.400,0:37:29.940
of the Internet.
0:37:29.940,0:37:31.200
We wanna know how different people
0:37:31.200,0:37:32.450
see the same site,
0:37:32.450,0:37:34.329
and this could be the beginning of
0:37:34.329,0:37:36.329
a defense system for this sort of
0:37:36.329,0:37:41.730
manipulation and tampering that I showed earlier.
0:37:41.730,0:37:45.549
Another interesting approach is obfuscation:
0:37:45.549,0:37:46.980
injecting noise into the system.
0:37:46.980,0:37:49.190
So there's an interesting browser extension
0:37:49.190,0:37:51.720
called AdNauseam, for Firefox,
0:37:51.720,0:37:54.579
which clicks on every single ad you're served,
0:37:54.579,0:37:55.680
to inject noise.
0:37:55.680,0:37:57.019
So that's, I think, an interesting approach
0:37:57.019,0:38:00.170
that people haven't looked at too much.
0:38:00.170,0:38:03.780
So in terms of policy,
0:38:03.780,0:38:06.530
Facebook and Google, these internet giants,
0:38:06.530,0:38:08.829
have billions of users,
0:38:08.829,0:38:12.220
and sometimes they like to call themselves
0:38:12.220,0:38:13.769
new public utilities,
0:38:13.769,0:38:15.000
and if that's the case then
0:38:15.000,0:38:17.549
it might be necessary to subject them
0:38:17.549,0:38:20.539
to additional regulation.
0:38:20.539,0:38:21.990
Another problem that's come up,
0:38:21.990,0:38:23.539
for example with some of the studies
0:38:23.539,0:38:24.900
that Facebook has done,
0:38:24.900,0:38:29.039
is sometimes a lack of ethics review.
0:38:29.039,0:38:31.059
So, for example, in academia,
0:38:31.059,0:38:33.859
if you're gonna do research involving humans,
0:38:33.859,0:38:35.390
there's an Institutional Review Board
0:38:35.390,0:38:36.970
that you go to that verifies that
0:38:36.970,0:38:39.140
you're doing things in an ethical manner.
0:38:39.140,0:38:40.910
And some companies do have internal
0:38:40.910,0:38:43.029
review processes like this, but it might
0:38:43.029,0:38:45.119
be important to have an independent
0:38:45.119,0:38:48.200
ethics board that does this sort of thing.
0:38:48.200,0:38:50.849
And we really need 3rd-party auditing.
0:38:50.849,0:38:54.519
So, for example, some companies
0:38:54.519,0:38:56.220
don't want auditing to be done
0:38:56.220,0:38:59.190
because of IP concerns,
0:38:59.190,0:39:00.579
and if that's the concern
0:39:00.579,0:39:03.180
maybe having a set of people
0:39:03.180,0:39:05.680
that are not paid by the company
0:39:05.680,0:39:07.200
to check how some of these systems
0:39:07.200,0:39:08.640
are being implemented,
0:39:08.640,0:39:11.240
could help give us confidence that
0:39:11.240,0:39:16.979
things are being done in a reasonable way.
0:39:16.979,0:39:20.269
So, in closing,
0:39:20.269,0:39:23.180
algorithmic decision making is here,
0:39:23.180,0:39:26.140
and it's barreling forward[br]at a very fast rate,
0:39:26.140,0:39:27.890
and we need to figure out what
0:39:27.890,0:39:30.410
the guide rails should be,
0:39:30.410,0:39:31.380
and how to install them
0:39:31.380,0:39:33.119
to handle some of the potential threats.
0:39:33.119,0:39:35.470
There's a huge amount of power here.
0:39:35.470,0:39:37.910
We need more openness in these systems.
0:39:37.910,0:39:39.589
And, right now,
0:39:39.589,0:39:41.559
with the intelligent systems that do exist,
0:39:41.559,0:39:43.920
we don't really know what's occurring,
0:39:43.920,0:39:46.510
and we need to watch carefully
0:39:46.510,0:39:49.099
where and how these systems are being used.
0:39:49.099,0:39:50.690
And I think this community has
0:39:50.690,0:39:53.940
an important role to play in this fight,
0:39:53.940,0:39:55.730
to study what's being done,
0:39:55.730,0:39:57.160
to show people what's being done,
0:39:57.160,0:39:58.670
to raise the debate and advocate,
0:39:58.670,0:40:01.200
and, where necessary, to resist.
0:40:01.200,0:40:03.339
Thanks.
0:40:03.339,0:40:13.129
applause
0:40:13.129,0:40:17.519
Herald: So, let's have a question and answer.
0:40:17.519,0:40:19.080
Microphone 2, please.
0:40:19.080,0:40:20.199
Mic 2: Hi there.
0:40:20.199,0:40:23.259
Thanks for the talk.
0:40:23.259,0:40:26.230
Since this pre-crime software has also
0:40:26.230,0:40:27.359
arrived here in Germany
0:40:27.359,0:40:29.680
with the start of the so-called CopWatch system
0:40:29.680,0:40:32.779
in southern Germany,[br]and Bavaria and Nuremberg especially,
0:40:32.779,0:40:35.420
where they try to predict burglary crime
0:40:35.420,0:40:37.460
using criminal-record and
0:40:37.460,0:40:40.170
geographical analysis, like you explained,
0:40:40.170,0:40:43.380
this leads me to a 2-fold question:
0:40:43.380,0:40:47.900
first, have you heard of any research
0:40:47.900,0:40:49.760
that measures the effectiveness
0:40:49.760,0:40:53.690
of such measures, at all?
0:40:53.690,0:40:57.040
And, second:
0:40:57.040,0:41:00.599
What do you think of the game theory
0:41:00.599,0:41:02.690
if the thieves or the bad guys
0:41:02.690,0:41:07.619
know the system, and when they[br]game the system,
0:41:07.619,0:41:09.980
they will probably win,
0:41:09.980,0:41:11.640
since one police officer in an interview said
0:41:11.640,0:41:14.019
this system is used to reduce
0:41:14.019,0:41:16.460
the personnel costs of policing,
0:41:16.460,0:41:19.460
so they just send the guys[br]where the red flags are,
0:41:19.460,0:41:22.290
and the others take the day off.
0:41:22.290,0:41:24.360
Dr. Helsby: Yup.
0:41:24.360,0:41:27.150
Um, so, with respect to
0:41:27.150,0:41:30.990
testing the effectiveness of predictive policing,
0:41:30.990,0:41:31.990
the companies,
0:41:31.990,0:41:33.910
some of them do randomized, controlled trials
0:41:33.910,0:41:35.240
and claim a reduction in crime.
0:41:35.240,0:41:38.349
The best independent study that I've seen
0:41:38.349,0:41:40.680
is by the RAND Corporation
0:41:40.680,0:41:43.120
that did a study in, I think,
0:41:43.120,0:41:44.920
Shreveport, Louisiana,
0:41:44.920,0:41:47.589
and in their report they claim
0:41:47.589,0:41:50.190
that there was no statistically significant
0:41:50.190,0:41:52.900
difference, they didn't find any reduction.
0:41:52.900,0:41:54.099
And it was specifically looking at
0:41:54.099,0:41:56.730
property crime, which I think you mentioned.
0:41:56.730,0:41:59.480
So, I think right now there's sort of
0:41:59.480,0:42:01.069
conflicting reports between
0:42:01.069,0:42:06.180
the independent auditors[br]and these company claims.
0:42:06.180,0:42:09.289
So there definitely needs to be more study.
0:42:09.289,0:42:12.240
And then, the 2nd thing...sorry,[br]remind me what it was?
0:42:12.240,0:42:15.189
Mic 2: What about the guys gaming the system?
0:42:15.189,0:42:16.949
Dr. Helsby: Oh, yeah.
0:42:16.949,0:42:18.900
I think it's a legitimate concern.
0:42:18.900,0:42:22.480
Like, if all the outputs[br]were just immediately public,
0:42:22.480,0:42:24.599
then, yes, everyone knows the location
0:42:24.599,0:42:26.549
of all police officers,
0:42:26.549,0:42:29.009
and I imagine that people would have
0:42:29.009,0:42:30.779
a problem with that.
0:42:30.779,0:42:32.679
Yup.
0:42:32.679,0:42:35.990
Herald: Microphone #4, please.
0:42:35.990,0:42:39.369
Mic 4: Yeah, this is not actually a question,
0:42:39.369,0:42:40.779
but just a comment.
0:42:40.779,0:42:42.970
I've enjoyed your talk very much,
0:42:42.970,0:42:47.789
in particular after watching
0:42:47.789,0:42:52.270
the talk in Hall 1 earlier in the afternoon.
0:42:52.270,0:42:55.730
The "Say Hi to Your New Boss", about
0:42:55.730,0:42:59.609
algorithms that are trained with big data,
0:42:59.609,0:43:02.390
and finally make decisions.
0:43:02.390,0:43:08.210
And I think these 2 talks are kind of complementary,
0:43:08.210,0:43:11.309
and if people are interested in the topic
0:43:11.309,0:43:14.710
they might want to check out the other talk
0:43:14.710,0:43:16.259
and watch it later, because these
0:43:16.259,0:43:17.319
fit very well together.
0:43:17.319,0:43:19.589
Dr. Helsby: Yeah, it was a great talk.
0:43:19.589,0:43:22.130
Herald: Microphone #2, please.
0:43:22.130,0:43:25.049
Mic 2: Um, yeah, you mentioned
0:43:25.049,0:43:27.319
the need to have some kind of 3rd-party auditing
0:43:27.319,0:43:30.900
or some kind of way to
0:43:30.900,0:43:31.930
peek into these algorithms
0:43:31.930,0:43:33.079
and to see what they're doing,
0:43:33.079,0:43:34.420
and to see if they're being fair.
0:43:34.420,0:43:36.199
Can you talk a little bit more about that?
0:43:36.199,0:43:38.059
Like, going forward,
0:43:38.059,0:43:40.690
some kind of regulatory structures
0:43:40.690,0:43:44.200
would probably have to emerge
0:43:44.200,0:43:47.200
to analyze and to look at
0:43:47.200,0:43:49.339
these black boxes that are just sort of
0:43:49.339,0:43:51.309
popping up everywhere and, you know,
0:43:51.309,0:43:52.939
controlling more and more of the things
0:43:52.939,0:43:56.150
in our lives, and important decisions.
0:43:56.150,0:43:58.539
So, just, what kind of discussions
0:43:58.539,0:43:59.460
are there for that?
0:43:59.460,0:44:01.809
And what kind of possibility[br]is there for that?
0:44:01.809,0:44:04.900
And, I'm sure that companies would be
0:44:04.900,0:44:08.000
very, very resistant to
0:44:08.000,0:44:09.890
any kind of attempt to look into
0:44:09.890,0:44:13.890
algorithms, and to...
0:44:13.890,0:44:15.070
Dr. Helsby: Yeah, I mean, definitely
0:44:15.070,0:44:18.069
companies would be very resistant to
0:44:18.069,0:44:19.670
having people look into their algorithms.
0:44:19.670,0:44:22.190
So, if you wanna do a very rigorous
0:44:22.190,0:44:23.339
audit of what's going on
0:44:23.339,0:44:25.660
then it's probably necessary to have
0:44:25.660,0:44:26.589
a few people come in
0:44:26.589,0:44:28.900
and sign NDAs, and then
0:44:28.900,0:44:31.039
look through the systems.
0:44:31.039,0:44:33.140
So, that's one way to proceed.
0:44:33.140,0:44:35.049
But, another way to proceed that--
0:44:35.049,0:44:38.720
so, these academic researchers have done
0:44:38.720,0:44:40.009
a few experiments
0:44:40.009,0:44:42.809
and found some interesting things,
0:44:42.809,0:44:45.500
and that's sort of all the attempts at auditing
0:44:45.500,0:44:46.450
that we've seen:
0:44:46.450,0:44:48.490
there was 1 attempt in 2012[br]for the Obama campaign,
0:44:48.490,0:44:49.910
but there's really not been any
0:44:49.910,0:44:51.500
sort of systematic attempt--
0:44:51.500,0:44:52.589
you know, like, in censorship
0:44:52.589,0:44:54.539
we see a systematic attempt to
0:44:54.539,0:44:56.779
do measurement as often as possible,
0:44:56.779,0:44:58.240
check what's going on,
0:44:58.240,0:44:59.339
and that itself, you know,
0:44:59.339,0:45:00.900
can act as an oversight mechanism.
0:45:00.900,0:45:01.880
But, right now,
0:45:01.880,0:45:03.900
I think many of these companies
0:45:03.900,0:45:05.259
realize no one is watching,
0:45:05.259,0:45:07.160
so there's no real push to have
0:45:07.160,0:45:10.440
people verify: are you being fair when you
0:45:10.440,0:45:11.539
implement this system?
0:45:11.539,0:45:12.969
Because no one's really checking.
0:45:12.969,0:45:13.980
Mic 2: Do you think that,
0:45:13.980,0:45:15.339
at some point, it would be like
0:45:15.339,0:45:19.059
an FDA or SEC, to give some American examples...
0:45:19.059,0:45:21.490
an actual government regulatory agency
0:45:21.490,0:45:24.960
that has the power and ability to
0:45:24.960,0:45:27.930
not just sort of look and try to
0:45:27.930,0:45:31.710
reverse engineer some of these algorithms,
0:45:31.710,0:45:33.920
but actually peek in there and make sure
0:45:33.920,0:45:36.420
that things are fair, because it seems like
0:45:36.420,0:45:38.240
there's just-- it's so important now
0:45:38.240,0:45:41.769
that, again, it could be the difference between
0:45:41.769,0:45:42.930
life and death, between
0:45:42.930,0:45:44.589
getting a job, not getting a job,
0:45:44.589,0:45:46.130
being pulled over,[br]not being pulled over,
0:45:46.130,0:45:48.069
being racially profiled,[br]not racially profiled,
0:45:48.069,0:45:49.410
things like that.[br]Dr. Helsby: Right.
0:45:49.410,0:45:50.430
Mic 2: Is it moving in that direction?
0:45:50.430,0:45:52.249
Or is it way too early for it?
0:45:52.249,0:45:55.110
Dr. Helsby: I mean, so some people have...
0:45:55.110,0:45:56.859
someone has called for, like,
0:45:56.859,0:45:59.079
a Federal Search Commission,
0:45:59.079,0:46:00.930
or like a Federal Algorithms Commission,
0:46:00.930,0:46:03.200
that would do this sort of oversight work,
0:46:03.200,0:46:06.130
but it's in such early stages right now
0:46:06.130,0:46:09.970
that there's no real push for that.
0:46:09.970,0:46:13.330
But I think it's a good idea.
0:46:13.330,0:46:15.729
Herald: And again, #2 please.
0:46:15.729,0:46:17.059
Mic 2: Thank you again for your talk.
0:46:17.059,0:46:19.309
I was just curious if you can point
0:46:19.309,0:46:20.440
to any examples of
0:46:20.440,0:46:22.619
either current producers or consumers
0:46:22.619,0:46:24.029
of these algorithmic systems
0:46:24.029,0:46:26.390
who are actively and publicly trying
0:46:26.390,0:46:27.720
to do so in a responsible manner
0:46:27.720,0:46:29.720
by describing what they're trying to do
0:46:29.720,0:46:31.380
and how they're going about it?
0:46:31.380,0:46:37.210
Dr. Helsby: So, yeah, there are some companies,
0:46:37.210,0:46:39.000
for example, like DataKind,
0:46:39.000,0:46:42.710
that try to deploy algorithmic systems
0:46:42.710,0:46:44.640
in as responsible a way as possible,
0:46:44.640,0:46:47.250
for like public policy.
0:46:47.250,0:46:49.549
Like, I actually also implement systems
0:46:49.549,0:46:51.750
for public policy in a transparent way.
0:46:51.750,0:46:54.329
Like, all the code is on GitHub, etc.
0:46:54.329,0:47:00.020
And, to give credit to
0:47:00.020,0:47:01.990
Google and these giants,
0:47:01.990,0:47:06.109
they're trying to implement transparency systems
0:47:06.109,0:47:08.170
that help you understand.
0:47:08.170,0:47:09.289
This has been done with respect to
0:47:09.289,0:47:12.329
how your data is being collected,
0:47:12.329,0:47:14.579
but, for example, if you go to Amazon.com
0:47:14.579,0:47:17.890
you can see that a recommendation has been made,
0:47:17.890,0:47:19.420
and that is pretty transparent.
0:47:19.420,0:47:21.480
You can see "this item[br]was recommended to me,"
0:47:21.480,0:47:25.039
so you know that prediction[br]is being used in this case,
0:47:25.039,0:47:27.089
and it will say why prediction is being used:
0:47:27.089,0:47:29.230
because you purchased some item.
0:47:29.230,0:47:30.380
And Google has a similar thing,
0:47:30.380,0:47:32.420
if you go to like Google Ad Settings,
0:47:32.420,0:47:35.249
you can even turn off personalization of ads
0:47:35.249,0:47:36.380
if you want,
0:47:36.380,0:47:38.119
and you can also see some of the inferences
0:47:38.119,0:47:39.400
that have been learned about you.
0:47:39.400,0:47:40.819
A subset of the inferences that have been
0:47:40.819,0:47:41.700
learned about you.
0:47:41.700,0:47:43.940
So, like, what interests...
0:47:43.940,0:47:47.869
Herald: A question from the internet, please?
0:47:47.869,0:47:50.930
Signal Angel: Yes, billetQ is asking
0:47:50.930,0:47:54.479
how do you avoid biases in machine learning?
0:47:54.479,0:47:57.380
I assume an analysis system, for example,
0:47:57.380,0:48:00.420
could be biased against women and minorities,
0:48:00.420,0:48:04.960
if used for hiring decisions[br]based on known data.
0:48:04.960,0:48:06.499
Dr. Helsby: Yeah, so one thing is to
0:48:06.499,0:48:08.529
just explicitly check.
0:48:08.529,0:48:12.199
So, you can check to see how
0:48:12.199,0:48:14.309
positive outcomes are being distributed
0:48:14.309,0:48:16.779
among those protected classes.
0:48:16.779,0:48:19.210
You could also incorporate this sort of
0:48:19.210,0:48:21.440
fairness constraints in the function
0:48:21.440,0:48:24.069
that you optimize when you train the system,
0:48:24.069,0:48:25.950
and so, if you're interested in reading more
0:48:25.950,0:48:28.960
about this, the 2 papers--
0:48:28.960,0:48:31.909
let me go to References--
0:48:31.909,0:48:32.730
there's a good paper called
0:48:32.730,0:48:35.339
"Fairness Through Awareness" that describes
0:48:35.339,0:48:37.499
how to go about doing this,
0:48:37.499,0:48:39.579
so I recommend this person read that.
0:48:39.579,0:48:40.970
It's good.
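A minimal sketch of the two ideas in the answer above: checking how positive outcomes are distributed across groups, and adding a fairness term to the function optimized during training. It assumes demographic parity (equal mean outcome per group) as the fairness measure, which is one common choice and not the individual-fairness method of "Fairness Through Awareness". The function names, the group labels, and the penalty weight `lam` are hypothetical.

```python
from math import log


def positive_rate_by_group(outcomes, groups):
    """Mean outcome per group; with 0/1 decisions this is the positive rate."""
    totals, sums = {}, {}
    for outcome, group in zip(outcomes, groups):
        totals[group] = totals.get(group, 0) + 1
        sums[group] = sums.get(group, 0) + outcome
    return {group: sums[group] / totals[group] for group in totals}


def demographic_parity_gap(outcomes, groups):
    """Largest difference in mean outcome between any two groups (0 means parity)."""
    rates = positive_rate_by_group(outcomes, groups)
    return max(rates.values()) - min(rates.values())


def penalized_log_loss(scores, labels, groups, lam=1.0):
    """Average log loss plus a hypothetical fairness penalty.

    scores: predicted probabilities in [0, 1]; labels: 0/1 ground truth.
    The penalty is lam times the gap in mean predicted score between groups,
    so minimizing this loss trades some accuracy for parity.
    """
    eps = 1e-12  # avoid log(0)
    total = 0.0
    for p, y in zip(scores, labels):
        total -= y * log(max(p, eps)) + (1 - y) * log(max(1 - p, eps))
    return total / len(scores) + lam * demographic_parity_gap(scores, groups)


# Usage: two hypothetical groups of four applicants with 0/1 hiring decisions.
decisions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(positive_rate_by_group(decisions, groups))  # {'a': 0.75, 'b': 0.25}
print(demographic_parity_gap(decisions, groups))  # 0.5
```

The explicit check needs only the decisions and group labels, so an outside auditor could run it. The penalty version needs access to training, which is why it sits with whoever builds the system.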
0:48:40.970,0:48:43.400
Herald: Microphone 2, please.
0:48:43.400,0:48:45.400
Mic 2: Thanks again for your talk.
0:48:45.400,0:48:49.649
Umm, hello?
0:48:49.649,0:48:50.999
Okay.
0:48:50.999,0:48:52.960
Umm, I see of course a problem with
0:48:52.960,0:48:54.619
all the black boxes that you describe
0:48:54.619,0:48:57.069
with regard to the crime systems,
0:48:57.069,0:48:59.569
but when we look at the advertising systems
0:48:59.569,0:49:02.169
in many cases they are very networked.
0:49:02.169,0:49:04.160
There are many different systems collaborating
0:49:04.160,0:49:07.109
and exchanging data via open APIs:
0:49:07.109,0:49:08.720
RESTful APIs, and various
0:49:08.720,0:49:11.720
demand-side platforms[br]and audience-exchange platforms,
0:49:11.720,0:49:12.539
and everything.
0:49:12.539,0:49:15.420
So, can that help to at least
0:49:15.420,0:49:22.160
increase awareness of where targeting and personalization
0:49:22.160,0:49:23.679
might be happening?
0:49:23.679,0:49:26.190
I mean, I'm looking at systems like
0:49:26.190,0:49:29.539
BuiltWith, that surface what kind of
0:49:29.539,0:49:31.380
JavaScript libraries are used elsewhere.
0:49:31.380,0:49:32.999
So, is that something that could help
0:49:32.999,0:49:35.670
at least to give better awareness,
0:49:35.670,0:49:38.690
by listing all the points where
0:49:38.690,0:49:41.409
you might be targeted...
0:49:41.409,0:49:43.070
Dr. Helsby: So, like, with respect to
0:49:43.070,0:49:46.460
advertising, the fact that[br]there is behind the scenes
0:49:46.460,0:49:48.450
this like complicated auction process
0:49:48.450,0:49:50.650
that's occurring, just makes things
0:49:50.650,0:49:51.819
a lot more complicated.
0:49:51.819,0:49:54.170
So, for example, I said briefly
0:49:54.170,0:49:57.269
that they found that there's this[br]statistical difference
0:49:57.269,0:49:59.099
between how men and women are treated,
0:49:59.099,0:50:01.339
but it doesn't necessarily mean that
0:50:01.339,0:50:03.640
"Oh, the algorithm is definitely biased."
0:50:03.640,0:50:06.369
It could be because of this auction process,
0:50:06.369,0:50:10.569
it could be that women are considered
0:50:10.569,0:50:12.630
more valuable when it comes to advertising,
0:50:12.630,0:50:15.099
and so these executive ads are getting
0:50:15.099,0:50:17.160
outbid by some other ads,
0:50:17.160,0:50:18.890
and so there's a lot of potential
0:50:18.890,0:50:20.490
causes for that.
0:50:20.490,0:50:22.829
So, I think it just makes things[br]a lot more complicated.
0:50:22.829,0:50:25.910
I don't know if it helps[br]with the bias at all.
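To illustrate the auction point in the answer above, here is a toy sealed-bid second-price auction, assuming hypothetical bids. The advertiser bids exactly the same amount for every viewer, yet its ad reaches one audience far less often because competitors bid more for that audience. Real ad exchanges add reserve prices, quality scores, and budget pacing that are not modelled here, and all names and numbers are invented for illustration.

```python
def second_price_winner(bids):
    """Sealed-bid second-price auction: highest bidder wins, pays the runner-up's bid.

    bids maps advertiser name -> bid. sorted() is stable even with reverse=True,
    so ties go to whichever bidder appears first in the dict.
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


def delivery_share(our_bid, competing_bids_per_slot):
    """Fraction of ad slots our ad wins when it bids the same amount in every slot."""
    wins = 0
    for competitors in competing_bids_per_slot:
        winner, _ = second_price_winner({"our_ad": our_bid, **competitors})
        if winner == "our_ad":
            wins += 1
    return wins / len(competing_bids_per_slot)


# Hypothetical numbers: the same 2.0 bid for both audiences, but a competing
# advertiser values audience B more highly and outbids us more often there.
slots_audience_a = [{"retail": 1.5}, {"retail": 1.8}, {"retail": 2.5}, {"retail": 1.0}]
slots_audience_b = [{"retail": 3.0}, {"retail": 2.2}, {"retail": 1.5}, {"retail": 2.8}]
print(delivery_share(2.0, slots_audience_a))  # 0.75
print(delivery_share(2.0, slots_audience_b))  # 0.25
```

A measured gap in who sees an ad is therefore consistent with several causes, and an audit that only observes delivery cannot by itself tell competition apart from targeting.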
0:50:25.910,0:50:27.410
Mic 2: Well, the question was more
0:50:27.410,0:50:30.299
a direction... can it help to surface
0:50:30.299,0:50:32.499
and make people aware of that fact?
0:50:32.499,0:50:34.930
I mean, I can talk to my kids probably,
0:50:34.930,0:50:36.259
and they will probably understand,
0:50:36.259,0:50:38.420
but I can't explain that to my grandma,
0:50:38.420,0:50:43.150
who's also, umm, looking at an iPad.
0:50:43.150,0:50:44.289
Dr. Helsby: So, the fact that
0:50:44.289,0:50:45.690
the systems are...
0:50:45.690,0:50:48.509
I don't know if I understand.
0:50:48.509,0:50:50.529
Mic 2: OK. I think that the main problem
0:50:50.529,0:50:53.710
is that we are behind the industry's efforts
0:50:53.710,0:50:57.179
to target us, and many people
0:50:57.179,0:51:00.579
do know, but a lot more people don't know,
0:51:00.579,0:51:03.160
and making them aware of the fact
0:51:03.160,0:51:07.269
that they are a target, in a way,
0:51:07.269,0:51:10.990
is something that can only be shown
0:51:10.990,0:51:14.779
by a 3rd party that exposes that data,
0:51:14.779,0:51:16.339
and does audits in a way--
0:51:16.339,0:51:17.929
maybe in an automated way.
0:51:17.929,0:51:19.170
Dr. Helsby: Right.
0:51:19.170,0:51:21.410
Yeah, I think it certainly[br]could help with advocacy
0:51:21.410,0:51:23.059
if that's the point, yeah.
0:51:23.059,0:51:26.079
Herald: Another question[br]from the internet, please.
0:51:26.079,0:51:29.319
Signal Angel: Yes, on IRC they are asking
0:51:29.319,0:51:31.440
if we know that prediction in some cases
0:51:31.440,0:51:34.460
provides an influence that cannot be controlled.
0:51:34.460,0:51:38.480
So, r4v5 would like to know from you
0:51:38.480,0:51:41.519
if there are some cases or areas where
0:51:41.519,0:51:45.060
machine learning simply shouldn't go?
0:51:45.060,0:51:48.349
Dr. Helsby: Umm, so I think...
0:51:48.349,0:51:52.559
I mean, yes, I think that it is the case
0:51:52.559,0:51:54.650
that in some cases machine learning
0:51:54.650,0:51:56.180
might not be appropriate.
0:51:56.180,0:51:58.359
For example, if you use machine learning
0:51:58.359,0:52:00.970
to decide who should be searched.
0:52:00.970,0:52:02.619
I don't think it should be the case that
0:52:02.619,0:52:03.809
machine learning algorithms should
0:52:03.809,0:52:05.440
ever be used to determine
0:52:05.440,0:52:08.430
probable cause, or something like that.
0:52:08.430,0:52:12.339
So, if it's just one piece of evidence
0:52:12.339,0:52:13.299
that you consider,
0:52:13.299,0:52:14.990
and there's human oversight always,
0:52:14.990,0:52:18.519
maybe it's fine, but
0:52:18.519,0:52:20.839
we should be very suspicious and hesitant
0:52:20.839,0:52:22.119
in certain contexts where
0:52:22.119,0:52:24.529
the ramifications are very serious.
0:52:24.529,0:52:27.259
Like the No Fly List, and so on.
0:52:27.259,0:52:29.200
Herald: And #2 again.
0:52:29.200,0:52:30.809
Mic 2: A second question
0:52:30.809,0:52:33.509
that just occurred to me, if you don't mind.
0:52:33.509,0:52:35.339
Umm, until the advent of
0:52:35.339,0:52:36.559
algorithmic systems,
0:52:36.559,0:52:40.470
when there've been cases of serious harm
0:52:40.470,0:52:42.799
that's resulted for individuals or groups,
0:52:42.799,0:52:44.579
and it's been demonstrated that
0:52:44.579,0:52:46.029
it's occurred because of
0:52:46.029,0:52:49.400
an individual or a system of people
0:52:49.400,0:52:53.019
being systematically biased, then often
0:52:53.019,0:52:55.130
one of the actions that's taken is
0:52:55.130,0:52:56.869
pressure's applied, and then
0:52:56.869,0:52:59.660
people are required to change,
0:52:59.660,0:53:01.049
and hopefully be held responsible,
0:53:01.049,0:53:02.910
and then change the way that they do things
0:53:02.910,0:53:06.400
to try to remove bias from that system.
0:53:06.400,0:53:07.839
What's the current thinking about
0:53:07.839,0:53:10.299
how we can go about doing that
0:53:10.299,0:53:12.599
when the systems that are doing that
0:53:12.599,0:53:13.650
are algorithmic?
0:53:13.650,0:53:15.999
Is it just going to be human oversight,
0:53:15.999,0:53:16.910
and humans are gonna have to be
0:53:16.910,0:53:18.379
held responsible for the oversight?
0:53:18.379,0:53:20.890
Dr. Helsby: So, in terms of bias,
0:53:20.890,0:53:22.569
if we're concerned about bias towards
0:53:22.569,0:53:24.019
particular types of people,
0:53:24.019,0:53:25.710
that's something that we can optimize for.
0:53:25.710,0:53:28.839
So, we can train systems that are unbiased
0:53:28.839,0:53:30.019
in this way.
0:53:30.019,0:53:32.109
So that's one way to deal with it.
0:53:32.109,0:53:34.039
But there's always gonna be errors,
0:53:34.039,0:53:35.420
so that's sort of a separate issue
0:53:35.420,0:53:37.509
from the bias, and in the case
0:53:37.509,0:53:39.180
where there are errors,
0:53:39.180,0:53:40.539
there must be oversight.
0:53:40.539,0:53:45.079
So, one way that one could improve
0:53:45.079,0:53:46.410
the way that this is done
0:53:46.410,0:53:48.160
is by making sure that you're
0:53:48.160,0:53:50.799
keeping track of the confidence of decisions.
0:53:50.799,0:53:54.039
So, if you have a low confidence prediction,
0:53:54.039,0:53:56.259
then maybe a human[br]should come in and check things.
0:53:56.259,0:53:58.809
So, that might be one way to proceed.
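The confidence-based oversight described in the answer above can be sketched as follows, assuming the model returns a label plus a confidence score in [0, 1]. The function name, the tuple layout, and the 0.8 cut-off are hypothetical choices for illustration.

```python
def route_by_confidence(predictions, threshold=0.8):
    """Split model outputs into automatic decisions and a human-review queue.

    predictions: iterable of (case_id, label, confidence), confidence in [0, 1].
    threshold: hypothetical cut-off; cases below it go to a human reviewer.
    """
    automatic, needs_review = [], []
    for case_id, label, confidence in predictions:
        if confidence >= threshold:
            automatic.append((case_id, label))
        else:
            needs_review.append(case_id)
    return automatic, needs_review


predictions = [("case-1", 1, 0.95), ("case-2", 0, 0.55), ("case-3", 1, 0.81)]
auto, review = route_by_confidence(predictions)
print(auto)    # [('case-1', 1), ('case-3', 1)]
print(review)  # ['case-2']
```

This only helps if the confidence scores are calibrated, meaning a score of 0.8 is right about 80% of the time. The threshold sets the trade-off between reviewer workload and how many decisions go out without a human check.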
0:54:02.099,0:54:03.990
Herald: So, there are no more questions.
0:54:03.990,0:54:06.199
I close this talk now,
0:54:06.199,0:54:08.239
and thank you very much
0:54:08.239,0:54:09.410
and a big applause to
0:54:09.410,0:54:11.780
Jennifer Helsby!
0:54:11.780,0:54:16.310
roaring applause
0:54:16.310,0:54:28.000
subtitles created by c3subtitles.de[br]Join, and help us!