WEBVTT 00:00:00.000 --> 00:00:08.895 Music 00:00:08.895 --> 00:00:20.040 Herald: Who among you is using Facebook? Twitter? Diaspora? 00:00:20.040 --> 00:00:27.630 concerned noise And all of that data you enter there 00:00:27.630 --> 00:00:34.240 gets to a server, gets into the hands of somebody who's using it, 00:00:34.240 --> 00:00:38.519 and the next talk is especially about that, 00:00:38.519 --> 00:00:43.879 because there are also intelligent machines and intelligent algorithms 00:00:43.879 --> 00:00:47.489 that try to make something out of that data. 00:00:47.489 --> 00:00:50.920 So the post-doc researcher Jennifer Helsby 00:00:50.920 --> 00:00:55.839 of the University of Chicago, who works at this 00:00:55.839 --> 00:00:59.370 intersection between policy and technology, 00:00:59.370 --> 00:01:04.709 will now ask you the question: To whom would we give that power? 00:01:04.709 --> 00:01:12.860 Dr. Helsby: Thanks. applause 00:01:12.860 --> 00:01:17.090 Okay, so, today I'm gonna do a brief tour of intelligent systems 00:01:17.090 --> 00:01:18.640 and how they're currently used, 00:01:18.640 --> 00:01:21.760 and then we're gonna look at some examples with respect 00:01:21.760 --> 00:01:23.710 to the properties that we might care about 00:01:23.710 --> 00:01:26.000 these systems having, and I'll talk a little bit about 00:01:26.000 --> 00:01:27.940 some of the work that's been done in academia 00:01:27.940 --> 00:01:28.680 on these topics. 00:01:28.680 --> 00:01:31.780 And then we'll talk about some promising paths forward. 00:01:31.780 --> 00:01:37.040 So, I wanna start with this: Kranzberg's First Law of Technology. 00:01:37.040 --> 00:01:40.420 So, technology is not good or bad, but it also isn't neutral. 00:01:40.420 --> 00:01:42.980 Technology shapes our world, and it can act as 00:01:42.980 --> 00:01:46.140 a liberating force-- or an oppressive and controlling force. 00:01:46.140 --> 00:01:49.730 So, in this talk, I'm gonna go through some of the aspects 00:01:49.730 --> 00:01:53.830 of intelligent systems that might be more controlling in nature. 00:01:53.830 --> 00:01:56.060 So, as we all know, 00:01:56.060 --> 00:01:59.770 because of the rapidly decreasing cost of storage and computation, 00:01:59.770 --> 00:02:02.170 along with the rise of new sensor technologies, 00:02:02.170 --> 00:02:05.510 data collection devices are being pushed into every 00:02:05.510 --> 00:02:08.329 aspect of our lives: in our homes, our cars, 00:02:08.329 --> 00:02:10.469 in our pockets, on our wrists. 00:02:10.469 --> 00:02:13.280 And data collection systems act as intermediaries 00:02:13.280 --> 00:02:15.230 for a huge amount of human communication. 00:02:15.230 --> 00:02:17.900 And much of this data sits in government 00:02:17.900 --> 00:02:19.860 and corporate databases. 00:02:19.860 --> 00:02:23.090 So, in order to make use of this data, 00:02:23.090 --> 00:02:27.280 we need to be able to make some inferences. 00:02:27.280 --> 00:02:30.280 So, one way of approaching this is I can hire 00:02:30.280 --> 00:02:32.310 a lot of humans, and I can have these humans 00:02:32.310 --> 00:02:34.990 manually examine the data, and they can acquire 00:02:34.990 --> 00:02:36.900 expert knowledge of the domain, and then 00:02:36.900 --> 00:02:38.510 perhaps they can make some decisions 00:02:38.510 --> 00:02:40.830 or at least some recommendations based on it. 00:02:40.830 --> 00:02:43.030 However, there are some problems with this. 00:02:43.030 --> 00:02:45.810 One is that it's slow, and thus expensive.
00:02:45.810 --> 00:02:48.060 It's also biased. We know that humans have 00:02:48.060 --> 00:02:50.700 all sorts of biases, both conscious and unconscious, 00:02:50.700 --> 00:02:53.390 and it would be nice to have a system that did not have 00:02:53.390 --> 00:02:54.959 these inaccuracies. 00:02:54.959 --> 00:02:57.069 It's also not very transparent: I might 00:02:57.069 --> 00:02:58.910 not really know the factors that led to 00:02:58.910 --> 00:03:00.930 some decisions being made. 00:03:00.930 --> 00:03:03.360 Even humans themselves often don't really understand 00:03:03.360 --> 00:03:05.360 why they came to a given decision, because 00:03:05.360 --> 00:03:08.130 decisions are partly emotional in nature. 00:03:08.130 --> 00:03:11.530 And, thus, these human decision-making systems 00:03:11.530 --> 00:03:13.170 are often difficult to audit. 00:03:13.170 --> 00:03:15.819 So, another way to proceed is maybe instead 00:03:15.819 --> 00:03:18.000 I study the system and the data carefully 00:03:18.000 --> 00:03:20.520 and I write down the best rules for making a decision, 00:03:20.520 --> 00:03:23.280 or I can have a machine dynamically figure out 00:03:23.280 --> 00:03:25.459 the best rules, as in machine learning. 00:03:25.459 --> 00:03:28.640 So, maybe this is a better approach. 00:03:28.640 --> 00:03:32.230 It's certainly fast, and thus cheap. 00:03:32.230 --> 00:03:34.290 And maybe I can construct the system in such a way 00:03:34.290 --> 00:03:37.090 that it doesn't have the biases that are inherent 00:03:37.090 --> 00:03:39.209 in human decision making. 00:03:39.209 --> 00:03:41.560 And, since I've written these rules down, 00:03:41.560 --> 00:03:42.819 or a computer has learned these rules, 00:03:42.819 --> 00:03:45.140 then I can just show them to somebody, right? 00:03:45.140 --> 00:03:46.819 And then they can audit it. 00:03:46.819 --> 00:03:49.020 So, more and more decision making is being 00:03:49.020 --> 00:03:50.750 done in this way. 00:03:50.750 --> 00:03:53.170 And so, in this model, we take data, 00:03:53.170 --> 00:03:55.709 we make an inference based on that data 00:03:55.709 --> 00:03:58.120 using these algorithms, and then 00:03:58.120 --> 00:03:59.420 we can take actions. 00:03:59.420 --> 00:04:01.860 And, when we take this more scientific approach 00:04:01.860 --> 00:04:04.200 to making decisions and optimizing for 00:04:04.200 --> 00:04:07.310 a desired outcome, we can take an experimental approach, 00:04:07.310 --> 00:04:10.080 so we can determine which actions are most effective 00:04:10.080 --> 00:04:12.310 in achieving a desired outcome. 00:04:12.310 --> 00:04:14.010 Maybe there are some types of communication 00:04:14.010 --> 00:04:16.750 styles that are most effective with certain people. 00:04:16.750 --> 00:04:19.510 I can perhaps deploy some individualized incentives 00:04:19.510 --> 00:04:22.060 to get the outcome that I desire. 00:04:22.060 --> 00:04:25.990 And maybe if I carefully experiment 00:04:25.990 --> 00:04:27.810 with the environment in which people make 00:04:27.810 --> 00:04:30.699 these decisions, perhaps even very small changes 00:04:30.699 --> 00:04:34.250 can introduce significant changes in people's behavior. 00:04:34.250 --> 00:04:37.320 So, through these mechanisms, and this experimental approach, 00:04:37.320 --> 00:04:39.840 I can maximize the probability that humans do 00:04:39.840 --> 00:04:42.020 what I want.
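To make the experimental approach just described concrete, here is a minimal sketch in Python of a randomized experiment over two hypothetical message styles, keeping whichever produces the higher measured response rate. The style names and response rates are invented for illustration, not taken from the talk.

import random

random.seed(0)

def simulate_response(style):
    # Hypothetical ground truth: style "B" nudges 5% more people.
    rate = {"A": 0.10, "B": 0.15}[style]
    return random.random() < rate

counts = {"A": [0, 0], "B": [0, 0]}  # style -> [responses, assignments]
for person in range(10000):
    style = random.choice(["A", "B"])  # randomized assignment
    counts[style][1] += 1
    counts[style][0] += simulate_response(style)

rates = {s: resp / n for s, (resp, n) in counts.items()}
best = max(rates, key=rates.get)
print(rates, "-> deploy style", best)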
00:04:42.020 --> 00:04:45.380 So, algorithmic decision making is being used 00:04:45.380 --> 00:04:47.270 in industry, and is used in lots of other areas, 00:04:47.270 --> 00:04:49.530 from astrophysics to medicine, and is now 00:04:49.530 --> 00:04:52.199 moving into new domains, including 00:04:52.199 --> 00:04:53.990 government applications. 00:04:53.990 --> 00:04:58.560 So, we have recommendation engines like Netflix, Yelp, SoundCloud, 00:04:58.560 --> 00:05:00.699 that direct our attention to what we should 00:05:00.699 --> 00:05:03.510 watch and listen to. 00:05:03.510 --> 00:05:07.919 Since 2009, Google has used personalized search results, 00:05:07.919 --> 00:05:12.840 even if you're not logged in to your Google account. 00:05:12.840 --> 00:05:15.389 And we also have algorithmic curation and filtering, 00:05:15.389 --> 00:05:17.530 as in the case of Facebook News Feed, 00:05:17.530 --> 00:05:19.870 Google News, Yahoo News, 00:05:19.870 --> 00:05:22.840 which shows you what news articles, for example, 00:05:22.840 --> 00:05:24.330 you should be looking at. 00:05:24.330 --> 00:05:25.650 And this is important, because a lot of people 00:05:25.650 --> 00:05:29.410 get news from these media. 00:05:29.410 --> 00:05:31.520 We even have algorithmic journalists! 00:05:31.520 --> 00:05:35.240 So, automatic systems generate articles 00:05:35.240 --> 00:05:36.880 about weather, traffic, or sports 00:05:36.880 --> 00:05:38.729 instead of a human. 00:05:38.729 --> 00:05:41.949 And, another application that's more recent 00:05:41.949 --> 00:05:43.570 is the use of predictive systems 00:05:43.570 --> 00:05:45.180 in political campaigns. 00:05:45.180 --> 00:05:47.370 So, political campaigns also now take this 00:05:47.370 --> 00:05:50.340 approach to predict on an individual basis 00:05:50.340 --> 00:05:53.300 which candidate voters are likely to vote for. 00:05:53.300 --> 00:05:55.500 And then they can target, on an individual basis, 00:05:55.500 --> 00:05:58.199 those that can be persuaded otherwise. 00:05:58.199 --> 00:06:00.830 And, finally, in the public sector, 00:06:00.830 --> 00:06:02.710 we're starting to use predictive systems 00:06:02.710 --> 00:06:06.320 in areas from policing, to health, to education and energy. 00:06:06.320 --> 00:06:08.979 So, there are some advantages to this. 00:06:08.979 --> 00:06:12.790 So, one thing is that we can automate 00:06:12.790 --> 00:06:15.759 aspects of our lives that we consider to be mundane, 00:06:15.759 --> 00:06:17.620 using systems that are intelligent 00:06:17.620 --> 00:06:19.580 and adaptive enough. 00:06:19.580 --> 00:06:21.680 We can make use of all the data 00:06:21.680 --> 00:06:23.990 and really get the pieces of information we 00:06:23.990 --> 00:06:25.830 really care about. 00:06:25.830 --> 00:06:29.650 We can spend money in the most effective way, 00:06:29.650 --> 00:06:32.110 and we can do this with this experimental 00:06:32.110 --> 00:06:34.210 approach to optimize actions to produce 00:06:34.210 --> 00:06:35.190 desired outcomes. 00:06:35.190 --> 00:06:37.300 So, we can embed intelligence 00:06:37.300 --> 00:06:39.520 into all of these mundane objects 00:06:39.520 --> 00:06:41.180 and enable them to make decisions for us, 00:06:41.180 --> 00:06:42.860 and so that's what we're doing more and more, 00:06:42.860 --> 00:06:45.210 and we can have an object that decides for us 00:06:45.210 --> 00:06:46.840 what temperature we should set our house to, 00:06:46.840 --> 00:06:49.009 what we should be doing, etc.
00:06:49.009 --> 00:06:52.400 So, there might be some implications here. 00:06:52.400 --> 00:06:55.680 We want these systems that do work on this data 00:06:55.680 --> 00:06:58.039 to increase the opportunities available to us. 00:06:58.039 --> 00:07:00.259 But it might be that there are some implications 00:07:00.259 --> 00:07:01.780 that we have not carefully thought through. 00:07:01.780 --> 00:07:03.430 This is a new area, and people are only 00:07:03.430 --> 00:07:05.940 starting to scratch the surface of what the 00:07:05.940 --> 00:07:07.289 problems might be. 00:07:07.289 --> 00:07:09.600 In some cases, they might narrow the options 00:07:09.600 --> 00:07:10.990 available to people, 00:07:10.990 --> 00:07:13.199 and this approach subjects people to 00:07:13.199 --> 00:07:15.620 suggestive messaging intended to nudge them 00:07:15.620 --> 00:07:17.169 to a desired outcome. 00:07:17.169 --> 00:07:19.320 Some people may have a problem with that. 00:07:19.320 --> 00:07:20.650 Values we care about are not gonna be 00:07:20.650 --> 00:07:23.860 baked into these systems by default. 00:07:23.860 --> 00:07:25.960 It's also the case that some algorithmic systems 00:07:25.960 --> 00:07:28.300 facilitate work that we do not like, 00:07:28.300 --> 00:07:30.199 for example, in the case of mass surveillance. 00:07:30.199 --> 00:07:32.130 And even the same systems, 00:07:32.130 --> 00:07:34.039 used by different people or organizations, 00:07:34.039 --> 00:07:36.110 have very different consequences. 00:07:36.110 --> 00:07:37.320 For example, if I can predict 00:07:37.320 --> 00:07:40.020 with high accuracy, based on say search queries, 00:07:40.020 --> 00:07:42.050 who's gonna be admitted to a hospital, 00:07:42.050 --> 00:07:43.750 some people would be interested in knowing that. 00:07:43.750 --> 00:07:46.120 You might be interested in having your doctor know that. 00:07:46.120 --> 00:07:47.919 But that same predictive model in the hands of 00:07:47.919 --> 00:07:50.569 an insurance company has a very different implication. 00:07:50.569 --> 00:07:53.389 So, the point here is that these systems 00:07:53.389 --> 00:07:55.860 structure and influence how humans interact 00:07:55.860 --> 00:07:58.360 with each other, how they interact with society, 00:07:58.360 --> 00:07:59.850 and how they interact with government. 00:07:59.850 --> 00:08:03.080 And if they constrain what people can do, 00:08:03.080 --> 00:08:05.069 we should really care about this. 00:08:05.069 --> 00:08:08.270 So now I'm gonna go to sort of an extreme case, 00:08:08.270 --> 00:08:11.930 just as an example, and that's the Chinese Social Credit System. 00:08:11.930 --> 00:08:14.169 And so this is probably one of the more 00:08:14.169 --> 00:08:17.259 ambitious uses of data: 00:08:17.259 --> 00:08:18.880 it's used to rank each citizen 00:08:18.880 --> 00:08:21.190 in China based on their behavior. 00:08:21.190 --> 00:08:24.210 So right now, there are various pilot systems 00:08:24.210 --> 00:08:27.660 deployed by various companies doing this in China. 00:08:27.660 --> 00:08:30.729 They're currently voluntary, and by 2020, 00:08:30.729 --> 00:08:32.630 one of these systems, 00:08:32.630 --> 00:08:34.679 or a combination of them, 00:08:34.679 --> 00:08:37.409 is gonna be decided on and made mandatory for everyone. 00:08:37.409 --> 00:08:40.950 And so, in this system, you have the citizens, 00:08:40.950 --> 00:08:44.380 and a huge range of data sources are used.
00:08:44.380 --> 00:08:46.820 So, some of the data sources are 00:08:46.820 --> 00:08:48.360 your financial data, 00:08:48.360 --> 00:08:50.020 your criminal history, 00:08:50.020 --> 00:08:52.320 how many points you have on your driver's license, 00:08:52.320 --> 00:08:55.360 medical information-- for example, if you take birth control pills, 00:08:55.360 --> 00:08:56.810 that's incorporated. 00:08:56.810 --> 00:08:59.830 Your purchase history-- for example, if you purchase games, 00:08:59.830 --> 00:09:02.430 you are down-ranked in the system. 00:09:02.430 --> 00:09:04.490 Some of the systems, not all of them, 00:09:04.490 --> 00:09:07.260 incorporate social media monitoring, 00:09:07.260 --> 00:09:09.200 which makes sense: if you're a state like China, 00:09:09.200 --> 00:09:11.270 you probably want to know about 00:09:11.270 --> 00:09:14.899 political statements that people are making on social media. 00:09:14.899 --> 00:09:18.020 And, one of the more interesting parts is 00:09:18.020 --> 00:09:22.160 social network analysis: looking at the relationships between people. 00:09:22.160 --> 00:09:24.270 So, if you have a close relationship with somebody 00:09:24.270 --> 00:09:26.180 and they have a low credit score, 00:09:26.180 --> 00:09:29.130 that can have implications on your credit score. 00:09:29.130 --> 00:09:34.440 So, the way that these scores are generated is secret. 00:09:34.440 --> 00:09:38.140 And, according to the call for these systems 00:09:38.140 --> 00:09:39.270 put out by the government, 00:09:39.270 --> 00:09:42.810 the goal is to "carry forward the sincerity and 00:09:42.810 --> 00:09:45.760 traditional virtues" and establish the idea of a 00:09:45.760 --> 00:09:47.520 "sincerity culture." 00:09:47.520 --> 00:09:49.440 But wait, it gets better: 00:09:49.440 --> 00:09:52.450 there's a portal that enables citizens 00:09:52.450 --> 00:09:55.040 to look up the citizen score of anyone. 00:09:55.040 --> 00:09:56.520 And many people like this system, 00:09:56.520 --> 00:09:58.320 they think it's a fun game. 00:09:58.320 --> 00:10:00.700 They boast about it on social media, 00:10:00.700 --> 00:10:03.610 they put their score in their dating profile, 00:10:03.610 --> 00:10:04.760 because if you're ranked highly you're 00:10:04.760 --> 00:10:06.589 part of an exclusive club. 00:10:06.589 --> 00:10:10.060 You can get VIP treatment at hotels and other companies. 00:10:10.060 --> 00:10:11.880 But the downside is that, if you're excluded 00:10:11.880 --> 00:10:15.540 from that club, your weak score may have other implications, 00:10:15.540 --> 00:10:20.120 like being unable to get access to credit, housing, jobs. 00:10:20.120 --> 00:10:23.399 There is some reporting that even travel visas 00:10:23.399 --> 00:10:27.000 might be restricted if your score is particularly low. 00:10:27.000 --> 00:10:31.160 So, a system like this, for a state, is really 00:10:31.160 --> 00:10:34.690 the optimal solution to the problem of the public. 00:10:34.690 --> 00:10:37.130 It constitutes a very subtle and insidious 00:10:37.130 --> 00:10:39.350 mechanism of social control. 00:10:39.350 --> 00:10:41.209 You don't need to spend a lot of money on 00:10:41.209 --> 00:10:43.800 police or prisons if you can set up a system 00:10:43.800 --> 00:10:45.820 where people discourage one another from 00:10:45.820 --> 00:10:48.930 anti-social acts like political action in exchange for 00:10:48.930 --> 00:10:51.430 a coupon for a free Uber ride.
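The scoring formulas in these pilots are secret, so nothing below is the real algorithm; it is only a toy Python sketch of the social-network effect just described, where a person's score is blended with the average score of their contacts, so a low-scoring friend drags a high scorer down. All names, scores, and the blending weight are invented.

scores = {"ana": 720, "bo": 650, "chen": 400}
friends = {"ana": ["bo"], "bo": ["ana", "chen"], "chen": ["bo"]}

def adjusted(person, weight=0.2):
    # Blend a person's own score with their contacts' average score.
    contacts = friends[person]
    avg = sum(scores[c] for c in contacts) / len(contacts)
    return (1 - weight) * scores[person] + weight * avg

for p in scores:
    print(p, round(adjusted(p)))  # "bo" is pulled down by "chen"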
00:10:51.430 --> 00:10:55.269 So, there are a lot of legitimate questions here: 00:10:55.269 --> 00:10:58.370 What protections does user data have in this scheme? 00:10:58.370 --> 00:11:01.279 Do any safeguards exist to prevent tampering? 00:11:01.279 --> 00:11:04.310 What mechanism, if any, is there to prevent 00:11:04.310 --> 00:11:08.810 false input data from creating erroneous inferences? 00:11:08.810 --> 00:11:10.420 Is there any way that people can fix 00:11:10.420 --> 00:11:12.540 their score once they're ranked poorly? 00:11:12.540 --> 00:11:13.899 Or does it end up becoming a 00:11:13.899 --> 00:11:15.720 self-fulfilling prophecy? 00:11:15.720 --> 00:11:17.850 Your weak score means you have less access 00:11:17.850 --> 00:11:21.620 to jobs and credit, and now you will have 00:11:21.620 --> 00:11:24.709 limited access to opportunity. 00:11:24.709 --> 00:11:27.110 So, let's take a step back. 00:11:27.110 --> 00:11:28.470 So, what do we want? 00:11:28.470 --> 00:11:31.540 So, we probably don't want that, 00:11:31.540 --> 00:11:33.570 but as advocates we really wanna 00:11:33.570 --> 00:11:36.130 understand what questions we should be asking 00:11:36.130 --> 00:11:37.510 of these systems. Right now there's 00:11:37.510 --> 00:11:39.570 very little oversight, 00:11:39.570 --> 00:11:41.420 and we wanna make sure that we don't 00:11:41.420 --> 00:11:44.029 sort of sleepwalk our way to a situation 00:11:44.029 --> 00:11:46.649 where we've lost even more power 00:11:46.649 --> 00:11:49.740 to these centralized systems of control. 00:11:49.740 --> 00:11:52.209 And if you're an implementer, we wanna understand 00:11:52.209 --> 00:11:53.709 what we can be doing better. 00:11:53.709 --> 00:11:56.019 Are there better ways that we can be implementing 00:11:56.019 --> 00:11:57.640 these systems? 00:11:57.640 --> 00:11:59.430 Are there values that, as humans, 00:11:59.430 --> 00:12:01.060 we care about, that we should make sure 00:12:01.060 --> 00:12:02.420 these systems have? 00:12:02.420 --> 00:12:05.550 So, the first thing that most people in the room 00:12:05.550 --> 00:12:07.820 might think about is privacy. 00:12:07.820 --> 00:12:10.510 Which is, of course, of the utmost importance. 00:12:10.510 --> 00:12:12.920 We need privacy, and there is a good discussion 00:12:12.920 --> 00:12:15.680 on the importance of protecting user data where possible. 00:12:15.680 --> 00:12:18.420 So, in this talk, I'm gonna focus on the other aspects of 00:12:18.420 --> 00:12:19.470 algorithmic decision making 00:12:19.470 --> 00:12:21.190 that I think have gotten less attention. 00:12:21.190 --> 00:12:25.140 Because it's not just privacy that we need to worry about here. 00:12:25.140 --> 00:12:28.519 We also want systems that are fair and equitable. 00:12:28.519 --> 00:12:30.240 We want transparent systems; 00:12:30.240 --> 00:12:35.110 we don't want opaque decisions to be made about us, 00:12:35.110 --> 00:12:36.510 decisions that might have serious impacts 00:12:36.510 --> 00:12:37.779 on our lives. 00:12:37.779 --> 00:12:40.490 And we need some accountability mechanisms. 00:12:40.490 --> 00:12:41.890 So, for the rest of this talk, 00:12:41.890 --> 00:12:43.230 we're gonna go through each one of these things 00:12:43.230 --> 00:12:45.230 and look at some examples. 00:12:45.230 --> 00:12:47.709 So, the first thing is fairness.
00:12:47.709 --> 00:12:50.450 And so, as I said in the beginning, this is one area 00:12:50.450 --> 00:12:52.690 where there might be an advantage 00:12:52.690 --> 00:12:55.079 to making decisions by machine, 00:12:55.079 --> 00:12:56.740 especially in areas where there have 00:12:56.740 --> 00:12:59.410 historically been fairness issues with 00:12:59.410 --> 00:13:02.350 decision making, such as law enforcement. 00:13:02.350 --> 00:13:05.839 So, this is one way that police departments 00:13:05.839 --> 00:13:08.360 use predictive models. 00:13:08.360 --> 00:13:10.540 The idea here is police would like to 00:13:10.540 --> 00:13:13.450 allocate resources in a more effective way, 00:13:13.450 --> 00:13:15.050 and they would also like to enable 00:13:15.050 --> 00:13:16.640 proactive policing. 00:13:16.640 --> 00:13:20.110 So, if you can predict where crimes are going to occur, 00:13:20.110 --> 00:13:22.149 or who is going to commit crimes, 00:13:22.149 --> 00:13:24.870 then you can put cops in those places, 00:13:24.870 --> 00:13:27.769 or perhaps have them follow these people, 00:13:27.769 --> 00:13:29.300 and then the crimes will not occur. 00:13:29.300 --> 00:13:31.370 So, it's sort of the pre-crime approach. 00:13:31.370 --> 00:13:34.649 So, there are a few ways of going about this. 00:13:34.649 --> 00:13:37.920 One way is doing this individual-level prediction. 00:13:37.920 --> 00:13:41.089 So you take each citizen and estimate the risk 00:13:41.089 --> 00:13:43.769 that they will participate, say, in violence, 00:13:43.769 --> 00:13:45.279 based on some data. 00:13:45.279 --> 00:13:46.779 And then you can flag those people that are 00:13:46.779 --> 00:13:49.199 considered particularly violent. 00:13:49.199 --> 00:13:51.519 So, this is currently done. 00:13:51.519 --> 00:13:52.589 This is done in the U.S. 00:13:52.589 --> 00:13:56.120 It's done in Chicago, by the Chicago Police Department. 00:13:56.120 --> 00:13:58.350 And they maintain a heat list of individuals 00:13:58.350 --> 00:14:00.790 that are considered most likely to commit, 00:14:00.790 --> 00:14:03.529 or be the victim of, violence. 00:14:03.529 --> 00:14:06.700 And this is done using data that the police maintain. 00:14:06.700 --> 00:14:09.589 So, the features that are used in this predictive model 00:14:09.589 --> 00:14:12.209 include things that are derived from 00:14:12.209 --> 00:14:14.610 individuals' criminal history. 00:14:14.610 --> 00:14:16.810 So, for example, have they been involved in 00:14:16.810 --> 00:14:18.350 gun violence in the past? 00:14:18.350 --> 00:14:21.450 Do they have narcotics arrests? And so on. 00:14:21.450 --> 00:14:22.860 But another thing that's incorporated 00:14:22.860 --> 00:14:25.060 in the Chicago Police Department model is 00:14:25.060 --> 00:14:28.300 information derived from social network analysis. 00:14:28.300 --> 00:14:30.630 So, who you interact with, 00:14:30.630 --> 00:14:32.279 as noted in police data. 00:14:32.279 --> 00:14:34.899 So, for example, your co-arrestees. 00:14:34.899 --> 00:14:36.440 When officers conduct field interviews, 00:14:36.440 --> 00:14:38.240 who are people interacting with? 00:14:38.240 --> 00:14:42.940 And then this is all incorporated into this risk score.
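The Chicago model's actual features and weights have not been published, so the following is only a generic sketch of what an individual-level risk score of this kind can look like: a weighted sum over criminal-history and co-arrest-network features, squashed into a probability. Every feature name and weight here is an invented assumption.

import math

WEIGHTS = {  # hypothetical feature weights, not the CPD's
    "prior_gun_arrests": 1.2,
    "narcotics_arrests": 0.4,
    "coarrestees_on_list": 0.9,
}
BIAS = -3.0

def risk(person):
    # Logistic model: weighted feature sum squashed into [0, 1].
    z = BIAS + sum(w * person.get(f, 0) for f, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

person = {"prior_gun_arrests": 1, "coarrestees_on_list": 2}
print(round(risk(person), 2))  # flag if above some chosen threshold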
00:14:42.940 --> 00:14:44.639 So another way to proceed, 00:14:44.639 --> 00:14:47.070 which is the method that most companies 00:14:47.070 --> 00:14:49.579 that sell products like this to the police have taken, 00:14:49.579 --> 00:14:51.459 is instead predicting which areas 00:14:51.459 --> 00:14:53.810 are likely to have crimes committed in them. 00:14:53.810 --> 00:14:56.690 So, take my city, I put a grid down, 00:14:56.690 --> 00:14:58.180 and then I use crime statistics, 00:14:58.180 --> 00:15:00.430 and maybe some ancillary data sources, 00:15:00.430 --> 00:15:01.790 to determine which areas have 00:15:01.790 --> 00:15:04.709 the highest risk of crimes occurring in them, 00:15:04.709 --> 00:15:06.329 and I can flag those areas and send 00:15:06.329 --> 00:15:08.470 police officers to them. 00:15:08.470 --> 00:15:10.950 So now, let's look at some of the tools 00:15:10.950 --> 00:15:14.010 that are used for this geographic-level prediction. 00:15:14.010 --> 00:15:19.040 So, here are 3 companies that sell these 00:15:19.040 --> 00:15:22.910 geographic-level predictive policing systems. 00:15:22.910 --> 00:15:25.639 So, PredPol has a system that uses 00:15:25.639 --> 00:15:27.200 primarily crime statistics: 00:15:27.200 --> 00:15:30.209 only the time, place, and type of crime, 00:15:30.209 --> 00:15:33.040 to predict where crimes will occur. 00:15:33.040 --> 00:15:35.970 HunchLab uses a wider range of data sources, 00:15:35.970 --> 00:15:37.260 including, for example, weather, 00:15:37.260 --> 00:15:39.720 and then Hitachi is a newer system 00:15:39.720 --> 00:15:42.100 that has a predictive crime analytics tool 00:15:42.100 --> 00:15:44.779 that also incorporates social media-- 00:15:44.779 --> 00:15:47.850 the first one, to my knowledge, to do so. 00:15:47.850 --> 00:15:49.399 And these systems are in use 00:15:49.399 --> 00:15:52.820 in 50+ cities in the U.S. 00:15:52.820 --> 00:15:56.540 So, why do police departments buy this? 00:15:56.540 --> 00:15:57.760 Some police departments are interested in 00:15:57.760 --> 00:16:00.500 buying systems like this because they're marketed 00:16:00.500 --> 00:16:02.660 as impartial systems, 00:16:02.660 --> 00:16:06.199 so it's a way to police in an unbiased way. 00:16:06.199 --> 00:16:08.040 And so, these companies make 00:16:08.040 --> 00:16:08.670 statements like this-- 00:16:08.670 --> 00:16:10.800 by the way, the references will all be at the end, 00:16:10.800 --> 00:16:12.560 and they'll be on the slides-- 00:16:12.560 --> 00:16:13.370 so, for example, 00:16:13.370 --> 00:16:16.110 the predictive crime analytics from Hitachi 00:16:16.110 --> 00:16:17.610 claims that the system is anonymous, 00:16:17.610 --> 00:16:19.350 because it shows you an area, 00:16:19.350 --> 00:16:23.060 it doesn't tell you to look for a particular person. 00:16:23.060 --> 00:16:25.699 And PredPol reassures people that 00:16:25.699 --> 00:16:29.560 it eliminates any civil liberties or profiling concerns. 00:16:29.560 --> 00:16:32.269 And HunchLab notes that the system 00:16:32.269 --> 00:16:35.170 fairly represents priorities for public safety 00:16:35.170 --> 00:16:38.769 and is unbiased by race or ethnicity, for example. 00:16:38.769 --> 00:16:43.529 So, let's take a minute to describe in more detail 00:16:43.529 --> 00:16:48.100 what we mean when we talk about fairness. 00:16:48.100 --> 00:16:51.300 So, when we talk about fairness, 00:16:51.300 --> 00:16:52.740 we mean a few things.
00:16:52.740 --> 00:16:56.070 So, one is fairness with respect to individuals: 00:16:56.070 --> 00:16:58.040 so if I'm very similar to somebody, 00:16:58.040 --> 00:17:00.170 and we go through some process, 00:17:00.170 --> 00:17:03.430 and there are two very different outcomes to that process, 00:17:03.430 --> 00:17:05.679 we would consider that to be unfair. 00:17:05.679 --> 00:17:07.929 So, we want similar people to be treated 00:17:07.929 --> 00:17:09.539 in a similar way. 00:17:09.539 --> 00:17:13.079 But, there are certain protected attributes 00:17:13.079 --> 00:17:15.199 that we wouldn't want someone 00:17:15.199 --> 00:17:17.099 to discriminate based on. 00:17:17.099 --> 00:17:20.069 And so, there's this other property, group fairness. 00:17:20.069 --> 00:17:22.249 So, we can look at the statistical parity 00:17:22.249 --> 00:17:25.439 between groups, based on gender, race, etc., 00:17:25.439 --> 00:17:28.049 and see if they're treated in a similar way. 00:17:28.049 --> 00:17:30.409 And we might not expect that in some cases, 00:17:30.409 --> 00:17:32.429 for example if the base rates in each group 00:17:32.429 --> 00:17:34.659 are very different. 00:17:34.659 --> 00:17:36.889 And then there's also fairness in errors. 00:17:36.889 --> 00:17:40.080 All predictive systems are gonna make errors, 00:17:40.080 --> 00:17:42.989 and if the errors are concentrated, 00:17:42.989 --> 00:17:46.399 then that may also represent unfairness. 00:17:46.399 --> 00:17:50.149 And so this concern arose recently with Facebook, 00:17:50.149 --> 00:17:52.289 because people with Native American names 00:17:52.289 --> 00:17:54.389 had their profiles flagged as fraudulent 00:17:54.389 --> 00:17:58.759 far more often than those with white American names. 00:17:58.759 --> 00:18:00.559 So these are the sorts of things that we worry about, 00:18:00.559 --> 00:18:02.190 and each of these can be expressed as metrics 00:18:02.190 --> 00:18:04.239 (there's a small sketch of two of them below), and if you're interested you should 00:18:04.239 --> 00:18:06.159 check those 2 papers out. 00:18:06.159 --> 00:18:10.639 So, how can potential issues with predictive policing 00:18:10.639 --> 00:18:13.850 have implications for these principles? 00:18:13.850 --> 00:18:18.559 So, one problem is the training data that's used. 00:18:18.559 --> 00:18:21.059 Some of these systems only use crime statistics; 00:18:21.059 --> 00:18:23.600 all of them use crime statistics 00:18:23.600 --> 00:18:25.619 in some way. 00:18:25.619 --> 00:18:31.419 So, one problem is that crime databases 00:18:31.419 --> 00:18:34.830 contain only crimes that've been detected. 00:18:34.830 --> 00:18:38.629 Right? So, the police are only gonna detect 00:18:38.629 --> 00:18:41.009 crimes that they know are happening, 00:18:41.009 --> 00:18:44.109 either through patrol and their own investigation, 00:18:44.109 --> 00:18:46.320 or because they've been alerted to crime, 00:18:46.320 --> 00:18:48.789 for example by a citizen calling the police. 00:18:48.789 --> 00:18:52.179 So, a citizen has to feel like they can call the police, 00:18:52.179 --> 00:18:54.019 like that's a good idea. 00:18:54.019 --> 00:18:58.789 So, some crimes suffer from this problem less than others: 00:18:58.789 --> 00:19:02.249 gun violence, for example, is much easier to detect 00:19:02.249 --> 00:19:03.639 than fraud, 00:19:03.639 --> 00:19:07.509 which is very difficult to detect.
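Here is the small sketch referenced above: on invented decisions, it computes two of the group-fairness metrics just described, statistical parity (how often each group is flagged) and fairness in errors (the false-positive rate in each group). The records and group labels are made up for illustration.

records = [  # (group, truly_positive, flagged) -- made-up data
    ("g1", 0, 1), ("g1", 0, 0), ("g1", 1, 1), ("g1", 0, 1),
    ("g2", 0, 0), ("g2", 1, 1), ("g2", 0, 0), ("g2", 0, 1),
]

def rates(group):
    rows = [r for r in records if r[0] == group]
    flag_rate = sum(f for _, _, f in rows) / len(rows)
    negatives = [f for _, y, f in rows if y == 0]
    fpr = sum(negatives) / len(negatives)  # false-positive rate
    return flag_rate, fpr

(p1, e1), (p2, e2) = rates("g1"), rates("g2")
print("parity gap:", abs(p1 - p2), "FPR gap:", abs(e1 - e2))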
00:19:07.509 --> 00:19:11.940 Now the racial profiling aspect of this might come in 00:19:11.940 --> 00:19:15.590 because of biased policing in the past. 00:19:15.590 --> 00:19:19.999 So, for example, for marijuana arrests, 00:19:19.999 --> 00:19:22.619 black people are arrested in the U.S. at rates 00:19:22.619 --> 00:19:25.119 4 times that of white people, 00:19:25.119 --> 00:19:27.960 even though usage rates in these 2 groups 00:19:27.960 --> 00:19:31.389 are at statistical parity, to within a few percent. 00:19:31.389 --> 00:19:35.820 So, this is where problems can arise. 00:19:35.820 --> 00:19:37.159 So, let's go back to this 00:19:37.159 --> 00:19:38.749 geographic-level predictive policing. 00:19:38.749 --> 00:19:42.460 So the danger here is that, unless this system 00:19:42.460 --> 00:19:44.299 is very carefully constructed, 00:19:44.299 --> 00:19:47.090 this sort of crime area ranking might 00:19:47.090 --> 00:19:49.019 again become a self-fulfilling prophecy. 00:19:49.019 --> 00:19:51.460 If you send police officers to these areas, 00:19:51.460 --> 00:19:53.220 you further scrutinize them, 00:19:53.220 --> 00:19:55.659 and then again you're only detecting a subset 00:19:55.659 --> 00:19:57.979 of crimes, and the cycle continues. 00:19:57.979 --> 00:20:02.139 So, one obvious issue is that 00:20:02.139 --> 00:20:07.599 this statement about geographic-based crime prediction 00:20:07.599 --> 00:20:10.229 being anonymous is not true, 00:20:10.229 --> 00:20:13.159 because race and location are very strongly 00:20:13.159 --> 00:20:14.840 correlated in the U.S. 00:20:14.840 --> 00:20:16.609 And this is something that machine-learning systems 00:20:16.609 --> 00:20:20.049 can potentially learn. 00:20:20.049 --> 00:20:23.039 Another issue is that, for example, 00:20:23.039 --> 00:20:25.580 for individual fairness: maybe my home 00:20:25.580 --> 00:20:27.599 sits within one of these boxes. 00:20:27.599 --> 00:20:29.950 Some of these boxes in these systems are very small-- 00:20:29.950 --> 00:20:33.399 for example, PredPol's are 500ft x 500ft, 00:20:33.399 --> 00:20:36.349 so it's maybe only a few houses. 00:20:36.349 --> 00:20:39.149 So, the implications of this system are that 00:20:39.149 --> 00:20:40.849 you have police officers maybe sitting 00:20:40.849 --> 00:20:42.979 in a police cruiser outside your home, 00:20:42.979 --> 00:20:45.450 and a few doors down, someone 00:20:45.450 --> 00:20:46.799 may not be within that box, 00:20:46.799 --> 00:20:48.159 and doesn't have this. 00:20:48.159 --> 00:20:51.399 So, that may represent unfairness. 00:20:51.399 --> 00:20:54.929 So, there are real questions here, 00:20:54.929 --> 00:20:57.720 especially because there's no opt-out. 00:20:57.720 --> 00:21:00.059 There's no way to opt out of this system: 00:21:00.059 --> 00:21:02.239 if you live in a city that has this, 00:21:02.239 --> 00:21:04.909 then you have to deal with it. 00:21:04.909 --> 00:21:07.229 So, it's quite difficult to find out 00:21:07.229 --> 00:21:09.879 what's really going on, 00:21:09.879 --> 00:21:11.169 because the algorithm is secret. 00:21:11.169 --> 00:21:13.049 And, in most cases, we don't know 00:21:13.049 --> 00:21:14.789 the full details of the inputs. 00:21:14.789 --> 00:21:16.679 We have some idea about what features are used, 00:21:16.679 --> 00:21:17.970 but that's about it. 00:21:17.970 --> 00:21:19.509 We also don't know the output.
00:21:19.509 --> 00:21:21.899 That would mean knowing police allocation, 00:21:21.899 --> 00:21:23.179 police strategies. 00:21:23.179 --> 00:21:26.299 And in order to nail down what's really going on here, 00:21:26.299 --> 00:21:28.609 in order to verify the validity of 00:21:28.609 --> 00:21:30.009 these companies' claims, 00:21:30.009 --> 00:21:33.799 it may be necessary to have a 3rd party come in, 00:21:33.799 --> 00:21:35.629 examine the inputs and outputs of the system, 00:21:35.629 --> 00:21:37.590 and say concretely what's going on. 00:21:37.590 --> 00:21:39.460 And if everything is fine and dandy, 00:21:39.460 --> 00:21:40.929 then this shouldn't be a problem. 00:21:40.929 --> 00:21:43.619 So, that's potentially one role that 00:21:43.619 --> 00:21:44.769 advocates can play. 00:21:44.769 --> 00:21:46.720 Maybe we should start pushing for audits 00:21:46.720 --> 00:21:48.820 of systems that are used in this way. 00:21:48.820 --> 00:21:50.970 These could have serious implications 00:21:50.970 --> 00:21:52.679 for people's lives. 00:21:52.679 --> 00:21:55.249 So, we'll return to this idea a little bit later, 00:21:55.249 --> 00:21:58.210 but for now this leads us nicely to transparency. 00:21:58.210 --> 00:21:59.419 So, we wanna know 00:21:59.419 --> 00:22:01.929 what these systems are doing. 00:22:01.929 --> 00:22:04.729 But it's very hard, for the reasons described earlier, 00:22:04.729 --> 00:22:06.139 and even in the case of something like 00:22:06.139 --> 00:22:09.849 trying to understand Google's search algorithm, 00:22:09.849 --> 00:22:11.679 it's difficult because it's personalized. 00:22:11.679 --> 00:22:13.529 So, by construction, each user is 00:22:13.529 --> 00:22:15.320 only seeing one endpoint. 00:22:15.320 --> 00:22:18.169 So, it's a very isolating system. 00:22:18.169 --> 00:22:20.349 What do other people see? 00:22:20.349 --> 00:22:22.409 And one reason it's difficult to make 00:22:22.409 --> 00:22:24.099 some of these systems transparent 00:22:24.099 --> 00:22:26.679 is because of, simply, the complexity 00:22:26.679 --> 00:22:27.950 of the algorithms. 00:22:27.950 --> 00:22:30.309 So, an algorithm can become so complex that 00:22:30.309 --> 00:22:31.669 it's difficult to comprehend, 00:22:31.669 --> 00:22:33.289 even for the designer of the system, 00:22:33.289 --> 00:22:35.509 or the implementer of the system. 00:22:35.509 --> 00:22:38.419 The designer might know that this algorithm 00:22:38.419 --> 00:22:42.889 maximizes some metric-- say, accuracy-- 00:22:42.889 --> 00:22:44.570 but they may not always have a solid 00:22:44.570 --> 00:22:46.779 understanding of what the algorithm is doing 00:22:46.779 --> 00:22:48.330 for all inputs, 00:22:48.330 --> 00:22:50.970 certainly with respect to fairness. 00:22:50.970 --> 00:22:55.759 So, in some cases, it might not be appropriate to use 00:22:55.759 --> 00:22:57.379 an extremely complex model. 00:22:57.379 --> 00:22:59.529 It might be better to use a simpler system 00:22:59.529 --> 00:23:02.910 with human-interpretable features. 00:23:02.910 --> 00:23:04.749 Another issue that arises 00:23:04.749 --> 00:23:07.559 from the opacity of these systems 00:23:07.559 --> 00:23:09.409 and the centralized control 00:23:09.409 --> 00:23:11.860 is that it makes them very influential, 00:23:11.860 --> 00:23:13.950 and thus an excellent target 00:23:13.950 --> 00:23:16.210 for manipulation or tampering.
00:23:16.210 --> 00:23:18.479 So, this might be tampering that is done 00:23:18.479 --> 00:23:21.950 by an organization that controls the system, 00:23:21.950 --> 00:23:23.769 or an insider at one of the organizations, 00:23:23.769 --> 00:23:27.139 or anyone who's able to compromise their security. 00:23:27.139 --> 00:23:30.249 So, there's some interesting academic work 00:23:30.249 --> 00:23:32.099 that looked at the possibility of 00:23:32.099 --> 00:23:34.159 slightly modifying search rankings 00:23:34.159 --> 00:23:36.619 to shift people's political views. 00:23:36.619 --> 00:23:39.009 So, people are most likely to 00:23:39.009 --> 00:23:41.330 click on the top search results-- 00:23:41.330 --> 00:23:44.429 90% of clicks go to the first page of search results-- 00:23:44.429 --> 00:23:46.719 so perhaps by reshuffling things a little bit, 00:23:46.719 --> 00:23:48.729 or maybe dropping some search results, 00:23:48.729 --> 00:23:50.269 you can influence people's views 00:23:50.269 --> 00:23:51.679 in a coherent way, 00:23:51.679 --> 00:23:53.090 and maybe you can make it so subtle 00:23:53.090 --> 00:23:55.749 that no one is able to notice. 00:23:55.749 --> 00:23:57.249 So in this academic study, 00:23:57.249 --> 00:24:00.349 they did an experiment 00:24:00.349 --> 00:24:02.070 in the 2014 Indian election. 00:24:02.070 --> 00:24:04.219 So they used real voters, 00:24:04.219 --> 00:24:06.450 and they kept the size of the experiment small enough 00:24:06.450 --> 00:24:08.190 that it was not going to influence the outcome 00:24:08.190 --> 00:24:10.090 of the election. 00:24:10.090 --> 00:24:12.139 So the researchers took people, 00:24:12.139 --> 00:24:14.229 they determined their political leaning, 00:24:14.229 --> 00:24:17.429 and they segmented them into control and treatment groups, 00:24:17.429 --> 00:24:19.269 where the treatment was manipulation 00:24:19.269 --> 00:24:21.210 of the search ranking results. 00:24:21.210 --> 00:24:24.409 And then they had these people browse the web. 00:24:24.409 --> 00:24:25.969 And what they found is that 00:24:25.969 --> 00:24:28.229 this mechanism is very effective at shifting 00:24:28.229 --> 00:24:30.429 people's voter preferences. 00:24:30.429 --> 00:24:33.649 So, in this study, they were able to introduce 00:24:33.649 --> 00:24:36.849 a 20% shift in voter preferences. 00:24:36.849 --> 00:24:39.299 Even alerting users to the fact that this 00:24:39.299 --> 00:24:41.729 was going to be done-- telling them 00:24:41.729 --> 00:24:44.049 "we are going to manipulate your search results, 00:24:44.049 --> 00:24:45.729 really pay attention"-- 00:24:45.729 --> 00:24:49.099 totally failed to decrease 00:24:49.099 --> 00:24:50.859 the magnitude of the effect. 00:24:50.859 --> 00:24:55.109 So, the margins of error in many elections 00:24:55.109 --> 00:24:57.669 are incredibly small, 00:24:57.669 --> 00:24:59.929 and the authors estimate that this shift 00:24:59.929 --> 00:25:02.009 could change the outcome of about 00:25:02.009 --> 00:25:07.109 25% of elections worldwide, if this were done. 00:25:07.109 --> 00:25:10.919 And the bias is so small that no one can tell. 00:25:10.919 --> 00:25:14.279 So, all humans, no matter how smart 00:25:14.279 --> 00:25:17.109 and resistant to manipulation we think we are, 00:25:17.109 --> 00:25:21.909 all of us are subject to this sort of manipulation, 00:25:21.909 --> 00:25:24.320 and we really can't tell.
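This is not the researchers' code, but a toy simulation of the mechanism their experiment exploits: clicks concentrate on the top-ranked results, so quietly promoting one side's pages shifts what people end up reading. The click probabilities and page labels are assumptions for illustration.

import random

random.seed(1)
CLICK_PROB = [0.35, 0.25, 0.15, 0.10, 0.05]  # assumed position bias

def sampled_click(ranking):
    # Walk down the ranking; click each position with its probability.
    for page, p in zip(ranking, CLICK_PROB):
        if random.random() < p:
            return page
    return None  # no click this session

neutral = ["A1", "B1", "A2", "B2", "A3"]  # candidates interleaved
biased = ["A1", "A2", "A3", "B1", "B2"]   # candidate A moved up

def share_of_A(ranking, sessions=100000):
    clicks = [sampled_click(ranking) for _ in range(sessions)]
    hits = [c for c in clicks if c]
    return sum(c.startswith("A") for c in hits) / len(hits)

print("neutral:", round(share_of_A(neutral), 2))
print("biased: ", round(share_of_A(biased), 2))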
00:25:24.320 --> 00:25:27.129 So, I'm not saying that this is occurring, 00:25:27.129 --> 00:25:31.389 but right now there is no regulation to stop this, 00:25:31.389 --> 00:25:34.409 there is no way we could reliably detect this, 00:25:34.409 --> 00:25:37.210 so there's a huge amount of power here. 00:25:37.210 --> 00:25:39.779 So, something to think about. 00:25:39.779 --> 00:25:42.710 But it's not only corporations that are interested 00:25:42.710 --> 00:25:47.269 in this sort of behavioral manipulation. 00:25:47.269 --> 00:25:51.119 In 2010, UK Prime Minister David Cameron 00:25:51.119 --> 00:25:54.969 created the UK Behavioural Insights Team, 00:25:54.969 --> 00:25:57.269 which is informally called the Nudge Unit. 00:25:57.269 --> 00:26:01.489 And so what they do is they use behavioral science 00:26:01.489 --> 00:26:04.769 and this predictive analytics approach, 00:26:04.769 --> 00:26:06.119 with experimentation, 00:26:06.119 --> 00:26:07.940 to have people make better decisions 00:26:07.940 --> 00:26:09.690 for themselves and society-- 00:26:09.690 --> 00:26:11.989 as determined by the UK government. 00:26:11.989 --> 00:26:14.269 And as of a few months ago, 00:26:14.269 --> 00:26:16.849 after an executive order signed by Obama 00:26:16.849 --> 00:26:19.349 in September, the United States now has 00:26:19.349 --> 00:26:21.429 its own Nudge Unit. 00:26:21.429 --> 00:26:24.009 So, to be clear, I don't think that this is 00:26:24.009 --> 00:26:25.539 some sort of malicious plot. 00:26:25.539 --> 00:26:27.440 I think that there can be huge value 00:26:27.440 --> 00:26:29.489 in these sorts of initiatives, 00:26:29.489 --> 00:26:31.330 positively impacting people's lives, 00:26:31.330 --> 00:26:34.179 but when this sort of behavioral manipulation 00:26:34.179 --> 00:26:37.289 is being done, in part openly, 00:26:37.289 --> 00:26:39.460 oversight is pretty important, 00:26:39.460 --> 00:26:41.700 and we really need to consider 00:26:41.700 --> 00:26:46.090 what these systems are optimizing for. 00:26:46.090 --> 00:26:47.849 And that's something that we might 00:26:47.849 --> 00:26:52.090 not always know, or at least understand. 00:26:52.090 --> 00:26:54.450 For industry, for example, 00:26:54.450 --> 00:26:57.679 we do have a pretty good understanding there: 00:26:57.679 --> 00:26:59.809 industry cares about optimizing for 00:26:59.809 --> 00:27:01.960 the time spent on the website. 00:27:01.960 --> 00:27:04.929 Facebook wants you to spend more time on Facebook, 00:27:04.929 --> 00:27:06.950 they want you to click on ads, 00:27:06.950 --> 00:27:09.109 click on newsfeed items, 00:27:09.109 --> 00:27:11.299 they want you to like things. 00:27:11.299 --> 00:27:14.309 And, fundamentally: profit. 00:27:14.309 --> 00:27:17.599 So, already this has some serious implications, 00:27:17.599 --> 00:27:19.690 and this had pretty serious implications 00:27:19.690 --> 00:27:22.190 in the last 10 years, in media for example. 00:27:22.190 --> 00:27:25.119 Optimizing for click-through rate in journalism 00:27:25.119 --> 00:27:26.629 has produced a race to the bottom 00:27:26.629 --> 00:27:28.039 in terms of quality. 00:27:28.039 --> 00:27:30.919 And another issue is that optimizing 00:27:30.919 --> 00:27:34.589 for what people like might not always be 00:27:34.589 --> 00:27:35.839 the best approach.
00:27:35.839 --> 00:27:38.859 So, Facebook officials have said publicly 00:27:38.859 --> 00:27:41.279 that Facebook's goal is to make you happy: 00:27:41.279 --> 00:27:43.149 they want you to open that newsfeed 00:27:43.149 --> 00:27:45.080 and just feel great. 00:27:45.080 --> 00:27:47.379 But, there's an issue there, right? 00:27:47.379 --> 00:27:50.169 Because a lot of people-- 00:27:50.169 --> 00:27:52.369 about 40% of people, according to Pew Research-- 00:27:52.369 --> 00:27:54.599 get their news from Facebook. 00:27:54.599 --> 00:27:58.460 So, if people don't want to see 00:27:58.460 --> 00:28:01.239 war and corpses, because it makes them feel sad, 00:28:01.239 --> 00:28:04.179 then this is not a system that is gonna optimize 00:28:04.179 --> 00:28:07.149 for an informed population. 00:28:07.149 --> 00:28:09.359 It's not gonna produce a population that is 00:28:09.359 --> 00:28:11.469 ready to engage in civic life. 00:28:11.469 --> 00:28:13.059 It's gonna produce an amused population 00:28:13.059 --> 00:28:16.809 whose time is occupied by cat pictures. 00:28:16.809 --> 00:28:19.159 So, in politics, we have a similar 00:28:19.159 --> 00:28:21.269 optimization problem that's occurring. 00:28:21.269 --> 00:28:23.769 So, these political campaigns that use 00:28:23.769 --> 00:28:26.769 these predictive systems 00:28:26.769 --> 00:28:28.669 are optimizing for votes for the desired candidate, 00:28:28.669 --> 00:28:30.200 of course. 00:28:30.200 --> 00:28:33.499 So, instead of a political campaign being 00:28:33.499 --> 00:28:36.139 --well, maybe this is a naive view, but-- 00:28:36.139 --> 00:28:38.070 being an open discussion of the issues 00:28:38.070 --> 00:28:39.830 facing the country, 00:28:39.830 --> 00:28:43.200 it becomes this micro-targeted persuasion game, 00:28:43.200 --> 00:28:44.669 and the people that get targeted 00:28:44.669 --> 00:28:47.349 are a very small subset of all people, 00:28:47.349 --> 00:28:49.399 and it's only gonna be people that are, 00:28:49.399 --> 00:28:51.409 you know, on the edge, maybe disinterested-- 00:28:51.409 --> 00:28:54.399 those are the people that are gonna get attention 00:28:54.399 --> 00:28:58.839 from political candidates. 00:28:58.839 --> 00:29:01.869 In policy, as with these Nudge Units, 00:29:01.869 --> 00:29:03.539 they're being used to enable 00:29:03.539 --> 00:29:06.109 better use of government services. 00:29:06.109 --> 00:29:07.419 There are some good projects that have 00:29:07.419 --> 00:29:09.419 come out of this: 00:29:09.419 --> 00:29:11.409 increasing voter registration, 00:29:11.409 --> 00:29:12.739 improving health outcomes, 00:29:12.739 --> 00:29:14.419 improving education outcomes. 00:29:14.419 --> 00:29:16.419 But some of these predictive systems 00:29:16.419 --> 00:29:18.229 that we're starting to see in government 00:29:18.229 --> 00:29:20.700 are optimizing for compliance, 00:29:20.700 --> 00:29:23.669 as is the case with predictive policing. 00:29:23.669 --> 00:29:25.460 So this is something that we need to 00:29:25.460 --> 00:29:28.649 watch carefully. 00:29:28.649 --> 00:29:30.119 I think this is a nice quote that 00:29:30.119 --> 00:29:33.339 sort of describes the problem. 00:29:33.339 --> 00:29:35.200 In some ways we might be narrowing 00:29:35.200 --> 00:29:38.259 our horizons, and the danger is that 00:29:38.259 --> 00:29:41.989 these tools are separating people.
00:29:41.989 --> 00:29:43.570 And this is particularly bad 00:29:43.570 --> 00:29:45.940 for political action, because political action 00:29:45.940 --> 00:29:49.879 requires people to have shared experiences, 00:29:49.879 --> 00:29:53.799 so that they are able to act collectively 00:29:53.799 --> 00:29:57.629 to exert pressure to fix problems. 00:29:57.629 --> 00:30:00.810 So, finally: accountability. 00:30:00.810 --> 00:30:03.399 So, we need some oversight mechanisms, 00:30:03.399 --> 00:30:06.519 for example, in the case of errors-- 00:30:06.519 --> 00:30:08.219 so this is particularly important for 00:30:08.219 --> 00:30:10.849 civil or bureaucratic systems. 00:30:10.849 --> 00:30:14.330 So, when an algorithm produces some decision, 00:30:14.330 --> 00:30:16.549 we don't always want humans to just 00:30:16.549 --> 00:30:18.039 defer to the machine, 00:30:18.039 --> 00:30:21.859 and that might represent one of the problems. 00:30:21.859 --> 00:30:25.419 So, there are starting to be some cases 00:30:25.419 --> 00:30:28.039 of computer algorithms yielding a decision, 00:30:28.039 --> 00:30:30.409 and then humans being unable to correct 00:30:30.409 --> 00:30:31.799 an obvious error. 00:30:31.799 --> 00:30:35.190 So there's this case in Georgia, in the United States, 00:30:35.190 --> 00:30:37.259 where 2 young people went to 00:30:37.259 --> 00:30:38.529 the Department of Motor Vehicles-- 00:30:38.529 --> 00:30:39.749 they're twins-- and they went 00:30:39.749 --> 00:30:42.099 to get their driver's licenses. 00:30:42.099 --> 00:30:44.979 However, they were both flagged by 00:30:44.979 --> 00:30:47.489 a fraud algorithm that uses facial recognition 00:30:47.489 --> 00:30:48.809 to look for similar faces, 00:30:48.809 --> 00:30:50.919 and I guess the people that designed the system 00:30:50.919 --> 00:30:54.549 didn't think of the possibility of twins. 00:30:54.549 --> 00:30:58.489 Yeah. So, they just left 00:30:58.489 --> 00:30:59.889 without their driver's licenses. 00:30:59.889 --> 00:31:01.889 The people in the Department of Motor Vehicles 00:31:01.889 --> 00:31:03.809 were unable to correct this. 00:31:03.809 --> 00:31:06.820 So, this is one implication-- 00:31:06.820 --> 00:31:08.579 it's like something out of Kafka. 00:31:08.579 --> 00:31:11.529 But there are also cases of errors being made, 00:31:11.529 --> 00:31:13.879 and people not noticing until 00:31:13.879 --> 00:31:15.909 after actions have been taken, 00:31:15.909 --> 00:31:17.570 some of them very serious-- 00:31:17.570 --> 00:31:19.129 because people simply deferred 00:31:19.129 --> 00:31:20.619 to the machine. 00:31:20.619 --> 00:31:23.309 So, this is an example from San Francisco. 00:31:23.309 --> 00:31:26.679 So, an ALPR-- an Automated License Plate Reader-- 00:31:26.679 --> 00:31:29.429 is a device that uses image recognition 00:31:29.429 --> 00:31:32.099 to detect and read license plates, 00:31:32.099 --> 00:31:34.339 and usually to compare license plates 00:31:34.339 --> 00:31:37.159 with a known list of plates of interest. 00:31:37.159 --> 00:31:39.799 And, so, San Francisco uses these, 00:31:39.799 --> 00:31:42.179 and they're mounted on police cars. 00:31:42.179 --> 00:31:46.659 So, in this case, a San Francisco ALPR 00:31:46.659 --> 00:31:48.879 got a hit on a car, 00:31:48.879 --> 00:31:53.029 and it was the car of a 47-year-old woman 00:31:53.029 --> 00:31:54.839 with no criminal history.
00:31:54.839 --> 00:31:56.029 And so it was a false hit, 00:31:56.029 --> 00:31:58.099 because it was a blurry image, 00:31:58.099 --> 00:31:59.709 and it matched erroneously with 00:31:59.709 --> 00:32:00.909 one of the plates of interest 00:32:00.909 --> 00:32:03.479 that happened to be a stolen vehicle. 00:32:03.479 --> 00:32:06.869 So, they conducted a traffic stop on her, 00:32:06.869 --> 00:32:09.330 and they took her out of the vehicle, 00:32:09.330 --> 00:32:11.049 they searched her and the vehicle, 00:32:11.049 --> 00:32:12.659 she got a pat-down, 00:32:12.659 --> 00:32:14.849 and they had her kneel 00:32:14.849 --> 00:32:17.780 at gunpoint, in the street. 00:32:17.780 --> 00:32:20.989 So, how much oversight should be present 00:32:20.989 --> 00:32:23.999 depends on the implications of the system. 00:32:23.999 --> 00:32:25.279 It's certainly the case that 00:32:25.279 --> 00:32:26.910 for some of these decision-making systems, 00:32:26.910 --> 00:32:29.219 an error might not be that important, 00:32:29.219 --> 00:32:31.149 it could be relatively harmless, 00:32:31.149 --> 00:32:33.559 but in this case, an error in this algorithmic decision 00:32:33.559 --> 00:32:36.259 led to this totally innocent person 00:32:36.259 --> 00:32:40.019 literally having a gun pointed at her. 00:32:40.019 --> 00:32:44.019 So, that brings us to: we need some way of 00:32:44.019 --> 00:32:45.419 getting some information about 00:32:45.419 --> 00:32:47.249 what is going on here. 00:32:47.249 --> 00:32:50.179 We don't wanna have to wait for these events 00:32:50.179 --> 00:32:52.580 before we are able to determine 00:32:52.580 --> 00:32:54.409 some information about the system. 00:32:54.409 --> 00:32:56.139 So, auditing is one option: 00:32:56.139 --> 00:32:58.109 to independently verify the statements 00:32:58.109 --> 00:33:00.809 of companies, in situations where we have 00:33:00.809 --> 00:33:02.939 inputs and outputs. 00:33:02.939 --> 00:33:05.200 So, for example, this could be done with 00:33:05.200 --> 00:33:07.489 Google, Facebook. 00:33:07.489 --> 00:33:09.190 If you have the inputs of a system-- 00:33:09.190 --> 00:33:10.649 say you have test accounts, 00:33:10.649 --> 00:33:11.729 or real accounts-- 00:33:11.729 --> 00:33:14.359 maybe you can collect people's information together. 00:33:14.359 --> 00:33:15.830 So that was something that was done 00:33:15.830 --> 00:33:18.759 during the 2012 Obama campaign 00:33:18.759 --> 00:33:20.249 by ProPublica. 00:33:20.249 --> 00:33:21.269 People noticed that they were getting 00:33:21.269 --> 00:33:24.739 different emails from the Obama campaign, 00:33:24.739 --> 00:33:26.009 and were interested to see 00:33:26.009 --> 00:33:28.209 based on what factors 00:33:28.209 --> 00:33:29.749 the emails were changing. 00:33:29.749 --> 00:33:32.659 So, I think about 200 people submitted emails, 00:33:32.659 --> 00:33:34.940 and they were able to determine some information 00:33:34.940 --> 00:33:38.809 about what the emails were being varied based on. 00:33:38.809 --> 00:33:40.859 So there have been some successful 00:33:40.859 --> 00:33:43.080 attempts at this. 00:33:43.080 --> 00:33:45.919 So, compare inputs and then look at 00:33:45.919 --> 00:33:48.709 why one item was shown to one user 00:33:48.709 --> 00:33:50.289 and not another, and see if there are 00:33:50.289 --> 00:33:51.879 any statistical differences.
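As a sketch of that comparison step, assume you have counted how often an item was shown to two pools of accounts; a standard two-proportion z-test then says whether the difference is larger than chance. The counts below are invented for illustration.

import math

def two_proportion_z(shown_a, n_a, shown_b, n_b):
    # z-score for the difference between two observed proportions.
    p_a, p_b = shown_a / n_a, shown_b / n_b
    pooled = (shown_a + shown_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

z = two_proportion_z(shown_a=120, n_a=500, shown_b=75, n_b=500)
print(round(z, 2), "-> |z| > 1.96 suggests a real difference")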
00:33:51.879 --> 00:33:56.279 So, there are some potential legal issues 00:33:56.279 --> 00:33:57.749 with the test accounts, so that's something 00:33:57.749 --> 00:34:01.499 to think about-- I'm not a lawyer. 00:34:01.499 --> 00:34:03.919 So, for example, if you wanna examine 00:34:03.919 --> 00:34:06.269 ad-targeting algorithms, 00:34:06.269 --> 00:34:07.969 one way to proceed is to construct 00:34:07.969 --> 00:34:10.589 a browsing profile, and then examine 00:34:10.589 --> 00:34:12.989 what ads are served back to you. 00:34:12.989 --> 00:34:14.119 And so this is something that 00:34:14.119 --> 00:34:16.250 academic researchers have looked at, 00:34:16.250 --> 00:34:17.489 because, at the time at least, 00:34:17.489 --> 00:34:20.879 you didn't need to make an account to do this. 00:34:20.879 --> 00:34:24.768 So, this was a study that was presented at 00:34:24.768 --> 00:34:27.799 Privacy Enhancing Technologies last year, 00:34:27.799 --> 00:34:31.149 and in this study, the researchers 00:34:31.149 --> 00:34:33.179 generate some browsing profiles 00:34:33.179 --> 00:34:35.909 that differ only by one characteristic, 00:34:35.909 --> 00:34:37.690 so they're basically identical in every way 00:34:37.690 --> 00:34:39.049 except for one thing, 00:34:39.049 --> 00:34:42.359 and that is denoted by Treatment 1 and 2. 00:34:42.359 --> 00:34:44.460 So this is a randomized, controlled trial, 00:34:44.460 --> 00:34:46.389 but I left out the randomization part 00:34:46.389 --> 00:34:48.220 for simplicity. 00:34:48.220 --> 00:34:54.799 So, in one study, they applied a treatment of gender. 00:34:54.799 --> 00:34:56.799 So, they had the browsing profiles 00:34:56.799 --> 00:34:59.319 in Treatment 1 be male browsing profiles, 00:34:59.319 --> 00:35:02.029 and the browsing profiles in Treatment 2 be female. 00:35:02.029 --> 00:35:04.430 And they wanted to see: is there any difference 00:35:04.430 --> 00:35:06.079 in the way that ads are targeted 00:35:06.079 --> 00:35:08.710 if browsing profiles are effectively identical 00:35:08.710 --> 00:35:11.019 except for gender? 00:35:11.019 --> 00:35:14.710 So, it turns out that there was. 00:35:14.710 --> 00:35:19.180 So, a 3rd-party site was showing Google ads 00:35:19.180 --> 00:35:21.289 for senior executive positions 00:35:21.289 --> 00:35:23.980 at a rate 6 times higher for the fake men 00:35:23.980 --> 00:35:27.059 than for the fake women in this study. 00:35:27.059 --> 00:35:30.109 So, this sort of auditing is not going to 00:35:30.109 --> 00:35:32.779 be able to determine everything 00:35:32.779 --> 00:35:34.930 that algorithms are doing, but it can 00:35:34.930 --> 00:35:36.519 sometimes uncover interesting, 00:35:36.519 --> 00:35:40.900 at least statistical, differences. 00:35:40.900 --> 00:35:47.099 So, this leads us to the fundamental issue: 00:35:47.099 --> 00:35:49.180 Right now, we're really not in control 00:35:49.180 --> 00:35:50.510 of some of these systems, 00:35:50.510 --> 00:35:54.480 and we really need these predictive systems 00:35:54.480 --> 00:35:56.119 to be controlled by us, 00:35:56.119 --> 00:35:57.819 in order for them not to be used 00:35:57.819 --> 00:36:00.109 as a system of control. 00:36:00.109 --> 00:36:03.220 So there are some technologies that I'd like 00:36:03.220 --> 00:36:06.890 to point you all to. 00:36:06.890 --> 00:36:08.319 We need tools in the digital commons 00:36:08.319 --> 00:36:11.160 that can help address some of these concerns.
00:36:11.160 --> 00:36:13.349 So, the first thing is that of course 00:36:13.349 --> 00:36:14.730 we know that minimizing the amount of 00:36:14.730 --> 00:36:17.069 data available can help in some contexts, 00:36:17.069 --> 00:36:18.980 which we can do by making systems 00:36:18.980 --> 00:36:22.779 that are private by design, and by default. 00:36:22.779 --> 00:36:24.549 Another thing is that these audit tools 00:36:24.549 --> 00:36:25.890 might be useful. 00:36:25.890 --> 00:36:30.720 And, so, these 2 nice examples in academia... 00:36:30.720 --> 00:36:34.359 the ad experiment that I just showed was done 00:36:34.359 --> 00:36:36.120 using AdFisher. 00:36:36.120 --> 00:36:38.200 So, these are 2 toolkits that you can use 00:36:38.200 --> 00:36:41.440 to start doing this sort of auditing. 00:36:41.440 --> 00:36:44.579 Another technology that is generally useful, 00:36:44.579 --> 00:36:46.700 but particularly in the case of prediction 00:36:46.700 --> 00:36:48.789 it's useful to maintain access to 00:36:48.789 --> 00:36:50.289 as many sites as possible, 00:36:50.289 --> 00:36:52.589 through anonymity systems like Tor, 00:36:52.589 --> 00:36:54.319 because it's impossible to personalize 00:36:54.319 --> 00:36:55.650 when everyone looks the same. 00:36:55.650 --> 00:36:59.130 So this is a very important technology. 00:36:59.130 --> 00:37:01.519 Something that doesn't really exist, 00:37:01.519 --> 00:37:03.630 but that I think is pretty important, 00:37:03.630 --> 00:37:05.829 is having some tool to view the landscape. 00:37:05.829 --> 00:37:08.160 So, as we know from these few studies 00:37:08.160 --> 00:37:10.440 that have been done, 00:37:10.440 --> 00:37:12.059 different people are not seeing the internet 00:37:12.059 --> 00:37:12.950 in the same way. 00:37:12.950 --> 00:37:15.730 This is one reason why we don't like censorship. 00:37:15.730 --> 00:37:17.880 But, rich and poor people, 00:37:17.880 --> 00:37:19.659 from academic research we know that 00:37:19.659 --> 00:37:23.790 there is widespread price discrimination on the internet, 00:37:23.790 --> 00:37:25.650 so rich and poor people see a different view 00:37:25.650 --> 00:37:26.970 of the Internet, 00:37:26.970 --> 00:37:28.400 men and women see a different view 00:37:28.400 --> 00:37:29.940 of the Internet. 00:37:29.940 --> 00:37:31.200 We wanna know how different people 00:37:31.200 --> 00:37:32.450 see the same site, 00:37:32.450 --> 00:37:34.329 and this could be the beginning of 00:37:34.329 --> 00:37:36.329 a defense system for this sort of 00:37:36.329 --> 00:37:41.730 manipulation/tampering that I showed earlier. 00:37:41.730 --> 00:37:45.549 Another interesting approach is obfuscation: 00:37:45.549 --> 00:37:46.980 injecting noise into the system. 00:37:46.980 --> 00:37:49.190 So there's an interesting browser extension 00:37:49.190 --> 00:37:51.720 called AdNauseam, that's for Firefox, 00:37:51.720 --> 00:37:54.579 which clicks on every single ad you're served, 00:37:54.579 --> 00:37:55.680 to inject noise. 00:37:55.680 --> 00:37:57.019 So that's, I think, an interesting approach 00:37:57.019 --> 00:38:00.170 that people haven't looked at too much. 
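A toy model of why the noise-injection idea works; this is my own illustration, not how AdNauseam is implemented. A tracker that profiles a user by counting topic visits loses most of its signal once uniformly random visits are mixed in:

```python
import numpy as np

rng = np.random.default_rng(1)
topics = ["health", "politics", "shopping", "travel", "tech"]

# A user whose real browsing is heavily concentrated on one topic.
real_visits = rng.choice(topics, size=100, p=[0.7, 0.1, 0.1, 0.05, 0.05])

def inferred_profile(visits):
    """The 'tracker': normalize visit counts into an interest profile."""
    return {t: float(np.mean(visits == t)) for t in topics}

print("without noise:", inferred_profile(real_visits))

# Obfuscation: inject uniformly random visits, here 4x the real traffic.
noise = rng.choice(topics, size=400)
obfuscated = np.concatenate([real_visits, noise])

# The dominant topic's share collapses toward uniform, so there is
# much less personalization signal left to exploit.
print("with noise:   ", inferred_profile(obfuscated))
```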
00:38:00.170 --> 00:38:03.780 So in terms of policy, 00:38:03.780 --> 00:38:06.530 Facebook and Google, these internet giants, 00:38:06.530 --> 00:38:08.829 have billions of users, 00:38:08.829 --> 00:38:12.220 and sometimes they like to call themselves 00:38:12.220 --> 00:38:13.769 new public utilities, 00:38:13.769 --> 00:38:15.000 and if that's the case then 00:38:15.000 --> 00:38:17.549 it might be necessary to subject them 00:38:17.549 --> 00:38:20.539 to additional regulation. 00:38:20.539 --> 00:38:21.990 Another problem that's come up, 00:38:21.990 --> 00:38:23.539 for example with some of the studies 00:38:23.539 --> 00:38:24.900 that Facebook has done, 00:38:24.900 --> 00:38:29.039 is sometimes a lack of ethics review. 00:38:29.039 --> 00:38:31.059 So, for example, in academia, 00:38:31.059 --> 00:38:33.859 if you're gonna do research involving humans, 00:38:33.859 --> 00:38:35.390 there's an Institutional Review Board 00:38:35.390 --> 00:38:36.970 that you go to that verifies that 00:38:36.970 --> 00:38:39.140 you're doing things in an ethical manner. 00:38:39.140 --> 00:38:40.910 And some companies do have internal 00:38:40.910 --> 00:38:43.029 review processes like this, but it might 00:38:43.029 --> 00:38:45.119 be important to have an independent 00:38:45.119 --> 00:38:48.200 ethics board that does this sort of thing. 00:38:48.200 --> 00:38:50.849 And we really need 3rd-party auditing. 00:38:50.849 --> 00:38:54.519 So, for example, some companies 00:38:54.519 --> 00:38:56.220 don't want auditing to be done 00:38:56.220 --> 00:38:59.190 because of IP concerns, 00:38:59.190 --> 00:39:00.579 and if that's the concern 00:39:00.579 --> 00:39:03.180 maybe having a set of people 00:39:03.180 --> 00:39:05.680 that are not paid by the company 00:39:05.680 --> 00:39:07.200 to check how some of these systems 00:39:07.200 --> 00:39:08.640 are being implemented, 00:39:08.640 --> 00:39:11.240 could help give us confidence that 00:39:11.240 --> 00:39:16.979 things are being done in a reasonable way. 00:39:16.979 --> 00:39:20.269 So, in closing, 00:39:20.269 --> 00:39:23.180 algorithmic decision making is here, 00:39:23.180 --> 00:39:26.140 and it's barreling forward at a very fast rate, 00:39:26.140 --> 00:39:27.890 and we need to figure out what 00:39:27.890 --> 00:39:30.410 the guide rails should be, 00:39:30.410 --> 00:39:31.380 and how to install them 00:39:31.380 --> 00:39:33.119 to handle some of the potential threats. 00:39:33.119 --> 00:39:35.470 There's a huge amount of power here. 00:39:35.470 --> 00:39:37.910 We need more openness in these systems. 00:39:37.910 --> 00:39:39.589 And, right now, 00:39:39.589 --> 00:39:41.559 with the intelligent systems that do exist, 00:39:41.559 --> 00:39:43.920 we don't know what's occurring really, 00:39:43.920 --> 00:39:46.510 and we need to watch carefully 00:39:46.510 --> 00:39:49.099 where and how these systems are being used. 00:39:49.099 --> 00:39:50.690 And I think this community has 00:39:50.690 --> 00:39:53.940 an important role to play in this fight, 00:39:53.940 --> 00:39:55.730 to study what's being done, 00:39:55.730 --> 00:39:57.160 to show people what's being done, 00:39:57.160 --> 00:39:58.670 to raise the debate and advocate, 00:39:58.670 --> 00:40:01.200 and, where necessary, to resist. 00:40:01.200 --> 00:40:03.339 Thanks. 00:40:03.339 --> 00:40:13.129 applause 00:40:13.129 --> 00:40:17.519 Herald: So, let's have a question and answer. 00:40:17.519 --> 00:40:19.080 Microphone 2, please. 
00:40:19.080 --> 00:40:20.199 Mic 2: Hi there. 00:40:20.199 --> 00:40:23.259 Thanks for the talk. 00:40:23.259 --> 00:40:26.230 Since this pre-crime software has also 00:40:26.230 --> 00:40:27.359 arrived here in Germany 00:40:27.359 --> 00:40:29.680 with the start of the so-called CopWatch system 00:40:29.680 --> 00:40:32.779 in southern Germany, in Bavaria and Nuremberg especially, 00:40:32.779 --> 00:40:35.420 where they try to predict burglary crime 00:40:35.420 --> 00:40:37.460 using that criminal-record 00:40:37.460 --> 00:40:40.170 geographical analysis, like you explained, 00:40:40.170 --> 00:40:43.380 this leads me to a 2-fold question: 00:40:43.380 --> 00:40:47.900 first, have you heard of any research 00:40:47.900 --> 00:40:49.760 that measures the effectiveness 00:40:49.760 --> 00:40:53.690 of such measures, at all? 00:40:53.690 --> 00:40:57.040 And, second: 00:40:57.040 --> 00:41:00.599 What do you think of the game theory 00:41:00.599 --> 00:41:02.690 if the thieves or the bad guys 00:41:02.690 --> 00:41:07.619 know the system, and when they game the system, 00:41:07.619 --> 00:41:09.980 they will probably win, 00:41:09.980 --> 00:41:11.640 since one police officer in an interview said 00:41:11.640 --> 00:41:14.019 this system is used to reduce 00:41:14.019 --> 00:41:16.460 the personal costs of policing, 00:41:16.460 --> 00:41:19.460 so they just send the guys where the red flags are, 00:41:19.460 --> 00:41:22.290 and the others take the day off. 00:41:22.290 --> 00:41:24.360 Dr. Helsby: Yup. 00:41:24.360 --> 00:41:27.150 Um, so, with respect to 00:41:27.150 --> 00:41:30.990 testing the effectiveness of predictive policing, 00:41:30.990 --> 00:41:31.990 the companies, 00:41:31.990 --> 00:41:33.910 some of them do randomized, controlled trials 00:41:33.910 --> 00:41:35.240 and claim a reduction in crime. 00:41:35.240 --> 00:41:38.349 The best independent study that I've seen 00:41:38.349 --> 00:41:40.680 is by the RAND Corporation, 00:41:40.680 --> 00:41:43.120 which did a study in, I think, 00:41:43.120 --> 00:41:44.920 Shreveport, Louisiana, 00:41:44.920 --> 00:41:47.589 and in their report they claim 00:41:47.589 --> 00:41:50.190 that there was no statistically significant 00:41:50.190 --> 00:41:52.900 difference, they didn't find any reduction. 00:41:52.900 --> 00:41:54.099 And it was specifically looking at 00:41:54.099 --> 00:41:56.730 property crime, which I think you mentioned. 00:41:56.730 --> 00:41:59.480 So, I think right now there's sort of 00:41:59.480 --> 00:42:01.069 conflicting reports between 00:42:01.069 --> 00:42:06.180 the independent auditors and these company claims. 00:42:06.180 --> 00:42:09.289 So there definitely needs to be more study. 00:42:09.289 --> 00:42:12.240 And then, the 2nd thing...sorry, remind me what it was? 00:42:12.240 --> 00:42:15.189 Mic 2: What about the guys gaming the system? 00:42:15.189 --> 00:42:16.949 Dr. Helsby: Oh, yeah. 00:42:16.949 --> 00:42:18.900 I think it's a legitimate concern. 00:42:18.900 --> 00:42:22.480 Like, if all the outputs were just immediately public, 00:42:22.480 --> 00:42:24.599 then, yes, everyone knows the location 00:42:24.599 --> 00:42:26.549 of all police officers, 00:42:26.549 --> 00:42:29.009 and I imagine that people would have 00:42:29.009 --> 00:42:30.779 a problem with that. 00:42:30.779 --> 00:42:32.679 Yup. 00:42:32.679 --> 00:42:35.990 Herald: Microphone #4, please. 00:42:35.990 --> 00:42:39.369 Mic 4: Yeah, this is not actually a question, 00:42:39.369 --> 00:42:40.779 but just a comment. 
00:42:40.779 --> 00:42:42.970 I've enjoyed your talk very much, 00:42:42.970 --> 00:42:47.789 in particular after watching 00:42:47.789 --> 00:42:52.270 the talk in Hall 1 earlier in the afternoon. 00:42:52.270 --> 00:42:55.730 The "Say Hi to Your New Boss", about 00:42:55.730 --> 00:42:59.609 algorithms that are trained with big data, 00:42:59.609 --> 00:43:02.390 and finally make decisions. 00:43:02.390 --> 00:43:08.210 And I think these 2 talks are kind of complementary, 00:43:08.210 --> 00:43:11.309 and if people are interested in the topic 00:43:11.309 --> 00:43:14.710 they might want to check out the other talk 00:43:14.710 --> 00:43:16.259 and watch it later, because these 00:43:16.259 --> 00:43:17.319 fit very well together. 00:43:17.319 --> 00:43:19.589 Dr. Helsby: Yeah, it was a great talk. 00:43:19.589 --> 00:43:22.130 Herald: Microphone #2, please. 00:43:22.130 --> 00:43:25.049 Mic 2: Um, yeah, you mentioned 00:43:25.049 --> 00:43:27.319 the need to have some kind of 3rd-party auditing 00:43:27.319 --> 00:43:30.900 or some kind of way to 00:43:30.900 --> 00:43:31.930 peek into these algorithms 00:43:31.930 --> 00:43:33.079 and to see what they're doing, 00:43:33.079 --> 00:43:34.420 and to see if they're being fair. 00:43:34.420 --> 00:43:36.199 Can you talk a little bit more about that? 00:43:36.199 --> 00:43:38.059 Like, going forward, 00:43:38.059 --> 00:43:40.690 some kind of regulatory structures 00:43:40.690 --> 00:43:44.200 would probably have to emerge 00:43:44.200 --> 00:43:47.200 to analyze and to look at 00:43:47.200 --> 00:43:49.339 these black boxes that are just sort of 00:43:49.339 --> 00:43:51.309 popping up everywhere and, you know, 00:43:51.309 --> 00:43:52.939 controlling more and more of the things 00:43:52.939 --> 00:43:56.150 in our lives, and important decisions. 00:43:56.150 --> 00:43:58.539 So, just, what kind of discussions 00:43:58.539 --> 00:43:59.460 are there for that? 00:43:59.460 --> 00:44:01.809 And what kind of possibility is there for that? 00:44:01.809 --> 00:44:04.900 And, I'm sure that companies would be 00:44:04.900 --> 00:44:08.000 very, very resistant to 00:44:08.000 --> 00:44:09.890 any kind of attempt to look into 00:44:09.890 --> 00:44:13.890 algorithms, and to... 00:44:13.890 --> 00:44:15.070 Dr. Helsby: Yeah, I mean, definitely 00:44:15.070 --> 00:44:18.069 companies would be very resistant to 00:44:18.069 --> 00:44:19.670 having people look into their algorithms. 00:44:19.670 --> 00:44:22.190 So, if you wanna do a very rigorous 00:44:22.190 --> 00:44:23.339 audit of what's going on 00:44:23.339 --> 00:44:25.660 then it's probably necessary to have 00:44:25.660 --> 00:44:26.589 a few people come in 00:44:26.589 --> 00:44:28.900 and sign NDAs, and then 00:44:28.900 --> 00:44:31.039 look through the systems. 00:44:31.039 --> 00:44:33.140 So, that's one way to proceed. 
00:44:33.140 --> 00:44:35.049 But, another way to proceed that-- 00:44:35.049 --> 00:44:38.720 so, these academic researchers have done 00:44:38.720 --> 00:44:40.009 a few experiments 00:44:40.009 --> 00:44:42.809 and found some interesting things, 00:44:42.809 --> 00:44:45.500 and that's sort of all the attempts at auditing 00:44:45.500 --> 00:44:46.450 that we've seen: 00:44:46.450 --> 00:44:48.490 there was 1 attempt in 2012, for the Obama campaign, 00:44:48.490 --> 00:44:49.910 but there's really not been any 00:44:49.910 --> 00:44:51.500 sort of systematic attempt-- 00:44:51.500 --> 00:44:52.589 you know, like, in censorship 00:44:52.589 --> 00:44:54.539 we see a systematic attempt to 00:44:54.539 --> 00:44:56.779 do measurement as often as possible, 00:44:56.779 --> 00:44:58.240 check what's going on, 00:44:58.240 --> 00:44:59.339 and that itself, you know, 00:44:59.339 --> 00:45:00.900 can act as an oversight mechanism. 00:45:00.900 --> 00:45:01.880 But, right now, 00:45:01.880 --> 00:45:03.900 I think many of these companies 00:45:03.900 --> 00:45:05.259 realize no one is watching, 00:45:05.259 --> 00:45:07.160 so there's no real push to have 00:45:07.160 --> 00:45:10.440 people verify: are you being fair when you 00:45:10.440 --> 00:45:11.539 implement this system? 00:45:11.539 --> 00:45:12.969 Because no one's really checking. 00:45:12.969 --> 00:45:13.980 Mic 2: Do you think that, 00:45:13.980 --> 00:45:15.339 at some point, it would be like 00:45:15.339 --> 00:45:19.059 an FDA or SEC, to give some American examples... 00:45:19.059 --> 00:45:21.490 an actual government regulatory agency 00:45:21.490 --> 00:45:24.960 that has the power and ability to 00:45:24.960 --> 00:45:27.930 not just sort of look and try to 00:45:27.930 --> 00:45:31.710 reverse engineer some of these algorithms, 00:45:31.710 --> 00:45:33.920 but actually peek in there and make sure 00:45:33.920 --> 00:45:36.420 that things are fair, because it seems like 00:45:36.420 --> 00:45:38.240 there's just-- it's so important now 00:45:38.240 --> 00:45:41.769 that, again, it could be the difference between 00:45:41.769 --> 00:45:42.930 life and death, between 00:45:42.930 --> 00:45:44.589 getting a job, not getting a job, 00:45:44.589 --> 00:45:46.130 being pulled over, not being pulled over, 00:45:46.130 --> 00:45:48.069 being racially profiled, not racially profiled, 00:45:48.069 --> 00:45:49.410 things like that. Dr. Helsby: Right. 00:45:49.410 --> 00:45:50.430 Mic 2: Is it moving in that direction? 00:45:50.430 --> 00:45:52.249 Or is it way too early for it? 00:45:52.249 --> 00:45:55.110 Dr. Helsby: I mean, so some people have... 00:45:55.110 --> 00:45:56.859 someone has called for, like, 00:45:56.859 --> 00:45:59.079 a Federal Search Commission, 00:45:59.079 --> 00:46:00.930 or like a Federal Algorithms Commission, 00:46:00.930 --> 00:46:03.200 that would do this sort of oversight work, 00:46:03.200 --> 00:46:06.130 but it's in such early stages right now 00:46:06.130 --> 00:46:09.970 that there's no real push for that. 00:46:09.970 --> 00:46:13.330 But I think it's a good idea. 00:46:13.330 --> 00:46:15.729 Herald: And again, #2 please. 00:46:15.729 --> 00:46:17.059 Mic 2: Thank you again for your talk. 
00:46:17.059 --> 00:46:19.309 I was just curious if you can point 00:46:19.309 --> 00:46:20.440 to any examples of 00:46:20.440 --> 00:46:22.619 either current producers or consumers 00:46:22.619 --> 00:46:24.029 of these algorithmic systems 00:46:24.029 --> 00:46:26.390 who are actively and publicly trying 00:46:26.390 --> 00:46:27.720 to do so in a responsible manner 00:46:27.720 --> 00:46:29.720 by describing what they're trying to do 00:46:29.720 --> 00:46:31.380 and how they're going about it? 00:46:31.380 --> 00:46:37.210 Dr. Helsby: So, yeah, there are some companies, 00:46:37.210 --> 00:46:39.000 for example, like DataKind, 00:46:39.000 --> 00:46:42.710 that try to deploy algorithmic systems 00:46:42.710 --> 00:46:44.640 in as responsible a way as possible, 00:46:44.640 --> 00:46:47.250 for like public policy. 00:46:47.250 --> 00:46:49.549 Like, I actually also implement systems 00:46:49.549 --> 00:46:51.750 for public policy in a transparent way. 00:46:51.750 --> 00:46:54.329 Like, all the code is in GitHub, etc. 00:46:54.329 --> 00:47:00.020 And, to give credit to 00:47:00.020 --> 00:47:01.990 Google and these giants, 00:47:01.990 --> 00:47:06.109 it is also the case that they're trying to implement 00:47:06.109 --> 00:47:08.170 transparency systems that help you understand. 00:47:08.170 --> 00:47:09.289 This has been done with respect to 00:47:09.289 --> 00:47:12.329 how your data is being collected, 00:47:12.329 --> 00:47:14.579 but for example if you go on Amazon.com 00:47:14.579 --> 00:47:17.890 you can see a recommendation has been made, 00:47:17.890 --> 00:47:19.420 and that is pretty transparent. 00:47:19.420 --> 00:47:21.480 You can see "this item was recommended to me," 00:47:21.480 --> 00:47:25.039 so you know that prediction is being used in this case, 00:47:25.039 --> 00:47:27.089 and it will say why prediction is being used: 00:47:27.089 --> 00:47:29.230 because you purchased some item. 00:47:29.230 --> 00:47:30.380 And Google has a similar thing, 00:47:30.380 --> 00:47:32.420 if you go to like Google Ad Settings, 00:47:32.420 --> 00:47:35.249 you can even turn off personalization of ads 00:47:35.249 --> 00:47:36.380 if you want, 00:47:36.380 --> 00:47:38.119 and you can also see some of the inferences 00:47:38.119 --> 00:47:39.400 that have been learned about you. 00:47:39.400 --> 00:47:40.819 A subset of the inferences that have been 00:47:40.819 --> 00:47:41.700 learned about you. 00:47:41.700 --> 00:47:43.940 So, like, what interests... 00:47:43.940 --> 00:47:47.869 Herald: A question from the internet, please? 00:47:47.869 --> 00:47:50.930 Signal Angel: Yes, billetQ is asking 00:47:50.930 --> 00:47:54.479 how do you avoid biases in machine learning? 00:47:54.479 --> 00:47:57.380 I assume an analysis system, for example, 00:47:57.380 --> 00:48:00.420 could be biased against women and minorities, 00:48:00.420 --> 00:48:04.960 if used for hiring decisions based on known data. 00:48:04.960 --> 00:48:06.499 Dr. Helsby: Yeah, so one thing is to 00:48:06.499 --> 00:48:08.529 just explicitly check. 00:48:08.529 --> 00:48:12.199 So, you can check to see how 00:48:12.199 --> 00:48:14.309 positive outcomes are being distributed 00:48:14.309 --> 00:48:16.779 among those protected classes. 
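A minimal sketch of the explicit check described above, assuming you have a system's decisions and a protected attribute for each person; the data and the 0.8 rule-of-thumb threshold (borrowed from the US four-fifths guideline) are illustrative:

```python
import numpy as np

# Hypothetical audit data: 1 = positive outcome (e.g. shortlisted for a job).
decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0])
group     = np.array(["m", "m", "m", "m", "m", "m",
                      "f", "f", "f", "f", "f", "f"])

# Demographic parity check: compare positive-outcome rates across groups.
rates = {}
for g in np.unique(group):
    rates[g] = decisions[group == g].mean()
    print(f"group {g}: positive rate = {rates[g]:.2f}")

ratio = min(rates.values()) / max(rates.values())
print(f"disparate impact ratio = {ratio:.2f}")
if ratio < 0.8:
    print("Flag for review: outcomes look skewed across protected classes.")
```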
00:48:16.779 --> 00:48:19.210 You could also incorporate these sorts of 00:48:19.210 --> 00:48:21.440 fairness constraints in the function 00:48:21.440 --> 00:48:24.069 that you optimize when you train the system, 00:48:24.069 --> 00:48:25.950 and so, if you're interested in reading more 00:48:25.950 --> 00:48:28.960 about this, the 2 papers-- 00:48:28.960 --> 00:48:31.909 let me go to References-- 00:48:31.909 --> 00:48:32.730 there's a good paper called 00:48:32.730 --> 00:48:35.339 Fairness Through Awareness that describes 00:48:35.339 --> 00:48:37.499 how to go about doing this, 00:48:37.499 --> 00:48:39.579 so I recommend this person read that. 00:48:39.579 --> 00:48:40.970 It's good. 00:48:40.970 --> 00:48:43.400 Herald: Microphone 2, please. 00:48:43.400 --> 00:48:45.400 Mic 2: Thanks again for your talk. 00:48:45.400 --> 00:48:49.649 Umm, hello? 00:48:49.649 --> 00:48:50.999 Okay. 00:48:50.999 --> 00:48:52.960 Umm, I see of course a problem with 00:48:52.960 --> 00:48:54.619 all the black boxes that you describe 00:48:54.619 --> 00:48:57.069 with regard to the crime systems, 00:48:57.069 --> 00:48:59.569 but when we look at the advertising systems 00:48:59.569 --> 00:49:02.169 in many cases they are very networked. 00:49:02.169 --> 00:49:04.160 There are many different systems collaborating 00:49:04.160 --> 00:49:07.109 and exchanging data via open APIs: 00:49:07.109 --> 00:49:08.720 RESTful APIs, and various 00:49:08.720 --> 00:49:11.720 demand-side platforms and audience-exchange platforms, 00:49:11.720 --> 00:49:12.539 and everything. 00:49:12.539 --> 00:49:15.420 So, can that help to at least 00:49:15.420 --> 00:49:22.160 increase awareness on where targeting, personalization 00:49:22.160 --> 00:49:23.679 might be happening? 00:49:23.679 --> 00:49:26.190 I mean, I'm looking at systems like 00:49:26.190 --> 00:49:29.539 BuiltWith, that surface what kind of 00:49:29.539 --> 00:49:31.380 JavaScript libraries are used elsewhere. 00:49:31.380 --> 00:49:32.999 So, is that something that could help 00:49:32.999 --> 00:49:35.670 at least to give a better awareness 00:49:35.670 --> 00:49:38.690 and listing all the points where 00:49:38.690 --> 00:49:41.409 you might be targeted... 00:49:41.409 --> 00:49:43.070 Dr. Helsby: So, like, with respect to 00:49:43.070 --> 00:49:46.460 advertising, the fact that there is behind the scenes 00:49:46.460 --> 00:49:48.450 this like complicated auction process 00:49:48.450 --> 00:49:50.650 that's occurring, just makes things 00:49:50.650 --> 00:49:51.819 a lot more complicated. 00:49:51.819 --> 00:49:54.170 So, for example, I said briefly 00:49:54.170 --> 00:49:57.269 that they found that there's this statistical difference 00:49:57.269 --> 00:49:59.099 between how men and women are treated, 00:49:59.099 --> 00:50:01.339 but it doesn't necessarily mean that 00:50:01.339 --> 00:50:03.640 "Oh, the algorithm is definitely biased." 00:50:03.640 --> 00:50:06.369 It could be because of this auction process, 00:50:06.369 --> 00:50:10.569 it could be that women are considered 00:50:10.569 --> 00:50:12.630 more valuable when it comes to advertising, 00:50:12.630 --> 00:50:15.099 and so these executive ads are getting 00:50:15.099 --> 00:50:17.160 outbid by some other ads, 00:50:17.160 --> 00:50:18.890 and so there's a lot of potential 00:50:18.890 --> 00:50:20.490 causes for that. 00:50:20.490 --> 00:50:22.829 So, I think it just makes things a lot more complicated. 00:50:22.829 --> 00:50:25.910 I don't know if it helps with the bias at all. 
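One rough sketch of the fairness-constraint idea mentioned at the start of this answer: a soft demographic-parity penalty added to a logistic-regression loss. This is my own illustration of the general technique, not the specific construction from the Fairness Through Awareness paper, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic training data: features X, labels y, and a protected attribute.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)
protected = rng.random(n) < 0.5  # True = group A, False = group B

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr, lam = 0.1, 5.0  # lam trades accuracy against the fairness penalty

for _ in range(500):
    p = sigmoid(X @ w)
    # Gradient of the ordinary log-loss.
    grad = X.T @ (p - y) / n
    # Soft demographic-parity penalty: lam * (mean score gap)^2 between
    # the two groups, differentiated with respect to w.
    gap = p[protected].mean() - p[~protected].mean()
    d_gap = (X[protected].T @ (p[protected] * (1 - p[protected])) / protected.sum()
             - X[~protected].T @ (p[~protected] * (1 - p[~protected])) / (~protected).sum())
    grad += lam * 2.0 * gap * d_gap
    w -= lr * grad

p = sigmoid(X @ w)
print(f"score gap between groups: {p[protected].mean() - p[~protected].mean():+.3f}")
```

Raising lam pushes the groups' mean scores together at some cost in accuracy, which is exactly the trade-off such constraints make explicit.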
00:50:25.910 --> 00:50:27.410 Mic 2: Well, the question was more 00:50:27.410 --> 00:50:30.299 in a direction... can it help to surface 00:50:30.299 --> 00:50:32.499 and make people aware of that fact? 00:50:32.499 --> 00:50:34.930 I mean, I can talk to my kids probably, 00:50:34.930 --> 00:50:36.259 and they will probably understand, 00:50:36.259 --> 00:50:38.420 but I can't explain that to my grandma, 00:50:38.420 --> 00:50:43.150 who's also, umm, looking at an iPad. 00:50:43.150 --> 00:50:44.289 Dr. Helsby: So, the fact that 00:50:44.289 --> 00:50:45.690 the systems are... 00:50:45.690 --> 00:50:48.509 I don't know if I understand. 00:50:48.509 --> 00:50:50.529 Mic 2: OK. I think that the main problem 00:50:50.529 --> 00:50:53.710 is that we are behind the industry efforts 00:50:53.710 --> 00:50:57.179 that target us, and many people 00:50:57.179 --> 00:51:00.579 do know, but a lot more people don't know, 00:51:00.579 --> 00:51:03.160 and making them aware of the fact 00:51:03.160 --> 00:51:07.269 that they are a target, in a way, 00:51:07.269 --> 00:51:10.990 is something that can only be shown 00:51:10.990 --> 00:51:14.779 by a 3rd party that has that data at its disposal, 00:51:14.779 --> 00:51:16.339 and makes audits in a way-- 00:51:16.339 --> 00:51:17.929 maybe in an automated way. 00:51:17.929 --> 00:51:19.170 Dr. Helsby: Right. 00:51:19.170 --> 00:51:21.410 Yeah, I think it certainly could help with advocacy 00:51:21.410 --> 00:51:23.059 if that's the point, yeah. 00:51:23.059 --> 00:51:26.079 Herald: Another question from the internet, please. 00:51:26.079 --> 00:51:29.319 Signal Angel: Yes, on IRC they are asking 00:51:29.319 --> 00:51:31.440 if we know that prediction in some cases 00:51:31.440 --> 00:51:34.460 provides an influence that cannot be controlled. 00:51:34.460 --> 00:51:38.480 So, r4v5 would like to know from you 00:51:38.480 --> 00:51:41.519 if there are some cases or areas where 00:51:41.519 --> 00:51:45.060 machine learning simply shouldn't go? 00:51:45.060 --> 00:51:48.349 Dr. Helsby: Umm, so I think... 00:51:48.349 --> 00:51:52.559 I mean, yes, I think that it is the case 00:51:52.559 --> 00:51:54.650 that in some cases machine learning 00:51:54.650 --> 00:51:56.180 might not be appropriate. 00:51:56.180 --> 00:51:58.359 For example, if you use machine learning 00:51:58.359 --> 00:52:00.970 to decide who should be searched. 00:52:00.970 --> 00:52:02.619 I don't think it should be the case that 00:52:02.619 --> 00:52:03.809 machine learning algorithms should 00:52:03.809 --> 00:52:05.440 ever be used to determine 00:52:05.440 --> 00:52:08.430 probable cause, or something like that. 00:52:08.430 --> 00:52:12.339 So, if it's just one piece of evidence 00:52:12.339 --> 00:52:13.299 that you consider, 00:52:13.299 --> 00:52:14.990 and there's human oversight always, 00:52:14.990 --> 00:52:18.519 maybe it's fine, but 00:52:18.519 --> 00:52:20.839 we should be very suspicious and hesitant 00:52:20.839 --> 00:52:22.119 in certain contexts where 00:52:22.119 --> 00:52:24.529 the ramifications are very serious. 00:52:24.529 --> 00:52:27.259 Like the No Fly List, and so on. 00:52:27.259 --> 00:52:29.200 Herald: And #2 again. 00:52:29.200 --> 00:52:30.809 Mic 2: A second question 00:52:30.809 --> 00:52:33.509 that just occurred to me, if you don't mind. 
00:52:33.509 --> 00:52:35.339 Umm, until the advent of 00:52:35.339 --> 00:52:36.559 algorithmic systems, 00:52:36.559 --> 00:52:40.470 when there've been cases of serious harm 00:52:40.470 --> 00:52:42.799 that have resulted for individuals or groups, 00:52:42.799 --> 00:52:44.579 and it's been demonstrated that 00:52:44.579 --> 00:52:46.029 it's occurred because of 00:52:46.029 --> 00:52:49.400 an individual or a system of people 00:52:49.400 --> 00:52:53.019 being systematically biased, then often 00:52:53.019 --> 00:52:55.130 one of the actions that's taken is 00:52:55.130 --> 00:52:56.869 pressure's applied, and then 00:52:56.869 --> 00:52:59.660 people are required to change, 00:52:59.660 --> 00:53:01.049 and hopefully be held responsible, 00:53:01.049 --> 00:53:02.910 and then change the way that they do things 00:53:02.910 --> 00:53:06.400 to try to remove bias from that system. 00:53:06.400 --> 00:53:07.839 What's the current thinking about 00:53:07.839 --> 00:53:10.299 how we can go about doing that 00:53:10.299 --> 00:53:12.599 when the systems that are doing that 00:53:12.599 --> 00:53:13.650 are algorithmic? 00:53:13.650 --> 00:53:15.999 Is it just going to be human oversight, 00:53:15.999 --> 00:53:16.910 and humans are gonna have to be 00:53:16.910 --> 00:53:18.379 held responsible for the oversight? 00:53:18.379 --> 00:53:20.890 Dr. Helsby: So, in terms of bias, 00:53:20.890 --> 00:53:22.569 if we're concerned about bias towards 00:53:22.569 --> 00:53:24.019 particular types of people, 00:53:24.019 --> 00:53:25.710 that's something that we can optimize for. 00:53:25.710 --> 00:53:28.839 So, we can train systems that are unbiased 00:53:28.839 --> 00:53:30.019 in this way. 00:53:30.019 --> 00:53:32.109 So that's one way to deal with it. 00:53:32.109 --> 00:53:34.039 But there's always gonna be errors, 00:53:34.039 --> 00:53:35.420 so that's sort of a separate issue 00:53:35.420 --> 00:53:37.509 from the bias, and in the case 00:53:37.509 --> 00:53:39.180 where there are errors, 00:53:39.180 --> 00:53:40.539 there must be oversight. 00:53:40.539 --> 00:53:45.079 So, one way that one could improve 00:53:45.079 --> 00:53:46.410 the way that this is done 00:53:46.410 --> 00:53:48.160 is by making sure that you're 00:53:48.160 --> 00:53:50.799 keeping track of confidence of decisions. 00:53:50.799 --> 00:53:54.039 So, if you have a low-confidence prediction, 00:53:54.039 --> 00:53:56.259 then maybe a human should come in and check things. 00:53:56.259 --> 00:53:58.809 So, that might be one way to proceed. 00:54:02.099 --> 00:54:03.990 Herald: So, there are no more questions. 00:54:03.990 --> 00:54:06.199 I close this talk now, 00:54:06.199 --> 00:54:08.239 and thank you very much 00:54:08.239 --> 00:54:09.410 and a big applause to 00:54:09.410 --> 00:54:11.780 Jennifer Helsby! 00:54:11.780 --> 00:54:16.310 roaring applause 00:54:16.310 --> 00:54:28.000 subtitles created by c3subtitles.de Join, and help us!
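A minimal sketch of the confidence-based oversight described in that last answer, assuming a binary classifier that outputs a calibrated probability; the threshold and function names are hypothetical:

```python
def route_decision(probability, threshold=0.9):
    """Act automatically only on high-confidence predictions;
    everything in the uncertain middle goes to a human reviewer."""
    if probability >= threshold or probability <= 1.0 - threshold:
        return "automated decision"
    return "flag for human review"

# Hypothetical calibrated outputs from some predictive system.
for p in [0.98, 0.55, 0.12, 0.81]:
    print(f"p = {p:.2f} -> {route_decision(p)}")
```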