- 
music
 
- 
Herald: Which of you are using Facebook? Twitter?
Diaspora?
 
- 
concerned noise
And all of that data you enter there
 
- 
goes to a server, gets into the hands of somebody
who's using it
 
- 
and the next talk
is exactly about that,
 
- 
because there are also intelligent machines
and intelligent algorithms
 
- 
that try to make something
out of that data.
 
- 
So the post-doc researcher Jennifer Helsby
 
- 
of the University of Chicago,
who works at this
 
- 
intersection between policy and 
technology,
 
- 
will now ask you the question:
To whom would we give that power?
 
- 
Dr. Helsby: Thanks.
applause
 
- 
Okay, so, today I'm gonna do a brief tour
of intelligent systems
 
- 
and how they're currently used
 
- 
and then we're gonna look at some examples
with respect
 
- 
to the properties that we might care about
 
- 
these systems having,
and I'll talk a little bit about
 
- 
some of the work that's been done in academia
 
- 
on these topics.
 
- 
And then we'll talk about some
promising paths forward.
 
- 
So, I wanna start with this:
Kranzberg's First Law of Technology
 
- 
So, technology is neither good nor bad,
but it also isn't neutral.
 
- 
Technology shapes our world,
and it can act as
 
- 
a liberating force-- or an oppressive and
controlling force.
 
- 
So, in this talk, I'm gonna go
towards some of the aspects
 
- 
of intelligent systems that might be more
controlling in nature.
 
- 
So, as we all know,
 
- 
because of the rapidly decreasing cost
of storage and computation,
 
- 
along with the rise of new sensor technologies,
 
- 
data collection devices
are being pushed into every
 
- 
aspect of our lives: in our homes, our cars,
 
- 
in our pockets, on our wrists.
 
- 
And data collection systems act as intermediaries
 
- 
for a huge amount of human communication.
 
- 
And much of this data sits in government
 
- 
and corporate databases.
 
- 
So, in order to make use of this data,
 
- 
we need to be able to make some inferences.
 
- 
So, one way of approaching this is I can hire
 
- 
a lot of humans, and I can have these humans
 
- 
manually examine the data, and they can acquire
 
- 
expert knowledge of the domain, and then
 
- 
perhaps they can make some decisions
 
- 
or at least some recommendations
based on it.
 
- 
However, there's some problems with this.
 
- 
One is that it's slow, and thus expensive.
 
- 
It's also biased. We know that humans have
 
- 
all sorts of biases, both conscious and unconscious,
 
- 
and it would be nice to have a system
that did not have
 
- 
these inaccuracies.
 
- 
It's also not very transparent: I might
 
- 
not really know the factors that led to
 
- 
some decisions being made.
 
- 
Even humans themselves
often don't really understand
 
- 
why they came to a given decision, because
 
- 
decisions are often emotional in nature.
 
- 
And, thus, these human decision making systems
 
- 
are often difficult to audit.
 
- 
So, another way to proceed is maybe instead
 
- 
I study the system and the data carefully
 
- 
and I write down the best rules
for making a decision
 
- 
or, I can have a machine
dynamically figure out
 
- 
the best rules, as in machine learning.
 
- 
So, maybe this is a better approach.
 
- 
It's certainly fast, and thus cheap.
 
- 
And maybe I can construct
the system in such a way
 
- 
that it doesn't have the biases that are inherent
 
- 
in human decision making.
 
- 
And, since I've written these rules down,
 
- 
or a computer has learned these rules,
 
- 
then I can just show them to somebody, right?
 
- 
And then they can audit it.
 
- 
So, more and more decision making is being
 
- 
done in this way.
 
- 
And so, in this model, we take data
 
- 
we make an inference based on that data
 
- 
using these algorithms, and then
 
- 
we can take actions.
 
- 
And, when we take this more scientific approach
 
- 
to making decisions and optimizing for
 
- 
a desired outcome,
we can take an experimental approach
 
- 
so we can determine
which actions are most effective
 
- 
in achieving a desired outcome.
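 
As a rough sketch of what that experimental approach can look like in practice, here is a toy A/B test in Python; the message variants, conversion counts, and the significance test are illustrative assumptions, not anything described in the talk:

    # Toy A/B test: which of two message variants best produces the desired
    # outcome? All counts are invented for illustration.
    from scipy.stats import chi2_contingency

    results = {
        "variant_a": {"converted": 130, "not_converted": 870},
        "variant_b": {"converted": 172, "not_converted": 828},
    }

    table = [[r["converted"], r["not_converted"]] for r in results.values()]
    chi2, p_value, _, _ = chi2_contingency(table)

    rates = {name: r["converted"] / (r["converted"] + r["not_converted"])
             for name, r in results.items()}
    best = max(rates, key=rates.get)
    print(f"conversion rates: {rates}")
    print(f"chi2={chi2:.2f}, p={p_value:.4f} -> keep sending {best} if significant")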
 
- 
Maybe there are some types of communication
 
- 
styles that are most effective
with certain people.
 
- 
I can perhaps deploy some individualized incentives
 
- 
to get the outcome that I desire.
 
- 
And, maybe even if I carefully design an experiment
 
- 
with the environment in which people make
 
- 
these decisions, perhaps even very small changes
 
- 
can introduce significant changes
in peoples' behavior.
 
- 
So, through these mechanisms,
and this experimental approach,
 
- 
I can maximize the probability
that humans do
 
- 
what I want.
 
- 
So, algorithmic decision making is being used
 
- 
in industry, and is used
in lots of other areas,
 
- 
from astrophysics to medicine, and is now
 
- 
moving into new domains, including
 
- 
government applications.
 
- 
So, we have recommendation engines like
Netflix, Yelp, SoundCloud,
 
- 
that direct our attention to what we should
 
- 
watch and listen to.
 
- 
Since 2009, Google has used
personalized search results,
 
- 
even if you're not logged in
to your Google account.
 
- 
And we also have algorithmic curation and filtering,
 
- 
as in the case of Facebook News Feed,
 
- 
Google News, Yahoo News,
 
- 
which show you what news articles, for example,
 
- 
you should be looking at.
 
- 
And this is important, because a lot of people
 
- 
get news from these media.
 
- 
We even have algorithmic journalists!
 
- 
So, automatic systems generate articles
 
- 
about weather, traffic, or sports
 
- 
instead of a human.
 
- 
And, another application that's more recent
 
- 
is the use of predictive systems
 
- 
in political campaigns.
 
- 
So, political campaigns also now take this
 
- 
approach to predict on an individual basis
 
- 
which candidate voters
are likely to vote for.
 
- 
And then they can target,
on an individual basis,
 
- 
those that can be persuaded otherwise.
 
- 
And, finally, in the public sector,
 
- 
we're starting to use predictive systems
 
- 
in areas from policing, to health,
to education and energy.
 
- 
So, there are some advantages to this.
 
- 
So, one thing is that we can automate
 
- 
aspects of our lives
that we consider to be mundane
 
- 
using systems that are intelligent
 
- 
and adaptive enough.
 
- 
We can make use of all the data
 
- 
and really get the pieces of information we
 
- 
really care about.
 
- 
We can spend money in the most effective way,
 
- 
and we can do this with this experimental
 
- 
approach to optimize actions to produce
 
- 
desired outcomes.
 
- 
So, we can embed intelligence
 
- 
into all of these mundane objects
 
- 
and enable them to make decisions for us,
 
- 
and so that's what we're doing more and more,
 
- 
and we can have an object
that decides for us
 
- 
what temperature we should set our house to,
 
- 
what we should be doing, etc.
 
- 
So, there might be some implications here.
 
- 
We want these systems
that do work on this data
 
- 
to increase the opportunities
available to us.
 
- 
But it might be that there are some implications
 
- 
that we have not carefully thought through.
 
- 
This is a new area, and people are only
 
- 
starting to scratch the surface of what the
 
- 
problems might be.
 
- 
In some cases, they might narrow the options
 
- 
available to people,
 
- 
and this approach subjects people to
 
- 
suggestive messaging intended to nudge them
 
- 
to a desired outcome.
 
- 
Some people may have a problem with that.
 
- 
Values we care about are not gonna be
 
- 
baked into these systems by default.
 
- 
It's also the case that some algorithmic systems
 
- 
facilitate work that we do not like.
 
- 
For example, in the case of mass surveillance.
 
- 
And even the same systems,
 
- 
used by different people or organizations,
 
- 
have very different consequences.
 
- 
For example, if I can predict
 
- 
with high accuracy, based on say search queries,
 
- 
who's gonna be admitted to a hospital,
 
- 
some people would be interested
in knowing that.
 
- 
You might be interested
in having your doctor know that.
 
- 
But that same predictive model
in the hands of
 
- 
an insurance company
has a very different implication.
 
- 
So, the point here is that these systems
 
- 
structure and influence how humans interact
 
- 
with each other, how they interact with society,
 
- 
and how they interact with government.
 
- 
And if they constrain what people can do,
 
- 
we should really care about this.
 
- 
So now I'm gonna go to
sort of an extreme case,
 
- 
just as an example, and that's this
Chinese Social Credit System.
 
- 
And so this is probably one of the more
 
- 
ambitious uses of data,
 
- 
that is used to rank each citizen
 
- 
based on their behavior, in China.
 
- 
So right now, there are various pilot systems
 
- 
deployed by various companies doing this in
China.
 
- 
They're currently voluntary, and by 2020
 
- 
one of these systems is gonna be decided on,
 
- 
or a combination of the systems,
 
- 
and it's gonna be mandatory for everyone.
 
- 
And so, in this system, there are some citizens,
 
- 
and a huge range of data sources are used.
 
- 
So, some of the data sources are
 
- 
your financial data,
 
- 
your criminal history,
 
- 
how many points you have
on your driver's license,
 
- 
medical information-- for example,
if you take birth control pills,
 
- 
that's incorporated.
 
- 
Your purchase history-- for example,
if you purchase games,
 
- 
you are down-ranked in the system.
 
- 
Some of the systems, not all of them,
 
- 
incorporate social media monitoring,
 
- 
which makes sense if you're a state like China,
 
- 
you probably want to know about
 
- 
political statements that people
are making on social media.
 
- 
And, one of the more interesting parts is
 
- 
social network analysis:
looking at the relationships between people.
 
- 
So, if you have a close relationship with
somebody
 
- 
and they have a low credit score,
 
- 
that can have implications on your credit
score.
 
- 
So, the way that these scores
are generated is secret.
 
- 
And, according to the call for these systems
 
- 
put out by the government,
 
- 
the goal is to
"carry forward the sincerity and
 
- 
traditional virtues" and
establish the idea of a
 
- 
"sincerity culture."
 
- 
But wait, it gets better:
 
- 
so, there's a portal that enables citizens
 
- 
to look up the citizen score of anyone.
 
- 
And many people like this system,
 
- 
they think it's a fun game.
 
- 
They boast about it on social media,
 
- 
they put their score in their dating profile,
 
- 
because if you're ranked highly you're
 
- 
part of an exclusive club.
 
- 
You can get VIP treatment
at hotels and other companies.
 
- 
But the downside is that, if you're excluded
 
- 
from that club, your weak score
may have other implications,
 
- 
like being unable to get access
to credit, housing, jobs.
 
- 
There is some reporting that even travel visas
 
- 
might be restricted
if your score is particularly low.
 
- 
So, a system like this, for a state, is really
 
- 
the optimal solution
to the problem of the public.
 
- 
It constitutes a very subtle and insidious
 
- 
mechanism of social control.
 
- 
You don't need to spend a lot of money on
 
- 
police or prisons if you can set up a system
 
- 
where people discourage one another from
 
- 
anti-social acts like political action
in exchange for
 
- 
a coupon for a free Uber ride.
 
- 
So, there are a lot of
legitimate questions here:
 
- 
What protections does
user data have in this scheme?
 
- 
Do any safeguards exist to prevent tampering?
 
- 
What mechanism, if any, is there to prevent
 
- 
false input data from creating erroneous inferences?
 
- 
Is there any way that people can fix
 
- 
their score once they're ranked poorly?
 
- 
Or does it end up becoming a
 
- 
self-fulfilling prophecy?
 
- 
Your weak score means you have less access
 
- 
to jobs and credit, and now you will have
 
- 
limited access to opportunity.
 
- 
So, let's take a step back.
 
- 
So, what do we want?
 
- 
So, we probably don't want that,
 
- 
but as advocates we really wanna
 
- 
understand what questions we should be asking
 
- 
of these systems. Right now there's
 
- 
very little oversight,
 
- 
and we wanna make sure that we don't
 
- 
sort of sleepwalk our way to a situation
 
- 
where we've lost even more power
 
- 
to these centralized systems of control.
 
- 
And if you're an implementer, we wanna understand
 
- 
what can we be doing better.
 
- 
Are there better ways that we can be implementing
 
- 
these systems?
 
- 
Are there values that, as humans,
 
- 
we care about that we should make sure
 
- 
these systems have?
 
- 
So, the first thing
that most people in the room
 
- 
might think about is privacy.
 
- 
Which is, of course, of the utmost importance.
 
- 
We need privacy, and there is a good discussion
 
- 
on the importance of protecting
user data where possible.
 
- 
So, in this talk, I'm gonna focus
on the other aspects of
 
- 
algorithmic decision making,
 
- 
that I think have got less attention.
 
- 
Because it's not just privacy
that we need to worry about here.
 
- 
We also want systems that are fair and equitable.
 
- 
We want transparent systems,
 
- 
we don't want opaque decisions
to be made about us,
 
- 
decisions that might have serious impacts
 
- 
on our lives.
 
- 
And we need some accountability mechanisms.
 
- 
So, for the rest of this talk
 
- 
we're gonna go through each one of these things
 
- 
and look at some examples.
 
- 
So, the first thing is fairness.
 
- 
And so, as I said in the beginning,
this is one area
 
- 
where there might be an advantage
 
- 
to making decisions by machine,
 
- 
especially in areas where there have
 
- 
historically been fairness issues with
 
- 
decision making, such as law enforcement.
 
- 
So, this is one way that police departments
 
- 
use predictive models.
 
- 
The idea here is police would like to
 
- 
allocate resources in a more effective way,
 
- 
and they would also like to enable
 
- 
proactive policing.
 
- 
So, if you can predict where crimes
are going to occur,
 
- 
or who is going to commit crimes,
 
- 
then you can put cops in those places,
 
- 
or perhaps following these people,
 
- 
and then the crimes will not occur.
 
- 
So, it's sort of the pre-crime approach.
 
- 
So, there are a few ways of going about this.
 
- 
One way is doing this individual-level prediction.
 
- 
So you take each citizen
and estimate the risk
 
- 
that each citizen will participate,
say, in violence
 
- 
based on some data.
 
- 
And then you can flag those people that are
 
- 
considered particularly violent.
 
- 
So, this is currently done.
 
- 
This is done in the U.S.
 
- 
It's done in Chicago,
by the Chicago Police Department.
 
- 
And they maintain a heat list of individuals
 
- 
that are considered most likely to commit,
 
- 
or be the victim of, violence.
 
- 
And this is done using data
that the police maintain.
 
- 
So, the features that are used
in this predictive model
 
- 
include things that are derived from
 
- 
individuals' criminal history.
 
- 
So, for example, have they been involved in
 
- 
gun violence in the past?
 
- 
Do they have narcotics arrests? And so on.
 
- 
But another thing that's incorporated
 
- 
in the Chicago Police Department model is
 
- 
information derived from
social media network analysis.
 
- 
So, who you interact with,
 
- 
as noted in police data.
 
- 
So, for example, your co-arrestees.
 
- 
When officers conduct field interviews,
 
- 
who are people interacting with?
 
- 
And then this is all incorporated
into this risk score.
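 
Purely to illustrate the shape of such a model, here is a hypothetical risk score combining criminal-history counts with a co-arrest network feature; the features, weights, and records are invented and are not the Chicago Police Department's actual system:

    # Hypothetical individual risk score: criminal-history features plus a
    # co-arrest network feature. All weights and records are invented.
    co_arrests = [("p1", "p2"), ("p1", "p3"), ("p2", "p4")]

    def co_arrest_degree(person, records):
        """Number of distinct co-arrestees recorded for this person."""
        partners = set()
        for a, b in records:
            if person == a:
                partners.add(b)
            elif person == b:
                partners.add(a)
        return len(partners)

    people = {
        "p1": {"gun_arrests": 1, "narcotics_arrests": 2},
        "p2": {"gun_arrests": 0, "narcotics_arrests": 1},
        "p4": {"gun_arrests": 0, "narcotics_arrests": 0},
    }

    WEIGHTS = {"gun_arrests": 3.0, "narcotics_arrests": 1.0, "co_arrest_degree": 0.5}

    def risk_score(person):
        features = dict(people[person])
        features["co_arrest_degree"] = co_arrest_degree(person, co_arrests)
        return sum(WEIGHTS[name] * value for name, value in features.items())

    heat_list = sorted(people, key=risk_score, reverse=True)
    print([(p, risk_score(p)) for p in heat_list])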
 
- 
So another way to proceed,
 
- 
which is the method that most companies
 
- 
that sell products like this
to the police have taken,
 
- 
is instead predicting which areas
 
- 
are likely to have crimes committed in them.
 
- 
So, I take my city, I put a grid down,
 
- 
and then I use crime statistics
 
- 
and maybe some ancillary data sources,
 
- 
to determine which areas have
 
- 
the highest risk of crimes occurring in them,
 
- 
and I can flag those areas and send
 
- 
police officers to them.
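 
A toy version of that grid-based ranking might look like the sketch below; the cell size, the incident coordinates, and the count-based scoring are assumptions for illustration, not any vendor's actual method:

    # Toy geographic hot-spot ranking: bucket past incidents into grid cells
    # and flag the highest-count cells. All coordinates are invented.
    from collections import Counter

    CELL = 150.0  # cell size in meters, roughly the 500ft x 500ft mentioned later

    incidents = [(120, 80), (130, 95), (980, 400), (145, 70), (990, 410), (500, 500)]

    def cell_of(x, y):
        return (int(x // CELL), int(y // CELL))

    counts = Counter(cell_of(x, y) for x, y in incidents)
    flagged = counts.most_common(2)  # send patrols to the top-ranked cells
    print("flagged cells:", flagged)

    # Note the feedback loop discussed below: patrolling flagged cells means
    # more crime gets detected there, which inflates those cells' future counts.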
 
- 
So now, let's look at some of the tools
 
- 
that are used for this geographic-level prediction.
 
- 
So, here are 3 companies that sell these
 
- 
geographic-level predictive policing systems.
 
- 
So, PredPol has a system that uses
 
- 
primarily crime statistics:
 
- 
only the time, place, and type of crime
 
- 
to predict where crimes will occur.
 
- 
HunchLab uses a wider range of data sources
 
- 
including, for example, weather
 
- 
and then Hitachi is a newer system
 
- 
that has a predictive crime analytics tool
 
- 
that also incorporates social media.
 
- 
The first one, to my knowledge, to do so.
 
- 
And these systems are in use
 
- 
in 50+ cities in the U.S.
 
- 
So, why do police departments buy this?
 
- 
Some police departments are interested in
 
- 
buying systems like this, because they're marketed
 
- 
as impartial systems,
 
- 
so it's a way to police in an unbiased way.
 
- 
And so, these companies make
 
- 
statements like this--
 
- 
by the way, the references
will all be at the end,
 
- 
and they'll be on the slides--
 
- 
So, for example
 
- 
the predictive crime analytics from Hitachi
 
- 
claims that the system is anonymous,
 
- 
because it shows you an area,
 
- 
it doesn't tell you
to look for a particular person.
 
- 
and PredPol reassures people that
 
- 
it eliminates any civil liberties or profiling concerns.
 
- 
And HunchLab notes that the system
 
- 
fairly represents priorities for public safety
 
- 
and is unbiased by race
or ethnicity, for example.
 
- 
So, let's take a minute
to describe in more detail
 
- 
what we mean when we talk about fairness.
 
- 
So, when we talk about fairness,
 
- 
we mean a few things.
 
- 
So, one is fairness with respect to individuals:
 
- 
so if I'm very similar to somebody
 
- 
and we go through some process
 
- 
and there are two very different
outcomes to that process
 
- 
we would consider that to be unfair.
 
- 
So, we want similar people to be treated
 
- 
in a similar way.
 
- 
But, there are certain protected attributes
 
- 
that we wouldn't want someone
 
- 
to discriminate based on.
 
- 
And so, there's this other property,
Group Fairness.
 
- 
So, we can look at the statistical parity
 
- 
between groups, based on gender, race, etc.
 
- 
and see if they're treated in a similar way.
 
- 
And we might not expect that in some cases,
 
- 
for example if the base rates in each group
 
- 
are very different.
 
- 
And then there's also Fairness in Errors.
 
- 
All predictive systems are gonna make errors,
 
- 
and if the errors are concentrated,
 
- 
then that may also represent unfairness.
 
- 
And so this concern arose recently with Facebook
 
- 
because people with Native American names
 
- 
had their profiles flagged as fraudulent
 
- 
far more often than those
with White American names.
 
- 
So these are the sorts of things
that we worry about
 
- 
and there are metrics for each of these,
 
- 
and if you're interested more you should
 
- 
check those 2 papers out.
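 
To make those notions measurable, here is a minimal sketch with invented predictions and labels: group fairness as the gap in positive-decision rates (statistical parity), and fairness in errors as the gap in false-positive rates:

    # Toy fairness checks over a model's decisions. All records are invented.
    # Each record: (group, true_label, predicted_label), where 1 means "flagged".
    records = [
        ("A", 0, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
        ("B", 0, 0), ("B", 0, 0), ("B", 1, 1), ("B", 0, 1),
    ]

    def positive_rate(group):
        preds = [p for g, _, p in records if g == group]
        return sum(preds) / len(preds)

    def false_positive_rate(group):
        negatives = [p for g, y, p in records if g == group and y == 0]
        return sum(negatives) / len(negatives)

    for g in ("A", "B"):
        print(g, "positive rate:", positive_rate(g), "FPR:", false_positive_rate(g))

    # statistical parity gap (group fairness) and error-rate gap (fairness in errors)
    print("parity gap:", abs(positive_rate("A") - positive_rate("B")))
    print("FPR gap:", abs(false_positive_rate("A") - false_positive_rate("B")))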
 
- 
So, how can potential issues
with predictive policing
 
- 
have implications for these principles?
 
- 
So, one problem is
the training data that's used.
 
- 
Some of these systems only use crime statistics,
 
- 
other systems-- all of them use crime statistics
 
- 
in some way.
 
- 
So, one problem is that crime databases
 
- 
contain only crimes that've been detected.
 
- 
Right? So, the police are only gonna detect
 
- 
crimes that they know are happening,
 
- 
either through patrol and their own investigation
 
- 
or because they've been alerted to crime,
 
- 
for example by a citizen calling the police.
 
- 
So, a citizen has to feel like
they can call the police,
 
- 
like that's a good idea.
 
- 
So, some crimes suffer
from this problem less than others:
 
- 
for example, gun violence
is much easier to detect
 
- 
relative to fraud, for example,
 
- 
which is very difficult to detect.
 
- 
Now the racial profiling aspect
of this might come in
 
- 
because of biased policing in the past.
 
- 
So, for example, for marijuana arrests,
 
- 
black people are arrested in the U.S. at rates
 
- 
4 times that of white people,
 
- 
even though usage rates show statistical parity
 
- 
between these 2 groups, to within a few percent.
 
- 
So, this is where problems can arise.
 
- 
So, let's go back to this
 
- 
geographic-level predictive policing.
 
- 
So the danger here is that, unless this system
 
- 
is very carefully constructed,
 
- 
this sort of crime area ranking might
 
- 
again become a self-fulfilling prophecy.
 
- 
If you send police officers to these areas,
 
- 
you further scrutinize them,
 
- 
and then again you're only detecting a subset
 
- 
of crimes, and the cycle continues.
 
- 
So, one obvious issue is that
 
- 
this statement about geographic-based
crime prediction
 
- 
being anonymous is not true,
 
- 
because race and location are very strongly
 
- 
correlated in the U.S.
 
- 
And this is something that machine-learning
systems
 
- 
can potentially learn.
 
- 
Another issue is that, for example,
 
- 
for individual fairness, say my home
 
- 
sits within one of these boxes.
 
- 
Some of these boxes
in these systems are very small,
 
- 
for example PredPol's are 500ft x 500ft,
 
- 
so it's maybe only a few houses.
 
- 
So, the implications of this system are that
 
- 
you have police officers maybe sitting
 
- 
in a police cruiser outside your home
 
- 
and a few doors down someone
 
- 
may not be within that box,
 
- 
and doesn't have this.
 
- 
So, that may represent unfairness.
 
- 
So, there are real questions here,
 
- 
especially because there's no opt-out.
 
- 
There's no way to opt-out of this system:
 
- 
if you live in a city that has this,
 
- 
then you have to deal with it.
 
- 
So, it's quite difficult to find out
 
- 
what's really going on
 
- 
because the algorithm is secret.
 
- 
And, in most cases, we don't know
 
- 
the full details of the inputs.
 
- 
We have some idea
about what features are used,
 
- 
but that's about it.
 
- 
We also don't know the output.
 
- 
That would be knowing police allocation,
 
- 
police strategies,
 
- 
and in order to nail down
what's really going on here
 
- 
in order to verify the validity of
 
- 
these companies' claims,
 
- 
it may be necessary
to have a 3rd party come in,
 
- 
examine the inputs and outputs of the system,
 
- 
and say concretely what's going on.
 
- 
And if everything is fine and dandy
 
- 
then this shouldn't be a problem.
 
- 
So, that's potentially one role that
 
- 
advocates can play.
 
- 
Maybe we should start pushing for audits
 
- 
of systems that are used in this way.
 
- 
These could have serious implications
 
- 
for peoples' lives.
 
- 
So, we'll return
to this idea a little bit later,
 
- 
but for now this leads us
nicely to Transparency.
 
- 
So, we wanna know
 
- 
what these systems are doing.
 
- 
But it's very hard,
for the reasons described earlier,
 
- 
but even in the case of something like
 
- 
trying to understand Google's search algorithm,
 
- 
it's difficult because it's personalized.
 
- 
So, by construction, each user is
 
- 
only seeing one endpoint.
 
- 
So, it's a very isolating system.
 
- 
What do other people see?
 
- 
And one reason it's difficult to make
 
- 
some of these systems transparent
 
- 
is because of, simply, the complexity
 
- 
of the algorithms.
 
- 
So, an algorithm can become so complex that
 
- 
it's difficult to comprehend,
 
- 
even for the designer of the system,
 
- 
or the implementer of the system.
 
- 
The designer might know that this algorithm
 
- 
maximizes some metric-- say, accuracy,
 
- 
but they may not always have a solid
 
- 
understanding of what the algorithm is doing
 
- 
for all inputs.
 
- 
Certainly with respect to fairness.
 
- 
So, in some cases,
it might not be appropriate to use
 
- 
an extremely complex model.
 
- 
It might be better to use a simpler system
 
- 
with human-interpretable features.
 
- 
Another issue that arises
 
- 
from the opacity of these systems
 
- 
and the centralized control
 
- 
is that it makes them very influential.
 
- 
And thus, an excellent target
 
- 
for manipulation or tampering.
 
- 
So, this might be tampering that is done
 
- 
from an organization that controls the system,
 
- 
or an insider at one of the organizations,
 
- 
or anyone who's able to compromise their security.
 
- 
So, this is an interesting academic work
 
- 
that looked at the possibility of
 
- 
slightly modifying search rankings
 
- 
to shift people's political views.
 
- 
So, since people are most likely to
 
- 
click on the top search results,
 
- 
and 90% of clicks go to the
first page of search results,
 
- 
then perhaps by reshuffling
things a little bit,
 
- 
or maybe dropping some search results,
 
- 
you can influence people's views
 
- 
in a coherent way,
 
- 
and maybe you can make it so subtle
 
- 
that no one is able to notice.
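 
A toy position-bias model shows why small reorderings matter; the per-rank click-through rates below are invented, but reflect the fact that most clicks go to the top results:

    # Toy position-bias model: how much exposure a favored candidate's pages get
    # under a neutral versus a subtly reordered ranking. CTRs per rank are invented.
    ctr_by_rank = [0.30, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.02, 0.02, 0.02]

    # True marks a result that favors the candidate; both rankings contain the
    # same three favorable results, just at different positions.
    neutral = [True, False, False, True, False, False, True, False, False, False]
    biased  = [True, True, True, False, False, False, False, False, False, False]

    def favorable_exposure(ranking):
        return sum(ctr for ctr, fav in zip(ctr_by_rank, ranking) if fav)

    n, b = favorable_exposure(neutral), favorable_exposure(biased)
    print(f"favorable exposure: neutral={n:.2f}, biased={b:.2f} ({b / n:.2f}x)")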
 
- 
So in this academic study,
 
- 
they did an experiment
 
- 
in the 2014 Indian election.
 
- 
So they used real voters,
 
- 
and they kept the size
of the experiment small enough
 
- 
that it was not going to influence the outcome
 
- 
of the election.
 
- 
So the researchers took people,
 
- 
they determined their political leaning,
 
- 
and they segmented them into
control and treatment groups,
 
- 
where the treatment was manipulation
 
- 
of the search ranking results,
 
- 
And then they had these people
browse the web.
 
- 
And what they found, is that
 
- 
this mechanism is very effective at shifting
 
- 
people's voter preferences.
 
- 
So, in this study, they were able to introduce
 
- 
a 20% shift in voter preferences.
 
- 
Even alerting users to the fact that this
 
- 
was going to be done, telling them
 
- 
"we are going to manipulate your search results,"
 
- 
"really pay attention,"
 
- 
did not decrease
 
- 
the magnitude of the effect.
 
- 
So, the margins of error in many elections
 
- 
are incredibly small,
 
- 
and the authors estimate that this shift
 
- 
could change the outcome of about
 
- 
25% of elections worldwide, if this were done.
 
- 
And the bias is so small that no one can tell.
 
- 
So, all humans, no matter how smart
 
- 
and resistant to manipulation
we think we are,
 
- 
all of us are subject to this sort of manipulation,
 
- 
and we really can't tell.
 
- 
So, I'm not saying that this is occurring,
 
- 
but right now there is no
regulation to stop this,
 
- 
there is no way we could reliably detect this,
 
- 
so there's a huge amount of power here.
 
- 
So, something to think about.
 
- 
But it's not only corporations that are interested
 
- 
in this sort of behavioral manipulation.
 
- 
In 2010, UK Prime Minister David Cameron
 
- 
created this UK Behavioural Insights Team,
 
- 
which is informally called the Nudge Unit.
 
- 
And so what they do is
they use behavioral science
 
- 
and this predictive analytics approach,
 
- 
with experimentation,
 
- 
to have people make better decisions
 
- 
for themselves and society--
 
- 
as determined by the UK government.
 
- 
And as of a few months ago,
 
- 
after an executive order signed by Obama
 
- 
in September, the United States now has
 
- 
its own Nudge Unit.
 
- 
So, to be clear, I don't think that this is
 
- 
some sort of malicious plot.
 
- 
I think that there can be huge value
 
- 
in these sorts of initiatives,
 
- 
positively impacting people's lives,
 
- 
but when this sort of behavioral manipulation
 
- 
is being done, in part openly,
 
- 
oversight is pretty important,
 
- 
and we really need to consider
 
- 
what these systems are optimizing for.
 
- 
And that's something that we might
 
- 
not always know, or at least understand,
 
- 
so for example, for industry,
 
- 
we do have a pretty good understanding there:
 
- 
industry cares about optimizing for
 
- 
the time spent on the website,
 
- 
Facebook wants you to spend more time on Facebook,
 
- 
they want you to click on ads,
 
- 
click on newsfeed items,
 
- 
they want you to like things.
 
- 
And, fundamentally: profit.
 
- 
So, already this has some serious implications,
 
- 
and this had pretty serious implications
 
- 
in the last 10 years, in media for example.
 
- 
Optimizing for click-through rate in journalism
 
- 
has produced a race to the bottom
 
- 
in terms of quality.
 
- 
And another issue is that optimizing
 
- 
for what people like might not always be
 
- 
the best approach.
 
- 
So, Facebook officials have spoken publicly
 
- 
about how Facebook's goal is to make you happy,
 
- 
they want you to open that newsfeed
 
- 
and just feel great.
 
- 
But, there's an issue there, right?
 
- 
Because a lot of people,
 
- 
like 40% of people according to Pew Research,
 
- 
get their news from Facebook.
 
- 
So, if people don't want to see
 
- 
war and corpses,
because it makes them feel sad,
 
- 
then this is not a system that is gonna optimize
 
- 
for an informed population.
 
- 
It's not gonna produce a population that is
 
- 
ready to engage in civic life.
 
- 
It's gonna produce an amused population
 
- 
whose time is occupied by cat pictures.
 
- 
So, in politics, we have a similar
 
- 
optimization problem that's occurring.
 
- 
So, these political campaigns that use
 
- 
these predictive systems,
 
- 
are optimizing for votes for the desired candidate,
 
- 
of course.
 
- 
So, instead of a political campaign being
 
- 
--well, maybe this is a naive view, but--
 
- 
being an open discussion of the issues
 
- 
facing the country,
 
- 
it becomes this micro-targeted
persuasion game,
 
- 
and the people that get targeted
 
- 
are a very small subset of all people,
 
- 
and it's only gonna be people that are
 
- 
you know, on the edge, maybe disinterested,
 
- 
those are the people that are gonna get attention
 
- 
from political candidates.
 
- 
In policy, as with these Nudge Units,
 
- 
they're being used to enable
 
- 
better use of government services.
 
- 
There are some good projects that have
 
- 
come out of this:
 
- 
increasing voter registration,
 
- 
improving health outcomes,
 
- 
improving education outcomes.
 
- 
But some of these predictive systems
 
- 
that we're starting to see in government
 
- 
are optimizing for compliance,
 
- 
as is the case with predictive policing.
 
- 
So this is something that we need to
 
- 
watch carefully.
 
- 
I think this is a nice quote that
 
- 
sort of describes the problem.
 
- 
In some ways we might be narrowing
 
- 
our horizon, and the danger is that
 
- 
these tools are separating people.
 
- 
And this is particularly bad
 
- 
for political action, because political action
 
- 
requires people to have shared experience,
 
- 
so that they can collectively act
 
- 
to exert pressure to fix problems.
 
- 
So, finally: accountability.
 
- 
So, we need some oversight mechanisms.
 
- 
For example, in the case of errors--
 
- 
so this is particularly important for
 
- 
civil or bureaucratic systems.
 
- 
So, when an algorithm produces some decision,
 
- 
we don't always want humans to just
 
- 
defer to the machine,
 
- 
and that might represent one of the problems.
 
- 
So, there are starting to be some cases
 
- 
of computer algorithms yielding a decision,
 
- 
and then humans being unable to correct
 
- 
an obvious error.
 
- 
So there's this case in Georgia,
in the United States,
 
- 
where 2 young people went to
 
- 
the Department of Motor Vehicles,
 
- 
they're twins, and they went
 
- 
to get their driver's license.
 
- 
However, they were both flagged by
 
- 
a fraud algorithm that uses facial recognition
 
- 
to look for similar faces,
 
- 
and I guess the people that designed the system
 
- 
didn't think of the possibility of twins.
 
- 
Yeah.
So, they just left
 
- 
without their driver's licenses.
 
- 
The people in the Department of Motor Vehicles
 
- 
were unable to correct this.
 
- 
So, this is one implication--
 
- 
it's like something out of Kafka.
 
- 
But there are also cases of errors being made,
 
- 
and people not noticing until
 
- 
after actions have been taken,
 
- 
some of them very serious--
 
- 
because people simply deferred
 
- 
to the machine.
 
- 
So, this is an example from San Francisco.
 
- 
So, an ALPR-- an Automated License Plate Reader--
 
- 
is a device that uses image recognition
 
- 
to detect and read license plates,
 
- 
and usually to compare license plates
 
- 
with a known list of plates of interest.
 
- 
And, so, San Francisco uses these
 
- 
and they're mounted on police cars.
 
- 
So, in this case, San Francisco ALPR
 
- 
got a hit on a car,
 
- 
and it was the car of a 47-year-old woman,
 
- 
with no criminal history.
 
- 
And so it was a false hit
 
- 
because it was a blurry image,
 
- 
and it matched erroneously with
 
- 
one of the plates of interest
 
- 
that happened to be a stolen vehicle.
 
- 
So, they conducted a traffic stop on her,
 
- 
and they take her out of the vehicle,
 
- 
they search her and the vehicle,
 
- 
she gets a pat-down,
 
- 
and they have her kneel
 
- 
at gunpoint, in the street.
 
- 
So, how much oversight should be present
 
- 
depends on the implications of the system.
 
- 
It's certainly the case that
 
- 
for some of these decision-making systems,
 
- 
an error might not be that important,
 
- 
it could be relatively harmless,
 
- 
but in this case,
an error in this algorithmic decision
 
- 
led to this totally innocent person
 
- 
literally having a gun pointed at her.
 
- 
So, that brings us to: we need some way of
 
- 
getting some information about
 
- 
what is going on here.
 
- 
We don't wanna have to wait for these events
 
- 
before we are able to determine
 
- 
some information about the system.
 
- 
So, auditing is one option:
 
- 
to independently verify the statements
 
- 
of companies, in situations where we have
 
- 
inputs and outputs.
 
- 
So, for example, this could be done with
 
- 
Google, Facebook.
 
- 
If you have the inputs of a system,
 
- 
say you have test accounts,
 
- 
or real accounts,
 
- 
maybe you can collect
people's information together.
 
- 
So that was something that was done
 
- 
during the 2012 Obama campaign
 
- 
by ProPublica.
 
- 
People noticed that they were getting
 
- 
different emails from the Obama campaign,
 
- 
and were interested to see
 
- 
based on what factors
 
- 
the emails were changing.
 
- 
So, I think about 200 people submitted emails
 
- 
and they were able to determine some information
 
- 
about what the emails
were being varied based on.
 
- 
So there have been some successful
 
- 
attempts at this.
 
- 
So, compare inputs and then look at
 
- 
why one item was shown to one user
 
- 
and not another, and see if there's
 
- 
any statistical differences.
 
- 
So, there's some potential legal issues
 
- 
with the test accounts, so that's something
 
- 
to think about-- I'm not a lawyer.
 
- 
So, for example, if you wanna examine
 
- 
ad-targeting algorithms,
 
- 
one way to proceed is to construct
 
- 
a browsing profile, and then examine
 
- 
what ads are served back to you.
 
- 
And so this is something that
 
- 
academic researchers have looked at,
 
- 
because, at the time at least,
 
- 
you didn't need to make an account to do this.
 
- 
So, this was a study that was presented at
 
- 
Privacy Enhancing Technologies last year,
 
- 
and in this study, the researchers
 
- 
generate some browsing profiles
 
- 
that differ only by one characteristic,
 
- 
so they're basically identical in every way
 
- 
except for one thing.
 
- 
And that is denoted by Treatment 1 and 2.
 
- 
So this is a randomized, controlled trial,
 
- 
but I left out the randomization part
 
- 
for simplicity.
 
- 
So, in one study,
they applied a treatment of gender.
 
- 
So, they had the browsing profiles
 
- 
in Treatment 1 be male browsing profiles,
 
- 
and the browsing profiles in Treatment 2
be female.
 
- 
And they wanted to see: is there any difference
 
- 
in the way that ads are targeted
 
- 
if browsing profiles are effectively identical
 
- 
except for gender?
 
- 
So, it turns out that there was.
 
- 
So, a 3rd-party site was showing Google ads
 
- 
for senior executive positions
 
- 
at a rate 6 times higher to the fake men
 
- 
than to the fake women in this study.
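 
The statistical comparison at the core of such an audit can be sketched as a two-proportion test; the impression counts below are invented and are not the study's actual numbers:

    # Sketch of an audit comparison: was an ad shown at different rates to two
    # otherwise-identical treatment groups? All counts are invented.
    from math import sqrt
    from scipy.stats import norm

    shown_t1, total_t1 = 180, 1000  # profiles in Treatment 1 that saw the ad
    shown_t2, total_t2 = 30, 1000   # profiles in Treatment 2 that saw the ad

    p1, p2 = shown_t1 / total_t1, shown_t2 / total_t2
    p_pool = (shown_t1 + shown_t2) / (total_t1 + total_t2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / total_t1 + 1 / total_t2))

    z = (p1 - p2) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided two-proportion z-test

    print(f"rates: {p1:.3f} vs {p2:.3f}, z={z:.2f}, p={p_value:.2e}")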
 
- 
So, this sort of auditing is not going to
 
- 
be able to determine everything
 
- 
that algorithms are doing, but it can
 
- 
sometimes uncover interesting,
 
- 
at least statistical differences.
 
- 
So, this leads us to the fundamental issue:
 
- 
Right now, we're really not in control
 
- 
of some of these systems,
 
- 
and we really need these predictive systems
 
- 
to be controlled by us,
 
- 
in order for them not to be used
 
- 
as a system of control.
 
- 
So there are some technologies that I'd like
 
- 
to point you all to.
 
- 
We need tools in the digital commons
 
- 
that can help address some of these concerns.
 
- 
So, the first thing is that of course
 
- 
we know that minimizing the amount of
 
- 
data available can help in some contexts,
 
- 
which we can do by making systems
 
- 
that are private by design, and by default.
 
- 
Another thing is that these audit tools
 
- 
might be useful.
 
- 
And, so, these 2 nice examples in academia...
 
- 
the ad experiment that I just showed was done
 
- 
using AdFisher.
 
- 
So, these are 2 toolkits that you can use
 
- 
to start doing this sort of auditing.
 
- 
Another technology that is generally useful,
 
- 
but particularly so in the case of prediction,
 
- 
is anonymity: maintaining access to
 
- 
as many sites as possible,
 
- 
through anonymity systems like Tor,
 
- 
because it's impossible to personalize
 
- 
when everyone looks the same.
 
- 
So this is a very important technology.
 
- 
Something that doesn't really exist,
 
- 
but that I think is pretty important,
 
- 
is having some tool to view the landscape.
 
- 
So, as we know from these few studies
 
- 
that have been done,
 
- 
different people are not seeing the internet
 
- 
in the same way.
 
- 
This is one reason why we don't like censorship.
 
- 
But, for example,
 
- 
from academic research we know that
 
- 
there is widespread price discrimination
on the internet,
 
- 
so rich and poor people see a different view
 
- 
of the Internet,
 
- 
men and women see a different view
 
- 
of the Internet.
 
- 
We wanna know how different people
 
- 
see the same site,
 
- 
and this could be the beginning of
 
- 
a defense system for this sort of
 
- 
manipulation/tampering that I showed earlier.
 
- 
Another interesting approach is obfuscation:
 
- 
injecting noise into the system.
 
- 
So there's an interesting browser extension
 
- 
called AdNauseam, that's for Firefox,
 
- 
which clicks on every single ad you're served,
 
- 
to inject noise.
 
- 
So that's, I think, an interesting approach
 
- 
that people haven't looked at too much.
 
- 
So in terms of policy,
 
- 
Facebook and Google, these internet giants,
 
- 
have billions of users,
 
- 
and sometimes they like to call themselves
 
- 
new public utilities,
 
- 
and if that's the case then
 
- 
it might be necessary to subject them
 
- 
to additional regulation.
 
- 
Another problem that's come up,
 
- 
for example with some of the studies
 
- 
that Facebook has done,
 
- 
is sometimes a lack of ethics review.
 
- 
So, for example, in academia,
 
- 
if you're gonna do research involving humans,
 
- 
there's an Institutional Review Board
 
- 
that you go to that verifies that
 
- 
you're doing things in an ethical manner.
 
- 
And some companies do have internal
 
- 
review processes like this, but it might
 
- 
be important to have an independent
 
- 
ethics board that does this sort of thing.
 
- 
And we really need 3rd-party auditing.
 
- 
So, for example, some companies
 
- 
don't want auditing to be done
 
- 
because of IP concerns,
 
- 
and if that's the concern
 
- 
maybe having a set of people
 
- 
that are not paid by the company
 
- 
to check how some of these systems
 
- 
are being implemented,
 
- 
could help give us confidence that
 
- 
things are being done in a reasonable way.
 
- 
So, in closing,
 
- 
algorithmic decision making is here,
 
- 
and it's barreling forward
at a very fast rate,
 
- 
and we need to figure out what
 
- 
the guide rails should be,
 
- 
and how to install them
 
- 
to handle some of the potential threats.
 
- 
There's a huge amount of power here.
 
- 
We need more openness in these systems.
 
- 
And, right now,
 
- 
with the intelligent systems that do exist,
 
- 
we don't know what's occurring really,
 
- 
and we need to watch carefully
 
- 
where and how these systems are being used.
 
- 
And I think this community has
 
- 
an important role to play in this fight,
 
- 
to study what's being done,
 
- 
to show people what's being done,
 
- 
to raise the debate and advocate,
 
- 
and, where necessary, to resist.
 
- 
Thanks.
 
- 
applause
 
- 
Herald: So, let's have a question and answer.
 
- 
Microphone 2, please.
 
- 
Mic 2: Hi there.
 
- 
Thanks for the talk.
 
- 
Since this pre-crime software has also
 
- 
arrived here in Germany
 
- 
with the start of the so-called CopWatch system
 
- 
in southern Germany,
in Bavaria and Nuremberg especially,
 
- 
where they try to predict burglary crime
 
- 
using criminal records and
 
- 
geographical analysis, like you explained,
 
- 
this leads me to a 2-fold question:
 
- 
first, have you heard of any research
 
- 
that measures the effectiveness
 
- 
of such measures, at all?
 
- 
And, second:
 
- 
What do you think of the game theory
 
- 
if the thieves or the bad guys
 
- 
know the system, and when they
game the system,
 
- 
they will probably win,
 
- 
since one police officer in an interview said
 
- 
this system is used to reduce
 
- 
the personnel costs of policing,
 
- 
so they just send the guys
where the red flags are,
 
- 
and the others take the day off.
 
- 
Dr. Helsby: Yup.
 
- 
Um, so, with respect to
 
- 
testing the effectiveness of predictive policing,
 
- 
the companies,
 
- 
some of them do randomized, controlled trials
 
- 
and claim a reduction in crime.
 
- 
The best independent study that I've seen
 
- 
is by this RAND Corporation
 
- 
that did a study in, I think,
 
- 
Shreveport, Louisiana,
 
- 
and in their report they claim
 
- 
that there was no statistically significant
 
- 
difference, they didn't find any reduction.
 
- 
And it was specifically looking at
 
- 
property crime, which I think you mentioned.
 
- 
So, I think right now there's sort of
 
- 
conflicting reports between
 
- 
the independent auditors
and these company claims.
 
- 
So there definitely needs to be more study.
 
- 
And then, the 2nd thing...sorry,
remind me what it was?
 
- 
Mic 2: What about the guys gaming the system?
 
- 
Dr. Helsby: Oh, yeah.
 
- 
I think it's a legitimate concern.
 
- 
Like, if all the outputs
were just immediately public,
 
- 
then, yes, everyone knows the location
 
- 
of all police officers,
 
- 
and I imagine that people would have
 
- 
a problem with that.
 
- 
Yup.
 
- 
Herald: Microphone #4, please.
 
- 
Mic 4: Yeah, this is not actually a question,
 
- 
but just a comment.
 
- 
I've enjoyed your talk very much,
 
- 
in particular after watching
 
- 
the talk in Hall 1 earlier in the afternoon.
 
- 
The "Say Hi to Your New Boss", about
 
- 
algorithms that are trained with big data,
 
- 
and finally make decisions.
 
- 
And I think these 2 talks are kind of complementary,
 
- 
and if people are interested in the topic
 
- 
they might want to check out the other talk
 
- 
and watch it later, because these
 
- 
fit very well together.
 
- 
Dr. Helsby: Yeah, it was a great talk.
 
- 
Herald: Microphone #2, please.
 
- 
Mic 2: Um, yeah, you mentioned
 
- 
the need to have some kind of 3rd-party auditing
 
- 
or some kind of way to
 
- 
peek into these algorithms
 
- 
and to see what they're doing,
 
- 
and to see if they're being fair.
 
- 
Can you talk a little bit more about that?
 
- 
Like, going forward,
 
- 
some kind of regulatory structures
 
- 
would probably have to emerge
 
- 
to analyze and to look at
 
- 
these black boxes that are just sort of
 
- 
popping up everywhere and, you know,
 
- 
controlling more and more of the things
 
- 
in our lives, and important decisions.
 
- 
So, just, what kind of discussions
 
- 
are there for that?
 
- 
And what kind of possibility
is there for that?
 
- 
And, I'm sure that companies would be
 
- 
very, very resistant to
 
- 
any kind of attempt to look into
 
- 
algorithms, and to...
 
- 
Dr. Helsby: Yeah, I mean, definitely
 
- 
companies would be very resistant to
 
- 
having people look into their algorithms.
 
- 
So, if you wanna do a very rigorous
 
- 
audit of what's going on
 
- 
then it's probably necessary to have
 
- 
a few people come in
 
- 
and sign NDAs, and then
 
- 
look through the systems.
 
- 
So, that's one way to proceed.
 
- 
But, another way to proceed that--
 
- 
so, these academic researchers have done
 
- 
a few experiments
 
- 
and found some interesting things,
 
- 
and that's sort of all the attempts at auditing
 
- 
that we've seen:
 
- 
there was 1 attempt in 2012
for the Obama campaign,
 
- 
but there's really not been any
 
- 
sort of systematic attempt--
 
- 
you know, like, in censorship
 
- 
we see a systematic attempt to
 
- 
do measurement as often as possible,
 
- 
check what's going on,
 
- 
and that itself, you know,
 
- 
can act as an oversight mechanism.
 
- 
But, right now,
 
- 
I think many of these companies
 
- 
realize no one is watching,
 
- 
so there's no real push to have
 
- 
people verify: are you being fair when you
 
- 
implement this system?
 
- 
Because no one's really checking.
 
- 
Mic 2: Do you think that,
 
- 
at some point, it would be like
 
- 
an FDA or SEC, to give some American examples...
 
- 
an actual government regulatory agency
 
- 
that has the power and ability to
 
- 
not just sort of look and try to
 
- 
reverse engineer some of these algorithms,
 
- 
but actually peek in there and make sure
 
- 
that things are fair, because it seems like
 
- 
there's just-- it's so important now
 
- 
that, again, it could be the difference between
 
- 
life and death, between
 
- 
getting a job, not getting a job,
 
- 
being pulled over,
not being pulled over,
 
- 
being racially profiled,
not racially profiled,
 
- 
things like that.
Dr. Helsby: Right.
 
- 
Mic 2: Is it moving in that direction?
 
- 
Or is it way too early for it?
 
- 
Dr. Helsby: I mean, so some people have...
 
- 
someone has called for, like,
 
- 
a Federal Search Commission,
 
- 
or like a Federal Algorithms Commission,
 
- 
that would do this sort of oversight work,
 
- 
but it's in such early stages right now
 
- 
that there's no real push for that.
 
- 
But I think it's a good idea.
 
- 
Herald: And again, #2 please.
 
- 
Mic 2: Thank you again for your talk.
 
- 
I was just curious if you can point
 
- 
to any examples of
 
- 
either current producers or consumers
 
- 
of these algorithmic systems
 
- 
who are actively and publicly trying
 
- 
to do so in a responsible manner
 
- 
by describing what they're trying to do
 
- 
and how they're going about it?
 
- 
Dr. Helsby: So, yeah, there are some companies,
 
- 
for example, like DataKind,
 
- 
that try to deploy algorithmic systems
 
- 
in as responsible a way as possible,
 
- 
for like public policy.
 
- 
Like, I actually also implement systems
 
- 
for public policy in a transparent way.
 
- 
Like, all the code is in GitHub, etc.
 
- 
And it is also the case to give credit to
 
- 
Google, and these giants,
 
- 
they're trying to implement transparency systems
 
- 
that help you understand.
 
- 
This has been done with respect to
 
- 
how your data is being collected,
 
- 
but for example if you go on Amazon.com
 
- 
you can see a recommendation has been made,
 
- 
and that is pretty transparent.
 
- 
You can see "this item
was recommended to me,"
 
- 
so you know that prediction
is being used in this case,
 
- 
and it will say why prediction is being used:
 
- 
because you purchased some item.
 
- 
And Google has a similar thing,
 
- 
if you go to like Google Ad Settings,
 
- 
you can even turn off personalization of ads
 
- 
if you want,
 
- 
and you can also see some of the inferences
 
- 
that have been learned about you.
 
- 
A subset of the inferences that have been
 
- 
learned about you.
 
- 
So, like, what interests...
 
- 
Herald: A question from the internet, please?
 
- 
Signal Angel: Yes, billetQ is asking
 
- 
how do you avoid biases in machine learning?
 
- 
I assume an analysis system, for example,
 
- 
could be biased against women and minorities,
 
- 
if used for hiring decisions
based on known data.
 
- 
Dr. Helsby: Yeah, so one thing is to
 
- 
just explicitly check.
 
- 
So, you can check to see how
 
- 
positive outcomes are being distributed
 
- 
among those protected classes.
 
- 
You could also incorporate these sorts of
 
- 
fairness constraints in the function
 
- 
that you optimize when you train the system,
 
- 
and so, if you're interested in reading more
 
- 
about this, the 2 papers--
 
- 
let me go to References--
 
- 
there's a good paper called
 
- 
Fairness Through Awareness that describes
 
- 
how to go about doing this,
 
- 
so I recommend this person read that.
 
- 
It's good.
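 
One minimal sketch of that second idea, adding a fairness penalty to the training objective, is a logistic regression whose loss includes a demographic-parity term; the data, penalty weight, and learning rate below are all invented for illustration:

    # Sketch: logistic regression trained with an added demographic-parity penalty.
    # All data and hyperparameters are invented for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 500, 3
    X = rng.normal(size=(n, d))
    group = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)  # protected attribute
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(float)

    w = np.zeros(d)
    lam, lr = 2.0, 0.1  # fairness penalty weight, learning rate

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(300):
        p = sigmoid(X @ w)
        grad_loss = X.T @ (p - y) / n                      # logistic-loss gradient
        gap = p[group == 1].mean() - p[group == 0].mean()  # demographic-parity gap
        s = p * (1 - p)                                    # sigmoid derivative
        grad_gap = (X[group == 1] * s[group == 1][:, None]).mean(axis=0) \
                 - (X[group == 0] * s[group == 0][:, None]).mean(axis=0)
        w -= lr * (grad_loss + lam * np.sign(gap) * grad_gap)

    p = sigmoid(X @ w)
    print("parity gap after training:",
          abs(p[group == 1].mean() - p[group == 0].mean()))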
 
- 
Herald: Microphone 2, please.
 
- 
Mic2: Thanks again for your talk.
 
- 
Umm, hello?
 
- 
Okay.
 
- 
Umm, I see of course a problem with
 
- 
all the black boxes that you describe
 
- 
with regards for the crime systems,
 
- 
but when we look at the advertising systems
 
- 
in many cases they are very networked.
 
- 
There are many different systems collaborating
 
- 
and exchanging data via open APIs:
 
- 
RESTful APIs, and various
 
- 
demand-side platforms
and audience-exchange platforms,
 
- 
and everything.
 
- 
So, can that help to at least
 
- 
increase awareness on where targeting, personalization
 
- 
might be happening?
 
- 
I mean, I'm looking at systems like
 
- 
BuiltWith, that surface what kind of
 
- 
JavaScript libraries are used elsewhere.
 
- 
So, is that something that could help
 
- 
at least to give a better awareness
 
- 
and listing all the points where
 
- 
you might be targeted...
 
- 
Dr. Helsby: So, like, with respect to
 
- 
advertising, the fact that
there is behind the scenes
 
- 
this like complicated auction process
 
- 
that's occurring, just makes things
 
- 
a lot more complicated.
 
- 
So, for example, I said briefly
 
- 
that they found that there's this
statistical difference
 
- 
between how men and women are treated,
 
- 
but it doesn't necessarily mean that
 
- 
"Oh, the algorithm is definitely biased."
 
- 
It could be because of this auction process,
 
- 
it could be that women are considered
 
- 
more valuable when it comes to advertising,
 
- 
and so these executive ads are getting
 
- 
outbid by some other ads,
 
- 
and so there's a lot of potential
 
- 
causes for that.
 
- 
So, I think it just makes things
a lot more complicated.
 
- 
I don't know if it helps
with the bias at all.
 
- 
Mic 2: Well, the question was more
 
- 
a direction... can it help to surface
 
- 
and make people aware of that fact?
 
- 
I mean, I can talk to my kids probably,
 
- 
and they will probably understand,
 
- 
but I can't explain that to my grandma,
 
- 
who's also, umm, looking at an iPad.
 
- 
Dr. Helsby: So, the fact that
 
- 
the systems are...
 
- 
I don't know if I understand.
 
- 
Mic 2: OK. I think that the main problem
 
- 
is that we are behind the industry's efforts
 
- 
to target us, and many people
 
- 
do know, but a lot more people don't know,
 
- 
and making them aware of the fact
 
- 
that they are a target, in a way,
 
- 
is something that can only be shown
 
- 
by a 3rd party that surfaces that data,
 
- 
and makes audits in a way--
 
- 
maybe in an automated way.
 
- 
Dr. Helsby: Right.
 
- 
Yeah, I think it certainly
could help with advocacy
 
- 
if that's the point, yeah.
 
- 
Herald: Another question
from the internet, please.
 
- 
Signal Angel: Yes, on IRC they are asking
 
- 
if we know that prediction in some cases
 
- 
provides an influence that cannot be controlled.
 
- 
So, r4v5 would like to know from you
 
- 
if there are some cases or areas where
 
- 
machine learning simply shouldn't go?
 
- 
Dr. Helsby: Umm, so I think...
 
- 
I mean, yes, I think that it is the case
 
- 
that in some cases machine learning
 
- 
might not be appropriate.
 
- 
For example, if you use machine learning
 
- 
to decide who should be searched.
 
- 
I don't think it should be the case that
 
- 
machine learning algorithms should
 
- 
ever be used to determine
 
- 
probable cause, or something like that.
 
- 
So, if it's just one piece of evidence
 
- 
that you consider,
 
- 
and there's human oversight always,
 
- 
maybe it's fine, but
 
- 
we should be very suspicious and hesitant
 
- 
in certain contexts where
 
- 
the ramifications are very serious.
 
- 
Like the No Fly List, and so on.
 
- 
Herald: And #2 again.
 
- 
Mic 2: A second question
 
- 
that just occurred to me, if you don't mind.
 
- 
Umm, until the advent of
 
- 
algorithmic systems,
 
- 
when there've been cases of serious harm
 
- 
being done to individuals or groups,
 
- 
and it's been demonstrated that
 
- 
it's occurred because of
 
- 
an individual or a system of people
 
- 
being systematically biased, then often
 
- 
one of the actions that's taken is
 
- 
pressure's applied, and then
 
- 
people are required to change,
 
- 
and hopefully be held responsible,
 
- 
and then change the way that they do things
 
- 
to try to remove bias from that system.
 
- 
What's the current thinking about
 
- 
how we can go about doing that
 
- 
when the systems that are doing that
 
- 
are algorithmic?
 
- 
Is it just going to be human oversight,
 
- 
and humans are gonna have to be
 
- 
held responsible for the oversight?
 
- 
Dr. Helsby: So, in terms of bias,
 
- 
if we're concerned about bias towards
 
- 
particular types of people,
 
- 
that's something that we can optimize for.
 
- 
So, we can train systems that are unbiased
 
- 
in this way.
 
- 
So that's one way to deal with it.
 
- 
But there's always gonna be errors,
 
- 
so that's sort of a separate issue
 
- 
from the bias, and in the case
 
- 
where there are errors,
 
- 
there must be oversight.
 
- 
So, one way that one could improve
 
- 
the way that this is done
 
- 
is by making sure that you're
 
- 
keeping track of confidence of decisions.
 
- 
So, if you have a low confidence prediction,
 
- 
then maybe a human
should come in and check things.
 
- 
So, that might be one way to proceed.
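 
A minimal sketch of that kind of confidence gate (the model, data, and threshold are placeholders, using scikit-learn only as an example of a model that exposes per-class probabilities):

    # Sketch: act automatically only on high-confidence predictions and route
    # the rest to a human reviewer. Data, model, and threshold are illustrative.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X_train = rng.normal(size=(200, 2))
    y_train = (X_train[:, 0] > 0).astype(int)
    model = LogisticRegression().fit(X_train, y_train)

    THRESHOLD = 0.9  # act automatically only above this confidence

    X_new = rng.normal(size=(5, 2))
    proba = model.predict_proba(X_new)  # per-class probabilities
    confidence = proba.max(axis=1)
    decision = proba.argmax(axis=1)

    for conf, dec in zip(confidence, decision):
        if conf >= THRESHOLD:
            print(f"auto-decision {dec} (confidence {conf:.2f})")
        else:
            print(f"confidence {conf:.2f} too low -> send to human review")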
 
- 
Herald: So, there are no more questions.
 
- 
I close this talk now,
 
- 
and thank you very much
 
- 
and a big applause to
 
- 
Jennifer Helsby!
 
- 
roaring applause
 
- 
subtitles created by c3subtitles.de
Join, and help us!