-
Music
-
Herald: How many of you are using Facebook? Twitter?
Diaspora?
-
concerned noise And all of that data
you enter there
-
gets to a server, gets into the hands of somebody
who's using it
-
and the next talk
is about exactly that,
-
because there are also intelligent machines
and intelligent algorithms
-
that try to make something
out of that data.
-
So the post-doc researcher Jennifer Helsby
-
of the University of Chicago,
who works at this
-
intersection between policy and
technology,
-
will now ask you the question:
To whom would we give that power?
-
Dr. Helsby: Thanks.
applause
-
Okay, so, today I'm gonna do a brief tour
of intelligent systems
-
and how they're currently used
-
and then we're gonna look at some examples
with respect
-
to the properties that we might care about
-
these systems having,
and I'll talk a little bit about
-
some of the work that's been done in academia
-
on these topics.
-
And then we'll talk about some
promising paths forward.
-
So, I wanna start with this:
Kranzberg's First Law of Technology
-
So, it's not good or bad,
but it also isn't neutral.
-
Technology shapes our world,
and it can act as
-
a liberating force-- or an oppressive and
controlling force.
-
So, in this talk, I'm gonna focus
on some of the aspects
-
of intelligent systems that might be more
controlling in nature.
-
So, as we all know,
-
because of the rapidly decreasing cost
of storage and computation,
-
along with the rise of new sensor technologies,
-
data collection devices
are being pushed into every
-
aspect of our lives: in our homes, our cars,
-
in our pockets, on our wrists.
-
And data collection systems act as intermediaries
-
for a huge amount of human communication.
-
And much of this data sits in government
-
and corporate databases.
-
So, in order to make use of this data,
-
we need to be able to make some inferences.
-
So, one way of approaching this is I can hire
-
a lot of humans, and I can have these humans
-
manually examine the data, and they can acquire
-
expert knowledge of the domain, and then
-
perhaps they can make some decisions
-
or at least some recommendations
based on it.
-
However, there's some problems with this.
-
One is that it's slow, and thus expensive.
-
It's also biased. We know that humans have
-
all sorts of biases, both conscious and unconscious,
-
and it would be nice to have a system
that did not have
-
these inaccuracies.
-
It's also not very transparent: I might
-
not really know the factors that led to
-
some decisions being made.
-
Even humans themselves
often don't really understand
-
why they came to a given decision, because
-
decisions are often emotional in nature.
-
And, thus, these human decision making systems
-
are often difficult to audit.
-
So, another way to proceed is maybe instead
-
I study the system and the data carefully
-
and I write down the best rules
for making a decision
-
or, I can have a machine
dynamically figure out
-
the best rules, as in machine learning.
-
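As a minimal illustration of what "having a machine figure out the rules" looks like in code (not from the talk; it assumes scikit-learn and a made-up toy dataset), a decision tree learns human-readable rules from labeled examples and can print them back out for an auditor:
```python
# Illustrative sketch: letting a machine learn decision rules from data.
# Assumes scikit-learn is installed; the features and labels are a made-up toy example.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [age, income_in_thousands, prior_defaults]; label: 1 = approve, 0 = deny
X = [[25, 30, 1], [40, 80, 0], [35, 60, 0], [22, 20, 2], [50, 90, 0], [30, 25, 1]]
y = [0, 1, 1, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# The learned "rules" are explicit and can, in principle, be shown to an auditor.
print(export_text(model, feature_names=["age", "income_k", "prior_defaults"]))

# Apply the learned rules to a new case.
print(model.predict([[28, 45, 0]]))
```
-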
So, maybe this is a better approach.
-
It's certainly fast, and thus cheap.
-
And maybe I can construct
the system in such a way
-
that it doesn't have the biases that are inherent
-
in human decision making.
-
And, since I've written these rules down,
-
or a computer has learned these rules,
-
then I can just show them to somebody, right?
-
And then they can audit it.
-
So, more and more decision making is being
-
done in this way.
-
And so, in this model, we take data
-
we make an inference based on that data
-
using these algorithms, and then
-
we can take actions.
-
And, when we take this more scientific approach
-
to making decisions and optimizing for
-
a desired outcome,
we can take an experimental approach
-
so we can determine
which actions are most effective
-
in achieving a desired outcome.
-
Maybe there are some types of communication
-
styles that are most effective
with certain people.
-
I can perhaps deploy some individualized incentives
-
to get the outcome that I desire.
-
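A minimal sketch of that experimental loop (illustrative only; the message names and response rates below are fabricated), using a simple epsilon-greedy strategy to learn which action most often produces the desired outcome:
```python
# Illustrative sketch: an experimental loop that learns which action (say, which
# message style) most often produces the desired outcome. Epsilon-greedy bandit;
# the respond() function stands in for real user behavior and is fabricated.
import random

actions = ["message_A", "message_B", "message_C"]
counts = {a: 0 for a in actions}
successes = {a: 0 for a in actions}

def respond(action):
    # Stand-in for a real user's reaction; these rates are invented.
    return random.random() < {"message_A": 0.05, "message_B": 0.12, "message_C": 0.08}[action]

for _ in range(5000):
    if random.random() < 0.1:   # explore occasionally
        a = random.choice(actions)
    else:                       # otherwise exploit the best action seen so far
        a = max(actions, key=lambda a: successes[a] / counts[a] if counts[a] else 0.0)
    counts[a] += 1
    successes[a] += respond(a)

print({a: round(successes[a] / counts[a], 3) for a in actions if counts[a]})
```
-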
And, maybe even if I carefully design an experiment
-
with the environment in which people make
-
these decisions, perhaps even very small changes
-
can introduce significant changes
in peoples' behavior.
-
So, through these mechanisms,
and this experimental approach,
-
I can maximize the probability
that humans do
-
what I want.
-
So, algorithmic decision making is being used
-
in industry, and is used
in lots of other areas,
-
from astrophysics to medicine, and is now
-
moving into new domains, including
-
government applications.
-
So, we have recommendation engines like
Netflix, Yelp, SoundCloud,
-
that direct our attention to what we should
-
watch and listen to.
-
Since 2009, Google has used
personalized search results,
-
even if you're not logged in
to your Google account.
-
And we also have algorithmic curation and filtering,
-
as in the case of Facebook News Feed,
-
Google News, Yahoo News,
-
which show you which news articles, for example,
-
you should be looking at.
-
And this is important, because a lot of people
-
get news from these media.
-
We even have algorithmic journalists!
-
So, automatic systems generate articles
-
about weather, traffic, or sports
-
instead of a human.
-
And, another application that's more recent
-
is the use of predictive systems
-
in political campaigns.
-
So, political campaigns also now take this
-
approach to predict on an individual basis
-
which candidate voters
are likely to vote for.
-
And then they can target,
on an individual basis,
-
those that can be persuaded otherwise.
-
And, finally, in the public sector,
-
we're starting to use predictive systems
-
in areas from policing, to health,
to education and energy.
-
So, there are some advantages to this.
-
So, one thing is that we can automate
-
aspects of our lives
that we consider to be mundane
-
using systems that are intelligent
-
and adaptive enough.
-
We can make use of all the data
-
and really get the pieces of information we
-
really care about.
-
We can spend money in the most effective way,
-
and we can do this with this experimental
-
approach to optimize actions to produce
-
desired outcomes.
-
So, we can embed intelligence
-
into all of these mundane objects
-
and enable them to make decisions for us,
-
and so that's what we're doing more and more,
-
and we can have an object
that decides for us
-
what temperature we should set our house to,
-
what we should be doing, etc.
-
So, there might be some implications here.
-
We want these systems
that do work on this data
-
to increase the opportunities
available to us.
-
But it might be that there are some implications
-
that we have not carefully thought through.
-
This is a new area, and people are only
-
starting to scratch the surface of what the
-
problems might be.
-
In some cases, they might narrow the options
-
available to people,
-
and this approach subjects people to
-
suggestive messaging intended to nudge them
-
to a desired outcome.
-
Some people may have a problem with that.
-
Values we care about are not gonna be
-
baked into these systems by default.
-
It's also the case that some algorithmic systems
-
facilitate work that we do not like.
-
For example, in the case of mass surveillance.
-
And even the same systems,
-
used by different people or organizations,
-
have very different consequences.
-
For example, if I can predict
-
with high accuracy, based on say search queries,
-
who's gonna be admitted to a hospital,
-
some people would be interested
in knowing that.
-
You might be interested
in having your doctor know that.
-
But that same predictive model
in the hands of
-
an insurance company
has a very different implication.
-
So, the point here is that these systems
-
structure and influence how humans interact
-
with each other, how they interact with society,
-
and how they interact with government.
-
And if they constrain what people can do,
-
we should really care about this.
-
So now I'm gonna go to
sort of an extreme case,
-
just as an example, and that's this
Chinese Social Credit System.
-
And so this is probably one of the more
-
ambitious uses of data,
-
that is used to rank each citizen
-
based on their behavior, in China.
-
So right now, there are various pilot systems
-
deployed by various companies doing this in
China.
-
They're currently voluntary, and by 2020
-
one of these systems is gonna be decided on,
-
or a combination of the systems,
-
and it's gonna be mandatory for everyone.
-
And so, in this system, there are some citizens,
-
and a huge range of data sources are used.
-
So, some of the data sources are
-
your financial data,
-
your criminal history,
-
how many points you have
on your driver's license,
-
medical information-- for example,
if you take birth control pills,
-
that's incorporated.
-
Your purchase history-- for example,
if you purchase games,
-
you are down-ranked in the system.
-
Some of the systems, not all of them,
-
incorporate social media monitoring,
-
which makes sense: if you're a state like China,
-
you probably want to know about
-
political statements that people
are making on social media.
-
And, one of the more interesting parts is
-
social network analysis:
looking at the relationships between people.
-
So, if you have a close relationship with
somebody
-
and they have a low credit score,
-
that can have implications on your credit
score.
-
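Purely as a hypothetical sketch of how such a "your contacts' scores affect yours" rule could work (the real formula is not public; the blending weight and numbers below are invented):
```python
# Hypothetical sketch only: how a citizen score could be pulled down by
# low-scoring contacts. The real system's formula is not public; the
# blending weight and numbers here are invented.
def adjusted_score(own_score, contact_scores, weight=0.2):
    """Blend a person's own score with the mean score of their close contacts."""
    if not contact_scores:
        return own_score
    neighbor_mean = sum(contact_scores) / len(contact_scores)
    return (1 - weight) * own_score + weight * neighbor_mean

# A high-scoring citizen with low-scoring friends loses points.
print(adjusted_score(850, [400, 450, 500]))  # -> 770.0
```
-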
So, the way that these scores
are generated is secret.
-
And, according to the call for these systems
-
put out by the government,
-
the goal is to
"carry forward the sincerity and
-
traditional virtues" and
establish the idea of a
-
"sincerity culture."
-
But wait, it gets better:
-
so, there's a portal that enables citizens
-
to look up the citizen score of anyone.
-
And many people like this system,
-
they think it's a fun game.
-
They boast about it on social media,
-
they put their score in their dating profile,
-
because if you're ranked highly you're
-
part of an exclusive club.
-
You can get VIP treatment
at hotels and other companies.
-
But the downside is that, if you're excluded
-
from that club, your weak score
may have other implications,
-
like being unable to get access
to credit, housing, jobs.
-
There is some reporting that even travel visas
-
might be restricted
if your score is particularly low.
-
So, a system like this, for a state, is really
-
the optimal solution
to the problem of the public.
-
It constitutes a very subtle and insidious
-
mechanism of social control.
-
You don't need to spend a lot of money on
-
police or prisons if you can set up a system
-
where people discourage one another from
-
anti-social acts like political action
in exchange for
-
a coupon for a free Uber ride.
-
So, there are a lot of
legitimate questions here:
-
What protections does
user data have in this scheme?
-
Do any safeguards exist to prevent tampering?
-
What mechanism, if any, is there to prevent
-
false input data from creating erroneous inferences?
-
Is there any way that people can fix
-
their score once they're ranked poorly?
-
Or does it end up becoming a
-
self-fulfilling prophecy?
-
Your weak score means you have less access
-
to jobs and credit, and now you will have
-
limited access to opportunity.
-
So, let's take a step back.
-
So, what do we want?
-
So, we probably don't want that,
-
but as advocates we really wanna
-
understand what questions we should be asking
-
of these systems. Right now there's
-
very little oversight,
-
and we wanna make sure that we don't
-
sort of sleepwalk our way to a situation
-
where we've lost even more power
-
to these centralized systems of control.
-
And if you're an implementer, we wanna understand
-
what we can be doing better.
-
Are there better ways that we can be implementing
-
these systems?
-
Are there values that, as humans,
-
we care about that we should make sure
-
these systems have?
-
So, the first thing
that most people in the room
-
might think about is privacy.
-
Which is, of course, of the utmost importance.
-
We need privacy, and there is a good discussion
-
on the importance of protecting
user data where possible.
-
So, in this talk, I'm gonna focus
on the other aspects of
-
algorithmic decision making,
-
that I think have got less attention.
-
Because it's not just privacy
that we need to worry about here.
-
We also want systems that are fair and equitable.
-
We want transparent systems,
-
we don't want opaque decisions
to be made about us,
-
decisions that might have serious impacts
-
on our lives.
-
And we need some accountability mechanisms.
-
So, for the rest of this talk
-
we're gonna go through each one of these things
-
and look at some examples.
-
So, the first thing is fairness.
-
And so, as I said in the beginning,
this is one area
-
where there might be an advantage
-
to making decisions by machine,
-
especially in areas where there have
-
historically been fairness issues with
-
decision making, such as law enforcement.
-
So, this is one way that police departments
-
use predictive models.
-
The idea here is police would like to
-
allocate resources in a more effective way,
-
and they would also like to enable
-
proactive policing.
-
So, if you can predict where crimes
are going to occur,
-
or who is going to commit crimes,
-
then you can put cops in those places,
-
or perhaps following these people,
-
and then the crimes will not occur.
-
So, it's sort of the pre-crime approach.
-
So, there are a few ways of going about this.
-
One way is doing this individual-level prediction.
-
So you take each citizen
and estimate the risk
-
that each citizen will participate,
say, in violence
-
based on some data.
-
And then you can flag those people that are
-
considered particularly violent.
-
So, this is currently done.
-
This is done in the U.S.
-
It's done in Chicago,
by the Chicago Police Department.
-
And they maintain a heat list of individuals
-
that are considered most likely to commit,
-
or be the victim of, violence.
-
And this is done using data
that the police maintain.
-
So, the features that are used
in this predictive model
-
include things that are derived from
-
individuals' criminal history.
-
So, for example, have they been involved in
-
gun violence in the past?
-
Do they have narcotics arrests? And so on.
-
But another thing that's incorporated
-
in the Chicago Police Department model is
-
information derived from
social media network analysis.
-
So, who you interact with,
-
as noted in police data.
-
So, for example, your co-arrestees.
-
When officers conduct field interviews,
-
who are people interacting with?
-
And then this is all incorporated
into this risk score.
-
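As an illustrative sketch only (this is not the Chicago model, whose details are not public; the features, data, and labels below are fabricated), an individual risk score of this kind is often just a classifier's predicted probability, with the top of the ranking flagged:
```python
# Illustrative sketch of individual-level risk scoring; the features, data and
# labels are fabricated, and this is not the actual Chicago Police Department model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per person: [prior_gun_arrests, narcotics_arrests, co_arrestees_on_list]
X = np.array([[0, 0, 0], [2, 1, 3], [0, 3, 1], [1, 0, 2], [0, 1, 0], [3, 2, 4]])
y = np.array([0, 1, 0, 1, 0, 1])  # fabricated "involved in violence" labels

model = LogisticRegression().fit(X, y)

# Risk score = predicted probability; flag the k highest-risk individuals.
scores = model.predict_proba(X)[:, 1]
k = 2
flagged = np.argsort(scores)[::-1][:k]
print("flagged indices:", flagged, "scores:", scores[flagged].round(2))
```
-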
So another way to proceed,
-
which is the method that most companies
-
that sell products like this
to the police have taken,
-
is instead predicting which areas
-
are likely to have crimes committed in them.
-
So, take my city, I put a grid down,
-
and then I use crime statistics
-
and maybe some ancillary data sources,
-
to determine which areas have
-
the highest risk of crimes occurring in them,
-
and I can flag those areas and send
-
police officers to them.
-
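A minimal sketch of the grid-based approach (illustrative; not any vendor's actual algorithm, and the incident coordinates are fabricated):
```python
# Illustrative sketch: put a grid over the city, count recent incidents per cell,
# and flag the highest-count cells for patrol. Coordinates are fabricated.
from collections import Counter

CELL = 150.0  # cell size in meters (roughly the 500ft x 500ft boxes mentioned later)

# Fabricated incident locations as (x, y) in meters from some origin.
incidents = [(120, 430), (130, 440), (900, 210), (125, 445), (880, 205), (60, 700)]

counts = Counter((int(x // CELL), int(y // CELL)) for x, y in incidents)

# Flag the top 2 cells as "high risk" for the next shift.
for cell, n in counts.most_common(2):
    print(f"cell {cell}: {n} recent incidents -> flag for patrol")
```
-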
So now, let's look at some of the tools
-
that are used for this geographic-level prediction.
-
So, here are 3 companies that sell these
-
geographic-level predictive policing systems.
-
So, PredPol has a system that uses
-
primarily crime statistics:
-
only the time, place, and type of crime
-
to predict where crimes will occur.
-
HunchLab uses a wider range of data sources
-
including, for example, weather
-
and then Hitachi is a newer system
-
that has a predictive crime analytics tool
-
that also incorporates social media.
-
The first one, to my knowledge, to do so.
-
And these systems are in use
-
in 50+ cities in the U.S.
-
So, why do police departments buy this?
-
Some police departments are interested in
-
buying systems like this, because they're marketed
-
as impartial systems,
-
so it's a way to police in an unbiased way.
-
And so, these companies make
-
statements like this--
-
by the way, the references
will all be at the end,
-
and they'll be on the slides--
-
So, for example
-
the predictive crime analytics from Hitachi
-
claims that the system is anonymous,
-
because it shows you an area,
-
it doesn't tell you
to look for a particular person.
-
and PredPol reassures people that
-
it eliminates any liberties or profiling concerns.
-
And HunchLab notes that the system
-
fairly represents priorities for public safety
-
and is unbiased by race
or ethnicity, for example.
-
So, let's take a minute
to describe in more detail
-
what we mean when we talk about fairness.
-
So, when we talk about fairness,
-
we mean a few things.
-
So, one is fairness with respect to individuals:
-
so if I'm very similar to somebody
-
and we go through some process
-
and there are two very different
outcomes to that process
-
we would consider that to be unfair.
-
So, we want similar people to be treated
-
in a similar way.
-
But, there are certain protected attributes
-
that we wouldn't want someone
-
to discriminate based on.
-
And so, there's this other property,
Group Fairness.
-
So, we can look at the statistical parity
-
between groups, based on gender, race, etc.
-
and see if they're treated in a similar way.
-
And we might not expect that in some cases,
-
for example if the base rates in each group
-
are very different.
-
And then there's also Fairness in Errors.
-
All predictive systems are gonna make errors,
-
and if the errors are concentrated,
-
then that may also represent unfairness.
-
And so this concern arose recently with Facebook
-
because people with Native American names
-
had their profiles flagged as fraudulent
-
far more often than those
with White American names.
-
So these are the sorts of things
that we worry about
-
and there are metrics for each of these,
-
and if you're interested in more you should
-
check those 2 papers out.
-
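The group-level metrics just described can be computed directly; here is a minimal sketch (fabricated predictions, numpy assumed) that checks statistical parity and the distribution of errors across two groups:
```python
# Illustrative sketch: measuring statistical parity and the distribution of
# errors between two groups. Predictions, labels, and groups are fabricated.
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

for g in ["A", "B"]:
    mask = group == g
    positive_rate = y_pred[mask].mean()                   # group fairness / statistical parity
    error_rate = (y_pred[mask] != y_true[mask]).mean()    # fairness in errors
    print(f"group {g}: positive rate={positive_rate:.2f}, error rate={error_rate:.2f}")
```
-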
So, how can potential issues
with predictive policing
-
have implications for these principles?
-
So, one problem is
the training data that's used.
-
Some of these systems only use crime statistics,
-
other systems-- all of them use crime statistics
-
in some way.
-
So, one problem is that crime databases
-
contain only crimes that've been detected.
-
Right? So, the police are only gonna detect
-
crimes that they know are happening,
-
either through patrol and their own investigation
-
or because they've been alerted to crime,
-
for example by a citizen calling the police.
-
So, a citizen has to feel like
they can call the police,
-
like that's a good idea.
-
So, some crimes suffer
from this problem less than others:
-
for example, gun violence
is much easier to detect
-
relative to fraud,
-
which is very difficult to detect.
-
Now the racial profiling aspect
of this might come in
-
because of biased policing in the past.
-
So, for example, for marijuana arrests,
-
black people are arrested in the U.S. at rates
-
4 times that of white people,
-
even though marijuana usage rates
-
in these 2 groups are equal to within a few percent.
-
So, this is where problems can arise.
-
So, let's go back to this
-
geographic-level predictive policing.
-
So the danger here is that, unless this system
-
is very carefully constructed,
-
this sort of crime area ranking might
-
again become a self-fulfilling prophecy.
-
If you send police officers to these areas,
-
you further scrutinize them,
-
and then again you're only detecting a subset
-
of crimes, and the cycle continues.
-
So, one obvious issue is that
-
this statement about geographic-based
crime prediction
-
being anonymous is not true,
-
because race and location are very strongly
-
correlated in the U.S.
-
And this is something that machine-learning
systems
-
can potentially learn.
-
Another issue is that, for example,
-
for individual fairness, say my home
-
sits within one of these boxes.
-
Some of these boxes
in these systems are very small,
-
for example PredPol is 500ft x 500ft,
-
so it's maybe only a few houses.
-
So, the implications of this system are that
-
you have police officers maybe sitting
-
in a police cruiser outside your home
-
and a few doors down someone
-
may not be within that box,
-
and doesn't get this scrutiny.
-
So, that may represent unfairness.
-
So, there are real questions here,
-
especially because there's no opt-out.
-
There's no way to opt-out of this system:
-
if you live in a city that has this,
-
then you have to deal with it.
-
So, it's quite difficult to find out
-
what's really going on
-
because the algorithm is secret.
-
And, in most cases, we don't know
-
the full details of the inputs.
-
We have some idea
about what features are used,
-
but that's about it.
-
We also don't know the output.
-
That would mean knowing police allocation
-
and police strategies,
-
and in order to nail down
what's really going on here
-
in order to verify the validity of
-
these companies' claims,
-
it may be necessary
to have a 3rd party come in,
-
examine the inputs and outputs of the system,
-
and say concretely what's going on.
-
And if everything is fine and dandy
-
then this shouldn't be a problem.
-
So, that's potentially one role that
-
advocates can play.
-
Maybe we should start pushing for audits
-
of systems that are used in this way.
-
These could have serious implications
-
for peoples' lives.
-
So, we'll return
to this idea a little bit later,
-
but for now this leads us
nicely to Transparency.
-
So, we wanna know
-
what these systems are doing.
-
But it's very hard,
for the reasons described earlier,
-
but even in the case of something like
-
trying to understand Google's search algorithm,
-
it's difficult because it's personalized.
-
So, by construction, each user is
-
only seeing one endpoint.
-
So, it's a very isolating system.
-
What do other people see?
-
And one reason it's difficult to make
-
some of these systems transparent
-
is because of, simply, the complexity
-
of the algorithms.
-
So, an algorithm can become so complex that
-
it's difficult to comprehend,
-
even for the designer of the system,
-
or the implementer of the system.
-
The designer might know that this algorithm
-
maximizes some metric-- say, accuracy,
-
but they may not always have a solid
-
understanding of what the algorithm is doing
-
for all inputs.
-
Certainly with respect to fairness.
-
So, in some cases,
it might not be appropriate to use
-
an extremely complex model.
-
It might be better to use a simpler system
-
with human-interpretable features.
-
Another issue that arises
-
from the opacity of these systems
-
and the centralized control
-
is that it makes them very influential.
-
And thus, an excellent target
-
for manipulation or tampering.
-
So, this might be tampering that is done
-
from an organization that controls the system,
-
or an insider at one of the organizations,
-
or anyone who's able to compromise their security.
-
So, there's an interesting academic work
-
that looked at the possibility of
-
slightly modifying search rankings
-
to shift people's political views.
-
So, since people are most likely to
-
click on the top search results,
-
so 90% of clicks go to the
first page of search results,
-
then perhaps by reshuffling
things a little bit,
-
or maybe dropping some search results,
-
you can influence people's views
-
in a coherent way,
-
and maybe you can make it so subtle
-
that no one is able to notice.
-
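As a toy illustration of why small re-rankings matter (the per-rank click rates below are invented, apart from the heavy skew toward top results mentioned above), shifting a few of one candidate's pages upward noticeably changes that candidate's share of expected clicks:
```python
# Toy sketch: clicks are heavily skewed toward the top results, so moving a few
# of one candidate's pages up shifts that candidate's expected share of clicks.
# The per-rank click rates are invented for illustration.
click_rate_by_rank = [0.30, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.02, 0.02, 0.02]

def exposure(ranking, candidate):
    """Expected clicks on pages favorable to `candidate` under a given ranking."""
    return sum(rate for rate, page in zip(click_rate_by_rank, ranking) if page == candidate)

neutral    = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
reshuffled = ["A", "A", "A", "B", "A", "B", "B", "A", "B", "B"]  # subtle promotion of A

print("A's share, neutral:   ", exposure(neutral, "A"))      # 0.50
print("A's share, reshuffled:", exposure(reshuffled, "A"))    # 0.62
```
-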
So in this academic study,
-
they did an experiment
-
in the 2014 Indian election.
-
So they used real voters,
-
and they kept the size
of the experiment small enough
-
that it was not going to influence the outcome
-
of the election.
-
So the researchers took people,
-
they determined their political leaning,
-
and they segmented them into
control and treatment groups,
-
where the treatment was manipulation
-
of the search ranking results,
-
And then they had these people
browse the web.
-
And what they found, is that
-
this mechanism is very effective at shifting
-
people's voter preferences.
-
So, in this study, they were able to introduce
-
a 20% shift in voter preferences.
-
Even alerting users to the fact that this
-
was going to be done, telling them
-
"we are going to manipulate your search results,"
-
"really pay attention,"
-
did not decrease
-
the magnitude of the effect.
-
So, the margins of error in many elections
-
are incredibly small,
-
and the authors estimate that this shift
-
could change the outcome of about
-
25% of elections worldwide, if this were done.
-
And the bias is so small that no one can tell.
-
So, all humans, no matter how smart
-
and resistant to manipulation
we think we are,
-
all of us are subject to this sort of manipulation,
-
and we really can't tell.
-
So, I'm not saying that this is occurring,
-
but right now there is no
regulation to stop this,
-
there is no way we could reliably detect this,
-
so there's a huge amount of power here.
-
So, something to think about.
-
But it's not only corporations that are interested
-
in this sort of behavioral manipulation.
-
In 2010, UK Prime Minister David Cameron
-
created this UK Behavioural Insights Team,
-
which is informally called the Nudge Unit.
-
And so what they do is
they use behavioral science
-
and this predictive analytics approach,
-
with experimentation,
-
to have people make better decisions
-
for themselves and society--
-
as determined by the UK government.
-
And as of a few months ago,
-
after an executive order signed by Obama
-
in September, the United States now has
-
its own Nudge Unit.
-
So, to be clear, I don't think that this is
-
some sort of malicious plot.
-
I think that there can be huge value
-
in these sorts of initiatives,
-
positively impacting people's lives,
-
but when this sort of behavioral manipulation
-
is being done, in part openly,
-
oversight is pretty important,
-
and we really need to consider
-
what these systems are optimizing for.
-
And that's something that we might
-
not always know, or at least understand,
-
so for example, for industry,
-
we do have a pretty good understanding there:
-
industry cares about optimizing for
-
the time spent on the website,
-
Facebook wants you to spend more time on Facebook,
-
they want you to click on ads,
-
click on newsfeed items,
-
they want you to like things.
-
And, fundamentally: profit.
-
So, already this has some serious implications,
-
and this had pretty serious implications
-
in the last 10 years, in media for example.
-
Optimizing for click-through rate in journalism
-
has produced a race to the bottom
-
in terms of quality.
-
And another issue is that optimizing
-
for what people like might not always be
-
the best approach.
-
So, Facebook officials have said publicly
-
that Facebook's goal is to make you happy,
-
they want you to open that newsfeed
-
and just feel great.
-
But, there's an issue there, right?
-
Because a lot of people,
-
like 40% according to Pew Research,
-
get their news from Facebook.
-
So, if people don't want to see
-
war and corpses,
because it makes them feel sad,
-
then this is not a system that is gonna optimize
-
for an informed population.
-
It's not gonna produce a population that is
-
ready to engage in civic life.
-
It's gonna produce an amused population
-
whose time is occupied by cat pictures.
-
So, in politics, we have a similar
-
optimization problem that's occurring.
-
So, these political campaigns that use
-
these predictive systems,
-
are optimizing for votes for the desired candidate,
-
of course.
-
So, instead of a political campaign being
-
--well, maybe this is a naive view, but--
-
being an open discussion of the issues
-
facing the country,
-
it becomes this micro-targeted
persuasion game,
-
and the people that get targeted
-
are a very small subset of all people,
-
and it's only gonna be people that are
-
you know, on the edge, maybe disinterested,
-
those are the people that are gonna get attention
-
from political candidates.
-
In policy, as with these Nudge Units,
-
they're being used to enable
-
better use of government services.
-
There are some good projects that have
-
come out of this:
-
increasing voter registration,
-
improving health outcomes,
-
improving education outcomes.
-
But some of these predictive systems
-
that we're starting to see in government
-
are optimizing for compliance,
-
as is the case with predictive policing.
-
So this is something that we need to
-
watch carefully.
-
I think this is a nice quote that
-
sort of describes the problem.
-
In some ways we might be narrowing
-
our horizons, and the danger is that
-
these tools are separating people.
-
And this is particularly bad
-
for political action, because political action
-
requires people to have shared experiences,
-
so that they are able to act collectively
-
to exert pressure to fix problems.
-
So, finally: accountability.
-
So, we need some oversight mechanisms.
-
For example, in the case of errors--
-
so this is particularly important for
-
civil or bureaucratic systems.
-
So, when an algorithm produces some decision,
-
we don't always want humans to just
-
defer to the machine,
-
and that might represent one of the problems.
-
So, there are starting to be some cases
-
of computer algorithms yielding a decision,
-
and then humans being unable to correct
-
an obvious error.
-
So there's this case in Georgia,
in the United States,
-
where 2 young people went to
-
the Department of Motor Vehicles,
-
they're twins, and they went
-
to get their driver's license.
-
However, they were both flagged by
-
a fraud algorithm that uses facial recognition
-
to look for similar faces,
-
and I guess the people that designed the system
-
didn't think of the possibility of twins.
-
Yeah.
So, they just left
-
without their driver's licenses.
-
The people in the Department of Motor Vehicles
-
were unable to correct this.
-
So, this is one implication--
-
it's like something out of Kafka.
-
But there are also cases of errors being made,
-
and people not noticing until
-
after actions have been taken,
-
some of them very serious--
-
because people simply deferred
-
to the machine.
-
So, this is an example from San Francisco.
-
So, an ALPR-- an Automated License Plate Reader--
-
is a device that uses image recognition
-
to detect and read license plates,
-
and usually to compare license plates
-
with a known list of plates of interest.
-
And, so, San Francisco uses these
-
and they're mounted on police cars.
-
So, in this case, a San Francisco ALPR
-
got a hit on a car,
-
and it was the car of a 47-year-old woman,
-
with no criminal history.
-
And so it was a false hit
-
because it was a blurry image,
-
and it matched erroneously with
-
one of the plates of interest
-
that happened to be a stolen vehicle.
-
So, they conducted a traffic stop on her,
-
and they took her out of the vehicle,
-
they searched her and the vehicle,
-
she got a pat-down,
-
and they had her kneel
-
at gunpoint, in the street.
-
So, how much oversight should be present
-
depends on the implications of the system.
-
It's certainly the case that
-
for some of these decision-making systems,
-
an error might not be that important,
-
it could be relatively harmless,
-
but in this case,
an error in this algorithmic decision
-
led to this totally innocent person
-
literally having a gun pointed at her.
-
So, that brings us to: we need some way of
-
getting some information about
-
what is going on here.
-
We don't wanna have to wait for these events
-
before we are able to determine
-
some information about the system.
-
So, auditing is one option:
-
to independently verify the statements
-
of companies, in situations where we have
-
inputs and outputs.
-
So, for example, this could be done with
-
Google, Facebook.
-
If you have the inputs of a system,
-
say you have test accounts,
-
or real accounts,
-
maybe you can collect
people's information together.
-
So that was something that was done
-
during the 2012 Obama campaign
-
by ProPublica.
-
People noticed that they were getting
-
different emails from the Obama campaign,
-
and were interested to see
-
based on what factors
-
the emails were changing.
-
So, I think about 200 people submitted emails
-
and they were able to determine some information
-
about what the emails
were being varied based on.
-
So there have been some successful
-
attempts at this.
-
So, compare inputs and then look at
-
why one item was shown to one user
-
and not another, and see if there are
-
any statistical differences.
-
So, there's some potential legal issues
-
with the test accounts, so that's something
-
to think about-- I'm not a lawyer.
-
So, for example, if you wanna examine
-
ad-targeting algorithms,
-
one way to proceed is to construct
-
a browsing profile, and then examine
-
what ads are served back to you.
-
And so this is something that
-
academic researchers have looked at,
-
because, at the time at least,
-
you didn't need to make an account to do this.
-
So, this was a study that was presented at
-
Privacy Enhancing Technologies last year,
-
and in this study, the researchers
-
generate some browsing profiles
-
that differ only by one characteristic,
-
so they're basically identical in every way
-
except for one thing.
-
And that is denoted by Treatment 1 and 2.
-
So this is a randomized, controlled trial,
-
but I left out the randomization part
-
for simplicity.
-
So, in one study,
they applied a treatment of gender.
-
So, they had the browsing profiles
-
in Treatment 1 be male browsing profiles,
-
and the browsing profiles in Treatment 2
be female.
-
And they wanted to see: is there any difference
-
in the way that ads are targeted
-
if browsing profiles are effectively identical
-
except for gender?
-
So, it turns out that there was.
-
So, a 3rd-party site was showing Google ads
-
for senior executive positions
-
at a rate 6 times higher to the fake men
-
than to the fake women in this study.
-
So, this sort of auditing is not going to
-
be able to determine everything
-
that algorithms are doing, but it can
-
sometimes uncover interesting,
-
at least statistical differences.
-
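At its core this kind of audit is a two-sample comparison; here is a minimal sketch (scipy assumed, counts fabricated) in the spirit of the experiment above:
```python
# Illustrative sketch of the statistical core of such an audit: were the
# "senior executive" ads served at different rates to the two treatment groups?
# The counts below are fabricated.
from scipy.stats import fisher_exact

shown_to_male,   total_male   = 180, 1000   # treatment 1: "male" profiles
shown_to_female, total_female = 30,  1000   # treatment 2: "female" profiles

table = [[shown_to_male, total_male - shown_to_male],
         [shown_to_female, total_female - shown_to_female]]

odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio={odds_ratio:.1f}, p={p_value:.2g}")
# A tiny p-value says the disparity is unlikely to be chance; it does not, by
# itself, say which part of the ad pipeline caused it.
```
-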
So, this leads us to the fundamental issue:
-
Right now, we're really not in control
-
of some of these systems,
-
and we really need these predictive systems
-
to be controlled by us,
-
in order for them not to be used
-
as a system of control.
-
So there are some technologies that I'd like
-
to point you all to.
-
We need tools in the digital commons
-
that can help address some of these concerns.
-
So, the first thing is that of course
-
we know that minimizing the amount of
-
data available can help in some contexts,
-
which we can do by making systems
-
that are private by design, and by default.
-
Another thing is that these audit tools
-
might be useful.
-
And, so, there are these 2 nice examples from academia...
-
the ad experiment that I just showed was done
-
using AdFisher.
-
So, these are 2 toolkits that you can use
-
to start doing this sort of auditing.
-
Another technology that is generally useful,
-
but particularly in the case of prediction
-
it's useful to maintain access to
-
as many sites as possible,
-
through anonymity systems like Tor,
-
because it's impossible to personalize
-
when everyone looks the same.
-
So this is a very important technology.
-
Something that doesn't really exist,
-
but that I think is pretty important,
-
is having some tool to view the landscape.
-
So, as we know from these few studies
-
that have been done,
-
different people are not seeing the internet
-
in the same way.
-
This is one reason why we don't like censorship.
-
But, for example,
-
from academic research we know that
-
there is widespread price discrimination
on the internet,
-
so rich and poor people see a different view
-
of the Internet,
-
men and women see a different view
-
of the Internet.
-
We wanna know how different people
-
see the same site,
-
and this could be the beginning of
-
a defense system for this sort of
-
manipulation/tampering that I showed earlier.
-
Another interesting approach is obfuscation:
-
injecting noise into the system.
-
So there's an interesting browser extension
-
called AdNauseam, that's for Firefox,
-
which clicks on every single ad you're served,
-
to inject noise.
-
So that's, I think, an interesting approach
-
that people haven't looked at too much.
-
So in terms of policy,
-
Facebook and Google, these internet giants,
-
have billions of users,
-
and sometimes they like to call themselves
-
new public utilities,
-
and if that's the case then
-
it might be necessary to subject them
-
to additional regulation.
-
Another problem that's come up,
-
for example with some of the studies
-
that Facebook has done,
-
is sometimes a lack of ethics review.
-
So, for example, in academia,
-
if you're gonna do research involving humans,
-
there's an Institutional Review Board
-
that you go to that verifies that
-
you're doing things in an ethical manner.
-
And some companies do have internal
-
review processes like this, but it might
-
be important to have an independent
-
ethics board that does this sort of thing.
-
And we really need 3rd-party auditing.
-
So, for example, some companies
-
don't want auditing to be done
-
because of IP concerns,
-
and if that's the concern
-
maybe having a set of people
-
that are not paid by the company
-
to check how some of these systems
-
are being implemented,
-
could help give us confidence that
-
things are being done in a reasonable way.
-
So, in closing,
-
algorithmic decision making is here,
-
and it's barreling forward
at a very fast rate,
-
and we need to figure out what
-
the guide rails should be,
-
and how to install them
-
to handle some of the potential threats.
-
There's a huge amount of power here.
-
We need more openness in these systems.
-
And, right now,
-
with the intelligent systems that do exist,
-
we don't know what's occurring really,
-
and we need to watch carefully
-
where and how these systems are being used.
-
And I think this community has
-
an important role to play in this fight,
-
to study what's being done,
-
to show people what's being done,
-
to raise the debate and advocate,
-
and, where necessary, to resist.
-
Thanks.
-
applause
-
Herald: So, let's have a question and answer.
-
Microphone 2, please.
-
Mic 2: Hi there.
-
Thanks for the talk.
-
Since these pre-crime software systems have also
-
arrived here in Germany
-
with the start of the so-called CopWatch system
-
in southern Germany,
in Bavaria and Nuremberg especially,
-
where they try to predict burglary crime
-
using that criminal record
-
geographical analysis, like you explained.
-
This leads me to a 2-fold question:
-
first, have you heard of any research
-
that measures the effectiveness
-
of such measures, at all?
-
And, second:
-
What do you think of the game theory aspect:
-
if the thieves or the bad guys
-
know the system, and when they
game the system,
-
they will probably win,
-
since one police officer in an interview said
-
this system is used to reduce
-
the personal costs of policing,
-
so they just send the guys
where the red flags are,
-
and the others take the day off.
-
Dr. Helsby: Yup.
-
Um, so, with respect to
-
testing the effectiveness of predictive policing,
-
the companies,
-
some of them do randomized, controlled trials
-
and claim a reduction in crime.
-
The best independent study that I've seen
-
is by this RAND Corporation
-
that did a study in, I think,
-
Shreveport, Louisiana,
-
and in their report they claim
-
that there was no statistically significant
-
difference, they didn't find any reduction.
-
And it was specifically looking at
-
property crime, which I think you mentioned.
-
So, I think right now there's sort of
-
conflicting reports between
-
the independent auditors
and these company claims.
-
So there definitely needs to be more study.
-
And then, the 2nd thing...sorry,
remind me what it was?
-
Mic 2: What about the guys gaming the system?
-
Dr. Helsby: Oh, yeah.
-
I think it's a legitimate concern.
-
Like, if all the outputs
were just immediately public,
-
then, yes, everyone knows the location
-
of all police officers,
-
and I imagine that people would have
-
a problem with that.
-
Yup.
-
Herald: Microphone #4, please.
-
Mic 4: Yeah, this is not actually a question,
-
but just a comment.
-
I've enjoyed your talk very much,
-
in particular after watching
-
the talk in Hall 1 earlier in the afternoon.
-
The "Say Hi to Your New Boss", about
-
algorithms that are trained with big data,
-
and finally make decisions.
-
And I think these 2 talks are kind of complementary,
-
and if people are interested in the topic
-
they might want to check out the other talk
-
and watch it later, because these
-
fit very well together.
-
Dr. Helsby: Yeah, it was a great talk.
-
Herald: Microphone #2, please.
-
Mic 2: Um, yeah, you mentioned
-
the need to have some kind of 3rd-party auditing
-
or some kind of way to
-
peek into these algorithms
-
and to see what they're doing,
-
and to see if they're being fair.
-
Can you talk a little bit more about that?
-
Like, going forward,
-
some kind of regulatory structures
-
would probably have to emerge
-
to analyze and to look at
-
these black boxes that are just sort of
-
popping up everywhere and, you know,
-
controlling more and more of the things
-
in our lives, and important decisions.
-
So, just, what kind of discussions
-
are there for that?
-
And what kind of possibility
is there for that?
-
And, I'm sure that companies would be
-
very, very resistant to
-
any kind of attempt to look into
-
algorithms, and to...
-
Dr. Helsby: Yeah, I mean, definitely
-
companies would be very resistant to
-
having people look into their algorithms.
-
So, if you wanna do a very rigorous
-
audit of what's going on
-
then it's probably necessary to have
-
a few people come in
-
and sign NDAs, and then
-
look through the systems.
-
So, that's one way to proceed.
-
But, another way to proceed that--
-
so, these academic researchers have done
-
a few experiments
-
and found some interesting things,
-
and that's sort of all the attempts at auditing
-
that we've seen:
-
there was 1 attempt in 2012
for the Obama campaign,
-
but there's really not been any
-
sort of systematic attempt--
-
you know, like, in censorship
-
we see a systematic attempt to
-
do measurement as often as possible,
-
check what's going on,
-
and that itself, you know,
-
can act as an oversight mechanism.
-
But, right now,
-
I think many of these companies
-
realize no one is watching,
-
so there's no real push to have
-
people verify: are you being fair when you
-
implement this system?
-
Because no one's really checking.
-
Mic 2: Do you think that,
-
at some point, it would be like
-
an FDA or SEC, to give some American examples...
-
an actual government regulatory agency
-
that has the power and ability to
-
not just sort of look and try to
-
reverse engineer some of these algorithms,
-
but actually peek in there and make sure
-
that things are fair, because it seems like
-
there's just-- it's so important now
-
that, again, it could be the difference between
-
life and death, between
-
getting a job, not getting a job,
-
being pulled over,
not being pulled over,
-
being racially profiled,
not racially profiled,
-
things like that.
Dr. Helsby: Right.
-
Mic 2: Is it moving in that direction?
-
Or is it way too early for it?
-
Dr. Helsby: I mean, so some people have...
-
someone has called for, like,
-
a Federal Search Commission,
-
or like a Federal Algorithms Commission,
-
that would do this sort of oversight work,
-
but it's in such early stages right now
-
that there's no real push for that.
-
But I think it's a good idea.
-
Herald: And again, #2 please.
-
Mic 2: Thank you again for your talk.
-
I was just curious if you can point
-
to any examples of
-
either current producers or consumers
-
of these algorithmic systems
-
who are actively and publicly trying
-
to do so in a responsible manner
-
by describing what they're trying to do
-
and how they're going about it?
-
Dr. Helsby: So, yeah, there are some companies,
-
for example, like DataKind,
-
that try to deploy algorithmic systems
-
in as responsible a way as possible,
-
for like public policy.
-
Like, I actually also implement systems
-
for public policy in a transparent way.
-
Like, all the code is in GitHub, etc.
-
And, to give credit to
-
Google and these giants,
-
they're trying to implement transparency systems
-
that help you understand.
-
This has been done with respect to
-
how your data is being collected,
-
but for example if you go on Amazon.com
-
you can see a recommendation has been made,
-
and that is pretty transparent.
-
You can see "this item
was recommended to me,"
-
so you know that prediction
is being used in this case,
-
and it will say why prediction is being used:
-
because you purchased some item.
-
And Google has a similar thing,
-
if you go to like Google Ad Settings,
-
you can even turn off personalization of ads
-
if you want,
-
and you can also see some of the inferences
-
that have been learned about you.
-
A subset of the inferences that have been
-
learned about you.
-
So, like, what interests...
-
Herald: A question from the internet, please?
-
Signal Angel: Yes, billetQ is asking
-
how do you avoid biases in machine learning?
-
A resume analysis system, for example,
-
could be biased against women and minorities,
-
if used for hiring decisions
based on known data.
-
Dr. Helsby: Yeah, so one thing is to
-
just explicitly check.
-
So, you can check to see how
-
positive outcomes are being distributed
-
among those protected classes.
-
You could also incorporate these sort of
-
fairness constraints in the function
-
that you optimize when you train the system,
-
and so, if you're interested in reading more
-
about this, the 2 papers--
-
let me go to References--
-
there's a good paper called
-
Fairness Through Awareness that describes
-
how to go about doing this,
-
so I recommend this person read that.
-
It's good.
-
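A minimal sketch of that second idea, folding a fairness penalty into the training objective (numpy only, fabricated data, and a simple demographic-parity penalty rather than the exact formulation of the papers mentioned above):
```python
# Illustrative sketch: logistic regression trained with an extra penalty on the
# gap in average predicted score between two groups (a crude demographic-parity
# constraint). Data is fabricated; this is not the exact method of any one paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
group = rng.integers(0, 2, size=200)   # protected attribute, not used as a feature
X[:, 1] += 0.8 * group                 # but feature 1 acts as a proxy for it
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)
lam, lr = 2.0, 0.1                     # fairness penalty weight, learning rate

for _ in range(500):
    p = sigmoid(X @ w)
    grad_loss = X.T @ (p - y) / len(y)                   # ordinary logistic-loss gradient
    pg1, pg0 = p[group == 1], p[group == 0]
    gap = pg1.mean() - pg0.mean()                        # demographic-parity gap
    dgap = (X[group == 1] * (pg1 * (1 - pg1))[:, None]).mean(axis=0) \
         - (X[group == 0] * (pg0 * (1 - pg0))[:, None]).mean(axis=0)
    w -= lr * (grad_loss + lam * 2 * gap * dgap)         # penalize gap squared

p = sigmoid(X @ w)
print("parity gap after training:", round(p[group == 1].mean() - p[group == 0].mean(), 3))
```
-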
Herald: Microphone 2, please.
-
Mic2: Thanks again for your talk.
-
Umm, hello?
-
Okay.
-
Umm, I see of course a problem with
-
all the black boxes that you describe
-
with regard to the crime systems,
-
but when we look at the advertising systems
-
in many cases they are very networked.
-
There are many different systems collaborating
-
and exchanging data via open APIs:
-
RESTful APIs, and various
-
demand-side platforms
and audience-exchange platforms,
-
and everything.
-
So, can that help to at least
-
increase awareness on where targeting, personalization
-
might be happening?
-
I mean, I'm looking at systems like
-
BuiltWith, which surfaces what kind of
-
JavaScript libraries are used elsewhere.
-
So, is that something that could help
-
at least to give a better awareness
-
and listing all the points where
-
you might be targeted...
-
Dr. Helsby: So, like, with respect to
-
advertising, the fact that
there is behind the scenes
-
this like complicated auction process
-
that's occurring, just makes things
-
a lot more complicated.
-
So, for example, I said briefly
-
that they found that there's this
statistical difference
-
between how men and women are treated,
-
but it doesn't necessarily mean that
-
"Oh, the algorithm is definitely biased."
-
It could be because of this auction process,
-
it could be that women are considered
-
more valuable when it comes to advertising,
-
and so these executive ads are getting
-
outbid by some other ads,
-
and so there's a lot of potential
-
causes for that.
-
So, I think it just makes things
a lot more complicated.
-
I don't know if it helps
with the bias at all.
-
Mic 2: Well, the question was more
-
a direction... can it help to surface
-
and make people aware of that fact?
-
I mean, I can talk to my kids probably,
-
and they will probably understand,
-
but I can't explain that to my grandma,
-
who's also, umm, looking at an iPad.
-
Dr. Helsby: So, the fact that
-
the systems are...
-
I don't know if I understand.
-
Mic 2: OK. I think that the main problem
-
is that we are behind the industry's efforts
-
at targeting us, and many people
-
do know, but a lot more people don't know,
-
and making them aware of the fact
-
that they are a target, in a way,
-
is something that can only be shown
-
by a 3rd party that has that data at its disposal,
-
and make audits in a way--
-
maybe in an automated way.
-
Dr. Helsby: Right.
-
Yeah, I think it certainly
could help with advocacy
-
if that's the point, yeah.
-
Herald: Another question
from the internet, please.
-
Signal Angel: Yes, on IRC they are asking
-
if we know that prediction in some cases
-
provides an influence that cannot be controlled.
-
So, r4v5 would like to know from you
-
if there are some cases or areas where
-
machine learning simply shouldn't go?
-
Dr. Helsby: Umm, so I think...
-
I mean, yes, I think that it is the case
-
that in some cases machine learning
-
might not be appropriate.
-
For example, if you use machine learning
-
to decide who should be searched.
-
I don't think it should be the case that
-
machine learning algorithms should
-
ever be used to determine
-
probable cause, or something like that.
-
So, if it's just one piece of evidence
-
that you consider,
-
and there's human oversight always,
-
maybe it's fine, but
-
we should be very suspicious and hesitant
-
in certain contexts where
-
the ramifications are very serious.
-
Like the No Fly List, and so on.
-
Herald: And #2 again.
-
Mic 2: A second question
-
that just occurred to me, if you don't mind.
-
Umm, until the advent of
-
algorithmic systems,
-
when there've been cases of serious harm
-
to individuals or groups,
-
and it's been demonstrated that
-
it's occurred because of
-
an individual or a system of people
-
being systematically biased, then often
-
one of the actions that's taken is
-
pressure's applied, and then
-
people are required to change,
-
and hopefully be held responsible,
-
and then change the way that they do things
-
to try to remove bias from that system.
-
What's the current thinking about
-
how we can go about doing that
-
when the systems that are doing that
-
are algorithmic?
-
Is it just going to be human oversight,
-
and humans are gonna have to be
-
held responsible for the oversight?
-
Dr. Helsby: So, in terms of bias,
-
if we're concerned about bias towards
-
particular types of people,
-
that's something that we can optimize for.
-
So, we can train systems that are unbiased
-
in this way.
-
So that's one way to deal with it.
-
But there's always gonna be errors,
-
so that's sort of a separate issue
-
from the bias, and in the case
-
where there are errors,
-
there must be oversight.
-
So, one way that one could improve
-
the way that this is done
-
is by making sure that you're
-
keeping track of confidence of decisions.
-
So, if you have a low confidence prediction,
-
then maybe a human
should come in and check things.
-
So, that might be one way to proceed.
-
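A minimal sketch of that confidence-based routing (illustrative; assumes a classifier exposing predict_proba, as scikit-learn models do):
```python
# Illustrative sketch: act automatically only on confident predictions, and
# route low-confidence cases to a human reviewer.
def decide(model, x, threshold=0.9):
    """Return the automated decision, or defer to a human if confidence is low."""
    proba = model.predict_proba([x])[0]   # assumes a scikit-learn-style classifier
    confidence = proba.max()
    if confidence < threshold:
        return "defer_to_human", confidence
    return ("positive" if proba.argmax() == 1 else "negative"), confidence
```
-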
Herald: So, there are no more questions.
-
I'll close this talk now,
-
and thank you very much
-
and a big applause to
-
Jennifer Helsby!
-
roaring applause
-
subtitles created by c3subtitles.de
Join, and help us!