So, on April 23 of 2013, the Associated Press put out the following tweet on Twitter. It said, "Breaking news: Two explosions at the White House and Barack Obama has been injured." This tweet was retweeted 4,000 times in less than five minutes, and it went viral thereafter.

Now, this tweet wasn't real news put out by the Associated Press. In fact, it was false news, or fake news, that was propagated by Syrian hackers who had infiltrated the Associated Press Twitter handle. Their purpose was to disrupt society, but they disrupted much more, because automated trading algorithms immediately seized on the sentiment in this tweet and began trading based on the potential that the president of the United States had been injured or killed in this explosion. And as they started trading, they immediately sent the stock market crashing, wiping out 140 billion dollars in equity value in a single day.

Robert Mueller, special counsel prosecutor in the United States, issued indictments against three Russian companies and 13 Russian individuals for conspiracy to defraud the United States by meddling in the 2016 presidential election. And what this indictment tells us, as a story, is the story of the Internet Research Agency, the shadowy arm of the Kremlin on social media. During the presidential election alone, the Internet Research Agency's efforts reached 126 million people on Facebook in the United States, issued three million individual tweets and produced 43 hours' worth of YouTube content. All of it was fake: misinformation designed to sow discord in the US presidential election.

A recent study by Oxford University showed that in the recent Swedish elections, one third of all of the information spreading on social media about the election was fake or misinformation.

In addition, these types of social-media misinformation campaigns can spread what has been called "genocidal propaganda," for instance against the Rohingya in Burma, and have triggered mob killings in India.

We studied fake news and began studying it before it was a popular term.
And we recently published the largest-ever longitudinal study of the spread of fake news online on the cover of "Science" in March of this year. We studied all of the verified true and false news stories that ever spread on Twitter, from its inception in 2006 to 2017. And when we studied this information, we studied news stories that had been verified by six independent fact-checking organizations. So we knew which stories were true and which stories were false. We could measure their diffusion, the speed of their diffusion, the depth and breadth of their diffusion, how many people became entangled in this information cascade and so on. And what we did in this paper was compare the spread of true news to the spread of false news. And here's what we found.

We found that false news diffused further, faster, deeper and more broadly than the truth in every category of information that we studied, sometimes by an order of magnitude. And in fact, false political news was the most viral. It diffused further, faster, deeper and more broadly than any other type of false news. When we saw this, we were at once worried but also curious. Why? Why does false news travel so much further, faster, deeper and more broadly than the truth?

The first hypothesis that we came up with was, "Well, maybe people who spread false news have more followers or follow more people, or tweet more often, or maybe they're more often 'verified' users of Twitter, with more credibility, or maybe they've been on Twitter longer." So we checked each one of these in turn. And what we found was exactly the opposite. False-news spreaders had fewer followers, followed fewer people, were less active, were less often "verified" and had been on Twitter for a shorter period of time. And yet, false news was 70 percent more likely to be retweeted than the truth, controlling for all of these and many other factors.

So we had to come up with other explanations, and we devised what we called a "novelty hypothesis."
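
Before turning to that hypothesis, here is a minimal sketch of how the size, depth and breadth of a single retweet cascade, as described above, might be computed from parent-child retweet pairs; the data layout and function names are illustrative assumptions, not the paper's actual pipeline.

    from collections import defaultdict, deque

    def cascade_metrics(edges):
        """Compute size, depth and max breadth of one retweet cascade.

        `edges` is a list of (parent_user, child_user) pairs, where each pair
        means `child_user` retweeted the story from `parent_user`. The root
        (original poster) is the only parent that never appears as a child.
        This layout is an illustrative assumption, not the study's data model.
        """
        children = defaultdict(list)
        parents, kids = set(), set()
        for parent, child in edges:
            children[parent].append(child)
            parents.add(parent)
            kids.add(child)
        root = (parents - kids).pop()  # the original tweeter

        # Breadth-first traversal: record how deep each user sits in the cascade.
        depth_of = {root: 0}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in children[node]:
                if child not in depth_of:
                    depth_of[child] = depth_of[node] + 1
                    queue.append(child)

        size = len(depth_of)            # how many people became entangled
        depth = max(depth_of.values())  # longest retweet chain
        per_level = defaultdict(int)
        for d in depth_of.values():
            per_level[d] += 1
        breadth = max(per_level.values())  # widest single level of the cascade
        return size, depth, breadth

    # Example: A posts, B and C retweet A, D retweets B.
    print(cascade_metrics([("A", "B"), ("A", "C"), ("B", "D")]))  # (4, 2, 2)
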
So if you read the literature, it is well known that human attention is drawn to novelty, things that are new in the environment. And if you read the sociology literature, you know that we like to share novel information. It makes us seem like we have access to inside information, and we gain in status by spreading this kind of information.

So what we did was measure the novelty of an incoming true or false tweet, compared to the corpus of what that individual had seen in the 60 days prior on Twitter. But that wasn't enough, because we thought to ourselves, "Well, maybe false news is more novel in an information-theoretic sense, but maybe people don't perceive it as more novel."

So to understand people's perceptions of false news, we looked at the information and the sentiment contained in the replies to true and false tweets. And what we found was that, across a bunch of different measures of sentiment (surprise, disgust, fear, sadness, anticipation, joy and trust), replies to false news exhibited significantly more surprise and disgust, and replies to true news exhibited significantly more anticipation, joy and trust. The surprise corroborates our novelty hypothesis: this is new and surprising, and so we're more likely to share it.

At the same time, there was congressional testimony in front of both houses of Congress in the United States, looking at the role of bots in the spread of misinformation. So we looked at this too: we used multiple sophisticated bot-detection algorithms to find the bots in our data and to pull them out. We pulled them out, we put them back in and we compared what happened to our measurements. And what we found was that, yes indeed, bots were accelerating the spread of false news online, but they were accelerating the spread of true news at approximately the same rate, which means bots are not responsible for the differential diffusion of truth and falsity online. We can't abdicate that responsibility, because we, humans, are responsible for that spread.
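
As a rough illustration of that bot-removal check, the sketch below recomputes a simple diffusion statistic with and without accounts flagged as bots; the data layout, the bot flag and the statistic itself are simplified assumptions for illustration, not the study's actual method.

    from statistics import mean

    def mean_cascade_size(cascades, include_bots=True):
        """Average number of retweeting users per story.

        `cascades` is assumed to be a list of dicts like
        {"veracity": "false", "retweeters": [{"user": "u1", "is_bot": False}, ...]}.
        """
        sizes = []
        for cascade in cascades:
            users = [r for r in cascade["retweeters"]
                     if include_bots or not r["is_bot"]]
            sizes.append(len(users))
        return mean(sizes) if sizes else 0.0

    def false_to_true_ratio(cascades, include_bots=True):
        """How much further false stories spread than true ones, on average."""
        false_cascades = [c for c in cascades if c["veracity"] == "false"]
        true_cascades = [c for c in cascades if c["veracity"] == "true"]
        return (mean_cascade_size(false_cascades, include_bots) /
                mean_cascade_size(true_cascades, include_bots))

    # If bots accelerated true and false news at about the same rate, this ratio
    # should barely move when bot accounts are excluded:
    # false_to_true_ratio(cascades, include_bots=True)
    # false_to_true_ratio(cascades, include_bots=False)
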
Now, everything that I have told you so far, unfortunately for all of us, is the good news.

The reason is that it's about to get a whole lot worse. And two specific technologies are going to make it worse. We are going to see the rise of a tremendous wave of synthetic media: fake video and fake audio that are very convincing to the human eye. And this will be powered by two technologies.

The first of these is known as "generative adversarial networks." This is a machine-learning model with two networks: a discriminator, whose job it is to determine whether something is real or fake, and a generator, whose job it is to generate synthetic media. So the generator generates synthetic video or audio, and the discriminator tries to tell, "Is this real or is this fake?" And in fact, it is the job of the generator to maximize the likelihood that it will fool the discriminator into thinking the synthetic video and audio that it is creating is actually real. Imagine a machine in a hyperloop, trying to get better and better at fooling us.

This, combined with the second technology, which is essentially the democratization of artificial intelligence to the people, the ability for anyone, without any background in artificial intelligence or machine learning, to deploy these kinds of algorithms to generate synthetic media, makes it ultimately so much easier to create such videos.

The White House issued a false, doctored video of a journalist interacting with an intern who was trying to take his microphone. They removed frames from this video in order to make his actions seem more punchy. And when videographers and stuntmen and women were interviewed about this type of technique, they said, "Yes, we use this in the movies all the time to make our punches and kicks look more choppy and more aggressive." They then put out this video and partly used it as justification to revoke the press pass of Jim Acosta, the reporter, from the White House. And CNN had to sue to have that press pass reinstated.
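
To make the adversarial loop described a moment ago concrete, here is a minimal, illustrative generator-versus-discriminator training step written in PyTorch. The tiny architectures, placeholder dimensions and hyperparameters are assumptions for the sketch; real synthetic-media systems are far larger and operate on video and audio rather than toy vectors.

    import torch
    import torch.nn as nn

    latent_dim, data_dim = 64, 784  # placeholder sizes, e.g. a flattened 28x28 image

    # Generator: turns random noise into a synthetic sample.
    generator = nn.Sequential(
        nn.Linear(latent_dim, 256), nn.ReLU(),
        nn.Linear(256, data_dim), nn.Tanh(),
    )

    # Discriminator: scores a sample's probability of being real.
    discriminator = nn.Sequential(
        nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
        nn.Linear(256, 1), nn.Sigmoid(),
    )

    bce = nn.BCELoss()
    opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def training_step(real_batch):
        batch_size = real_batch.size(0)
        real_labels = torch.ones(batch_size, 1)
        fake_labels = torch.zeros(batch_size, 1)

        # 1) Discriminator step: learn to call real media "real"
        #    and synthetic media "fake".
        noise = torch.randn(batch_size, latent_dim)
        fake_batch = generator(noise).detach()
        d_loss = (bce(discriminator(real_batch), real_labels) +
                  bce(discriminator(fake_batch), fake_labels))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()

        # 2) Generator step: adjust the generator so its synthetic output is
        #    scored as "real", i.e. maximize the chance of fooling the discriminator.
        noise = torch.randn(batch_size, latent_dim)
        g_loss = bce(discriminator(generator(noise)), real_labels)
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()
        return d_loss.item(), g_loss.item()

Each call to training_step nudges the discriminator to separate real samples from synthetic ones and nudges the generator to erase that separation, which is exactly the escalating feedback loop described above.
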
There are about five different paths that I can think of that we can follow to try and address some of these very difficult problems today. Each one of them has promise, but each one of them has its own challenges. The first one is labeling. Think about it this way: when you go to the grocery store to buy food to consume, it's extensively labeled. You know how many calories it has, how much fat it contains. And yet when we consume information, we have no labels whatsoever. What is contained in this information? Is the source credible? Where is this information gathered from? We have none of that information when we are consuming information. That is a potential avenue, but it comes with its challenges. For instance, who gets to decide, in society, what's true and what's false? Is it the governments? Is it Facebook? Is it an independent consortium of fact-checkers? And who's checking the fact-checkers?

Another potential avenue is incentives. We know that during the US presidential election there was a wave of misinformation that came from Macedonia that didn't have any political motive but instead had an economic motive. And this economic motive existed because false news travels so much farther, faster and more deeply than the truth, and you can earn advertising dollars as you garner eyeballs and attention with this type of information. But if we can depress the spread of this information, perhaps it would reduce the economic incentive to produce it in the first place.

Third, we can think about regulation, and certainly, we should think about this option. In the United States, currently, we are exploring what might happen if Facebook and others are regulated. While we should consider things like regulating political speech, labeling the fact that it's political speech, making sure foreign actors can't fund political speech, it also has its own dangers. For instance, Malaysia just instituted a six-year prison sentence for anyone found spreading misinformation.
And in authoritarian regimes, these kinds of policies can be used to suppress minority opinions and to continue to extend repression.

The fourth possible option is transparency. We want to know how Facebook's algorithms work. How does the data combine with the algorithms to produce the outcomes that we see? We want them to open the kimono and show us exactly the inner workings of how Facebook operates. And if we want to know social media's effect on society, we need scientists, researchers and others to have access to this kind of information. But at the same time, we are asking Facebook to lock everything down, to keep all of the data secure.

So, Facebook and the other social media platforms are facing what I call a transparency paradox. We are asking them to be open and transparent and, simultaneously, secure. This is a very difficult needle to thread, but they will need to thread this needle if we are to achieve the promise of social technologies while avoiding their peril.

The final thing that we could think about is algorithms and machine learning: technology devised to root out and understand fake news, how it spreads, and to try and dampen its flow. Humans have to be in the loop of this technology, because we can never escape the fact that underlying any technological solution or approach is a fundamental ethical and philosophical question about how we define truth and falsity, to whom we give the power to define truth and falsity, which opinions are legitimate, which types of speech should be allowed and so on. Technology is not a solution for that. Ethics and philosophy are a solution for that.

Nearly every theory of human decision making, human cooperation and human coordination has some sense of the truth at its core. But with the rise of fake news, the rise of fake video, the rise of fake audio, we are teetering on the brink of the end of reality, where we cannot tell what is real from what is fake. And that's potentially incredibly dangerous.

We have to be vigilant in defending the truth against misinformation: with our technologies, with our policies and, perhaps most importantly, with our own individual responsibilities, decisions, behaviors and actions.

Thank you very much.

(Applause)