35C3 - Explaining Online US Political Advertising

0:00 - 0:19

35c3 preroll music
0:19 - 0:24

Herald: This talk will be held by Damon
McCoy. He will be explaining online U.S.
0:24 - 0:28

political advertising, and he has been
researching how
0:28 - 0:34

different online communities basically
behave around many different topics. But
0:34 - 0:37

this is what he's going to talk about
today, so please give him a great round of
0:37 - 0:47

applause.
Applause
0:47 - 0:51

Damon McCoy: Thank you, everyone, for coming. I'm
up here speaking, and I'm the only one who
0:51 - 0:55

wanted to fly to Germany over Christmas
and New Year's. However, there were three
0:55 - 1:00

real people who were really key in
helping out with this research, and before
1:00 - 1:05

we get started, I just want to credit
them. One is my grad student Laura
1:05 - 1:08

Edelson. She did a lot of the analysis
that you're going to see and generated
1:08 - 1:15

all the graphs. One of our undergraduate
students from our NYU Shanghai
1:15 - 1:20

campus, Shikhar, did a lot of the work to
collect all the data that you're going to
1:20 - 1:25

see here. And then Ratan, who is a
professor at NYU and at the Shanghai campus,
1:25 - 1:31

also helped out with our initial
efforts to collect some of this data.
1:31 - 1:35

And so before we get started, I guess I'll
give a little bit of an introduction about
1:35 - 1:40

myself. I'm a professor at the NYU Tandon
School of Engineering. As was mentioned
1:40 - 1:45

before, I do a lot of work looking
at how technology impacts the
1:45 - 1:51

security and privacy of society,
groups of people and things like that. So
1:51 - 1:57

this was really an opportunistic
project that captured the impact
1:57 - 2:07

of online advertising in the political
sphere of U.S. campaigns. And also, a quick
2:07 - 2:13

plug: most of what I'm going to be
showing you, the data, scripts and
2:13 - 2:17

things like that, we've put on GitHub,
where it's accessible to anyone who wants to
2:17 - 2:23

analyze the data or look at our scripts
and improve them or things like that.
2:23 - 2:31

Applause
Thank you. This is the first
time that I've given this talk outside of
2:31 - 2:36

the U.S., so let me start with a
quick explanation of how U.S. elections
2:36 - 2:41

work, for those of you who might not know
about this. So every two years in the U.S.
2:41 - 2:46

we hold federal elections. These are
elections that impact all of the
2:46 - 2:53

states in the U.S. And every four
years we have an election for president.
2:53 - 2:59

2018 was our last election. It was not
a presidential election year, so the
2:59 - 3:05

federal elections were for the Senate and
House seats. And then
3:05 - 3:10

we also had elections for state and local
positions as well. Some
3:10 - 3:15

of those are captured in our data,
especially our Facebook data, but not so much
3:15 - 3:21

in our Twitter and Google ad transparency
data. So this
3:21 - 3:27

talk will focus on the 2018 elections
that happened on November 6th. Election
3:27 - 3:35

day is always the Tuesday after the first Monday in
November, every two years. So
3:35 - 3:38

to begin with the background: probably
some of you know about this and some
3:38 - 3:42

of you don't, but in
the 2016 elections, which were a
3:42 - 3:47

presidential election year, there was
election interference that happened
3:47 - 3:53

here. Facebook has released these
ads. These ads were paid for by a Russian
3:53 - 4:01

company, the Internet Research Agency, which
ran these ads. Facebook released these
4:01 - 4:05

to the Senate, and then the
Senate re-released them
4:05 - 4:10

publicly. This is an ad basically
trying to disenfranchise people from
4:10 - 4:15

voting in the elections. And you can see
that it's targeted at people in the U.S.
4:15 - 4:20

of a certain age range with interests like
Martin Luther King, African-American
4:20 - 4:24

culture, African-American civil rights.
Facebook doesn't actually allow you to
4:24 - 4:30

target people directly based on
their ethnicity, but this is a pretty good
4:30 - 4:34

proxy. If you target these kinds of
interests, this would probably
4:34 - 4:38

be fairly effective at targeting African-
American people in the U.S. with this
4:38 - 4:44

ad to try and disenfranchise them from
voting in the elections. There were other
4:44 - 4:50

ads that tried to run
misinformation and disinformation kinds
4:50 - 4:55

of campaigns. So this was an ad that was
again paid for by the Russian agency that
4:55 - 5:01

was trying to spread this
unsubstantiated rumor that Bill
5:01 - 5:05

Clinton has an illegitimate child.
And again, the
5:05 - 5:09

targeting is aimed at
African-Americans in the U.S., and
5:09 - 5:15

African-Americans are a key voting
bloc for the more liberal,
5:15 - 5:20

Democratic side in the U.S.
That's probably the other
5:20 - 5:24

thing I should have explained about the
U.S. election system, especially at the
5:24 - 5:29

federal level: we effectively have
two parties that win any
5:29 - 5:34

meaningful number of elections in the
U.S. One of them is the Democratic
5:34 - 5:39

party, which tends to skew more liberal, so
they're more in favor of bigger
5:39 - 5:43

government and more social services, and then
we have the Republicans, who skew
5:43 - 5:48

more conservative, wanting a
smaller government providing fewer services
5:48 - 5:55

and less regulation
as well. These are two
5:55 - 6:02

examples, but there were a whole bunch of
these ads that were shown on Facebook.
6:02 - 6:05

Pretty much all
of them tried to
6:05 - 6:11

disenfranchise people or to
create chaos and polarize people
6:11 - 6:17

around the election, often with
disinformation and that sort of thing.
6:17 - 6:25

In 2017, our Office of the Director
of National Intelligence put out a report
6:25 - 6:29

(sorry for the big block of text; this
will be the only big block of text in here).
6:29 - 6:34

But I thought it was important to
show this because they pretty much
6:34 - 6:41

unequivocally state that Russia tried to
interfere in the U.S. elections and that
6:41 - 6:46

Vladimir Putin was somehow involved in
this interference. So this is,
6:46 - 6:53

as far as the U.S. intelligence
agencies, the CIA, the NSA, go, pretty much
6:53 - 6:58

solid evidence that this
occurred. And so the other
6:58 - 7:03

thing that broke was the Cambridge
Analytica scandal on
7:03 - 7:08

Facebook, where this
third-party advertising agency
7:08 - 7:13

collected a whole bunch of data
on 80 million Facebook profiles and
7:13 - 7:17

then tried to create psychological
profiles for targeting and messaging and
7:17 - 7:22

things like that. These
two particular scandals broke,
7:22 - 7:29

and the first result was that we had
Mark Zuckerberg in a suit, a real suit, not
7:29 - 7:37

a hoodie suit, testifying in front
of our Senate. He
7:37 - 7:42

testified before House and
Senate committees about the abuses
7:42 - 7:50

occurring on Facebook. He did this
on April 10th and 11th, 2018,
7:50 - 7:56

and in here, he
did admit that Facebook had made mistakes
7:56 - 8:04

and that they need to improve things
moving forward on their platform. The
8:04 - 8:11

most tangible outcome of these
testimonies was the transparency
8:11 - 8:20

archives that began to appear. Here is
a view of what Facebook's ad transparency
8:20 - 8:25

archive looks like. When it was originally
deployed, you needed a Facebook account to
8:25 - 8:30

interact with it. Now Facebook has dropped
that requirement, so anyone with Internet
8:30 - 8:37

access, unless you're censored somehow,
can go to this archive and access these
8:37 - 8:42

ads. In the user-facing portal for these
ad transparency archives, you type in
8:42 - 8:48

keywords, and then what it basically does is
pattern matching on the ad text
8:48 - 8:54

and other parts of the ad, and then it returns
the ads that match. And
8:54 - 8:59

so then you can see all the political ads
that match that particular term.
8:59 - 9:07

Facebook began archiving these ads
at a large scale starting on May
9:07 - 9:16

7th, 2018. By Election Day, November 6th,
2018, there were 1.6 million ads paid for
9:16 - 9:22

by over 85,000 advertisers on
Facebook's platform. Facebook is actually
9:22 - 9:27

fairly broad in what they included
in their political archive. They
9:27 - 9:33

included any ads related to U.S. elections,
whether federal, state or local.
9:33 - 9:37

They also included the very important
issue ads. As we saw when we looked
9:37 - 9:41

at the Russian interference, a lot of times
the ads didn't mention actual political
9:41 - 9:46

candidates; they mentioned
polarizing issues in the U.S. So
9:46 - 9:51

Facebook also included these ads on issues of
national political importance. They had a
9:51 - 9:56

list of, I think, about 13 different
criteria, the last of them being values.
9:56 - 10:00

So it was a fairly encompassing set of
ads they tried to include in their
10:00 - 10:08

archive. Along with the text and images
or videos of the ad, they also included
10:08 - 10:15

ranges of geographic impressions
and demographic impressions: state-level
10:15 - 10:22

impression information in
ranges, and demographics by gender
10:22 - 10:29

and by age, bucketed.
They did this for impressions,
10:29 - 10:34

and they also included some spend
information, again in ranges, so
10:34 - 10:41

they gave ranges of $0 to $99,
$100 to, say, $500,
10:41 - 10:48

$501 to $2,000 and so on.
One of the
10:48 - 10:52

key pieces of information that they did
not release was the targeting information.
10:52 - 10:55

Like I showed you before with those ads,
they have that targeting
10:55 - 11:02

information, but Facebook does not release
that in their transparency archive.
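
Because Facebook's archive gives spend only as ranges, any analysis has to work with bucket bounds rather than exact numbers. The Python sketch below is a minimal illustration under stated assumptions, not the researchers' code: the label format ("$0-$99", "$100 - $499", "$1,000+") and the function names are hypothetical, not Facebook's documented schema. It parses a label into (lower, upper) dollar bounds and computes the share of ads whose whole bucket lies below $100.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical label format for spend buckets, e.g. "$0-$99", "$100 - $499",
# or an open-ended top bucket such as "$1,000+". Not Facebook's documented schema.
_SPEND_RANGE = re.compile(r"^\$([\d,]+)\s*(?:-\s*\$([\d,]+)|\+)$")


def parse_spend_range(label: str) -> Tuple[int, Optional[int]]:
    """Return (lower, upper) dollar bounds; upper is None for an open-ended bucket."""
    match = _SPEND_RANGE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised spend range: {label!r}")
    lower = int(match.group(1).replace(",", ""))
    upper = int(match.group(2).replace(",", "")) if match.group(2) else None
    return lower, upper


def share_below(labels: Iterable[str], threshold: int = 100) -> float:
    """Fraction of ads whose entire spend bucket lies below `threshold` dollars."""
    bounds = [parse_spend_range(label) for label in labels]
    if not bounds:
        return 0.0
    below = sum(1 for _, upper in bounds if upper is not None and upper < threshold)
    return below / len(bounds)


if __name__ == "__main__":
    sample = ["$0-$99", "$0-$99", "$100-$499", "$1,000+"]
    print(parse_spend_range("$100 - $499"))  # (100, 499)
    print(share_below(sample))               # 0.5
```

Treating an open-ended top bucket as having no upper bound keeps it out of the "below threshold" count, which is the conservative choice when only ranges are known.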
11:02 - 11:04

So they had
that user portal where you could
11:02 - 11:04

do the keyword search.
However, I like to do large-
11:04 - 11:10

scale data analysis, and so I wanted to
try and collect all of the ads
11:10 - 11:16

in this web portal. Initially
all they had was this keyword search
11:16 - 11:21

portal, so what we did is
we compiled a large list of what
11:21 - 11:25

we thought were reasonable keywords:
names of prominent politicians, names of
11:25 - 11:31

states, issues. So we tried
to compile this long list of keyword
11:31 - 11:36

searches, and we began scraping the
portal, and I'll tell you the
11:36 - 11:41

story of how our scraping efforts went.
Now they also offer an API,
11:41 - 11:48

but it's still keyword-based, and
it's restricted by an NDA, so I'll
11:48 - 11:52

flesh out the story of how this went. So
at the beginning they released the
11:52 - 11:58

user archive towards the end of May, and
I played with it and
11:58 - 12:03

realized that this didn't lend itself well
to large-scale analysis of these
12:09 - 12:16

ads. So I went to my students
Shikhar and Laura, and Shikhar worked kind
12:16 - 12:22

of furiously night and day, and within
three days he had a workable scraper that
12:22 - 12:26

could put in our keywords, and then
we were able to scrape all the results
12:26 - 12:33

for our keywords. We
ran the scraper for about 2 months and
12:33 - 12:38

then we released a report, just a
very general statistical report, and we
12:38 - 12:45

released the data in our GitHub archive at
that point. About 2 weeks later,
12:45 - 12:54

Facebook began deploying anti-scraping measures,
and this
12:54 - 13:00

hampered our efforts to scrape Facebook's
archive. Now, I don't want
13:00 - 13:04

to attribute any malice. I don't believe
that Facebook was targeting only our
13:04 - 13:08

scraping efforts; they were targeting
everyone's scraping efforts. As to whether
13:08 - 13:13

it's wrong or right
to block people from collecting data on a
13:13 - 13:17

transparency archive, I might
quibble with them on that and say they
13:17 - 13:22

might want to provide better access to the
data in their transparency archive.
13:22 - 13:27

But this was the choice that Facebook made
to clamp down on the scraping.
13:27 - 13:32

So we tried to fight with
them a little bit, in a kind of cat-and-
13:32 - 13:36

mouse game: we make some changes
to our scraper to avoid their anti-
13:36 - 13:42

scraping, they do some things on their end
to block our scraper and probably other
13:42 - 13:48

people's scrapers that are doing similar
things to us as well. And so
13:48 - 13:57

this persisted for probably about 2 weeks,
and then Facebook basically deployed their
13:57 - 14:02

API. However, as they said,
their API is very limited and still in
14:02 - 14:07

beta at this point. These were part of
the terms and conditions. One of
14:07 - 14:13

the ones that made me the most
uneasy is that it limited access only to
14:13 - 14:18

U.S. people, so we could essentially only
work closely with U.S. people,
14:18 - 14:23

and it
limited the types of people that we could
14:23 - 14:28

work with. And so
unfortunately this ruled us out
14:28 - 14:32

from working closely with journalists from
really good news organizations
14:32 - 14:37

like the Guardian that just
happened to have the misfortune of being
14:37 - 14:46

located somewhere outside of the U.S.
Maybe the good fortune, yes.
14:46 - 14:51

And then the list of restrictions
continues. They also placed a data
14:51 - 14:57

retention limit on it, so we could only retain
the data for one year.
14:57 - 15:02

Facebook's data retention on
their archive is 7 years, but
15:02 - 15:08

they're placing a 1-year data retention limit on
the data that we collect under their NDA.
15:08 - 15:14

I'd like to say that
I got this NDA and I lit it on fire, I
15:14 - 15:23

tore it up and we continued to scrape the
archive. No, unfortunately
15:23 - 15:30

it was a hard call to make, but you
know, there are basically two students, and we
15:30 - 15:34

had to make a call whether we
wanted the data to analyze or whether we
15:34 - 15:38

wanted to spend all of our time
fighting with Facebook's anti-
15:38 - 15:46

scraping efforts. So in the end
I did in fact agree to their NDA.
15:46 - 15:51

The initial data we scraped, we
release; we're still scraping a small amount
15:51 - 15:57

of data that we release as well.
But unfortunately, at this point any
15:57 - 16:01

of the data that we collected under the NDA
we cannot release. If anyone
16:01 - 16:07

wants to fight with Facebook and
resurrect the crawler, I would
16:07 - 16:12

be more than happy for that to happen.
Unfortunately, given our
16:12 - 16:17

engineering constraints, it just simply
wasn't feasible for us to do that.
16:17 - 16:24

The story is a little bit
different with Google. As for Google's archive,
16:24 - 16:34

they began archiving ads on May 31, 2018.
By Election Day they had 45,000 ads from
16:34 - 16:39

600 advertisers. Their criteria for
including advertising were much more
16:39 - 16:44

narrow than Facebook's, so they only
released ads related to U.S. federal
16:44 - 16:50

candidates and federal officeholders.
So it is a much more limited
16:50 - 16:55

set of data that Google released,
with none of the issue ads that Facebook
16:55 - 17:03

released. They didn't release any of the
geographic or demographic data by
17:03 - 17:09

impression; they did release ranges of
impressions and ranges of spend data, and
17:09 - 17:15

they did release some limited targeting
data: they released geographic
17:15 - 17:20

and demographic targeting information,
which Facebook hadn't released for their
17:20 - 17:25

ads. Their data is available through a
similar keyword-based portal, but they
17:25 - 17:31

also make it available as a
database, if you want it. So
17:31 - 17:35

this is what their portal looks like,
and this is their
17:35 - 17:41

BigQuery database
that they released. They
17:41 - 17:46

update it every week, and you can
download it and analyze the data
17:46 - 17:53

relatively easily. The last
one to implement their archive was
17:53 - 18:01

Twitter. Twitter began archiving ads on
June 27, 2018. The scale of ads on
18:01 - 18:07

Twitter is very small compared to the rest
of them. The scale of their ad network in
18:07 - 18:13

general is much smaller than Google's and
Facebook's. What they included was
18:13 - 18:18

similar to what Google included, in terms
of only federal candidates.
18:18 - 18:22

Closer to the
election, they also said that they were
18:22 - 18:28

going to release political issue ads.
However, a mechanism of enforcement
18:28 - 18:33

doesn't appear to exist within Twitter's
system. It doesn't appear to be anyone's
18:33 - 18:39

job to actually enforce transparency
of ads. So we've been
18:39 - 18:43

manually finding accounts and reporting
them to Twitter, and when
18:43 - 18:47

we manually report them to Twitter,
Twitter then includes them in future
18:47 - 18:52

transparency efforts.
But it appears we're basically the
18:52 - 18:57

ones (short laughter). It's
become our job to monitor the Twitter
18:57 - 19:01

accounts and notify Twitter, and then
they'll manually deal with it.
19:01 - 19:04

Unfortunately, they still don't appear to
have a person who actually manages this
19:04 - 19:10

process inside Twitter at this point.
Twitter does, however, release the most
19:10 - 19:16

information. They release exact data,
not ranges, on impressions and
19:16 - 19:21

spend, also by geography and
demographics, and they also include all of
19:21 - 19:28

the targeting information, as far as we can
tell. Their data is available
19:28 - 19:32

without an account, through
their portal, and we've been scraping it
19:32 - 19:35

with no problems; they haven't
blocked us so far. So we just
19:35 - 19:40

scrape their data and
republish it to GitHub. And
19:40 - 19:46

we've had no problems with Twitter in this
way; in terms of scale, their data is so small
19:46 - 19:51

that it's been relatively easy to keep
pace with it. And here's
19:51 - 19:57

just a picture of the Twitter transparency
archive. Again, they have a list of all
19:57 - 20:00

the Twitter accounts that they include in
their transparency archive. So we can
20:00 - 20:04

monitor this, and we can monitor other
people that we know are politically
20:04 - 20:08

active. When we see them doing paid
advertising, we can notify Twitter, and
20:08 - 20:13

then Twitter will include them in their
transparency archive, normally within
20:13 - 20:19

a week or so. So this is
the background that you
20:19 - 20:25

need to understand the transparency
archives. Now we have a data set that
20:25 - 20:32

we can begin to analyze. For
Facebook, since it was keyword-driven
20:32 - 20:39

at the beginning and it still is, we
were able to collect about 80% of the ads
20:39 - 20:43

in Facebook's database. The
other problem with the API is that it is
20:43 - 20:51

severely rate-limited at this point. I'm
talking about 3 to 4 queries per minute
20:51 - 20:58

that we can get through Facebook's API at
this point. So we did our best
20:58 - 21:03

effort to collect as much data as we could
from Facebook. About two weeks before the
21:03 - 21:08

election, Facebook began releasing a
transparency report that included
21:08 - 21:12

an aggregated list of all the
advertisers, how many ads they have and
21:12 - 21:17

how much they spent, and this is how we can tell
that we got about 80% of the ads from
21:17 - 21:22

Facebook's archive.
The nice thing about the
21:22 - 21:26

transparency report is that we could go
back, and now that we know what we're missing, we
21:26 - 21:33

could readjust our usage of the API, so
now we have virtually 100% coverage of
21:33 - 21:40

Facebook going forward.
For Twitter, we could collect 100% of
21:40 - 21:46

their data. And again, we've republished
the SOL (?) in an easier-to-process
21:46 - 21:50

form. For Google, again, we have
100% of their ads because they're all in
21:50 - 21:55

the BigQuery database. However, when we
started analyzing the data, we noticed that
21:55 - 22:00

for a lot of the ads we were missing the
actual content, the images and text of the
22:00 - 22:07

ad. It turns out that for Google's ad
network, if the ad was originally purchased
22:07 - 22:11

through a third-party advertiser and then
run on one of Google's properties, the
22:11 - 22:16

content of the ad won't be archived within
their system. This is unfortunately a big
22:16 - 22:21

loophole. If you're
running some kind of malicious misinformation
22:21 - 22:26

campaign, you can unfortunately easily
circumvent Google's archive, at least
22:26 - 22:31

its archiving of your content, by simply
paying for it through a third party.
22:31 - 22:35

It's unclear whether this is a
policy limitation or a
22:35 - 22:40

technical limitation on Google's part, but
the outcome is that we only have the
22:40 - 22:46

content for about 70% of Google's ads that
were paid for directly on Google's
22:46 - 22:52

platform. One of the
first things that we wanted to do was
22:52 - 22:57

add some semantic meaning to these ads at
large scale. So we played
22:57 - 23:02

around with a few techniques, some fancy
kinds of natural language processing and
23:02 - 23:06

things like that. But we found that
there's actually a fairly simple
23:06 - 23:12

and effective way of categorizing
the intent of the ad, and that's that most
23:12 - 23:17

of these ads have a URL of some kind, and
a lot of these URLs just point back to
23:17 - 23:20

third-party services: if you're
holding some kind of event, you're going to
23:20 - 23:24

coordinate it with something like Eventbrite;
if you're seeking
23:24 - 23:28

donations and you're a Democrat, you're
going to use this third-party payment
23:28 - 23:31

processor called ActBlue; if
you're a Republican, there are two or
23:31 - 23:36

three payment processors that you're going
to use. So we could simply
23:36 - 23:40

look at these really prominent URLs that
occur a lot and
23:40 - 23:46

manually tag what their purpose is.
By doing this, we can tag ads as
23:46 - 23:51

inform ads, purely informational, where they wanted
to get some kind of message
23:51 - 23:56

about the candidate, either positive or
negative, out there; connect ads that are
23:56 - 24:01

seeking contact information like people's
e-mail addresses, phone numbers, names and
24:01 - 24:05

things like that, presumably so they can
either get them to volunteer or
24:05 - 24:10

donate money to the campaign in the
future; move ads, where
24:10 - 24:15

they're trying to get people to vote or to
attend some kind of rally or to volunteer
24:15 - 24:19

or something like that; and then
there are donation ads. And finally,
24:19 - 24:24

there are commercial ads. These are
ads that are either selling products
24:24 - 24:33

that are directly political in nature,
like a bobblehead of some candidate, or
24:33 - 24:39

things like solar panels, which have
tax credits in the U.S., and things like
24:39 - 24:43

that. So there's some commercial
good that's linked somehow to some
24:43 - 24:50

political messaging. We used
this method, and we were able to categorize
24:50 - 24:56

about 70% of the ads. We took a random
sample of them, manually checked what
24:56 - 25:01

we were doing, and found it was pretty
accurate: we got about 96% accuracy using
25:01 - 25:08

this method. The other thing that we did,
for the top advertisers, so for
25:08 - 25:14

Facebook the top 75% of the advertisers
and for Google the top 80%,
25:14 - 25:18

in terms of money spent by the
advertiser: we went in and manually
25:18 - 25:22

categorized what type of
organization it was. Was it a political
25:22 - 25:27

candidate, was it what's called a
political action committee (these are
25:27 - 25:33

the PACs in the U.S.), was it a
union, was it a for-profit operation,
25:33 - 25:37

was it a non-profit operation, and so on
and so forth. We wrote some regular
25:37 - 25:42

expressions that got us most of the way
there, since most of them have fairly uniform
25:42 - 25:47

naming conventions, and for the ones that
we couldn't automatically classify,
25:47 - 25:51

we just did it manually.
And since Twitter had so few
25:51 - 25:58

advertisers, we just did them all
manually. Now we can
25:58 - 26:01

start to do some analysis. So the first
analysis that we did, the easiest
26:01 - 26:08

analysis, was looking at the size of the
ads. And the thing that pops out is that
26:08 - 26:13

the majority of ads on all the platforms
are between $0 and $100. These
26:13 - 26:18

are what are normally called micro-targeted
ads, which are typically seen by
26:18 - 26:23

fewer than 1,000 people. So
these are very short-lived, narrowly
26:23 - 26:28

targeted ads honing in on
a specific demographic.
26:28 - 26:32

These are the micro-targeted ads.
And it appears that the
26:32 - 26:37

majority of ads, especially on Facebook's
platform (82% of them), are of this
26:37 - 26:42

micro-targeted ilk. So it
kind of confirms the reporting that people
26:42 - 26:47

had done on this trend of
microtargeting in political
26:47 - 26:52

advertising. The other thing: based on our
categorization, we can look at how the
26:52 - 26:58

different platforms were used.
The problem with these numbers is
26:58 - 27:04

that there were different inclusion
criteria for each of these databases.
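
The categorization these comparisons rest on, tagging an ad's intent from its landing-page URL and an advertiser's type from naming conventions, can be sketched in Python. This is a minimal illustration, not the authors' scripts: the domain table, the regular expressions and the function names are hypothetical stand-ins for the hand-built lists described in the talk, and anything that doesn't match returns None so it can be reviewed manually.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical hand-tagged table of prominent landing-page domains -> ad intent.
# Labels follow the talk's categories (inform, connect, move, donate, commercial);
# the entries themselves are illustrative assumptions.
DOMAIN_INTENT = {
    "actblue.com": "donate",
    "eventbrite.com": "move",
}

# Hypothetical naming-convention patterns for advertiser types, checked in order.
ADVERTISER_PATTERNS = [
    ("pac", re.compile(r"\bPAC\b|political action committee", re.IGNORECASE)),
    ("candidate", re.compile(r"\bfor (congress|senate|governor)\b", re.IGNORECASE)),
    ("union", re.compile(r"\bunion\b|\blocal \d+\b", re.IGNORECASE)),
]


def ad_intent(url: str) -> Optional[str]:
    """Tag an ad by its landing-page domain, or return None for manual review."""
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then drop leading labels, so that
    # "secure.actblue.com" falls back to the "actblue.com" entry.
    for i in range(len(labels) - 1):
        intent = DOMAIN_INTENT.get(".".join(labels[i:]))
        if intent is not None:
            return intent
    return None


def advertiser_type(name: str) -> Optional[str]:
    """Classify an advertiser by name, or return None for manual review."""
    for label, pattern in ADVERTISER_PATTERNS:
        if pattern.search(name):
            return label
    return None


if __name__ == "__main__":
    print(ad_intent("https://secure.actblue.com/donate/example"))  # donate
    print(ad_intent("https://www.eventbrite.com/e/12345"))         # move
    print(advertiser_type("Jane Doe for Congress"))                # candidate
    print(advertiser_type("Acme Solar Inc"))                       # None
```

Because the patterns are checked in order, a name matching several of them takes the first label; a hand-checked random sample, like the one described in the talk, is the kind of step that would surface such cases.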
27:04 - 27:12

Finally, we can
look at the different types of advertisers
27:12 - 27:15

on these platforms. It's
hard to read too much into these numbers
27:15 - 27:19

because, again, Facebook included
much more of the commercial stuff. So
27:19 - 27:24

we're going to see a lot more of the
commercial stuff there. The
27:24 - 27:31

final analysis of the entire data set that
we did was looking at
27:31 - 27:35

the ramp-up to the election. We
cut this off in late October. This
27:35 - 27:41

analysis was done for a paper, and the due
date of the paper was ironically November
27:41 - 27:47

6. So we cut it off a few
weeks earlier, and we haven't regenerated the
27:47 - 27:50

contents since then. The one thing that
you can see at the top there is that
27:50 - 27:58

green spike. That's the move ads.
So closer to the election, the
27:58 - 28:04

campaigns were doing sophisticated
get-out-the-vote ads.
28:04 - 28:07

There were really sophisticated
microtargeted get-out-the-
28:07 - 28:12

vote ads where it was almost
spooky: they knew where the
28:12 - 28:16

person they were targeting lived, and
so they gave them directions on how
28:16 - 28:20

to get from where they live to their
nearest polling place. So
28:20 - 28:24

there were these really sophisticated
get-out-the-vote efforts that were
28:24 - 28:29

being run online towards the
end of the campaign. To give you
28:29 - 28:34

more of an apples-to-apples
comparison of these different ad
28:34 - 28:40

platforms, we also did some analysis
narrowing the different
28:40 - 28:45

advertiser types to the ones that were
made transparent by all three platforms,
28:45 - 28:50

which was federal candidates only.
This can give you some idea of
28:50 - 28:56

the scale of these things. We can see
that even when we narrow it, we can still
28:56 - 29:01

see that Facebook has a lot more
advertisers and a lot more ads compared to
29:01 - 29:06

Google. However, the spending numbers are
comparable. For Facebook,
29:06 - 29:11

impressions and spend are ranges; that's
all that Facebook releases. For Google, the
29:11 - 29:16

impression data is ranges, but we can
get exact spend data, because Google
29:16 - 29:22

releases a weekly report of
exact spend numbers, aggregated by the
29:22 - 29:27

different advertisers. So we
can use that to get an exact number for
29:27 - 29:31

the spend. And again, Twitter's
numbers are much smaller in terms of
29:31 - 29:37

everything, within here. And we redid some
of our analysis to just see whether our
29:37 - 29:42

effects were simply a distortion based on
what was included in the archives. So
29:42 - 29:48

right we redid our ad size analysis and
even when we limit it to federal
29:48 - 29:53

candidates we can see this still holds,
that a lot of the ads on Facebook are so
29:53 - 29:57

these micro targeted ads. And they are
still micro targeted ads on the other
29:57 - 30:02

platforms, as well, within here. And right
this microtargeting of course varies
30:02 - 30:10

depending on the advertiser. So you take
someone like President Trump and he does a
30:10 - 30:16

lot of microtargeting. So almost all of
his ads probably about 90%, 95% of his ads
30:16 - 30:22

are micro targeted, within here. You look
at other candidates and they do much less
30:22 - 30:26

microtargeting, within here. So this is
definitely different strategies are used
30:26 - 30:30

by different advertisers, within here. But
when we look at it in aggregate, it still
30:30 - 30:38

appears that microtargeting is a very
popular strategy across advertisers. We
30:38 - 30:44

can also, right, look at some of the spend
type by ad type and this kind of shows you
30:44 - 30:50

a little bit how the different platforms
are used, within here. So Facebook's
30:50 - 30:54

platform looks like it's a little bit more
kind of informational, but it's still used a
30:54 - 30:59

lot for donations, whereas Google's
platform is used a lot more for donations
30:59 - 31:04

and a lot less for kind of informational
ads and to connect within here. It's
31:04 - 31:08

really kind of hard to read anything into
Twitter's data because it's such a small
31:08 - 31:12

set of data. But from the data that we do
have it looks like there's a lot more kind
31:12 - 31:19

of collection of e-mails and things like
that, within here. The other analysis that
31:19 - 31:25

we did on the federal candidate ads was to
look at, that for Facebook in particular
31:25 - 31:30

right, we have the geographic impression
data from here. So we can effectively look
31:30 - 31:37

at how many states were targeted by each
ad from a Facebook advertiser. And the
31:37 - 31:40

interesting thing here is that, right,
there was no presidential election. So
31:40 - 31:44

basically all these campaigns were
operating in one state. So their
31:44 - 31:50

constituents for all these elections were
essentially in one state, within here. And
31:50 - 31:54

so if you look at the inform ads, right,
most of those were shown in a very small number of
31:54 - 31:59

states. So the inform ads are mostly being
shown to the constituents that are
31:59 - 32:03

actually voting for that candidate.
However, if we look at that bottom line,
32:03 - 32:07

the kind of gold line, those are the
donation ads. And we can see that they
32:07 - 32:12

were fundraising in many more states
outside of their constituency, within
32:12 - 32:17

here. So FiveThirtyEight did an
interesting analysis of one particular
32:17 - 32:22

candidate, Beto O'Rourke. He was a
candidate for Senate in Texas, Texas is a
32:22 - 32:28

very conservative state in the U.S., and
he did surprisingly well, within here. And
32:28 - 32:32

he kind of embraced online advertising and
online donation seeking, which were kind of
32:32 - 32:38

cornerstones of his election, within here.
And so FiveThirtyEight did an analysis of
32:38 - 32:42

his donation records in the U.S., at the
federal level. All donations to candidates
32:42 - 32:46

have to be reported to the Federal
Election Commission. So this is all in a
32:46 - 32:49

database for the Federal Election
Commission, and the FiveThirtyEight people did
32:49 - 32:54

this analysis. And they kind of confirmed what
we saw on the donation ads, that he was
32:54 - 33:01

getting about 52% of his donations from
Texas and 48% from other states, primarily
33:01 - 33:08

kind of from coastal states that tended to
lean more liberal, like New York,
33:08 - 33:13

California, Washington and places like
that. That's where he was seeking donations.
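The states-per-ad analysis and the in-state versus out-of-state split described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the team's actual code: it assumes each ad's geographic impression data arrives as a mapping from state to fraction of impressions, and that the same out-of-state helper can be pointed at FEC donation totals grouped by state. The `Ad` record, its field names and the toy ads are hypothetical; only the 52/48 Texas split comes from the talk.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Ad:
    """Hypothetical archive record; field names are illustrative only."""
    ad_id: str
    ad_type: str                    # e.g. "inform" or "donate"
    region_share: dict[str, float]  # state -> fraction of the ad's impressions


def states_reached(ad: Ad, min_share: float = 0.0) -> int:
    """Number of states that received more than `min_share` of the impressions."""
    return sum(1 for share in ad.region_share.values() if share > min_share)


def mean_states_by_type(ads: list[Ad]) -> dict[str, float]:
    """Average number of states reached, grouped by ad type."""
    groups: dict[str, list[int]] = defaultdict(list)
    for ad in ads:
        groups[ad.ad_type].append(states_reached(ad))
    return {ad_type: mean(counts) for ad_type, counts in groups.items()}


def out_of_state_fraction(by_state: dict[str, float], home_state: str) -> float:
    """Share of a per-state total (impressions or donated dollars) outside home_state."""
    total = sum(by_state.values())
    if total == 0:
        return 0.0
    return (total - by_state.get(home_state, 0.0)) / total


if __name__ == "__main__":
    # Toy ads for a hypothetical Texas candidate.
    ads = [
        Ad("a1", "inform", {"TX": 1.0}),
        Ad("a2", "inform", {"TX": 0.9, "OK": 0.1}),
        Ad("a3", "donate", {"TX": 0.5, "NY": 0.2, "CA": 0.2, "WA": 0.1}),
    ]
    print(mean_states_by_type(ads))  # {'inform': 1.5, 'donate': 4}
    # "other" lumps together every non-Texas state from the 52/48 split.
    print(out_of_state_fraction({"TX": 52, "other": 48}, "TX"))  # 0.48
```

The `min_share` threshold is a design choice: raising it ignores states that only received a sliver of impressions, which may matter if the archive reports small non-zero shares for many states.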
33:13 - 33:19

So this appears to be a very effective way
of getting small dollar donations kind of
33:19 - 33:25

throughout the U.S. within here, through
this online advertising. The last thing that
33:25 - 33:30

I'm going to talk about is the ad
targeting. Facebook didn't directly
33:30 - 33:36

release the ad targeting. However, we were
lucky enough that ProPublica made a
33:36 - 33:41

browser plugin, that people can install in
their browser, and that browser plugin
33:41 - 33:47

would identify what it thought were
political ads, based on a machine learning
33:47 - 33:54

algorithm. And for the political ads it
would upload these to their server along
33:54 - 33:59

with the targeting information. So, for
those of you with a Facebook account, if
33:59 - 34:02

you're seeing ads you can actually click
on that ad kind of in the upper corner of
34:02 - 34:09

the ad and you can see why is this ad
targeting me, within here. And Facebook
34:09 - 34:15

will tell you a little bit, but not all, of why
you were targeted for this particular ad.
34:15 - 34:18

They will essentially show you the two
broadest categories of why you were
34:18 - 34:24

targeted for this particular ad, through
this feature they've added to their
34:24 - 34:27

platform. And this is actually
kind of interesting, this is something
34:27 - 34:33

that if you're a user of Facebook, I
highly recommend that you do. Because I
34:33 - 34:39

started doing it, and it was kind of eye-opening
as to the level of targeting that
34:39 - 34:43

was being done in terms of advertising.
That's kind of one thing, that we've
34:43 - 34:48

definitely learned from this is that when
you're seeing an ad, oftentimes there's a
34:48 - 34:53

very specific reason as to why you're
seeing that particular ad, within here.
34:53 - 34:58

And so we felt that it was very important
to, as much as we could, understand this
34:58 - 35:03

targeting that was going on within
Facebook's platform. So ProPublica had
35:03 - 35:10

this browser plug-in and they had this
data set that anyone can analyze, within
35:10 - 35:15

here. So if you do have Facebook and
you're located within the US I would
35:15 - 35:19

highly recommend that you install this
plug-in, because it helps us to kind of
35:19 - 35:25

understand the political advertising in
terms of the targeting, within here. So we
35:25 - 35:30

took ProPublica's data set and we
effectively joined it with Facebook's ad
35:30 - 35:35

transparency archive, within here. This
required us to scrape Facebook's ad
35:35 - 35:39

archive, because we needed the ad ID and
this is something that they don't expose
35:39 - 35:44

to their API, currently, within here.
However, they do expose it through their
35:44 - 35:49

user portal, within here. So we scraped
their user portal to join the specific ads
35:49 - 35:55

that were in the ProPublica data set to the
archive dataset, within here. And we were able
35:55 - 36:00

to join about 75% of the ads from here.
There were a lot of ads that were
36:00 - 36:06

collected by the ProPublica data set, that
just simply weren't archived by Facebook's
36:06 - 36:10

transparency archive. It misses things,
within here. It's imperfect as to how it
36:10 - 36:14

does things. And this would be another
interesting analysis to do, to understand
36:14 - 36:18

what Facebook is missing in their ad
transparency archive and this ProPublica
36:18 - 36:23

data set can allow you to somewhat do
this, although it's biased by who
36:23 - 36:29

installs the ProPublica plug-in in the
first place. So we joined these two data
36:29 - 36:34

sets, again with the caveat that the
ProPublica data set is, right, it's
36:34 - 36:38

obviously biased by the set of people that
installed it, which are probably not going
36:38 - 36:43

to be a representative set of
Facebook users, within here. But
36:43 - 36:46

unfortunately, it's the best thing that we
have in terms of a data set that releases
36:46 - 36:52

the targeting information, within here.
And so we collapsed these into three different
36:52 - 36:58

categorizations of targeting, within here.
I'll just quickly explain Facebook's ad
36:58 - 37:02

targeting platform for people that don't
know about it. So one way to target ads
37:02 - 37:08

is, right, through interest or segments,
right, age segments, gender segments or
37:08 - 37:13

interests like I showed you before, within
here. So this is one way to target ads
37:13 - 37:19

within Facebook's platform. Another way to
target ads is through uploading lists of
37:19 - 37:23

information. So you can upload lists of
people's phone numbers, people's email
37:23 - 37:28

addresses or their names. And then when
you upload this list Facebook will find
37:28 - 37:32

those profiles within their database, so
they'll basically join those emails with
37:32 - 37:37

the emails that users entered for their
accounts, and then they'll target these
37:37 - 37:41

people. So they'll create what they call
an audience of these people through this
37:41 - 37:45

personally identifiable information and
then they'll target them, through this
37:45 - 37:50

method. The final kind of major form of
targeting that Facebook offers is through
37:50 - 37:54

what they call these lookalike audiences.
So this is where you can upload PII
37:54 - 37:59

information, like email addresses, phone
numbers, names. Facebook will link them to
37:59 - 38:04

their accounts and then they'll look at
kind of the interests and things that
38:04 - 38:07

these users have, and then they'll find you
other users, not these users, but other
38:07 - 38:12

users, that have a similar kind of profile
to these users within here. So these are
38:12 - 38:18

the lookalike audiences that Facebook
offers within their platform. And so we
38:18 - 38:23

categorized it by this and again by
advertiser type, within here. So the thing
38:23 - 38:30

that stands out is, right, that the
for-profit companies are doing a lot of
38:30 - 38:34

targeting based on interests and segments.
So they probably don't know who the
38:34 - 38:39

people that they want to message are, and
they're doing it mostly by interests and
38:39 - 38:45

segments. Whereas when you look at the PACs
and the political candidates they have
38:45 - 38:49

lists. So they have a lot of lists of
people's you know email addresses, phone
38:49 - 38:54

numbers, names, things like this. And
they're plugging these into Facebook's
38:54 - 38:59

system. And this is how they're targeting
a lot of people, within here, is through
38:59 - 39:04

these lists. And this was expected, but
it's interesting to kind of quantify how
39:04 - 39:10

much of this is happening. And then the
lookalike audiences are also being used, a
39:10 - 39:14

good deal by everyone within here. And
this kind of makes sense, right? Because
39:14 - 39:17

if you have a list of people then you
advertise to them but then right you have
39:17 - 39:23

this lookalike audience of people that are
similar to them that are also perhaps good
39:23 - 39:33

people to advertise to, as well, within
here. The other thing we can do is break
39:33 - 39:38

this down by the intent of the ad here,
and this shows the difference even more
39:38 - 39:44

starkly, of the difference in behavior
between the commercial people and the
39:44 - 39:47

noncommercial people. The commercial
people are targeting mostly based on
39:47 - 39:53

interest, whereas the other people that
are, say, looking to connect with people,
39:53 - 39:56

they're the ones that are using the most
lookalike audiences. And this makes
39:56 - 39:59

perfect sense because right the connection
ads are there to get people's e-mails,
39:59 - 40:03

addresses, phone numbers, names and things
like that. So when you use the lookalike
40:03 - 40:08

audiences then you can, right, generate
more lists of people that'll convert for
40:08 - 40:13

whatever you want and then you can
retarget them with the direct lists
40:13 - 40:19

targeted ads, later on. So this all makes
pretty good sense when you look at how
40:19 - 40:23

this is behaving, from here. But again
it's interesting, right, kind of make this
40:23 - 40:29

transparent for people to understand how
targeting is happening within the U.S.
40:29 - 40:34

political advertising sphere, within here.
So these were pretty much the two major
40:34 - 40:39

analyses that we did in terms of
targeting, within here. The final part and
40:39 - 40:44

the part that kind of makes the juiciest
of stories is kind of the more dubious
40:44 - 40:48

advertisers that are advertising within
these platforms in terms of political
40:48 - 40:53

advertising. So we kind of call these more
politely kind of "new types of
40:53 - 40:58

advertising", within here. The first type
is one that you would pretty
40:58 - 41:03

much expect, so this is this corporate
astroturfing kind of stuff, that's going
41:03 - 41:09

on, within here. We see these ads for
Citizens for Tobacco Rights. And I
41:09 - 41:13

pretty much expected that you look up this
group and it's probably going to be some
41:13 - 41:18

you know quasi nonprofit that's supported
by some industry money from the tobacco
41:18 - 41:22

lobbyists, or something like that. That's
pretty much what I expected to see when I
41:22 - 41:29

saw these ads. You go to this website and
it's actually pretty honest as to what it
41:29 - 41:33

does. This is probably because of, right,
all the lawsuits and regulations around
41:33 - 41:37

tobacco advertising in the U.S. But
the website clearly states, right, that
41:37 - 41:43

it's operated by Philip Morris, the
tobacco company, within here. And this
41:43 - 41:47

actually isn't a legal entity, this
Citizens for Tobacco Rights. It's just
41:47 - 41:51

simply a website that's been stood up,
that's owned and operated by Philip
41:51 - 41:55

Morris, as far as we can tell, within
here. And this gets to a big problem with
41:55 - 42:01

Facebook's transparency archive, which is
that they don't actually vet that
42:01 - 42:05

disclaimer string of the sponsor, within
here. So pretty much anyone can type
42:05 - 42:12

anything that they want within that
disclaimer string and Facebook will allow
42:12 - 42:16

you to run it. We've tested it and as far
as we can tell, you can't say that you're
42:16 - 42:20

from Facebook, Instagram or that you're
Mark Zuckerberg, they'll block that. But
42:20 - 42:24

pretty much anything else that you type in
there they'll allow that ad to run, within
42:24 - 42:30

here, with no vetting. So we discovered
this, we politely, privately mentioned it
42:30 - 42:36

to Facebook. Some reporters kind of
trolled Facebook within here and so there
42:36 - 42:41

was a reporter that trolled Facebook and
opened up ads for all the senators, within
42:41 - 42:45

here, on Facebook. And of course Facebook
approved them all, from within here, and
42:45 - 42:50

they did some other things to troll
Facebook, where they inserted some other
42:50 - 42:55

advertisements, within here. But the point
is, that that disclaimer string is not
42:55 - 43:00

vetted within here. Google actually vets
that disclaimer string within there, so
43:00 - 43:05

they require either a tax ID number or a
Federal Election Commission I.D. number and
43:05 - 43:12

they actually do vet it and they publish
that tax I.D. number or federal election
43:12 - 43:15

I.D. number along with the disclaimer
string, within here. Which makes it really
43:15 - 43:19

easy to track down advertising on Google.
On Facebook, because right they can
43:19 - 43:22

basically type in whatever they want in
the disclaimer string, it makes it much
43:22 - 43:26

more difficult to actually link these
advertisers. And sometimes just outright
43:26 - 43:34

impossible, if the disclaimer string is
made up or just too mutilated in some way
43:34 - 43:40

or form, within here. So this is
definitely a problem, where we have these
43:40 - 43:44

lobbyist organizations, or in this case
not even lobbyist organizations, just
43:44 - 43:50

industry, that can effectively lie about
who's paying for this ad in Facebook's
43:50 - 43:56

platform. The other thing we found were
what is now kind of being called these
43:56 - 44:03

junk media outlets. So these are for-profit
outlets that are claiming that they're
44:03 - 44:08

doing kind of news operations. But, right,
it's not really traditional kind of
44:08 - 44:13

reporting journalistic things. It's more
just kind of propaganda messaging, within
44:13 - 44:19

here. So there is this group called New
American Media Group LLC. They also ran
44:19 - 44:25

under the name of New Democracy, or, sorry,
Democracy Now was their other name, within
44:25 - 44:31

here. And so they ran this, within here.
We tracked down these LLCs and they were
44:31 - 44:36

just simply shell companies and that kind
of led to nowhere, within here. We worked
44:36 - 44:42

with a journalist from The Atlantic that
actually did a lot of digging into the
44:42 - 44:49

shell companies. And he was able to,
basically through his investigation, link
44:49 - 44:54

these companies to the actual entity that
created these shell companies and was
44:54 - 45:01

running these ads, within here. And so
when we did our analysis of this, this
45:01 - 45:07

company, basically this third-party
advertising company, was creating these.
45:07 - 45:13

They're meant to look like kind of
grassroots kind of organizations. There
45:13 - 45:17

were a lot of them that were kind of targeted
at more conservatively leaning groups, but
45:17 - 45:22

then they would bombard them with liberal
messaging, within here. So they would
45:22 - 45:26

create these fake communities that looked
more conservative. And then once they
45:26 - 45:30

attract an audience they would bombard
them with this kind of liberal
45:30 - 45:36

messaging, within here. And so this
particular company is based in Colorado.
45:36 - 45:41

It's called MOTIVE AI. Apparently, it's
hoping to become the Cambridge Analytica
45:41 - 45:49

of the liberal side. I don't know if
that's something to aspire to or not. Some
45:49 - 45:52

other journalists also did some digging,
within here. There were some journalists
45:52 - 45:57

from ProPublica that did some digging,
within here. They found more of this
45:57 - 46:03

astroturfing by political lobbyist groups
and things like that. Big oil, insurance
46:03 - 46:09

companies, again when they advertised on
say Google's platform they would be honest
46:09 - 46:12

about their disclaimer string, and then
when they advertised on Facebook's
46:12 - 46:16

platform they would often kind of
obfuscate their disclaimer string, to
46:16 - 46:23

make it more difficult to link them
together. And so they unmasked a whole
46:23 - 46:28

bunch of these other kinds of junk media
operations, as well, that were kind of
46:28 - 46:33

spreading propaganda, within here. I'm
picking on Facebook a lot. Again Google
46:33 - 46:38

does vet the tax I.D. number of these
people, but you see something like, right,
46:38 - 46:45

this DIGICO LLC that paid for some ads. So
you track this down, and this is again one
46:45 - 46:48

of these third-party advertising agencies.
It's easy to track down because of the tax
46:48 - 46:52

I.D. number. But it still doesn't actually
tell you who paid for the ad. It just
46:52 - 46:56

tells you the third party that, right, it
presumably was paid on behalf of someone
46:56 - 47:00

else to run these ads, from here. So this
is a big problem with these disclaimer
47:00 - 47:04

strings, that oftentimes they don't
actually identify the person that's paying
47:04 - 47:11

for the ad. So to kind of wrap this up,
within here, after our kind of experiences
47:11 - 47:15

looking at these transparency archives I
would say they're fairly adequate to
47:15 - 47:20

understand good actors. So we could fairly
well understand how good political
47:20 - 47:24

advertisers were behaving in Facebook's
platform. However, right, for the bad
47:24 - 47:29

advertisers, we probably missed a lot of
them because they could just simply type
47:29 - 47:33

in lots of different disclaimer strings
and easily avoid our analysis, at this
47:33 - 47:39

point. None of these current archives have
it just right yet. All of them have
47:39 - 47:44

issues, right. Facebook isn't providing
good access to their data. They're not
47:44 - 47:49

releasing targeting information. Google is
missing 30% of the content because of third
47:49 - 47:56

parties using their advertising system.
They're not releasing spend and impression
47:56 - 48:02

information based on demographics, within
there. Twitter just simply hasn't hired
48:02 - 48:08

someone to enforce their transparency
policy well, within here. And
48:08 - 48:13

unfortunately our experience throughout
this process has been that these companies
48:13 - 48:18

are oftentimes reactive, instead of
proactive, within here. Which means that,
48:18 - 48:21

right, we have to continuously put
pressure on them, in order for them to
48:21 - 48:26

kind of improve these archives, within
here. So this is unfortunately kind of the
48:26 - 48:30

state that we're in, within here. And
one thing that I really want to give
48:30 - 48:33

a shoutout to is, right, that there are people at
these companies that are actually trying
48:33 - 48:39

to build these transparency archives. And
I want to give them a lot of credit for
48:39 - 48:44

taking on this task, that's probably not
well rewarded within their companies, of
48:44 - 48:47

building these transparency archives,
within here. And so my hope is that by
48:47 - 48:53

applying pressure we can get them more
support to kind of get more resources and
48:53 - 48:59

be able to make things more transparent, within
their companies, as well. Because I hope
48:59 - 49:05

that, right, this puts us in better shape
to understand the 2018 elections, but 2020
49:05 - 49:10

is another presidential election and my
hope is that we'll continue to improve
49:10 - 49:14

these archives, so that we'd be in a much
better position to understand both the
49:14 - 49:20

good and the bad advertisers by 2020, within
here. However this is going to take
49:20 - 49:25

probably regulatory pressure, legal
pressure, pressure by technologists and
49:25 - 49:32

things like this to improve these
archives, at this point. So with that,
49:32 - 49:36

again, I have my collaborators, that
aren't here on the stage, but they
49:36 - 49:41

definitely did a lot of the heavy lifting
to make this happen, within here. And
49:41 - 49:45

again all of our tools and most of our
data except for the Facebook data, that's
49:45 - 49:51

under NDA, is available through our
GitHub, there. And so with that I will
49:51 - 49:53

open it up to questions.
49:53 - 50:04

applause
50:04 - 50:05

Herald: Thank you so much Damon. I know
50:05 - 50:12

that there are a few questions among the
audience. So, microphone 6 please.
50:12 - 50:16

Question: So [Name] on the IRC is asking
"Have you looked at links between the
50:16 - 50:22

advertisers and do they use the same
images or text for instance?".
50:22 - 50:24

Answer: This is a really good question.
This is actually one of the analyses that
50:24 - 50:28

we're currently doing. So we're starting
with the text, because that's obviously
50:28 - 50:32

the easiest. But we're also exploring some
image clustering algorithms as well, to
50:32 - 50:37

cluster the advertisers across platforms
and also within platform because we're
50:37 - 50:40

finding a lot where, you know, they create
multiple shell companies, where they just
50:40 - 50:44

lie about their disclaimers and so this is
definitely something that we're focusing
50:44 - 50:49

on, is better clustering of the
advertisers. Because like that group
50:49 - 50:53

MOTIVE AI, even though they created the
different LLCs they were running the same
50:53 - 50:59

images and videos across their different
LLC shell companies.
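One simple way to approach the cross-LLC clustering described in this answer is to treat two disclaimer strings as the same underlying advertiser whenever they ran an identical creative. The sketch below does that with a union-find over hashes of normalized ad text. It is an illustration under assumptions, not the team's pipeline: the `(disclaimer_string, ad_text)` record shape and the example names are hypothetical, and matching reused images or videos would need perceptual hashing or the image clustering algorithms mentioned in the answer rather than exact text hashes.

```python
from __future__ import annotations

import hashlib
import re
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure over disclaimer strings."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def creative_key(text: str) -> str:
    """SHA-256 of the ad text after lowercasing and collapsing punctuation/whitespace."""
    normalized = re.sub(r"[\W_]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cluster_advertisers(ads: list[tuple[str, str]]) -> list[set[str]]:
    """Group disclaimer strings that ran at least one identical (normalized) creative."""
    uf = UnionFind()
    first_seen: dict[str, str] = {}  # creative hash -> first disclaimer that ran it
    for disclaimer, text in ads:
        uf.find(disclaimer)  # register advertisers that never share a creative
        key = creative_key(text)
        if key in first_seen:
            uf.union(disclaimer, first_seen[key])
        else:
            first_seen[key] = disclaimer
    clusters: dict[str, set[str]] = defaultdict(set)
    for disclaimer in list(uf.parent):
        clusters[uf.find(disclaimer)].add(disclaimer)
    return list(clusters.values())


if __name__ == "__main__":
    example = [  # hypothetical disclaimer strings and ad text
        ("Example Media LLC", "Join the movement! Sign up today."),
        ("Example News LLC", "join the movement -- sign up today"),
        ("Unrelated PAC", "Vote on Tuesday."),
    ]
    for group in cluster_advertisers(example):
        print(sorted(group))
```

Exact hashing only catches verbatim reuse; lightly edited copies would need a fuzzier similarity measure, at the cost of more false merges between unrelated advertisers.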
50:59 - 51:02

Herald: Great thank you. Please if you
have any questions, queue up by the
51:02 - 51:08

microphones. Microphone number 1 please.
Question: Hi, Oliver Moldenhauer? Thanks a
51:08 - 51:13

lot for the talk. Definitely one of the
best I've seen here so far. Two questions.
51:13 - 51:19

A: Why do those transparency archives
exist? Was there some law or political
51:19 - 51:25

process around that? And B: As we are
nearing the European election next year,
51:25 - 51:31

what kind of data is available for Europe?
Answer: Those are both good questions.
51:31 - 51:34

Again, I'm not internal to one of these
companies, so I can just speculate as to
51:34 - 51:38

why these transparency archives exist. But
my guess is, right, that this was
51:38 - 51:44

reactionary. So Mark Zuckerberg and high
ranking officials from Twitter and Google
51:44 - 51:50

were hauled in to testify in the House and
Senate, and this is them trying to self
51:50 - 51:54

regulate instead of having regulation
imposed on them by people. So that again,
51:54 - 51:58

this goes to the pressure part is that
there was regulatory pressure put on them,
51:58 - 52:03

the threat of regulatory pressure and so
that's what made them do these
52:03 - 52:09

transparency archives. In terms of what's
available in Europe. I guess as long as
52:09 - 52:17

the UK is still in the EU, kind of
teetering, Facebook has started to make ads
52:17 - 52:23

transparent in the UK. They also make them
transparent in Brazil and they're going to
52:23 - 52:26

make them transparent in India. And I
think they have plans to make them
52:26 - 52:31

transparent in other places, in the EU as
well. However, they haven't done that.
52:31 - 52:36

And again, this goes back to the
pressure part. So there's no API for the
52:36 - 52:40

other countries, there's only an API for
the US and that might be because we put
52:40 - 52:46

pressure on them by scraping them and
publicly releasing their data. And, right,
52:46 - 52:49

there are no transparency reports for other
countries, either; there's only a
52:49 - 52:52

transparency report for the US. And again
that might have been because we applied
52:52 - 52:56

pressure and we were publishing numbers.
Some of the numbers in terms of spend were
52:56 - 53:00

very low, because, right, they were just
giving us ranges. So we might have been
53:00 - 53:04

making them look bad, when we took the
bottom of the range for their spend, and they might
53:04 - 53:08

have wanted to correct that with their own
transparency archive, as well. So again, a
53:08 - 53:12

lot of this unfortunately requires
pressure to get them to improve their
53:12 - 53:16

transparency efforts.
Herald: Great thank you. Microphone number
53:16 - 53:21

two please.
Question: So you mentioned
53:21 - 53:26

FiveThirtyEight and their work on the
donations. Do you think it makes sense
53:26 - 53:33

to combine the data you gathered with what
they have to look at election outcomes,
53:33 - 53:38

like, election results and turnout and
stuff like that?
53:38 - 53:43

Answer: Yes. Actually this is the number
one project on our road map, right now.
53:43 - 53:50

Actually, Google has processed the FEC
information and they've made this
53:50 - 53:56

information available via their BigQuery
database. So we've downloaded this, we've
53:56 - 54:02

manually linked the Facebook advertisers
and the Google advertisers to the FEC data
54:02 - 54:07

and now we're doing the regression models,
specifically focused on the donation ads
54:07 - 54:11

first. Because those are what are reported
to the FEC, at this point. So we are
54:11 - 54:15

essentially trying to understand how
effective these donation ads are at
54:15 - 54:21

actually driving donations, within here.
Herald: Thank you. Microphone number 4
54:21 - 54:25

please.
Question: Hi. First of all thank you Mr.
54:25 - 54:30

McCoy and your team for this very
interesting research. I was wondering,
54:30 - 54:35

whether you know if there is any follow-up
research conducted by political
54:35 - 54:42

scientists, sociologists etc. analyzing
the political repercussions of these ad
54:42 - 54:47

campaigns.
Answer: Yes, so we're aware of a few
54:47 - 54:50

efforts. I don't want to out the teams
that are doing them, in case they don't
54:50 - 54:56

want to be outed. There's nothing
that's been published publicly, I believe,
54:56 - 55:00

on this. But we're definitely trying to.
That's one of the main goals of kind of
55:00 - 55:04

our overarching online political
advertising transparency thing, is to try
55:04 - 55:09

and get as much data as we can in the
hands of less technical people in an easy
55:09 - 55:16

way for them to analyze. And so this is
basically the primary goal of our project,
55:16 - 55:20

in here. So we've been working as hard as
we can to get political scientists up
55:20 - 55:25

to speed on the data. And this is why it's
really unfortunate that Facebook has its
55:25 - 55:30

NDA in place for their particular data,
because this makes it very difficult for
55:30 - 55:35

us to share and collaborate on that
particular data. Which puts pressure on us
55:35 - 55:40

unfortunately as being the only ones that
can do some of this analysis right now. So
55:40 - 55:44

this is why I would love to apply
enough pressure to Facebook, to get better
55:44 - 55:49

access to their particular data.
Herald: Yes. And the question from the
55:49 - 55:54

Internet please.
Signal Angel: So Nomad is asking "Why are
55:54 - 55:59

those advertisements considered political or
election interference in the USA? Can't you just
55:59 - 56:04

see that someone paid money to display
that content and conclude its purpose is
56:04 - 56:12

to promote an agenda or manipulate them?".
Answer: This is a good question. Right, a
56:12 - 56:15

lot of this goes to the tactics that
they're using here. So again they're
56:15 - 56:19

creating these communities, that they're
making look like they're grassroots
56:19 - 56:24

communities and then they're kind of
sucking people in with these ads, that up
56:24 - 56:29

until recently had no disclaimer string
on them. So you had no idea who paid for
56:29 - 56:34

them. So they appear to be paid for by
kind of these grassroots organizations. So
56:34 - 56:39

you felt like you were, kind of, part of a
grassroots movement in joining these kinds
56:39 - 56:43

of communities. I think these are the really
scary, kind of subtle things. And you
56:43 - 56:47

might not realize why you're being
targeted for these particular ads or who
56:47 - 56:50

was behind these particular ads. So, I
think it was really easy for people to
56:50 - 56:55

kind of get unwittingly, kind of, duped
into joining what looked like these
56:55 - 57:00

grassroots campaigns. So that's why I
think improving these disclaimer strings
57:00 - 57:03

and showing who is really behind these
communities and these advertisements is
57:03 - 57:09

really important, to dispel this notion of
these fake grassroots communities, that
57:09 - 57:13

are luring people in within here. So I
think that's one of the big things that
57:13 - 57:18

can be gained by these transparency
archives. But it requires improvement of
57:18 - 57:23

the transparency archives, to do that.
Herald: Microphone number 3 please.
57:23 - 57:28

Question: Yes. So I'm curious about the
efficacy of some of the advertisements
57:28 - 57:37

that are on Facebook and Twitter. And I'm
wondering is any group like the ProPublica
57:37 - 57:45

web extension checking the engagement
rate? Like the number of comments, the
57:45 - 57:53

number of views and the number of shares,
to like kind of get an estimate of, OK
57:53 - 57:58

this big grassroots community is building
up a number of followers and these
57:58 - 58:06

followers population sizes and whatnot.
Answer: Yeah, this is again a really good
58:06 - 58:11

question. This is something that we are thinking about, and I
would certainly encourage other people to
58:11 - 58:14

potentially do as well. So the problem is
that a lot of that information isn't
58:14 - 58:18

exposed by the transparency archives. This
is more of what they call kind of the
58:18 - 58:23

organic information, the non-paid-for
information, within here. And so this is
58:23 - 58:28

stuff that none of the platforms are
releasing. And so it requires kind of a
58:28 - 58:33

scraping operation, essentially, to gather
this information and collect it. And it's
58:33 - 58:36

something that we're definitely thinking
about how to efficiently do, is how to
58:36 - 58:41

efficiently scrape and collect this
information. Because this is very hard
58:41 - 58:43

because, right, you go against the
anti-scraping teams of these companies, that
58:43 - 58:47

are well resourced. And this requires
accounts, and these accounts are going to
58:47 - 58:51

be detected and shut down. So this is
something that we're trying to pilot to
58:51 - 58:56

understand. Our other idea of how to do
this potentially is try and crowdsource
58:56 - 59:00

this information. This is similar to how
ProPublica crowdsourced it for the browser
59:00 - 59:04

extension information. We could
potentially crowdsource it, where you
59:04 - 59:08

know, when people interact with these
communities or these ads the plug-in could
59:08 - 59:12

potentially crowdsource that information
back to us. And then we would have to
59:12 - 59:18

figure out some strategy to sanitize that
information in some way. Because at that
59:18 - 59:21

point you might have some sensitive
information they are collecting. This is
59:21 - 59:26

something that we're thinking about. We're
cautious, I think, rightly so because this
59:26 - 59:31

can start stepping on, again, more
sensitive information that's available
59:31 - 59:34

from within here. But I think it's
definitely key to understanding the
59:34 - 59:37

effectiveness of these ads. Something that
we're going to have to do or we're going
59:37 - 59:42

to have to convince Facebook somehow to do
on our behalf in order to really
59:42 - 59:46

understand the effectiveness of these ads.
Herald: Thank you. Last question for
59:46 - 59:50

microphone number 1.
Question: All right. At the beginning of
59:50 - 59:54

your talk you explained how Russia
influenced the elections. I'm curious
59:54 - 60:00

about the attribution. Are there possibly
any doubts at any instance that you
60:00 - 60:06

presented that it was not Russia or maybe
some other country, China or Iran? How do
60:06 - 60:10

you know, and did you check the facts?
Answer: I mean, that's a good question.
60:10 - 60:14

Unfortunately, right, the national
security agencies don't release the
60:14 - 60:21

sources of their information. There's
another investigation done by the
60:21 - 60:27

Department of Justice by Robert Mueller,
that did release some more information
60:27 - 60:32

about this, within here. I've looked at
that information and it looks, you know,
60:32 - 60:36

right, you can never 100% unequivocally
state that it was Russia. It could have
60:36 - 60:41

been a false flag operation. But I think
that pretty much the overwhelming
60:41 - 60:45

information that everyone has found when
they've investigated this has pointed at
60:45 - 60:53

Russia and the organizations that were
prosecuted by Mueller.
60:53 - 60:57

Herald: Damon McCoy, thank you very much.
Please give them a great round of applause.
60:57 - 61:00

Applause
61:00 - 61:05

35c3 postroll music
61:05 - 61:22

subtitles created by c3subtitles.de
in the year 2019. Join, and help us!

Title:: 35C3 - Explaining Online US Political Advertising
Video Language:: English
Duration:: 01:01:22