-
35c3 preroll music
-
Herald: This talk will be held by Damon
McCoy. He will be explaining online U.S.
-
political advertising and he has been
working on researching how
-
different online communities basically
behave around many different topics. But
-
this is what he's going to talk about
today so please give him a great round of
-
applause.
Applause
-
Damon McCoy: Thank you everyone for coming. I'm
up here speaking and I'm the only one that
-
wanted to fly to Germany over Christmas
and New Year's. However there were three
-
real people that were really key in
helping out with this research and before
-
we get started, I just want to credit
them. One is my grad student Laura
-
Edelson. She did a lot of the analysis
that you're going to be seeing and generated
-
all the graphs. One of the undergraduate
students from our NYU Shanghai
-
campus, Shikhar Sakhuja, did a lot of the work to
collect all the data that you're going to
-
see here. And then Ratan Dey, who is a
professor at NYU on the Shanghai campus,
-
also helped out with kind of our initial
efforts of collecting some of this data.
-
And so before we get started I guess I'll
give a little bit of an introduction about
-
myself. I'm a professor at NYU Tandon
School of Engineering. As was mentioned
-
before I do a lot of stuff kind of looking
at how technology kind of impacts the
-
security and privacy of, you know, society,
groups of people and things like that. So
-
this was really kind of an opportunistic
project that kind of captured the impact
-
of online advertising in the political
sphere of U.S. campaigns. And also quick
-
plug. So everything that I'm going to be
showing you, most of the data, scripts and
-
things like that, we've put in a GitHub
repository that's accessible by anyone that wants to
-
analyze the data or look at our scripts
and improve them or things like that.
-
Applause Thank you. This is the first
time that I've given this talk outside of
-
the U.S. so let me just start with some
quick explanation as to how U.S. elections
-
work for those of you that might not know
about this. So every two years in the U.S.
-
we hold federal elections. These are
elections - right - that impact all of the
-
states within the U.S. And so every four
years we have an election for president,
-
2018 were our last elections. This was not
a presidential election year so the
-
elections were for the Senate and the
House seats at the federal level. And then
-
we also had elections for state and local
positions as well within here. And some
-
of them are captured in our data,
especially our Facebook data, not so much
-
in our Twitter and Google ad transparency
data that we have here. So this will be
-
focused, this talk, on the 2018 elections
that happened on November 6th. Election
-
day is always the first Tuesday in
November in the U.S. every two years. So
-
to begin with in the background right
probably some of you know about this, some
-
of you might not know about this, but in
the 2016 elections which were a
-
presidential election year there was right
this election interference that happened
-
here. And so Facebook has released these
ads. These ads were paid for by a Russian
-
company the Internet Research Agency that
ran these ads. And Facebook released these
-
to - right - the Senate and then the
Senate re-released these publicly to
-
people. And so this is an ad basically
trying to disenfranchise people from
-
voting in the elections. And you can see
right it's targeted at people in the U.S.
-
of a certain age range and interests like
Martin Luther King, African-American
-
culture, African-American civil rights.
Facebook doesn't actually allow you to
-
directly target to people based on like
their ethnicity. So this is a pretty good
-
proxy though. If you target these kinds of
interests for this, so this would probably
-
be fairly effective at targeting African-
American people within the U.S. for this
-
ad to try and disenfranchise them from
voting in the elections. There were other
-
ads, right, that tried to do
misinformation, disinformation kinds
-
of campaigns. So this was an ad that was
again paid for by the Russian agency that
-
was trying to perpetuate this rumor
basically unsubstantiated rumor that Bill
-
Clinton has this illegitimate child
within here. And again, right, the
-
targeting information is targeted at
African-Americans within the U.S. and
-
African-Americans are kind of a key voting
block for a kind of more liberal,
-
democratic people within the U.S.
oftentimes. That's probably the other
-
thing that I should have explained about the
U.S. election system especially at the
-
federal level is that we effectively have
two parties that, you know, win any
-
meaningful amount of elections within the
U.S. and one of them is the Democratic
-
party that tends to skew more liberal. So
they're more kind of right for bigger
-
government, more social services and then
we have the Republicans which skew kind of
-
more conservative wanting kind of a
smaller government providing less services
-
and kind of less regulation around things
as well. And so right these are two
-
examples but there were a whole bunch of
these ads that were shown on Facebook
-
within here. So right, pretty much all
of these tried to
-
disenfranchise people or tried to kind of
create chaos, kind of polarize people
-
around the election, oftentimes with kind
of disinformation sorts of things.
-
And so in 2017 our Office of the Director
of National Intelligence put out a report
-
- sorry for the big block of text, this
will be the only big box of text in here.
-
But I thought it was kind of important to
show this because they pretty much
-
unequivocally state that Russia tried to
interfere in the US elections and that
-
Vladimir Putin was somehow involved within
this interference. And so this is - this
-
is pretty much as far as the National
Security Agency, the CIA, NSA pretty much
-
solid evidence that they have - that this
occurred within here. And so the other
-
thing that broke was right the Cambridge
Analytica scandal as well broke within
-
Facebook, where there was this
third-party advertising agency that
-
collected you know a whole bunch of data
on 80 million profiles within Facebook and
-
then tried to create psychological
profiles for targeting and messaging and
-
things like that around here. And so these
two particular scandals broke within here
-
and the first result of this is we have
Mark Zuckerberg in a suit, a real suit, not
-
a hoodie suit, testifying, right, in front
of our Senate within here. And so he,
-
right, he testified before House and
Senate committees about the abuses
-
occurring within Facebook. And he did this
on April 10th and 11th 2018 within here,
-
and right. So this is right. In here, he
did admit that Facebook had made mistakes
-
and that they need to improve things
moving forward within their platform. The
-
most kind of tangible outcome from these
testimonies were these transparency
-
archives that began to appear. And here is
a view of what Facebook's ad transparency
-
archive looks like. When it was originally
deployed you needed a Facebook account to
-
interact with it. Now Facebook has dropped
the requirement so anyone with Internet
-
access, unless you're censored somehow,
can go to this archive and access these
-
ads. So in the user-facing portal for these
ad transparency archives you type in
-
keywords and then what it basically does is,
right, a pattern matching on the ad text
-
and other parts of the ad, and then it returns
the ads that match that within here. And
-
so then you can see all the political ads
that match that particular term within
-
here. Facebook began archiving these ads
kind of at a large scale starting on May
-
7th, 2018. By Election Day, November 6th,
2018, there were 1.6 million ads paid for
-
by over 85 thousand advertisers within
Facebook's platform. Facebook is actually
-
fairly broad as to what they included
within their political archive. They
-
included any ads related to US elections,
either federal state or local elections.
-
They also included these very important
kind of issue ads. As we saw when we looked
-
at the Russian interference, a lot of times
the ads didn't mention actual political
-
candidates, they mentioned kind of
polarizing issues within the U.S. So
-
Facebook also included these ads of
political national importance. They had a
-
list of I think about 13 different
criteria, the last of them being values.
-
And so it was a fairly encompassing set of
ads they tried to include within their
-
archive. Along with the text and images
or videos of the ad, they also included
-
basically ranges of geographic impressions
and demographic impressions. So right.
-
State level impression information in some
kind of ranges and demographic by gender
-
and by age kind of bucketed within here.
And they did this again for impressions
-
and then they also included some spend
information again kind of in ranges so
-
they gave ranges of 0 to 99 dollars, a
hundred dollars to say like 500 dollars,
-
501 dollars to 2000 dollars, and so on and
so forth within these buckets. One of the
-
key pieces of information that they did
not release was the targeting information.
-
So like I showed you before of those ads
they - right - they have that targeting
-
information, Facebook does not release
that within their transparency archive.
-
They have this, right, they
had that user portal where you could
-
do the keyword search from within there.
However, right, I like to do large
-
scale data analysis and so I wanted to
basically try and collect all of the ads
-
within this web portal. And so initially
all they had was this keyword search
-
portal within here. And so what we did is
we compiled kind of a large list of what
-
we thought were reasonable sets of keywords:
names of prominent politicians, names of
-
states, issues within here. And so we tried
to compile this long list of keyword
-
searches and we began scraping the
portal within here and I'll tell you the
-
story of how our scraping efforts went.
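
The keyword-driven collection described above can be sketched roughly as follows. This is a minimal illustration, not the scraper from the talk: the `search` callable and the `ad_id` field are hypothetical stand-ins for whatever a keyword portal returns. The only point it shows is that the same ad can match several keywords, so results have to be de-duplicated.

```python
def collect_ads(keywords, search):
    """Run every keyword through `search` and keep one copy of each ad.

    `search` is a hypothetical callable that takes a keyword and returns
    a list of dicts, each with an "ad_id" key (an assumed field name).
    """
    seen = {}
    for keyword in keywords:
        for ad in search(keyword):
            # An ad that matches several keywords is stored only once.
            seen.setdefault(ad["ad_id"], ad)
    return list(seen.values())
```

Coverage with this approach depends entirely on how well the keyword list matches the text of the ads.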
Now currently they also offer an API.
-
It's still keyword based, their API, and
it's restricted by an NDA, so I'll kind of
-
flesh out the story of how this goes. So
at the beginning they, they released this
-
kind of towards the end of May the user
archive and I played with it and I
-
realized that this didn't lend itself well
to kind of large scale analysis of these
-
ads. And so I went to my students
Shikhar and Laura, and Shikhar worked kind
-
of furiously night and day and within
three days he had a workable scraper that
-
was able to put in our keywords and then
we were able to scrape all the results
-
from our keyword within here. And so we
ran the scraper for about 2 months and
-
then we released a report. Just kind of a
very general statistical report and we
-
released the data in our GitHub archive at
that point. After that, about 2 weeks later,
-
Facebook began anti-scraping measures
within here. And so, right, this kind of
-
hampered our efforts to scrape Facebook's
archive. At this point. I'm - I don't want
-
to attribute any malice. I don't believe
that Facebook was targeting just our
-
scraping efforts they were targeting
everyone's scraping efforts.
-
Whether it's wrong or right
to block people from collecting data on a
-
transparency archive, I might kind of
quibble with them on that and say they
-
might want to provide better access to the
data within their transparency archive.
-
But this was the choice that Facebook made
to kind of clamp down on the scraping
-
within here. So we tried to fight with
them a little bit, right, kind of a
-
cat-and-mouse game. You know, we make some changes
to our scraper to avoid their anti-
-
scraping. They do some things on their end
to block our scraper and probably other
-
people's scrapers that are doing similar
things to us as well within here. And so
-
this persisted for probably about 2 weeks,
and then Facebook basically deployed their
-
API within here. However they said right,
their API is very limited and still in
-
beta at this point. So these were part of
the terms and conditions from here. One of
-
the ones that I found kind of the most
onerous is that it limited it only to
-
U.S. people so we could essentially only
very closely work with U.S. people within
-
here and at least it did kind of - it
limited the types of people that we could
-
work with in here. And so right
unfortunately this kind of ruled us out
-
from working closely with journalists from
you know really good news organizations
-
like the Guardian and so on, that just
happened to have the misfortune of being
-
located somewhere outside of the U.S.
within here. Maybe the good fortune, yes.
-
And then the list of restrictions
continues. They also placed a data
-
retention on it so we could only retain
the data for one year. Again placing data
-
retention. So Facebook's data retention on
their archive is 7 years within here but
-
they're placing a 1 year data retention on
the data that we collect from their NDA.
-
I'd like to say that - right - we - right
- I got this NDA and I lit it on fire, I
-
tore it up and we continued to scrape the
archive within here. No, unfortunately,
-
it was a hard call to make but right you
know there's basically two students and we
-
basically had to make a call whether we
wanted the data to analyze or whether we
-
wanted to spend all of our time kind of
faded - fighting with Facebook's anti-
-
scraping efforts. And so in the end we did
- I did in fact agree to their NDA within
-
here. So the initial data we scraped, we
released. We're still scraping a small amount
-
of data that we do release as well from
here. But unfortunately at this point any
-
of the data that we collected from the NDA
we cannot release within here. If anyone
-
does want to fight with Facebook and
resurrect the crawler within here I would
-
be more than happy for that to happen
within here. Unfortunately, given our
-
engineering constraints it just simply
wasn't feasible for us to do that within
-
here. And so the story is a little bit
different with Google. So Google's archive
-
they began archiving ads on May 31, 2018.
By election day they had 45,000 ads from
-
600 advertisers. Their criteria for
including advertising was much more
-
narrow than Facebook's, so they only
released ads related to U.S. federal
-
candidates and federal office holders
within here. So it is a much more limited
-
set of data that Google released within
here. None of the issue ads that Facebook
-
released. They didn't release any of the
geographic or demographic data by
-
impression, they did release ranges of
impressions and ranges of spend data, and
-
they did release some limited targeting
data from here so they released geographic
-
and demographic targeting information
which Facebook hadn't released in their
-
ads. And their data is available through a
similar keyword based portal. But they
-
also make it available through just a
database, if you want to within here. So
-
this is what their portal looks like
within here. And this is - right - their
-
Bigtable, sorry, their BigQuery database
that they released from here. And so they
-
update it every week within here and you can
download it and analyze the data
-
relatively easily within here. So the last
one to kind of implement their archive was
-
Twitter. Twitter began archiving ads on
June 27, 2018. The scale of ads on
-
Twitter is very small compared to the rest
of them. The scale of their ad network in
-
general is much smaller than Google and
Facebook's. And what they included, it was
-
similar to what Google included in terms
of - right - only federal candidates
-
within here. Kind of closer to the
election, they also said that they were
-
going to release political issue ads.
However, the mechanism of enforcement
-
doesn't appear to exist within Twitter's
system. There doesn't appear to be anyone
-
whose job it is to actually enforce transparency
of ads from here. So we've been kind of
-
manually finding accounts and reporting
them to Twitter within here and then when
-
we manually report them to Twitter,
Twitter then includes them in future
-
transparency kinds of efforts within here.
But it appears like we're basically the
-
ones [Damon McCoy] short laughter it's
become our job to monitor the Twitter
-
accounts and then notify Twitter and then
they'll manually kind of deal with it.
-
Unfortunately, they still don't appear to
have a person that actually manages this
-
process internal to Twitter at this point.
Twitter does however release the most
-
information. So they release exact data
not the range data on impressions and
-
spend information, also by geographic and
demographics and they also include all of
-
the targeting information as far as we can
tell, and their data is available
-
without an account, basically through
their portal, and we've been scraping them
-
and there's been no problems they haven't
blocked us at this point. So we just
-
simply scraped their data and then we
republish it to github at that point. And
-
we've had no problems with Twitter in this
way, and the scale of their data is so small
-
that it's been relatively easy to keep
pace with it at this point. And here's
-
just a picture of the Twitter transparency
archive and again they have a list of all
-
the Twitter accounts that they include in
their transparency archive. So we can
-
monitor this and then we can monitor other
people that we know that are politically
-
active when we see them doing paid
advertising then we can notify Twitter and
-
then Twitter will include them in their
transparency archive normally within like
-
a week, or so of here. And so this is,
this is kind of the background that you
-
need to understand the transparency
archives. So now we have a data set that
-
we can begin to analyze within here. For
Facebook since it was the keyword driven
-
thing at the beginning and it still is, we
were able to collect about 80% of the ads
-
in Facebook's database from there. The
other problem with the API is that it is
-
severely rate-limited at this point. I'm
talking about 3 to 4 queries per minute
-
that we can get through Facebook's API at
this point. And so we kind of did our best
-
effort to collect as much data as we could
from Facebook. About two weeks before the
-
election, Facebook began releasing a
transparency archive that included
-
basically an aggregated list of all the
advertisers and how many ads they have and
-
how much they spent, and this is how we can tell
that we got about 80% of the ads from
-
Facebook's archive based on this within
here. And the nice thing about the
-
transparency report is that we could go
back and, now that we know what we're missing, we
-
could readjust our usage of the API and so
now we have virtually 100% coverage of
-
Facebook going forward within here.
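
The coverage estimate described above, comparing what was collected against the aggregated per-advertiser report, can be sketched like this. It is a minimal illustration under assumed inputs: both arguments are plain dictionaries mapping an advertiser name to an ad count, which is not necessarily how the report or the collected data were actually stored.

```python
def coverage(collected_counts, reported_counts):
    """Estimate what share of the reported ads were collected.

    Both arguments map advertiser name -> number of ads (assumed layout).
    Returns (overall share, {advertiser: number of ads still missing}).
    """
    total_reported = sum(reported_counts.values())
    total_collected = 0
    missing = {}
    for advertiser, reported in reported_counts.items():
        # Never credit more ads than the report lists for an advertiser.
        got = min(collected_counts.get(advertiser, 0), reported)
        total_collected += got
        if got < reported:
            missing[advertiser] = reported - got
    share = total_collected / total_reported if total_reported else 0.0
    return share, missing
```

The `missing` dictionary is the useful part for going back: it names the advertisers whose ads the keyword list did not reach.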
Twitter - right - we could collect 100% of
-
their data. And again we've republished
the SOL (?) in an easier to process kind
-
of form. Google again - right - we have a
100% of their ads because they're all in
-
the BigQuery database. However when we
started analyzing the data we noticed that
-
for a lot of the ads we're missing the
actual content, the images and text of the
-
ad. It turns out that for Google's ad
network if the ad was originally purchased
-
through a third party advertiser and then
run on one of Google's properties the
-
content of the ad won't be archived within
their system. This is unfortunately a big
-
loophole. So, right, if you're
running a kind of malicious misinformation
-
thing, you can easily unfortunately
circumvent Google's archive at least from
-
archiving your content by simply just
paying for it through a third party within
-
here. It's unclear whether this is a
policy limitation or whether this is a
-
technical limitation on Google's part, but
the outcome is that we only have the
-
content for about 70% of Google's ads, the ones that
were paid for directly on Google's
-
platform, within here. So one of the
first things that we want to do is kind of
-
add some semantic meaning to these ads at
kind of a large scale. And so we played
-
around with a few techniques, some fancy
kinds of natural language processing and
-
things like that. But we found that
there's actually a really fairly simple
-
and effective way of categorizing kind of
the intent of the ad, and that's that most
-
of these ads have a URL of some kind and
a lot of these URLs just point back to
-
like third party services. Like if you're
holding some kind of event you're going to
-
coordinate it with like Eventbrite or
something like that. If you're seeking
-
donations, if you're a Democrat you're
going to use this third party payment
-
processor, they're called ActBlue, if
you're Republican there's like two or
-
three payment processors that you're going
to use for this. So we could simply just
-
look at these really prominent URLs that
occur a lot of times and just kind of
-
manually tag what is the purpose for this.
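
A rough sketch of this URL-based tagging, assuming each ad carries a landing-page URL. The lookup table is a hypothetical hand-made example; the two domains and the labels "donate" and "move" are illustrative only and not the actual table used in this work.

```python
from urllib.parse import urlparse

# Hypothetical, manually tagged lookup: landing-page host -> ad purpose.
DOMAIN_TO_PURPOSE = {
    "secure.actblue.com": "donate",
    "www.eventbrite.com": "move",
}

def tag_ad(landing_url):
    """Return the purpose tagged for the ad's landing host, or None."""
    # .hostname is lower-cased and excludes any port; it is None when
    # the string has no network location.
    host = urlparse(landing_url).hostname or ""
    return DOMAIN_TO_PURPOSE.get(host)
```

Ads whose host is not in the table come back as `None` and stay uncategorized, which is why a method like this only covers part of the data set.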
And by doing this we can tag ads as either
-
just purely informational, that they wanted to
just kind of get some kind of message
-
about the candidate, either positive or
negative, out there; connect ads that are
-
seeking contact information like people's
e-mail addresses, phone numbers, names and
-
things like that. Presumably so they can -
you know - either get them to volunteer or
-
donate money in the future for the
campaign. There's move ads that are either
-
they're trying to get people to vote or to
attend some kind of rally or to volunteer
-
or something like that and then right
there's donation ads. And then finally
-
there's kind of commercial ads. These are
things either they are selling products
-
that are kind of directly political in nature,
like a bobble head of some candidate or
-
they might be like solar panels which have
tax credits in the U.S. and things like
-
that. So there's some kind of commercial
good that's linked somehow to some
-
political messaging within here. So we use
this method and we were able to categorize
-
about 70% of the ads, we took a random
sample of them, we manually checked what
-
we were doing and we found it was pretty
accurate. About 96% accuracy we got using
-
this method. The other thing that we did
is for the top advertisers, so for
-
Facebook the top 75% of the advertisers,
for Google the top 80% of the advertisers,
-
in terms of the money spent by the
advertiser. We went in and we manually
-
categorized what was this type of
organization. Was it a political
-
candidate, was it what's called a
political action committee, so these are
-
the PACs within the U.S., was it a
union, was it a for-profit operation,
-
was it a non-profit operation. So on and
so forth and so we wrote like some regular
-
expressions that got us most of the way
there. Most of them have fairly uniform
-
naming conventions and for the ones that
we couldn't kind of automatically classify
-
we just did it manually, within here.
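
The regular-expression pass over advertiser names might look something like the sketch below. The patterns and labels are hypothetical examples of "fairly uniform naming conventions", not the expressions actually used; anything that matches no pattern is returned as `None` and left for manual classification.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
PATTERNS = [
    ("pac", re.compile(r"\bPAC\b|political action committee", re.IGNORECASE)),
    ("candidate", re.compile(r"\bfor (senate|congress|governor)\b", re.IGNORECASE)),
    ("union", re.compile(r"\b(union|local \d+)\b", re.IGNORECASE)),
]

def classify_advertiser(name):
    """Return an organization type for an advertiser name, or None."""
    for label, pattern in PATTERNS:
        if pattern.search(name):
            return label
    return None  # no pattern matched: classify this one by hand
```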
And then since Twitter had so few
-
advertisers, we just did these all,
manually, within here. Now, right, we can
-
start to do some analysis. So the first
analysis that we did, the easiest
-
analysis, was we looked at the size of the
ads. And the thing that pops out is that
-
the majority of ads on all the platforms
are between $0 and $100. So these
-
are what are normally called the micro
targeted ads, that are typically seen by
-
fewer than 1000 people within here. So
these are very short lived, narrowly
-
targeted ads that are kind of honing in on
a specific demographic within here. So
-
these are these micro targeted ads within
here. And it appears, right, that the
-
majority of ads, especially on Facebook's
platform 82% of them, are of this micro
-
targeted kind of ilk within here. So it
kind of confirms the reporting that people
-
had of this kind of trend of
microtargeting within political
-
advertising. The other thing, based on our
categorization we can look at how the
-
different platforms were used from within
here. The problem with these numbers is
-
that there were different inclusion
criteria within each of these databases.
-
And then right. Finally, we can kind of
look at the different types of advertisers
-
on these kind of platforms. And again it's
hard to read too much into these numbers
-
because again, right, Facebook included
much more of the commercial stuff. So
-
we're going to see a lot more of the
commercial stuff within here. And the
-
final analysis of the entire data set that
we did was looking at right kind of
-
basically the ramp up to the election. We
cut this off in late October. This
-
analysis was done for a paper. So the due
date of the paper was ironically November
-
6 within here. So we cut it off a few
weeks later and we haven't regenerated the
-
contents since then. The one thing that
you can see is at the top there is that
-
green Spike. That's kind of the move ads.
So right, closer into the election the
-
campaigns were kind of doing sophisticated
get out the vote kinds of ads, within
-
here. So there were really sophisticated
kind of microtargeted ads that get out the
-
vote. Where like, it was almost kind of
spooky where like they knew where the
-
person lived that they were targeting and
so they gave them like directions on how
-
to get from where they live to their
nearest polling place within here. So
-
there are these really sophisticated kind
of get out the vote efforts that were
-
being run online, within here, towards the
end of the campaign. To kind of give you
-
more of a kind of apples to apples
comparison of these different ad
-
platforms, we also did some analysis kind
of narrowing each of the different
-
advertiser types to the ones that were
made transparent by all three platforms,
-
which were the federal candidates only.
And so this can give you some idea of kind
-
of a scale of these things. And we can see
that when we narrow it here we can still
-
see that Facebook has a lot more
advertisers and a lot more ads compared to
-
Google. However the spending numbers are
kind of comparable here. For Facebook
-
impressions and spends are ranges, that's
all that Facebook releases. For Google the
-
impression data is ranges, however we can
get exact spend data, because Google
-
basically released a weekly report of
exact spend numbers, aggregated by the
-
different advertisers, within here. So we
can use that, to get an exact number of
-
the spend. And again, right, Twitter's
numbers are much smaller in terms of
-
everything, within here. And we redid some
of our analysis to just see whether our
-
effects were simply a distortion based on
what was included in the archives. So
-
right we redid our ad size analysis and
even when we limit it to federal
-
candidates we can see this still holds,
that a lot of the ads on Facebook are also
-
these micro targeted ads. And they are
still micro targeted ads on the other
-
platforms, as well, within here. And right
this microtargeting of course varies
-
depending on the advertiser. So you take
someone like President Trump and he does a
-
lot of microtargeting. So almost all of
his ads probably about 90%, 95% of his ads
-
are micro targeted, within here. You look
at other candidates and they do much less
-
microtargeting, within here. So there are
definitely different strategies used
-
by different advertisers, within here. But
when we look at it in aggregate, it still
-
appears that microtargeting is a very
popular strategy across advertisers. We
-
can also, right, look at some of the spend
type by ad type and this kind of shows you
-
a little bit how the different platforms
are used, within here. So Facebook's
-
platform looks like it's a little bit more
kind of informational; it's still used a
-
lot for donations, whereas Google's
platform is used a lot more for donations
-
and a lot less for a kind of informational
ads and to connect within here. It's
-
really kind of hard to read anything into
Twitter's data because it's such a small
-
set of data. But from the data that we do
have it looks like there's a lot more kind
-
of collection of e-mails and things like
that, within here. The other analysis that
-
we did on the federal candidate ads was to
look at, that for Facebook in particular
-
right, we have the geographic impression
data from here. So we can effectively look
-
at how many states were targeted by each
ad with a Facebook advertiser. And the
-
interesting thing here is that right.
There was no presidential election. So
-
basically all these campaigns were
operating in one state. So their
-
constituents for all these elections were
essentially in one state, within here. And
-
so if you look at the inform ads, right,
most of those are shown in a very small number of
-
states. So the inform ads are mostly being
shown to the constituents that are
-
actually voting for that candidate.
However, if we look at that bottom line,
-
the kind of gold line, those are the
donation ads. And we can see that they
-
were fundraising in many more states
outside of their constituency, within
-
here. So FiveThirtyEight did an
interesting analysis of one particular
-
candidate, Beto O'Rourke. He was a
candidate for Senate in Texas, Texas is a
-
very conservative state in the U.S., and
he did surprisingly well, within here. And
-
he kind of embraced online advertising and
online donations seeking, were kind of
-
cornerstones of his election, within here.
And so FiveThirtyEight did an analysis of
-
his donation records in the U.S., at the
federal level. All donations to candidates
-
have to be reported to the Federal
Election Commission. So this is all in a
-
database from the Federal Election
Commission that the FiveThirtyEight people did
-
analysis on. And they kind of confirmed what
we saw on the donation ads, that he was
-
getting about 52% of his donations from
Texas and 48% from other states, primarily
-
kind of from coastal states that tended to
lean more liberal, like New York,
-
California, Washington and places like
that, was where he was donations seeking.
-
So this appears to be a very effective way
of getting small dollar donations kind of
-
throughout the U.S. within here, through
this online advertising. The last thing that
-
I'm going to talk about is the ad
targeting. Facebook didn't directly
-
release the ad targeting. However, we were
lucky enough that ProPublica made a
-
browser plugin that people can install in
their browser, and that browser plugin
-
would identify what it thought were
political ads, based on a machine learning
-
algorithm. And for the political ads it
would upload these to their server along
-
with the targeting information. So, for
those of you with a facebook account, if
-
you're seeing ads you can actually click
on that ad kind of in the upper corner of
-
the ad and you can see why is this ad
targeting me, within here. And Facebook
-
will tell you a little bit, not all of why
you were targeted for this particular ad.
-
They will essentially show you the two
broadest categories of why you were
-
targeted for this particular ad, through
this feature they've added to their
-
platform. And this is actually
kind of interesting, this is something
-
that if you're a user of Facebook, I
highly recommend that you do. Because I
-
started doing it, and it was kind of eye
opening, as to the level of targeting that
-
was being done in terms of advertising.
That's kind of one thing, that we've
-
definitely learned from this is that when
you're seeing an ad, oftentimes there's a
-
very specific reason as to why you're
seeing that particular ad, within here.
-
And so we felt that it was very important
to, as much as we could, understand this
-
targeting that was going on within
Facebook's platform. So ProPublica had
-
this browser plug-in and they had this
data set that anyone can analyze, within
-
here. So if you do have Facebook and
you're located within the US I would
-
highly recommend that you install this
plug-in, because it helps us to kind of
-
understand the political advertising in
terms of the targeting, within here. So we
-
took ProPublica's data set and we
effectively joined it with Facebook's ad
-
transparency archive, within here. This
required us to scrape Facebook's ad
-
archive, because we needed the ad ID and
this is something that they don't expose
-
through their API, currently, within here.
However, they do expose it through their
-
user portal, within here. So we scraped
their user portal to join the specific ads
-
that were in the ProPublica data set to the
archive dataset, within here. And we were able
-
to join about 75% of the ads from here.
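A minimal sketch of the kind of join described here, assuming both data sets have already been reduced to records that carry a shared ad ID. The field names (`ad_id`, `targeting`, `sponsor`) and the sample rows are hypothetical and are not taken from the project's scripts; the sample is simply sized so that three of four collected ads match.

```python
from typing import Dict, List, Tuple


def join_on_ad_id(collected: List[dict], archive: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Join crowdsourced ad records to archive records on a shared ad ID.

    Returns (matched, unmatched). Unmatched records are ads seen by the
    plugin that have no counterpart in the scraped archive.
    """
    # Index the archive once so each lookup is a dictionary access.
    archive_by_id: Dict[str, dict] = {row["ad_id"]: row for row in archive}
    matched, unmatched = [], []
    for ad in collected:
        hit = archive_by_id.get(ad["ad_id"])
        if hit is None:
            unmatched.append(ad)
        else:
            # Merge archive fields with the targeting fields from the plugin.
            matched.append({**hit, **ad})
    return matched, unmatched


# Hypothetical sample data, for illustration only.
collected = [
    {"ad_id": "a1", "targeting": "list"},
    {"ad_id": "a2", "targeting": "interest"},
    {"ad_id": "a3", "targeting": "lookalike"},
    {"ad_id": "a4", "targeting": "list"},
]
archive = [
    {"ad_id": "a1", "sponsor": "Example PAC"},
    {"ad_id": "a2", "sponsor": "Example Corp"},
    {"ad_id": "a3", "sponsor": "Example Candidate"},
]

matched, unmatched = join_on_ad_id(collected, archive)
match_rate = len(matched) / len(collected)
print(f"joined {len(matched)} of {len(collected)} ads ({match_rate:.0%})")
```

Keeping the unmatched records, rather than dropping them, is what would make it possible to study which ads the archive is missing.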
There were a lot of ads that were
-
collected by the ProPublica data set, that
just simply weren't archived by Facebook's
-
transparency archive. It misses things,
within here. It's imperfect as to how it
-
does things. And this would be another
interesting analysis to do, to understand
-
what is Facebook missing in their ad
transparency archive and this ProPublica
-
data set can allow you to somewhat do
this, although through bias of who
-
installs the ProPublica plug-in in the
first place. So we join these two data
-
sets, again with the caveat that the
ProPublica data set is, right, it's
-
obviously biased by the set of people that
installed it, which are probably not going
-
to be a normal representative set of
Facebook users, within here. But
-
unfortunately, it's the best thing that we
have in terms of a data set that releases
-
the targeting information, within here.
And so we collapse it into three different
-
categorizations of targeting, within here.
I'll just quickly explain Facebook's ad
-
targeting platform for people that don't
know about it. So one way to target ads
-
is, right, through interest or segments,
right, age segments, gender segments or
-
interests like I showed you before, within
here. So this is one way to target ads
-
within Facebook's platform. Another way to
target ads is through uploading lists of
-
information. So you can upload lists of
people's phone numbers, people's email
-
addresses or their names. And then when
you upload this list Facebook will find
-
those profiles within their database, so
they'll basically join those emails with
-
the emails that were entered by the users
accounts, and then they'll target these
-
people. So they'll create what they call
an audience of these people through this
-
personally identifiable information and
then they'll target them, through this
-
method. The final kind of major form of
targeting that Facebook offers is through
-
what they call these lookalike audiences.
So this is where you can upload PII
-
information, like email addresses, phone
numbers, names. Facebook will link them to
-
their accounts and then they'll look at
kind of the interests and things that
-
these users have and then they'll find you
other users, not these users, but other
-
users, that have a similar kind of profile
to these users within here. So these are
-
the lookalike audiences that Facebook
offers within their platform. And so we
-
categorized it by this and again by
advertiser type, within here. So the thing
-
that stands out, right, is that the
for-profit companies are doing a lot of
-
targeting based on interests and segments.
So they probably don't know who the
-
people that they want to message are and
they're doing it mostly by interests and
-
segment. Whereas when you look at the PACs
and the political candidates they have
-
lists. So they have a lot of lists of
people's you know email addresses, phone
-
numbers, names, and things like this. And
they're plugging these into Facebook's
-
system. And this is how they're targeting
a lot of people, within here, is through
-
these lists. And this was expected, but
it's interesting to kind of quantify how
-
much of this is happening. And then the
lookalike audiences are also being used, a
-
good deal by everyone within here. And
this kind of makes sense, right? Because
-
if you have a list of people then you
advertise to them but then right you have
-
this lookalike audience of people that are
similar to them that are also perhaps good
-
people to advertise to, as well, within
here. The other thing we can do is break
-
this down by the intent of the ad here,
and this shows the difference even more
-
starkly, of the difference in behavior
between the commercial people and the
-
noncommercial people. The commercial
people are targeting mostly based on
-
interest, whereas the other people that
are, say, looking to connect with people,
-
they're the ones that are using the most
lookalike audiences. And this makes
-
perfect sense because right the connection
ads are there to get people's e-mails,
-
addresses, phone numbers, names and things
like that. So when you use the lookalike
-
audiences then you can, right, generate
more lists of people that'll convert for
-
whatever you want and then you can
retarget them with the direct lists
-
targeted ads, later on. So this all makes
pretty good sense when you look at how
-
this is behaving, from here. But again
it's interesting, right, to kind of make this
-
transparent for people to understand how
targeting is happening within the U.S.
-
political advertising sphere, within here.
So these were pretty much the two major
-
analyses that we did in terms of
targeting, within here. The final part and
-
the part that kind of makes the juiciest
of stories is kind of the more dubious
-
advertisers that are advertising within
these platforms in terms of political
-
advertising. So we kind of call these more
politely kind of "new types of
-
advertising", within here. The first type
is one that you would pretty
-
much expect, so this is this corporate
astroturfing kind of stuff, that's going
-
on, within here. We see these ads for
Citizens for Tobacco Rights. And I
-
pretty much expected that you look up this
group and it's probably going to be some
-
you know quasi nonprofit that's supported
by some industry money from the tobacco
-
lobbyists, or something like that. That's
pretty much what I expected to see when I
-
saw these ads. You go to this website and
it's actually pretty honest as to what it
-
does. This is probably because right of
all the lawsuits and regulations around
-
tobacco in the U.S. in advertising. But
the website clearly states, right, that
-
it's operated by Philip Morris, the
tobacco company, within here. And this
-
actually isn't a legal entity, this
Citizens for Tobacco Rights. It's just
-
simply a website that's been stood up,
that's owned and operated by Philip
-
Morris, as far as we can tell, within
here. And this gets to a big problem with
-
Facebook's transparency archive, which is
that they don't actually vet that
-
disclaimer string of the sponsor, within
here. So pretty much anyone can type
-
anything that they want within that
disclaimer string and Facebook will allow
-
you to run it. We've tested it and as far
as we can tell, you can't say that you're
-
from Facebook, Instagram or that you're
Mark Zuckerberg, they'll block that. But
-
pretty much anything else that you type in
there they'll allow that ad to run, within
-
here, with no vetting. So we discovered
this, we politely, privately mentioned it
-
to Facebook. Some reporters kind of
trolled Facebook within here and so there
-
was a reporter that trolled Facebook and
opened up ads for all the senators, within
-
here, on Facebook. And of course Facebook
approved them all, from within here, and
-
they did some other things to troll
Facebook where they inserted some other
-
advertisements, within here. But the point
is, that that disclaimer string is not
-
vetted within here. Google actually does vet
that disclaimer string within there, so
-
they require either a tax ID number or a
federal election committee I.D. number and
-
they actually do vet it and they publish
that tax I.D. number or federal election
-
I.D. number along with the disclaimer
string, within here. Which makes it really
-
easy to track down advertising on Google.
On Facebook, because right they can
-
basically type in whatever they want in
the disclaimer string, it makes it much
-
more difficult to actually link these
advertisers. And sometimes just outright
-
impossible, if the disclaimer string is
made up or just too mutilated in some way
-
or form, within here. So this is
definitely a problem, where we have these
-
lobbyist organizations, or in this case
not even lobbyist organizations, just
-
industry, that can effectively lie about
who's paying for this ad in Facebook's
-
platform. The other thing we found were
what is now kind of being called these
-
junk media outlets. So these are for-profit
outlets that are claiming that they're
-
doing kind of news operations. But, right,
it's not really traditional kind of
-
reporting journalistic things. It's more
just kind of propaganda messaging, within
-
here. So there is this group called New
American Media Group LLC. They also ran under
-
the name of New Democracy, or sorry
Democracy Now was their other name, within
-
here. And so they ran this, within here.
We tracked down these LLCs and they were
-
just simply shell companies and that kind
of led to nowhere, within here. We worked
-
with a journalist from The Atlantic that
actually did a lot of digging into the
-
shell companies. And he was able to,
through his basically investigation, link
-
these companies to the actual entity that
created these shell companies and was
-
running these ads, within here. And so
when we did our analysis of this, this
-
company, basically this third-party
advertising company was creating these.
-
They're meant to look like kind of
grassroots kind of organizations. There
-
were, a lot of them were kind of targeted
at more conservatively leaning groups, but
-
then they would bombard them with liberal
messaging, within here. So they would
-
create these fake communities that looked
more conservative. And then once they
-
attract an audience they would bombard
them with these liberal kinds of
-
messaging, within here. And so this
particular company is based in Colorado.
-
It's called MOTIVE AI. Apparently, it's
hoping to become the Cambridge Analytica
-
of the liberal side. I don't know if
that's something to aspire to or not. Some
-
other journalists also did some digging,
within here. There were some journalists
-
from ProPublica that did some digging,
within here. They found more of this
-
astroturfing by political lobbyist groups
and things like that. Big oil, insurance
-
companies, again when they advertised on
say Google's platform they would be honest
-
about their disclaimer string, and then
when they advertised on Facebook's
-
platform they would often kind of
obfuscate their disclaimer string, to
-
make it more difficult to link them
together. And so they unmasked a whole
-
bunch of these other kinds of junk media
operations, as well, that were kind of
-
spreading propaganda, within here. I'm
picking on Facebook a lot. Again Google
-
does vet the tax I.D. number of these
people, but you see something like, right,
-
this DIGICO LLC that paid for some ads. So
you track this down, and this is again one
-
of these third-party advertising agencies.
It's easy to track down because of the tax
-
I.D. number. But it still doesn't actually
tell you who paid for the ad. It just
-
tells you the third party that, right, it
presumably was paid on behalf of someone
-
else to run these ads, from here. So this
is a big problem with these disclaimer
-
strings, that oftentimes they don't
actually identify the person that's paying
-
for the ad. So to kind of wrap this up,
within here, after our kind of experiences
-
looking at these transparency archives I
would say they're fairly adequate to
-
understand good actors. So we could fairly
well understand how good political
-
advertisers were behaving in Facebook's
platform. However, right, for the bad
-
advertisers, we probably missed a lot of
them because they could just simply type
-
in lots of different disclaimer strings
and easily avoid our analysis, at this
-
point. None of these current archives have
it just right yet. All of them have
-
issues, right. Facebook isn't providing
good access to their data. They're not
-
releasing targeting information. Google is
missing 30% of the content because of third
-
parties using their advertising system.
They're not releasing spend and impression
-
information based on demographics, within
there. Twitter just simply hasn't hired
-
someone to enforce the policy of
transparency, well, within here. And
-
unfortunately our experience throughout
this process has been that these companies
-
are oftentimes reactive, instead of
proactive, within here. Which means that,
-
right, we have to continuously put
pressure on them, in order for them to
-
kind of improve these archives, within
here. So this is unfortunately kind of the
-
state that we're in, within here. And I'm
sure, one thing that I really want to give
-
a shoutout to, right, is the people at
these companies that are actually trying
-
to build these transparency archives. And
I want to give them a lot of credit for
-
taking on this task, that's probably not
well rewarded within their companies, of
-
building these transparency archives,
within here. And so my hope is that by
-
applying pressure we can get them more
support to kind of get more resources and
-
be able to make things more transparent, within
their companies, as well. Because I hope
-
that, right, this puts us in better shape
to understand the 2018 elections, but 2020
-
is another presidential election and my
hope is that we'll continue to improve
-
these archives, so that we'd be in a much
better position to understand both the
-
good and the bad advertisers by 2020, within
here. However this is going to take
-
probably regulatory pressure, legal
pressure, pressure by technologists and
-
things like this to improve these
archives, at this point. So with that,
-
again, I have my collaborators, that
aren't here on the stage, but they
-
definitely did a lot of the heavy lifting
to make this happen, within here. And
-
again all of our tools and most of our
data except for the Facebook data, that's
-
under NDA, is available through our
GitHub, there. And so with that I will
-
open it up to questions.
-
applause
-
Herald: Thank you so much Damon. I know
-
that there are a few questions among the
audience. So, microphone 6 please.
-
Question: So [Name] on the IRC is asking
"Have you looked at links between the
-
advertisers and do they use the same
images or text for instance?".
-
Answer: This is a really good question.
This is actually one of the analyses that
-
we're currently doing. So we're starting
with the text, because that's obviously
-
the easiest. But we're also exploring some
image clustering algorithms, as well. To
-
cluster the advertisers across platforms
and also within platform because we're
-
finding a lot where, you know, they create
multiple shell companies, where they just
-
lie about their disclaimers and so this is
definitely something that we're focusing
-
on, is better clustering of the
advertisers. Because like that group
-
MOTIVE AI, even though they created the
different LLCs they were running the same
-
images and videos across their different
LLC shell companies.
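A minimal sketch of the simplest version of the text-based clustering mentioned in this answer, assuming each ad record carries its disclaimer string and its text. It only links sponsors whose ad text is identical after normalization, which is a stand-in for illustration and not the team's actual clustering method; the field names (`sponsor`, `text`) and the sample ads are hypothetical.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split())


def sponsors_sharing_text(ads):
    """Map each normalized ad text to the sponsors that ran it.

    Only texts used under more than one disclaimer string are returned,
    since those are the candidates for being the same advertiser.
    """
    by_text = defaultdict(set)
    for ad in ads:
        by_text[normalize(ad["text"])].add(ad["sponsor"])
    return {text: sorted(names) for text, names in by_text.items() if len(names) > 1}


# Hypothetical sample ads, for illustration only.
ads = [
    {"sponsor": "Shell A LLC", "text": "Vote NOW! Protect our parks."},
    {"sponsor": "Shell B LLC", "text": "vote now - protect our parks"},
    {"sponsor": "Other Org", "text": "Support local schools."},
]

shared = sponsors_sharing_text(ads)
for text, names in shared.items():
    print(f"{names} all ran: {text!r}")
```

Exact matching misses lightly reworded copies, so a real analysis would likely need fuzzy text similarity and, for images and videos, perceptual hashing or a similar technique.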
-
Herald: Great thank you. Please if you
have any questions, queue up by the
-
microphones. Microphone number 1 please.
Question: Hi, Oliver Moldenhauer? Thanks a
-
lot for the talk. Definitely one of the
best I've seen here so far. Two questions.
-
A: Why do those transparency archives
exist? Was there some law or political
-
process around that? And B: As we are
nearing the European election next year,
-
what kind of data is available for Europe?
Answer: Those are both good questions.
-
Again I'm not internal to one of these
companies, so I can just speculate as to
-
why these transparency archives exist. But
my guess is, right, that this was
-
reactionary. So Mark Zuckerberg and high
ranking officials from Twitter and Google
-
were hauled in to testify in the House and
Senate, and this is them trying to self
-
regulate instead of having regulation
imposed on them by people. So that again,
-
this goes to the pressure part is that
there was regulatory pressure put on them,
-
the threat of regulatory pressure and so
that's what made them do these
-
transparency archives. In terms of what's
available in Europe. I guess as long as
-
the UK is still in the EU, kind of
teetering, Facebook has started to make ads
-
transparent in the UK. They also make them
transparent in Brazil and they're going to
-
make them transparent in India. And I
think they have plans to make them
-
transparent in other places, in the EU as
well. However, they haven't done that.
-
However, again this goes back to the
pressure part. So there's no API for the
-
other countries, there's only an API for
the US and that might be because we put
-
pressure on them by scraping them and
publicly releasing their data. And, right,
-
there's no transparency reports for other
countries, as well, there's only
-
a transparency report for the US. And again
that might have been because we applied
-
pressure and we were publishing numbers.
Some of the numbers in terms of spend were
-
very low, because, right, they were just
giving us ranges. So we might have been
-
making them look bad, when we took the
bottom of the range of their spend and they might
-
have wanted to correct that with their own
transparency archive, as well. So again, a
-
lot of this unfortunately requires
pressure to get them to improve their
-
transparency efforts.
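A minimal sketch of why range-only reporting can make totals look low, assuming each ad comes with a lower and an upper bound on spend. The field names (`spend_low`, `spend_high`) and the bucket values are hypothetical and do not reflect the archive's real buckets.

```python
def spend_bounds(ads):
    """Sum the lower and upper ends of each ad's reported spend range."""
    low = sum(ad["spend_low"] for ad in ads)
    high = sum(ad["spend_high"] for ad in ads)
    return low, high


# Hypothetical spend ranges in dollars, for illustration only.
ads = [
    {"spend_low": 0, "spend_high": 99},
    {"spend_low": 100, "spend_high": 499},
    {"spend_low": 100, "spend_high": 499},
]

low, high = spend_bounds(ads)
print(f"total spend is somewhere between ${low} and ${high}")
```

Reporting only the bottom of the range is the conservative choice, but across many ads the gap between the two totals can become very large.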
Herald: Great thank you. Microphone number
-
two please.
Question: So you mentioned
-
FiveThirtyEight and their work on the
donations. Do you think it makes sense
-
to combine the data you gathered with what
they have to look at election outcomes,
-
like, election results and turnout and
stuff like that?
-
Answer: Yes. Actually this is the number
one project on our road map, right now.
-
Is, actually Google has processed the FEC
information and they've made this
-
information available via their BigQuery
database. So we've downloaded this, we've
-
manually linked the Facebook advertisers
and the Google advertisers to the FEC data
-
and now we're doing the regression models,
specifically focused on the donation ads
-
first. Because those are what are reported
to the FEC, at this point. So we are
-
essentially trying to understand how
effective these donation ads are at
-
actually driving donations, within here.
Herald: Thank you. Microphone number 4
-
please.
Question: Hi. First of all thank you Mr.
-
McCoy and your team for this very
interesting research. I was wondering,
-
whether you know if there is any
follow-up research conducted by political
-
scientists, sociologists etc. analyzing
the political repercussions of these ad
-
campaigns.
Answer: Yes, so we're aware of a few
-
efforts. I don't want to out the teams
that are doing them, in case they don't
-
want to be outed. There's nothing
that's been published, publicly I believe
-
on this. But we're definitely trying to.
That's one of the main goals of kind of
-
our overarching online political
advertising transparency thing, is to try
-
and get as much data as we can in the
hands of less technical people in an easy
-
way for them to analyze. And so this is
basically the primary goal of our project,
-
in here. So we've been working as hard as
we can to get political scientists to stay up
-
to speed on the data. And this is why it's
really unfortunate that Facebook has its
-
NDA in place for their particular data,
because this makes it very difficult for
-
us to share and collaborate on that
particular data. Which puts pressure on us
-
unfortunately as being the only ones that
can do some of this analysis right now. So
-
this is why I would love to apply
enough pressure to Facebook, to get better
-
access to their particular data.
Herald: Yes. And the question from the
-
Internet please.
Signal Angel: So Nomad is asking "Why are
-
those advertisements considered political or
election interference in the USA. Can't you just
-
see, that someone paid money to display
that content and conclude its purpose is
-
to promote an agenda or manipulate them?".
Answer: This is a good question. Right, a
-
lot of this goes to the tactics that
they're using here. So again they're
-
creating these communities, that they're
making look like they're grassroots
-
communities and then they're kind of
sucking people in with these ads, that up
-
until recently had no disclaimer string
on them. So you had no idea who paid for
-
them. So they appear to be paid for by
kind of these grassroots organizations. So
-
you felt like you were, kind of, part of a
grassroots movement, in joining these kinds
-
of communities. I think this is the really
scary, kind of subtle thing. And you
-
might not realize why you're being
targeted for these particular ads or who
-
was behind these particular ads. So, I
think it was really easy for people to
-
kind of get unwittingly, kind of, duped
into joining what looked like these
-
grassroots campaigns. So that's why I
think improving these disclaimer strings
-
and showing who is really behind these
communities and these advertisements is
-
really important, to dispel this notion of
these fake grassroots communities, that
-
are luring people in within here. So I
think that's one of the big things that
-
can be gained by these transparency
archives. But it requires improvement of
-
the transparency archives, to do that.
Herald: Microphone number 3 please.
-
Question: Yes. So I'm curious about the
efficacy of some of the advertisements
-
that are on Facebook and Twitter. And I'm
wondering is any group like the ProPublica
-
web extension checking the engagement
rate? Like the number of comments, the
-
number of views and the number of shares,
to like kind of get an estimate of, OK
-
this big grassroots community is building
up a number of followers and these
-
followers population sizes and whatnot.
Answer: Yeah, this is again a really good
-
question. This is something that we are, I
would certainly encourage other people to
-
potentially do as well. So the problem is
that a lot of that information isn't
-
exposed by the transparency archives. This
is more of what they call kind of the
-
organic information, the non paid for
information, within here. And so this is
-
stuff that none of the platforms are
releasing. And so it requires kind of a
-
scraping operation, essentially, to gather
this information and collect it. And it's
-
something that we're definitely thinking
about how to efficiently do, is how to
-
efficiently scrape and collect this
information. Because this is very hard
-
because, right, you go against the anti
scraping teams of these companies, that
-
are well resourced. And this requires
accounts, and these accounts are going to
-
be shut down and detected. So this is
something that we're trying to pilot to
-
understand. Our other idea of how to do
this potentially is try and crowdsource
-
this information. This is similar to how
ProPublica crowdsourced it for the browser
-
extension information. We could
potentially crowdsource it, where you
-
know, when people interact with these
communities or these ads the plug-in could
-
potentially crowdsource that information
back to us. And then we would have to
-
figure out some strategy to sanitize that
information in some way. Because at that
-
point you might have some sensitive
information they are collecting. This is
-
something that we're thinking about. We're
cautious, I think, rightly so because this
-
can start stepping on, again, more
sensitive information that's available
-
from within here. But I think it's
definitely key to understanding the
-
effectiveness of these ads. Something that
we're going to have to do or we're going
-
to have to convince Facebook somehow to do
on our behalf in order to really
-
understand the effectiveness of these ads.
Herald: Thank you. Last question for
-
microphone number 1.
Question: All right. At the beginning of
-
your talk you explained how Russia
influenced the elections. I'm curious
-
about the attribution. Is there possibly
any doubts at any instance that you
-
presented that it was not Russia or maybe
some other country, China or Iran? How do
-
you know, and did you check the facts?
Answer: I mean, that's a good question.
-
Unfortunately, right, the national
security agencies don't release the
-
sources of their information. There's
another investigation done by the
-
Department of Justice by Robert Mueller,
that did release some more information
-
about this, within here. I've looked at
that information and it looks, you know,
-
right, you can never 100%, unequivocally
state that it was Russia. It could have
-
been a false flag operation. But I think
that pretty much the overwhelming
-
information that everyone has found when
they've investigated this has pointed at
-
Russia and the organizations that were
prosecuted by Mueller.
-
Herald: Damon McCoy, thank you very much.
Please give them a great round of applause.
-
Applause
-
35c3 postroll music
-
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!