35c3 preroll music
Herald: This talk will be held by Damon
McCoy. He will be explaining online U.S.
political advertising and he has been
working with researching like how
different online communities basically
behave around many different topics. But
this is what he's going to talk about
today so please give him a great round of
applause.
Applause
Damon McCoy: Thank you everyone for coming. I'm
up here speaking and I'm the only one that
wanted to fly to Germany over Christmas
and New Year's. However there were three
real people that were really key in
helping out with this research and before
we get started, I just want to credit
them. One is my grad student Laura
Edelson. She did a lot of the analysis
that you're going to be seeing and generated
all the graphs. One of the undergraduate
students from our NYU Shanghai
campus, Shikhar, did a lot of the work to
collect all the data that you're going to
see here. And then Ratan Dey, who is a
professor at NYU at the Shanghai campus,
also helped out with kind of our initial
efforts of collecting some of this data.
And so before we get started I guess I'll
give a little bit of an introduction about
myself. I'm a professor at the NYU Tandon
School of Engineering. As was mentioned
before, I do a lot of stuff kind of looking
at how technology kind of impacts the
security and privacy of, you know, society,
groups of people and things like that. So
this was really kind of an opportunistic
project that kind of captured the impact
of online advertising in the political
sphere of U.S. campaigns. And also quick
plug. So everything that I'm going to be
showing you, most of the data, scripts and
things like that, we've put in a GitHub repository
that's accessible by anyone that wants to
analyze the data or look at our scripts
and improve them or things like that.
Applause
Thank you. This is the first
time that I've given this talk outside of
the U.S. so let me just start with some
quick explanation as to how U.S. elections
work for those of you that might not know
about this. So every two years in the U.S.
we hold federal elections. These are
elections - right - that impact all of the
states within the U.S. And so every four
years we have an election for president,
2018 were our last elections. This was not
a presidential election year so the
elections were for the Senate and the
House seats at the federal level. And then
we also had elections for state and local
positions as well within here. And some
of them are captured in our data,
especially our Facebook data, not so much
in our Twitter and Google ad transparency
data that we have here. So this talk will be
focused on the 2018 elections
that happened on November 6th. Election
day is always the first Tuesday in
November in the U.S. every two years. So
to begin with in the background right
probably some of you know about this, some
of you might not know about this, but in
the 2016 elections which were a
presidential election year there was right
this election interference that happened
here. And so Facebook has released these
ads. These ads were paid for by a Russian
company the Internet Research Agency that
ran these ads. And Facebook released these
to - right - the Senate and then the
Senate re-released these publicly to
people. And so this is an ad basically
trying to disenfranchise people from
voting in the elections. And you can see
right it's targeted at people in the U.S.
of a certain age range and interests like
Martin Luther King, African-American
culture, African-American civil rights.
Facebook doesn't actually allow you to
directly target to people based on like
their ethnicity. So this is a pretty good
proxy though. If you target these kinds of
interests for this, so this would probably
be fairly effective at targeting African-
American people within the U.S. for this
ad to try and disenfranchise them from
voting in the elections. There were other
ads, right, that tried to do
misinformation, disinformation kinds
of campaigns. So this was an ad that was
again paid for by the Russian agency that
was trying to perpetuate this rumor,
basically an unsubstantiated rumor, that Bill
Clinton has this illegitimate child
within here. And again, right, the
targeting information is targeted at
African-Americans within the U.S. and
African-Americans are kind of a key voting
block for a kind of more liberal,
democratic people within the U.S.
oftentimes. That's probably the other
thing that I should have explained about the
U.S. election system: especially at the
federal level, we effectively have
two parties that, you know, win any
meaningful amount of elections within the
U.S. One of them is the Democratic
party that tends to skew more liberal. So
they're more kind of right for bigger
government, more social services and then
we have the Republicans which skew kind of
more conservative wanting kind of a
smaller government providing less services
and kind of less regulation around things
as well. And so right these are two
examples but there were a whole bunch of
these ads that were shown on Facebook
within here. So, right, pretty much all
of these either tried to
disenfranchise people or tried to kind of
create chaos, kind of polarize people
around the election, oftentimes with
disinformation sorts of things.
And so in 2017 our Office of the Director
of National Intelligence put out a report
- sorry for the big block of text, this
will be the only big box of text in here.
But I thought it was kind of important to
show this because they pretty much
unequivocally state that Russia tried to
interfere in the US elections and that
Vladimir Putin was somehow involved within
this interference. And so this is - this
is pretty much, as far as the national
security agencies, the CIA, the NSA, pretty much
solid evidence that they have that this
occurred within here. And so the other
thing that broke was, right, the Cambridge
Analytica scandal, which broke within
Facebook, where there was this, right,
third party advertising agency that
collected you know a whole bunch of data
on 80 million profiles within Facebook and
then tried to create psychological
profiles for targeting and messaging and
things like that around here. And so these
two particular scandals broke within here
and the first result of this is we have
Mark Zuckerberg in a suit, a real suit, not
a hoodie suit, testifying, right, in front
of our Senate within here. And so he,
right, he testified before House and
Senate committees about the abuses
occurring within Facebook. And he did this
on April 10th and 11th, 2018, within here.
And, right, in here he
did admit that Facebook had made mistakes
and that they need to improve things
moving forward within their platform. The
most kind of tangible outcome from these
testimonies were these transparency
archives that began to appear. And here is
a view of what Facebook's ad transparency
archive looks like. When it was originally
deployed you needed a Facebook account to
interact with it. Now Facebook has dropped
that requirement so anyone with Internet
access, unless you're censored somehow,
can go to this archive and access these
ads. So in the user facing portal for these
ad transparency archives you type in
keywords and then what it basically does is,
right, a pattern matching on the ad text
and other parts of the ad and then it returns
the ads that match that within here. And
so then you can see all the political ads
that match that particular term within
here. Facebook began archiving these ads
kind of at a large scale starting on May
7th, 2018. By Election Day, November 6th,
2018, there were 1.6 million ads paid for
by over 85 thousand advertisers within
Facebook's platform. Facebook was actually
fairly broad as to what they included
within their political archive. They
included any ads related to US elections,
either federal, state or local elections.
They also included these very important
kind of issue ads as we saw when we looked
at the Russian interference a lot of times
the ads didn't mention actual political
candidates, they mentioned kind of
polarizing issues within the U.S. So
Facebook also included these ads of
national political importance. They had a
list of, I think, about 13 different
criteria, the last of them being values.
And so it was a fairly encompassing set of
ads they tried to include within their
archive. Along with the text and images
or videos of the ad, they also included
basically ranges of geographic impressions
and demographic impressions. So, right,
state level impression information in some
kind of ranges, and demographics by gender
and by age, kind of bucketed within here.
And they did this again for impressions,
and then they also included some spend
information, again kind of in ranges. So
they gave ranges of 0 to 99 dollars, a
hundred dollars to say like 500 dollars,
501 dollars to 2000 dollars, and so on and
so forth within these buckets. One of the
key pieces of information that they did
not release was the targeting information.
So like I showed you before with those ads,
they - right - they have that targeting
information, but Facebook does not release
that within their transparency archive.
They have, right,
they had that user portal where you could
do the keyword search from within there.
However, right, I like to do large
scale data analysis and so I wanted to
basically try and collect all of the ads
within this web portal. And so initially
all they had was this keyword search
portal within here. And so what we did is
we compiled kind of a large list of what
we thought were reasonable keywords:
names of prominent politicians, names of
states, issues within here. And so we tried
to compile this long list of keyword
searches and we began scraping the
portal within here, and I'll tell you the
story of how our scraping efforts went.
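
A minimal sketch of the keyword-driven collection described above, under stated assumptions: the `search` callable, the `ad_id` field and the in-memory `FAKE_INDEX` are hypothetical stand-ins for illustration, not the portal's real interface or the scraper used in this research. It only shows the idea of running many keyword searches and keeping one copy of each ad.

```python
from typing import Callable, Dict, Iterable, List


def collect_ads(keywords: Iterable[str],
                search: Callable[[str], List[dict]]) -> Dict[str, dict]:
    """Run every keyword through `search` and keep one copy of each ad."""
    seen: Dict[str, dict] = {}
    for keyword in keywords:
        for ad in search(keyword):
            # The same ad often matches several keywords, so key on its ID.
            seen.setdefault(ad["ad_id"], ad)
    return seen


# Hypothetical stand-in for a keyword search portal: a tiny in-memory index.
FAKE_INDEX = {
    "texas": [{"ad_id": "a1", "text": "Vote in Texas"}],
    "senate": [{"ad_id": "a1", "text": "Vote in Texas"},
               {"ad_id": "a2", "text": "Senate race update"}],
}


def fake_search(keyword: str) -> List[dict]:
    """Return the ads matching one keyword (empty list if none match)."""
    return FAKE_INDEX.get(keyword, [])


ads = collect_ads(["texas", "senate", "healthcare"], fake_search)
print(sorted(ads))
```

Keying on the ad ID matters because a long keyword list overlaps heavily, and counting an ad once per matching keyword would inflate every later statistic.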
Now currently they also offer an API.
It's still keyword based, their API, and
it's restricted by an NDA, so I'll kind of
flesh out the story of how this goes. So
at the beginning they released this,
kind of towards the end of May, the user
archive, and I played with it and I
realized that this didn't lend itself well
to kind of large scale analysis of these
ads. And so I went to my students
Shikhar and Laura, and Shikhar worked kind
of furiously night and day and within
three days he had a workable scraper that
was able to put in our keywords and then
we were able to scrape all the results
from our keywords within here. And so we
ran the scraper for about 2 months and
then we released a report, just kind of a
very general statistical report, and we
released the data in our GitHub archive at
that point. After that, about 2 weeks later,
Facebook began anti-scraping measures
within here. And so, right, this kind of
hampered our efforts to scrape Facebook's
archive at this point. I don't want
to attribute any malice. I don't believe
that Facebook was targeting just our
scraping efforts; they were targeting
everyone's scraping efforts. Whether it's wrong or right
to block people from collecting data on a
transparency archive, I might kind of
quibble with them on that and say they
might want to provide better access to the
data within their transparency archive.
But this was the choice that Facebook made,
to kind of clamp down on the scraping
within here. So we tried to fight with
them a little bit - right - kind of a cat and
mouse game. You know, we make some changes
to our scraper to avoid their anti-scraping.
They do some things on their end
to block our scraper and probably other
people's scrapers that are doing similar
things to us as well within here. And so
this persisted for probably about 2 weeks,
and then Facebook basically deployed their
API within here. However they said right,
their API is very limited and still in
beta at this point. So these were part of
the terms and conditions from here. One of
the ones that I found kind of the most
unease (?) is that it limited it only to
U.S. people so we could essentially only
very closely work with U.S. people within
here and at least it did kind of - it
limited the types of people that we could
work with within here. And so, right,
unfortunately this kind of ruled us out
from working closely with journalists from,
you know, really good news organizations
like the Guardian and so on that just
happened to have the misfortune of being
located somewhere outside of the U.S.
within here. Maybe the good fortune, yes.
And then the list of restrictions
continues. They also placed a data
retention limit on it, so we could only retain
the data for one year. Again, on data
retention: Facebook's data retention on
their archive is 7 years within here, but
they're placing a 1 year data retention on
the data that we collect under their NDA.
I'd like to say that - right - we - right
- I got this NDA and I lit it on fire, I
tore it up and we continued to scrape the
archive within here. No, unfortunately
it was a hard call to make but, right, you
know, there's basically two students and we
basically had to make a call whether we
wanted the data to analyze or whether we
wanted to spend all of our time kind of
fighting with Facebook's anti-scraping
efforts. And so in the end we did
- I did in fact agree to their NDA within
here. So the initial data we scraped, we
released. We're still scraping a small amount
of data that we do release as well from
here. But unfortunately at this point any
of the data that we collected under the NDA
we cannot release within here. If anyone
does want to fight with Facebook and
resurrect the crawler within here, I would
be more than happy for that to happen
within here. Unfortunately, given our
engineering constraints, it just simply
wasn't feasible for us to do that within
here. And so the story is a little bit
different with Google. So Google's archive
they began archiving ads on May 31, 2018.
By election day they had 45,000 ads from
600 advertisers. Their criteria for
including advertising were much more
narrow than Facebook's, so they only
released ads related to U.S. federal
candidates and federal office holders
within here. So it is a much more limited
set of data that Google released within
here. None of the issue ads that Facebook
released. They didn't release any of the
geographic or demographic data by
impression, they did release ranges of
impressions and ranges of spend data, and
they did release some limited targeting
data from here so they released geographic
and demographic targeting information
which Facebook hadn't released in their
ads. And their data is available through a
similar keyword based portal. But they
also make it available through just a
database, if you want to within here. So
this is what their portal looks like
within here. And this is - right - their
Bigtable, sorry, their BigQuery database
that they released from here. And so they
update it every week within here and you can
download it and analyze the data
relatively easily within here. So the last
one to kind of implement their archive was
Twitter. Twitter began archiving ads on
June 27, 2018. The scale of ads on
Twitter is very small compared to the rest
of them. The scale of their ad network in
general is much smaller than Google and
Facebook's. And what they included, it was
similar to what Google included in terms
of - right - only federal candidates
within here. Kind of closer to the
election, they also said that they were
going to release political issue ads.
However, the mechanism of enforcement
doesn't appear to exist within Twitter's
system. There doesn't appear to be anyone whose
job it is to actually enforce transparency
of ads from here. So we've been kind of
manually finding accounts and reporting
them to Twitter within here and then when
we manually report them to Twitter,
Twitter then includes them in future
transparency kinds of efforts within here.
But it appears like we're basically the
ones - short laughter - it's
become our job to monitor the Twitter
accounts and then notify Twitter and then
they'll manually kind of deal with it.
Unfortunately, they still don't appear to
have a person that actually manages this
process internal to Twitter at this point.
Twitter does however release the most
information. So they release exact data
not the range data on impressions and
spend information, also by geographic and
demographics and they also include all of
the targeted information as far as we can
tell. And their data is available
without an account, basically through
their portal, and we've been scraping them
and there's been no problems; they haven't
blocked us at this point. So we just
simply scrape their data and then we
republish it to GitHub at that point. And
we've had no problems with Twitter in this
way. And the scale of their data is so small
that it's been relatively easy to keep
pace with it at this point. And here's
just a picture of the Twitter transparency
archive, and again they have a list of all
the Twitter accounts that they include in
their transparency archive. So we can
monitor this and then we can monitor other
people that we know that are politically
active when we see them doing paid
advertising then we can notify Twitter and
then Twitter will include them in their
transparency archive normally within like
a week, or so of here. And so this is,
this is kind of the background that you
need to understand the transparency
archives. So now we have a data set that
we can begin to analyze within here. For
Facebook since it was the keyword driven
thing at the beginning and it still is, we
were able to collect about 80% of the ads
in Facebook's database from there. The
other problem with the API is that it is
severely rate-limited at this point. I'm
talking about 3 to 4 queries per minute
that we can get through Facebook's API at
this point. And so we kind of did our best
effort to collect as much data as we could
from Facebook. About two weeks before the
election, Facebook began releasing a
transparency report that included
basically an aggregated list of all the
advertisers and how many ads they have and
how much they spent, and this is how we can tell
that we got about 80% of the ads from
Facebook's archive, based on this within
here. And the nice thing about the
transparency report is that we could go
back and, now that we know what we're missing, we
could readjust our usage of the API, and so
now we have virtually 100% coverage of
Facebook going forward within here.
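
The coverage check described here can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual script: the function name, the dictionary layout (advertiser name mapped to ad count) and the example numbers are all made up. The idea is to compare per-advertiser counts in the collected data against the aggregated counts in the transparency report, which gives both an overall coverage share and a list of advertisers to query for next.

```python
from typing import Dict, Tuple


def estimate_coverage(collected: Dict[str, int],
                      reported: Dict[str, int]) -> Tuple[float, Dict[str, int]]:
    """Return (share of reported ads we hold, ads still missing per advertiser)."""
    total_reported = sum(reported.values())
    if total_reported == 0:
        return 0.0, {}
    matched = 0
    missing: Dict[str, int] = {}
    for advertiser, n_reported in reported.items():
        # Cap at the reported count so duplicates cannot inflate coverage.
        n_collected = min(collected.get(advertiser, 0), n_reported)
        matched += n_collected
        if n_collected < n_reported:
            missing[advertiser] = n_reported - n_collected
    return matched / total_reported, missing


# Made-up counts, chosen only to illustrate an 80% coverage result.
reported = {"Advertiser A": 50, "Advertiser B": 30, "Advertiser C": 20}
collected = {"Advertiser A": 50, "Advertiser B": 30}

share, missing = estimate_coverage(collected, reported)
print(f"{share:.0%}", missing)
```

The `missing` dictionary is what makes the report useful in practice: advertiser names that never matched a keyword can be added to the query list.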
Twitter - right - we could collect 100% of
their data. And again we've republished
the SOL (?) in an easier to process kind
of form. Google again - right - we have
100% of their ads because they're all in
the BigQuery database. However when we
started analyzing the data we noticed that
for a lot of the ads we're missing the
actual content, the images and text of the
ad. It turns out that for Google's ad
network if the ad was originally purchased
through a third party advertiser and then
run on one of Google's properties the
content of the ad won't be archived within
their system. This is unfortunately a big
loophole. So - right - if you're
running a kind of malicious misinformation
thing, you can easily, unfortunately,
circumvent Google's archive, at least from
archiving your content, by simply just
paying for it through a third party within
here. It's unclear whether this is a
policy limitation or whether this is a
technical limitation on Google's part, but
the outcome is that we only have the
content for about 70% of Google's ads, the ones that
were paid for directly on Google's
platform within here. So one of the
first things that we wanted to do is kind of
add some semantic meaning to these ads at
kind of large scale. And so we played
around with a few techniques, some fancy
kinds of natural language processing and
things like that. But we found that
there's actually a really fairly simple
and effective way of categorizing kind of
the intent of the ad, and that's that most
of these ads have a URL of some kind and
a lot of these URLs just point back to
like third party services. Like if you're
holding some kind of event you're going to
coordinate it with like Eventbrite or
something like that. If you're seeking
donations, if you're a Democrat you're
going to use this third party payment
processor, they're called ActBlue. If
you're a Republican there's like two or
three payment processors that you're going
to use for this. So we could simply just
look at these really prominent URLs that
occur a lot of times and just kind of
manually tag what is the purpose for this.
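
A rough sketch of this URL-based tagging, with the caveat that the domain table is illustrative only: the two domains and the purpose labels assigned to them are assumptions made for the example, not the hand-built table from the research. The labels reuse ad categories named later in the talk.

```python
from urllib.parse import urlparse

# Illustrative, hand-tagged table: landing domain -> purpose of the ad.
# Both entries and their labels are assumptions for this example.
DOMAIN_PURPOSE = {
    "actblue.com": "donate",
    "eventbrite.com": "move",
}


def landing_host(url: str) -> str:
    """Lower-cased host of the ad's landing URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def tag_ad(url: str) -> str:
    """Tag an ad by its landing domain; 'untagged' means it needs manual review."""
    host = landing_host(url)
    for domain, purpose in DOMAIN_PURPOSE.items():
        # Match the domain itself and any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return purpose
    return "untagged"


for url in ("https://secure.actblue.com/donate/example",
            "https://www.eventbrite.com/e/example-rally",
            "https://example.org/news"):
    print(url, "->", tag_ad(url))
```

Matching on the host suffix rather than the full URL is what keeps the table short: a handful of prominent domains covers a large share of ads.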
And by doing this we can tag ads as either
just purely informational, where they wanted
to just kind of get some kind of message
about the candidate, either positive or
negative, out there; connect ads that are
seeking contact information like people's
e-mail addresses, phone numbers, names and
things like that, presumably so they can -
you know - either get them to volunteer or
donate money in the future for the
campaign. There's move ads, where
they're trying to get people to vote or to
attend some kind of rally or to volunteer
or something like that, and then, right,
there's donation ads. And then finally
there's kind of commercial ads. These are
things where either they are selling products
that are kind of directly political in nature,
like a bobble head of some candidate, or
they might be like solar panels which have
tax credits in the U.S. and things like
that. So there's some kind of commercial
good that's linked somehow to some
political messaging within here. So we use
this method and we were able to categorize
about 70% of the ads, we took a random
sample of them, we manually checked what
we were doing and we found it was pretty
accurate. About 96% accuracy we got using
this method. The other thing that we did
is for the top advertisers, so for
Facebook the top 75% of the advertisers,
for Google the top 80% of the advertisers,
in terms of the money spent by the
advertiser. We went in and we manually
categorized what was this type of
organization. Was it a political
candidate, was it what's called a
political action committee, so these are
the PACs within the U.S., was it a
union, was it a for profit operation,
was it a non-profit operation, so on and
so forth. And so we wrote like some regular
expressions that got us most of the way
there. Most of them have fairly uniform
naming conventions, and for the ones that
we couldn't kind of automatically classify
we just did it manually, within here.
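
As a sketch of what such regular expressions might look like: the patterns and labels below are guesses at typical naming conventions, written for illustration, and are not the expressions used in the research. Anything that no rule matches returns `None` and falls through to manual labeling.

```python
import re
from typing import Optional

# Ordered (label, pattern) rules. The patterns are illustrative assumptions
# about naming conventions, not the rules from the actual analysis.
RULES = [
    ("pac", re.compile(r"\b(pac|political action committee)\b", re.I)),
    ("candidate", re.compile(r"\bfor (congress|senate|governor)\b", re.I)),
    ("union", re.compile(r"\b(union|local \d+)\b", re.I)),
]


def classify(advertiser_name: str) -> Optional[str]:
    """Return the first matching label, or None to flag manual review."""
    for label, pattern in RULES:
        if pattern.search(advertiser_name):
            return label
    return None


for name in ("Jane Doe for Senate", "Example Values PAC",
             "Teachers Union Local 12", "Acme Solar Inc"):
    print(name, "->", classify(name))
```

Simple patterns like these will misfire on some names (a credit union would be tagged as a union here), which is one reason to spot-check a sample by hand.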
And then since Twitter had so few
advertisers, we just did these all,
manually, within here. Now, right, we can
start to do some analysis. So the first
analysis that we did, the easiest
analysis, was we looked at the size of the
ads. And the thing that pops out is that
the majority of ads on all the platforms
are between $0 and $100. So these
are what are normally called the micro
targeted ads, that are typically seen by
less than 1000 people within here. So
these are very short lived, narrowly
targeted ads that are kind of honing in on
a specific demographic within here. So
these are these micro targeted ads within
here. And it appears, right, that the
majority of ads, especially on Facebook's
platform 82% of them, are of this micro
targeted kind of ilk within here. So it
kind of confirms the reporting that people
had of this kind of trend of
microtargeting within political
advertising. The other thing, based on our
categorization we can look at how the
different platforms were used from within
here. The problem with these numbers is
that there was different inclusion
criteria within each of these databases.
And then right. Finally, we can kind of
look at the different types of advertisers
on these kind of platforms. And again it's
hard to read too much into these numbers
because again, right, Facebook included
much more of the commercial stuff. So
we're going to see a lot more of the
commercial stuff within here. And the
final analysis of the entire data set that
we did was looking at, right, kind of
basically the ramp up to the election. We
cut this off in late October. This
analysis was done for a paper. So the due
date of the paper was ironically November
6 within here. So we cut it off a few
weeks earlier and we haven't regenerated the
contents since then. The one thing that
you can see at the top there is that
green spike. That's kind of the move ads.
So right, closer into the election the
campaigns were kind of doing sophisticated
get out the vote kinds of ads, within
here. So there were really sophisticated
kind of microtargeted ads that get out the
vote. Where like, it was almost kind of
spooky where like they knew where the
person lived that they were targeting and
so they gave them like directions on how
to get from where they live to their
nearest polling place within here. So
there are these really sophisticated kind
of get out the vote efforts that were
being run online, within here, towards the
end of the campaign. To kind of give you
more of a kind of apples to apples
comparison of these different ad
platforms, we also did some analysis kind
of narrowing each of the different
advertiser types to the ones that were
made transparent by all three platforms,
which were the federal candidates only.
And so this can give you some idea of kind
of a scale of these things. And we can see
that when we narrow it here we can still
see that Facebook has a lot more
advertisers and a lot more ads compared to
Google. However the spending numbers are
kind of comparable here. For Facebook
impressions and spends are ranges, that's
all that Facebook releases. For Google the
impression data is ranges, however we can
get exact spend data, because Google
basically released a weekly report of
exact spend numbers, aggregated by the
different advertisers, within here. So we
can use that, to get an exact number of
the spend. And again, right, Twitter's
numbers are much smaller in terms of
everything, within here. And we redid some
of our analysis to just see whether our
effects were simply a distortion based on
what was included in the archives. So
right we redid our ad size analysis and
even when we limit it to federal
candidates we can see this still holds,
that a lot of the ads on Facebook are still
these micro targeted ads. And there are
still micro targeted ads on the other
platforms, as well, within here. And right
this microtargeting of course varies
depending on the advertiser. So you take
someone like President Trump and he does a
lot of microtargeting. So almost all of
his ads probably about 90%, 95% of his ads
are micro targeted, within here. You look
at other candidates and they do much less
microtargeting, within here. So there are
definitely different strategies used
by different advertisers, within here. But
when we look at it in aggregate, it still
appears that microtargeting is a very
popular strategy across advertisers. We
can also, right, look at some of the spend
type by ad type and this kind of shows you
a little bit how the different platforms
are used, within here. So Facebook's
platform looks like it's a little bit more
kind of informational, though it's still used a
lot for donations, whereas Google's
platform is used a lot more for donations
and a lot less for a kind of informational
ads and to connect within here. It's
really kind of hard to read anything into
Twitter's data because it's such a small
set of data. But from the data that we do
have it looks like there's a lot more kind
of collection of e-mails and things like
that, within here. The other analysis that
we did on the federal candidate ads uses the fact
that, for Facebook in particular,
right, we have the geographic impression
data from here. So we can effectively look
at how many states were targeted by each
ad from a Facebook advertiser. And the
interesting thing here is that right.
There was no presidential election. So
basically all these campaigns were
operating in one state. So their
constituents for all these elections were
essentially in one state, within here. And
so if you look at the inform ads, right,
most of those were shown in a very small number of
states. So the inform ads are mostly being
shown to the constituents that are
actually voting for that candidate.
However, if we look at that bottom line,
the kind of gold line, those are the
donation ads. And we can see that they
were fundraising in many more states
outside of their constituency, within
here. So FiveThirtyEight did an
interesting analysis of one particular
candidate, Beto O'Rourke. He was a
candidate for Senate in Texas, Texas is a
very conservative state in the U.S., and
he did surprisingly well, within here. And
he kind of embraced online advertising;
online donation seeking was kind of a
cornerstone of his election, within here.
And so FiveThirtyEight did an analysis of
his donation records. In the U.S., at the
federal level, all donations to candidates
have to be reported to the Federal
Election Commission. So this is all in a
database from the Federal Election
Commission, and the FiveThirtyEight people did
the analysis. And they kind of confirmed what
we saw on the donation ads, that he was
getting about 52% of his donations from
Texas and 48% from other states, primarily
kind of from coastal states that tended to
lean more liberal, like New York,
California, Washington and places like
that, was where he was donations seeking.
So this appears to be a very effective way
of getting small dollar donations kind of
throughout the U.S. within here, through
this online advertising. The last thing that
I'm going to talk about is the ad
targeting. Facebook didn't directly
release the ad targeting. However, we were
lucky enough that ProPublica made a
browser plugin that people can install in
their browser, and that browser plugin
would identify what it thought were
political ads, based on a machine learning
algorithm. And for the political ads it
would upload these to their server along
with the targeting information. So, for
those of you with a facebook account, if
you're seeing ads you can actually click
on that ad kind of in the upper corner of
the ad and you can see why is this ad
targeting me, within here. And Facebook
will tell you a little bit, not all of why
you were targeted for this particular ad.
They will essentially show you the two
broadest categories of why you were
targeted for this particular ad, through
this feature they've added to their
platform. And this is actually
kind of interesting, this is something
that if you're a user of Facebook, I
highly recommend that you do. Because I
started doing it, and it was kind of eye
opening, as to the level of targeting that
was being done in terms of advertising.
That's kind of one thing, that we've
definitely learned from this is that when
you're seeing an ad, oftentimes there's a
very specific reason as to why you're
seeing that particular ad, within here.
And so we felt that it was very important
to, as much as we could, understand this
targeting that was going on within
Facebook's platform. So ProPublica had
this browser plug-in and they had this
data set that anyone can analyze, within
here. So if you do have Facebook and
you're located within the US I would
highly recommend that you install this
plug-in, because it helps us to kind of
understand the political advertising in
terms of the targeting, within here. So we
took ProPublica's data set and we
effectively joined it with Facebook's ad
transparency archive, within here. This
required us to scrape Facebook's ad
archive, because we needed the ad ID and
this is something that they don't expose
through their API, currently, within here.
However, they do expose it through their
user portal, within here. So we scraped
their user portal to join the specific ads
that were in the ProPublica data set to the
archive dataset, within here. And we were able
to join about 75% of the ads from here.
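
The join described here can be sketched as a lookup on the ad ID. The field names (`ad_id`, `targeting`, `spend`) and the four example records are hypothetical, chosen only so that the example lands on a 75% match rate; they do not reflect the real schema of either data set.

```python
from typing import Dict, List, Tuple


def join_by_ad_id(plugin_ads: List[dict],
                  archive_ads: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Attach plugin targeting data to archive records that share an ad ID."""
    archive_by_id: Dict[str, dict] = {ad["ad_id"]: ad for ad in archive_ads}
    joined, unmatched = [], []
    for ad in plugin_ads:
        match = archive_by_id.get(ad["ad_id"])
        if match is None:
            # Seen by a plugin user but absent from the archive.
            unmatched.append(ad)
        else:
            joined.append({**match, "targeting": ad["targeting"]})
    return joined, unmatched


# Hypothetical records for illustration only.
plugin_ads = [
    {"ad_id": "1", "targeting": "interest"},
    {"ad_id": "2", "targeting": "uploaded list"},
    {"ad_id": "3", "targeting": "lookalike"},
    {"ad_id": "4", "targeting": "interest"},
]
archive_ads = [
    {"ad_id": "1", "spend": "0-99"},
    {"ad_id": "2", "spend": "0-99"},
    {"ad_id": "3", "spend": "100-499"},
    {"ad_id": "9", "spend": "0-99"},
]

joined, unmatched = join_by_ad_id(plugin_ads, archive_ads)
match_rate = len(joined) / len(plugin_ads)
print(f"{match_rate:.0%} joined, {len(unmatched)} not found in archive")
```

Keeping the unmatched list, rather than silently dropping those ads, is what makes it possible to study what the archive is missing.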
There were a lot of ads that were
collected by the ProPublica data set, that
just simply weren't archived by Facebook's
transparency archive. It misses things,
within here. It's imperfect as to how it
does things. And this would be another
interesting analysis to do, to understand
what is Facebook missing in their ad
transparency archive and this ProPublica
data set can allow you to somewhat do
this, although through bias of who
installs the Pro Publica plug-in in the
first place. So we joined these two data
sets, again with the caveat that the
ProPublica data set is, right, it's
obviously biased by the set of people that
installed it, which are probably not going
to be a normal representative set of
Facebook users, within here. But
unfortunately, it's the best thing that we
have in terms of a data set that releases
the targeting information, within here.
And so we collapse into three different
categorizations of targeting, within here.
I'll just quickly explain Facebook's ad
targeting platform for people that don't
know about it. So one way to target ads
is, right, through interest or segments,
right, age segments, gender segments or
interests like I showed you before, within
here. So this is one way to target ads
within Facebook's platform. Another way to
target ads is through uploading lists of
information. So you can upload lists of
people's phone numbers, people's email
addresses or their names. And then when
you upload this list Facebook will find
those profiles within their database, so
they'll basically join those emails with
the emails that were entered by the users
accounts, and then they'll target these
people. So they'll create what they call
an audience of these people through this
personally identifiable information and
then they'll target them, through this
method. The final kind of major form of
targeting that Facebook offers is through
what they call these lookalike audiences.
So this is where you can upload PII
information, like email addresses, phone
numbers, names. Facebook will link them to
their accounts and then they'll look at
kind of the interests and things that
these users have, and then they'll find you
other users, not these users, but other
users, that have a similar kind of profile
to these users within here. So these are
the lookalike audiences that Facebook
offers within their platform. And so we
categorized it by this and again by
advertiser type, within here. So the thing
that stands out is, right, that the
for-profit companies are doing a lot of
targeting based on interests and segments.
So they probably don't know who the
people that they want to message are and
they're doing it mostly by interests and
segment. Whereas when you look at the PACs
and the political candidates they have
lists. So they have a lot of lists of
people's you know email addresses, phone
numbers, names, of things like this. And
they're plugging these into Facebook's
system. And this is how they're targeting
a lot of people, within here, is through
these lists. And this was expected, but
it's interesting to kind of quantify how
much of this is happening. And then the
lookalike audiences are also being used, a
good deal by everyone within here. And
this kind of makes sense, right? Because
if you have a list of people then you
advertise to them but then right you have
this lookalike audience of people that are
similar to them that are also perhaps good
people to advertise to, as well, within
here. The other thing we can do is break
this down by the intent of the ad here,
and this shows the difference even more
starkly, of the difference in behavior
between the commercial people and the
noncommercial people. The commercial
people are targeting mostly based on
interest, whereas the other people that
are, say, looking to connect with people,
they're the ones that are using the most
lookalike audiences. And this makes
perfect sense because right the connection
ads are there to get people's e-mails,
addresses, phone numbers, names and things
like that. So when you use the lookalike
audiences then you can, right, generate
more lists of people that'll convert for
whatever you want and then you can
retarget them with the direct lists
targeted ads, later on. So this all makes
pretty good sense when you look at how
this is behaving, from here. But again
it's interesting, right, to kind of make this
transparent for people to understand how
targeting is happening within the U.S.
political advertising sphere, within here.
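To make list-based and lookalike targeting concrete, here is a small conceptual sketch. It is not Facebook's actual matching or lookalike algorithm, which is not public. The account data, the e-mail normalization, and the use of Jaccard similarity with a 0.5 threshold are all assumptions chosen for illustration:

```python
def normalize_email(email):
    """Trim whitespace and lowercase so equivalent addresses compare equal."""
    return email.strip().lower()


# Hypothetical platform accounts: an e-mail plus a set of interests.
accounts = {
    "u1": {"email": "alice@example.com", "interests": {"hiking", "politics", "cooking"}},
    "u2": {"email": "bob@example.com", "interests": {"politics", "cooking"}},
    "u3": {"email": "carol@example.com", "interests": {"hiking", "politics"}},
    "u4": {"email": "dave@example.com", "interests": {"gaming", "cars"}},
}


def match_uploaded_list(uploaded_emails, accounts):
    """List targeting: return the account IDs whose e-mail is in the upload."""
    wanted = {normalize_email(e) for e in uploaded_emails}
    return {uid for uid, acct in accounts.items()
            if normalize_email(acct["email"]) in wanted}


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lookalike_audience(seed_ids, accounts, threshold=0.5):
    """Lookalike targeting: other users whose interests resemble the seed's."""
    seed_interests = set()
    for uid in seed_ids:
        seed_interests |= accounts[uid]["interests"]
    return {
        uid for uid, acct in accounts.items()
        if uid not in seed_ids
        and jaccard(acct["interests"], seed_interests) >= threshold
    }


custom = match_uploaded_list([" Alice@Example.com ", "nobody@example.com"], accounts)
lookalike = lookalike_audience(custom, accounts)
print(sorted(custom), sorted(lookalike))
```

Note that the lookalike audience excludes the seed users themselves, which matches the description of "other users, not these users".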
So these were pretty much the two major
analyses that we did in terms of
targeting, within here. The final part and
the part that kind of makes the juiciest
of stories is kind of the more dubious
advertisers that are advertising within
these platforms in terms of political
advertising. So we kind of call these more
politely kind of "new types of
advertising", within here. The first type
is one that you would pretty
much expect, so this is this corporate
astroturfing kind of stuff, that's going
on, within here. We see these ads for
Citizens for Tobacco Rights. And I
pretty much expected that you look up this
group and it's probably going to be some
you know quasi nonprofit that's supported
by some industry money from the tobacco
lobbyists, or something like that. That's
pretty much what I expected to see when I
saw these ads. You go to this website and
it's actually pretty honest as to what it
does. This is probably because right of
all the lawsuits and regulations around
tobacco in the U.S. in advertising. But
the website clearly states, right, that
it's operated by Philip Morris, the
tobacco company, within here. And this
actually isn't a legal entity, this
Citizens for Tobacco Rights. It's just
simply a website that's been stood up,
that's owned and operated by Philip
Morris, as far as we can tell, within
here. And this gets to a big problem with
Facebook's transparency archive, which is
that they don't actually vet that
disclaimer string of the sponsor, within
here. So pretty much anyone can type
anything that they want within that
disclaimer string and Facebook will allow
you to run it. We've tested it and as far
as we can tell, you can't say that you're
from Facebook, Instagram or that you're
Mark Zuckerberg, they'll block that. But
pretty much anything else that you type in
there they'll allow that ad to run, within
here, with no vetting. So we discovered
this, we politely, privately mentioned it
to Facebook. Some reporters kind of
trolled Facebook within here and so there
was a reporter that trolled Facebook and
opened up ads for all the senators, within
here, on Facebook. And of course Facebook
approved them all, from within here, and
they did some other things to troll
Facebook where they inserted some other
advertisements, within here. But the point
is, that that disclaimer string is not
vetted within here. Google actually does
vet that disclaimer string within there, so
they require either a tax ID number or a
Federal Election Commission ID number and
they actually do vet it and they publish
that tax ID number or Federal Election
Commission ID number along with the disclaimer
string, within here. Which makes it really
easy to track down advertising on Google.
On Facebook, because right they can
basically type in whatever they want in
the disclaimer string, it makes it much
more difficult to actually link these
advertisers. And sometimes just outright
impossible, if the disclaimer string is
made up or just too mutilated in some way
or form, within here. So this is
definitely a problem, where we have these
lobbyist organizations, or in this case
not even lobbyist organizations, just
industry, that can effectively lie about
who's paying for this ad in Facebook's
platform. The other thing we found were
what is now kind of being called these
junk media outlets. So these are for-profit
outlets that are claiming that they're
doing kind of news operations. But, right,
it's not really traditional kind of
reporting journalistic things. It's more
just kind of propaganda messaging, within
here. So there is this group called New
American Media Group LLC. They also ran
under the name of New Democracy, or sorry
Democracy Now was their other name, within
here. And so they ran this, within here.
We tracked down these LLCs and they were
just simply shell companies and that kind
of led to nowhere, within here. We worked
with a journalist from The Atlantic that
actually did a lot of digging into the
shell companies. And he was able to,
through his basically investigation, link
these companies to the actual entity that
created these shell companies and was
running these ads, within here. And so
when we did our analysis of this, this
company basically this third party
advertising company was creating these.
They're meant to look like kind of
grassroots kind of organizations. There
were, a lot of them were kind of targeted
at more conservatively leaning groups, but
then they would bombard them with liberal
messaging, within here. So they would
create these fake communities that looked
more conservative. And then once they
attract an audience they would bombard
them with these liberal kinds of
messaging, within here. And so this
particular company is based in Colorado.
It's called MOTIVE AI. Apparently, it's
hoping to become the Cambridge Analytica
of the liberal side. I don't know if
that's something to aspire to or not. Some
other journalists also did some digging,
within here. There were some journalists
from ProPublica that did some digging,
within here. They found more of this
astroturfing by political lobbyist groups
and things like that. Big oil, insurance
companies, again when they advertised on
say Google's platform they would be honest
about their disclaimer string, and then
when they advertised on Facebook's
platform they would often kind of
obfuscate their disclaimer string, to
make it more difficult to link them
together. And so they unmasked a whole
bunch of these other kinds of junk media
operations, as well, that were kind of
spreading propaganda, within here. I'm
picking on Facebook a lot. Again Google
does vet the tax I.D. number of these
people, but you see something like, right,
this DIGICO LLC that paid for some ads. So
you track this down, and this is again one
of these third-party advertising agencies.
It's easy to track down because of the tax
I.D. number. But it still doesn't actually
tell you who paid for the ad. It just
tells you the third party that, right, it
presumably was paid on behalf of someone
else to run these ads, from here. So this
is a big problem with these disclaimer
strings, that oftentimes they don't
actually identify the person that's paying
for the ad. So to kind of wrap this up,
within here, after our kind of experiences
looking at these transparency archives I
would say they're fairly adequate to
understand good actors. So we could fairly
well understand how good political
advertisers were behaving in Facebook's
platform. However, right, for the bad
advertisers, we probably missed a lot of
them because they could just simply type
in lots of different disclaimer strings
and easily avoid our analysis, at this
point. None of these current archives have
it just right yet. All of them have
issues, right. Facebook isn't providing
good access to their data. They're not
releasing targeting information. Google is
missing 30% of the content because of third
parties using their advertising system.
They're not releasing spend and impression
information based on demographics, within
there. Twitter just simply hasn't hired
someone to enforce the policy of
transparency, well, within here. And
unfortunately our experience throughout
this process has been that these companies
are oftentimes reactive, instead of
proactive, within here. Which means that,
right, we have to continuously put
pressure on them, in order for them to
kind of improve these archives, within
here. So this is unfortunately kind of the
state that we're in, within here. And one
thing that I really want to give a
shoutout to is, right, the people at
these companies that are actually trying
to build these transparency archives. And
I want to give them a lot of credit for
taking on this task, that's probably not
well rewarded within their companies, of
building these transparency archives,
within here. And so my hope is that by
applying pressure we can get them more
support to kind of get more resources and
be able to make things more transparent, within
their companies, as well. Because I hope
that, right, this puts us in better shape
to understand the 2018 elections, but 2020
is another presidential election and my
hope is that we'll continue to improve
these archives, so that we'd be in a much
better position to understand both the
good and the bad advertisers by 2020, within
here. However this is going to take
probably regulatory pressure, legal
pressure, pressure by technologists and
things like this to improve these
archives, at this point. So with that,
again, I have my collaborators, that
aren't here on the stage, but they
definitely did a lot of the heavy lifting
to make this happen, within here. And
again all of our tools and most of our
data except for the Facebook data, that's
under NDA, is available through our
GitHub, there. And so with that I will
open it up to questions.
applause
Herald: Thank you so much Damon. I know
that there are a few questions among the
audience. So, microphone 6 please.
Question: So [Name] on the IRC is asking
"Have you looked at links between the
advertisers and do they use the same
images or text for instance?".
Answer: This is a really good question.
This is actually one of the analyses that
we're currently doing. So we're starting
with the text, because that's obviously
the easiest. But we're also exploring some
image clustering algorithms, as well. To
cluster the advertisers across platforms
and also within platform because we're
finding a lot where, you know, they create
multiple shell companies, where they just
lie about their disclaimers and so this is
definitely something that we're focusing
on, is better clustering of the
advertisers. Because like that group
MOTIVE AI, even though they created the
different LLCs they were running the same
images and videos across their different
LLC shell companies.
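One simple way to do the text side of this clustering is sketched below. This is an illustration, not the team's actual method: it links two disclaimer strings whenever they ran ads with the same text after normalization, and then takes connected components with a union-find structure. The ad records and names are invented, and exact-match-after-normalization is a stand-in for whatever fuzzier text or image similarity a real analysis would use:

```python
import re
from collections import defaultdict

# Hypothetical ads: a free-text disclaimer string plus the ad copy.
ads = [
    {"disclaimer": "Shell Co One LLC", "text": "Sign the petition today!"},
    {"disclaimer": "Shell Co Two LLC", "text": "sign the petition  TODAY"},
    {"disclaimer": "Shell Co Two LLC", "text": "Chip in five dollars"},
    {"disclaimer": "Shell Co Three LLC", "text": "Chip in five dollars."},
    {"disclaimer": "Unrelated Group", "text": "Vote on Tuesday"},
]


def normalize(text):
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", text.lower()).split())


def cluster_advertisers(ads):
    """Group disclaimer strings that share at least one normalized ad text."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    first_seen = {}  # normalized text -> first disclaimer that used it
    for ad in ads:
        key = normalize(ad["text"])
        find(ad["disclaimer"])  # register the advertiser even if never linked
        if key in first_seen:
            union(ad["disclaimer"], first_seen[key])
        else:
            first_seen[key] = ad["disclaimer"]

    clusters = defaultdict(set)
    for name in parent:
        clusters[find(name)].add(name)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


clusters = cluster_advertisers(ads)
for cluster in clusters:
    print(cluster)
```

In the toy data the three shell names end up in one cluster even though the first and third never ran identical text, because the second one shares copy with both.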
Herald: Great thank you. Please if you
have any questions, queue up by the
microphones. Microphone number 1 please.
Question: Hi, Oliver Moldenhauer? Thanks a
lot for the talk. Definitely one of the
best I've seen here so far. Two questions.
A: Why do those transparency archives
exist? Was there some law or political
process around that? And B: As we are
nearing the European election next year,
what kind of data is available for Europe?
Answer: Those are both good questions.
Again I'm not internal to one of these
companies, so I can just speculate as to
why these transparency archives exist. But
my guess is, right, that this was
reactionary. So Mark Zuckerberg and high
ranking officials from Twitter and Google
were hauled in to testify in the House and
Senate, and this is them trying to self
regulate instead of having regulation
imposed on them by people. So that again,
this goes to the pressure part is that
there was regulatory pressure put on them,
the threat of regulatory pressure and so
that's what made them do these
transparency archives. In terms of what's
available in Europe, I guess as long as
the UK is still in the EU, kind of
teetering, Facebook has started to make ads
transparent in the UK. They also make them
transparent in Brazil and they're going to
make them transparent in India. And I
think they have plans to make them
transparent in other places, in the EU as
well. However, they haven't done that.
However, again this goes back to the
pressure part. So there's no API for the
other countries, there's only an API for
the US and that might be because we put
pressure on them by scraping them and
publicly releasing their data. And, right,
there are no transparency reports for other
countries, as well, there's only a
transparency report for the US. And again
that might have been because we applied
pressure and we were publishing numbers.
Some of the numbers in terms of spend were
very low, because, right, they were just
giving us ranges. So we might have been
making them look bad, when we took the
bottom of the range of their spend and they might
have wanted to correct that with their own
transparency archive, as well. So again, a
lot of this unfortunately requires
pressure to get them to improve their
transparency efforts.
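The effect of reporting only the bottom of each spend range can be seen in a short sketch. The bucket boundaries, field names, and sponsors below are made-up assumptions for illustration, not Facebook's actual ranges or any real advertiser's numbers:

```python
from collections import defaultdict

# Hypothetical ads, each with only a spend range rather than an exact figure.
ads = [
    {"sponsor": "Sponsor A", "spend_low": 0, "spend_high": 99},
    {"sponsor": "Sponsor A", "spend_low": 100, "spend_high": 499},
    {"sponsor": "Sponsor B", "spend_low": 1000, "spend_high": 4999},
]


def spend_bounds(ads):
    """Sum the low and high ends of each ad's range per sponsor."""
    totals = defaultdict(lambda: [0, 0])
    for ad in ads:
        totals[ad["sponsor"]][0] += ad["spend_low"]
        totals[ad["sponsor"]][1] += ad["spend_high"]
    return {sponsor: tuple(bounds) for sponsor, bounds in totals.items()}


bounds = spend_bounds(ads)
for sponsor, (low, high) in bounds.items():
    print(f"{sponsor}: at least {low}, at most {high}")
```

Publishing only the low column is a conservative choice, but with wide buckets it can understate true spend by a large factor, which is the "making them look bad" effect described above.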
Herald: Great thank you. Microphone number
two please.
Question: So you mentioned
FiveThirtyEight and their work on the
donations. Do you think it makes sense
to combine the data you gathered with what
they have to look at election outcomes,
like, election results and turnout and
stuff like that?
Answer: Yes. Actually this is the number
one project on our road map, right now.
Is, actually Google has processed the FEC
information and they've made this
information available via their BigQuery
database. So we've downloaded this, we've
manually linked the Facebook advertisers
and the Google advertisers to the FEC data
and now we're doing the regression models,
specifically focused on the donation ads
first. Because those are what are reported
to the FEC, at this point. So we are
essentially trying to understand how
effective these donation ads are at
actually driving donations, within here.
Herald: Thank you. Microphone number 4
please.
Question: Hi. First of all thank you Mr.
McCoy and your team for this very
interesting research. I was wondering,
whether you know if there is any
follow-up research conducted by political
scientists, sociologists etc. analyzing
the political repercussions of these ad
campaigns.
Answer: Yes, so we're aware of a few
efforts. I don't want to out the teams
that are doing them, in case they don't
want to be outed. There's nothing
that's been published, publicly I believe
on this. But we're definitely trying to.
That's one of the main goals of kind of
our overarching online political
advertising transparency thing, is to try
and get as much data as we can in the
hands of less technical people in an easy
way for them to analyze. And so this is
basically the primary goal of our project,
in here. So we've been working as hard as
we can to get political scientists up
to speed on the data. And this is why it's
really unfortunate that Facebook has its
NDA in place for their particular data,
because this makes it very difficult for
us to share and collaborate in that
particular data. Which puts pressure on us
unfortunately as being the only ones that
can do some of this analysis right now. So
this is why I would love to apply
enough pressure to Facebook, to get better
access to their particular data.
Herald: Yes. And the question from the
Internet please.
Signal Angel: So Nomad is asking "Why are
those advertisements considered political or
election interference in the USA. Can't you just
see, that someone paid money to display
that content and conclude its purpose is
to promote an agenda or manipulate them?".
Answer: This is a good question. Right, a
lot of this goes to the tactics that
they're using here. So again they're
creating these communities, that they're
making look like they're grassroots
communities and then they're kind of
sucking people in with these ads, that up
until recently had no disclaimer string
on them. So you had no idea who paid for
them. So they appear to be paid for by
kind of these grassroots organizations. So
you felt like you were, kind of, part of a
grassroots movement, in joining these kinds
of communities. I think these are the really
scary, kind of subtle things. And you
might not realize why you're being
targeted for these particular ads or who
was behind these particular ads. So, I
think it was really easy for people to
kind of get unwittingly, kind of, duped
into joining what looked like these
grassroots campaigns. So that's why I
think improving these disclaimer strings
and showing who is really behind these
communities and these advertisements is
really important, to dispel this notion of
these fake grassroots communities, that
are luring people in within here. So I
think that's one of the big things that
can be gained by these transparency
archives. But it requires improvement of
the transparency archives, to do that.
Herald: Microphone number 3 please.
Question: Yes. So I'm curious about the
efficacy of some of the advertisements
that are on Facebook and Twitter. And I'm
wondering is any group like the ProPublica
web extension checking the engagement
rate? Like the number of comments, the
number of views and the number of shares,
to like kind of get an estimate of, OK
this big grassroots community is building
up a number of followers and these
followers population sizes and whatnot.
Answer: Yeah, this is again a really good
question. This is something that we are, I
would certainly encourage other people to
potentially do as well. So the problem is
that a lot of that information isn't
exposed by the transparency archives. This
is more of what they call kind of the
organic information, the non paid for
information, within here. And so this is
stuff that none of the platforms are
releasing. And so it requires kind of a
scraping operation, essentially, to gather
this information and collect it. And it's
something that we're definitely thinking
about how to efficiently do, is how to
efficiently scrape and collect this
information. Because this is very hard
because, right, you go against the anti
scraping teams of these companies, that
are well resourced. And this requires
accounts, and these accounts are going to
be shut down and detected. So this is
something that we're trying to pilot to
understand. Our other idea of how to do
this potentially is try and crowdsource
this information. This is similar to how
ProPublica crowdsourced it for the browser
extension information. We could
potentially crowdsource it, where you
know, when people interact with these
communities or these ads the plug-in could
potentially crowdsource that information
back to us. And then we would have to
figure out some strategy to sanitize that
information in some way. Because at that
point you might have some sensitive
information that you're collecting. This is
something that we're thinking about. We're
cautious, I think, rightly so because this
can start stepping on, again, more
sensitive information that's available
from within here. But I think it's
definitely key to understanding the
effectiveness of these ads. Something that
we're going to have to do or we're going
to have to convince Facebook somehow to do
on our behalf in order to really
understand the effectiveness of these ads.
Herald: Thank you. Last question for
microphone number 1.
Question: All right. At the beginning of
your talk you explained how Russia
influenced the elections. I'm curious
about the attribution. Is there possibly
any doubts at any instance that you
presented that it was not Russia or maybe
some other country, China or Iran? How do
you know, and did you check the facts?
Answer: I mean, that's a good question.
Unfortunately, right, the national
security agencies don't release the
sources of their information. There's
another investigation done by the
Department of Justice by Robert Mueller,
that did release some more information
about this, within here. I've looked at
that information and it looks, you know,
right, you can never 100%, unequivocally
state that it was Russia. It could have
been a false flag operation. But I think
that pretty much the overwhelming
information that everyone has found when
they've investigated this has pointed at
Russia and the organizations that were
prosecuted by Mueller.
Herald: Damon McCoy, thank you very much.
Please give him a great round of applause.
Applause
35c3 postroll music
subtitles created by c3subtitles.de
in the year 2019. Join, and help us!