0:00:00.000,0:00:19.450
35c3 preroll music
0:00:19.450,0:00:23.720
Herald: This talk will be held by Damon[br]McCoy. He will be explaining online U.S.
0:00:23.720,0:00:28.390
political advertising and he has been[br]working with researching like how
0:00:28.390,0:00:34.280
different online communities basically[br]behave around many different topics. But
0:00:34.280,0:00:37.480
this is what he's going to talk about[br]today so please give him a great round of
0:00:37.480,0:00:46.729
applause.[br]Applause
0:00:46.729,0:00:50.910
Damon McCoy: Thank you everyone for coming. I'm[br]up here speaking and I'm the only one that
0:00:50.910,0:00:55.399
wanted to fly to Germany over Christmas[br]and New Year's. However there were three
0:00:55.399,0:01:00.369
real people that were really key in[br]helping out with this research and before
0:01:00.369,0:01:04.559
we get started, I just want to credit[br]them. One is my grad student Laura
0:01:04.559,0:01:08.410
Edelson. She did a lot of the analysis[br]that you're going to be seeing, generated
0:01:08.410,0:01:14.509
all the graphs. One of the undergraduate[br]students that's from our NYU Shanghai
0:01:14.509,0:01:19.740
campus, Shikhar, did a lot of the work to[br]collect all the data that you're going to
0:01:19.740,0:01:25.119
see here. And then Ratan, who is a[br]professor at NYU at the Shanghai campus,
0:01:25.119,0:01:30.960
also helped out with kind of our initial[br]efforts of collecting some of this data.
0:01:30.960,0:01:34.930
And so before we get started I guess I'll[br]give a little bit of an introduction about
0:01:34.930,0:01:39.930
myself. I'm a professor at NYU Tandon[br]School of Engineering. As was mentioned
0:01:39.930,0:01:45.460
before I do a lot of stuff kind of looking[br]at how technology kind of impacts the
0:01:45.460,0:01:51.280
security and privacy of, you know, society,[br]groups of people and things like that. So
0:01:51.280,0:01:56.530
this was really kind of an opportunistic[br]project that kind of captured the impact
0:01:56.530,0:02:06.799
of online advertising in the political[br]sphere of U.S. campaigns. And also quick
0:02:06.799,0:02:12.620
plug. So everything that I'm going to be[br]showing you most of the data scripts and
0:02:12.620,0:02:17.110
things like that we've put in a github[br]that's accessible by anyone that wants to
0:02:17.110,0:02:23.230
analyze the data or look at our scripts[br]and improve them or things like that.
0:02:23.230,0:02:30.920
Applause Thank you. This is the first[br]time that I've given this talk outside of
0:02:30.920,0:02:35.890
the U.S. so let me just start with some[br]quick explanation as to how U.S. elections
0:02:35.890,0:02:41.200
work for those of you that might not know[br]about this. So every two years in the U.S.
0:02:41.200,0:02:46.020
we hold federal elections. These are[br]elections - right - that impact all of the
0:02:46.020,0:02:53.430
states within the U.S. And so every four[br]years we have an election for president,
0:02:53.430,0:02:58.839
2018 were our last elections. This was not[br]a presidential election year so the
0:02:58.839,0:03:04.759
elections were for the Senate and the[br]House seats at the federal level. And then
0:03:04.759,0:03:10.099
we also had elections for state and local[br]positions as well within here. And some
0:03:10.099,0:03:14.900
of them that are captured in our data[br]especially our Facebook data not so much
0:03:14.900,0:03:21.220
in our Twitter and Google ad transparency[br]data that we have here. So this will be
0:03:21.220,0:03:26.740
focused this talk on the 2018 elections[br]that happened on November 7th. Election
0:03:26.740,0:03:34.650
day is always the first Tuesday in[br]November in the U.S. every two years. So
0:03:34.650,0:03:37.621
to begin with in the background right[br]probably some of you know about this, some
0:03:37.621,0:03:41.780
of you might not know about this, but in[br]the 2016 elections which were a
0:03:41.780,0:03:46.969
presidential election year there was right[br]this election interference that happened
0:03:46.969,0:03:53.060
here. And so Facebook has released these[br]ads. These ads were paid for by a Russian
0:03:53.060,0:04:00.819
company the Internet Research Agency that[br]ran these ads. And Facebook released these
0:04:00.819,0:04:05.430
to - right - the Senate and then the[br]Senate re-released these publicly to
0:04:05.430,0:04:10.260
people. And so this is an ad basically[br]trying to disenfranchise people from
0:04:10.260,0:04:15.069
voting in the elections. And you can see[br]right it's targeted at people in the U.S.
0:04:15.069,0:04:19.880
of a certain age range and interests like[br]Martin Luther King, African-American
0:04:19.880,0:04:24.260
culture, African-American civil rights.[br]Facebook doesn't actually allow you to
0:04:24.260,0:04:30.170
directly target to people based on like[br]their ethnicity. So this is a pretty good
0:04:30.170,0:04:33.880
proxy though. If you target these kinds of[br]interests for this, so this would probably
0:04:33.880,0:04:38.500
be fairly effective at targeting African-[br]American people within the U.S. for this
0:04:38.500,0:04:44.310
ad to try and disenfranchise them from[br]voting in the elections. There were other
0:04:44.310,0:04:49.680
ads, right, that tried to do[br]misinformation, disinformation kinds
0:04:49.680,0:04:55.230
of campaigns. So this was an ad that was[br]again paid for by the Russian agency that
0:04:55.230,0:05:00.670
was trying to perpetuate this rumor[br]basically unsubstantiated rumor that Bill
0:05:00.670,0:05:04.960
Clinton has this illegitimate child[br]within here. And again, right, the
0:05:04.960,0:05:08.560
targeting information is targeted at[br]African-Americans within the U.S. and
0:05:08.560,0:05:14.770
African-Americans are kind of a key voting[br]block for a kind of more liberal,
0:05:14.770,0:05:19.790
democratic people within the U.S.[br]oftentimes. That's probably the other
0:05:19.790,0:05:23.860
thing that I should have explained about the[br]U.S. election system especially at the
0:05:23.860,0:05:29.040
federal level is that we effectively have[br]two parties that, you know, win any
0:05:29.040,0:05:33.810
meaningful amount of elections within the[br]U.S. and one of them is the Democratic
0:05:33.810,0:05:38.740
party that tends to skew more liberal. So[br]they're more kind of right for bigger
0:05:38.740,0:05:42.930
government, more social services and then[br]we have the Republicans which skew kind of
0:05:42.930,0:05:48.290
more conservative wanting kind of a[br]smaller government providing less services
0:05:48.290,0:05:54.810
and kind of less regulation around things[br]as well. And so right these are two
0:05:54.810,0:06:01.620
examples but there were a whole bunch of[br]these ads that were shown on Facebook
0:06:01.620,0:06:04.620
within here. So right. Pretty much all[br]of these, they tried to
0:06:04.620,0:06:11.160
disenfranchise people or tried to kind of[br]create chaos, kind of polarize people
0:06:11.160,0:06:16.990
around the election oftentimes with kind[br]of disinformation sorts of things.
0:06:16.990,0:06:25.460
And so in 2017 our Office of the Director[br]of National Intelligence put out a report
0:06:25.460,0:06:29.290
- sorry for the big block of text, this[br]will be the only big box of text in here.
0:06:29.290,0:06:33.990
But I thought it was kind of important to[br]show this because they pretty much
0:06:33.990,0:06:41.320
unequivocally state that Russia tried to[br]interfere in the US elections and that
0:06:41.320,0:06:46.400
Vladimir Putin was somehow involved within[br]this interference. And so this is - this
0:06:46.400,0:06:53.210
is pretty much as far as the National[br]Security Agency, the CIA, NSA pretty much
0:06:53.210,0:06:58.430
solid evidence that they have - that this[br]occurred within here. And so the other
0:06:58.430,0:07:02.910
thing that broke was right the Cambridge[br]Analytica scandal as well broke within
0:07:02.910,0:07:07.590
Facebook where there was this, right,[br]third party advertising agency that
0:07:07.590,0:07:12.930
collected you know a whole bunch of data[br]on 80 million profiles within Facebook and
0:07:12.930,0:07:17.030
then tried to create psychological[br]profiles for targeting and messaging and
0:07:17.030,0:07:22.270
things like that around here. And so these[br]two particular scandals broke within here
0:07:22.270,0:07:28.760
and the first result of this is we have[br]Mark Zuckerberg in a suit a real suit not
0:07:28.760,0:07:37.470
a hoodie suit. Testifying right in front[br]of our Senate within here. And so he,
0:07:37.470,0:07:42.490
right, he testified before House and[br]Senate committees about the abuses
0:07:42.490,0:07:50.340
occurring within Facebook. And he did this[br]on April 10th and 11th 2018 within here,
0:07:50.340,0:07:55.550
and right. So this is right. In here, he[br]did admit that Facebook had made mistakes
0:07:55.550,0:08:04.410
and that they need to improve things[br]moving forward within their platform. The
0:08:04.410,0:08:11.340
most kind of tangible outcome from these[br]testimonies were these transparency
0:08:11.340,0:08:20.300
archives that began to appear. And here is[br]a view of what Facebook's ad transparency
0:08:20.300,0:08:25.300
archive looks like. When it originally[br]deployed you needed a Facebook account to
0:08:25.300,0:08:29.900
interact with it. Now Facebook has dropped[br]the requirement so anyone with Internet
0:08:29.900,0:08:36.799
access, unless you're censored somehow,[br]can go to this archive and access these
0:08:36.799,0:08:42.440
ads. So the user facing portal for these[br]ad transparency archives, you type in
0:08:42.440,0:08:48.230
keywords and then what it basically does is,[br]right, a pattern matching on the ad text
0:08:48.230,0:08:53.869
and other parts of the ad and then returns[br]the ads that match that within here. And
0:08:53.869,0:08:59.230
so then you can see all the political ads[br]that match that particular term within
0:08:59.230,0:09:07.459
here. Facebook began archiving these ads[br]kind of at a large scale starting on May
0:09:07.459,0:09:16.149
7th, 2018. By Election Day, November 7th,[br]2018, there were 1.6 million ads paid for
0:09:16.149,0:09:22.280
by over 85 thousand advertisers within[br]Facebook's platform. Facebook is actually
0:09:22.280,0:09:27.420
fairly broad as to what they included[br]within their political archive. They
0:09:27.420,0:09:32.839
included any ads related to US elections,[br]either federal state or local elections.
0:09:32.839,0:09:36.899
They also included these very important[br]kind of issue ads as we saw when we looked
0:09:36.899,0:09:41.129
at the Russian interference a lot of times[br]the ads didn't mention actual political
0:09:41.129,0:09:46.049
candidates, they mentioned kind of[br]polarizing issues within the U.S. So
0:09:46.049,0:09:50.620
Facebook also included these ads of[br]political national importance. They had a
0:09:50.620,0:09:56.050
list of I think about 13 different[br]criteria, the last of them being values.
0:09:56.050,0:10:00.500
And so it was a fairly encompassing set of[br]ads they tried to include within their
0:10:00.500,0:10:08.019
archive. Along with the text and images[br]or videos of the ad, they also included
0:10:08.019,0:10:14.780
basically ranges of geographic impressions[br]and demographic impressions. So right.
0:10:14.780,0:10:22.300
State level impression information in some[br]kind of ranges and demographic by gender
0:10:22.300,0:10:29.350
and by age kind of bucketed within here.[br]And they did this again for impressions
0:10:29.350,0:10:33.800
and then they also included some spend[br]information again kind of in ranges so
0:10:33.800,0:10:40.950
they gave ranges of 0 to 99 dollars, a[br]hundred dollars to say like 500 dollars,
0:10:40.950,0:10:48.009
501 dollars to 2000 dollars, and so on and[br]so forth within these buckets. One of the
0:10:48.009,0:10:51.960
key pieces of information that they did[br]not release was the targeting information.
0:10:51.960,0:10:55.230
So like I showed you before of those ads[br]they - right - they have that targeting
0:10:55.230,0:11:01.660
information, Facebook does not release[br]that within their transparency archive.
0:11:01.660,0:11:04.210
They have this right. They have, right,[br]they had that user portal where you could
0:11:04.210,0:11:09.660
do the keyword search from within there.[br]However, right, I like to do large
0:11:09.660,0:11:16.199
scale data analysis and so I wanted to[br]basically try and collect all of the ads
0:11:16.199,0:11:20.870
within this web portal. And so initially[br]all they had was this keyword search
0:11:20.870,0:11:25.320
portal within here. And so what we did is[br]we compiled kind of a large list of what
0:11:25.320,0:11:31.279
we thought were reasonable keywords,[br]names of prominent politicians, names of
0:11:31.279,0:11:36.370
states, issues within here. And so we tried[br]to compile this long list of keyword
0:11:36.370,0:11:41.379
searches and we began scraping the[br]portal within here and I'll tell you the
0:11:41.379,0:11:47.840
story of how our scraping efforts went.[br]Now currently they also offer an API,
0:11:47.840,0:11:52.150
it's still keyword based their API and[br]it's restricted by an NDA so I'll kind of
0:11:52.150,0:11:58.070
flesh out the story of how this goes. So[br]at the beginning they, they released this
0:11:58.070,0:12:03.020
kind of towards the end of May the user[br]archive and I played with it and I
0:12:03.020,0:12:08.930
realized that this didn't lend itself well[br]to kind of large scale analysis of these
0:12:08.930,0:12:16.330
ads, and so I went to my students[br]Shikhar and Laura, and Shikhar worked kind
0:12:16.330,0:12:21.749
of furiously night and day and within[br]three days he had a workable scraper that
0:12:21.749,0:12:25.889
was able to put in our keywords and then[br]we were able to scrape all the results
0:12:25.889,0:12:32.889
from our keyword within here. And so we[br]ran the scraper for about 2 months and
0:12:32.889,0:12:37.970
then we released a report. Just kind of a[br]very general statistical report and we
0:12:37.970,0:12:45.009
released the data in our github archive at[br]that. After that about 2 weeks later
0:12:45.009,0:12:53.589
Facebook began anti-scraping measures[br]within here. And so, right, this kind of
0:12:53.589,0:12:59.750
hampered our efforts to scrape Facebook's[br]archive. At this point. I'm - I don't want
0:12:59.750,0:13:03.760
to attribute any malice. I don't believe[br]that Facebook was targeting just our
0:13:03.760,0:13:08.419
scraping efforts they were targeting[br]everyone's scraping efforts. The
0:13:08.419,0:13:12.910
transparency whether it's wrong or right[br]to block people from collecting data on a
0:13:12.910,0:13:17.139
transparency archive I might kind of[br]quibble with them on that and say they
0:13:17.139,0:13:21.959
might want to provide better access to the[br]data within their transparency archive.
0:13:21.959,0:13:26.629
But this was the choice that Facebook made[br]to kind of clamp down on the scraping
0:13:26.629,0:13:32.040
within here. So we tried to fight with[br]them a little bit to - right - kind of a cat and
0:13:32.040,0:13:36.430
mouse game. You know we make some changes[br]to our scraper to avoid their anti
0:13:36.430,0:13:41.939
scraping. They do some things on their end[br]to block our scraper and probably other
0:13:41.939,0:13:47.969
people's scrapers that are doing similar[br]things to us as well within here. And so
0:13:47.969,0:13:56.639
this persisted for probably about 2 weeks,[br]and then Facebook basically deployed their
0:13:56.639,0:14:01.779
API within here. However they said right,[br]their API is very limited and still in
0:14:01.779,0:14:07.000
beta at this point. So these were part of[br]the terms and conditions from here. One of
0:14:07.000,0:14:12.769
the ones that I found kind of the most[br]onerous is that it limited it only to
0:14:12.769,0:14:18.399
U.S. people so we could essentially only[br]very closely work with U.S. people within
0:14:18.399,0:14:23.371
here and at least it did kind of - it[br]limited the types of people that we could
0:14:23.371,0:14:28.100
work with in here. And so right[br]unfortunately this kind of ruled us out
0:14:28.100,0:14:31.930
from working closely with journalists from[br]you know really good news organizations
0:14:31.930,0:14:36.600
like the Guardian and so on, that just[br]happened to have the misfortune of being
0:14:36.600,0:14:46.199
located somewhere outside of the U.S.[br]within here. Maybe the good fortune, yes.
0:14:46.199,0:14:51.110
And then the list of restrictions[br]continues. They also placed a data
0:14:51.110,0:14:57.240
retention on it so we could only retain[br]the data for one year. Again placing data
0:14:57.240,0:15:02.450
retention. So Facebook's data retention on[br]their archive is 7 years within here but
0:15:02.450,0:15:08.430
they're placing a 1 year data retention on[br]the data that we collect from their NDA.
0:15:08.430,0:15:14.430
I'd like to say that - right - we - right[br]- I got this NDA and I lit it on fire, I
0:15:14.430,0:15:23.140
tore it up and we continued to scrape the[br]archive within here. No, unfortunately
0:15:23.140,0:15:29.959
it was a hard call to make but right you[br]know there's basically two students and we
0:15:29.959,0:15:33.800
basically had to make a call whether we[br]wanted the data to analyze or whether we
0:15:33.800,0:15:38.040
wanted to spend all of our time kind of[br]faded - fighting with Facebook's anti-
0:15:38.040,0:15:45.581
scraping efforts. And so in the end we did[br]- I did in fact agree to their NDA within
0:15:45.581,0:15:51.319
here. So the initial data we scraped, we[br]release. We're still scraping a small amount
0:15:51.319,0:15:56.509
of data that we do release as well from[br]here. But unfortunately at this point any
0:15:56.509,0:16:01.329
of the data that we collected from the NDA[br]we cannot release within here. If anyone
0:16:01.329,0:16:06.999
does want to fight with Facebook and[br]resurrect the crawler within here I would
0:16:06.999,0:16:12.160
be more than happy for that to happen[br]within here. Unfortunately given our
0:16:12.160,0:16:16.539
engineering constraints it just simply[br]wasn't feasible for us to do that within
0:16:16.539,0:16:23.879
here. And so the story is a little bit[br]different with Google. So Google's archive
0:16:23.879,0:16:34.300
they began archiving ads on May 31, 2018.[br]By election day they had 45,000 ads from
0:16:34.300,0:16:39.170
600 advertisers. Their criteria for[br]including advertising was much more
0:16:39.170,0:16:44.379
narrow than Facebook's, so they only[br]released ads related to U.S. federal
0:16:44.379,0:16:49.850
candidates and federal office holders[br]within here. So it is a much more limited
0:16:49.850,0:16:55.440
set of data that Google released within[br]here. None of the issue ads that Facebook
0:16:55.440,0:17:03.360
released. They didn't release any of the[br]geographic or demographic data by
0:17:03.360,0:17:09.270
impression, they did release ranges of[br]impressions and ranges of spend data, and
0:17:09.270,0:17:14.800
they did release some limited targeting[br]data from here so they released geographic
0:17:14.800,0:17:20.069
and demographic targeting information[br]which Facebook hadn't released in their
0:17:20.069,0:17:25.230
ads. And their data is available through a[br]similar keyword based portal. But they
0:17:25.230,0:17:30.610
also make it available through just a[br]database, if you want to within here. So
0:17:30.610,0:17:34.971
this is what their portal looks like[br]within here. And this is - right - their
0:17:34.971,0:17:40.700
big table, sorry, their BigQuery database[br]that they released from here. And so they
0:17:40.700,0:17:46.020
update it every week within here and you can[br]download it and analyze the data
0:17:46.020,0:17:52.530
relatively easily within here. So the last[br]one to kind of implement their archive was
0:17:52.530,0:18:01.370
Twitter. Twitter began archiving ads on[br]June 27, 2018. The scale of ads on
0:18:01.370,0:18:06.590
Twitter is very small compared to the rest[br]of them. The scale of their ad network in
0:18:06.590,0:18:13.380
general is much smaller than Google and[br]Facebook's. And what they included, it was
0:18:13.380,0:18:18.440
similar to what Google included in terms[br]of - right - only federal candidates
0:18:18.440,0:18:21.770
within here. Kind of closer to the[br]election, they also said that they were
0:18:21.770,0:18:28.340
going to release political issue ads.[br]However, the mechanism of enforcement
0:18:28.340,0:18:32.600
doesn't appear to exist within Twitter's[br]system. There doesn't appear to be anyone whose
0:18:32.600,0:18:38.980
job it is to actually enforce transparency[br]of ads from here. So we've been kind of
0:18:38.980,0:18:42.910
manually finding accounts and reporting[br]them to Twitter within here and then when
0:18:42.910,0:18:46.940
we manually report them to Twitter,[br]Twitter then includes them in future
0:18:46.940,0:18:51.890
transparency kinds of efforts within here.[br]But it appears like we're basically the
0:18:51.890,0:18:56.880
ones - short laughter - it's[br]become our job to monitor the Twitter
0:18:56.880,0:19:00.910
accounts and then notify Twitter and then[br]they'll manually kind of deal with it.
0:19:00.910,0:19:04.420
Unfortunately, they still don't appear to[br]have a person that actually manages this
0:19:04.420,0:19:10.210
process internal to Twitter at this point.[br]Twitter does however release the most
0:19:10.210,0:19:15.760
information. So they release exact data[br]not the range data on impressions and
0:19:15.760,0:19:21.411
spend information, also by geographic and[br]demographics and they also include all of
0:19:21.411,0:19:27.730
the targeting information as far as we can[br]tell and their data is available through
0:19:27.730,0:19:31.770
without an account. Basically through[br]their portal and we've been scraping them
0:19:31.770,0:19:35.370
and there's been no problems they haven't[br]blocked us at this point. So we just
0:19:35.370,0:19:40.240
simply scraped their data and then we[br]republish it to github at that point. And
0:19:40.240,0:19:45.510
we've had no problems with Twitter in this[br]way, and the scale of their data is so small
0:19:45.510,0:19:51.000
that it's been relatively easy to keep[br]pace with it at this point. And here's
0:19:51.000,0:19:56.520
just a picture of the Twitter transparency[br]archive and again they have a list of all
0:19:56.520,0:20:00.180
the Twitter accounts that they include in[br]their transparency archive. So we can
0:20:00.180,0:20:03.670
monitor this and then we can monitor other[br]people that we know that are politically
0:20:03.670,0:20:08.280
active when we see them doing paid[br]advertising then we can notify Twitter and
0:20:08.280,0:20:12.640
then Twitter will include them in their[br]transparency archive normally within like
0:20:12.640,0:20:18.800
a week, or so of here. And so this is,[br]this is kind of the background that you
0:20:18.800,0:20:24.730
need to understand the transparency[br]archives. So now we have a data set that
0:20:24.730,0:20:31.940
we can begin to analyze within here. For[br]Facebook since it was the keyword driven
0:20:31.940,0:20:38.620
thing at the beginning and it still is, we[br]were able to collect about 80% of the ads
0:20:38.620,0:20:43.400
in Facebook's database from there. The[br]other problem with the API is that it is
0:20:43.400,0:20:51.060
severely rate-limited at this point. I'm[br]talking about 3 to 4 queries per minute
0:20:51.060,0:20:57.930
that we can get through Facebook's API at[br]this point. And so we kind of did our best
0:20:57.930,0:21:02.640
effort to collect as much data as we could[br]from Facebook. About two weeks before the
0:21:02.640,0:21:07.910
election, Facebook began releasing a[br]transparency archive that included
0:21:07.910,0:21:12.430
basically an aggregated list of all the[br]advertisers and how many ads they have and
0:21:12.430,0:21:17.190
how much spent and this is how we can tell[br]that we got about 80% of the ads from
0:21:17.190,0:21:22.430
Facebook's archive based on this within[br]here. And the nice thing about the
0:21:22.430,0:21:26.000
transparency report is that we could go[br]back and now that we know what we're missing we
0:21:26.000,0:21:33.030
could readjust our usage of the API and so[br]now we have virtually 100% coverage of
0:21:33.030,0:21:39.680
Facebook going forward within here.[br]Twitter - right - we could collect 100% of
0:21:39.680,0:21:45.600
their data. And again we've republished[br]this all in an easier to process kind
0:21:45.600,0:21:50.410
of form. Google again - right - we have a[br]100% of their ads because they're all in
0:21:50.410,0:21:55.370
the BigQuery database. However when we[br]started analyzing the data we noticed that
0:21:55.370,0:22:00.100
for a lot of the ads we're missing the[br]actual content, the images and text of the
0:22:00.100,0:22:07.050
ad. It turns out that for Google's ad[br]network if the ad was originally purchased
0:22:07.050,0:22:10.910
through a third party advertiser and then[br]run on one of Google's properties the
0:22:10.910,0:22:15.800
content of the ad won't be archived within[br]their system. This is unfortunately a big
0:22:15.800,0:22:21.290
loophole. So - right - if you're[br]running a kind of malicious misinformation
0:22:21.290,0:22:26.470
thing, you can easily unfortunately[br]circumvent Google's archive at least from
0:22:26.470,0:22:31.320
archiving your content by simply just[br]paying for it by a third party within
0:22:31.320,0:22:35.290
here. It's unclear whether this is a[br]policy limitation or whether this is the
0:22:35.290,0:22:40.270
technical limitation on Google's part, but[br]the outcome is that we only have the
0:22:40.270,0:22:45.740
content for about 70% of Google's ads that[br]were paid for directly on Google's
0:22:45.740,0:22:52.410
platform within here. So one of the[br]first things that we want to do is kind of
0:22:52.410,0:22:57.380
add some semantic meaning to these ads at a[br]kind of large scale. And so we played
0:22:57.380,0:23:02.490
around with a few techniques, some fancy[br]kinds of natural language processing and
0:23:02.490,0:23:05.820
things like that. But we found that[br]there's actually a really fairly simple
0:23:05.820,0:23:11.570
and effective way of categorizing kind of[br]the intent of the ad, and that's that most
0:23:11.570,0:23:16.780
of these ads have a URL of some kind and[br]a lot of these URLs just point back to
0:23:16.780,0:23:20.160
like third party services like if you're[br]holding some kind of event you're going to
0:23:20.160,0:23:23.590
coordinate it with like Eventbrite or[br]something like that. If you're seeking
0:23:23.590,0:23:27.640
donations, if you're a Democrat you're[br]going to use this third party payment
0:23:27.640,0:23:31.320
processor, they're called ActBlue, if[br]you're Republican there's like two or
0:23:31.320,0:23:35.620
three payment processors that you're going[br]to use for this. So we could simply just
0:23:35.620,0:23:39.610
look at these really prominent URLs that[br]occur a lot of times and just kind of
0:23:39.610,0:23:45.720
manually tag what is the purpose for this.[br]And by doing this we can tag ads as either
0:23:45.720,0:23:50.550
just purely informational that they wanted[br]to just kind of get some kind of message
0:23:50.550,0:23:55.790
about the candidate either positive or[br]negative out there. Connect ads that are
0:23:55.790,0:24:00.620
seeking contact information like people's[br]e-mail addresses, phone numbers, names and
0:24:00.620,0:24:04.650
things like that. Presumably so they can -[br]you know - either get them to volunteer or
0:24:04.650,0:24:10.180
donate money in the future for the[br]campaign. There's move ads that are either
0:24:10.180,0:24:15.060
they're trying to get people to vote or to[br]attend some kind of rally or to volunteer
0:24:15.060,0:24:19.280
or something like that and then right[br]there's donation ads. And then finally
0:24:19.280,0:24:24.360
there's kind of commercial ads. These are[br]things either they are selling products
0:24:24.360,0:24:33.350
that are kind of directly political in nature[br]like a bobble head of some candidate or
0:24:33.350,0:24:38.530
they might be like solar panels which have[br]tax credits in the U.S. and things like
0:24:38.530,0:24:42.750
that. So there's some kind of commercial[br]good that's linked somehow to some
0:24:42.750,0:24:50.070
political messaging within here. So we use[br]this method and we were able to categorize
0:24:50.070,0:24:55.770
about 70% of the ads, we took a random[br]sample of them, we manually checked what
0:24:55.770,0:25:01.230
we were doing and we found it was pretty[br]accurate. About 96% accuracy we got using
0:25:01.230,0:25:07.790
this method. The other thing that we did[br]is for the top advertisers, so for
0:25:07.790,0:25:13.760
Facebook the top 75% of the advertisers,[br]for Google the top 80% of the advertisers,
0:25:13.760,0:25:17.960
in terms of the money spent by the[br]advertiser. We went in and we manually
0:25:17.960,0:25:21.610
categorized what was this type of[br]organization. Was it a political
0:25:21.610,0:25:27.140
candidate, was it what's called a[br]political action committee, so these are
0:25:27.140,0:25:32.520
the PACs within the U.S., was it a[br]union, was it a for-profit operation,
0:25:32.520,0:25:37.160
was it a non-profit operation. So on and[br]so forth and so we wrote like some regular
0:25:37.160,0:25:42.370
expressions that got us most of the way[br]there. Most of them have fairly uniform
0:25:42.370,0:25:46.790
naming conventions and for the ones that[br]we couldn't kind of automatically classify
0:25:46.790,0:25:51.490
we just did it manually, within here.[br]And then since Twitter had so few
0:25:51.490,0:25:58.060
advertisers, we just did these all,[br]manually, within here. Now, right, we can
0:25:58.060,0:26:01.210
start to do some analysis. So the first[br]analysis that we did, the easiest
0:26:01.210,0:26:08.160
analysis, was we looked at the size of the[br]ads. And the thing that pops out is that
0:26:08.160,0:26:13.250
the majority of ads on all the platforms[br]are between $0 and $100. So these
0:26:13.250,0:26:18.130
are what are normally called the micro[br]targeted ads, that are typically seen by
0:26:18.130,0:26:22.770
less than 1000 people within here. So[br]these are very short lived, narrowly
0:26:22.770,0:26:28.270
targeted ads that are kind of honing in on[br]a specific demographic within here. So
0:26:28.270,0:26:32.050
these are these micro targeted ads within[br]here. And it appears, right, that the
0:26:32.050,0:26:37.280
majority of ads, especially on Facebook's[br]platform 82% of them, are of this micro
0:26:37.280,0:26:41.970
targeted kind of ilk within here. So it[br]kind of confirms the reporting that people
0:26:41.970,0:26:46.820
had of this kind of trend of[br]microtargeting within political
0:26:46.820,0:26:52.480
advertising. The other thing, based on our[br]categorization we can look at how the
0:26:52.480,0:26:58.070
different platforms were used from within[br]here. The problem with these numbers is
0:26:58.070,0:27:04.220
that there was different inclusion[br]criteria within each of these databases.
0:27:04.220,0:27:11.600
And then right. Finally, we can kind of[br]look at the different types of advertisers
0:27:11.600,0:27:15.060
on these kind of platforms. And again it's[br]hard to read too much into these numbers
0:27:15.060,0:27:19.260
because again, right, Facebook included[br]much more of the commercial stuff. So
0:27:19.260,0:27:24.430
we're going to see a lot more of the[br]commercial stuff within here. And the
0:27:24.430,0:27:30.750
final analysis of the entire data set that[br]we did was looking at, right, kind of
0:27:30.750,0:27:35.490
basically the ramp up to the election. We[br]cut this off in late October. This
0:27:35.490,0:27:40.680
analysis was done for a paper. So the due[br]date of the paper was ironically November
0:27:40.680,0:27:46.570
6 within here. So we cut it off a few[br]weeks later and we haven't regenerated the
0:27:46.570,0:27:49.730
contents since then. The one thing that[br]you can see is at the top there is that
0:27:49.730,0:27:57.570
green spike. That's kind of the move ads.[br]So right, closer into the election the
0:27:57.570,0:28:03.760
campaigns were kind of doing sophisticated[br]get out the vote kinds of ads, within
0:28:03.760,0:28:07.470
here. So there were really sophisticated[br]kind of microtargeted ads that get out the
0:28:07.470,0:28:12.050
vote. Where like, it was almost kind of[br]spooky where like they knew where the
0:28:12.050,0:28:16.250
person lived that they were targeting and[br]so they gave them like directions on how
0:28:16.250,0:28:20.400
to get from where they live to their[br]nearest polling place within here. So
0:28:20.400,0:28:23.630
there are these really sophisticated kind[br]of get out the vote efforts that were
0:28:23.630,0:28:29.010
being run online, within here, towards the[br]end of the campaign. To kind of give you
0:28:29.010,0:28:33.780
more of a kind of apples to apples[br]comparison of these different ad
0:28:33.780,0:28:39.950
platforms, we also did some analysis kind[br]of narrowing each of the different
0:28:39.950,0:28:45.200
advertiser types to the ones that were[br]made transparent by all three platforms,
0:28:45.200,0:28:50.130
which were the federal candidates only.[br]And so this can give you some idea of kind
0:28:50.130,0:28:56.420
of a scale of these things. And we can see[br]that when we narrow it here we can still
0:28:56.420,0:29:00.670
see that Facebook has a lot more[br]advertisers and a lot more ads compared to
0:29:00.670,0:29:06.060
Google. However the spending numbers are[br]kind of comparable here. For Facebook
0:29:06.060,0:29:11.430
impressions and spends are ranges, that's[br]all that Facebook releases. For Google the
0:29:11.430,0:29:16.340
impression data is ranges, however we can[br]get exact spend data, because Google
0:29:16.340,0:29:21.940
basically released a weekly report of[br]exact spend numbers, aggregated by the
0:29:21.940,0:29:26.770
different advertisers, within here. So we[br]can use that, to get an exact number of
0:29:26.770,0:29:30.980
the spend. And again, right, Twitter's[br]numbers are much smaller in terms of
0:29:30.980,0:29:36.590
everything, within here. And we redid some[br]of our analysis to just see whether our
0:29:36.590,0:29:42.480
effects were simply a distortion based on[br]what was included in the archives. So
0:29:42.480,0:29:47.730
right we redid our ad size analysis and[br]even when we limit it to federal
0:29:47.730,0:29:53.370
candidates we can see this still holds,[br]that a lot of the ads on Facebook are still
0:29:53.370,0:29:57.060
these micro targeted ads. And there are[br]still micro targeted ads on the other
0:29:57.060,0:30:02.170
platforms, as well, within here. And right[br]this microtargeting of course varies
0:30:02.170,0:30:09.520
depending on the advertiser. So you take[br]someone like President Trump and he does a
0:30:09.520,0:30:16.020
lot of microtargeting. So almost all of[br]his ads, probably about 90%, 95% of his ads,
0:30:16.020,0:30:22.020
are micro targeted, within here. You look[br]at other candidates and they do much less
0:30:22.020,0:30:26.380
microtargeting, within here. So[br]definitely different strategies are used
0:30:26.380,0:30:30.330
by different advertisers, within here. But[br]when we look at it in aggregate, it still
0:30:30.330,0:30:37.650
appears that microtargeting is a very[br]popular strategy across advertisers. We
0:30:37.650,0:30:43.910
can also, right, look at some of the spend[br]type by ad type and this kind of shows you
0:30:43.910,0:30:49.870
a little bit how the different platforms[br]are used, within here. So Facebook's
0:30:49.870,0:30:54.420
platform looks like it's a little bit more[br]kind of informational; it's still used a
0:30:54.420,0:30:59.120
lot for donations, whereas Google's[br]platform is used a lot more for donations
0:30:59.120,0:31:04.470
and a lot less for kind of informational[br]ads and to connect within here. It's
0:31:04.470,0:31:07.770
really kind of hard to read anything into[br]Twitter's data because it's such a small
0:31:07.770,0:31:11.990
set of data. But from the data that we do[br]have it looks like there's a lot more kind
0:31:11.990,0:31:18.850
of collection of e-mails and things like[br]that, within here. The other analysis that
0:31:18.850,0:31:24.610
we did on the federal candidate ads was to[br]look at, that for Facebook in particular
0:31:24.610,0:31:30.270
right, we have the geographic impression[br]data from here. So we can effectively look
0:31:30.270,0:31:36.940
at how many states were targeted by each[br]ad with a Facebook advertiser. And the
0:31:36.940,0:31:40.160
interesting thing here is that, right,[br]there was no presidential election. So
0:31:40.160,0:31:43.980
basically all these campaigns were[br]operating in one state. So their
0:31:43.980,0:31:50.190
constituents for all these elections were[br]essentially in one state, within here. And
0:31:50.190,0:31:54.430
so if you look at the inform ads, right,[br]most of those are shown in a very small number of
0:31:54.430,0:31:58.910
states. So the inform ads are mostly being[br]shown to the constituents that are
0:31:58.910,0:32:02.970
actually voting for that candidate.[br]However, if we look at that bottom line,
0:32:02.970,0:32:06.930
the kind of gold line, those are the[br]donation ads. And we can see that they
0:32:06.930,0:32:11.621
were fundraising in many more states[br]outside of their constituency, within
0:32:11.621,0:32:16.990
here. So FiveThirtyEight did an[br]interesting analysis of one particular
0:32:16.990,0:32:22.410
candidate, Beto O'Rourke. He was a[br]candidate for Senate in Texas, Texas is a
0:32:22.410,0:32:27.740
very conservative state in the U.S., and[br]he did surprisingly well, within here. And
0:32:27.740,0:32:31.960
he kind of embraced online advertising and[br]online donation seeking; they were kind of
0:32:31.960,0:32:37.530
cornerstones of his election, within here.[br]And so FiveThirtyEight did an analysis of
0:32:37.530,0:32:42.270
his donation records. In the U.S., at the[br]federal level, all donations to candidates
0:32:42.270,0:32:46.480
have to be reported to the Federal[br]Election Commission. So this is all in a
0:32:46.480,0:32:49.410
database from the Federal Election[br]Commission. The FiveThirtyEight people did an
0:32:49.410,0:32:53.880
analysis, and they kind of confirmed what[br]we saw on the donation ads, that he was
0:32:53.880,0:33:01.140
getting about 52% of his donations from[br]Texas and 48% from other states, primarily
0:33:01.140,0:33:08.080
kind of from coastal states that tended to[br]lean more liberal, like New York,
0:33:08.080,0:33:12.900
California, Washington and places like[br]that, was where he was donation seeking.
0:33:12.900,0:33:19.020
So this appears to be a very effective way[br]of getting small dollar donations kind of
0:33:19.020,0:33:24.760
throughout the U.S. within here, through[br]this online advertising. The last thing that
0:33:24.760,0:33:30.000
I'm going to talk about is the ad[br]targeting. Facebook didn't directly
0:33:30.000,0:33:35.510
release the ad targeting. However, we were[br]lucky enough that ProPublica made a
0:33:35.510,0:33:41.090
browser plugin, that people can install in[br]their browser, and that browser plugin
0:33:41.090,0:33:46.800
would identify what it thought was[br]political ads, based on a machine learning
0:33:46.800,0:33:54.460
algorithm. And for the political ads it[br]would upload these to their server along
0:33:54.460,0:33:58.840
with the targeting information. So, for[br]those of you with a Facebook account, if
0:33:58.840,0:34:02.350
you're seeing ads you can actually click[br]on that ad kind of in the upper corner of
0:34:02.350,0:34:09.449
the ad and you can see why is this ad[br]targeting me, within here. And Facebook
0:34:09.449,0:34:14.919
will tell you a little bit, not all of why[br]you were targeted for this particular ad.
0:34:14.919,0:34:18.300
They will essentially show you the two[br]broadest categories of why you were
0:34:18.300,0:34:24.349
targeted for this particular ad, through[br]this feature they've added to their
0:34:24.349,0:34:27.440
platform. And this is actually[br]kind of interesting, this is something
0:34:27.440,0:34:32.559
that if you're a user of Facebook, I[br]highly recommend that you do. Because I
0:34:32.559,0:34:38.710
started doing it, and it was kind of eye[br]opening, as to the level of targeting that
0:34:38.710,0:34:42.809
was being done in terms of advertising.[br]That's kind of one thing, that we've
0:34:42.809,0:34:48.280
definitely learned from this is that when[br]you're seeing an ad, oftentimes there's a
0:34:48.280,0:34:53.409
very specific reason as to why you're[br]seeing that particular ad, within here.
0:34:53.409,0:34:57.730
And so we felt that it was very important[br]to, as much as we could, understand this
0:34:57.730,0:35:03.009
targeting that was going on within[br]Facebook's platform. So ProPublica had
0:35:03.009,0:35:09.539
this browser plug-in and they had this[br]data set that anyone can analyze, within
0:35:09.539,0:35:15.460
here. So if you do have Facebook and[br]you're located within the US I would
0:35:15.460,0:35:19.150
highly recommend that you install this[br]plug-in, because it helps us to kind of
0:35:19.150,0:35:25.070
understand the political advertising in[br]terms of the targeting, within here. So we
0:35:25.070,0:35:29.690
took ProPublica's data set and we[br]effectively joined it with Facebook's ad
0:35:29.690,0:35:34.640
transparency archive, within here. This[br]required us to scrape Facebook's ad
0:35:34.640,0:35:38.890
archive, because we needed the ad ID and[br]this is something that they don't expose
0:35:38.890,0:35:44.069
to their API, currently, within here.[br]However, they do expose it through their
0:35:44.069,0:35:49.339
user portal, within here. So we scraped[br]their user portal to join the specific ads
0:35:49.339,0:35:55.109
that were in the ProPublica data set to the[br]archive dataset, within here. And we were able
0:35:55.109,0:35:59.829
to join about 75% of the ads from here.[br]There were a lot of ads that were
0:35:59.829,0:36:05.859
collected by the ProPublica data set, that[br]just simply weren't archived by Facebook's
0:36:05.859,0:36:10.210
transparency archive. It misses things,[br]within here. It's imperfect as to how it
0:36:10.210,0:36:13.910
does things. And this would be another[br]interesting analysis to do, to understand
0:36:13.910,0:36:18.210
what is Facebook missing in their ad[br]transparency archive and this ProPublica
0:36:18.210,0:36:23.200
data set can allow you to somewhat do[br]this, although through bias of who
0:36:23.200,0:36:29.400
installs the ProPublica plug-in in the[br]first place. So we joined these two data
0:36:29.400,0:36:33.609
sets, again with the caveat that the[br]ProPublica data set is, right, it's
0:36:33.609,0:36:38.279
obviously biased by the set of people that[br]installed it, which are probably not going
0:36:38.279,0:36:42.770
to be a normal representative set of[br]Facebook users, within here. But
0:36:42.770,0:36:46.230
unfortunately, it's the best thing that we[br]have in terms of a data set that releases
0:36:46.230,0:36:51.769
the targeting information, within here.[br]And so we collapse into three different
0:36:51.769,0:36:57.579
categorizations of targeting, within here.[br]I'll just quickly explain Facebook's ad
0:36:57.579,0:37:02.089
targeting platform for people that don't[br]know about it. So one way to target ads
0:37:02.089,0:37:07.990
is, right, through interest or segments,[br]right, age segments, gender segments or
0:37:07.990,0:37:13.329
interests like I showed you before, within[br]here. So this is one way to target ads
0:37:13.329,0:37:19.319
within Facebook's platform. Another way to[br]target ads is through uploading lists of
0:37:19.319,0:37:23.069
information. So you can upload lists of[br]people's phone numbers, people's email
0:37:23.069,0:37:27.910
addresses or their names. And then when[br]you upload this list Facebook will find
0:37:27.910,0:37:32.440
those profiles within their database, so[br]they'll basically join those emails with
0:37:32.440,0:37:36.979
the emails that were entered by the users'[br]accounts, and then they'll target these
0:37:36.979,0:37:41.210
people. So they'll create what they call[br]an audience of these people through this
0:37:41.210,0:37:45.020
personally identifiable information and[br]then they'll target them, through this
0:37:45.020,0:37:50.309
method. The final kind of major form of[br]targeting that Facebook offers is through
0:37:50.309,0:37:53.969
what they call these lookalike audiences.[br]So this is where you can upload PII
0:37:53.969,0:37:59.349
information, like email addresses, phone[br]numbers, names. Facebook will link them to
0:37:59.349,0:38:03.650
their accounts and then they'll look at[br]kind of the interests and things that
0:38:03.650,0:38:06.600
these users have, and then they'll find you[br]other users, not these users, but other
0:38:06.600,0:38:11.950
users, that have a similar kind of profile[br]to these users within here. So these are
0:38:11.950,0:38:17.900
the lookalike audiences that Facebook[br]offers within their platform. And so we
0:38:17.900,0:38:23.349
categorized it by this and again by[br]advertiser type, within here. So the thing
0:38:23.349,0:38:29.500
that stands out, right, is that the for[br]profit companies are doing a lot of
0:38:29.500,0:38:33.559
targeting based on interests and segments.[br]So they probably don't know who the
0:38:33.559,0:38:38.819
people that they want to message are, and[br]they're doing it mostly by interests and
0:38:38.819,0:38:44.549
segment. Whereas when you look at the PACs[br]and the political candidates they have
0:38:44.549,0:38:49.270
lists. So they have a lot of lists of[br]people's, you know, email addresses, phone
0:38:49.270,0:38:53.579
numbers, names, and things like this. And[br]they're plugging these into Facebook's
0:38:53.579,0:38:59.159
system. And this is how they're targeting[br]a lot of people, within here, is through
0:38:59.159,0:39:04.440
these lists. And this was expected, but[br]it's interesting to kind of quantify how
0:39:04.440,0:39:10.069
much of this is happening. And then the[br]lookalike audiences are also being used, a
0:39:10.069,0:39:14.000
good deal by everyone within here. And[br]this kind of makes sense, right? Because
0:39:14.000,0:39:17.290
if you have a list of people then you[br]advertise to them, but then, right, you have
0:39:17.290,0:39:23.210
this lookalike audience of people that are[br]similar to them that are also perhaps good
0:39:23.210,0:39:32.640
people to advertise to, as well, within[br]here. The other thing we can do is break
0:39:32.640,0:39:37.670
this down by the intent of the ad here,[br]and this shows the difference even more
0:39:37.670,0:39:43.520
starkly, of the difference in behavior[br]between the commercial people and the
0:39:43.520,0:39:47.250
noncommercial people. The commercial[br]people are targeting mostly based on
0:39:47.250,0:39:52.640
interest, whereas the other people that[br]are, say, looking to connect with people,
0:39:52.640,0:39:55.789
they're the ones that are using the most[br]lookalike audiences. And this makes
0:39:55.789,0:39:59.359
perfect sense because, right, the connection[br]ads are there to get people's e-mail
0:39:59.359,0:40:02.930
addresses, phone numbers, names and things[br]like that. So when you use the lookalike
0:40:02.930,0:40:08.220
audiences then you can, right, generate[br]more lists of people they'll convert for
0:40:08.220,0:40:13.249
whatever you want and then you can[br]retarget them with the direct lists
0:40:13.249,0:40:18.809
targeted ads, later on. So this all makes[br]pretty good sense when you look at how
0:40:18.809,0:40:23.310
this is behaving, from here. But again[br]it's interesting, right, to kind of make this
0:40:23.310,0:40:28.900
transparent for people to understand how[br]targeting is happening within the U.S.
0:40:28.900,0:40:33.799
political advertising sphere, within here.[br]So these were pretty much the two major
0:40:33.799,0:40:38.789
analyses that we did in terms of[br]targeting, within here. The final part and
0:40:38.789,0:40:43.989
the part that kind of makes the juiciest[br]of stories is kind of the more dubious
0:40:43.989,0:40:48.259
advertisers that are advertising within[br]these platforms in terms of political
0:40:48.259,0:40:53.340
advertising. So we kind of call these more[br]politely kind of "new types of
0:40:53.340,0:40:57.951
advertising", within here. The first type[br]is one that you would pretty
0:40:57.951,0:41:02.910
much expect, so this is this corporate[br]astroturfing kind of stuff, that's going
0:41:02.910,0:41:09.290
on, within here. We see these ads for[br]Citizens for Tobacco Rights. And I
0:41:09.290,0:41:13.089
pretty much expected that you look up this[br]group and it's probably going to be some
0:41:13.089,0:41:17.910
you know quasi nonprofit that's supported[br]by some industry money from the tobacco
0:41:17.910,0:41:21.510
lobbyists, or something like that. That's[br]pretty much what I expected to see when I
0:41:21.510,0:41:28.770
saw these ads. You go to this website and[br]it's actually pretty honest as to what it
0:41:28.770,0:41:32.809
does. This is probably because, right, of[br]all the lawsuits and regulations around
0:41:32.809,0:41:36.759
tobacco in the U.S. in advertising. But[br]the website clearly states, right, that
0:41:36.759,0:41:43.130
it's operated by Philip Morris, the[br]tobacco company, within here. And this
0:41:43.130,0:41:46.530
actually isn't a legal entity, this[br]Citizens for Tobacco Rights. It's just
0:41:46.530,0:41:50.970
simply a website that's been stood up,[br]that's owned and operated by Philip
0:41:50.970,0:41:55.390
Morris, as far as we can tell, within[br]here. And this gets to a big problem with
0:41:55.390,0:42:00.819
Facebook's transparency archive, which is[br]that they don't actually vet that
0:42:00.819,0:42:05.140
disclaimer string of the sponsor, within[br]here. So pretty much anyone can type
0:42:05.140,0:42:11.880
anything that they want within that[br]disclaimer string and Facebook will allow
0:42:11.880,0:42:15.730
you to run it. We've tested it and as far[br]as we can tell, you can't say that you're
0:42:15.730,0:42:19.620
from Facebook, Instagram or that you're[br]Mark Zuckerberg, they'll block that. But
0:42:19.620,0:42:24.011
pretty much anything else that you type in[br]there they'll allow that ad to run, within
0:42:24.011,0:42:30.440
here, with no vetting. So we discovered[br]this, we politely, privately mentioned it
0:42:30.440,0:42:36.119
to Facebook. Some reporters kind of[br]trolled Facebook within here and so there
0:42:36.119,0:42:41.421
was a reporter that trolled Facebook and[br]opened up ads for all the senators, within
0:42:41.421,0:42:45.440
here, on Facebook. And of course Facebook[br]approved them all, from within here, and
0:42:45.440,0:42:49.849
they did some other things to troll[br]Facebook where they inserted some other
0:42:49.849,0:42:54.799
advertisements, within here. But the point[br]is, that that disclaimer string is not
0:42:54.799,0:43:00.500
vetted within here. Google actually does vet[br]that disclaimer string within there, so
0:43:00.500,0:43:05.079
they require either a tax ID number or a[br]Federal Election Commission I.D. number and
0:43:05.079,0:43:11.539
they actually do vet it and they publish[br]that tax I.D. number or federal election
0:43:11.539,0:43:14.590
I.D. number along with the disclaimer[br]string, within here. Which makes it really
0:43:14.590,0:43:18.759
easy to track down advertising on Google.[br]On Facebook, because, right, they can
0:43:18.759,0:43:22.339
basically type in whatever they want in[br]the disclaimer string, it makes it much
0:43:22.339,0:43:25.930
more difficult to actually link these[br]advertisers. And sometimes just outright
0:43:25.930,0:43:33.910
impossible, if the disclaimer string is[br]made up or just too mutilated in some way
0:43:33.910,0:43:39.989
or form, within here. So this is[br]definitely a problem, where we have these
0:43:39.989,0:43:43.530
lobbyist organizations, or in this case[br]not even lobbyist organizations, just
0:43:43.530,0:43:49.519
industry, that can effectively lie about[br]who's paying for this ad in Facebook's
0:43:49.519,0:43:55.849
platform. The other thing we found were[br]what is now kind of being called these
0:43:55.849,0:44:03.029
junk media outlets. So these are for-profit[br]outlets that are claiming that they're
0:44:03.029,0:44:08.410
doing kind of news operations. But, right,[br]it's not really traditional kind of
0:44:08.410,0:44:12.780
reporting journalistic things. It's more[br]just kind of propaganda messaging, within
0:44:12.780,0:44:18.769
here. So there is this group called New[br]American Media Group LLC. They also ran under
0:44:18.769,0:44:24.641
the name of New Democracy, or sorry[br]Democracy Now was their other name, within
0:44:24.641,0:44:30.529
here. And so they ran this, within here.[br]We tracked down these LLCs and they were
0:44:30.529,0:44:35.750
just simply shell companies and that kind[br]of led to nowhere, within here. We worked
0:44:35.750,0:44:41.500
with a journalist from The Atlantic that[br]actually did a lot of digging into the
0:44:41.500,0:44:48.650
shell companies. And he was able to,[br]through his basically investigation, link
0:44:48.650,0:44:53.640
these companies to the actual entity that[br]created these shell companies and was
0:44:53.640,0:45:00.579
running these ads, within here. And so[br]when we did our analysis of this, this
0:45:00.579,0:45:07.220
company basically this third party[br]advertising company was creating these.
0:45:07.220,0:45:12.710
They're meant to look like kind of[br]grassroots kind of organizations. There
0:45:12.710,0:45:17.170
were, a lot of them were kind of targeted[br]at more conservatively leaning groups, but
0:45:17.170,0:45:22.480
then they would bombard them with liberal[br]messaging, within here. So they would
0:45:22.480,0:45:26.169
create these fake communities that looked[br]more conservative. And then once they
0:45:26.169,0:45:30.390
attract an audience they would bombard[br]them with these liberal kinds of
0:45:30.390,0:45:35.589
messaging, within here. And so this[br]particular company is based in Colorado.
0:45:35.589,0:45:40.849
It's called MOTIVE AI. Apparently, it's[br]hoping to become the Cambridge Analytica
0:45:40.849,0:45:48.510
of the liberal side. I don't know if[br]that's something to aspire to or not. Some
0:45:48.510,0:45:52.450
other journalists also did some digging,[br]within here. There were some journalists
0:45:52.450,0:45:57.269
from ProPublica that did some digging,[br]within here. They found more of this
0:45:57.269,0:46:03.020
astroturfing by political lobbyist groups[br]and things like that. Big oil, insurance
0:46:03.020,0:46:08.549
companies, again when they advertised on[br]say Google's platform they would be honest
0:46:08.549,0:46:11.809
about their disclaimer string, and then[br]when they advertised on Facebook's
0:46:11.809,0:46:16.319
platform they would often kind of[br]obfuscate their disclaimer string, to
0:46:16.319,0:46:23.430
make it more difficult to link them[br]together. And so they unmasked a whole
0:46:23.430,0:46:27.869
bunch of these other kinds of junk media[br]operations, as well, that were kind of
0:46:27.869,0:46:33.499
spreading propaganda, within here. I'm[br]picking on Facebook a lot. Again Google
0:46:33.499,0:46:38.400
does vet the tax I.D. number of these[br]people, but you see something like, right,
0:46:38.400,0:46:44.560
this DIGICO LLC that paid for some ads. So[br]you track this down, and this is again one
0:46:44.560,0:46:48.051
of these third party advertising agencies.[br]It's easy to track down because of the tax
0:46:48.051,0:46:52.030
I.D. number. But it still doesn't actually[br]tell you who paid for the ad. It just
0:46:52.030,0:46:56.000
tells you the third party that, right, it[br]presumably was paid on behalf of someone
0:46:56.000,0:47:00.210
else to run these ads, from here. So this[br]is a big problem with these disclaimer
0:47:00.210,0:47:04.170
strings, that oftentimes they don't[br]actually identify the person that's paying
0:47:04.170,0:47:10.849
for the ad. So to kind of wrap this up,[br]within here, after our kind of experiences
0:47:10.849,0:47:15.150
looking at these transparency archives I[br]would say they're fairly adequate to
0:47:15.150,0:47:20.339
understand good actors. So we could fairly[br]well understand how good political
0:47:20.339,0:47:24.089
advertisers were behaving in Facebook's[br]platform. However, right, for the bad
0:47:24.089,0:47:28.829
advertisers, we probably missed a lot of[br]them because they could just simply type
0:47:28.829,0:47:33.119
in lots of different disclaimer strings[br]and easily avoid our analysis, at this
0:47:33.119,0:47:39.490
point. None of these current archives have[br]it just right yet. All of them have
0:47:39.490,0:47:44.190
issues, right. Facebook isn't providing[br]good access to their data. They're not
0:47:44.190,0:47:49.069
releasing targeting information. Google is[br]missing 30% of the content because of third
0:47:49.069,0:47:56.499
parties using their advertising system.[br]They're not releasing spend and impression
0:47:56.499,0:48:01.950
information based on demographics, within[br]there. Twitter just simply hasn't hired
0:48:01.950,0:48:08.060
someone to enforce the policy of[br]transparency, well, within here. And
0:48:08.060,0:48:12.880
unfortunately our experience throughout[br]this process has been that these companies
0:48:12.880,0:48:17.569
are oftentimes reactive, instead of[br]proactive, within here. Which means that,
0:48:17.569,0:48:20.660
right, we have to continuously put[br]pressure on them, in order for them to
0:48:20.660,0:48:26.000
kind of improve these archives, within[br]here. So this is unfortunately kind of the
0:48:26.000,0:48:30.390
state that we're in, within here. And I'm[br]sure, one thing that I really want to give
0:48:30.390,0:48:33.430
a shoutout, is, right, there's people at[br]these companies that are actually trying
0:48:33.430,0:48:38.739
to build these transparency archives. And[br]I want to give them a lot of credit for
0:48:38.739,0:48:43.589
taking on this task, that's probably not[br]well rewarded within their companies, of
0:48:43.589,0:48:47.420
building these transparency archives,[br]within here. And so my hope is that by
0:48:47.420,0:48:52.690
applying pressure we can get them more[br]support to kind of get more resources and
0:48:52.690,0:48:58.891
be able to make more transparent, within[br]their companies, as well. Because I hope
0:48:58.891,0:49:04.960
that, right, this puts us in better shape[br]to understand the 2018 elections, but 2020
0:49:04.960,0:49:09.589
is another presidential election and my[br]hope is that we'll continue to improve
0:49:09.589,0:49:13.920
these archives, so that we'd be in a much[br]better position to understand both the
0:49:13.920,0:49:19.839
good and the bad advertisers by 2020, within[br]here. However this is going to take
0:49:19.839,0:49:25.299
probably regulatory pressure, legal[br]pressure, pressure by technologists and
0:49:25.299,0:49:31.670
things like this to improve these[br]archives, at this point. So with that,
0:49:31.670,0:49:35.521
again, I have my collaborators, that[br]aren't here on the stage, but they
0:49:35.521,0:49:40.809
definitely did a lot of the heavy lifting[br]to make this happen, within here. And
0:49:40.809,0:49:45.340
again all of our tools and most of our[br]data except for the Facebook data, that's
0:49:45.340,0:49:51.262
under NDA, is available through our[br]GitHub, there. And so with that I will
0:49:51.262,0:49:52.576
open it up to questions.[br]
0:49:52.576,0:50:03.836
applause
0:50:03.836,0:50:04.970
Herald: Thank you so much Damon. I know
0:50:04.970,0:50:11.530
that there are a few questions among the[br]audience. So, microphone 6 please.
0:50:11.530,0:50:16.480
Question: So [Name] on the IRC is asking[br]"Have you looked at links between the
0:50:16.480,0:50:21.620
advertisers and do they use the same[br]images or text for instance?".
0:50:21.620,0:50:24.200
Answer: This is a really good question.[br]This is actually one of the analyses that
0:50:24.200,0:50:27.539
we're currently doing. So we're starting[br]with the text, because that's obviously
0:50:27.539,0:50:32.360
the easiest. But we're also exploring some[br]image clustering algorithms, as well. To
0:50:32.360,0:50:37.099
cluster the advertisers across platforms[br]and also within platform because we're
0:50:37.099,0:50:40.299
finding a lot where, you know, they create[br]multiple shell companies, where they just
0:50:40.299,0:50:44.089
lie about their disclaimers and so this is[br]definitely something that we're focusing
0:50:44.089,0:50:49.049
on, is better clustering of the[br]advertisers. Because like that group
0:50:49.049,0:50:53.180
MOTIVE AI, even though they created the[br]different LLCs they were running the same
0:50:53.180,0:50:59.180
images and videos across their different[br]LLC shell companies.
0:50:59.180,0:51:02.090
Herald: Great thank you. Please if you[br]have any questions, queue up by the
0:51:02.090,0:51:07.820
microphones. Microphone number 1 please.[br]Question: Hi, Oliver Moldenhauer? Thanks a
0:51:07.820,0:51:13.390
lot for the talk. Definitely one of the[br]best I've seen here so far. Two questions.
0:51:13.390,0:51:18.970
A: Why do those transparency archives[br]exist? Was there some law or political
0:51:18.970,0:51:24.959
process around that? And B: As we are[br]nearing the European election next year,
0:51:24.959,0:51:30.680
what kind of data is available for Europe?[br]Answer: Those are both good questions.
0:51:30.680,0:51:33.979
Again I'm not internal to one of these[br]companies, so I can just speculate as to
0:51:33.979,0:51:38.279
why these transparency archives exist. But[br]my guess is, right, that this was
0:51:38.279,0:51:43.839
reactionary. So Mark Zuckerberg and high[br]ranking officials from Twitter and Google
0:51:43.839,0:51:49.920
were hauled in to testify in the House and[br]Senate, and this is them trying to self
0:51:49.920,0:51:53.989
regulate instead of having regulation[br]imposed on them by people. So that again,
0:51:53.989,0:51:58.420
this goes to the pressure part is that[br]there was regulatory pressure put on them,
0:51:58.420,0:52:03.089
the threat of regulatory pressure and so[br]that's what made them do these
0:52:03.089,0:52:09.150
transparency archives. In terms of what's[br]available in Europe. I guess as long as
0:52:09.150,0:52:17.390
the UK is still in the EU, kind of[br]teetering, Facebook has started to make ads
0:52:17.390,0:52:22.599
transparent in the UK. They also make them[br]transparent in Brazil and they're going to
0:52:22.599,0:52:25.859
make them transparent in India. And I[br]think they have plans to make them
0:52:25.859,0:52:30.650
transparent in other places, in the EU as[br]well. However, they haven't done that.
0:52:30.650,0:52:35.570
However, again this goes back to the[br]pressure part. So there's no API for the
0:52:35.570,0:52:40.029
other countries, there's only an API for[br]the US and that might be because we put
0:52:40.029,0:52:45.640
pressure on them by scraping them and[br]publicly releasing their data. And, right,
0:52:45.640,0:52:48.700
there are no transparency reports for other[br]countries, as well, there's only a
0:52:48.700,0:52:52.260
transparency report for the US. And again[br]that might have been because we applied
0:52:52.260,0:52:56.300
pressure and we were publishing numbers.[br]Some of the numbers in terms of spend were
0:52:56.300,0:52:59.710
very low, because, right, they were just[br]giving us ranges. So we might have been
0:52:59.710,0:53:03.700
making them look bad, when we took the[br]bottom range of their spend and they might
0:53:03.700,0:53:08.309
have wanted to correct that with their own[br]transparency archive, as well. So again, a
0:53:08.309,0:53:11.960
lot of this unfortunately requires[br]pressure to get them to improve their
0:53:11.960,0:53:16.499
transparency efforts.[br]Herald: Great thank you. Microphone number
0:53:16.499,0:53:20.630
two please.[br]Question: So you mentioned
0:53:20.630,0:53:25.900
FiveThirtyEight and their work on the[br]donations. Do you think it makes sense
0:53:25.900,0:53:33.047
to combine the data you gathered with what[br]they have to look at election outcomes,
0:53:33.047,0:53:37.660
like, election results and turnout and[br]stuff like that?
0:53:37.660,0:53:43.440
Answer: Yes. Actually this is the number[br]one project on our road map, right now.
0:53:43.440,0:53:49.940
Is, actually Google has processed the FEC[br]information and they've made this
0:53:49.940,0:53:55.631
information available via their BigQuery[br]database. So we've downloaded this, we've
0:53:55.631,0:54:01.809
manually linked the Facebook advertisers[br]and the Google advertisers to the FEC data
0:54:01.809,0:54:06.999
and now we're doing the regression models,[br]specifically focused on the donation ads
0:54:06.999,0:54:10.760
first. Because those are what are reported[br]to the FEC, at this point. So we are
0:54:10.760,0:54:14.910
essentially trying to understand how[br]effective these donation ads are at
0:54:14.910,0:54:21.010
actually driving donations, within here.[br]Herald: Thank you. Microphone number 4
0:54:21.010,0:54:24.740
please.[br]Question: Hi. First of all thank you Mr.
0:54:24.740,0:54:30.420
McCoy and your team for this very[br]interesting research. I was wondering,
0:54:30.420,0:54:35.499
whether you know if there is any[br]follow-up research conducted by political
0:54:35.499,0:54:42.350
scientists, sociologists etc. analyzing[br]the political repercussions of these ad
0:54:42.350,0:54:47.409
campaigns.[br]Answer: Yes, so we're aware of a few
0:54:47.409,0:54:50.420
efforts. I don't want to out the teams[br]that are doing them, in case they don't
0:54:50.420,0:54:55.549
want to be outed. There's nothing[br]that's been published publicly, I believe,
0:54:55.549,0:54:59.880
on this. But we're definitely trying to.[br]That's one of the main goals of kind of
0:54:59.880,0:55:04.299
our overarching online political[br]advertising transparency thing, is to try
0:55:04.299,0:55:09.061
and get as much data as we can in the[br]hands of less technical people in an easy
0:55:09.061,0:55:15.989
way for them to analyze. And so this is[br]basically the primary goal of our project,
0:55:15.989,0:55:20.130
in here. So we've been working as hard as[br]we can to get political scientists up
0:55:20.130,0:55:25.349
to speed on the data. And this is why it's[br]really unfortunate that Facebook has its
0:55:25.349,0:55:30.239
NDA in place for their particular data,[br]because this makes it very difficult for
0:55:30.239,0:55:35.099
us to share and collaborate on that[br]particular data. Which puts pressure on us
0:55:35.099,0:55:40.109
unfortunately as being the only ones that[br]can do some of this analysis right now. So
0:55:40.109,0:55:44.369
this is why I would love to apply[br]enough pressure to Facebook, to get better
0:55:44.369,0:55:49.490
access to their particular data.[br]Herald: Yes. And the question from the
0:55:49.490,0:55:53.960
Internet please.[br]Signal Angel: So Nomad is asking "Why are
0:55:53.960,0:55:59.099
those advertisements considered political or[br]election interference in the USA? Can't you just
0:55:59.099,0:56:03.930
see, that someone paid money to display[br]that content and conclude its purpose is
0:56:03.930,0:56:11.549
to promote an agenda or manipulate them?".[br]Answer: This is a good question. Right, a
0:56:11.549,0:56:15.210
lot of this goes to the tactics that[br]they're using here. So again they're
0:56:15.210,0:56:19.040
creating these communities, that they're[br]making look like they're grassroots
0:56:19.040,0:56:24.479
communities and then they're kind of[br]sucking people in with these ads, that up
0:56:24.479,0:56:28.630
until recently had no disclaimer string[br]on them. So you had no idea who paid for
0:56:28.630,0:56:33.820
them. So they appear to be paid for by[br]kind of these grassroots organizations. So
0:56:33.820,0:56:39.221
you felt like you were, kind of, part of a[br]grassroots movement, in joining these kinds
0:56:39.221,0:56:43.029
of communities. I think this is the really[br]scary, kind of subtle thing. And you
0:56:43.029,0:56:46.520
might not realize why you're being[br]targeted for these particular ads or who
0:56:46.520,0:56:50.269
was behind these particular ads. So, I[br]think it was really easy for people to
0:56:50.269,0:56:55.339
kind of get unwittingly, kind of, duped[br]into joining what looked like these
0:56:55.339,0:56:59.690
grassroots campaigns. So that's why I[br]think improving these disclaimer strings
0:56:59.690,0:57:03.319
and showing who is really behind these[br]communities and these advertisements is
0:57:03.319,0:57:09.180
really important, to dispel this notion of[br]these fake grassroots communities, that
0:57:09.180,0:57:12.960
are luring people in within here. So I[br]think that's one of the big things that
0:57:12.960,0:57:18.029
can be gained by these transparency[br]archives. But it requires improvement of
0:57:18.029,0:57:22.680
the transparency archives, to do that.[br]Herald: Microphone number 3 please.
0:57:22.680,0:57:28.430
Question: Yes. So I'm curious about the[br]efficacy of some of the advertisements
0:57:28.430,0:57:36.579
that are on Facebook and Twitter. And I'm[br]wondering is any group like the ProPublica
0:57:36.579,0:57:45.279
web extension checking the engagement[br]rate? Like the number of comments, the
0:57:45.279,0:57:53.119
number of views and the number of shares,[br]to like kind of get an estimate of, OK
0:57:53.119,0:57:58.440
this big grassroots community is building[br]up a number of followers and these
0:57:58.440,0:58:05.520
followers population sizes and whatnot.[br]Answer: Yeah, this is again a really good
0:58:05.520,0:58:11.030
question. This is something that we are, I[br]would certainly encourage other people to
0:58:11.030,0:58:14.200
potentially do as well. So the problem is[br]that a lot of that information isn't
0:58:14.200,0:58:18.180
exposed by the transparency archives. This[br]is more of what they call kind of the
0:58:18.180,0:58:22.859
organic information, the non paid for[br]information, within here. And so this is
0:58:22.859,0:58:27.870
stuff that none of the platforms are[br]releasing. And so it requires kind of a
0:58:27.870,0:58:32.509
scraping operation, essentially, to gather[br]this information and collect it. And it's
0:58:32.509,0:58:36.230
something that we're definitely thinking[br]about how to efficiently do, is how to
0:58:36.230,0:58:40.619
efficiently scrape and collect this[br]information. Because this is very hard
0:58:40.619,0:58:43.420
because, right, you go against the anti[br]scraping teams of these companies, that
0:58:43.420,0:58:46.740
are well resourced. And this requires[br]accounts, and these accounts are going to
0:58:46.740,0:58:51.339
be shut down and detected. So this is[br]something that we're trying to pilot to
0:58:51.339,0:58:55.609
understand. Our other idea of how to do[br]this potentially is try and crowdsource
0:58:55.609,0:59:00.260
this information. This is similar to how[br]ProPublica crowdsourced it for the browser
0:59:00.260,0:59:03.950
extension information. We could[br]potentially crowdsource it, where you
0:59:03.950,0:59:08.109
know, when people interact with these[br]communities or these ads the plug-in could
0:59:08.109,0:59:11.770
potentially crowdsource that information[br]back to us. And then we would have to
0:59:11.770,0:59:17.700
figure out some strategy to sanitize that[br]information in some way. Because at that
0:59:17.700,0:59:20.859
point you might have some sensitive[br]information they are collecting. This is
0:59:20.859,0:59:26.390
something that we're thinking about. We're[br]cautious, I think, rightly so because this
0:59:26.390,0:59:31.160
can start stepping on, again, more[br]sensitive information that's available
0:59:31.160,0:59:33.859
from within here. But I think it's[br]definitely key to understanding the
0:59:33.859,0:59:37.380
effectiveness of these ads. Something that[br]we're going to have to do or we're going
0:59:37.380,0:59:42.339
to have to convince Facebook somehow to do[br]on our behalf in order to really
0:59:42.339,0:59:46.429
understand the effectiveness of these ads.[br]Herald: Thank you. Last question for
0:59:46.429,0:59:49.579
microphone number 1.[br]Question: All right. At the beginning of
0:59:49.579,0:59:54.340
your talk you explained how Russia[br]influenced the elections. I'm curious
0:59:54.340,0:59:59.529
about the attribution. Are there possibly[br]any doubts at any instance that you
0:59:59.529,1:00:06.359
presented that it was not Russia or maybe[br]some other country, China or Iran? How do
1:00:06.359,1:00:09.720
you know, and did you check the facts?[br]Answer: I mean, that's a good question.
1:00:09.720,1:00:14.430
Unfortunately, right, the national[br]security agencies don't release the
1:00:14.430,1:00:20.849
sources of their information. There's[br]another investigation done by the
1:00:20.849,1:00:27.229
Department of Justice by Robert Mueller,[br]that did release some more information
1:00:27.229,1:00:31.749
about this, within here. I've looked at[br]that information and it looks, you know,
1:00:31.749,1:00:36.449
right, you can never 100% unequivocally[br]state that it was Russia. It could have
1:00:36.449,1:00:41.009
been a false flag operation. But I think[br]that pretty much the overwhelming
1:00:41.009,1:00:45.349
information that everyone has found when[br]they've investigated this has pointed at
1:00:45.349,1:00:52.880
Russia and the organizations that were[br]prosecuted by Mueller.
1:00:52.880,1:00:56.920
Herald: Damon McCoy, thank you very much.[br]Please give him a great round of applause.
1:00:56.920,1:00:59.540
Applause
1:00:59.540,1:01:04.970
35c3 postroll music
1:01:04.970,1:01:22.000
subtitles created by c3subtitles.de[br]in the year 2019. Join, and help us!