35c3 preroll music
Herald: This talk will be given by Damon McCoy. He will be explaining online U.S. political advertising. He has been researching how different online communities behave around many different topics, but this is what he's going to talk about today, so please give him a great round of applause.
Applause
Damon McCoy: Thank you everyone for coming. I'm the one up here speaking, and I'm the only one who wanted to fly to Germany over Christmas and New Year's. However, three other people were really key in helping with this research, and before we get started I just want to credit them. One is my grad student Laura Edelson. She did a lot of the analysis you're going to see and generated all the graphs. Shikhar, one of our undergraduate students from the NYU Shanghai campus, did a lot of the work to collect all the data you're going to see here. And Ratan, who is a professor at NYU on the Shanghai campus, also helped with our initial efforts to collect some of this data. Before we get started, I'll give a little introduction about myself. I'm a professor at the NYU Tandon School of Engineering. As was mentioned, I do a lot of work looking at how technology impacts the security and privacy of society, groups of people, and things like that.
So this was really an opportunistic project that captured the impact of online advertising on U.S. political campaigns. And a quick plug: most of the data and scripts behind everything I'm going to show you are on GitHub, accessible to anyone who wants to analyze the data, look at our scripts, improve them, and so on.
Applause
Thank you. This is the first time I've given this talk outside the U.S., so let me start with a quick explanation of how U.S. elections work for those of you who might not know. Every two years the U.S. holds federal elections, which affect all of the states. Every four years we also elect a president. 2018 was our most recent election. It was not a presidential election year, so the federal elections were for seats in the Senate and the House. We also had elections for state and local positions, and some of those are captured in our data, especially our Facebook data, and much less so in our Twitter and Google ad transparency data. This talk focuses on the 2018 elections, held on November 6th. In the U.S., federal election day falls every two years on the Tuesday after the first Monday in November.
To begin with some background, which some of you may know about and some may not: in 2016, which was a presidential election year, there was election interference. Facebook has released these ads. They were paid for by a Russian company, the Internet Research Agency. Facebook released them to the Senate, and the Senate then released them publicly. This one is basically an ad trying to disenfranchise people from voting in the election. You can see it's targeted at people in the U.S. in a certain age range with interests like Martin Luther King, African-American culture, and African-American civil rights. Facebook doesn't actually let you target people directly by ethnicity, but these interests are a pretty good proxy, so this ad would probably be fairly effective at reaching African-Americans in the U.S. to try to discourage them from voting. Other ads ran misinformation and disinformation campaigns. This one, again paid for by the Russian agency, tried to spread an unsubstantiated rumor that Bill Clinton has an illegitimate child.
And again, the targeting is aimed at African-Americans in the U.S., who are often a key voting bloc for the more liberal, Democratic side. That's the other thing I should have explained about the U.S. election system, especially at the federal level: we effectively have two parties that win any meaningful number of elections. One is the Democratic Party, which tends to skew more liberal, favoring bigger government and more social services. The other is the Republicans, who skew more conservative, wanting a smaller government that provides fewer services and less regulation. These are just two examples; there were a whole bunch of these ads shown on Facebook that year. Pretty much all of them tried to disenfranchise people or to create chaos and polarize people around the election, often with disinformation. In 2017, our Office of the Director of National Intelligence put out a report. Sorry for the big block of text; this will be the only one in here.
But I thought it was important to show this, because they pretty much unequivocally state that Russia tried to interfere in the U.S. elections and that Vladimir Putin was involved in this interference. So this is about as solid a statement as the U.S. intelligence agencies, the CIA and the NSA, will make that this occurred. The other scandal that broke on Facebook was Cambridge Analytica, where a third-party advertising agency collected a whole bunch of data on 80 million Facebook profiles and then tried to build psychological profiles for targeting and messaging. When these two scandals broke, the first result was Mark Zuckerberg in a suit, a real suit, not a hoodie suit, testifying in front of Congress. He testified before House and Senate committees about the abuses occurring on Facebook on April 10th and 11th, 2018, and he did admit that Facebook had made mistakes and needed to improve its platform going forward. The most tangible outcome of these testimonies was the transparency archives that began to appear. Here is a view of what Facebook's ad transparency archive looks like.
When it was originally deployed, you needed a Facebook account to interact with it. Facebook has since dropped that requirement, so anyone with Internet access, unless they're censored somehow, can go to this archive and access these ads. In the user-facing portal for these ad transparency archives, you type in keywords, and it basically does pattern matching on the ad text and other parts of the ad and returns the ads that match. So you can see all the political ads that match a particular term. Facebook began archiving these ads at large scale on May 7th, 2018. By election day, November 6th, 2018, there were 1.6 million ads paid for by over 85,000 advertisers on Facebook's platform. Facebook was actually fairly broad about what it included in its political archive: any ads related to U.S. elections, whether federal, state, or local. They also included the very important issue ads. As we saw with the Russian interference, a lot of the time the ads didn't mention actual political candidates; they mentioned polarizing issues within the U.S. So Facebook also included these ads on issues of national political importance. They had a list of, I think, about 13 different criteria, the last of them being values. So it was a fairly encompassing set of ads they tried to include in their archive.
Along with the text and images or videos of the ad, they also included ranges of geographic and demographic impressions: state-level impression information in ranges, and demographic impressions by gender and by age, bucketed. They also included spend information, again in ranges: $0 to $99, $100 to around $500, and so on. One of the key pieces of information they did not release was the targeting information. Those ads I showed you before had targeting information, but Facebook does not release it in its transparency archive. They had that user portal where you could do the keyword search. However, I like to do large-scale data analysis, so I wanted to try to collect all of the ads in this web portal. Initially all they had was this keyword search portal, so we compiled a large list of what we thought were reasonable keywords: names of prominent politicians, names of states, issues.
We compiled this long list of keyword searches and began scraping the portal, and I'll tell you the story of how our scraping efforts went. Facebook currently offers an API, but it's still keyword-based and it's restricted by an NDA, so let me flesh out how we got there. Facebook released the user archive towards the end of May. I played with it and realized it didn't lend itself well to large-scale analysis of these ads, so I went to my students Shikhar and Laura. Shikhar worked furiously night and day, and within three days he had a workable scraper that could take our keywords and scrape all of the results for each one. We ran the scraper for about two months and then released a report, just a very general statistical report, and we released the data in our GitHub archive. About two weeks after that, Facebook began deploying anti-scraping measures, which hampered our efforts to scrape Facebook's archive. I don't want to attribute any malice here. I don't believe Facebook was targeting just our scraping efforts; they were targeting everyone's scraping efforts.
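Because the portal only answered keyword searches, collecting the archive meant issuing one query per keyword and merging the heavily overlapping results. The sketch below shows only that merge step, assuming each result carries a unique ad ID under a hypothetical `ad_id` key; the `search` callable is a stand-in for whatever request the real scraper made, not Facebook's actual interface.

```python
from typing import Callable, Iterable


def scrape_by_keywords(keywords: Iterable[str], search: Callable[[str], list]) -> dict:
    """Run one keyword search per term and merge the results, deduplicated by ad ID.

    `search` is a hypothetical stand-in for the portal request; each result is
    assumed to be a dict with an "ad_id" key.
    """
    ads: dict = {}
    for keyword in keywords:
        for ad in search(keyword):
            # Broad terms overlap heavily, so the same ad often comes back for
            # several keywords; keep only the first copy seen.
            ads.setdefault(ad["ad_id"], ad)
    return ads


# Usage with a stand-in search function backed by a small in-memory index.
index = {
    "senate": [{"ad_id": "1"}, {"ad_id": "2"}],
    "texas": [{"ad_id": "2"}, {"ad_id": "3"}],
}
collected = scrape_by_keywords(["senate", "texas", "ohio"], lambda kw: index.get(kw, []))
print(sorted(collected))  # ['1', '2', '3']
```

Keying the merged result on the ad ID makes repeated scraping runs idempotent, which matters when the keyword list keeps growing.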
Whether it's right or wrong to block people from collecting data from a transparency archive is something I might quibble with them on; I'd say they might want to provide better access to the data in their transparency archive. But this was the choice Facebook made, to clamp down on scraping. So we fought with them a little in a cat-and-mouse game: we'd make some changes to our scraper to get around their anti-scraping measures, and they'd do something on their end to block our scraper, and probably other people's scrapers doing similar things. This persisted for about two weeks, and then Facebook deployed their API. However, their API was very limited and still in beta at that point. These were some of the terms and conditions. The one that made me most uneasy (?) is that it limited access to U.S. people only, so we could really only work closely with people in the U.S., and it limited the kinds of people we could work with. Unfortunately, this ruled us out from working closely with journalists from really good news organizations like the Guardian that just happened to have the misfortune of being located outside the U.S. Maybe the good fortune, yes.
And the list of restrictions continues. They also placed a data retention limit on it, so we can only retain the data for one year. Facebook's own retention for their archive is seven years, but they place a one-year retention limit on the data we collect under their NDA. I'd like to say that I got this NDA, lit it on fire, tore it up, and we continued to scrape the archive. Unfortunately not. It was a hard call to make, but we were basically two students, and we had to decide whether we wanted the data to analyze or whether we wanted to spend all of our time fighting Facebook's anti-scraping efforts. So in the end I did agree to their NDA. The initial data we scraped, we released, and we're still scraping a small amount of data that we release as well. But unfortunately, none of the data we collect under the NDA can be released. If anyone does want to fight with Facebook and resurrect the crawler, I would be more than happy for that to happen. Given our engineering constraints, it just wasn't feasible for us. The story is a little different with Google. Google began archiving ads on May 31, 2018. By election day they had 45,000 ads from 600 advertisers.
Their criteria for including advertising were much narrower than Facebook's: they only released ads related to U.S. federal candidates and federal office holders. So Google released a much more limited set of data, with none of the issue ads that Facebook released. They didn't release any geographic or demographic impression data. They did release ranges of impressions and ranges of spend, and they released some limited targeting data: geographic and demographic targeting information, which Facebook hadn't released for its ads. Their data is available through a similar keyword-based portal, but they also make it available as a database if you want it. This is what their portal looks like, and this is their BigQuery database. They update it every week, and you can download it and analyze the data relatively easily. The last platform to implement an archive was Twitter. Twitter began archiving ads on June 27, 2018. The scale of ads on Twitter is very small compared to the others; their ad network in general is much smaller than Google's and Facebook's. What they included was similar to what Google included: only federal candidates.
Closer to the election, they also said they were going to release political issue ads. However, no enforcement mechanism appears to exist in Twitter's system; it doesn't appear to be anyone's job to actually enforce ad transparency. So we've been manually finding accounts and reporting them to Twitter, and when we report them, Twitter includes them in future transparency efforts. But it appears that we're basically the ones (short laughter) it's become our job to monitor the Twitter accounts and notify Twitter, and then they'll deal with it manually. Unfortunately, they still don't appear to have a person who manages this process inside Twitter. Twitter does, however, release the most information. They release exact numbers, not ranges, for impressions and spend, also broken down by geography and demographics, and as far as we can tell they include all of the targeting information. Their data is available without an account through their portal. We've been scraping it with no problems; they haven't blocked us. So we simply scrape their data and republish it to GitHub. Their data is so small that it's been relatively easy to keep pace with it.
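Finding advertisers to report to Twitter reduces to a set difference between the accounts observed running paid political ads and the accounts on the archive's list. A minimal sketch, assuming both lists are available as plain handle strings; the function names are hypothetical, and how accounts get flagged as advertising in the first place is out of scope here.

```python
def normalize(handle: str) -> str:
    # Twitter handles are case-insensitive; strip whitespace and a leading "@".
    return handle.strip().lstrip("@").lower()


def accounts_to_report(seen_advertising: set, archived: set) -> list:
    """Handles seen running paid political ads that are absent from the archive's list."""
    archived_norm = {normalize(h) for h in archived}
    return sorted({normalize(h) for h in seen_advertising} - archived_norm)


print(accounts_to_report({"@SenA", "candB", "PacC"}, {"sena", "@CandB"}))  # ['pacc']
```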
Here's a picture of the Twitter transparency archive. Again, it has a list of all the Twitter accounts included in the archive, so we can monitor this list along with other accounts we know are politically active. When we see them running paid advertising, we notify Twitter, and Twitter then includes them in its transparency archive, normally within a week or so. That's the background you need to understand the transparency archives. Now we have a data set we can begin to analyze. For Facebook, since the archive was keyword-driven at the beginning and still is, we were able to collect about 80% of the ads in Facebook's database. The other problem with the API is that it is severely rate-limited: I'm talking about 3 to 4 queries per minute through Facebook's API. So we made a best effort to collect as much data as we could from Facebook. About two weeks before the election, Facebook began releasing a transparency report that included an aggregated list of all the advertisers, how many ads they had, and how much they spent. That's how we can tell we got about 80% of the ads in Facebook's archive.
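Given the per-advertiser ad counts from Facebook's aggregated report, coverage can be estimated by comparing them with the counts actually collected. The sketch below assumes both are plain `{advertiser: ad_count}` dictionaries, which is an illustrative format rather than the report's real schema.

```python
def coverage(collected: dict, reported: dict) -> tuple:
    """Compare per-advertiser ad counts we collected against the platform's report.

    Returns (overall coverage fraction, {advertiser: number of ads still missing}).
    """
    total_reported = sum(reported.values())
    if total_reported == 0:
        return 1.0, {}
    # Cap each advertiser at its reported count so over-collection (e.g. ads the
    # report has not caught up with yet) cannot push coverage above 100%.
    total_collected = sum(min(collected.get(a, 0), n) for a, n in reported.items())
    missing = {
        a: n - collected.get(a, 0)
        for a, n in reported.items()
        if collected.get(a, 0) < n
    }
    return total_collected / total_reported, missing


ratio, missing = coverage({"A": 10, "B": 2, "C": 4}, {"A": 10, "B": 5, "C": 5})
print(ratio, missing)  # 0.8 {'B': 3, 'C': 1}
```

The per-advertiser `missing` map is the useful by-product: it shows exactly where further queries should go.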
The nice thing about the transparency report is that, now that we know what we're missing, we could go back and readjust our usage of the API, so we now have virtually 100% coverage of Facebook going forward. For Twitter, we could collect 100% of their data, and again, we've republished it in an easier-to-process form. For Google, we have 100% of their ads, because they're all in the BigQuery database. However, when we started analyzing the data, we noticed that for a lot of the ads we were missing the actual content, the images and text of the ad. It turns out that on Google's ad network, if the ad was originally purchased through a third-party advertiser and then run on one of Google's properties, the content of the ad isn't archived in their system. This is unfortunately a big loophole: if you're running a malicious misinformation campaign, you can easily keep your content out of Google's archive simply by paying for it through a third party. It's unclear whether this is a policy limitation or a technical limitation on Google's part, but the outcome is that we only have the content for about 70% of Google's ads, the ones that were paid for directly on Google's platform. One of the first things we wanted to do was add some semantic meaning to these ads at large scale.
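With a rate limit of roughly 3 to 4 queries per minute, one way to readjust API usage is to spend the scarce queries on the advertisers with the largest gaps first. This is a hypothetical sketch, assuming one query per advertiser and a `missing` dictionary like the one a coverage check produces; `query` is a stand-in callable, and this is not the team's actual crawler.

```python
import time


def plan_queries(missing: dict, per_minute: int = 3) -> list:
    """Schedule one API query per advertiser, largest gap first.

    Returns (minute_slot, advertiser) pairs; at most `per_minute` queries share a slot.
    """
    if per_minute < 1:
        raise ValueError("per_minute must be at least 1")
    # Sort by descending number of missing ads; break ties by name for a stable order.
    order = sorted(missing, key=lambda adv: (-missing[adv], adv))
    return [(i // per_minute, adv) for i, adv in enumerate(order)]


def run_plan(plan: list, query, sleep=time.sleep) -> None:
    """Issue queries slot by slot, waiting 60 seconds per elapsed minute slot."""
    current = 0
    for slot, adv in plan:
        if slot > current:
            sleep(60 * (slot - current))
            current = slot
        query(adv)  # hypothetical stand-in for one rate-limited API request


print(plan_queries({"A": 1, "B": 7, "C": 3, "D": 7}, per_minute=2))
# [(0, 'B'), (0, 'D'), (1, 'C'), (1, 'A')]
```

Separating planning from execution keeps the schedule easy to inspect and lets `sleep` be swapped out in tests.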
We played around with a few techniques, including some fancy natural language processing, but we found there's actually a fairly simple and effective way of categorizing the intent of an ad: most of these ads have a URL of some kind, and a lot of those URLs just point to third-party services. If you're holding an event, you're going to coordinate it with something like Eventbrite. If you're seeking donations as a Democrat, you're going to use a third-party payment processor called ActBlue; if you're a Republican, there are two or three payment processors you'd use. So we could simply look at the really prominent URLs that occur many times and manually tag their purpose. By doing this, we can tag ads as purely informational, meaning they just want to get some message about a candidate, positive or negative, out there; connection ads, which seek contact information like people's e-mail addresses, phone numbers, and names, presumably so they can get them to volunteer or donate money to the campaign in the future; move ads, which try to get people to vote, attend a rally, volunteer, or something like that; and donation ads.
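The URL-based tagging amounts to a lookup from landing-page domain to ad purpose. A minimal sketch, assuming a hand-built table: only ActBlue and Eventbrite are named in the talk, and the two entries below are illustrative, not the authors' actual mapping. Unmatched URLs return `None` and would go to manual review.

```python
from typing import Optional
from urllib.parse import urlparse

# Illustrative domain-to-purpose table (hypothetical; the real list was hand-built
# from the most frequent landing URLs).
DOMAIN_PURPOSE = {
    "actblue.com": "donate",
    "eventbrite.com": "move",
}


def tag_ad(url: str) -> Optional[str]:
    """Return a purpose label for an ad's landing URL, or None if the domain is untagged."""
    host = (urlparse(url).hostname or "").lower()
    for domain, purpose in DOMAIN_PURPOSE.items():
        # Match the domain itself or any subdomain (e.g. secure.actblue.com),
        # but not look-alikes such as notactblue.com.
        if host == domain or host.endswith("." + domain):
            return purpose
    return None


for url in ["https://secure.actblue.com/donate/x", "https://www.eventbrite.com/e/rally", "https://notactblue.com/"]:
    print(url, "->", tag_ad(url))
```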
And finally, there are commercial ads. These sell products that are either directly political in nature, like a bobblehead of some candidate, or things like solar panels, which have tax credits in the U.S. So there's some commercial good linked to some political messaging. Using this method we were able to categorize about 70% of the ads. We took a random sample, manually checked it, and found the method was pretty accurate: about 96% accuracy. The other thing we did was for the top advertisers in terms of money spent: the top 75% of advertisers for Facebook and the top 80% for Google. We went in and categorized what type of organization each one was: a political candidate, a political action committee (the PACs in the U.S.), a union, a for-profit operation, a non-profit operation, and so on. We wrote some regular expressions that got us most of the way there, since most of them have fairly uniform naming conventions, and the ones we couldn't classify automatically we did manually. Since Twitter had so few advertisers, we simply did all of them manually. Now we can start to do some analysis.
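Classifying advertisers with regular expressions works because organization names tend to follow uniform conventions. The patterns below are hypothetical examples of that approach, not the expressions the team used; names matching none of them fall through to manual classification, as described in the talk.

```python
import re
from typing import Optional

# Hypothetical naming patterns, checked in order; the first match wins.
PATTERNS = [
    ("candidate", re.compile(r"\bfor (congress|senate|governor|president)\b", re.I)),
    ("pac", re.compile(r"\bPAC\b|political action committee", re.I)),
    ("union", re.compile(r"\bunion\b|\blocal \d+\b", re.I)),
    ("for_profit", re.compile(r"\b(inc|llc|corp)\.?$", re.I)),
    ("non_profit", re.compile(r"\b(foundation|institute)\b", re.I)),
]


def classify_advertiser(name: str) -> Optional[str]:
    """Return an organization type for an advertiser name, or None for manual review."""
    for label, pattern in PATTERNS:
        if pattern.search(name):
            return label
    return None


for name in ["Jane Doe for Congress", "Clean Energy PAC", "Sunny Panels LLC", "Citizens for Good Government"]:
    print(name, "->", classify_advertiser(name))
```

Ordering matters: a committee named "Friends of X PAC Inc" is labeled `pac` because that pattern is checked before the for-profit suffixes.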
The first and easiest analysis we did was to look at the size of the ads. What pops out is that the majority of ads on all of the platforms are between $0 and $100. These are what are normally called micro-targeted ads, typically seen by fewer than 1,000 people. They're very short-lived, narrowly targeted ads honing in on a specific demographic. And it appears that the majority of ads, especially on Facebook's platform, where it's 82% of them, are of this micro-targeted kind. So it confirms what people had reported about the trend toward microtargeting in political advertising. Based on our categorization, we can also look at how the different platforms were used. The problem with these numbers is that each of these databases had different inclusion criteria. Finally, we can look at the different types of advertisers on these platforms. Again, it's hard to read too much into these numbers, because Facebook included much more of the commercial material, so we're going to see a lot more of it there. The final analysis of the entire data set that we did looked at the ramp-up to the election. We cut this off in late October.
This 0:27:35.490,0:27:40.680 analysis was done for a paper, and the due[br]date of the paper was, ironically, November 0:27:40.680,0:27:46.570 6, within here. So we cut it off a few[br]weeks earlier and we haven't regenerated the 0:27:46.570,0:27:49.730 contents since then. The one thing that[br]you can see at the top there is that 0:27:49.730,0:27:57.570 green spike. That's kind of the move ads.[br]So, right, closer to the election the 0:27:57.570,0:28:03.760 campaigns were doing sophisticated[br]get-out-the-vote kinds of ads, within 0:28:03.760,0:28:07.470 here. So there were really sophisticated[br]microtargeted ads to get out the 0:28:07.470,0:28:12.050 vote. It was almost kind of[br]spooky: they knew where the 0:28:12.050,0:28:16.250 person they were targeting lived, and[br]so they gave them directions on how 0:28:16.250,0:28:20.400 to get from where they live to their[br]nearest polling place, within here. So 0:28:20.400,0:28:23.630 there were these really sophisticated[br]get-out-the-vote efforts that were 0:28:23.630,0:28:29.010 being run online, within here, towards the[br]end of the campaign. To give you 0:28:29.010,0:28:33.780 more of an apples-to-apples[br]comparison of these different ad 0:28:33.780,0:28:39.950 platforms, we also did some analysis[br]narrowing each of the different 0:28:39.950,0:28:45.200 advertiser types to the ones that were[br]made transparent by all three platforms, 0:28:45.200,0:28:50.130 which were the federal candidates only.[br]And so this can give you some idea of 0:28:50.130,0:28:56.420 the scale of these things. And we can see[br]that when we narrow it here we can still 0:28:56.420,0:29:00.670 see that Facebook has a lot more[br]advertisers and a lot more ads compared to 0:29:00.670,0:29:06.060 Google. However, the spending numbers are[br]kind of comparable here.
For Facebook, 0:29:06.060,0:29:11.430 impressions and spend are ranges; that's[br]all that Facebook releases. For Google the 0:29:11.430,0:29:16.340 impression data is ranges; however, we can[br]get exact spend data, because Google 0:29:16.340,0:29:21.940 basically released a weekly report of[br]exact spend numbers, aggregated by the 0:29:21.940,0:29:26.770 different advertisers, within here. So we[br]can use that to get an exact number for 0:29:26.770,0:29:30.980 the spend. And again, right, Twitter's[br]numbers are much smaller in terms of 0:29:30.980,0:29:36.590 everything, within here. And we redid some[br]of our analysis to see whether our 0:29:36.590,0:29:42.480 effects were simply a distortion based on[br]what was included in the archives. So 0:29:42.480,0:29:47.730 right, we redid our ad size analysis, and[br]even when we limit it to federal 0:29:47.730,0:29:53.370 candidates we can see this still holds:[br]a lot of the ads on Facebook are 0:29:53.370,0:29:57.060 these microtargeted ads. And there are[br]still microtargeted ads on the other 0:29:57.060,0:30:02.170 platforms, as well, within here. And, right,[br]this microtargeting of course varies 0:30:02.170,0:30:09.520 depending on the advertiser. So you take[br]someone like President Trump and he does a 0:30:09.520,0:30:16.020 lot of microtargeting. So almost all of[br]his ads, probably about 90 to 95% of his ads, 0:30:16.020,0:30:22.020 are microtargeted, within here. You look[br]at other candidates and they do much less 0:30:22.020,0:30:26.380 microtargeting, within here. So different[br]strategies are definitely used 0:30:26.380,0:30:30.330 by different advertisers, within here. But[br]when we look at it in aggregate, it still 0:30:30.330,0:30:37.650 appears that microtargeting is a very[br]popular strategy across advertisers.
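Because Facebook reports spend as a range per ad, totals can only be bounded, and any count of micro-targeted ads has to be made on the ranges themselves. A minimal sketch, assuming each ad carries hypothetical `lower`/`upper` dollar fields and treating an ad as micro-targeted when its whole range sits below $100 (the talk's threshold, applied here as a strict cutoff by assumption). Google's exact weekly spend figures would not need the bounding step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SpendRange:
    """Reported spend for one ad: the true value lies somewhere in [lower, upper] dollars."""
    lower: int
    upper: int


MICRO_CEILING = 100  # dollars; assumed strict cutoff for "micro-targeted"


def total_spend_bounds(ads: List[SpendRange]) -> Tuple[int, int]:
    """Sum per-ad ranges into (minimum possible total, maximum possible total)."""
    return sum(a.lower for a in ads), sum(a.upper for a in ads)


def micro_share(ads: List[SpendRange]) -> float:
    """Fraction of ads whose entire reported range is below MICRO_CEILING."""
    if not ads:
        return 0.0
    return sum(1 for a in ads if a.upper < MICRO_CEILING) / len(ads)


if __name__ == "__main__":
    ads = [SpendRange(0, 99)] * 3 + [SpendRange(100, 499), SpendRange(1000, 4999)]
    print(total_spend_bounds(ads))  # (1100, 5795)
    print(micro_share(ads))         # 0.6
```

Reporting the lower-bound total is the conservative choice, and it is also the choice that makes platform spend look low.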
We 0:30:37.650,0:30:43.910 can also, right, look at spend[br]by ad type, and this kind of shows you 0:30:43.910,0:30:49.870 a little bit how the different platforms[br]are used, within here. So Facebook's 0:30:49.870,0:30:54.420 platform looks like it's used a little bit more[br]for informational ads, though it's still used a 0:30:54.420,0:30:59.120 lot for donations, whereas Google's[br]platform is used a lot more for donations 0:30:59.120,0:31:04.470 and a lot less for informational[br]and connect ads, within here. It's 0:31:04.470,0:31:07.770 really hard to read anything into[br]Twitter's data because it's such a small 0:31:07.770,0:31:11.990 set of data. But from the data that we do[br]have, it looks like there's a lot more 0:31:11.990,0:31:18.850 collection of e-mails and things like[br]that, within here. The other analysis that 0:31:18.850,0:31:24.610 we did on the federal candidate ads used[br]the fact that, for Facebook in particular, 0:31:24.610,0:31:30.270 right, we have the geographic impression[br]data. So we can effectively look 0:31:30.270,0:31:36.940 at how many states were targeted by each[br]ad from a Facebook advertiser. And the 0:31:36.940,0:31:40.160 interesting thing here is that, right,[br]there was no presidential election. So 0:31:40.160,0:31:43.980 basically all these campaigns were[br]operating in one state. So their 0:31:43.980,0:31:50.190 constituents for all these elections were[br]essentially in one state, within here. And 0:31:50.190,0:31:54.430 so if you look at the inform ads, right,[br]most of those are shown in a very small number of 0:31:54.430,0:31:58.910 states. So the inform ads are mostly being[br]shown to the constituents that are 0:31:58.910,0:32:02.970 actually voting for that candidate.[br]However, if we look at that bottom line, 0:32:02.970,0:32:06.930 the gold line, those are the[br]donation ads.
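The states-per-ad measure can be sketched as follows, assuming each ad's geographic impression data has already been turned into a mapping from state to the fraction of impressions delivered there (this input shape is an assumption, not the archive's actual field layout). A small `min_share` threshold keeps a handful of stray impressions from counting as a targeted state; the example ads are toy values.

```python
from typing import Dict

# Assumed input: state name -> fraction of the ad's impressions delivered there.
ImpressionShares = Dict[str, float]


def states_reached(shares: ImpressionShares, min_share: float = 0.0) -> int:
    """Count states that received more than min_share of the ad's impressions."""
    return sum(1 for share in shares.values() if share > min_share)


def out_of_state_share(shares: ImpressionShares, home_state: str) -> float:
    """Fraction of impressions shown outside the candidate's own state."""
    return sum(share for state, share in shares.items() if state != home_state)


if __name__ == "__main__":
    inform_ad = {"Texas": 0.97, "Oklahoma": 0.03}
    donate_ad = {"Texas": 0.40, "California": 0.30, "New York": 0.20, "Washington": 0.10}
    print(states_reached(inform_ad, min_share=0.05))  # 1
    print(states_reached(donate_ad, min_share=0.05))  # 4
```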
And we can see that they 0:32:06.930,0:32:11.621 were fundraising in many more states[br]outside of their constituency, within 0:32:11.621,0:32:16.990 here. So FiveThirtyEight did an[br]interesting analysis of one particular 0:32:16.990,0:32:22.410 candidate, Beto O'Rourke. He was a[br]candidate for Senate in Texas, which is a 0:32:22.410,0:32:27.740 very conservative state in the U.S., and[br]he did surprisingly well, within here. And 0:32:27.740,0:32:31.960 he embraced online advertising, and[br]online donation seeking was one of the 0:32:31.960,0:32:37.530 cornerstones of his campaign, within here.[br]And so FiveThirtyEight did an analysis of 0:32:37.530,0:32:42.270 his donation records. In the U.S., at the[br]federal level, all donations to candidates 0:32:42.270,0:32:46.480 have to be reported to the Federal[br]Election Commission. So this is all in a 0:32:46.480,0:32:49.410 Federal Election Commission database,[br]and the FiveThirtyEight people did their 0:32:49.410,0:32:53.880 analysis on it. And they confirmed what[br]we saw in the donation ads: he was 0:32:53.880,0:33:01.140 getting about 52% of his donations from[br]Texas and 48% from other states, primarily 0:33:01.140,0:33:08.080 from coastal states that tend to[br]lean more liberal, like New York, 0:33:08.080,0:33:12.900 California, Washington and places like[br]that. That was where he was seeking donations. 0:33:12.900,0:33:19.020 So this appears to be a very effective way[br]of getting small-dollar donations 0:33:19.020,0:33:24.760 throughout the U.S., through[br]this online advertising. The last thing that 0:33:24.760,0:33:30.000 I'm going to talk about is the ad[br]targeting. Facebook didn't directly 0:33:30.000,0:33:35.510 release the ad targeting.
However, we were[br]lucky: ProPublica made a 0:33:35.510,0:33:41.090 browser plugin that people can install in[br]their browser, and that browser plugin 0:33:41.090,0:33:46.800 would identify what it thought were[br]political ads, based on a machine learning 0:33:46.800,0:33:54.460 algorithm. And for the political ads it[br]would upload these to their server along 0:33:54.460,0:33:58.840 with the targeting information. So, for[br]those of you with a Facebook account, if 0:33:58.840,0:34:02.350 you're seeing ads, you can actually click[br]in the upper corner of 0:34:02.350,0:34:09.449 the ad and see why this ad is[br]targeting you, within here. And Facebook 0:34:09.449,0:34:14.919 will tell you a little bit, but not all, of why[br]you were targeted for this particular ad. 0:34:14.919,0:34:18.300 They will essentially show you the two[br]broadest categories of why you were 0:34:18.300,0:34:24.349 targeted for this particular ad, through[br]this feature they've added to their 0:34:24.349,0:34:27.440 platform. And this is actually[br]kind of interesting; this is something 0:34:27.440,0:34:32.559 that, if you're a user of Facebook, I[br]highly recommend that you do. Because I 0:34:32.559,0:34:38.710 started doing it, and it was kind of[br]eye-opening as to the level of targeting that 0:34:38.710,0:34:42.809 was being done in terms of advertising.[br]That's one thing that we've 0:34:42.809,0:34:48.280 definitely learned from this: when[br]you're seeing an ad, oftentimes there's a 0:34:48.280,0:34:53.409 very specific reason as to why you're[br]seeing that particular ad, within here. 0:34:53.409,0:34:57.730 And so we felt that it was very important[br]to understand, as much as we could, this 0:34:57.730,0:35:03.009 targeting that was going on within[br]Facebook's platform.
So ProPublica had 0:35:03.009,0:35:09.539 this browser plug-in and they had this[br]data set that anyone can analyze, within 0:35:09.539,0:35:15.460 here. So if you do have Facebook and[br]you're located within the US, I would 0:35:15.460,0:35:19.150 highly recommend that you install this[br]plug-in, because it helps us to 0:35:19.150,0:35:25.070 understand the political advertising in[br]terms of the targeting, within here. So we 0:35:25.070,0:35:29.690 took ProPublica's data set and we[br]effectively joined it with Facebook's ad 0:35:29.690,0:35:34.640 transparency archive, within here. This[br]required us to scrape Facebook's ad 0:35:34.640,0:35:38.890 archive, because we needed the ad ID and[br]this is something that they don't expose 0:35:38.890,0:35:44.069 through their API, currently, within here.[br]However, they do expose it through their 0:35:44.069,0:35:49.339 user portal, within here. So we scraped[br]their user portal to join the specific ads 0:35:49.339,0:35:55.109 that were in the ProPublica data set to the[br]archive dataset, within here. And we were able 0:35:55.109,0:35:59.829 to join about 75% of the ads.[br]There were a lot of ads that were 0:35:59.829,0:36:05.859 collected by the ProPublica data set that[br]just simply weren't archived by Facebook's 0:36:05.859,0:36:10.210 transparency archive. It misses things,[br]within here. It's imperfect as to how it 0:36:10.210,0:36:13.910 does things. And this would be another[br]interesting analysis to do, to understand 0:36:13.910,0:36:18.210 what Facebook is missing in their ad[br]transparency archive, and this ProPublica 0:36:18.210,0:36:23.200 data set can allow you to somewhat do[br]this, although with the bias of who 0:36:23.200,0:36:29.400 installs the ProPublica plug-in in the[br]first place.
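The join itself is a keyed lookup on the ad ID recovered by scraping. In this sketch the record layouts and field names (`ad_id`, `targeting`, `spend`) are hypothetical; the match rate it returns is the kind of quantity behind the roughly 75% figure mentioned above, and the unmatched IDs are the ads the archive apparently missed.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def join_on_ad_id(
    crowd_ads: List[Record], archive_ads: List[Record]
) -> Tuple[List[Record], List[str], float]:
    """Join crowd-sourced ads (with targeting) to archive ads (with spend) on a shared ad ID.

    Returns the joined records, the IDs with no archive match, and the match rate.
    """
    archive_by_id = {ad["ad_id"]: ad for ad in archive_ads}
    joined: List[Record] = []
    unmatched: List[str] = []
    for ad in crowd_ads:
        match = archive_by_id.get(ad["ad_id"])
        if match is None:
            unmatched.append(ad["ad_id"])
        else:
            joined.append({**match, **ad})  # crowd-sourced fields win on key collisions
    rate = len(joined) / len(crowd_ads) if crowd_ads else 0.0
    return joined, unmatched, rate
```

A dictionary index keeps the join linear in the size of both inputs, which matters once the scraped archive holds hundreds of thousands of ads.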
So we joined these two data 0:36:29.400,0:36:33.609 sets, again with the caveat that the[br]ProPublica data set is, right, 0:36:33.609,0:36:38.279 obviously biased by the set of people that[br]installed it, who are probably not going 0:36:38.279,0:36:42.770 to be a representative set of[br]Facebook users, within here. But 0:36:42.770,0:36:46.230 unfortunately, it's the best thing that we[br]have in terms of a data set that releases 0:36:46.230,0:36:51.769 the targeting information, within here.[br]And so we collapsed it into three different 0:36:51.769,0:36:57.579 categories of targeting, within here.[br]I'll just quickly explain Facebook's ad 0:36:57.579,0:37:02.089 targeting platform for people that don't[br]know about it. So one way to target ads 0:37:02.089,0:37:07.990 is, right, through interests or segments:[br]age segments, gender segments, or 0:37:07.990,0:37:13.329 interests like I showed you before, within[br]here. So this is one way to target ads 0:37:13.329,0:37:19.319 within Facebook's platform. Another way to[br]target ads is by uploading lists of 0:37:19.319,0:37:23.069 information. So you can upload lists of[br]people's phone numbers, people's email 0:37:23.069,0:37:27.910 addresses or their names. And then when[br]you upload this list, Facebook will find 0:37:27.910,0:37:32.440 those profiles within their database, so[br]they'll basically join those emails with 0:37:32.440,0:37:36.979 the emails that users entered for their[br]accounts, and then they'll target these 0:37:36.979,0:37:41.210 people. So they'll create what they call[br]an audience of these people through this 0:37:41.210,0:37:45.020 personally identifiable information, and[br]then they'll target them through this 0:37:45.020,0:37:50.309 method.
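The list-upload matching step can be modeled as a lookup on normalized, hashed identifiers. Facebook's customer-list documentation asks advertisers to normalize identifiers (for example, trimming and lowercasing emails) and hash them with SHA-256 before upload; the matching logic below is only a simplified toy model of that idea, not Facebook's implementation, and the account mapping is hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so equivalent addresses hash identically."""
    return email.strip().lower()


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def match_uploaded_list(uploaded_emails: Iterable[str], accounts: Dict[str, str]) -> Set[str]:
    """Toy custom-audience match.

    accounts: hypothetical mapping of account ID -> email the user registered with.
    Returns the account IDs whose hashed email appears in the uploaded list.
    """
    index = {sha256_hex(normalize_email(email)): acct for acct, email in accounts.items()}
    uploaded_hashes = {sha256_hex(normalize_email(e)) for e in uploaded_emails}
    return {index[h] for h in uploaded_hashes if h in index}
```

Normalizing before hashing matters: `"Bob@Example.com"` and `"bob@example.com"` produce entirely different SHA-256 digests unless both are lowercased first.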
The final major form of[br]targeting that Facebook offers is through 0:37:50.309,0:37:53.969 what they call lookalike audiences.[br]So this is where you can upload PII, 0:37:53.969,0:37:59.349 like email addresses, phone[br]numbers, names. Facebook will link them to 0:37:59.349,0:38:03.650 their accounts and then they'll look at[br]the interests and other things about 0:38:03.650,0:38:06.600 these users, and then they'll find you[br]other users, not these users, but other 0:38:06.600,0:38:11.950 users, that have a similar kind of profile[br]to these users, within here. So these are 0:38:11.950,0:38:17.900 the lookalike audiences that Facebook[br]offers within their platform. And so we 0:38:17.900,0:38:23.349 categorized it by this and again by[br]advertiser type, within here. So the thing 0:38:23.349,0:38:29.500 that stands out, right, is that the[br]for-profit companies are doing a lot of 0:38:29.500,0:38:33.559 targeting based on interests and segments.[br]So they probably don't know who the 0:38:33.559,0:38:38.819 people that they want to message are, and[br]they're doing it mostly by interests and 0:38:38.819,0:38:44.549 segments. Whereas when you look at the PACs[br]and the political candidates, they have 0:38:44.549,0:38:49.270 lists. So they have a lot of lists of[br]people's email addresses, phone 0:38:49.270,0:38:53.579 numbers, names, and things like this. And[br]they're plugging these into Facebook's 0:38:53.579,0:38:59.159 system. And this is how they're targeting[br]a lot of people, within here: through 0:38:59.159,0:39:04.440 these lists. And this was expected, but[br]it's interesting to quantify how 0:39:04.440,0:39:10.069 much of this is happening. And then the[br]lookalike audiences are also being used a 0:39:10.069,0:39:14.000 good deal by everyone, within here. And[br]this kind of makes sense, right?
Because 0:39:14.000,0:39:17.290 if you have a list of people, then you[br]advertise to them, but then, right, you have 0:39:17.290,0:39:23.210 this lookalike audience of people that are[br]similar to them that are also perhaps good 0:39:23.210,0:39:32.640 people to advertise to, as well, within[br]here. The other thing we can do is break 0:39:32.640,0:39:37.670 this down by the intent of the ad here,[br]and this shows even more 0:39:37.670,0:39:43.520 starkly the difference in behavior[br]between the commercial people and the 0:39:43.520,0:39:47.250 noncommercial people. The commercial[br]people are targeting mostly based on 0:39:47.250,0:39:52.640 interest, whereas the other people that[br]are, say, looking to connect with people, 0:39:52.640,0:39:55.789 they're the ones that are using the most[br]lookalike audiences. And this makes 0:39:55.789,0:39:59.359 perfect sense, because, right, the connection[br]ads are there to get people's e-mail 0:39:59.359,0:40:02.930 addresses, phone numbers, names and things[br]like that. So when you use the lookalike 0:40:02.930,0:40:08.220 audiences you can, right, generate[br]more lists of people who'll convert for 0:40:08.220,0:40:13.249 whatever you want, and then you can[br]retarget them with list-targeted 0:40:13.249,0:40:18.809 ads later on. So this all makes[br]pretty good sense when you look at how 0:40:18.809,0:40:23.310 this is behaving, from here. But again[br]it's interesting, right, to make this 0:40:23.310,0:40:28.900 transparent so people understand how[br]targeting is happening within the U.S. 0:40:28.900,0:40:33.799 political advertising sphere, within here.[br]So these were pretty much the two major 0:40:33.799,0:40:38.789 analyses that we did in terms of[br]targeting, within here.
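Facebook has not published how lookalike audiences are computed, so the following is only a toy stand-in for the idea: represent each user as a weighted set of interests, average the seed list into a centroid, and rank everyone else by cosine similarity to it. The profile representation, the similarity measure, and all names are assumptions for illustration.

```python
import math
from typing import Dict, List, Set

Profile = Dict[str, float]  # interest -> weight (hypothetical representation)


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse interest vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(seed_profiles: List[Profile]) -> Profile:
    """Average the seed users' interest weights."""
    center: Profile = {}
    for profile in seed_profiles:
        for interest, weight in profile.items():
            center[interest] = center.get(interest, 0.0) + weight / len(seed_profiles)
    return center


def lookalike(seed_ids: Set[str], profiles: Dict[str, Profile], size: int) -> List[str]:
    """Rank users outside the seed list by similarity to the seed centroid; keep the top `size`."""
    center = centroid([profiles[u] for u in seed_ids])
    candidates = [u for u in profiles if u not in seed_ids]
    candidates.sort(key=lambda u: cosine(profiles[u], center), reverse=True)
    return candidates[:size]
```

The seed users themselves are excluded, matching the talk's description of finding "other users, not these users" with a similar profile.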
The final part, and 0:40:38.789,0:40:43.989 the part that makes the juiciest[br]stories, is the more dubious 0:40:43.989,0:40:48.259 advertisers that are advertising on[br]these platforms in terms of political 0:40:48.259,0:40:53.340 advertising. So we more[br]politely call these "new types of 0:40:53.340,0:40:57.951 advertising", within here. The first type[br]is one that you would pretty 0:40:57.951,0:41:02.910 much expect: this corporate[br]astroturfing kind of stuff that's going 0:41:02.910,0:41:09.290 on, within here. We see these ads for[br]Citizens for Tobacco Rights. And I 0:41:09.290,0:41:13.089 pretty much expected that if you looked up this[br]group it would probably be some 0:41:13.089,0:41:17.910 quasi-nonprofit that's supported[br]by some industry money from the tobacco 0:41:17.910,0:41:21.510 lobbyists, or something like that. That's[br]pretty much what I expected to see when I 0:41:21.510,0:41:28.770 saw these ads. You go to this website and[br]it's actually pretty honest as to what it 0:41:28.770,0:41:32.809 does. This is probably because, right, of[br]all the lawsuits and regulations around 0:41:32.809,0:41:36.759 tobacco advertising in the U.S. But[br]the website clearly states, right, that 0:41:36.759,0:41:43.130 it's operated by Philip Morris, the[br]tobacco company, within here. And this 0:41:43.130,0:41:46.530 Citizens for Tobacco Rights actually isn't[br]a legal entity. It's just 0:41:46.530,0:41:50.970 simply a website that's been stood up,[br]that's owned and operated by Philip 0:41:50.970,0:41:55.390 Morris, as far as we can tell, within[br]here. And this gets to a big problem with 0:41:55.390,0:42:00.819 Facebook's transparency archive, which is[br]that they don't actually vet the 0:42:00.819,0:42:05.140 sponsor's disclaimer string, within[br]here.
So pretty much anyone can type 0:42:05.140,0:42:11.880 anything that they want within that[br]disclaimer string and Facebook will allow 0:42:11.880,0:42:15.730 you to run it. We've tested it, and as far[br]as we can tell, you can't say that you're 0:42:15.730,0:42:19.620 from Facebook or Instagram, or that you're[br]Mark Zuckerberg; they'll block that. But 0:42:19.620,0:42:24.011 pretty much anything else that you type in[br]there, they'll allow that ad to run, within 0:42:24.011,0:42:30.440 here, with no vetting. So we discovered[br]this and we politely, privately mentioned it 0:42:30.440,0:42:36.119 to Facebook. Some reporters also trolled[br]Facebook, within here: there 0:42:36.119,0:42:41.421 was a reporter that trolled Facebook and[br]ran ads in the names of all the senators, within 0:42:41.421,0:42:45.440 here, on Facebook. And of course Facebook[br]approved them all, within here, and 0:42:45.440,0:42:49.849 they did some other things to troll[br]Facebook, where they inserted some other 0:42:49.849,0:42:54.799 advertisements, within here. But the point[br]is that that disclaimer string is not 0:42:54.799,0:43:00.500 vetted, within here. Google actually handles[br]that disclaimer string differently: 0:43:00.500,0:43:05.079 they require either a tax ID number or a[br]Federal Election Commission ID number, and 0:43:05.079,0:43:11.539 they actually do vet it, and they publish[br]that tax ID number or FEC 0:43:11.539,0:43:14.590 ID number along with the disclaimer[br]string, within here. Which makes it really 0:43:14.590,0:43:18.759 easy to track down advertising on Google.[br]On Facebook, because, right, advertisers can 0:43:18.759,0:43:22.339 basically type in whatever they want in[br]the disclaimer string, it makes it much 0:43:22.339,0:43:25.930 more difficult to actually link these[br]advertisers.
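Google's requirement of a tax ID or FEC ID gives a machine-checkable anchor that a free-text disclaimer lacks. A cheap first filter is a format check, sketched below under the assumption that employer tax IDs (EINs) look like `12-3456789` and FEC committee IDs look like `C00123456`. A format check cannot confirm that the ID belongs to the named sponsor; that would need a lookup against IRS or FEC records, which this sketch does not do.

```python
import re

# Assumed formats: an EIN is nine digits, usually written 12-3456789;
# an FEC committee ID is the letter C followed by eight digits.
EIN_PATTERN = re.compile(r"^\d{2}-?\d{7}$")
FEC_COMMITTEE_PATTERN = re.compile(r"^C\d{8}$")


def has_plausible_sponsor_id(sponsor_id: str) -> bool:
    """Format check only: it says nothing about whether the ID belongs to the named sponsor."""
    candidate = sponsor_id.strip().upper()
    return bool(EIN_PATTERN.match(candidate) or FEC_COMMITTEE_PATTERN.match(candidate))
```

A disclaimer that fails even this check is a cheap signal for manual review; one that passes still needs the registry lookup.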
Sometimes it's just outright 0:43:25.930,0:43:33.910 impossible, if the disclaimer string is[br]made up or just too garbled in some way 0:43:33.910,0:43:39.989 or form, within here. So this is[br]definitely a problem, where we have these 0:43:39.989,0:43:43.530 lobbyist organizations, or in this case[br]not even lobbyist organizations, just 0:43:43.530,0:43:49.519 industry, that can effectively lie about[br]who's paying for an ad on Facebook's 0:43:49.519,0:43:55.849 platform. We also found[br]what are now being called these 0:43:55.849,0:44:03.029 junk media outlets. These are for-profit[br]outlets that claim that they're 0:44:03.029,0:44:08.410 doing news operations. But, right,[br]it's not really traditional 0:44:08.410,0:44:12.780 journalistic reporting. It's more[br]just propaganda messaging, within 0:44:12.780,0:44:18.769 here. So there is this group called New[br]American Media Group LLC. They also ran under 0:44:18.769,0:44:24.641 the name of New Democracy, or sorry,[br]Democracy Now was their other name, within 0:44:24.641,0:44:30.529 here. And so they ran these ads, within here.[br]We tracked down these LLCs and they were 0:44:30.529,0:44:35.750 just simply shell companies, and that[br]led to nowhere, within here. We worked 0:44:35.750,0:44:41.500 with a journalist from The Atlantic who[br]actually did a lot of digging into the 0:44:41.500,0:44:48.650 shell companies. And he was able to,[br]through his investigation, link 0:44:48.650,0:44:53.640 these companies to the actual entity that[br]created these shell companies and was 0:44:53.640,0:45:00.579 running these ads, within here. And so[br]when we did our analysis of this, this 0:45:00.579,0:45:07.220 company, basically a third-party[br]advertising company, was creating these. 0:45:07.220,0:45:12.710 They're meant to look like[br]grassroots organizations.
There 0:45:12.710,0:45:17.170 were a lot of them that were targeted[br]at more conservative-leaning groups, but 0:45:17.170,0:45:22.480 then they would bombard them with liberal[br]messaging, within here. So they would 0:45:22.480,0:45:26.169 create these fake communities that looked[br]more conservative. And then once they 0:45:26.169,0:45:30.390 attracted an audience they would bombard[br]them with these liberal kinds of 0:45:30.390,0:45:35.589 messages, within here. And so this[br]particular company is based in Colorado. 0:45:35.589,0:45:40.849 It's called MOTIVE AI. Apparently, it's[br]hoping to become the Cambridge Analytica 0:45:40.849,0:45:48.510 of the liberal side. I don't know if[br]that's something to aspire to or not. Some 0:45:48.510,0:45:52.450 other journalists also did some digging,[br]within here. There were some journalists 0:45:52.450,0:45:57.269 from ProPublica that did some digging,[br]within here. They found more of this 0:45:57.269,0:46:03.020 astroturfing by political lobbyist groups[br]and things like that. Big oil, insurance 0:46:03.020,0:46:08.549 companies: when they advertised on,[br]say, Google's platform, they would be honest 0:46:08.549,0:46:11.809 about their disclaimer string, and then[br]when they advertised on Facebook's 0:46:11.809,0:46:16.319 platform they would often[br]obfuscate their disclaimer string, to 0:46:16.319,0:46:23.430 make it more difficult to link them[br]together. And so they unmasked a whole 0:46:23.430,0:46:27.869 bunch of these other kinds of junk media[br]operations, as well, that were 0:46:27.869,0:46:33.499 spreading propaganda, within here. I'm[br]picking on Facebook a lot. Again, Google 0:46:33.499,0:46:38.400 does vet the tax ID number of these[br]advertisers, but you see something like, right, 0:46:38.400,0:46:44.560 this DIGICO LLC that paid for some ads.
So[br]you track this down, and this is again one 0:46:44.560,0:46:48.051 of these third-party advertising agencies.[br]It's easy to track down because of the tax 0:46:48.051,0:46:52.030 ID number. But it still doesn't actually[br]tell you who paid for the ad. It just 0:46:52.030,0:46:56.000 tells you the third party that, right, was[br]presumably paid by someone 0:46:56.000,0:47:00.210 else to run these ads. So this[br]is a big problem with these disclaimer 0:47:00.210,0:47:04.170 strings: oftentimes they don't[br]actually identify the person that's paying 0:47:04.170,0:47:10.849 for the ad. So to wrap this up,[br]within here, after our experiences 0:47:10.849,0:47:15.150 looking at these transparency archives, I[br]would say they're fairly adequate to 0:47:15.150,0:47:20.339 understand good actors. So we could fairly[br]well understand how good political 0:47:20.339,0:47:24.089 advertisers were behaving on Facebook's[br]platform. However, right, for the bad 0:47:24.089,0:47:28.829 advertisers, we probably missed a lot of[br]them, because they could simply type 0:47:28.829,0:47:33.119 in lots of different disclaimer strings[br]and easily avoid our analysis, at this 0:47:33.119,0:47:39.490 point. None of these current archives have[br]it just right yet. All of them have 0:47:39.490,0:47:44.190 issues, right. Facebook isn't providing[br]good access to their data. They're not 0:47:44.190,0:47:49.069 releasing targeting information. Google is[br]missing 30% of the content because of third 0:47:49.069,0:47:56.499 parties using their advertising system.[br]They're not releasing spend and impression 0:47:56.499,0:48:01.950 information based on demographics, within[br]there. Twitter simply hasn't hired 0:48:01.950,0:48:08.060 someone to enforce the policy of[br]transparency well, within here.
And 0:48:08.060,0:48:12.880 unfortunately, our experience throughout[br]this process has been that these companies 0:48:12.880,0:48:17.569 are oftentimes reactive instead of[br]proactive, within here. Which means that, 0:48:17.569,0:48:20.660 right, we have to continuously put[br]pressure on them in order for them to 0:48:20.660,0:48:26.000 improve these archives, within[br]here. So this is unfortunately kind of the 0:48:26.000,0:48:30.390 state that we're in, within here. And[br]one thing that I really want to give 0:48:30.390,0:48:33.430 a shoutout to is, right, there are people at[br]these companies that are actually trying 0:48:33.430,0:48:38.739 to build these transparency archives. And[br]I want to give them a lot of credit for 0:48:38.739,0:48:43.589 taking on this task, which is probably not[br]well rewarded within their companies, of 0:48:43.589,0:48:47.420 building these transparency archives,[br]within here. And so my hope is that by 0:48:47.420,0:48:52.690 applying pressure we can get them more[br]support, to get more resources and 0:48:52.690,0:48:58.891 be able to make things more transparent, within[br]their companies, as well. Because I hope 0:48:58.891,0:49:04.960 that, right, this puts us in better shape[br]to understand the 2018 elections, but 2020 0:49:04.960,0:49:09.589 is another presidential election, and my[br]hope is that we'll continue to improve 0:49:09.589,0:49:13.920 these archives, so that we'll be in a much[br]better position to understand both the 0:49:13.920,0:49:19.839 good and the bad advertisers by 2020, within[br]here. However, this is probably going to take 0:49:19.839,0:49:25.299 regulatory pressure, legal[br]pressure, pressure by technologists and 0:49:25.299,0:49:31.670 things like this to improve these[br]archives, at this point.
So with that, 0:49:31.670,0:49:35.521 again, I have my collaborators, who[br]aren't here on the stage, but they 0:49:35.521,0:49:40.809 definitely did a lot of the heavy lifting[br]to make this happen, within here. And 0:49:40.809,0:49:45.340 again, all of our tools and most of our[br]data, except for the Facebook data that's 0:49:45.340,0:49:51.262 under NDA, are available through our[br]GitHub. And so with that I will 0:49:51.262,0:49:52.576 open it up to questions.[br] 0:49:52.576,0:50:03.836 applause 0:50:03.836,0:50:04.970 Herald: Thank you so much, Damon. I know 0:50:04.970,0:50:11.530 that there are a few questions among the[br]audience. So, microphone 6 please. 0:50:11.530,0:50:16.480 Question: So [Name] on the IRC is asking,[br]"Have you looked at links between the 0:50:16.480,0:50:21.620 advertisers, and do they use the same[br]images or text, for instance?" 0:50:21.620,0:50:24.200 Answer: This is a really good question.[br]This is actually one of the analyses that 0:50:24.200,0:50:27.539 we're currently doing. So we're starting[br]with the text, because that's obviously 0:50:27.539,0:50:32.360 the easiest. But we're also exploring some[br]image clustering algorithms as well, to 0:50:32.360,0:50:37.099 cluster the advertisers across platforms[br]and also within platforms, because we're 0:50:37.099,0:50:40.299 finding a lot of cases where, you know, they create[br]multiple shell companies, where they just 0:50:40.299,0:50:44.089 lie about their disclaimers. And so this is[br]definitely something that we're focusing 0:50:44.089,0:50:49.049 on: better clustering of the[br]advertisers. Because, like, that group 0:50:49.049,0:50:53.180 MOTIVE AI, even though they created[br]different LLCs, they were running the same 0:50:53.180,0:50:59.180 images and videos across their different[br]LLC shell companies. 0:50:59.180,0:51:02.090 Herald: Great, thank you. If you[br]have any questions, please queue up by the 0:51:02.090,0:51:07.820 microphones.
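The clustering described in this answer can start from something much simpler than image-similarity models: advertisers who run byte-identical creatives are linked, and a union-find structure merges those links into clusters. This sketch only catches exact duplicates (re-encoded or cropped images would need perceptual hashing, and reworded text would need text similarity), and the `(advertiser, creative bytes)` input format is an assumption.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class UnionFind:
    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_advertisers(ads: Iterable[Tuple[str, bytes]]) -> List[Set[str]]:
    """Group advertisers that ran at least one byte-identical creative (image, video, or text)."""
    uf = UnionFind()
    first_seen: Dict[str, str] = {}  # creative digest -> first advertiser seen using it
    for advertiser, creative in ads:
        uf.find(advertiser)  # register advertisers even if they share nothing
        digest = hashlib.sha256(creative).hexdigest()
        if digest in first_seen:
            uf.union(first_seen[digest], advertiser)
        else:
            first_seen[digest] = advertiser
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for advertiser in list(uf.parent):
        clusters[uf.find(advertiser)].add(advertiser)
    return sorted(clusters.values(), key=sorted)
```

Union-find makes the grouping transitive: if shell company A shares a video with B, and B shares an image with C, all three land in one cluster even though A and C never ran the same creative.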
Microphone number 1 please.[br]Question: Hi, Oliver Moldenhauer? Thanks a 0:51:07.820,0:51:13.390 lot for the talk. Definitely one of the[br]best I've seen here so far. Two questions. 0:51:13.390,0:51:18.970 A: Why do those transparency archives[br]exist? Was there some law or political 0:51:18.970,0:51:24.959 process around that? And B: As we are[br]nearing the European election next year, 0:51:24.959,0:51:30.680 what kind of data is available for Europe?[br]Answer: Those are both good questions. 0:51:30.680,0:51:33.979 Again, I'm not internal to one of these[br]companies, so I can only speculate as to 0:51:33.979,0:51:38.279 why these transparency archives exist. But[br]my guess is, right, that this was 0:51:38.279,0:51:43.839 reactionary. So Mark Zuckerberg and high-ranking[br]officials from Twitter and Google 0:51:43.839,0:51:49.920 were hauled in to testify in the House and[br]Senate, and this is them trying to self-regulate 0:51:49.920,0:51:53.989 instead of having regulation[br]imposed on them. So again, 0:51:53.989,0:51:58.420 this goes to the pressure part: there[br]was regulatory pressure put on them, 0:51:58.420,0:52:03.089 the threat of regulation, and so[br]that's what made them do these 0:52:03.089,0:52:09.150 transparency archives. In terms of what's[br]available in Europe: I guess as long as 0:52:09.150,0:52:17.390 the UK is still in the EU, kind of[br]teetering, Facebook has started to make ads 0:52:17.390,0:52:22.599 transparent in the UK. They also make them[br]transparent in Brazil and they're going to 0:52:22.599,0:52:25.859 make them transparent in India. And I[br]think they have plans to make them 0:52:25.859,0:52:30.650 transparent in other places in the EU as[br]well. However, they haven't done that yet. 0:52:30.650,0:52:35.570 And again, this goes back to the[br]pressure part.
So there's no API for the 0:52:35.570,0:52:40.029 other countries; there's only an API for[br]the US, and that might be because we put 0:52:40.029,0:52:45.640 pressure on them by scraping them and[br]publicly releasing their data. And, right, 0:52:45.640,0:52:48.700 there are no transparency reports for other[br]countries, either; there's only a 0:52:48.700,0:52:52.260 transparency report for the US. And again,[br]that might have been because we applied 0:52:52.260,0:52:56.300 pressure and we were publishing numbers.[br]Some of the numbers in terms of spend were 0:52:56.300,0:52:59.710 very low, because, right, they were just[br]giving us ranges. So we might have been 0:52:59.710,0:53:03.700 making them look bad when we took the[br]bottom of the range of their spend, and they might 0:53:03.700,0:53:08.309 have wanted to correct that with their own[br]transparency archive, as well. So again, a 0:53:08.309,0:53:11.960 lot of this unfortunately requires[br]pressure to get them to improve their 0:53:11.960,0:53:16.499 transparency efforts.[br]Herald: Great, thank you. Microphone number 0:53:16.499,0:53:20.630 two please.[br]Question: So you mentioned 0:53:20.630,0:53:25.900 FiveThirtyEight and their work on the[br]donations. Do you think it makes sense 0:53:25.900,0:53:33.047 to combine the data you gathered with what[br]they have to look at election outcomes, 0:53:33.047,0:53:37.660 like election results and turnout and[br]stuff like that? 0:53:37.660,0:53:43.440 Answer: Yes. Actually, this is the number[br]one project on our roadmap right now. 0:53:43.440,0:53:49.940 Google has processed the FEC[br]information and they've made this 0:53:49.940,0:53:55.631 information available via their BigQuery[br]database.
So we've downloaded this, we've 0:53:55.631,0:54:01.809 manually linked the Facebook advertisers[br]and the Google advertisers to the FEC data, 0:54:01.809,0:54:06.999 and now we're doing the regression models,[br]specifically focused on the donation ads 0:54:06.999,0:54:10.760 first, because those are what get reported[br]to the FEC at this point. So we are 0:54:10.760,0:54:14.910 essentially trying to understand how[br]effective these donation ads are at 0:54:14.910,0:54:21.010 actually driving donations.[br]Herald: Thank you. Microphone number 4 0:54:21.010,0:54:24.740 please.[br]Question: Hi. First of all, thank you Mr. 0:54:24.740,0:54:30.420 McCoy and your team for this very[br]interesting research. I was wondering 0:54:30.420,0:54:35.499 whether you know if there is any follow-up[br]research being conducted by political 0:54:35.499,0:54:42.350 scientists, sociologists, etc. analyzing[br]the political repercussions of these ad 0:54:42.350,0:54:47.409 campaigns.[br]Answer: Yes, so we're aware of a few 0:54:47.409,0:54:50.420 efforts. I don't want to out the teams[br]that are doing them, in case they don't 0:54:50.420,0:54:55.549 want to be outed. There's nothing[br]that's been published publicly on this, I believe. 0:54:55.549,0:54:59.880 But we're definitely trying to enable that.[br]That's one of the main goals of 0:54:59.880,0:55:04.299 our overarching online political[br]advertising transparency effort: to try 0:55:04.299,0:55:09.061 and get as much data as we can into the[br]hands of less technical people, in an easy 0:55:09.061,0:55:15.989 way for them to analyze. This is[br]basically the primary goal of our project. 0:55:15.989,0:55:20.130 So we've been working as hard as[br]we can to get political scientists up 0:55:20.130,0:55:25.349 to speed on the data.
And this is why it's[br]really unfortunate that Facebook has its 0:55:25.349,0:55:30.239 NDA in place for their particular data,[br]because this makes it very difficult for 0:55:30.239,0:55:35.099 us to share and collaborate on that[br]particular data. This unfortunately puts pressure on us, 0:55:35.099,0:55:40.109 as the only ones who[br]can do some of this analysis right now. So 0:55:40.109,0:55:44.369 this is why I would love to apply[br]enough pressure on Facebook to get better 0:55:44.369,0:55:49.490 access to their particular data.[br]Herald: Yes. And a question from the 0:55:49.490,0:55:53.960 Internet, please.[br]Signal Angel: So Nomad is asking, "Why are 0:55:53.960,0:55:59.099 those advertisements considered political or[br]election interference in the USA? Can't you just 0:55:59.099,0:56:03.930 see that someone paid money to display[br]that content and conclude its purpose is 0:56:03.930,0:56:11.549 to promote an agenda or manipulate them?"[br]Answer: This is a good question. Right, a 0:56:11.549,0:56:15.210 lot of this goes to the tactics that[br]they're using here. So again, they're 0:56:15.210,0:56:19.040 creating these communities that they're[br]making look like grassroots 0:56:19.040,0:56:24.479 communities, and then they're kind of[br]sucking people in with these ads that, up 0:56:24.479,0:56:28.630 until recently, had no disclaimer strings[br]on them. So you had no idea who paid for 0:56:28.630,0:56:33.820 them. They appeared to be paid for by[br]these grassroots organizations. So 0:56:33.820,0:56:39.221 you felt like you were, kind of, part of a[br]grassroots movement by joining these kinds 0:56:39.221,0:56:43.029 of communities. I think these are the really[br]scary, subtle things. And you 0:56:43.029,0:56:46.520 might not realize why you're being[br]targeted for these particular ads or who 0:56:46.520,0:56:50.269 was behind these particular ads.
So, I[br]think it was really easy for people to 0:56:50.269,0:56:55.339 get unwittingly duped[br]into joining what looked like these 0:56:55.339,0:56:59.690 grassroots campaigns. That's why I[br]think improving these disclaimer strings 0:56:59.690,0:57:03.319 and showing who is really behind these[br]communities and these advertisements is 0:57:03.319,0:57:09.180 really important, to dispel this notion of[br]these fake grassroots communities that 0:57:09.180,0:57:12.960 are luring people in. So I[br]think that's one of the big things that 0:57:12.960,0:57:18.029 can be gained from these transparency[br]archives. But that requires improving 0:57:18.029,0:57:22.680 the transparency archives.[br]Herald: Microphone number 3 please. 0:57:22.680,0:57:28.430 Question: Yes. So I'm curious about the[br]efficacy of some of the advertisements 0:57:28.430,0:57:36.579 that are on Facebook and Twitter. And I'm[br]wondering: is any group, like ProPublica with its 0:57:36.579,0:57:45.279 web extension, checking the engagement[br]rate? Like the number of comments, the 0:57:45.279,0:57:53.119 number of views, and the number of shares,[br]to kind of get an estimate of, OK, 0:57:53.119,0:57:58.440 this big grassroots community is building[br]up a number of followers, and the size of these 0:57:58.440,0:58:05.520 follower populations and whatnot.[br]Answer: Yeah, this is again a really good 0:58:05.520,0:58:11.030 question. This is something that we are looking into, and I[br]would certainly encourage other people to 0:58:11.030,0:58:14.200 explore it as well. The problem is[br]that a lot of that information isn't 0:58:14.200,0:58:18.180 exposed by the transparency archives. This[br]is more of what they call the 0:58:18.180,0:58:22.859 organic information, the non-paid-for[br]information. And so this is 0:58:22.859,0:58:27.870 stuff that none of the platforms are[br]releasing.
And so it requires a 0:58:27.870,0:58:32.509 scraping operation, essentially, to gather[br]and collect this information. And it's 0:58:32.509,0:58:36.230 something that we're definitely thinking[br]about: how to 0:58:36.230,0:58:40.619 efficiently scrape and collect this[br]information. This is very hard 0:58:40.619,0:58:43.420 because, right, you go up against the anti-scraping[br]teams of these companies, which 0:58:43.420,0:58:46.740 are well resourced. And this requires[br]accounts, and these accounts are going to 0:58:46.740,0:58:51.339 be detected and shut down. So this is[br]something that we're trying to pilot, to 0:58:51.339,0:58:55.609 understand it better. Our other idea of how to do[br]this is to try and crowdsource 0:58:55.609,0:59:00.260 this information, similar to how[br]ProPublica crowdsourced it for the browser 0:59:00.260,0:59:03.950 extension information. We could[br]potentially crowdsource it, where, you 0:59:03.950,0:59:08.109 know, when people interact with these[br]communities or these ads, the plug-in could 0:59:08.109,0:59:11.770 potentially send that information[br]back to us. And then we would have to 0:59:11.770,0:59:17.700 figure out some strategy to sanitize that[br]information in some way, because at that 0:59:17.700,0:59:20.859 point we might be collecting some sensitive[br]information. This is 0:59:20.859,0:59:26.390 something that we're thinking about. We're[br]cautious, rightly so I think, because this 0:59:26.390,0:59:31.160 can start stepping on, again, more[br]sensitive information. 0:59:31.160,0:59:33.859 But I think it's[br]definitely key to understanding the 0:59:33.859,0:59:37.380 effectiveness of these ads.
It's something that[br]we're going to have to do, or we're going 0:59:37.380,0:59:42.339 to have to convince Facebook somehow to do[br]on our behalf, in order to really 0:59:42.339,0:59:46.429 understand the effectiveness of these ads.[br]Herald: Thank you. Last question for 0:59:46.429,0:59:49.579 microphone number 1.[br]Question: All right. At the beginning of 0:59:49.579,0:59:54.340 your talk you explained how Russia[br]influenced the elections. I'm curious 0:59:54.340,0:59:59.529 about the attribution. Is there possibly[br]any doubt, in any instance that you 0:59:59.529,1:00:06.359 presented, that it was Russia and not[br]some other country, maybe China or Iran? How do 1:00:06.359,1:00:09.720 you know, and did you check the facts?[br]Answer: I mean, that's a good question. 1:00:09.720,1:00:14.430 Unfortunately, right, the national[br]security agencies don't release the 1:00:14.430,1:00:20.849 sources of their information. There's[br]another investigation by the 1:00:20.849,1:00:27.229 Department of Justice, led by Robert Mueller,[br]that did release some more information 1:00:27.229,1:00:31.749 about this. I've looked at[br]that information and, you know, 1:00:31.749,1:00:36.449 right, you can never 100% unequivocally[br]state that it was Russia. It could have 1:00:36.449,1:00:41.009 been a false flag operation. But I think[br]pretty much the overwhelming body of 1:00:41.009,1:00:45.349 evidence that everyone has found when[br]they've investigated this has pointed at 1:00:45.349,1:00:52.880 Russia and the organizations that were[br]prosecuted by Mueller. 1:00:52.880,1:00:56.920 Herald: Damon McCoy, thank you very much.[br]Please give him a great round of applause. 1:00:56.920,1:00:59.540 Applause 1:00:59.540,1:01:04.970 35c3 postroll music 1:01:04.970,1:01:22.000 subtitles created by c3subtitles.de[br]in the year 2019. Join, and help us!