35C3 - Explaining Online US Political Advertising

  • 0:00 - 0:19
    35c3 preroll music
  • 0:19 - 0:24
    Herald: This talk will be held by Damon
    McCoy. He will be explaining online U.S.
  • 0:24 - 0:28
    political advertising and he has been
    working with researching like how
  • 0:28 - 0:34
    different online communities basically
    behave around many different topics. But
  • 0:34 - 0:37
    this is what he's going to talk about
    today so please give him a great round of
  • 0:37 - 0:47
    applause.
    Applause
  • 0:47 - 0:51
    Damon McCoy: Thank you everyone for coming. I'm
    up here speaking and I'm the only one that
  • 0:51 - 0:55
    wanted to fly to Germany over Christmas
    and New Year's. However there were three
  • 0:55 - 1:00
    real people that were really key in
    helping out with this research and before
  • 1:00 - 1:05
    we get started, I just want to credit
    them. One is my grad student Laura
  • 1:05 - 1:08
    Edelson. She did a lot of the analysis
    that you're going to be seeing, generated
  • 1:08 - 1:15
    all the graphs. One of the undergraduate
    students that's from our NYU Shanghai
  • 1:15 - 1:20
    campus, Shikhar, did a lot of the work to
    collect all the data that you're going to
  • 1:20 - 1:25
    see here. And then Ratan Dey, who is a
    professor at NYU at the Shanghai campus,
  • 1:25 - 1:31
    also helped out with kind of our initial
    efforts of collecting some of this data.
  • 1:31 - 1:35
    And so before we get started I guess I'll
    give a little bit of an introduction about
  • 1:35 - 1:40
    myself. I'm a professor at NYU Tandon
    School of Engineering. As was mentioned
  • 1:40 - 1:45
    before I do a lot of stuff kind of looking
    at how technology kind of impacts the
  • 1:45 - 1:51
    security and privacy of, you know, society,
    groups of people and things like that. So
  • 1:51 - 1:57
    this was really kind of an opportunistic
    project that kind of captured the impact
  • 1:57 - 2:07
    of online advertising in the political
    sphere of U.S. campaigns. And also quick
  • 2:07 - 2:13
    plug. So everything that I'm going to be
    showing you most of the data scripts and
  • 2:13 - 2:17
    things like that we've put in a github
    that's accessible by anyone that wants to
  • 2:17 - 2:23
    analyze the data or look at our scripts
    and improve them or things like that.
  • 2:23 - 2:31
    Applause Thank you. This is the first
    time that I've given this talk outside of
  • 2:31 - 2:36
    the U.S. so let me just start with some
    quick explanation as to how U.S. elections
  • 2:36 - 2:41
    work for those of you that might not know
    about this. So every two years in the U.S.
  • 2:41 - 2:46
    we hold federal elections. These are
    elections - right - that impact all of the
  • 2:46 - 2:53
    states within the U.S. And so every four
    years we have an election for president,
  • 2:53 - 2:59
    2018 were our last elections. This was not
    a presidential election year so the
  • 2:59 - 3:05
    elections were for the Senate and the
    House seats at the federal level. And then
  • 3:05 - 3:10
    we also had elections for state and local
    positions as well within here. And some
  • 3:10 - 3:15
    of them are captured in our data,
    especially our Facebook data not so much
  • 3:15 - 3:21
    in our Twitter and Google ad transparency
    data that we have here. So this will be
  • 3:21 - 3:27
    focused, this talk, on the 2018 elections
    that happened on November 7th. Election
  • 3:27 - 3:35
    day is always the first Tuesday in
    November in the U.S. every two years. So
  • 3:35 - 3:38
    to begin with in the background right
    probably some of you know about this, some
  • 3:38 - 3:42
    of you might not know about this, but in
    the 2016 elections which were a
  • 3:42 - 3:47
    presidential election year there was right
    this election interference that happened
  • 3:47 - 3:53
    here. And so Facebook has released these
    ads. These ads were paid for by a Russian
  • 3:53 - 4:01
    company the Internet Research Agency that
    ran these ads. And Facebook released these
  • 4:01 - 4:05
    to - right - the Senate and then the
    Senate re-released these publicly to
  • 4:05 - 4:10
    people. And so this is an ad basically
    trying to disenfranchise people from
  • 4:10 - 4:15
    voting in the elections. And you can see
    right it's targeted at people in the U.S.
  • 4:15 - 4:20
    of a certain age range and interests like
    Martin Luther King, African-American
  • 4:20 - 4:24
    culture, African-American civil rights.
    Facebook doesn't actually allow you to
  • 4:24 - 4:30
    directly target to people based on like
    their ethnicity. So this is a pretty good
  • 4:30 - 4:34
    proxy though. If you target these kinds of
    interests for this, so this would probably
  • 4:34 - 4:38
    be fairly effective at targeting African-
    American people within the U.S. for this
  • 4:38 - 4:44
    ad to try and disenfranchise them from
    voting in the elections. There were other
  • 4:44 - 4:50
    ads, right, that tried to do
    misinformation, disinformation kinds
  • 4:50 - 4:55
    of campaigns. So this was an ad that was
    again paid for by the Russian agency that
  • 4:55 - 5:01
    was trying to perpetuate this rumor,
    basically an unsubstantiated rumor, that Bill
  • 5:01 - 5:05
    Clinton has this illegitimate child
    within here. And again, right, the
  • 5:05 - 5:09
    targeting information is targeted at
    African-Americans within the U.S. and
  • 5:09 - 5:15
    African-Americans are kind of a key voting
    block for a kind of more liberal,
  • 5:15 - 5:20
    democratic people within the U.S.
    oftentimes. That's probably the other
  • 5:20 - 5:24
    thing that I should have explained about the
    U.S. election system especially at the
  • 5:24 - 5:29
    federal level is that we effectively have
    two parties that, you know, win any
  • 5:29 - 5:34
    meaningful amount of elections within the
    U.S. and one of them is the Democratic
  • 5:34 - 5:39
    party that tends to skew more liberal. So
    they're more kind of, right, for bigger
  • 5:39 - 5:43
    government, more social services and then
    we have the Republicans which skew kind of
  • 5:43 - 5:48
    more conservative wanting kind of a
    smaller government providing less services
  • 5:48 - 5:55
    and kind of less regulation around things
    as well. And so right these are two
  • 5:55 - 6:02
    examples but there were a whole bunch of
    these ads that were shown on Facebook
  • 6:02 - 6:05
    within here. So right. Pretty much all
    of these tried to
  • 6:05 - 6:11
    disenfranchise people or tried to kind of
    create chaos, kind of polarize people
  • 6:11 - 6:17
    around the election oftentimes with kind
    of disinformation sorts of things.
  • 6:17 - 6:25
    And so in 2017 our Office of the Director
    of National Intelligence put out a report
  • 6:25 - 6:29
    - sorry for the big block of text, this
    will be the only big box of text in here.
  • 6:29 - 6:34
    But I thought it was kind of important to
    show this because they pretty much
  • 6:34 - 6:41
    unequivocally state that Russia tried to
    interfere in the US elections and that
  • 6:41 - 6:46
    Vladimir Putin was somehow involved within
    this interference. And so this is - this
  • 6:46 - 6:53
    is pretty much as far as the National
    Security Agency, the CIA, NSA pretty much
  • 6:53 - 6:58
    solid evidence that they have - that this
    occurred within here. And so the other
  • 6:58 - 7:03
    thing that broke was right the Cambridge
    Analytica scandal as well broke within
  • 7:03 - 7:08
    Facebook where there was this, right,
    third party advertising agency that
  • 7:08 - 7:13
    collected you know a whole bunch of data
    on 80 million profiles within Facebook and
  • 7:13 - 7:17
    then tried to create psychological
    profiles for targeting and messaging and
  • 7:17 - 7:22
    things like that around here. And so these
    two particular scandals broke within here
  • 7:22 - 7:29
    and the first result of this is we have
    Mark Zuckerberg in a suit a real suit not
  • 7:29 - 7:37
    a hoodie suit. Testifying right in front
    of our Senate within here. And so he,
  • 7:37 - 7:42
    right, he testified before House and
    Senate committees about the abuses
  • 7:42 - 7:50
    occurring within Facebook. And he did this
    on April 10th and 11th 2018 within here,
  • 7:50 - 7:56
    and right. So this is right. In here, he
    did admit that Facebook had made mistakes
  • 7:56 - 8:04
    and that they need to improve things
    moving forward within their platform. The
  • 8:04 - 8:11
    most kind of tangible outcome from these
    testimonies were these transparency
  • 8:11 - 8:20
    archives that began to appear. And here is
    a view of what Facebook's ad transparency
  • 8:20 - 8:25
    archive looks like. When it originally
    deployed you needed a Facebook account to
  • 8:25 - 8:30
    interact with it. Now Facebook has dropped
    the requirement so anyone with Internet
  • 8:30 - 8:37
    access, unless you're censored somehow,
    can go to this archive and access these
  • 8:37 - 8:42
    ads. So the user-facing portal for these
    ad transparency archives: you type in
  • 8:42 - 8:48
    keywords and then it basically does,
    right, a pattern matching on the ad text
  • 8:48 - 8:54
    and other parts of the ad and then returns
    the ads that match that within here. And
  • 8:54 - 8:59
    so then you can see all the political ads
    that match that particular term within
  • 8:59 - 9:07
    here. Facebook began archiving these ads
    kind of at a large scale starting on May
  • 9:07 - 9:16
    7th 2018. By Election Day, November 7th
    2018, there were 1.6 million ads paid for
  • 9:16 - 9:22
    by over 85 thousand advertisers within
    Facebook's platform. Facebook is actually
  • 9:22 - 9:27
    fairly broad as to what they included
    within their political archive. They
  • 9:27 - 9:33
    included any ads related to US elections,
    either federal state or local elections.
  • 9:33 - 9:37
    They also included these very important
    kind of issue ads as we saw when we looked
  • 9:37 - 9:41
    at the Russian interference a lot of times
    the ads didn't mention actual political
  • 9:41 - 9:46
    candidates, they mentioned kind of
    polarizing issues within the U.S. So
  • 9:46 - 9:51
    Facebook also included these ads of
    political national importance. They had a
  • 9:51 - 9:56
    list of I think about 13 different
    criteria, the last of them being values.
  • 9:56 - 10:00
    And so it was a fairly encompassing set of
    ads they tried to include within their
  • 10:00 - 10:08
    archive. Along with the text and images
    or videos of the ad, they also included
  • 10:08 - 10:15
    basically ranges of geographic impressions
    and demographic impressions. So right.
  • 10:15 - 10:22
    State level impression information in some
    kind of ranges and demographic by gender
  • 10:22 - 10:29
    and by age kind of bucketed within here.
    And they did this again for impressions
  • 10:29 - 10:34
    and then they also included some spend
    information again kind of in ranges so
  • 10:34 - 10:41
    they gave ranges of 0 to 99 dollars, a
    hundred dollars to say like 500 dollars,
  • 10:41 - 10:48
    501 dollars to 2000 dollars and so on and
    so forth. Within these buckets, one of the
  • 10:48 - 10:52
    key pieces of information that they did
    not release was the targeting information.
  • 10:52 - 10:55
    So like I showed you before of those ads
    they - right - they have that targeting
  • 10:55 - 11:02
    information, Facebook does not release
    that within their transparency archive.
  • 11:02 - 11:04
    They have this, right. They have - right -
    they had that user portal where you could
  • 11:04 - 11:10
    do the keyword search from within there.
    However, right, I like to do large
  • 11:10 - 11:16
    scale data analysis and so I wanted to
    basically try and collect all of the ads
  • 11:16 - 11:21
    within this web portal. And so initially
    all they had was this keyword search
  • 11:21 - 11:25
    portal within here. And so what we did is
    we compiled kind of a large list of what
  • 11:25 - 11:31
    we thought were reasonable keywords:
    names of prominent politicians, names of
  • 11:31 - 11:36
    states, issues within here. And so we tried
    to compile this long list of keyword
  • 11:36 - 11:41
    searches and we began scraping the
    portal within here and I'll tell you the
  • 11:41 - 11:48
    story of how our scraping efforts went.
    Now currently they also offer an API.
  • 11:48 - 11:52
    It's still keyword based, their API, and
    it's restricted by an NDA, so I'll kind of
  • 11:52 - 11:58
    flesh out the story of how this goes. So
    at the beginning they, they released this
  • 11:58 - 12:03
    kind of towards the end of May the user
    archive and I played with it and I
  • 12:03 - 12:09
    realized that this didn't lend itself well
    to kind of large scale analysis of these
  • 12:09 - 12:16
    ads. And so I went to my students
    Shikhar and Laura, and Shikhar worked kind
  • 12:16 - 12:22
    of furiously night and day and within
    three days he had a workable scraper that
  • 12:22 - 12:26
    was able to put in our keywords and then
    we were able to scrape all the results
  • 12:26 - 12:33
    from our keyword within here. And so we
    ran the scraper for about 2 months and
  • 12:33 - 12:38
    then we released a report. Just kind of a
    very general statistical report and we
  • 12:38 - 12:45
    released the data in our github archive at
    that. After that about 2 weeks later
  • 12:45 - 12:54
    Facebook began anti-scraping measures
    within here. And so, right, this kind of
  • 12:54 - 13:00
    hampered our efforts to scrape Facebook's
    archive. At this point. I'm - I don't want
  • 13:00 - 13:04
    to attribute any malice. I don't believe
    that Facebook was targeting just our
  • 13:04 - 13:08
    scraping efforts they were targeting
    everyone's scraping efforts. On
  • 13:08 - 13:13
    transparency, whether it's wrong or right
    to block people from collecting data on a
  • 13:13 - 13:17
    transparency archive, I might kind of
    quibble with them on that and say they
  • 13:17 - 13:22
    might want to provide better access to the
    data within their transparency archive.
  • 13:22 - 13:27
    But this was the choice that Facebook made
    to kind of clamp down on the scraping
  • 13:27 - 13:32
    within here. So we tried to fight with
    them a little bit - right - kind of a cat
  • 13:32 - 13:36
    and mouse game. You know, we make some changes
    to our scraper to avoid their anti-
  • 13:36 - 13:42
    scraping. They do some things on their end
    to block our scraper and probably other
  • 13:42 - 13:48
    people's scrapers that are doing similar
    things to us as well within here. And so
  • 13:48 - 13:57
    this persisted for probably about 2 weeks,
    and then Facebook basically deployed their
  • 13:57 - 14:02
    API within here. However they said right,
    their API is very limited and still in
  • 14:02 - 14:07
    beta at this point. So these were part of
    the terms and conditions from here. One of
  • 14:07 - 14:13
    the ones that I found kind of the most
    unease (?) is that it limited it only to
  • 14:13 - 14:18
    U.S. people so we could essentially only
    very closely work with U.S. people within
  • 14:18 - 14:23
    here and at least it did kind of - it
    limited the types of people that we could
  • 14:23 - 14:28
    work with in here. And so right
    unfortunately this kind of ruled us out
  • 14:28 - 14:32
    from working closely with journalists from
    you know really good news organizations
  • 14:32 - 14:37
    like the Guardian and so on that just
    happened to have the misfortune of being
  • 14:37 - 14:46
    located somewhere outside of the U.S.
    within here. Maybe the good fortune, yes.
  • 14:46 - 14:51
    And then the list of restrictions
    continue. They also placed the data
  • 14:51 - 14:57
    retention on it so we could only retain
    the data for one year. Again placing data
  • 14:57 - 15:02
    retention. So Facebook's data retention on
    their archive is 7 years within here but
  • 15:02 - 15:08
    they're placing a 1 year data retention on
    the data that we collect from their NDA.
  • 15:08 - 15:14
    I'd like to say that - right - we - right
    - I got this NDA and I lit it on fire, I
  • 15:14 - 15:23
    tore it up and we continued to scrape the
    archive within here. No, unfortunately
  • 15:23 - 15:30
    it was a hard call to make but right you
    know there's basically two students and we
  • 15:30 - 15:34
    basically had to make a call whether we
    wanted the data to analyze or whether we
  • 15:34 - 15:38
    wanted to spend all of our time kind of
    fighting with Facebook's anti-
  • 15:38 - 15:46
    scraping efforts. And so in the end we did
    - I did in fact agree to their NDA within
  • 15:46 - 15:51
    here. So the initial data we scraped, we
    released. We're still scraping a small amount
  • 15:51 - 15:57
    of data that we do release as well from
    here. But unfortunately at this point any
  • 15:57 - 16:01
    of the data that we collected from the NDA
    we cannot release within here. If anyone
  • 16:01 - 16:07
    does want to fight with Facebook and
    resurrect the crawler within here I would
  • 16:07 - 16:12
    be more than happy for that to happen
    within here. Unfortunately given our
  • 16:12 - 16:17
    engineering constraints it just simply
    wasn't feasible for us to do that within
  • 16:17 - 16:24
    here. And so the story is a little bit
    different with Google. So Google's archive
  • 16:24 - 16:34
    they began archiving ads on May 31, 2018.
    By election day they had 45,000 ads from
  • 16:34 - 16:39
    600 advertisers. Their criteria for
    including advertising was much more
  • 16:39 - 16:44
    narrow than Facebook's, so they only
    released ads related to U.S. federal
  • 16:44 - 16:50
    candidates and federal office holders
    within here. So it is a much more limited
  • 16:50 - 16:55
    set of data that Google released within
    here. None of the issue ads that Facebook
  • 16:55 - 17:03
    released. They didn't release any of the
    geographic or demographic data by
  • 17:03 - 17:09
    impression, they did release ranges of
    impressions and ranges of spend data, and
  • 17:09 - 17:15
    they did release some limited targeting
    data from here so they released geographic
  • 17:15 - 17:20
    and demographic targeting information
    which Facebook hadn't released in their
  • 17:20 - 17:25
    ads. And their data is available through a
    similar keyword based portal. But they
  • 17:25 - 17:31
    also make it available through just a
    database, if you want to within here. So
  • 17:31 - 17:35
    this is what their portal looks like
    within here. And this is - right - their
  • 17:35 - 17:41
    BigTable, sorry, their BigQuery database
    that they released from here. And so they
  • 17:41 - 17:46
    update it every week within here and you can
    download it and analyze the data
  • 17:46 - 17:53
    relatively easily within here. So the last
    one to kind of implement their archive was
  • 17:53 - 18:01
    Twitter. Twitter began archiving ads on
    June 27, 2018. The scale of ads on
  • 18:01 - 18:07
    Twitter is very small compared to the rest
    of them. The scale of their ad network in
  • 18:07 - 18:13
    general is much smaller than Google and
    Facebook's. And what they included, it was
  • 18:13 - 18:18
    similar to what Google included in terms
    of - right - only federal candidates
  • 18:18 - 18:22
    within here. Kind of closer to the
    election, they also said that they were
  • 18:22 - 18:28
    going to release political issue ads.
    However, the mechanism of enforcement
  • 18:28 - 18:33
    doesn't appear to exist within Twitter's
    system. There doesn't appear to be anyone whose
  • 18:33 - 18:39
    job it is to actually enforce transparency
    of ads from here. So we've been kind of
  • 18:39 - 18:43
    manually finding accounts and reporting
    them to Twitter within here and then when
  • 18:43 - 18:47
    we manually report them to Twitter,
    Twitter then includes them in future
  • 18:47 - 18:52
    transparency kinds of efforts within here.
    But it appears like we're basically the
  • 18:52 - 18:57
    ones [Damon McCoy] short laughter it's
    become our job to monitor the Twitter
  • 18:57 - 19:01
    accounts and then notify Twitter and then
    they'll manually kind of deal with it.
  • 19:01 - 19:04
    Unfortunately, they still don't appear to
    have a person that actually manages this
  • 19:04 - 19:10
    process internal to Twitter at this point.
    Twitter does however release the most
  • 19:10 - 19:16
    information. So they release exact data
    not the range data on impressions and
  • 19:16 - 19:21
    spend information, also by geographic and
    demographics and they also include all of
  • 19:21 - 19:28
    the targeting information as far as we can
    tell, and their data is available
  • 19:28 - 19:32
    without an account, basically through
    their portal and we've been scraping them
  • 19:32 - 19:35
    and there's been no problems they haven't
    blocked us at this point. So we just
  • 19:35 - 19:40
    simply scraped their data and then we
    republish it to github at that point. And
  • 19:40 - 19:46
    we've had no problems with Twitter in this
    way, and the scale of their data is so small
  • 19:46 - 19:51
    that it's been relatively easy to keep
    pace with it at this point. And here's
  • 19:51 - 19:57
    just a picture of the Twitter transparency
    archive and again they have a list of all
  • 19:57 - 20:00
    the Twitter accounts that they include in
    their transparency archive. So we can
  • 20:00 - 20:04
    monitor this and then we can monitor other
    people that we know that are politically
  • 20:04 - 20:08
    active when we see them doing paid
    advertising then we can notify Twitter and
  • 20:08 - 20:13
    then Twitter will include them in their
    transparency archive normally within like
  • 20:13 - 20:19
    a week, or so of here. And so this is,
    this is kind of the background that you
  • 20:19 - 20:25
    need to understand the transparency
    archives. So now we have a data set that
  • 20:25 - 20:32
    we can begin to analyze within here. For
    Facebook since it was the keyword driven
  • 20:32 - 20:39
    thing at the beginning and it still is, we
    were able to collect about 80% of the ads
  • 20:39 - 20:43
    in Facebook's database from there. The
    other problem with the API is that it is
  • 20:43 - 20:51
    severely rate-limited at this point. I'm
    talking about 3 to 4 queries per minute
  • 20:51 - 20:58
    that we can get through Facebook's API at
    this point. And so we kind of did our best
  • 20:58 - 21:03
    effort to collect as much data as we could
    from Facebook. About two weeks before the
  • 21:03 - 21:08
    election, Facebook began releasing a
    transparency archive that included
  • 21:08 - 21:12
    basically an aggregated list of all the
    advertisers and how many ads they have and
  • 21:12 - 21:17
    how much spent and this is how we can tell
    that we got about 80% of the ads from
  • 21:17 - 21:22
    Facebook's archive based on this within
    here. And the nice thing about the
  • 21:22 - 21:26
    transparency report is that we could go
    back and, now that we know what we're missing, we
  • 21:26 - 21:33
    could readjust our usage of the API and so
    now we have virtually 100% coverage of
  • 21:33 - 21:40
    Facebook going forward within here.
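
A minimal sketch of how that coverage check could be done, assuming the aggregate report can be reduced to a mapping from advertiser to ad count. The function name, the dictionary layout and the example numbers are hypothetical and not taken from the talk's actual scripts.

```python
def coverage(collected, reported):
    """Compare collected ad counts against an aggregate report.

    Both arguments map an advertiser name to a number of ads.
    Returns (fraction of reported ads collected, advertisers with
    missing ads sorted by the size of the gap, largest first).
    """
    total_reported = sum(reported.values())
    if total_reported == 0:
        return 0.0, []
    # Cap each advertiser at the reported count so duplicates in the
    # collected data cannot push coverage above 100%.
    total_collected = sum(
        min(collected.get(name, 0), count) for name, count in reported.items()
    )
    gaps = {
        name: count - collected.get(name, 0)
        for name, count in reported.items()
        if collected.get(name, 0) < count
    }
    missing = sorted(gaps, key=gaps.get, reverse=True)
    return total_collected / total_reported, missing


# Made-up numbers, for illustration only.
reported = {"Advertiser A": 50, "Advertiser B": 30, "Advertiser C": 20}
collected = {"Advertiser A": 50, "Advertiser B": 25, "Advertiser C": 5}
ratio, missing = coverage(collected, reported)
```

The list of advertisers with the largest gaps is what would feed back into new keyword queries.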
    Twitter - right - we could collect 100% of
  • 21:40 - 21:46
    their data. And again we've republished
    the SOL (?) in an easier to process kind
  • 21:46 - 21:50
    of form. Google again - right - we have a
    100% of their ads because they're all in
  • 21:50 - 21:55
    the BigQuery database. However when we
    started analyzing the data we noticed that
  • 21:55 - 22:00
    for a lot of the ads we're missing the
    actual content, the images and text of the
  • 22:00 - 22:07
    ad. It turns out that for Google's ad
    network if the ad was originally purchased
  • 22:07 - 22:11
    through a third party advertiser and then
    run on one of Google's properties the
  • 22:11 - 22:16
    content of the ad won't be archived within
    their system. This is unfortunately a big
  • 22:16 - 22:21
    loophole. So - right - if you're
    running a kind of malicious misinformation
  • 22:21 - 22:26
    thing, you can easily unfortunately
    circumvent Google's archive at least from
  • 22:26 - 22:31
    archiving your content by simply just
    paying for it by a third party within
  • 22:31 - 22:35
    here. It's unclear whether this is a
    policy limitation or whether this is a
  • 22:35 - 22:40
    technical limitation on Google's part, but
    the outcome is that we only have the
  • 22:40 - 22:46
    content for about 70% of Google's ads that
    were paid for directly on Google's
  • 22:46 - 22:52
    platform within here. So one of the
    first things that we want to do is kind of
  • 22:52 - 22:57
    add some semantic meaning to these ads at
    kind of large scale. And so we played
  • 22:57 - 23:02
    around with a few techniques, some fancy
    kinds of natural language processing and
  • 23:02 - 23:06
    things like that. But we found that
    there's actually a really fairly simple
  • 23:06 - 23:12
    and effective way of categorizing kind of
    the intent of the ad, and that's that most
  • 23:12 - 23:17
    of these ads have a URL of some kind and
    a lot of these URLs just point back to
  • 23:17 - 23:20
    like third party services. Like if you're
    holding some kind of event you're going to
  • 23:20 - 23:24
    coordinate it with like Eventbrite or
    something like that. If you're seeking
  • 23:24 - 23:28
    donations, if you're a Democrat you're
    going to use this third party payment
  • 23:28 - 23:31
    processor that's called ActBlue, if
    you're Republican there's like two or
  • 23:31 - 23:36
    three payment processors that you're going
    to use for this. So we could simply just
  • 23:36 - 23:40
    look at these really prominent URLs that
    occur a lot of times and just kind of
  • 23:40 - 23:46
    manually tag what is the purpose for this.
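
The URL tagging idea can be sketched roughly as follows. This is an illustration under assumptions, not the project's actual script: the lookup table, the function names and the purpose labels are hypothetical, and only the two example services come from the talk.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, hand-built table mapping a landing-page domain to an ad purpose.
# The two entries follow the examples in the talk; the labels are illustrative.
DOMAIN_PURPOSE = {
    "actblue.com": "donate",   # donation processor used by Democrats
    "eventbrite.com": "move",  # event sign-ups
}


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()


def top_hosts(urls, n):
    """Most frequent hostnames: the candidates worth tagging by hand."""
    return Counter(host_of(u) for u in urls).most_common(n)


def categorize(url):
    """Look up the purpose for a URL, matching the domain or any subdomain."""
    host = host_of(url)
    for domain, purpose in DOMAIN_PURPOSE.items():
        if host == domain or host.endswith("." + domain):
            return purpose
    return None  # untagged: falls outside the hand-built table


urls = [
    "https://secure.actblue.com/donate/a",
    "https://secure.actblue.com/donate/b",
    "https://www.eventbrite.com/e/1",
]
```

Counting hostnames first keeps the manual work small: only the domains that show up often need a label, and everything else stays uncategorized.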
    And by doing this we can tag ads as either
  • 23:46 - 23:51
    just purely informational, that want to
    just kind of get some kind of message
  • 23:51 - 23:56
    about the candidate, either positive or
    negative, out there. Connection ads that are
  • 23:56 - 24:01
    seeking contact information like people's
    e-mail addresses, phone numbers, names and
  • 24:01 - 24:05
    things like that. Presumably so they can -
    you know - either get them to volunteer or
  • 24:05 - 24:10
    donate money in the future for the
    campaign. There's move ads, where either
  • 24:10 - 24:15
    they're trying to get people to vote or to
    attend some kind of rally or to volunteer
  • 24:15 - 24:19
    or something like that and then right
    there's donation ads. And then finally
  • 24:19 - 24:24
    there's kind of commercial ads. These are
    things either they are selling products
  • 24:24 - 24:33
    that are kind of directly political in nature
    like a bobble head of some candidate or
  • 24:33 - 24:39
    they might be like solar panels which have
    tax credits in the U.S. and things like
  • 24:39 - 24:43
    that. So there's some kind of commercial
    good that's linked somehow to some
  • 24:43 - 24:50
    political messaging within here. So we use
    this method and we were able to categorize
  • 24:50 - 24:56
    about 70% of the ads, we took a random
    sample of them, we manually checked what
  • 24:56 - 25:01
    we were doing and we found it was pretty
    accurate. About 96% accuracy we got using
  • 25:01 - 25:08
    this method. The other thing that we did
    is for the top advertisers, so for
  • 25:08 - 25:14
    Facebook the top 75% of the advertisers,
    for Google the top 80% of the advertisers,
  • 25:14 - 25:18
    in terms of the money spent by the
    advertiser. We went in and we manually
  • 25:18 - 25:22
    categorized what was this type of
    organization. Was it a political
  • 25:22 - 25:27
    candidate, was it what's called a
    political action committee, so these are
  • 25:27 - 25:33
    the PACs within the U.S., was it a
    union, was it a for-profit operation,
  • 25:33 - 25:37
    was it a non-profit operation. So on and
    so forth and so we wrote like some regular
  • 25:37 - 25:42
    expressions that got us most of the way
    there. Most of them have fairly uniform
  • 25:42 - 25:47
    naming conventions and for the ones that
    we couldn't kind of automatically classify
  • 25:47 - 25:51
    we just did it manually, within here.
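
A rough sketch of that kind of name-based classification, assuming advertiser names follow common conventions. The talk does not list the actual regular expressions, so the patterns, labels and example names below are hypothetical.

```python
import re

# Hypothetical patterns, checked in order; the first match wins.
PATTERNS = [
    ("pac", re.compile(r"\b(pac|political action committee)\b", re.IGNORECASE)),
    ("candidate", re.compile(r"\bfor (congress|senate|governor)\b", re.IGNORECASE)),
    ("union", re.compile(r"\b(union|local \d+)\b", re.IGNORECASE)),
]


def classify_advertiser(name):
    """Return an organization type for a name, or None for manual review."""
    for label, pattern in PATTERNS:
        if pattern.search(name):
            return label
    return None
```

Names that match nothing come back as `None`, which corresponds to the pile that was classified by hand. Simple patterns like these will misfire on some names (a credit union, for example), which is one reason a manual pass is still needed.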
    And then since Twitter had so few
  • 25:51 - 25:58
    advertisers, we just did these all,
    manually, within here. Now, right, we can
  • 25:58 - 26:01
    start to do some analysis. So the first
    analysis that we did, the easiest
  • 26:01 - 26:08
    analysis, was we looked at the size of the
    ads. And the thing that pops out is that
  • 26:08 - 26:13
    the majority of ads on all the platforms
    are between $0 and $100. So these
  • 26:13 - 26:18
    are what are normally called the micro
    targeted ads, that are typically seen by
  • 26:18 - 26:23
    fewer than 1000 people within here. So
    these are very short lived, narrowly
  • 26:23 - 26:28
    targeted ads that are kind of honing in on
    a specific demographic within here. So
  • 26:28 - 26:32
    these are these micro targeted ads within
    here. And it appears, right, that the
  • 26:32 - 26:37
    majority of ads, especially on Facebook's
    platform 82% of them, are of this micro
  • 26:37 - 26:42
    targeted kind of ilk within here. So it
    kind of confirms the reporting that people
  • 26:42 - 26:47
    had of this kind of trend of
    microtargeting within political
  • 26:47 - 26:52
    advertising. The other thing, based on our
    categorization we can look at how the
  • 26:52 - 26:58
    different platforms were used from within
    here. The problem with these numbers is
  • 26:58 - 27:04
    that there was different inclusion
    criteria within each of these databases.
  • 27:04 - 27:12
    And then right. Finally, we can kind of
    look at the different types of advertisers
  • 27:12 - 27:15
    on these kind of platforms. And again it's
    hard to read too much into these numbers
  • 27:15 - 27:19
    because again, right, Facebook included
    much more of the commercial stuff. So
  • 27:19 - 27:24
    we're going to see a lot more of the
    commercial stuff within here. And the
  • 27:24 - 27:31
    final analysis of the entire data set that
    we did was looking at right kind of
  • 27:31 - 27:35
    basically the ramp up to the election. We
    cut this off in late October. This
  • 27:35 - 27:41
    analysis was done for a paper. So the due
    date of the paper was ironically November
  • 27:41 - 27:47
    6 within here. So we cut it off a few
    weeks earlier and we haven't regenerated the
  • 27:47 - 27:50
    contents since then. The one thing that
    you can see is at the top there is that
  • 27:50 - 27:58
    green spike. That's kind of the move ads.
    So right, closer into the election the
  • 27:58 - 28:04
    campaigns were kind of doing sophisticated
    get out the vote kinds of ads, within
  • 28:04 - 28:07
    here. So there were really sophisticated
    kind of microtargeted ads that get out the
  • 28:07 - 28:12
    vote. Where like, it was almost kind of
    spooky where like they knew where the
  • 28:12 - 28:16
    person lived that they were targeting and
    so they gave them like directions on how
  • 28:16 - 28:20
    to get from where they live to their
    nearest polling place within here. So
  • 28:20 - 28:24
    there are these really sophisticated kind
    of get out the vote efforts that were
  • 28:24 - 28:29
    being run online, within here, towards the
    end of the campaign. To kind of give you
  • 28:29 - 28:34
    more of a kind of apples to apples
    comparison of these different ad
  • 28:34 - 28:40
    platforms, we also did some analysis kind
    of narrowing each of the different
  • 28:40 - 28:45
    advertiser types to the ones that were
    made transparent by all three platforms,
  • 28:45 - 28:50
    which were the federal candidates only.
    And so this can give you some idea of kind
  • 28:50 - 28:56
    of a scale of these things. And we can see
    that when we narrow it here we can still
  • 28:56 - 29:01
    see that Facebook has a lot more
    advertisers and a lot more ads compared to
  • 29:01 - 29:06
    Google. However the spending numbers are
    kind of comparable here. For Facebook
  • 29:06 - 29:11
    impressions and spends are ranges, that's
    all that Facebook releases. For Google the
  • 29:11 - 29:16
    impression data is ranges, however we can
    get exact spend data, because Google
  • 29:16 - 29:22
    basically released a weekly report of
    exact spend numbers, aggregated by the
  • 29:22 - 29:27
    different advertisers, within here. So we
    can use that, to get an exact number of
  • 29:27 - 29:31
    the spend. And again, right, Twitter's
    numbers are much smaller in terms of
  • 29:31 - 29:37
    everything, within here. And we redid some
    of our analysis to just see whether our
  • 29:37 - 29:42
    effects were simply a distortion based on
    what was included in the archives. So
  • 29:42 - 29:48
    right we redid our ad size analysis and
    even when we limit it to federal
  • 29:48 - 29:53
    candidates we can see this still holds,
    that a lot of the ads on Facebook are still
  • 29:53 - 29:57
    these micro targeted ads. And they are
    still micro targeted ads on the other
  • 29:57 - 30:02
    platforms, as well, within here. And right
    this microtargeting of course varies
  • 30:02 - 30:10
    depending on the advertiser. So you take
    someone like President Trump and he does a
  • 30:10 - 30:16
    lot of microtargeting. So almost all of
    his ads probably about 90%, 95% of his ads
  • 30:16 - 30:22
    are micro targeted, within here. You look
    at other candidates and they do much less
  • 30:22 - 30:26
    microtargeting, within here. So this is
    definitely different strategies are used
  • 30:26 - 30:30
    by different advertisers, within here. But
    when we look at it in aggregate, it still
  • 30:30 - 30:38
    appears that microtargeting is a very
    popular strategy across advertisers. We
  • 30:38 - 30:44
    can also, right, look at some of the spend
    by ad type and this kind of shows you
  • 30:44 - 30:50
    a little bit how the different platforms
    are used, within here. So Facebook's
  • 30:50 - 30:54
    platform looks like it's a little bit more
    kind of informational, though it's still used a
  • 30:54 - 30:59
    lot for donations, whereas Google's
    platform is used a lot more for donations
  • 30:59 - 31:04
    and a lot less for a kind of informational
    ads and to connect within here. It's
  • 31:04 - 31:08
    really kind of hard to read anything into
    Twitter's data because it's such a small
  • 31:08 - 31:12
    set of data. But from the data that we do
    have it looks like there's a lot more kind
  • 31:12 - 31:19
    of collection of e-mails and things like
    that, within here. The other analysis that
  • 31:19 - 31:25
    we did on the federal candidate ads was to
    look at, that for Facebook in particular
  • 31:25 - 31:30
    right, we have the geographic impression
    data from here. So we can effectively look
  • 31:30 - 31:37
    at how many states were targeted by each
    ad with a Facebook advertiser. And the
  • 31:37 - 31:40
    interesting thing here is that, right,
    there was no presidential election. So
  • 31:40 - 31:44
    basically all these campaigns were
    operating in one state. So their
  • 31:44 - 31:50
    constituents for all these elections were
    essentially in one state, within here. And
  • 31:50 - 31:54
    so if you look at the inform ads, right,
    most of those are shown in a very small number of
  • 31:54 - 31:59
    states. So the inform ads are mostly being
    shown to the constituents that are
  • 31:59 - 32:03
    actually voting for that candidate.
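Counting how many states each ad reached, grouped by ad type, could look something like this. It is only a sketch under assumptions: the `region_share` field, the ad type labels and the sample values are hypothetical stand-ins for the per-region impression breakdown, not the real archive schema.

```python
from statistics import median

# Hypothetical ads: each has an intent label and a per-state impression share.
ADS = [
    {"ad_type": "inform", "region_share": {"Texas": 1.0}},
    {"ad_type": "inform", "region_share": {"Texas": 0.9, "Oklahoma": 0.1}},
    {"ad_type": "donate",
     "region_share": {"Texas": 0.4, "California": 0.3, "New York": 0.3}},
    {"ad_type": "donate",
     "region_share": {"Texas": 0.2, "California": 0.2, "New York": 0.2,
                      "Washington": 0.2, "Oregon": 0.2}},
]

def states_reached(ad, min_share=0.0):
    """Count the states whose impression share exceeds min_share."""
    return sum(1 for share in ad["region_share"].values() if share > min_share)

def median_states_by_type(ads):
    """Return {ad_type: median number of states reached per ad}."""
    counts_by_type = {}
    for ad in ads:
        counts_by_type.setdefault(ad["ad_type"], []).append(states_reached(ad))
    return {ad_type: median(counts) for ad_type, counts in counts_by_type.items()}

print(median_states_by_type(ADS))
```

Raising `min_share` is one way to ignore states that only received a stray handful of impressions.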
    However, if we look at that bottom line,
  • 32:03 - 32:07
    the kind of gold line, those are the
    donation ads. And we can see that they
  • 32:07 - 32:12
    were fundraising in many more states
    outside of their constituency, within
  • 32:12 - 32:17
    here. So FiveThirtyEight did an
    interesting analysis of one particular
  • 32:17 - 32:22
    candidate, Beto O'Rourke. He was a
    candidate for Senate in Texas, Texas is a
  • 32:22 - 32:28
    very conservative state in the U.S., and
    he did surprisingly well, within here. And
  • 32:28 - 32:32
    he kind of embraced online advertising and
    online donations seeking, were kind of
  • 32:32 - 32:38
    cornerstones of his election, within here.
    And so FiveThirtyEight did an analysis of
  • 32:38 - 32:42
    his donation records in the U.S., at the
    federal level. All donations to candidates
  • 32:42 - 32:46
    have to be reported to the Federal
    Election Commission. So this is all in a
  • 32:46 - 32:49
    database at the Federal Election
    Commission. The FiveThirtyEight people did
  • 32:49 - 32:54
    the analysis, and they kind of confirmed what
    we saw on the donation ads, that he was
  • 32:54 - 33:01
    getting about 52% of his donations from
    Texas and 48% from other states, primarily
  • 33:01 - 33:08
    kind of from coastal states that tended to
    lean more liberal, like New York,
  • 33:08 - 33:13
    California, Washington and places like
    that, was where he was donations seeking.
  • 33:13 - 33:19
    So this appears to be a very effective way
    of getting small dollar donations kind of
  • 33:19 - 33:25
    throughout the U.S. within here, through
    this online advertising. The last thing that
  • 33:25 - 33:30
    I'm going to talk about is the ad
    targeting. Facebook didn't directly
  • 33:30 - 33:36
    release the ad targeting. However, we were
    lucky enough that ProPublica made a
  • 33:36 - 33:41
    browser plugin, that people can install in
    their browser, and that browser plugin
  • 33:41 - 33:47
    would identify what it thought was
    political ads, based on a machine learning
  • 33:47 - 33:54
    algorithm. And for the political ads it
    would upload these to their server along
  • 33:54 - 33:59
    with the targeting information. So, for
    those of you with a Facebook account, if
  • 33:59 - 34:02
    you're seeing ads you can actually click
    on that ad kind of in the upper corner of
  • 34:02 - 34:09
    the ad and you can see why is this ad
    targeting me, within here. And Facebook
  • 34:09 - 34:15
    will tell you a little bit, not all of why
    you were targeted for this particular ad.
  • 34:15 - 34:18
    They will essentially show you the two
    broadest categories of why you were
  • 34:18 - 34:24
    targeted for this particular ad, through
    this feature they've added to their
  • 34:24 - 34:27
    platform. And this is actually
    kind of interesting, this is something
  • 34:27 - 34:33
    that if you're a user of Facebook, I
    highly recommend that you do. Because I
  • 34:33 - 34:39
    started doing it, and it was kind of eye
    opening, as to the level of targeting that
  • 34:39 - 34:43
    was being done in terms of advertising.
    That's kind of one thing, that we've
  • 34:43 - 34:48
    definitely learned from this is that when
    you're seeing an ad, oftentimes there's a
  • 34:48 - 34:53
    very specific reason as to why you're
    seeing that particular ad, within here.
  • 34:53 - 34:58
    And so we felt that it was very important
    to, as much as we could, understand this
  • 34:58 - 35:03
    targeting that was going on within
    Facebook's platform. So ProPublica had
  • 35:03 - 35:10
    this browser plug-in and they had this
    data set that anyone can analyze, within
  • 35:10 - 35:15
    here. So if you do have Facebook and
    you're located within the US I would
  • 35:15 - 35:19
    highly recommend that you install this
    plug-in, because it helps us to kind of
  • 35:19 - 35:25
    understand the political advertising in
    terms of the targeting, within here. So we
  • 35:25 - 35:30
    took ProPublica's data set and we
    effectively joined it with Facebook's ad
  • 35:30 - 35:35
    transparency archive, within here. This
    required us to scrape Facebook's ad
  • 35:35 - 35:39
    archive, because we needed the ad ID and
    this is something that they don't expose
  • 35:39 - 35:44
    to their API, currently, within here.
    However, they do expose it through their
  • 35:44 - 35:49
    user portal, within here. So we scraped
    their user portal to join the specific ads
  • 35:49 - 35:55
    that were in the ProPublica data set to the
    archive dataset, within here. And we were able
  • 35:55 - 36:00
    to join about 75% of the ads from here.
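The join described above, matching plugin-collected ads to archive records on the ad ID and measuring how many match, might be sketched like this. The field names (`ad_id`, `targeting`, `sponsor`) and the sample rows are assumptions for illustration only and do not reflect the actual schema of either data set.

```python
def join_on_ad_id(plugin_ads, archive_ads):
    """Attach targeting info from plugin ads to archive records with the same ad ID.

    Returns (joined, unmatched): joined holds merged records, unmatched holds
    plugin ads that have no counterpart in the archive.
    """
    archive_by_id = {ad["ad_id"]: ad for ad in archive_ads}
    joined, unmatched = [], []
    for ad in plugin_ads:
        match = archive_by_id.get(ad["ad_id"])
        if match is None:
            unmatched.append(ad)
        else:
            joined.append({**match, "targeting": ad["targeting"]})
    return joined, unmatched

# Hypothetical sample data.
PLUGIN_ADS = [
    {"ad_id": "1", "targeting": "interest"},
    {"ad_id": "2", "targeting": "list"},
    {"ad_id": "3", "targeting": "lookalike"},
    {"ad_id": "9", "targeting": "list"},  # not present in the archive
]
ARCHIVE_ADS = [
    {"ad_id": "1", "sponsor": "Example Campaign"},
    {"ad_id": "2", "sponsor": "Example PAC"},
    {"ad_id": "3", "sponsor": "Example Campaign"},
]

joined, unmatched = join_on_ad_id(PLUGIN_ADS, ARCHIVE_ADS)
match_rate = len(joined) / len(PLUGIN_ADS)
print(f"joined {match_rate:.0%} of plugin ads; {len(unmatched)} missing from archive")
```

The `unmatched` list is the interesting part for the follow-up question raised in the talk: it holds the ads that users saw but that never showed up in the archive.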
    There were a lot of ads that were
  • 36:00 - 36:06
    collected by the ProPublica data set, that
    just simply weren't archived by Facebook's
  • 36:06 - 36:10
    transparency archive. It misses things,
    within here. It's imperfect as to how it
  • 36:10 - 36:14
    does things. And this would be another
    interesting analysis to do, to understand
  • 36:14 - 36:18
    what is Facebook missing in their ad
    transparency archive and this ProPublica
  • 36:18 - 36:23
    data set can allow you to somewhat do
    this, although through bias of who
  • 36:23 - 36:29
    installs the ProPublica plug-in in the
    first place. So we joined these two data
  • 36:29 - 36:34
    sets, again with the caveat that the
    ProPublica data set is, right, it's
  • 36:34 - 36:38
    obviously biased by the set of people that
    installed it, which are probably not going
  • 36:38 - 36:43
    to be a normal representative set of
    Facebook users, within here. But
  • 36:43 - 36:46
    unfortunately, it's the best thing that we
    have in terms of a data set that releases
  • 36:46 - 36:52
    the targeting information, within here.
    And so we collapse into three different
  • 36:52 - 36:58
    categorizations of targeting, within here.
    I'll just quickly explain Facebook's ad
  • 36:58 - 37:02
    targeting platform for people that don't
    know about it. So one way to target ads
  • 37:02 - 37:08
    is, right, through interest or segments,
    right, age segments, gender segments or
  • 37:08 - 37:13
    interests like I showed you before, within
    here. So this is one way to target ads
  • 37:13 - 37:19
    within Facebook's platform. Another way to
    target ads is through uploading lists of
  • 37:19 - 37:23
    information. So you can upload lists of
    people's phone numbers, people's email
  • 37:23 - 37:28
    addresses or their names. And then when
    you upload this list Facebook will find
  • 37:28 - 37:32
    those profiles within their database, so
    they'll basically join those emails with
  • 37:32 - 37:37
    the emails that were entered by the users
    accounts, and then they'll target these
  • 37:37 - 37:41
    people. So they'll create what they call
    an audience of these people through this
  • 37:41 - 37:45
    personally identifiable information and
    then they'll target them, through this
  • 37:45 - 37:50
    method. The final kind of major form of
    targeting that Facebook offers is through
  • 37:50 - 37:54
    what they call these lookalike audiences.
    So this is where you can upload PII
  • 37:54 - 37:59
    information, like email addresses, phone
    numbers, names. Facebook will link them to
  • 37:59 - 38:04
    their accounts and then they'll look at
    kind of the interests and things that
  • 38:04 - 38:07
    these users have and then they'll find you
    other users, not these users, but other
  • 38:07 - 38:12
    users, that have a similar kind of profile
    to these users within here. So these are
  • 38:12 - 38:18
    the lookalike audiences that Facebook
    offers within their platform. And so we
  • 38:18 - 38:23
    categorized it by this and again by
    advertiser type, within here. So the thing
  • 38:23 - 38:30
    that stands out is, right, is that the for
    profit companies are doing a lot of
  • 38:30 - 38:34
    targeting based on interests and segments.
    So they probably don't know who their
  • 38:34 - 38:39
    people that they want to message are and
    they're doing it mostly by interests and
  • 38:39 - 38:45
    segment. Whereas when you look at the PACs
    and the political candidates they have
  • 38:45 - 38:49
    lists. So they have a lot of lists of
    people's you know email addresses, phone
  • 38:49 - 38:54
    numbers, names, of things like this. And
    they're plugging these into Facebook's
  • 38:54 - 38:59
    system. And this is how they're targeting
    a lot of people, within here, is through
  • 38:59 - 39:04
    these lists. And this was expected, but
    it's interesting to kind of quantify how
  • 39:04 - 39:10
    much of this is happening. And then the
    lookalike audiences are also being used, a
  • 39:10 - 39:14
    good deal by everyone within here. And
    this kind of makes sense, right? Because
  • 39:14 - 39:17
    if you have a list of people then you
    advertise to them but then right you have
  • 39:17 - 39:23
    this lookalike audience of people that are
    similar to them that are also perhaps good
  • 39:23 - 39:33
    people to advertise to, as well, within
    here. The other thing we can do is break
  • 39:33 - 39:38
    this down by the intent of the ad here,
    and this shows the difference even more
  • 39:38 - 39:44
    starkly, of the difference in behavior
    between the commercial people and the
  • 39:44 - 39:47
    noncommercial people. The commercial
    people are targeting mostly based on
  • 39:47 - 39:53
    interest, whereas the other people that
    are, say, looking to connect with people,
  • 39:53 - 39:56
    they're the ones that are using the most
    lookalike audiences. And this makes
  • 39:56 - 39:59
    perfect sense because right the connection
    ads are there to get people's e-mails,
  • 39:59 - 40:03
    addresses, phone numbers, names and things
    like that. So when you use the lookalike
  • 40:03 - 40:08
    audiences then you can, right, generate
    more lists of people they'll convert for
  • 40:08 - 40:13
    whatever you want and then you can
    retarget them with the direct lists
  • 40:13 - 40:19
    targeted ads, later on. So this all makes
    pretty good sense when you look at how
  • 40:19 - 40:23
    this is behaving, from here. But again
    it's interesting, right, to kind of make this
  • 40:23 - 40:29
    transparent for people to understand how
    targeting is happening within the U.S.
  • 40:29 - 40:34
    political advertising sphere, within here.
    So these were pretty much the two major
  • 40:34 - 40:39
    analyses that we did in terms of
    targeting, within here. The final part and
  • 40:39 - 40:44
    the part that kind of makes the juiciest
    of stories is kind of the more dubious
  • 40:44 - 40:48
    advertisers that are advertising within
    these platforms in terms of political
  • 40:48 - 40:53
    advertising. So we kind of call these more
    politely kind of "new types of
  • 40:53 - 40:58
    advertising", within here. The first type
    is one that you would pretty
  • 40:58 - 41:03
    much expect, so this is this corporate
    astroturfing kind of stuff, that's going
  • 41:03 - 41:09
    on, within here. We see these ads for
    Citizens for Tobacco Rights. And I
  • 41:09 - 41:13
    pretty much expected that you look up this
    group and it's probably going to be some
  • 41:13 - 41:18
    you know quasi nonprofit that's supported
    by some industry money from the tobacco
  • 41:18 - 41:22
    lobbyists, or something like that. That's
    pretty much what I expected to see when I
  • 41:22 - 41:29
    saw these ads. You go to this website and
    it's actually pretty honest as to what it
  • 41:29 - 41:33
    does. This is probably because right of
    all the lawsuits and regulations around
  • 41:33 - 41:37
    tobacco in the U.S. in advertising. But
    the website clearly states, right, that
  • 41:37 - 41:43
    it's operated by Philip Morris, the
    tobacco company, within here. And this
  • 41:43 - 41:47
    actually isn't a legal entity, this
    Citizens for Tobacco Rights. It's just
  • 41:47 - 41:51
    simply a website that's been stood up,
    that's owned and operated by Philip
  • 41:51 - 41:55
    Morris, as far as we can tell, within
    here. And this gets to a big problem with
  • 41:55 - 42:01
    Facebook's transparency archive, which is
    that they don't actually vet that
  • 42:01 - 42:05
    disclaimer string of the sponsor, within
    here. So pretty much anyone can type
  • 42:05 - 42:12
    anything that they want within that
    disclaimer string and Facebook will allow
  • 42:12 - 42:16
    you to run it. We've tested it and as far
    as we can tell, you can't say that you're
  • 42:16 - 42:20
    from Facebook, Instagram or that you're
    Mark Zuckerberg, they'll block that. But
  • 42:20 - 42:24
    pretty much anything else that you type in
    there they'll allow that ad to run, within
  • 42:24 - 42:30
    here, with no vetting. So we discovered
    this, we politely, privately mentioned it
  • 42:30 - 42:36
    to Facebook. Some reporters kind of
    trolled Facebook within here and so there
  • 42:36 - 42:41
    was a reporter that trolled Facebook and
    opened up ads for all the senators, within
  • 42:41 - 42:45
    here, on Facebook. And of course Facebook
    approved them all, from within here, and
  • 42:45 - 42:50
    they did some other things to troll
    Facebook where they inserted some other
  • 42:50 - 42:55
    advertisements, within here. But the point
    is, that that disclaimer string is not
  • 42:55 - 43:00
    vetted within here. Google actually does vet
    that disclaimer string within there, so
  • 43:00 - 43:05
    they require either a tax ID number or a
    Federal Election Commission I.D. number and
  • 43:05 - 43:12
    they actually do vet it and they publish
    that tax I.D. number or federal election
  • 43:12 - 43:15
    I.D. number along with the disclaimer
    string, within here. Which makes it really
  • 43:15 - 43:19
    easy to track down advertising on Google.
    On Facebook, because right they can
  • 43:19 - 43:22
    basically type in whatever they want in
    the disclaimer string, it makes it much
  • 43:22 - 43:26
    more difficult to actually link these
    advertisers. And sometimes just outright
  • 43:26 - 43:34
    impossible, if the disclaimer string is
    made up or just too mutilated in some way
  • 43:34 - 43:40
    or form, within here. So this is
    definitely a problem, where we have these
  • 43:40 - 43:44
    lobbyist organizations, or in this case
    not even lobbyist organizations, just
  • 43:44 - 43:50
    industry, that can effectively lie about
    who's paying for this ad in Facebook's
  • 43:50 - 43:56
    platform. The other thing we found were
    what is now kind of being called these
  • 43:56 - 44:03
    junk media outlets. So this is for profit
    outlets that are claiming that they're
  • 44:03 - 44:08
    doing kind of news operations. But, right,
    it's not really traditional kind of
  • 44:08 - 44:13
    reporting journalistic things. It's more
    just kind of propaganda messaging, within
  • 44:13 - 44:19
    here. So there is this group called New
    American Media Group LLC. They also ran under
  • 44:19 - 44:25
    the name of New Democracy, or sorry
    Democracy Now was their other name, within
  • 44:25 - 44:31
    here. And so they ran this, within here.
    We tracked down these LLCs and they were
  • 44:31 - 44:36
    just simply shell companies and that kind
    of led to nowhere, within here. We worked
  • 44:36 - 44:42
    with a journalist from The Atlantic that
    actually did a lot of digging into the
  • 44:42 - 44:49
    shell companies. And he was able to,
    through his basically investigation, link
  • 44:49 - 44:54
    these companies to the actual entity that
    created these shell companies and was
  • 44:54 - 45:01
    running these ads, within here. And so
    when we did our analysis of this, this
  • 45:01 - 45:07
    company basically this third party
    advertising company was creating these.
  • 45:07 - 45:13
    They're meant to look like kind of
    grassroots kind of organizations. There
  • 45:13 - 45:17
    were, a lot of them were kind of targeted
    at more conservatively leaning groups, but
  • 45:17 - 45:22
    then they would bombard them with liberal
    messaging, within here. So they would
  • 45:22 - 45:26
    create these fake communities that looked
    more conservative. And then once they
  • 45:26 - 45:30
    attract an audience they would bombard
    them with these liberal kinds of
  • 45:30 - 45:36
    messaging, within here. And so this
    particular company is based in Colorado.
  • 45:36 - 45:41
    It's called MOTIVE AI. Apparently, it's
    hoping to become the Cambridge Analytica
  • 45:41 - 45:49
    of the liberal side. I don't know if
    that's something to aspire to or not. Some
  • 45:49 - 45:52
    other journalists also did some digging,
    within here. There were some journalists
  • 45:52 - 45:57
    from ProPublica that did some digging,
    within here. They found more of this
  • 45:57 - 46:03
    astroturfing by political lobbyist groups
    and things like that. Big oil insurance
  • 46:03 - 46:09
    companies, again when they advertised on
    say Google's platform they would be honest
  • 46:09 - 46:12
    about their disclaimer string, and then
    when they advertised on Facebook's
  • 46:12 - 46:16
    platform they would often kind of
    obfuscate their disclaimer string, to
  • 46:16 - 46:23
    make it more difficult to link them
    together. And so they unmasked a whole
  • 46:23 - 46:28
    bunch of these other kinds of junk media
    operations, as well, that were kind of
  • 46:28 - 46:33
    spreading propaganda, within here. I'm
    picking on Facebook a lot. Again Google
  • 46:33 - 46:38
    does vet the tax I.D. number of these
    people, but you see something like, right,
  • 46:38 - 46:45
    this DIGICO LLC that paid for some ads. So
    you track this down, and this is again one
  • 46:45 - 46:48
    of these third party advertising agencies.
    It's easy to track down because of the tax
  • 46:48 - 46:52
    I.D. number. But it still doesn't actually
    tell you who paid for the ad. It just
  • 46:52 - 46:56
    tells you the third party that, right, it
    presumably was paid on behalf of someone
  • 46:56 - 47:00
    else to run these ads, from here. So this
    is a big problem with these disclaimer
  • 47:00 - 47:04
    strings, that oftentimes they don't
    actually identify the person that's paying
  • 47:04 - 47:11
    for the ad. So to kind of wrap this up,
    within here, after our kind of experiences
  • 47:11 - 47:15
    looking at these transparency archives I
    would say they're fairly adequate to
  • 47:15 - 47:20
    understand good actors. So we could fairly
    well understand how good political
  • 47:20 - 47:24
    advertisers were behaving in Facebook's
    platform. However, right, for the bad
  • 47:24 - 47:29
    advertisers, we probably missed a lot of
    them because they could just simply type
  • 47:29 - 47:33
    in lots of different disclaimer strings
    and easily avoid our analysis, at this
  • 47:33 - 47:39
    point. None of these current archives have
    it just right yet. All of them have
  • 47:39 - 47:44
    issues, right. Facebook isn't providing
    good access to their data. They're not
  • 47:44 - 47:49
    releasing targeting information. Google is
    missing 30% of the content because of third
  • 47:49 - 47:56
    parties using their advertising system.
    They're not releasing spend and impression
  • 47:56 - 48:02
    information based on demographics, within
    there. Twitter just simply hasn't hired
  • 48:02 - 48:08
    someone to enforce the policy of
    transparency, well, within here. And
  • 48:08 - 48:13
    unfortunately our experience throughout
    this process has been that these companies
  • 48:13 - 48:18
    are oftentimes reactive, instead of
    proactive, within here. Which means that,
  • 48:18 - 48:21
    right, we have to continuously put
    pressure on them, in order for them to
  • 48:21 - 48:26
    kind of improve these archives, within
    here. So this is unfortunately kind of the
  • 48:26 - 48:30
    state that we're in, within here. And I'm
    sure, one thing that I really want to give
  • 48:30 - 48:33
    a shoutout, is right there's people at
    these companies that are actually trying
  • 48:33 - 48:39
    to build these transparency archives. And
    I want to give them a lot of credit for
  • 48:39 - 48:44
    taking on this task, that's probably not
    well rewarded within their companies, of
  • 48:44 - 48:47
    building these transparency archives,
    within here. And so my hope is that by
  • 48:47 - 48:53
    applying pressure we can get them more
    support to kind of get more resources and
  • 48:53 - 48:59
    be able to make more transparent, within
    their companies, as well. Because I hope
  • 48:59 - 49:05
    that, right, this puts us in better shape
    to understand the 2018 elections, but 2020
  • 49:05 - 49:10
    is another presidential election and my
    hope is that we'll continue to improve
  • 49:10 - 49:14
    these archives, so that we'd be in a much
    better position to understand both the
  • 49:14 - 49:20
    good and the bad advertisers by 2020, within
    here. However this is going to take
  • 49:20 - 49:25
    probably regulatory pressure, legal
    pressure, pressure by technologists and
  • 49:25 - 49:32
    things like this to improve these
    archives, at this point. So with that,
  • 49:32 - 49:36
    again, I have my collaborators, that
    aren't here on the stage, but they
  • 49:36 - 49:41
    definitely did a lot of the heavy lifting
    to make this happen, within here. And
  • 49:41 - 49:45
    again all of our tools and most of our
    data except for the Facebook data, that's
  • 49:45 - 49:51
    under NDA, is available through our
    GitHub, there. And so with that I will
  • 49:51 - 49:53
    open it up to questions.
  • 49:53 - 50:04
    applause
  • 50:04 - 50:05
    Herald: Thank you so much Damon. I know
  • 50:05 - 50:12
    that there are a few questions among the
    audience. So, microphone 6 please.
  • 50:12 - 50:16
    Question: So [Name] on the IRC is asking
    "Have you looked at links between the
  • 50:16 - 50:22
    advertisers and do they use the same
    images or text for instance?".
  • 50:22 - 50:24
    Answer: This is a really good question.
    This is actually one of the analyses that
  • 50:24 - 50:28
    we're currently doing. So we're starting
    with the text, because that's obviously
  • 50:28 - 50:32
    the easiest. But we're also exploring some
    image clustering algorithms, as well. To
  • 50:32 - 50:37
    cluster the advertisers across platforms
    and also within platform because we're
  • 50:37 - 50:40
    finding a lot where, you know, they create
    multiple shell companies, where they just
  • 50:40 - 50:44
    lie about their disclaimers and so this is
    definitely something that we're focusing
  • 50:44 - 50:49
    on, is better clustering of the
    advertisers. Because like that group
  • 50:49 - 50:53
    MOTIVE AI, even though they created the
    different LLCs they were running the same
  • 50:53 - 50:59
    images and videos across their different
    LLC shell companies.
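One simple way to cluster advertisers by shared ad text, as described in the answer above, is to normalize each ad's text and merge any sponsors that ran the same normalized creative. This is a sketch with assumed inputs, not the project's actual clustering code: the sponsor names and ad texts are invented, and exact matching after normalization stands in for whatever similarity measure is really used.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse punctuation and whitespace so near-identical copies match."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def cluster_advertisers(ads):
    """Group sponsors that ran the same normalized ad text (union-find).

    ads is an iterable of (sponsor, ad_text). Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_sponsor_for_text = {}
    for sponsor, text in ads:
        find(sponsor)  # register the sponsor even if it shares nothing
        key = normalize(text)
        if key in first_sponsor_for_text:
            union(first_sponsor_for_text[key], sponsor)
        else:
            first_sponsor_for_text[key] = sponsor

    clusters = defaultdict(set)
    for sponsor in list(parent):
        clusters[find(sponsor)].add(sponsor)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical sponsors and ad copy.
ADS = [
    ("Shell Co A LLC", "Vote YES on Prop 1!"),
    ("Shell Co B LLC", "vote yes on prop 1"),
    ("Campaign C", "Donate today"),
    ("Shell Co B LLC", "Our rivers matter."),
    ("Shell Co D LLC", "Our rivers matter"),
]

print(cluster_advertisers(ADS))
```

Union-find makes the grouping transitive: A and D never ran the same text, but both share a creative with B, so all three end up in one cluster.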
  • 50:59 - 51:02
    Herald: Great thank you. Please if you
    have any questions, queue up by the
  • 51:02 - 51:08
    microphones. Microphone number 1 please.
    Question: Hi, Oliver Moldenhauer? Thanks a
  • 51:08 - 51:13
    lot for the talk. Definitely one of the
    best I've seen here so far. Two questions.
  • 51:13 - 51:19
    A: Why do those transparency archives
    exist? Was there some law or political
  • 51:19 - 51:25
    process around that? And B: As we are
    nearing the European election next year,
  • 51:25 - 51:31
    what kind of data is available for Europe?
    Answer: Those are both good questions.
  • 51:31 - 51:34
    Again I'm not internal to one of these
    companies, so I can just speculate as to
  • 51:34 - 51:38
    why these transparency archives exist. But
    my guess is, right, that this was
  • 51:38 - 51:44
    reactionary. So Mark Zuckerberg and high
    ranking officials from Twitter and Google
  • 51:44 - 51:50
    were hauled in to testify in the House and
    Senate, and this is them trying to self
  • 51:50 - 51:54
    regulate instead of having regulation
    imposed on them by people. So that again,
  • 51:54 - 51:58
    this goes to the pressure part is that
    there was regulatory pressure put on them,
  • 51:58 - 52:03
    the threat of regulatory pressure and so
    that's what made them do these
  • 52:03 - 52:09
    transparency archives. In terms of what's
    available in Europe. I guess as long as
  • 52:09 - 52:17
    the UK is still in the EU, kind of
    teetering, Facebook has started to make ads
  • 52:17 - 52:23
    transparent in the UK. They also make them
    transparent in Brazil and they're going to
  • 52:23 - 52:26
    make them transparent in India. And I
    think they have plans to make them
  • 52:26 - 52:31
    transparent in other places, in the EU as
    well. However, they haven't done that.
  • 52:31 - 52:36
    However, again this goes back to the
    pressure part. So there's no API for the
  • 52:36 - 52:40
    other countries, there's only an API for
    the US and that might be because we put
  • 52:40 - 52:46
    pressure on them by scraping them and
    publicly releasing their data. And, right,
  • 52:46 - 52:49
    there are no transparency reports for other
    countries, as well, there's only
  • 52:49 - 52:52
    a transparency report for the US. And again
    that might have been because we applied
  • 52:52 - 52:56
    pressure and we were publishing numbers.
    Some of the numbers in terms of spend were
  • 52:56 - 53:00
    very low, because, right, they were just
    giving us ranges. So we might have been
  • 53:00 - 53:04
    making them look bad, when we took the
    bottom range of their spend and they might
  • 53:04 - 53:08
    have wanted to correct that with their own
    transparency archive, as well. So again, a
  • 53:08 - 53:12
    lot of this unfortunately requires
    pressure to get them to improve their
  • 53:12 - 53:16
    transparency efforts.
    Herald: Great thank you. Microphone number
  • 53:16 - 53:21
    two please.
    Question: So you mentioned
  • 53:21 - 53:26
    FiveThirtyEight and their work on the
    donations. Do you think it makes sense
  • 53:26 - 53:33
    to combine the data you gathered with what
    they have to look at election outcomes,
  • 53:33 - 53:38
    like, election results and turnout and
    stuff like that?
  • 53:38 - 53:43
    Answer: Yes. Actually this is the number
    one project on our road map, right now.
  • 53:43 - 53:50
    Is, actually Google has processed the FEC
    information and they've made this
  • 53:50 - 53:56
    information available via their BigQuery
    database. So we've downloaded this, we've
  • 53:56 - 54:02
    manually linked the Facebook advertisers
    and the Google advertisers to the FEC data
  • 54:02 - 54:07
    and now we're doing the regression models,
    specifically focused on the donation ads
  • 54:07 - 54:11
    first. Because those are what are reported
    to the FEC, at this point. So we are
  • 54:11 - 54:15
    essentially trying to understand how
    effective these donation ads are at
  • 54:15 - 54:21
    actually driving donations, within here.
    Herald: Thank you. Microphone number 4
  • 54:21 - 54:25
    please.
    Question: Hi. First of all thank you Mr.
  • 54:25 - 54:30
    McCoy and your team for this very
    interesting research. I was wondering,
  • 54:30 - 54:35
    whether you know if there is any
    follow-up research conducted by political
  • 54:35 - 54:42
    scientists, sociologists etc. analyzing
    the political repercussions of these ad
  • 54:42 - 54:47
    campaigns.
    Answer: Yes, so we're aware of a few
  • 54:47 - 54:50
    efforts. I don't want to out the teams
    that are doing them, in case they don't
  • 54:50 - 54:56
    want to be outed. There's nothing
    that's been published publicly, I believe,
  • 54:56 - 55:00
    on this. But we're definitely trying to.
    That's one of the main goals of kind of
  • 55:00 - 55:04
    our overarching online political
    advertising transparency thing, is to try
  • 55:04 - 55:09
    and get as much data as we can in the
    hands of less technical people in an easy
  • 55:09 - 55:16
    way for them to analyze. And so this is
    basically the primary goal of our project,
  • 55:16 - 55:20
    in here. So we've been working as hard as
    we can to get political scientists up
  • 55:20 - 55:25
    to speed on the data. And this is why it's
    really unfortunate that Facebook has its
  • 55:25 - 55:30
    NDA in place for their particular data,
    because this makes it very difficult for
  • 55:30 - 55:35
    us to share and collaborate on that
    particular data. Which puts pressure on us
  • 55:35 - 55:40
    unfortunately as being the only ones that
    can do some of this analysis right now. So
  • 55:40 - 55:44
    this is why I would love to apply
    enough pressure to Facebook, to get better
  • 55:44 - 55:49
    access to their particular data.
    Herald: Yes. And the question from the
  • 55:49 - 55:54
    Internet please.
    Signal Angel: So Nomad is asking "Why are
  • 55:54 - 55:59
    those advertisements considered political or
    election interference in the USA. Can't you just
  • 55:59 - 56:04
    see, that someone paid money to display
    that content and conclude its purpose is
  • 56:04 - 56:12
    to promote an agenda or manipulate them?".
    Answer: This is a good question. Right, a
  • 56:12 - 56:15
    lot of this goes to the tactics that
    they're using here. So again they're
  • 56:15 - 56:19
    creating these communities, that they're
    making look like they're grassroots
  • 56:19 - 56:24
    communities and then they're kind of
    sucking people in with these ads, that up
  • 56:24 - 56:29
    until recently had no disclaimer string
    on them. So you had no idea who paid for
  • 56:29 - 56:34
    them. So they appear to be paid for by
    kind of these grassroots organizations. So
  • 56:34 - 56:39
    you felt like you were, kind of, part of a
    grassroots movement, in joining these kinds
  • 56:39 - 56:43
    of communities. I think this is the really
    scary, kind of subtle thing. And you
  • 56:43 - 56:47
    might not realize why you're being
    targeted for these particular ads or who
  • 56:47 - 56:50
    was behind these particular ads. So, I
    think it was really easy for people to
  • 56:50 - 56:55
    kind of get unwittingly, kind of, duped
    into joining what looked like these
  • 56:55 - 57:00
    grassroots campaigns. So that's why I
    think improving these disclaimer strings
  • 57:00 - 57:03
    and showing who is really behind these
    communities and these advertisements is
  • 57:03 - 57:09
    really important, to dispel this notion of
    these fake grassroots communities, that
  • 57:09 - 57:13
    are luring people in within here. So I
    think that's one of the big things that
  • 57:13 - 57:18
    can be gained by these transparency
    archives. But it requires improvement of
  • 57:18 - 57:23
    the transparency archives, to do that.
    Herald: Microphone number 3 please.
  • 57:23 - 57:28
    Question: Yes. So I'm curious about the
    efficacy of some of the advertisements
  • 57:28 - 57:37
    that are on Facebook and Twitter. And I'm
    wondering is any group like the ProPublica
  • 57:37 - 57:45
    web extension checking the engagement
    rate? Like the number of comments, the
  • 57:45 - 57:53
    number of views and the number of shares,
    to like kind of get an estimate of, OK
  • 57:53 - 57:58
    this big grassroots community is building
    up a number of followers and these
  • 57:58 - 58:06
    followers population sizes and whatnot.
    Answer: Yeah, this is again a really good
  • 58:06 - 58:11
    question. This is something that we are, I
    would certainly encourage other people to
  • 58:11 - 58:14
    potentially do as well. So the problem is
    that a lot of that information isn't
  • 58:14 - 58:18
    exposed by the transparency archives. This
    is more of what they call kind of the
  • 58:18 - 58:23
    organic information, the non paid for
    information, within here. And so this is
  • 58:23 - 58:28
    stuff that none of the platforms are
    releasing. And so it requires kind of a
  • 58:28 - 58:33
    scraping operation, essentially, to gather
    this information and collect it. And it's
  • 58:33 - 58:36
    something that we're definitely thinking
    about how to efficiently do, is how to
  • 58:36 - 58:41
    efficiently scrape and collect this
    information. Because this is very hard
  • 58:41 - 58:43
    because, right, you go against the
    anti-scraping teams of these companies, that
  • 58:43 - 58:47
    are well resourced. And this requires
    accounts, and these accounts are going to
  • 58:47 - 58:51
    be shut down and detected. So this is
    something that we're trying to pilot to
  • 58:51 - 58:56
    understand. Our other idea of how to do
    this potentially is try and crowdsource
  • 58:56 - 59:00
    this information. This is similar to how
    ProPublica crowdsourced it for the browser
  • 59:00 - 59:04
    extension information. We could
    potentially crowdsource it, where you
  • 59:04 - 59:08
    know, when people interact with these
    communities or these ads the plug-in could
  • 59:08 - 59:12
    potentially crowdsource that information
    back to us. And then we would have to
  • 59:12 - 59:18
    figure out some strategy to sanitize that
    information in some way. Because at that
  • 59:18 - 59:21
    point you might have some sensitive
    information they are collecting. This is
  • 59:21 - 59:26
    something that we're thinking about. We're
    cautious, I think, rightly so because this
  • 59:26 - 59:31
    can start stepping on, again, more
    sensitive information that's available
  • 59:31 - 59:34
    from within here. But I think it's
    definitely key to understanding the
  • 59:34 - 59:37
    effectiveness of these ads. Something that
    we're going to have to do or we're going
  • 59:37 - 59:42
    to have to convince Facebook somehow to do
    on our behalf in order to really
  • 59:42 - 59:46
    understand the effectiveness of these ads.
    Herald: Thank you. Last question for
  • 59:46 - 59:50
    microphone number 1.
    Question: All right. At the beginning of
  • 59:50 - 59:54
    your talk you explained how Russia
    influenced the elections. I'm curious
  • 59:54 - 60:00
    about the attribution. Is there possibly
    any doubts at any instance that you
  • 60:00 - 60:06
    presented that it was not Russia or maybe
    some other country, China or Iran? How do
  • 60:06 - 60:10
    you know, and did you check the facts?
    Answer: I mean, that's a good question.
  • 60:10 - 60:14
    Unfortunately, right, the national
    security agencies don't release the
  • 60:14 - 60:21
    sources of their information. There's
    another investigation done by the
  • 60:21 - 60:27
    Department of Justice by Robert Mueller,
    that did release some more information
  • 60:27 - 60:32
    about this, within here. I've looked at
    that information and it looks, you know,
  • 60:32 - 60:36
    right, you can never 100% unequivocally
    state that it was Russia. It could have
  • 60:36 - 60:41
    been a false flag operation. But I think
    that pretty much the overwhelming
  • 60:41 - 60:45
    information that everyone has found when
    they've investigated this has pointed at
  • 60:45 - 60:53
    Russia and the organizations that were
    prosecuted by Mueller.
  • 60:53 - 60:57
    Herald: Damon McCoy, thank you very much.
    Please give him a great round of applause.
  • 60:57 - 61:00
    Applause
  • 61:00 - 61:05
    35c3 postroll music
  • 61:05 - 61:22
    subtitles created by c3subtitles.de
    in the year 2019. Join, and help us!
Title:
35C3 - Explaining Online US Political Advertising
Video Language:
English
Duration:
01:01:22
