
Fake videos of real people -- and how to spot them

  • 0:01 - 0:02
    Look at these images.
  • 0:02 - 0:05
    Now, tell me which Obama here is real.
  • 0:05 - 0:08
    (Video) Barack Obama: To help families
    refinance their homes,
  • 0:08 - 0:10
    to invest in things
    like high-tech manufacturing,
  • 0:10 - 0:11
    clean energy,
  • 0:11 - 0:14
    and the infrastructure
    that creates good new jobs.
  • 0:15 - 0:16
    Supasorn Suwajanakorn: Anyone?
  • 0:16 - 0:18
    The answer is none of them.
  • 0:18 - 0:19
    (Laughter)
  • 0:19 - 0:21
    None of these is actually real.
  • 0:21 - 0:23
    So let me tell you how we got here.
  • 0:24 - 0:26
    My inspiration for this work
  • 0:26 - 0:31
    was a project meant to preserve our last
    chance for learning about the Holocaust
  • 0:31 - 0:33
    from the survivors.
  • 0:33 - 0:35
    It's called New Dimensions in Testimony,
  • 0:35 - 0:39
    and it allows you to have
    interactive conversations
  • 0:39 - 0:41
    with a hologram
    of a real Holocaust survivor.
  • 0:42 - 0:44
    How did you survive the Holocaust?
  • 0:44 - 0:45
    How did I survive?
  • 0:46 - 0:48
    I survived,
  • 0:48 - 0:50
    I believe,
  • 0:50 - 0:53
    because providence watched over me.
  • 0:54 - 0:57
    SS: Turns out these answers
    were prerecorded in a studio.
  • 0:57 - 1:00
    Yet the effect is astounding.
  • 1:00 - 1:03
    You feel so connected to his story
    and to him as a person.
  • 1:04 - 1:07
    I think there's something special
    about human interaction
  • 1:07 - 1:10
    that makes it much more profound
  • 1:10 - 1:12
    and personal
  • 1:12 - 1:16
    than what books or lectures
    or movies could ever teach us.
  • 1:16 - 1:19
    So I saw this and began to wonder,
  • 1:19 - 1:22
    can we create a model
    like this for anyone?
  • 1:22 - 1:25
    A model that looks, talks,
    and acts just like them?
  • 1:26 - 1:28
    So I set out to see if this could be done,
  • 1:28 - 1:30
    and eventually came up with a new solution
  • 1:30 - 1:33
    that can build a model of a person
    using nothing but these:
  • 1:34 - 1:36
    existing photos and videos of a person.
  • 1:37 - 1:39
    If you can leverage
    this kind of passive information,
  • 1:39 - 1:41
    just photos and video that are out there,
  • 1:41 - 1:43
    that's the key to scaling to anyone.
  • 1:44 - 1:46
    By the way, here's Richard Feynman,
  • 1:46 - 1:49
    who in addition to being
a Nobel Prize winner in physics
  • 1:49 - 1:52
    was also known as a legendary teacher.
  • 1:53 - 1:55
    Wouldn't it be great
    if we could bring him back
  • 1:55 - 1:59
    to give his lectures
    and inspire millions of kids,
  • 1:59 - 2:02
    perhaps not just in English
    but in any language?
  • 2:02 - 2:07
    Or if you could ask our grandparents
    for advice and hear those comforting words
  • 2:07 - 2:09
    even if they're no longer with us?
  • 2:10 - 2:13
    Or maybe using this tool,
book authors, alive or not,
  • 2:13 - 2:16
    could read aloud all of their books
    for anyone interested.
  • 2:17 - 2:20
    The creative possibilities
    here are endless,
  • 2:20 - 2:21
    and to me, that's very exciting.
  • 2:23 - 2:25
    And here's how it's working so far.
  • 2:25 - 2:26
    First, we introduce a new technique
  • 2:26 - 2:31
that can reconstruct a highly detailed
3D face model from any image
  • 2:31 - 2:33
    without ever 3D-scanning the person.
  • 2:34 - 2:37
    And here's the same output model
    from different views.
  • 2:38 - 2:39
    This also works on videos,
  • 2:39 - 2:42
    by running the same algorithm
    on each video frame
  • 2:42 - 2:45
    and generating a moving 3D model.
  • 2:46 - 2:48
    And here's the same
    output model from different angles.
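The algorithm isn't spelled out on stage, but the per-frame idea can be pictured as fitting a deformable 3D face model to 2D landmarks detected in each frame. The sketch below is only an illustration of that kind of fitting, with a random stand-in shape basis and fake detections, not the authors' actual method.

```python
# Minimal sketch of per-frame 3D face fitting: solve for shape coefficients
# so the projected model landmarks match detected 2D landmarks. Illustrative
# only; the mean shape, basis, and detections are random stand-ins.
import numpy as np

N_LANDMARKS, N_COEFFS = 68, 40
rng = np.random.default_rng(0)

mean_shape = rng.normal(size=(N_LANDMARKS, 3))       # average 3D face landmarks
basis = rng.normal(size=(N_COEFFS, N_LANDMARKS, 3))  # shape deformation basis

def fit_frame(landmarks_2d, reg=1e-2):
    """Fit coefficients c so the x,y of (mean_shape + sum_k c_k basis_k)
    matches the 2D detections under orthographic projection (drop z)."""
    A = basis[:, :, :2].reshape(N_COEFFS, -1).T      # (2*N_LANDMARKS, N_COEFFS)
    b = (landmarks_2d - mean_shape[:, :2]).ravel()
    # Ridge-regularized least squares keeps the fit near the average model.
    c = np.linalg.solve(A.T @ A + reg * np.eye(N_COEFFS), A.T @ b)
    return mean_shape + np.tensordot(c, basis, axes=1)

# "Running the same algorithm on each video frame" yields a moving 3D model:
detections = rng.normal(size=(100, N_LANDMARKS, 2))  # stand-in per-frame landmarks
moving_model = np.stack([fit_frame(f) for f in detections])
print(moving_model.shape)  # (100, 68, 3): one fitted landmark set per frame
```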
  • 2:50 - 2:52
    It turns out this problem
    is very challenging,
  • 2:52 - 2:55
    but the key trick
    is that we are going to analyze
  • 2:55 - 2:58
a large photo collection
    of the person beforehand.
  • 2:59 - 3:01
    For George W. Bush,
    we can just search on Google,
  • 3:02 - 3:05
    and from that, we are able
    to build an average model,
  • 3:05 - 3:08
and iteratively refine the model
    to recover the expression
  • 3:08 - 3:10
    in fine details,
    like creases and wrinkles.
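A rough picture of the "average model" step: align the faces from the photo collection and average them, then pull the target photo's fine detail back in. The toy sketch below assumes pre-aligned face crops and invents a crude refinement rule; the real pipeline is considerably more involved.

```python
# Toy sketch of building an average face from a large photo collection and
# then recovering per-photo detail. Assumes crops are already aligned to a
# common template; arrays are random stand-ins for real photos.
import numpy as np

rng = np.random.default_rng(1)
photos = rng.uniform(0, 1, size=(500, 128, 128))  # stand-in aligned face crops

average_face = photos.mean(axis=0)  # the coarse average model

def refine(average, photo, k=4.0):
    """Crude refinement: keep the average where the photo barely deviates,
    and trust strong deviations as expression detail (creases, wrinkles)."""
    detail = photo - average
    weight = np.clip(k * np.abs(detail), 0.0, 1.0)
    return average + weight * detail

refined = refine(average_face, photos[0])
print(average_face.shape, refined.shape)  # both (128, 128)
```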
  • 3:11 - 3:13
    What's fascinating about this
  • 3:13 - 3:16
    is that the photo collection
    can come from your typical photos.
  • 3:16 - 3:19
    It doesn't really matter
    what expression you're making
  • 3:19 - 3:21
    or where you took those photos.
  • 3:21 - 3:23
    What matters is
    that there are a lot of them.
  • 3:23 - 3:25
    And we are still missing color here,
  • 3:25 - 3:27
    so next, we develop
    a new blending technique
  • 3:27 - 3:30
    that improves upon
    a single averaging method
  • 3:30 - 3:33
    and produces sharp
    facial textures and colors.
  • 3:34 - 3:37
    And this can be done for any expression.
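One way to see why a dedicated blending step matters: a plain per-pixel mean blurs texture, while a robust blend such as a per-pixel median resists outliers (misalignment, lighting) and keeps edges sharper. The sketch below shows only that general contrast, not the authors' actual technique.

```python
# Contrast two ways of blending many aligned face textures into one:
# a mean (tends to blur) versus a per-pixel median (more outlier-robust).
# Textures are random stand-ins; this is not the paper's blending method.
import numpy as np

rng = np.random.default_rng(2)
aligned = rng.uniform(0, 1, size=(200, 64, 64, 3))  # stand-in aligned textures

mean_texture = aligned.mean(axis=0)          # simple averaging
median_texture = np.median(aligned, axis=0)  # robust per-pixel blend

def sharpness(img):
    """Crude sharpness proxy: total gradient magnitude of the luma."""
    gy, gx = np.gradient(img.mean(axis=-1))
    return float(np.abs(gx).sum() + np.abs(gy).sum())

print(f"mean blend:   {sharpness(mean_texture):.1f}")
print(f"median blend: {sharpness(median_texture):.1f}")
```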
  • 3:37 - 3:40
Now we have a controllable
model of a person,
  • 3:40 - 3:44
    and the way it's controlled now
    is by a sequence of static photos.
  • 3:44 - 3:47
    Notice how the wrinkles come and go,
    depending on the expression.
  • 3:48 - 3:51
    We can also use a video
    to drive the model.
  • 3:51 - 3:53
    (Video) Daniel Craig: Right, but somehow,
  • 3:53 - 3:57
    we've managed to attract
    some more amazing people.
  • 3:58 - 4:00
    SS: And here's another fun demo.
  • 4:00 - 4:02
    So what you see here
    are controllable models
  • 4:02 - 4:04
    of people I built
    from their internet photos.
  • 4:04 - 4:07
    Now, if you transfer
    the motion from the input video,
  • 4:07 - 4:10
    we can actually drive the entire party.
  • 4:10 - 4:12
    George W. Bush:
    It's a difficult bill to pass,
  • 4:12 - 4:14
    because there's a lot of moving parts,
  • 4:14 - 4:19
    and the legislative process can be ugly.
  • 4:19 - 4:21
    (Applause)
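Driving "the entire party" from one input video amounts to retargeting: expression parameters tracked from the source clip are reapplied to each person's own identity model. A minimal sketch of that idea, with random stand-in shapes and a shared expression basis (both assumptions, not the talk's models):

```python
# Minimal sketch of motion transfer: per-frame expression coefficients from a
# driving video animate every target person's neutral 3D face at once.
import numpy as np

N_LANDMARKS, N_EXPR = 68, 20
rng = np.random.default_rng(3)

# Per-person identity (neutral face) plus one shared expression basis.
people = {name: rng.normal(size=(N_LANDMARKS, 3))
          for name in ("bush", "obama", "craig")}
expr_basis = rng.normal(size=(N_EXPR, N_LANDMARKS, 3))

def drive(neutral_face, coeffs):
    """Apply one frame's expression coefficients to a neutral face."""
    return neutral_face + np.tensordot(coeffs, expr_basis, axes=1)

driving_coeffs = rng.normal(size=(100, N_EXPR))  # tracked from the input video

# The same motion drives every model in the "party" simultaneously.
animated = {name: np.stack([drive(face, c) for c in driving_coeffs])
            for name, face in people.items()}
print({name: seq.shape for name, seq in animated.items()})
```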
  • 4:21 - 4:23
    SS: So coming back a little bit,
  • 4:23 - 4:26
    our ultimate goal, rather,
    is to capture their mannerisms
  • 4:26 - 4:29
    or the unique way each
    of these people talks and smiles.
  • 4:29 - 4:31
    So to do that, can we
    actually teach the computer
  • 4:31 - 4:34
    to imitate the way someone talks
  • 4:34 - 4:36
    by only showing it
    video footage of the person?
  • 4:37 - 4:39
    And what I did exactly was,
    I let a computer watch
  • 4:39 - 4:43
    14 hours of pure Barack Obama
    giving addresses.
  • 4:43 - 4:47
    And here's what we can produce
    given only his audio.
  • 4:47 - 4:49
    (Video) BO: The results are clear.
  • 4:49 - 4:53
America's businesses have created
    14.5 million new jobs
  • 4:53 - 4:56
    over 75 straight months.
  • 4:56 - 4:59
    SS: So what's being synthesized here
    is only the mouth region,
  • 4:59 - 5:00
    and here's how we do it.
  • 5:01 - 5:03
    Our pipeline uses a neural network
  • 5:03 - 5:06
to convert an input audio
    into these mouth points.
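The network itself isn't detailed on stage. As a minimal sketch of such an audio-to-mouth mapping, a recurrent model can turn a sequence of per-frame audio features into 2D mouth landmarks; the feature and landmark counts below are assumptions, not the published system's values.

```python
# Hedged sketch of an audio-to-mouth network: an LSTM maps per-frame audio
# features to 2D mouth landmark positions. Dimensions are illustrative guesses.
import torch
import torch.nn as nn

N_AUDIO_FEATS = 28   # e.g., MFCC-like features per audio frame (assumed)
N_MOUTH_POINTS = 18  # sparse 2D mouth landmarks (assumed)

class AudioToMouth(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(N_AUDIO_FEATS, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_MOUTH_POINTS * 2)

    def forward(self, audio_feats):         # (batch, time, N_AUDIO_FEATS)
        h, _ = self.rnn(audio_feats)
        pts = self.head(h)                  # (batch, time, N_MOUTH_POINTS * 2)
        return pts.view(*pts.shape[:2], N_MOUTH_POINTS, 2)

model = AudioToMouth()
audio = torch.randn(1, 250, N_AUDIO_FEATS)  # stand-in feature sequence
mouth_points = model(audio)
print(mouth_points.shape)  # torch.Size([1, 250, 18, 2]): mouth shape per frame
```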
  • 5:07 - 5:11
    (Video) BO: We get it through our job
    or through Medicare or Medicaid.
  • 5:11 - 5:14
    SS: Then we synthesize the texture,
    enhance details and teeth,
  • 5:14 - 5:17
    and blend it into the head
    and background from a source video.
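That compositing step can be pictured as alpha-blending the synthesized mouth patch into each source frame with a soft mask so no seam shows. A toy sketch with stand-in arrays follows; the patch placement and mask width are made up, and the real system's re-timing and teeth enhancement are omitted.

```python
# Toy sketch of the final composite: blend a synthesized mouth patch into a
# source-video frame using an alpha mask that fades out toward the border.
import numpy as np

rng = np.random.default_rng(4)
frame = rng.uniform(0, 1, size=(256, 256, 3))      # source video frame
mouth_patch = rng.uniform(0, 1, size=(64, 96, 3))  # synthesized mouth texture
y, x = 160, 80                                     # patch placement (assumed)

h, w = mouth_patch.shape[:2]
yy, xx = np.mgrid[0:h, 0:w]
# Distance to the nearest patch edge, scaled into a soft 0..1 alpha ramp.
edge = np.minimum.reduce([yy, h - 1 - yy, xx, w - 1 - xx]) / 8.0
alpha = np.clip(edge, 0.0, 1.0)[..., None]

region = frame[y:y + h, x:x + w]
frame[y:y + h, x:x + w] = alpha * mouth_patch + (1 - alpha) * region
print(frame.shape)  # (256, 256, 3): frame with the mouth blended in
```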
  • 5:17 - 5:19
    (Video) BO: Women can get free checkups,
  • 5:19 - 5:22
    and you can't get charged more
    just for being a woman.
  • 5:23 - 5:26
    Young people can stay
    on a parent's plan until they turn 26.
  • 5:27 - 5:30
    SS: I think these results
    seem very realistic and intriguing,
  • 5:30 - 5:33
    but at the same time
    frightening, even to me.
  • 5:33 - 5:37
    Our goal was to build an accurate model
    of a person, not to misrepresent them.
  • 5:38 - 5:41
    But one thing that concerns me
    is its potential for misuse.
  • 5:42 - 5:45
    People have been thinking
    about this problem for a long time,
  • 5:45 - 5:47
    since the days when Photoshop
    first hit the market.
  • 5:48 - 5:52
    As a researcher, I'm also working
    on countermeasure technology,
  • 5:52 - 5:55
    and I'm part of an ongoing
    effort at AI Foundation,
  • 5:55 - 5:58
    which uses a combination
    of machine learning and human moderators
  • 5:58 - 6:00
    to detect fake images and videos,
  • 6:00 - 6:02
    fighting against my own work.
  • 6:03 - 6:06
    And one of the tools we plan to release
    is called Reality Defender,
  • 6:06 - 6:10
    which is a web-browser plug-in
    that can flag potentially fake content
  • 6:10 - 6:12
    automatically, right in the browser.
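Reality Defender's internals aren't described in the talk. As a generic illustration of the machine-learning half of such a detector, here is a minimal real-vs-fake image classifier; the architecture and names are assumptions, and the human-moderation half is deliberately left out.

```python
# Generic sketch of a fake-image scorer: a tiny CNN emits a probability that
# an image is synthesized. Illustrative only; not Reality Defender's model.
import torch
import torch.nn as nn

class FakeImageDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classify = nn.Linear(32, 1)  # one logit; sigmoid gives P(fake)

    def forward(self, images):                   # (batch, 3, H, W)
        h = self.features(images).flatten(1)     # (batch, 32)
        return torch.sigmoid(self.classify(h))   # (batch, 1) fake probability

detector = FakeImageDetector()
score = detector(torch.randn(1, 3, 224, 224))
print(f"potentially fake: {score.item():.2f}")  # a plug-in would threshold this
```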
  • 6:13 - 6:17
    (Applause)
  • 6:17 - 6:18
    Despite all this, though,
  • 6:18 - 6:20
    fake videos could do a lot of damage,
  • 6:20 - 6:23
    even before anyone has a chance to verify,
  • 6:23 - 6:26
    so it's very important
    that we make everyone aware
  • 6:26 - 6:28
    of what's currently possible
  • 6:28 - 6:32
    so we can have the right assumption
    and be critical about what we see.
  • 6:32 - 6:37
    There's still a long way to go before
    we can fully model individual people
  • 6:37 - 6:40
    and before we can ensure
    the safety of this technology,
  • 6:41 - 6:43
    but I'm excited and hopeful,
  • 6:43 - 6:46
    because if we use it right and carefully,
  • 6:46 - 6:51
    this tool can allow any individual's
    positive impact on the world
  • 6:51 - 6:53
    to be massively scaled
  • 6:53 - 6:56
    and really help shape our future
    the way we want it to be.
  • 6:56 - 6:57
    Thank you.
  • 6:57 - 7:02
    (Applause)
Title:
Fake videos of real people -- and how to spot them
Speaker:
Supasorn Suwajanakorn
Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
07:15