
Fake videos of real people -- and how to spot them

  • 0:01 - 0:02
    Look at these images.
  • 0:02 - 0:05
    Now, tell me which Obama here is real.
  • 0:05 - 0:07
    (Video) Barack Obama: To help families
    refinance their homes,
  • 0:07 - 0:10
    to invest in things
    like high-tech manufacturing,
  • 0:10 - 0:12
    clean energy,
  • 0:12 - 0:15
    and the infrastructure
    that creates good new jobs.
  • 0:15 - 0:16
    Supasorn Suwajanakorn: Anyone?
  • 0:16 - 0:18
    The answer is none of them.
  • 0:18 - 0:19
    (Laughter)
  • 0:19 - 0:21
    None of these is actually real.
  • 0:21 - 0:24
    So let me tell you how we got here.
  • 0:24 - 0:28
    My inspiration for this work
    was a project meant to preserve
  • 0:28 - 0:33
    our last chance for learning
    about the Holocaust from the survivors.
  • 0:33 - 0:35
    It's called New Dimensions in Testimony,
  • 0:35 - 0:38
    and it allows you to have
    interactive conversations
  • 0:38 - 0:42
    with a hologram
    of a real Holocaust survivor.
  • 0:42 - 0:44
    (Video) Question: How did you
    survive the Holocaust?
  • 0:44 - 0:46
    (Video) Answer: How did I survive?
  • 0:46 - 0:48
    I survived,
  • 0:48 - 0:50
    I believe,
  • 0:50 - 0:54
    because providence watched over me.
  • 0:54 - 0:57
    SS: Turns out these answers
    were pre-recorded in a studio.
  • 0:57 - 0:59
    Yet the effect is astounding.
  • 0:59 - 1:04
    You feel so connected to his story
    and to him as a person.
  • 1:04 - 1:07
    I think there is something special
    about human interaction
  • 1:07 - 1:10
    that makes it much more profound
  • 1:10 - 1:11
    and personal
  • 1:11 - 1:16
    than what books or lectures
    or movies could ever teach us.
  • 1:16 - 1:18
    So I saw this and began to wonder,
  • 1:18 - 1:22
    can we create a model
    like this for anyone?
  • 1:22 - 1:25
    A model that looks, talks,
    and acts just like them?
  • 1:25 - 1:28
    So I set out to see if this could be done,
  • 1:28 - 1:30
    and eventually came up
    with a new solution
  • 1:30 - 1:32
    that can build a model of a person
  • 1:32 - 1:33
    using nothing but these:
  • 1:34 - 1:37
    existing photos and videos of a person.
  • 1:37 - 1:39
    If you can leverage this kind
    of passive information,
  • 1:39 - 1:42
    just photos and video that are out there,
  • 1:42 - 1:44
    that's the key to scaling to anyone.
  • 1:44 - 1:46
    By the way, here's Richard Feynman,
  • 1:46 - 1:49
    who in addition to being
    a Nobel Prize winner in physics
  • 1:49 - 1:52
    was also known as a legendary teacher.
  • 1:52 - 1:55
    Wouldn't it be great
    if we could bring him back
  • 1:55 - 1:59
    to give his lectures
    and inspire millions of kids,
  • 1:59 - 2:03
    perhaps not just in English
    but in any language?
  • 2:03 - 2:07
    Or if you could ask our grandparents
    for advice and hear those comforting words
  • 2:07 - 2:10
    even if they are no longer with us?
  • 2:10 - 2:13
    Or maybe using this tool,
    book authors, alive or not,
  • 2:13 - 2:18
    could read aloud all of their books
    for anyone interested.
  • 2:18 - 2:20
    The creative possibilities
    here are endless,
  • 2:20 - 2:23
    and to me that's very exciting.
  • 2:23 - 2:25
    And here's how it's working so far.
  • 2:25 - 2:27
    First, we introduce a new technique
  • 2:27 - 2:30
    that can reconstruct
    a highly detailed 3D face model
  • 2:30 - 2:31
    from any image
  • 2:31 - 2:34
    without ever 3D-scanning the person.
  • 2:34 - 2:37
    And here's the same output model
    from different views.
  • 2:37 - 2:40
    This also works on videos
  • 2:40 - 2:43
    by running the same algorithm
    on each video frame
  • 2:43 - 2:45
    and generating a moving 3D model.
  • 2:45 - 2:50
    And here's the same
    output model from different angles.
  • 2:50 - 2:53
    It turns out, this problem
    is very challenging,
  • 2:53 - 2:57
    but the key trick is that we are going
    to analyze a large photo collection
  • 2:57 - 2:59
    of the person beforehand.
  • 2:59 - 3:03
    For George W. Bush,
    we can just search on Google,
  • 3:03 - 3:05
    and from that, we are able
    to build an average model,
  • 3:05 - 3:08
    an iteratively refined model
    to recover the expression
  • 3:08 - 3:11
    in fine details like creases and wrinkles.
  • 3:12 - 3:14
    What's fascinating about this
    is that the photo collection
  • 3:14 - 3:16
    can come from your typical photos.
  • 3:16 - 3:19
    It doesn't really matter
    what expression you're making
  • 3:19 - 3:21
    or where you took those photos.
  • 3:21 - 3:23
    What matters is that
    there are a lot of them.
  • 3:23 - 3:25
    And we are still missing color here,
  • 3:25 - 3:27
    so next we develop
    a new blending technique
  • 3:27 - 3:30
    that improves upon
    a single averaging method
  • 3:30 - 3:34
    and produces sharp
    facial textures and colors.
  • 3:34 - 3:37
    And this can be done for any expression.
  • 3:37 - 3:40
    Now we have control
    of a model of a person,
  • 3:40 - 3:44
    and the way it's controlled now
    is by a sequence of static photos.
  • 3:44 - 3:48
    Notice how the wrinkles come and go
    depending on the expression.
  • 3:48 - 3:51
    We can also use a video
    to drive the model.
  • 3:51 - 3:55
    (Video) Daniel Craig: Right, but somehow
    we've managed to attract
  • 3:55 - 3:58
    more amazing people.
  • 3:58 - 4:00
    SS: And here's another fun demo.
  • 4:00 - 4:02
    So what you see here
    are controllable models
  • 4:02 - 4:05
    of people I built
    from their internet photos.
  • 4:05 - 4:07
    Now, if you transfer the motion
    from the input video,
  • 4:07 - 4:10
    we can actually drive the entire party.
  • 4:10 - 4:12
    (Video) George W. Bush:
    It's a very difficult bill to pass,
  • 4:12 - 4:14
    because there's a lot of moving parts,
  • 4:14 - 4:18
    and the legislative process can be ugly.
  • 4:19 - 4:21
    (Applause)
  • 4:21 - 4:23
    SS: So coming back a little bit,
  • 4:23 - 4:26
    our ultimate goal, rather,
    is to capture their mannerisms
  • 4:26 - 4:29
    or the unique way each
    of these people talks and smiles.
  • 4:29 - 4:32
    So to do that, can we
    actually teach the computer
  • 4:32 - 4:34
    to imitate the way someone talks
  • 4:34 - 4:37
    by only showing it
    video footage of the person?
  • 4:37 - 4:41
    And what I did exactly was,
    I let a computer watch
  • 4:41 - 4:43
    14 hours of pure Barack Obama
    giving addresses.
  • 4:43 - 4:47
    And here's what we can produce
    given only his audio.
  • 4:47 - 4:49
    (Video) BO: The results are clear.
  • 4:49 - 4:53
    America's businesses have created
    14.5 million new jobs
  • 4:53 - 4:56
    over 75 straight months.
  • 4:56 - 4:59
    SS: So what's being synthesized here
    is only the mouth region,
  • 4:59 - 5:01
    and here's how we do it.
  • 5:01 - 5:03
    Our pipeline uses a neural network
  • 5:03 - 5:07
    to convert an input audio
    into these mouth points.
  • 5:07 - 5:11
    (Video) BO: We get it through our job
    or through Medicare or Medicaid.
  • 5:11 - 5:14
    SS: Then we synthesize the texture,
    enhance details and teeth,
  • 5:14 - 5:18
    and blend it into the head
    and background from a source video.
  • 5:18 - 5:20
    (Video) BO: Women can get free checkups,
  • 5:20 - 5:23
    and you can't get charged more
    just for being a woman.
  • 5:23 - 5:27
    Young people can stay
    on a parent's plan until they turn 26.
  • 5:27 - 5:30
    SS: I think these results
    seem very realistic and intriguing,
  • 5:30 - 5:32
    but at the same time frightening,
  • 5:32 - 5:34
    even to me.
  • 5:34 - 5:38
    Our goal was to build an accurate model
    of a person, not to misrepresent them.
  • 5:38 - 5:42
    But one thing that concerns me
    is its potential for misuse.
  • 5:42 - 5:46
    People have been thinking
    about this problem for a long time,
  • 5:46 - 5:48
    since the days when Photoshop
    first hit the market.
  • 5:48 - 5:52
    As a researcher, I'm also working
    on countermeasure technology,
  • 5:52 - 5:55
    and I'm part of an ongoing
    effort at AI Foundation
  • 5:55 - 5:58
    which uses a combination
    of machine learning and human moderators
  • 5:58 - 6:01
    to detect fake images and videos,
  • 6:01 - 6:03
    fighting against my own work.
  • 6:03 - 6:06
    And one of the tools we plan to release
    is called Reality Defender,
  • 6:06 - 6:10
    which is a web browser plugin
    that can flag potentially fake content
  • 6:10 - 6:12
    automatically right in the browser.
  • 6:12 - 6:15
    (Applause)
  • 6:17 - 6:18
    Despite all this, though,
  • 6:18 - 6:20
    fake videos could do a lot of damage,
  • 6:20 - 6:23
    even before anyone has a chance to verify,
  • 6:23 - 6:26
    so it's very important
    that we make everyone aware
  • 6:26 - 6:28
    of what's currently possible
  • 6:28 - 6:32
    so we can have the right assumption
    and be critical about what we see.
  • 6:32 - 6:37
    There's still a long way to go before
    we can fully model individual people
  • 6:37 - 6:41
    and before we can ensure
    the safety of this technology,
  • 6:41 - 6:43
    but I'm excited and hopeful
  • 6:43 - 6:46
    because if we use it right and carefully,
  • 6:46 - 6:50
    this tool can allow any individual's
    positive impact on the world
  • 6:50 - 6:52
    to be massively scaled
  • 6:52 - 6:56
    and really help shape our future
    the way we want it to be.
  • 6:56 - 6:57
    Thank you.
  • 6:57 - 7:02
    (Applause)
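The "average model" step the talk describes, building a mean face from many photos of a person and then recovering per-photo expression detail, can be caricatured in a few lines. Everything below (the toy two-point "face", the landmark representation) is an illustrative assumption, not the speaker's actual system:

```python
# Hypothetical sketch: average aligned face landmarks from many photos
# of one person to get a person-specific mean shape, then measure each
# photo's deviation from it -- the part that carries expression detail.

def mean_shape(shapes):
    """Average aligned landmark sets: shapes[i] is [(x, y), ...]."""
    n = len(shapes)
    k = len(shapes[0])
    return [(sum(s[j][0] for s in shapes) / n,
             sum(s[j][1] for s in shapes) / n) for j in range(k)]

def deviation(shape, mean):
    """Per-landmark offset of one photo from the mean shape, i.e. the
    expression-specific detail (creases, wrinkles in the full system)."""
    return [(x - mx, y - my) for (x, y), (mx, my) in zip(shape, mean)]

photos = [
    [(0.0, 0.0), (1.0, 0.0)],    # neutral expression (toy 2-point "face")
    [(0.0, 0.2), (1.0, 0.2)],    # smiling
    [(0.0, -0.2), (1.0, -0.2)],  # frowning
]
avg = mean_shape(photos)          # -> [(0.0, 0.0), (1.0, 0.0)]
delta = deviation(photos[1], avg)
```

The point the talk makes, that it only matters that there are a lot of photos, shows up here as the averaging: individual expressions cancel out, leaving a stable base model.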
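The lip-sync pipeline the talk describes, a neural network converting audio frames into mouth points that are then textured and blended into a source video, can likewise be sketched. The feature count, landmark count, single linear layer, and smoothing constant below are all hypothetical stand-ins for the real trained network:

```python
# Hypothetical sketch: map per-frame audio features to 2D mouth landmark
# points, then smooth the trajectory across frames so the synthesized
# mouth moves without jitter. Illustrative only, not the actual system.

import math
import random

N_FEATURES = 13   # e.g. MFCC-like coefficients per audio frame (assumed)
N_LANDMARKS = 18  # mouth contour points, each an (x, y) pair (assumed)

random.seed(0)

# A single random linear layer stands in for the trained neural network.
WEIGHTS = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
           for _ in range(2 * N_LANDMARKS)]

def audio_frame_to_mouth_points(features):
    """Map one frame of audio features to mouth landmarks [(x, y), ...]."""
    assert len(features) == N_FEATURES
    flat = [sum(w * f for w, f in zip(row, features)) for row in WEIGHTS]
    return [(flat[2 * i], flat[2 * i + 1]) for i in range(N_LANDMARKS)]

def smooth(frames, alpha=0.5):
    """Exponentially smooth landmark trajectories across video frames."""
    out = [frames[0]]
    for frame in frames[1:]:
        prev = out[-1]
        out.append([(alpha * x + (1 - alpha) * px,
                     alpha * y + (1 - alpha) * py)
                    for (x, y), (px, py) in zip(frame, prev)])
    return out

# Example: three frames of synthetic audio features drive the mouth.
audio = [[math.sin(t + k) for k in range(N_FEATURES)] for t in range(3)]
trajectory = smooth([audio_frame_to_mouth_points(f) for f in audio])
```

In the full system described in the talk, these mouth points would then drive texture synthesis (including teeth detail) before being blended into the head and background of a source video.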
Title:
Fake videos of real people -- and how to spot them
Speaker:
Supasorn Suwajanakorn
Description:

Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
07:15