-
Look at these images.
-
Now, tell me which Obama here is real.
-
(Video) Barack Obama: To help families
refinance their homes,
-
to invest in things
like high-tech manufacturing,
-
clean energy,
-
and the infrastructure
that creates good new jobs.
-
Supasorn Suwajanakorn: Anyone?
-
The answer is none of them.
-
(Laughter)
-
None of these is actually real.
-
So let me tell you how we got here.
-
My inspiration for this work
was a project meant to preserve
-
our last chance for learning
about the Holocaust from the survivors.
-
It's called New Dimensions in Testimony,
-
and it allows you to have
interactive conversations
-
with a hologram
of a real Holocaust survivor.
-
(Video) Question: How did you
survive the Holocaust?
-
(Video) Answer: How did I survive?
-
I survived,
-
I believe,
-
because providence watched over me.
-
SS: Turns out these answers
were pre-recorded in a studio.
-
Yet the effect is astounding.
-
You feel so connected to his story
and to him as a person.
-
I think there is something special
about human interaction
-
that makes it much more profound
-
and personal
-
than what books or lectures
or movies could ever teach us.
-
So I saw this and began to wonder,
-
can we create a model
like this for anyone?
-
A model that looks, talks,
and acts just like them?
-
So I set out to see if this could be done,
-
and eventually came up with a new solution
-
that can build a model of a person
using nothing but these:
-
existing photos and videos of a person.
-
If you can leverage this kind
of passive information,
-
just photos and video that are out there,
-
that's the key to scaling to anyone.
-
By the way, here's Richard Feynman,
-
who in addition to being
a Nobel Prize winner in physics
-
was also known as a legendary teacher.
-
Wouldn't it be great
if we could bring him back
-
to give his lectures
and inspire millions of kids,
-
perhaps not just in English
but in any language?
-
Or if you could ask our grandparents
for advice and hear those comforting words
-
even if they are no longer with us?
-
Or maybe using this tool,
book authors, alive or not,
-
could read aloud all of their books
for anyone interested.
-
The creative possibilities
here are endless,
-
and to me that's very exciting.
-
And here's how it's working so far.
-
First, we introduce a new technique
-
that can reconstruct
a highly detailed 3D face model
-
from any image
-
without ever 3D-scanning the person.
-
And here's the same output model
from different views.
-
This also works on videos
-
by running the same algorithm
on each video frame
-
and generating a moving 3D model.
-
And here's the same
output model from different angles.
-
It turns out this problem
is very challenging,
-
but the key trick is that we are going
to analyze a large photo collection
-
of the person beforehand.
-
For George W. Bush,
we can just search on Google,
-
and from that, we are able
to build an average model,
-
and iteratively refine the model
to recover the expression
-
in fine details,
like creases and wrinkles.
-
What's fascinating about this
-
is that the photo collection
can come from your typical photos.
-
It doesn't really matter
what expression you're making
-
or where you took those photos.
-
What matters is that
there are a lot of them.
-
And we are still missing color here,
-
so next we develop
a new blending technique
-
that improves upon
a simple averaging method
-
and produces sharp
facial textures and colors.
-
And this can be done for any expression.
-
Now we have control
of a model of a person,
-
and the way it's controlled now
is by a sequence of static photos.
-
Notice how the wrinkles come and go
depending on the expression.
-
We can also use a video
to drive the model.
-
(Video) Daniel Craig: Rory, but somehow
we've managed to attract
-
some more amazing people.
-
SS: And here's another fun demo.
-
So what you see here
are controllable models
-
of people I built
from their internet photos.
-
Now, if you transfer the motion
from the input video,
-
we can actually drive the entire party.
-
(Video) George W. Bush:
It's a difficult bill to pass,
-
because there's a lot of moving parts,
-
and the legislative process can be ugly.
-
(Applause)
-
SS: So coming back a little bit,
-
our ultimate goal, rather,
is to capture their mannerisms
-
or the unique way each
of these people talks and smiles.
-
So to do that, can we
actually teach the computer
-
to imitate the way someone talks
-
by only showing it
video footage of the person?
-
And what I did exactly was,
I let a computer watch
-
14 hours of pure Barack Obama
giving addresses.
-
And here's what we can produce
given only his audio.
-
(Video) BO: The results are clear.
-
America's businesses have created
14.5 million new jobs
-
over 75 straight months.
-
SS: So what's being synthesized here
is only the mouth region,
-
and here's how we do it.
-
Our pipeline uses a neural network
-
to convert an input audio
into these mouth points.
-
(Video) BO: We get it through our job
or through Medicare or Medicaid.
-
SS: Then we synthesize the texture,
enhance details and teeth,
-
and blend it into the head
and background from a source video.
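-
As a rough illustration of the three stages just described (audio to mouth points, texture synthesis, blending into a source video), here is a toy Python sketch. It is not the method from the talk: every function name, the loudness heuristic that stands in for the neural network, and the flat-patch "texture" are assumptions made purely for illustration.

```python
# Illustrative sketch only. All functions below are hypothetical stand-ins;
# the talk does not specify the network, the texture model, or the blend.
from typing import List, Tuple

Point = Tuple[float, float]
Image = List[List[float]]  # grayscale image as rows of values in [0, 1]


def audio_to_mouth_points(audio_frame: List[float], n_points: int = 4) -> List[Point]:
    """Stand-in for the neural network: one audio frame -> mouth points.

    Assumption: mouth opening (the y value) equals the mean absolute amplitude.
    """
    loudness = sum(abs(s) for s in audio_frame) / max(len(audio_frame), 1)
    # Spread points evenly along x in [0, 1]; every point shares the same y.
    return [(i / (n_points - 1), loudness) for i in range(n_points)]


def synthesize_mouth_texture(points: List[Point], size: int = 4) -> Image:
    """Stand-in for texture synthesis: a flat patch whose brightness
    follows the mouth opening, clamped to 1.0."""
    value = min(1.0, points[0][1])
    return [[value] * size for _ in range(size)]


def blend_into_frame(frame: Image, patch: Image, top: int, left: int,
                     alpha: float = 0.5) -> Image:
    """Alpha-blend the patch into a copy of the source frame."""
    out = [row[:] for row in frame]  # copy so the source frame is untouched
    for r, patch_row in enumerate(patch):
        for c, v in enumerate(patch_row):
            out[top + r][left + c] = (1 - alpha) * out[top + r][left + c] + alpha * v
    return out


def lip_sync(audio_frames: List[List[float]], source_frames: List[Image],
             top: int = 2, left: int = 2) -> List[Image]:
    """Run the three stages once per (audio frame, video frame) pair."""
    results = []
    for audio, frame in zip(audio_frames, source_frames):
        points = audio_to_mouth_points(audio)
        patch = synthesize_mouth_texture(points)
        results.append(blend_into_frame(frame, patch, top, left))
    return results
```

In this toy version, silence leaves the source frame unchanged, while louder audio brightens the mouth region; a real system would replace each stand-in with a learned model.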
-
(Video) BO: Women can get free checkups,
-
and you can't get charged more
just for being a woman.
-
Young people can stay
on a parent's plan until they turn 26.
-
SS: I think these results
seem very realistic and intriguing,
-
but at the same time
frightening, even to me.
-
Our goal was to build an accurate model
of a person, not to misrepresent them.
-
But one thing that concerns me
is its potential for misuse.
-
People have been thinking
about this problem for a long time,
-
since the days when Photoshop
first hit the market.
-
As a researcher, I'm also working
on countermeasure technology,
-
and I'm part of an ongoing
effort at AI Foundation
-
which uses a combination
of machine learning and human moderators
-
to detect fake images and videos,
-
fighting against my own work.
-
And one of the tools we plan to release
is called Reality Defender,
-
which is a web browser plugin
that can flag potentially fake content
-
automatically right in the browser.
-
(Applause)
-
Despite all this, though,
-
fake videos could do a lot of damage,
-
even before anyone has a chance to verify,
-
so it's very important
that we make everyone aware
-
of what's currently possible
-
so we can have the right assumption
and be critical about what we see.
-
There's still a long way to go before
we can fully model individual people
-
and before we can ensure
the safety of this technology,
-
but I'm excited and hopeful
-
because if we use it right and carefully,
-
this tool can allow any individual's
positive impact on the world
-
to be massively scaled
-
and really help shape our future
the way we want it to be.
-
Thank you.
-
(Applause)