0:00:00.876,0:00:02.027
Look at these images.

0:00:02.051,0:00:04.686
Now, tell me which Obama here is real.

0:00:04.710,0:00:07.571
(Video) Barack Obama: To help families[br]refinance their homes,

0:00:07.595,0:00:10.242
to invest in things[br]like high-tech manufacturing,

0:00:10.266,0:00:11.425
clean energy

0:00:11.449,0:00:14.228
and the infrastructure[br]that creates good new jobs.

0:00:14.647,0:00:16.131
Supasorn Suwajanakorn: Anyone?

0:00:16.155,0:00:18.029
The answer is none of them.

0:00:18.053,0:00:19.167
(Laughter)

0:00:19.191,0:00:20.977
None of these is actually real.

0:00:21.001,0:00:22.841
So let me tell you how we got here.

0:00:23.940,0:00:25.518
My inspiration for this work

0:00:25.542,0:00:30.953
was a project meant to preserve our last[br]chance for learning about the Holocaust

0:00:30.977,0:00:32.745
from the survivors.

0:00:32.769,0:00:35.396
It's called New Dimensions in Testimony,

0:00:35.420,0:00:38.546
and it allows you to have[br]interactive conversations

0:00:38.570,0:00:41.126
with a hologram[br]of a real Holocaust survivor.

0:00:41.793,0:00:43.759
(Video) Man: How did you[br]survive the Holocaust?

0:00:43.783,0:00:45.451
(Video) Hologram: How did I survive?

0:00:45.912,0:00:47.719
I survived,

0:00:48.419,0:00:49.946
I believe,

0:00:49.970,0:00:52.993
because providence watched over me.

0:00:53.573,0:00:57.027
SS: Turns out these answers[br]were prerecorded in a studio.

0:00:57.051,0:00:59.503
Yet the effect is astounding.

0:00:59.527,0:01:03.146
You feel so connected to his story[br]and to him as a person.

0:01:04.011,0:01:07.312
I think there's something special[br]about human interaction

0:01:07.336,0:01:10.093
that makes it much more profound

0:01:10.117,0:01:12.315
and personal

0:01:12.339,0:01:15.824
than what books or lectures[br]or movies could ever teach us.

0:01:16.267,0:01:18.692
So I saw this and began to wonder,

0:01:18.716,0:01:21.526
can we create a model[br]like this for anyone?
0:01:21.550,0:01:24.525
A model that looks, talks[br]and acts just like them?

0:01:25.573,0:01:27.580
So I set out to see if this could be done

0:01:27.604,0:01:29.914
and eventually came up with a new solution

0:01:29.938,0:01:33.158
that can build a model of a person[br]using nothing but these:

0:01:33.747,0:01:35.961
existing photos and videos of a person.

0:01:36.701,0:01:39.318
If you can leverage[br]this kind of passive information,

0:01:39.342,0:01:41.349
just photos and video that are out there,

0:01:41.373,0:01:43.429
that's the key to scaling to anyone.

0:01:44.119,0:01:45.896
By the way, here's Richard Feynman,

0:01:45.920,0:01:49.333
who in addition to being[br]a Nobel Prize winner in physics

0:01:49.357,0:01:51.810
was also known as a legendary teacher.

0:01:53.080,0:01:55.278
Wouldn't it be great[br]if we could bring him back

0:01:55.302,0:01:58.567
to give his lectures[br]and inspire millions of kids,

0:01:58.591,0:02:01.583
perhaps not just in English[br]but in any language?

0:02:02.441,0:02:07.043
Or if you could ask our grandparents[br]for advice and hear those comforting words

0:02:07.067,0:02:08.837
even if they're no longer with us?

0:02:09.683,0:02:13.079
Or maybe using this tool,[br]book authors, alive or not,

0:02:13.103,0:02:16.040
could read aloud all of their books[br]for anyone interested.

0:02:17.199,0:02:19.636
The creative possibilities[br]here are endless,

0:02:19.660,0:02:21.373
and to me, that's very exciting.

0:02:22.595,0:02:24.597
And here's how it's working so far.

0:02:24.621,0:02:26.288
First, we introduce a new technique

0:02:26.312,0:02:30.884
that can reconstruct a highly detailed[br]3D face model from any image

0:02:30.908,0:02:33.027
without ever 3D-scanning the person.

0:02:33.890,0:02:36.532
And here's the same output model[br]from different views.
0:02:37.969,0:02:39.471
This also works on videos,

0:02:39.495,0:02:42.347
by running the same algorithm[br]on each video frame

0:02:42.371,0:02:44.593
and generating a moving 3D model.

0:02:45.538,0:02:48.310
And here's the same[br]output model from different angles.

0:02:49.933,0:02:52.467
It turns out this problem[br]is very challenging,

0:02:52.491,0:02:55.016
but the key trick[br]is that we are going to analyze

0:02:55.040,0:02:58.006
a large photo collection[br]of the person beforehand.

0:02:58.650,0:03:01.189
For George W. Bush,[br]we can just search on Google,

0:03:02.309,0:03:04.808
and from that, we are able[br]to build an average model,

0:03:04.832,0:03:07.943
an iteratively refined model[br]to recover the expression

0:03:07.967,0:03:10.303
in fine details,[br]like creases and wrinkles.

0:03:11.326,0:03:12.729
What's fascinating about this

0:03:12.753,0:03:16.176
is that the photo collection[br]can come from your typical photos.

0:03:16.200,0:03:18.803
It doesn't really matter[br]what expression you're making

0:03:18.827,0:03:20.712
or where you took those photos.

0:03:20.736,0:03:23.136
What matters is[br]that there are a lot of them.

0:03:23.160,0:03:24.896
And we are still missing color here,

0:03:24.920,0:03:27.268
so next, we develop[br]a new blending technique

0:03:27.292,0:03:30.128
that improves upon[br]a single averaging method

0:03:30.152,0:03:32.970
and produces sharp[br]facial textures and colors.

0:03:33.779,0:03:36.550
And this can be done for any expression.

0:03:37.485,0:03:39.984
Now we have control[br]of a model of a person,

0:03:40.008,0:03:43.803
and the way it's controlled now[br]is by a sequence of static photos.

0:03:43.827,0:03:46.953
Notice how the wrinkles come and go,[br]depending on the expression.

0:03:48.109,0:03:50.855
We can also use a video[br]to drive the model.

0:03:50.879,0:03:53.472
(Video) Daniel Craig: Right, but somehow,

0:03:53.496,0:03:57.267
we've managed to attract[br]some more amazing people.
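[Editor's illustration] The averaging idea described above — fit each photo in a large collection individually, then average the per-photo fits so their individual errors cancel — can be sketched as below. The landmark count, the noise model, and the variable names are illustrative assumptions, not the speaker's actual pipeline; the real system also iteratively refines the averaged model to recover creases and wrinkles.

```python
import numpy as np

# Sketch: each photo i yields a noisy per-photo estimate of the
# person's 3D face landmarks; averaging many estimates suppresses
# the per-photo noise. (68 landmarks and Gaussian noise are
# illustrative assumptions.)
rng = np.random.default_rng(0)

true_face = rng.normal(size=(68, 3))           # "ground-truth" 3D landmarks
photos = [true_face + rng.normal(scale=0.3, size=(68, 3))
          for _ in range(200)]                  # 200 noisy per-photo fits

average_model = np.mean(photos, axis=0)         # averaged neutral shape

# the averaged model is far closer to the true face than any single fit
err_single = np.linalg.norm(photos[0] - true_face)
err_average = np.linalg.norm(average_model - true_face)
print(err_average < err_single)  # True: averaging cancels per-photo noise
```

This is why "what matters is that there are a lot of them": the averaging error shrinks roughly with the square root of the number of photos, regardless of pose or expression.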
0:03:58.021,0:03:59.663
SS: And here's another fun demo.

0:03:59.687,0:04:01.933
So what you see here[br]are controllable models

0:04:01.957,0:04:04.401
of people I built[br]from their internet photos.

0:04:04.425,0:04:07.329
Now, if you transfer[br]the motion from the input video,

0:04:07.353,0:04:09.505
we can actually drive the entire party.

0:04:09.529,0:04:11.701
George W. Bush:[br]It's a difficult bill to pass,

0:04:11.725,0:04:14.028
because there's a lot of moving parts,

0:04:14.052,0:04:19.283
and the legislative processes can be ugly.

0:04:19.307,0:04:20.937
(Applause)

0:04:20.961,0:04:22.798
SS: So coming back a little bit,

0:04:22.822,0:04:26.013
our ultimate goal, rather,[br]is to capture their mannerisms

0:04:26.037,0:04:29.082
or the unique way each[br]of these people talks and smiles.

0:04:29.106,0:04:31.419
So to do that, can we[br]actually teach the computer

0:04:31.443,0:04:33.665
to imitate the way someone talks

0:04:33.689,0:04:36.109
by only showing it[br]video footage of the person?

0:04:36.898,0:04:39.475
And what I did exactly was,[br]I let a computer watch

0:04:39.499,0:04:42.776
14 hours of pure Barack Obama[br]giving addresses.

0:04:43.443,0:04:46.959
And here's what we can produce[br]given only his audio.

0:04:46.983,0:04:48.760
(Video) BO: The results are clear.

0:04:48.784,0:04:53.133
America's businesses have created[br]14.5 million new jobs

0:04:53.157,0:04:55.931
over 75 straight months.

0:04:55.955,0:04:58.860
SS: So what's being synthesized here[br]is only the mouth region,

0:04:58.884,0:05:00.424
and here's how we do it.

0:05:00.764,0:05:02.590
Our pipeline uses a neural network

0:05:02.614,0:05:05.550
to convert input audio[br]into these mouth points.

0:05:06.547,0:05:10.772
(Video) BO: We get it through our job[br]or through Medicare or Medicaid.

0:05:10.796,0:05:14.216
SS: Then we synthesize the texture,[br]enhance details and teeth,

0:05:14.240,0:05:17.314
and blend it into the head[br]and background from a source video.
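[Editor's illustration] The audio-to-mouth step described above — a neural network that consumes a window of audio features per video frame and emits mouth landmark coordinates — can be sketched with a minimal untrained forward pass. The feature type (MFCCs), window length, layer sizes, and landmark count here are all illustrative assumptions, not the talk's actual architecture; only the input-to-output mapping shape is the point.

```python
import numpy as np

# Minimal sketch of an audio -> mouth-points network.
# Assumptions: 13 MFCC features over a 10-frame context window,
# one tanh hidden layer, and a mouth outline of 18 (x, y) landmarks.
rng = np.random.default_rng(1)

N_MFCC, WINDOW = 13, 10
N_MOUTH_POINTS = 18

W1 = rng.normal(scale=0.1, size=(N_MFCC * WINDOW, 64))
W2 = rng.normal(scale=0.1, size=(64, N_MOUTH_POINTS * 2))

def audio_to_mouth(mfcc_window: np.ndarray) -> np.ndarray:
    """Map one window of audio features to a mouth shape of 18 points."""
    h = np.tanh(mfcc_window.reshape(-1) @ W1)   # hidden representation
    return (h @ W2).reshape(N_MOUTH_POINTS, 2)  # per-frame mouth landmarks

mouth = audio_to_mouth(rng.normal(size=(WINDOW, N_MFCC)))
print(mouth.shape)  # (18, 2): one mouth shape per audio window
```

In the full pipeline, the predicted mouth points then drive the texture synthesis and blending steps the talk describes next.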
0:05:17.338,0:05:19.243
(Video) BO: Women can get free checkups,

0:05:19.267,0:05:22.235
and you can't get charged more[br]just for being a woman.

0:05:22.973,0:05:26.279
Young people can stay[br]on a parent's plan until they turn 26.

0:05:27.267,0:05:30.219
SS: I think these results[br]seem very realistic and intriguing,

0:05:30.243,0:05:33.416
but at the same time[br]frightening, even to me.

0:05:33.440,0:05:37.455
Our goal was to build an accurate model[br]of a person, not to misrepresent them.

0:05:37.956,0:05:41.067
But one thing that concerns me[br]is its potential for misuse.

0:05:41.958,0:05:44.929
People have been thinking[br]about this problem for a long time,

0:05:44.953,0:05:47.334
since the days when Photoshop[br]first hit the market.

0:05:47.862,0:05:51.663
As a researcher, I'm also working[br]on countermeasure technology,

0:05:51.687,0:05:54.629
and I'm part of an ongoing[br]effort at AI Foundation,

0:05:54.653,0:05:58.050
which uses a combination[br]of machine learning and human moderators

0:05:58.074,0:06:00.218
to detect fake images and videos,

0:06:00.242,0:06:01.756
fighting against my own work.

0:06:02.675,0:06:05.865
And one of the tools we plan to release[br]is called Reality Defender,

0:06:05.889,0:06:09.928
which is a web-browser plug-in[br]that can flag potentially fake content

0:06:09.952,0:06:12.485
automatically, right in the browser.

0:06:12.509,0:06:16.737
(Applause)

0:06:16.761,0:06:18.214
Despite all this, though,

0:06:18.238,0:06:20.078
fake videos could do a lot of damage,

0:06:20.102,0:06:23.396
even before anyone has a chance to verify,

0:06:23.420,0:06:26.142
so it's very important[br]that we make everyone aware

0:06:26.166,0:06:28.173
of what's currently possible

0:06:28.197,0:06:31.566
so we can have the right assumptions[br]and be critical about what we see.
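[Editor's illustration] The "machine learning plus human moderators" combination described above is, in essence, a triage: a detector scores each item, high-confidence fakes are flagged automatically, and ambiguous cases are routed to a person. The sketch below is a generic version of that idea; the thresholds and function names are invented for illustration and do not describe Reality Defender's actual internals.

```python
# Generic detection triage: score -> auto-flag / human review / pass.
# The 0.9 and 0.5 thresholds are illustrative assumptions.
AUTO_FLAG, NEEDS_REVIEW = 0.9, 0.5

def triage(fake_score: float) -> str:
    """Route one item based on a detector's fake-probability score."""
    if fake_score >= AUTO_FLAG:
        return "flag"            # confident enough to flag automatically
    if fake_score >= NEEDS_REVIEW:
        return "human_review"    # ambiguous: escalate to a moderator
    return "pass"

print([triage(s) for s in (0.95, 0.7, 0.1)])
# ['flag', 'human_review', 'pass']
```

Keeping a human in the loop for the middle band is what lets such a system stay useful even when the detector itself is imperfect.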
0:06:32.423,0:06:37.430
There's still a long way to go before[br]we can fully model individual people

0:06:37.454,0:06:40.240
and before we can ensure[br]the safety of this technology.

0:06:41.097,0:06:42.684
But I'm excited and hopeful,

0:06:42.708,0:06:46.247
because if we use it right and carefully,

0:06:46.271,0:06:50.580
this tool can allow any individual's[br]positive impact on the world

0:06:50.604,0:06:52.794
to be massively scaled

0:06:52.818,0:06:55.560
and really help shape our future[br]the way we want it to be.

0:06:55.584,0:06:56.735
Thank you.

0:06:56.759,0:07:01.849
(Applause)