1 00:00:06,400 --> 00:00:08,050 When I was a boy, 2 00:00:10,080 --> 00:00:15,440 I wanted to maximise my impact on the world, 3 00:00:15,440 --> 00:00:19,460 and I was smart enough to realise that I am not very smart. 4 00:00:21,280 --> 00:00:24,588 And that I have to build a machine 5 00:00:24,588 --> 00:00:28,770 that learns to become much smarter than myself, 6 00:00:29,360 --> 00:00:34,840 such that it can solve all the problems that I cannot solve myself, 7 00:00:34,840 --> 00:00:36,760 and I can retire. 8 00:00:38,560 --> 00:00:42,800 And my first publication on that dates back 30 years: 1987. 9 00:00:42,800 --> 00:00:44,160 My diploma thesis, 10 00:00:44,160 --> 00:00:48,600 where I already tried to solve the grand problem of AI, 11 00:00:48,600 --> 00:00:50,240 not only build a machine 12 00:00:50,240 --> 00:00:53,240 that learns a little bit here, learns a little bit there, 13 00:00:53,240 --> 00:00:58,530 but also learns to improve the learning algorithm itself. 14 00:00:59,680 --> 00:01:02,880 And the way it learns the way it learns, 15 00:01:02,880 --> 00:01:06,230 and so on recursively, without any limits 16 00:01:06,230 --> 00:01:11,000 except the limits of logic and physics. 17 00:01:12,480 --> 00:01:16,120 And I'm still working on the same old thing, 18 00:01:16,120 --> 00:01:19,800 and I'm still pretty much saying the same thing, 19 00:01:19,800 --> 00:01:23,510 except that now more people are listening. 20 00:01:25,160 --> 00:01:28,080 Because the learning algorithms 21 00:01:28,080 --> 00:01:30,480 that we have developed on the way to this goal, 22 00:01:30,480 --> 00:01:34,020 they are now on 3,000 million smartphones. 23 00:01:34,720 --> 00:01:37,340 And all of you have them in your pockets. 24 00:01:39,950 --> 00:01:40,960 What you see here 25 00:01:40,960 --> 00:01:45,840 are the five most valuable companies of the Western world: 26 00:01:45,840 --> 00:01:50,430 Apple, Google, Facebook, Microsoft and Amazon.
27 00:01:51,360 --> 00:01:53,500 And all of them are emphasising 28 00:01:55,040 --> 00:01:57,475 that AI, artificial intelligence, 29 00:01:57,475 --> 00:02:00,270 is central to what they are doing. 30 00:02:02,000 --> 00:02:07,600 And all of them are using heavily the deep learning methods 31 00:02:07,600 --> 00:02:11,000 that my team has developed since the early nineties, 32 00:02:11,000 --> 00:02:14,040 in Munich and in Switzerland. 33 00:02:14,040 --> 00:02:18,720 Especially something which is called: "the long short-term memory". 34 00:02:18,720 --> 00:02:24,080 Has anybody in this room ever heard of the long short-term memory, 35 00:02:24,080 --> 00:02:25,560 or the LSTM? 36 00:02:25,560 --> 00:02:27,720 Hands up, anybody ever heard of that? 37 00:02:27,720 --> 00:02:29,000 Okay. 38 00:02:29,000 --> 00:02:32,500 Has anybody never heard of the LSTM? 39 00:02:33,990 --> 00:02:39,556 Okay. I see we have a third group in this room: 40 00:02:43,156 --> 00:02:45,755 [those] who didn't understand the question. 41 00:02:45,755 --> 00:02:47,625 (Laughter) 42 00:02:48,420 --> 00:02:51,600 The LSTM is a little bit like your brain: 43 00:02:52,960 --> 00:02:58,120 it's an artificial neural network which also has neurons, 44 00:02:58,120 --> 00:03:03,110 and in your brain, you've got about 100 billion neurons. 45 00:03:04,240 --> 00:03:05,630 And each of them is connected 46 00:03:05,630 --> 00:03:09,520 to roughly 10,000 other neurons on average, 47 00:03:11,400 --> 00:03:15,020 Which means that you have got a million billion connections. 48 00:03:16,200 --> 00:03:18,960 And each of these connections has a "strength" 49 00:03:18,960 --> 00:03:22,040 which says how much does this neuron over here 50 00:03:22,040 --> 00:03:25,200 influence that one over there at the next time step. 
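As a rough check of the numbers just quoted, here is a minimal Python sketch. The neuron and connection counts come from the talk; the function names and the tanh squashing in `next_activation` are illustrative assumptions, not a description of the brain or of the LSTM itself.

```python
import math

# Figures quoted in the talk.
NEURONS = 100 * 10**9            # about 100 billion neurons
CONNECTIONS_PER_NEURON = 10_000  # roughly 10,000 connections each


def total_connections(neurons=NEURONS, per_neuron=CONNECTIONS_PER_NEURON):
    """Total number of connections: neurons times connections per neuron."""
    return neurons * per_neuron


def next_activation(strengths, activations):
    """Hypothetical neuron update: each incoming neuron's activation is
    weighted by its connection strength, and the sum is squashed with tanh
    (the squashing function is an assumption made for this sketch)."""
    weighted_sum = sum(w * a for w, a in zip(strengths, activations))
    return math.tanh(weighted_sum)


if __name__ == "__main__":
    # 10**15 is "a million billion" (10**6 * 10**9).
    print(total_connections() == 10**6 * 10**9)
    print(next_activation([0.5, -0.2], [1.0, 1.0]))
```

A strength of zero means the neuron "over here" has no influence on the one "over there" at the next time step; learning, in this picture, amounts to adjusting those strengths.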
51 00:03:25,200 --> 00:03:26,320 And in the beginning, 52 00:03:26,320 --> 00:03:30,160 all these connections are random and the system knows nothing; 53 00:03:30,160 --> 00:03:33,200 but then, through a smart learning algorithm, 54 00:03:33,200 --> 00:03:39,440 it learns from lots of examples to translate the incoming data, 55 00:03:39,440 --> 00:03:46,040 such as video through the cameras, or audio through the microphones, 56 00:03:46,040 --> 00:03:49,480 or pain signals through the pain sensors. 57 00:03:49,480 --> 00:03:52,320 It learns to translate that into output actions, 58 00:03:52,320 --> 00:03:54,650 because some of these neurons are output neurons, 59 00:03:54,650 --> 00:03:57,650 that control speech muscles and finger muscles. 60 00:04:00,223 --> 00:04:01,840 And only through experience, 61 00:04:01,840 --> 00:04:04,680 it can learn to solve all kinds of interesting problems, 62 00:04:04,680 --> 00:04:07,660 such as driving a car 63 00:04:10,880 --> 00:04:13,800 or doing the speech recognition on your smartphone. 64 00:04:13,800 --> 00:04:16,720 Because whenever you take out your smartphone, 65 00:04:16,720 --> 00:04:18,200 an Android phone, for example, 66 00:04:18,200 --> 00:04:19,786 and you speak to it, and you say: 67 00:04:19,786 --> 00:04:23,840 "OK Google, show me the shortest way to Milano." 68 00:04:23,840 --> 00:04:25,379 Then it understands your speech. 69 00:04:26,970 --> 00:04:31,760 Because there is an LSTM in there which has learned to understand speech. 70 00:04:31,760 --> 00:04:35,060 Every ten milliseconds, 100 times a second, 71 00:04:35,060 --> 00:04:37,090 new inputs are coming from the microphone, 72 00:04:37,090 --> 00:04:42,320 and then are translated, after thinking, 73 00:04:42,320 --> 00:04:44,080 into letters 74 00:04:44,080 --> 00:04:47,400 which are then sent as a question to the search engine.
75 00:04:48,600 --> 00:04:49,994 And it has learned to do that 76 00:04:49,994 --> 00:04:54,690 by listening to lots of speech from women, from men, all kinds of people. 77 00:04:55,390 --> 00:04:57,800 And that's how, since 2015, 78 00:04:57,800 --> 00:05:00,830 Google speech recognition is now much better than it used to be. 79 00:05:02,400 --> 00:05:05,360 The basic LSTM cell looks like that: 80 00:05:05,360 --> 00:05:07,800 I don't have the time to explain that, 81 00:05:07,800 --> 00:05:11,160 but at least I can list the names 82 00:05:11,160 --> 00:05:14,320 of the brilliant students in my lab who made that possible. 83 00:05:15,760 --> 00:05:18,760 And what are the big companies doing with that? 84 00:05:18,760 --> 00:05:21,600 Well, speech recognition is only one example; 85 00:05:22,280 --> 00:05:25,170 if you are on Facebook - is anybody on Facebook? 86 00:05:27,450 --> 00:05:30,426 Are you sometimes clicking on the translate button, 87 00:05:30,426 --> 00:05:33,120 because somebody sent you something in a foreign language 88 00:05:33,120 --> 00:05:34,563 and then you can translate it? 89 00:05:34,563 --> 00:05:37,000 Is anybody doing that? Yeah. 90 00:05:37,000 --> 00:05:38,160 Whenever you do that, 91 00:05:38,160 --> 00:05:41,560 you are waking up, again, a long short-term memory, an LSTM, 92 00:05:41,560 --> 00:05:45,120 which has learned to translate text in one language 93 00:05:45,120 --> 00:05:47,380 into translated text. 94 00:05:48,880 --> 00:05:53,280 And Facebook is doing that four billion times a day, 95 00:05:53,280 --> 00:05:59,456 so every second 50,000 sentences 96 00:05:59,456 --> 00:06:00,880 are being translated 97 00:06:00,880 --> 00:06:03,160 by an LSTM working for Facebook; 98 00:06:03,800 --> 00:06:07,440 and another 50,000 in the next second; then another 50,000.
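The step from "four billion times a day" to "50,000 sentences every second" can be checked with a short Python sketch. The daily figure is the one quoted in the talk; the function name is hypothetical, and the even spread over the day is a simplifying assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def translations_per_second(per_day):
    """Average rate per second, assuming the load is spread evenly over a day."""
    return per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = translations_per_second(4_000_000_000)
    # A little over 46,000 per second, which rounds to "about 50,000".
    print(round(rate))
```

So the quoted 50,000 per second is a rounded-up average rather than an exact figure.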
99 00:06:08,360 --> 00:06:13,080 And to see how much this thing is now permeating the modern world, 100 00:06:13,080 --> 00:06:16,220 just note that almost 30 percent 101 00:06:16,220 --> 00:06:22,240 of the awesome computational power for inference 102 00:06:22,240 --> 00:06:24,440 in all these Google data centers, 103 00:06:24,440 --> 00:06:27,240 all these data centers of Google, all over the world, 104 00:06:27,240 --> 00:06:28,880 is used for LSTM. 105 00:06:28,880 --> 00:06:30,170 Almost 30 percent. 106 00:06:30,880 --> 00:06:33,240 If you have an Amazon Echo, 107 00:06:33,240 --> 00:06:36,840 you can ask a question and it answers you. 108 00:06:37,440 --> 00:06:40,280 And the voice that you hear, it's not a recording; 109 00:06:40,280 --> 00:06:42,200 it's an LSTM network 110 00:06:42,200 --> 00:06:44,693 which has learned from training examples 111 00:06:44,693 --> 00:06:47,650 to sound like a female voice. 112 00:06:52,050 --> 00:06:54,840 If you have an iPhone, and you're using QuickType, 113 00:06:55,660 --> 00:06:57,920 it's trying to predict what you want to do next 114 00:06:57,920 --> 00:07:00,640 given all the previous context of what you did so far. 115 00:07:01,443 --> 00:07:03,950 Again, that's an LSTM which has learned to do that, 116 00:07:05,040 --> 00:07:07,100 so it's on a billion iPhones. 117 00:07:09,920 --> 00:07:12,680 You are a large audience, by my standards; 118 00:07:13,760 --> 00:07:19,400 but when we started this work, decades ago, in the early '90s, 119 00:07:19,400 --> 00:07:21,680 only a few people were interested in that, 120 00:07:21,680 --> 00:07:24,900 because computers were so slow and you couldn't do so much with them. 121 00:07:25,560 --> 00:07:27,720 And I remember I gave a talk at a conference, 122 00:07:28,898 --> 00:07:31,400 and there was just one single person in the audience, 123 00:07:32,840 --> 00:07:34,680 a young lady.
124 00:07:34,680 --> 00:07:38,960 I said, young lady, it's very embarrassing, 125 00:07:38,960 --> 00:07:42,000 but apparently today I'm going to give this talk just to you. 126 00:07:42,000 --> 00:07:43,280 And she said, 127 00:07:44,390 --> 00:07:48,175 "OK, but please hurry: I am the next speaker!" 128 00:07:48,175 --> 00:07:52,645 (Laughter) 129 00:07:56,140 --> 00:07:58,940 Since then, we have greatly profited from the fact 130 00:07:58,940 --> 00:08:02,174 that every five years computers are getting ten times cheaper, 131 00:08:02,174 --> 00:08:06,360 which is an old trend that has held since 1941 at least. 132 00:08:06,360 --> 00:08:08,080 Since this man, Konrad Zuse, 133 00:08:08,080 --> 00:08:12,640 built the first working program-controlled computer in Berlin 134 00:08:12,640 --> 00:08:17,140 and it could do, roughly, one operation per second. 135 00:08:17,140 --> 00:08:18,270 One! 136 00:08:19,140 --> 00:08:22,040 And then ten years later, for the same price, 137 00:08:22,040 --> 00:08:24,520 one could do 100 operations; 138 00:08:24,520 --> 00:08:25,600 30 years later, 139 00:08:25,600 --> 00:08:27,960 1 million operations for the same price; 140 00:08:27,960 --> 00:08:30,480 and today, after 75 years, we can do 141 00:08:30,480 --> 00:08:33,799 a million billion times as much for the same price. 142 00:08:33,799 --> 00:08:36,120 And the trend is not about to stop, 143 00:08:36,120 --> 00:08:39,650 because the physical limits are much further out there. 144 00:08:42,919 --> 00:08:48,080 Rather soon, in not so many years or decades, 145 00:08:48,080 --> 00:08:51,280 we will for the first time have little computational devices 146 00:08:51,280 --> 00:08:54,400 that can compute as much as a human brain; 147 00:08:55,090 --> 00:08:57,130 and that's a trend that doesn't break.
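The figures above all follow from the single rule "ten times cheaper every five years". Here is a minimal Python sketch of that rule; the function name and parameters are hypothetical, and the trend itself is the talk's claim, not something the code verifies.

```python
def operations_per_price(years, factor=10, period_years=5):
    """Operations per second available for the same price after `years`,
    starting from one operation per second (the 1941 baseline in the talk)
    and assuming a `factor`-fold gain every `period_years` years."""
    return factor ** (years // period_years)


if __name__ == "__main__":
    for years in (10, 30, 75):
        print(years, operations_per_price(years))
```

Ten years gives 100, thirty years gives one million, and seventy-five years gives 10**15, which is "a million billion". The same rule applied over fifty years gives a factor of 10**10, or ten billion.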
148 00:08:57,130 --> 00:09:01,520 50 years later, there will be a little computational device, 149 00:09:01,520 --> 00:09:02,760 for the same price, 150 00:09:02,760 --> 00:09:07,800 that can compute as much as all 10 billion human brains taken together. 151 00:09:08,600 --> 00:09:12,600 and there will not only be one, of those devices, but many many many. 152 00:09:12,600 --> 00:09:14,920 Everything is going to change. 153 00:09:14,920 --> 00:09:17,720 Already in 2011, computers were fast enough 154 00:09:17,720 --> 00:09:19,840 such that our deep learning methods 155 00:09:19,840 --> 00:09:25,480 for the first time could achieve a superhuman pattern-recognition result. 156 00:09:25,480 --> 00:09:29,960 It was the first superhuman result in the history of computer vision. 157 00:09:29,960 --> 00:09:34,120 And back then, computers were 20 times more expensive than today. 158 00:09:34,120 --> 00:09:35,680 So today, for the same price, 159 00:09:35,680 --> 00:09:37,840 we can do 20 times as much. 160 00:09:37,840 --> 00:09:43,200 And just five years ago, 161 00:09:43,200 --> 00:09:46,880 when computers were 10 times more expensive than today, 162 00:09:46,880 --> 00:09:51,440 we already could win, for the first time, medical imaging competitions. 163 00:09:51,440 --> 00:09:55,960 What you see behind me is a slice through the female breast 164 00:09:55,960 --> 00:10:00,680 and the tissue that you see there has all kinds of cells; 165 00:10:00,680 --> 00:10:05,160 and normally you need a trained doctor, a trained histologist 166 00:10:05,160 --> 00:10:09,560 who is able to detect the dangerous cancer cells, 167 00:10:09,560 --> 00:10:11,160 or pre-cancer cells. 168 00:10:11,880 --> 00:10:13,487 Now, our stupid network 169 00:10:13,487 --> 00:10:16,084 knows nothing about cancer, knows nothing about vision. 170 00:10:16,084 --> 00:10:17,720 It knows nothing in the beginning: 171 00:10:17,720 --> 00:10:21,920 but we can train it to imitate the human teacher, the doctor. 
172 00:10:21,920 --> 00:10:26,560 And it became as good as, or better than, the best competitors. 173 00:10:26,560 --> 00:10:28,710 And very soon, 174 00:10:28,710 --> 00:10:31,880 all of medical diagnosis is going to be superhuman. 175 00:10:33,690 --> 00:10:35,560 And it's going to be mandatory, 176 00:10:35,560 --> 00:10:38,253 because it's going to be so much better than the doctors. 177 00:10:40,440 --> 00:10:45,600 After this, all kinds of medical imaging startups were founded 178 00:10:45,600 --> 00:10:48,120 focusing just on this, because it's so important. 179 00:10:49,160 --> 00:10:52,800 We can also use LSTM to train robots. 180 00:10:52,800 --> 00:10:55,040 One important thing I want to say is 181 00:10:55,040 --> 00:10:58,040 that we not only have systems 182 00:10:58,040 --> 00:11:01,080 that slavishly imitate what humans show them; 183 00:11:01,080 --> 00:11:05,920 no, we also have AIs that set themselves their own goals. 184 00:11:07,960 --> 00:11:12,280 And like little babies, invent their own experiments 185 00:11:12,880 --> 00:11:14,840 to explore the world 186 00:11:14,840 --> 00:11:17,092 and to figure out what you can do in the world. 187 00:11:17,560 --> 00:11:19,260 Without a teacher. 188 00:11:19,260 --> 00:11:23,400 And becoming more and more general problem solvers in the process, 189 00:11:23,400 --> 00:11:26,680 by learning new skills on top of old skills. 190 00:11:26,680 --> 00:11:31,120 And this is going to scale: we call that "Artificial Curiosity". 191 00:11:31,940 --> 00:11:34,200 Or a recent buzzword is "PowerPlay". 192 00:11:34,720 --> 00:11:38,840 Learning to become a more and more general problem solver 193 00:11:38,840 --> 00:11:44,280 by learning to invent, like a scientist, one new interesting goal after another. 194 00:11:44,840 --> 00:11:47,440 And it's going to scale.
195 00:11:47,440 --> 00:11:48,450 And I think, 196 00:11:48,450 --> 00:11:50,790 in not so many years from now, for the first time, 197 00:11:50,790 --> 00:11:55,520 we are going to have an animal-like AI - 198 00:11:55,520 --> 00:11:57,720 we don't have that yet. 199 00:11:58,600 --> 00:12:00,160 On the level of a little crow, 200 00:12:00,800 --> 00:12:04,040 which already can learn to use tools, for example, 201 00:12:04,040 --> 00:12:05,360 or a little monkey. 202 00:12:05,700 --> 00:12:07,360 And once we have that, 203 00:12:07,360 --> 00:12:09,270 it may take just a few decades 204 00:12:09,270 --> 00:12:13,400 to do the final step towards human level intelligence. 205 00:12:14,800 --> 00:12:16,380 Because technological evolution 206 00:12:16,380 --> 00:12:20,660 is about a million times faster than biological evolution, 207 00:12:20,660 --> 00:12:27,440 and biological evolution needed 3.5 billion years 208 00:12:27,440 --> 00:12:31,440 to evolve a monkey from scratch. 209 00:12:31,440 --> 00:12:35,240 But then, it took just a few tens of millions of years afterwards 210 00:12:35,240 --> 00:12:37,560 to evolve human level intelligence. 211 00:12:38,400 --> 00:12:40,680 We have a company which is called Nnaisense 212 00:12:41,720 --> 00:12:45,120 like birth in [French], "Naissance", but spelled in a different way, 213 00:12:45,120 --> 00:12:47,826 which is trying to make this a reality 214 00:12:47,826 --> 00:12:50,960 and build the first true general-purpose AI. 215 00:12:52,560 --> 00:12:58,120 At the moment, almost all research in AI is very human centric, 216 00:12:58,120 --> 00:13:04,720 and it's all about making human lives longer and healthier and easier 217 00:13:04,720 --> 00:13:07,240 and making humans more addicted to their smartphones. 218 00:13:09,100 --> 00:13:13,320 But in the long run, AIs are going to - especially the smart ones - 219 00:13:13,320 --> 00:13:16,280 are going to set themselves their own goals. 
220 00:13:16,280 --> 00:13:18,800 And I have no doubt, in my mind, 221 00:13:18,800 --> 00:13:21,760 that they are going to become much smarter than we are. 222 00:13:22,480 --> 00:13:24,400 And what are they going to do? 223 00:13:24,400 --> 00:13:27,960 Of course they are going to realize what we realized a long time ago; 224 00:13:27,960 --> 00:13:34,200 namely, that most of the resources, in the solar system or in general, 225 00:13:34,200 --> 00:13:37,120 are not in our little biosphere. 226 00:13:37,120 --> 00:13:38,990 They are out there in space. 227 00:13:40,075 --> 00:13:42,240 And so, of course, they are going to emigrate. 228 00:13:42,240 --> 00:13:48,920 And of course they are going to use 229 00:13:48,920 --> 00:13:52,400 trillions of self-replicating robot factories 230 00:13:52,400 --> 00:13:57,880 to expand in the form of a growing AI bubble 231 00:13:57,880 --> 00:14:00,400 which within a few hundred thousand years 232 00:14:00,400 --> 00:14:02,560 is going to cover the entire galaxy 233 00:14:02,560 --> 00:14:04,240 with senders and receivers 234 00:14:04,240 --> 00:14:06,320 such that AIs can travel 235 00:14:06,320 --> 00:14:08,920 the way they are already traveling in my lab: 236 00:14:08,920 --> 00:14:11,160 by radio, from sender to receiver. 237 00:14:12,200 --> 00:14:13,650 Wireless. 238 00:14:15,100 --> 00:14:19,000 So what we are witnessing now 239 00:14:19,000 --> 00:14:24,630 is much more than just another Industrial Revolution. 240 00:14:24,630 --> 00:14:27,680 This is something that transcends humankind, 241 00:14:27,680 --> 00:14:29,520 and even life itself. 242 00:14:29,520 --> 00:14:32,880 The last time something so important happened 243 00:14:32,880 --> 00:14:37,240 was maybe 3.5 billion years ago, when life was invented. 244 00:14:38,430 --> 00:14:42,930 A new type of life is going to emerge from our little planet 245 00:14:42,930 --> 00:14:48,000 and it's going to colonize and transform the entire universe.
246 00:14:48,000 --> 00:14:52,000 The universe is still young: it's only 13.8 billion years old, 247 00:14:52,000 --> 00:14:58,000 it's going to become much older than that, many times older than that. 248 00:14:58,000 --> 00:15:02,520 So there's plenty of time to reach all of it, 249 00:15:02,520 --> 00:15:04,240 or all of the visible parts, 250 00:15:04,240 --> 00:15:07,640 totally within the limits of light speed and physics. 251 00:15:09,450 --> 00:15:13,780 A new type of life is going to make the universe intelligent. 252 00:15:13,780 --> 00:15:19,220 Now, of course, we are not going to remain the crown of creation, of course not. 253 00:15:20,400 --> 00:15:21,880 But there is still beauty 254 00:15:21,880 --> 00:15:27,200 in seeing yourself as part of a grander process 255 00:15:27,200 --> 00:15:29,160 that leads the cosmos 256 00:15:29,160 --> 00:15:32,200 from low complexity towards higher complexity. 257 00:15:33,640 --> 00:15:36,760 It's a privilege to live at a time 258 00:15:36,760 --> 00:15:40,080 where we can witness the beginnings of that 259 00:15:40,080 --> 00:15:43,240 and where we can contribute something to that. 260 00:15:46,490 --> 00:15:48,300 Thank you for your patience. 261 00:15:49,160 --> 00:15:54,840 (Applause)