WEBVTT 00:00:02.000 --> 00:00:06.480 [Music] 00:00:06.480 --> 00:00:14.960 When I was a boy, I wanted to maximise my impact on the world, 00:00:14.960 --> 00:00:19.548 and I was smart enough to realise that I am not very smart. 00:00:20.560 --> 00:00:29.295 And that I have to build a machine that learns to become much smarter than myself, 00:00:29.865 --> 00:00:34.479 such that it can solve all the problems that I cannot solve myself, 00:00:34.960 --> 00:00:36.040 and I can retire. 00:00:38.480 --> 00:00:42.483 And my first publication on that dates back 30 years: 1987. 00:00:42.748 --> 00:00:48.271 My diploma thesis, where I already tried to solve the grand problem of AI, 00:00:48.772 --> 00:00:52.928 not only build a machine that learns a little bit here, learns a little bit there, 00:00:53.187 --> 00:00:58.398 but also learns to improve the learning algorithm itself. 00:00:59.870 --> 00:01:04.283 And the way it learns the way it learns, and so on, recursively, 00:01:04.745 --> 00:01:11.333 without any limits except the limits of logic and physics. 00:01:12.781 --> 00:01:15.670 And I'm still working on the same old thing, 00:01:15.850 --> 00:01:19.841 and I'm still pretty much saying the same thing, 00:01:19.841 --> 00:01:23.486 except that now more people are listening. 00:01:25.120 --> 00:01:29.840 Because the learning algorithms that we have developed on the way to this goal, 00:01:29.840 --> 00:01:34.245 they are now on three thousand million smartphones. 00:01:35.147 --> 00:01:37.463 And all of you have them in your pockets. 00:01:40.046 --> 00:01:45.870 What you see here are the five most valuable companies of the Western world: 00:01:45.870 --> 00:01:50.548 Apple, Google, Facebook, Microsoft, and Amazon. 00:01:51.599 --> 00:01:57.567 And all of them are emphasising that AI, artificial intelligence, 00:01:57.567 --> 00:02:00.080 is central to what they are doing. 
00:02:02.120 --> 00:02:05.870 And all of them are heavily using 00:02:06.510 --> 00:02:10.919 the deep learning methods that my team has developed since the early nineties, 00:02:10.919 --> 00:02:13.690 in Munich and in Switzerland. 00:02:13.720 --> 00:02:18.800 Especially something which is called the long short-term memory. 00:02:18.800 --> 00:02:23.900 Has anybody in this room ever heard of the long short-term memory? 00:02:23.900 --> 00:02:27.820 Or the LSTM? Hands up, anybody ever heard of that? 00:02:27.820 --> 00:02:32.840 Okay. Has anybody never heard of the LSTM? 00:02:36.960 --> 00:02:46.367 I see we have a third group in this room, who didn't understand the question. 00:02:48.620 --> 00:02:52.090 The LSTM is a little bit like your brain: 00:02:52.660 --> 00:02:57.470 it's an artificial neural network which also has neurons, 00:02:58.180 --> 00:03:03.340 and in your brain, you've got about 100 billion neurons. 00:03:03.340 --> 00:03:09.670 And each of them is connected to roughly 10,000 other neurons on average, 00:03:11.400 --> 00:03:15.190 which means that you have got a million billion connections. 00:03:16.020 --> 00:03:19.680 And each of these connections has a strength which says 00:03:19.880 --> 00:03:24.750 how much this neuron over here influences that one over there at the next time step. 00:03:24.750 --> 00:03:30.320 And in the beginning all these connections are random and the system knows nothing, 00:03:30.320 --> 00:03:35.652 but then, through a smart learning algorithm, it learns from lots of examples 00:03:36.947 --> 00:03:42.868 to translate the incoming data, such as video through the cameras, 00:03:42.868 --> 00:03:45.914 or audio through the microphones, 00:03:45.914 --> 00:03:49.350 or pain signals through the pain sensors. 
00:03:49.350 --> 00:03:52.214 It learns to translate that into output actions, 00:03:52.214 --> 00:03:54.650 because some of these neurons are output neurons 00:03:54.650 --> 00:03:57.730 that control speech muscles and finger muscles. 00:03:59.220 --> 00:04:03.040 And only through experience, it can learn to solve 00:04:03.040 --> 00:04:07.980 all kinds of interesting problems, such as driving a car 00:04:10.410 --> 00:04:13.640 or doing the speech recognition on your smartphone. 00:04:13.640 --> 00:04:16.760 Because whenever you take out your smartphone, 00:04:16.760 --> 00:04:19.240 an Android phone, for example, and you speak to it, 00:04:19.240 --> 00:04:23.640 and you say: "OK Google, show me the shortest way to Milano." 00:04:23.640 --> 00:04:25.780 Then it understands your speech, 00:04:26.550 --> 00:04:29.150 because there is an LSTM in there, 00:04:29.150 --> 00:04:31.700 which has learned to understand speech. 00:04:31.700 --> 00:04:34.785 Every 10 milliseconds, 100 times a second, 00:04:34.785 --> 00:04:37.800 new inputs are coming from the microphone, 00:04:37.800 --> 00:04:40.480 and then it translates them, 00:04:40.480 --> 00:04:49.120 after thinking, into letters, which are then a question to the search engine. And it has 00:04:49.120 --> 00:04:54.320 learned to do that by listening to lots of speech from women, from men, all kinds of 00:04:55.320 --> 00:04:57.000 people. And that's how, since 00:04:57.000 --> 00:05:00.560 2015, Google's speech recognition is now much better than it used to be. 00:05:02.360 --> 00:05:07.800 The basic LSTM cell looks like that. I don't have the time to explain that, but at 00:05:07.800 --> 00:05:13.640 least I can list the names of the brilliant students in my lab who made 00:05:13.640 --> 00:05:18.320 that possible. And what are the big companies doing with 00:05:18.320 --> 00:05:26.200 that? Well, speech recognition is only one example. If you are on Facebook, is 00:05:26.200 --> 00:05:29.040 anybody on Facebook? Okay. Are you sometimes clicking at 
00:05:29.040 --> 00:05:33.080 the translate button, because somebody sent you something in a foreign language and 00:05:33.080 --> 00:05:38.200 then you can translate it? Is anybody doing that? Yeah. Whenever you do that, you 00:05:38.200 --> 00:05:41.800 are waking up, again, a long short-term memory, an LSTM, which 00:05:41.800 --> 00:05:48.880 has learned to translate text in one language into translated text. And 00:05:48.880 --> 00:05:57.000 Facebook is doing that four billion times a day, so every second 00:05:57.000 --> 00:06:03.200 50,000 sentences are being translated by an LSTM working for 00:06:03.760 --> 00:06:06.280 Facebook, and another 50,000 in the next second, and another 00:06:06.280 --> 00:06:13.120 50,000. And to see how much this thing is now permeating the modern world, 00:06:13.120 --> 00:06:21.480 just note that almost 30 percent of the awesome computational power for 00:06:21.480 --> 00:06:23.600 inference in all these Google data 00:06:23.600 --> 00:06:28.840 centres, all these data centres of Google all over the world, is used for LSTM. 00:06:28.840 --> 00:06:31.600 Almost 30 percent. If you have an 00:06:31.600 --> 00:06:38.880 Amazon Echo, you can ask it questions and it answers you, and the voice that you hear, 00:06:38.880 --> 00:06:43.400 it's not a recording; it's an LSTM network which has learned from 00:06:43.400 --> 00:06:53.560 training examples to sound like a female voice. If you have an iPhone and 00:06:53.560 --> 00:06:56.864 you're using QuickType, it's trying to predict what 00:06:56.864 --> 00:06:59.960 you want to do next, given all the previous context of what you did 00:06:59.960 --> 00:07:05.280 so far. Again, that's an LSTM which has learned to do that, so it's on 00:07:05.280 --> 00:07:15.000 a billion iPhones. You are a large audience by my standards, but when we started 00:07:15.000 --> 00:07:21.600 this work decades ago, in the early 90s, only few people were interested 00:07:21.600 --> 00:07:25.800 in that, because computers 
were so slow and you couldn't do so much with them. And I 00:07:25.800 --> 00:07:33.000 remember I gave a talk at a conference and there was just 00:07:33.000 --> 00:07:36.960 one single person in the audience, a young lady. I said, "Young lady, it's 00:07:37.760 --> 00:07:42.080 very embarrassing, but apparently today I'm going to give this talk just to 00:07:42.080 --> 00:07:54.440 you." And she said, "Okay, but please hurry, I am the next speaker." Since then, we 00:07:54.440 --> 00:08:00.800 have greatly profited from the fact that every five years computers are getting 00:08:00.800 --> 00:08:05.720 ten times cheaper, which is an old trend that has held since 1941 at 00:08:05.720 --> 00:08:11.800 least, since this man, Konrad Zuse, built the first working program-controlled computer 00:08:11.800 --> 00:08:19.880 in Berlin, and he could do roughly one operation per second. One! And then 00:08:19.880 --> 00:08:25.640 ten years later, for the same price, one could do 100 operations; 30 years later, 00:08:25.640 --> 00:08:30.040 1 million operations for the same price; and today, after 75 years, we 00:08:30.040 --> 00:08:35.480 can do a million billion times as much for the same price. And the trend is not about 00:08:35.480 --> 00:08:43.760 to stop, because the physical limits are much further out there. Rather soon, in not 00:08:44.800 --> 00:08:49.480 so many years or decades, we will for the first time have 00:08:49.480 --> 00:08:55.040 little computational devices that can compute as much as a human brain. And 00:08:55.040 --> 00:08:59.360 if this trend doesn't break, 50 years later there will be 00:08:59.360 --> 00:09:04.280 a little computational device for the same price that can compute as much as 00:09:04.280 --> 00:09:10.280 all 10 billion human brains taken together, and there will not only be one of 00:09:10.280 --> 00:09:13.120 those devices, but many, many, many. Everything 00:09:13.120 --> 00:09:18.480 is going to change. Already in 2011, computers were fast enough such that 00:09:18.480 --> 
00:09:21.920 our deep learning methods for the first time could achieve 00:09:21.920 --> 00:09:27.720 a superhuman pattern-recognition result. It was the first superhuman result in 00:09:27.720 --> 00:09:31.720 the history of computer vision. And back then, computers 00:09:31.720 --> 00:09:36.120 were 20 times more expensive than today, so today, for the same price, we can do 00:09:36.120 --> 00:09:44.640 20 times as much. And just five years ago, when computers were 00:09:44.640 --> 00:09:49.800 10 times more expensive than today, we already could win, for the first time, 00:09:49.800 --> 00:09:54.120 medical imaging competitions. What you see behind me is a slice through 00:09:54.120 --> 00:09:59.880 the female breast, and the tissue that you see there has all kinds of 00:09:59.880 --> 00:10:05.440 cells, and normally you need a trained doctor, a trained histologist, who 00:10:05.440 --> 00:10:09.720 is able to detect the dangerous cancer cells or 00:10:09.720 --> 00:10:15.480 pre-cancer cells. Now, our stupid network knows nothing about cancer, knows nothing 00:10:15.480 --> 00:10:18.800 about vision. It knows nothing in the beginning, but we can train it 00:10:18.800 --> 00:10:25.160 to imitate the human teacher, the doctor. And it became as good as or better 00:10:25.160 --> 00:10:30.640 than the best competitors. And very soon, all of medical diagnosis 00:10:30.640 --> 00:10:35.720 is going to be superhuman, and it's going to be mandatory, because 00:10:35.720 --> 00:10:42.280 it's going to be so much better than the doctors. After this, all kinds of 00:10:42.280 --> 00:10:47.520 medical imaging startups were founded, focusing just on this, because 00:10:47.520 --> 00:10:53.880 it's so important. We can also use LSTM to train robots. One important thing I 00:10:53.880 --> 00:11:01.000 want to say is that we not only have systems that slavishly imitate what humans 00:11:01.000 --> 00:11:08.960 show them. No, we also have AIs that set themselves their own goals and, 
00:11:08.960 --> 00:11:14.240 like little babies, invent their own experiments to explore 00:11:14.240 --> 00:11:18.920 the world and to figure out what you can do in the world, without a teacher, and 00:11:19.440 --> 00:11:22.560 becoming more and more general problem solvers in 00:11:22.560 --> 00:11:27.560 the process, by learning new skills on top of old skills. And this is going to 00:11:28.240 --> 00:11:34.480 scale. We call that artificial curiosity, or, a recent buzzword is PowerPlay: 00:11:34.480 --> 00:11:39.040 learning to become a more and more general problem solver by 00:11:39.720 --> 00:11:44.240 learning to invent, like a scientist, one new interesting goal after 00:11:44.800 --> 00:11:49.840 another. And it's going to scale, and I think, in not so many years from now, for 00:11:49.840 --> 00:11:55.840 the first time, we are going to have an animal-like 00:11:55.840 --> 00:12:00.920 AI (we don't have that yet) on the level of a little crow, which 00:12:00.920 --> 00:12:07.280 already can learn to use tools, for example, or a little monkey. And once we have 00:12:07.280 --> 00:12:11.240 that, it may take just a few decades to do the final step towards 00:12:11.240 --> 00:12:17.120 human-level intelligence, because technological evolution is about 00:12:17.120 --> 00:12:22.080 a million times faster than biological evolution, and 00:12:22.080 --> 00:12:30.120 biological evolution needed 3.5 billion years to evolve 00:12:30.120 --> 00:12:34.400 a monkey from scratch, but then just a few tens of millions of years 00:12:34.400 --> 00:12:38.680 afterwards to evolve human-level intelligence. We have 00:12:38.680 --> 00:12:42.880 a company which is called NNAISENSE, like "birth" in English, 00:12:42.880 --> 00:12:46.800 "nascence", but spelled in a different way, which is trying to make 00:12:46.800 --> 00:12:50.960 this a reality and build the first true general-purpose AI. At 00:12:52.520 --> 00:12:59.640 the moment, almost all research in AI is very human 
centric, and it's all about 00:12:59.640 --> 00:13:05.640 making human lives longer and healthier and easier, and making humans 00:13:05.640 --> 00:13:12.080 more addicted to their smartphones. But in the long run, AIs, 00:13:12.080 --> 00:13:16.760 especially the smart ones, are going to set themselves their own goals, and I have 00:13:17.440 --> 00:13:21.560 no doubt in my mind that they are going to become much smarter than we 00:13:21.560 --> 00:13:26.360 are. And what are they going to do? Of course, they are going to realise what we 00:13:26.360 --> 00:13:31.360 have realised a long time ago, namely that most of the resources in 00:13:31.360 --> 00:13:38.040 the solar system, or in general, are not in our little biosphere. They are out there in 00:13:38.040 --> 00:13:44.200 space. And so, of course, they are going to emigrate, and of course they 00:13:44.200 --> 00:13:53.200 are going to use trillions of self-replicating robot factories to expand 00:13:53.960 --> 00:13:56.640 in the form of a growing 00:13:56.640 --> 00:14:01.160 AI bubble, which within a few hundred thousand years is going to cover 00:14:01.160 --> 00:14:07.040 the entire galaxy with senders and receivers, such that AIs can travel the way they 00:14:07.040 --> 00:14:15.680 are already travelling in my lab: by radio, from sender to receiver, wireless. So what 00:14:15.680 --> 00:14:23.120 we are witnessing now is much more than just another industrial revolution. This is 00:14:24.040 --> 00:14:29.480 something that transcends humankind and even life itself. 00:14:29.480 --> 00:14:35.320 The last time something so important happened was maybe 3.5 billion years 00:14:35.320 --> 00:14:41.520 ago, when life was invented. A new type of life is going to emerge from 00:14:41.520 --> 00:14:45.800 our little planet, and it's going to colonise and transform 00:14:46.800 --> 00:14:51.760 the entire universe. The universe is still young; it's only 13.8 billion years 00:14:51.760 --> 00:14:57.880 old. It's going to become 
much older than that, many times older 00:14:57.880 --> 00:15:03.320 than that, so there's plenty of time to reach all of it, or all of 00:15:03.320 --> 00:15:08.640 the visible parts, totally within the limits of light speed and physics. 00:15:09.680 --> 00:15:14.920 A new type of life is going to make the universe intelligent. Now, of course, we 00:15:14.920 --> 00:15:22.000 are not going to remain the crown of creation, of course not. But there is still 00:15:22.000 --> 00:15:29.120 beauty in seeing yourself as part of a grander process that leads the cosmos 00:15:29.120 --> 00:15:35.960 from low complexity towards higher complexity. It's a privilege to live 00:15:35.960 --> 00:15:40.440 at a time where we can witness the beginnings of that, and where 00:15:40.440 --> 00:15:49.840 we can contribute something to that. Thank you for your patience.