WEBVTT

00:00:02.000 --> 00:00:06.480
[Music]

00:00:06.480 --> 00:00:14.960
When I was a boy, I wanted to
maximise my impact on the world,

00:00:14.960 --> 00:00:19.548
and I was smart enough to realise
that I am not very smart.

00:00:20.560 --> 00:00:29.295
And that I have to build a machine that
learns to become much smarter than myself,

00:00:29.865 --> 00:00:34.479
such that it can solve all the problems
that I cannot solve myself,

00:00:34.960 --> 00:00:36.040
and I can retire.

00:00:38.480 --> 00:00:42.483
And my first publication on
that dates back 30 years: 1987.

00:00:42.748 --> 00:00:48.271
My diploma thesis, where I already tried
to solve the grand problem of AI,

00:00:48.772 --> 00:00:52.928
not only build a machine that learns a
little bit here, learns a little bit there,

00:00:53.187 --> 00:00:58.398
but also learns to improve
the learning algorithm itself.

00:00:59.870 --> 00:01:04.283
And the way it learns the way
it learns, and so on, recursively,

00:01:04.745 --> 00:01:11.333
without any limits except
the limits of logic and physics.

00:01:12.781 --> 00:01:15.670
And I'm still working
on the same old thing,

00:01:15.850 --> 00:01:19.841
and I'm still pretty much
saying the same thing,

00:01:19.841 --> 00:01:23.486
except that now more people
are listening.

00:01:25.120 --> 00:01:29.840
Because the learning algorithms that we
have developed on the way to this goal,

00:01:29.840 --> 00:01:34.245
they are now on three thousand
million smartphones.

00:01:35.147 --> 00:01:37.463
And all of you have them
in your pockets.

00:01:40.046 --> 00:01:45.870
What you see here are the five most
valuable companies of the Western world:

00:01:45.870 --> 00:01:50.548
Apple, Google, Facebook,
Microsoft, and Amazon.

00:01:51.599 --> 00:01:57.567
And all of them are emphasising that AI,
artificial intelligence,

00:01:57.567 --> 00:02:00.080
is central to what they are doing.

00:02:02.120 --> 00:02:05.870
And all of them
are heavily using

00:02:06.510 --> 00:02:10.919
the deep learning methods that my team
has developed since the early nineties,

00:02:10.919 --> 00:02:13.690
in Munich and in Switzerland.

00:02:13.720 --> 00:02:18.800
Especially something called
the long short-term memory.

00:02:18.800 --> 00:02:23.900
Has anybody in this room ever heard
of the long short-term memory?

00:02:23.900 --> 00:02:27.820
Or the LSTM? Hands up,
anybody ever heard of that?

00:02:27.820 --> 00:02:32.840
Okay. Has anybody never heard of
the LSTM?

00:02:36.960 --> 00:02:46.367
I see we have a third group in this room:
who didn't understand the question.

00:02:48.620 --> 00:02:52.090
The LSTM is a little bit like your brain:

00:02:52.660 --> 00:02:57.470
it's an artificial neural network
which also has neurons,

00:02:58.180 --> 00:03:03.340
and in your brain, you've got about
100 billion neurons.

00:03:03.340 --> 00:03:09.670
And each of them is connected to
roughly 10,000 other neurons on average,

00:03:11.400 --> 00:03:15.190
which means that you have got
a million billion connections.

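NOTE
Illustrative sketch, not part of the talk: this works through the arithmetic behind "a million billion connections" using the speaker's round figures, about 100 billion neurons with about 10,000 connections each. The variable names are hypothetical.
```python
neurons = 100 * 10**9            # "about 100 billion neurons"
connections_per_neuron = 10_000  # "roughly 10,000 other neurons on average"
connections = neurons * connections_per_neuron
# 10**11 * 10**4 == 10**15, i.e. a million (10**6) times a billion (10**9)
print(connections)
```
Exact integers are used so the comparison with 10**15 involves no floating-point rounding.
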
00:03:16.020 --> 00:03:19.680
And each of these connections has
a strength which says,

00:03:19.880 --> 00:03:24.750
how much this neuron over here influences
that one over there at the next time step.

00:03:24.750 --> 00:03:30.320
And in the beginning all these connections
are random and the system knows nothing,

00:03:30.320 --> 00:03:35.652
but then, through a smart learning algorithm,
it learns from lots of examples,

00:03:36.947 --> 00:03:42.868
to translate the incoming data,
such as video through the cameras,

00:03:42.868 --> 00:03:45.914
or audio through the microphones,

00:03:45.914 --> 00:03:49.350
or pain signals through the
pain sensors.

00:03:49.350 --> 00:03:52.214
It learns to translate that
into output actions,

00:03:52.214 --> 00:03:54.650
because some of these neurons are
output neurons,

00:03:54.650 --> 00:03:57.730
that control speech muscles
and finger muscles.

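NOTE
Illustrative sketch, not part of the talk: this is a tiny recurrent network in the spirit of the description above. Connection strengths start out random. At each time step, every neuron sums the strength-weighted activity of all neurons from the previous step and adds any external input. The function names, the tanh squashing function, and the choice of neuron 0 as input and neuron 3 as output are assumptions. The learning algorithm that would adjust the strengths from examples is omitted.
```python
import math
import random
def make_weights(n_neurons, seed=0):
    # weights[i][j] = strength of the connection from neuron j to neuron i;
    # random at the start, so the network "knows nothing"
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_neurons)]
            for _ in range(n_neurons)]
def step(weights, activations, external_input):
    # Activation of every neuron at the next time step
    n = len(activations)
    return [math.tanh(sum(weights[i][j] * activations[j] for j in range(n))
                      + external_input[i])
            for i in range(n)]
weights = make_weights(4)
state = [0.0] * 4
for _ in range(3):
    # Neuron 0 plays the role of an input neuron receiving a constant sensor signal
    state = step(weights, state, [1.0, 0.0, 0.0, 0.0])
output = state[3]  # neuron 3 plays the role of an output neuron (assumption)
print(output)
```
Training would replace the random strengths with learned ones; the update rule itself stays the same.
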
00:03:59.220 --> 00:04:03.040
And only through experience,
it can learn to solve

00:04:03.040 --> 00:04:07.980
all kinds of interesting problems,
such as driving a car

00:04:10.410 --> 00:04:13.640
or doing the speech recognition
on your smartphone.

00:04:13.640 --> 00:04:16.760
Because whenever you
take out your smartphone,

00:04:16.760 --> 00:04:19.240
an Android phone, for example,
and you speak to it,

00:04:19.240 --> 00:04:23.640
and you say: "Ok Google, show me
the shortest way to Milano."

00:04:23.640 --> 00:04:25.780
Then it understands your speech,

00:04:26.550 --> 00:04:29.150
because there is an LSTM in there,

00:04:29.150 --> 00:04:31.700
which has learned to understand speech.

00:04:31.700 --> 00:04:34.785
Every 10 milliseconds,
100 times a second,

00:04:34.785 --> 00:04:37.800
new inputs are coming
from the microphone,

00:04:37.800 --> 00:04:40.480
and then it translates them,

00:04:40.480 --> 00:04:49.120
after thinking, into letters, which are then
a question to the search engine. And it has

00:04:49.120 --> 00:04:54.320
learned to do that by listening to lots of
speech from women, from men, from all kinds of

00:04:55.320 --> 00:04:57.000
people. And that's how, since

00:04:57.000 --> 00:05:00.560
2015, Google speech recognition is
now much better than it used to be.

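NOTE
Illustrative sketch, not part of the talk: this is one time step of a single scalar LSTM cell, using the widely taught variant with input, forget and output gates. A speech model would apply such a step to each new input frame, for example one every 10 milliseconds as described above. Real systems use vectors of many cells and weights learned from data. The function names, the parameter layout and the example weights are hypothetical.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single (scalar) LSTM cell with a forget gate."""
    def preact(gate):
        # params[gate] = (input weight, recurrent weight, bias): a hypothetical layout
        w_x, w_h, b = params[gate]
        return w_x * x + w_h * h_prev + b
    i = sigmoid(preact("input"))        # how much new content to write
    f = sigmoid(preact("forget"))       # how much old cell content to keep
    o = sigmoid(preact("output"))       # how much of the cell to expose
    g = math.tanh(preact("candidate"))  # proposed new content
    c = f * c_prev + i * g              # additive update of the memory cell
    h = o * math.tanh(c)                # output passed on to the next time step
    return h, c
# Usage with made-up weights, feeding one input value per frame
params = {"input": (1.0, 0.5, 0.0), "forget": (0.0, 0.0, 2.0),
          "output": (0.5, 0.5, 0.0), "candidate": (1.0, -0.5, 0.0)}
h, c = 0.0, 0.0
for frame in [0.3, -0.1, 0.8, 0.0]:
    h, c = lstm_step(frame, h, c, params)
print(h, c)
```
The additive cell update is what lets information persist across many time steps.
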
00:05:02.360 --> 00:05:07.800
The basic LSTM cell looks like that. I
don't have the time to explain that, but at

00:05:07.800 --> 00:05:13.640
least I can list the names of
the brilliant students in my lab who made

00:05:13.640 --> 00:05:18.320
that possible. And what are
the big companies doing with

00:05:18.320 --> 00:05:26.200
that? Well, speech recognition is only
one example. If you are on Facebook... Is

00:05:26.200 --> 00:05:29.040
anybody on Facebook? Okay. You
sometimes click on

00:05:29.040 --> 00:05:33.080
the translate button because somebody sent
you something in a foreign language, and

00:05:33.080 --> 00:05:38.200
then you can translate it. Is anybody
doing that? Yeah. Whenever you do that, you

00:05:38.200 --> 00:05:41.800
are waking up, again,
a long short-term memory, an LSTM, which

00:05:41.800 --> 00:05:48.880
has learned to translate text in
one language into text in another language. And

00:05:48.880 --> 00:05:57.000
Facebook is doing that four billion times
a day, so every second

00:05:57.000 --> 00:06:03.200
50,000 sentences are being translated by
an LSTM working for

00:06:03.760 --> 00:06:06.280
Facebook, and another 50,000 in the
next second, and another

00:06:06.280 --> 00:06:13.120
50,000. And to see how much this thing
is now permeating the modern world,

00:06:13.120 --> 00:06:21.480
just note that almost 30 percent of
the awesome computational power for

00:06:21.480 --> 00:06:23.600
inference in all these Google data

00:06:23.600 --> 00:06:28.840
centers, all these data centers of Google
that are all over the world, is used for LSTM:

00:06:28.840 --> 00:06:31.600
almost 30 percent. If you have an

00:06:31.600 --> 00:06:38.880
Amazon Echo, you can ask it questions, and it
answers you, and the voice that you hear

00:06:38.880 --> 00:06:43.400
is not a recording; it's
an LSTM network which has learned from

00:06:43.400 --> 00:06:53.560
training examples to sound like
a female voice. If you have an iPhone and

00:06:53.560 --> 00:06:56.864
you're using
QuickType, it's trying to predict what

00:06:56.864 --> 00:06:59.960
you want to do next, given
all the previous context of what you did

00:06:59.960 --> 00:07:05.280
so far. Again, that's an LSTM which
has to do that, so it's on

00:07:05.280 --> 00:07:15.000
a billion iPhones. You are a large audience
by my standards, but when we started

00:07:15.000 --> 00:07:21.600
this work decades ago, in the early
'90s, only a few people were interested

00:07:21.600 --> 00:07:25.800
in that, because computers were so slow and
you couldn't do so much with them. And I

00:07:25.800 --> 00:07:33.000
remember I gave a talk at
a conference, and there was just

00:07:33.000 --> 00:07:36.960
one single person in the audience,
a young lady. I said, "Young lady, it's

00:07:37.760 --> 00:07:42.080
very embarrassing, but apparently
today I'm going to give this talk just to

00:07:42.080 --> 00:07:54.440
you." And she said, "Okay, but please hurry. I
am the next speaker." Since then, we

00:07:54.440 --> 00:08:00.800
have greatly profited from the fact
that every five years computers get

00:08:00.800 --> 00:08:05.720
ten times cheaper, which is
an old trend that has held since 1941, at

00:08:05.720 --> 00:08:11.800
least since this man, Konrad Zuse, built
the first working program-controlled computer

00:08:11.800 --> 00:08:19.880
in Berlin, and he could do roughly
one operation per second. One. And then,

00:08:19.880 --> 00:08:25.640
ten years later, for the same price, one
could do 100 operations; 30 years later,

00:08:25.640 --> 00:08:30.040
1 million operations for
the same price. And today, after 75 years, we

00:08:30.040 --> 00:08:35.480
can do a million billion times as much for
the same price. And the trend is not about

00:08:35.480 --> 00:08:43.760
to stop, because the physical limits are
much further out there. Rather soon, in not

00:08:44.800 --> 00:08:49.480
so many years or decades, we will for
the first time have

00:08:49.480 --> 00:08:55.040
little computational devices that
can compute as much as a human brain. And

00:08:55.040 --> 00:08:59.360
if this trend doesn't break, 50 years
later there will be

00:08:59.360 --> 00:09:04.280
a little computational device, for
the same price, that can compute as much as

00:09:04.280 --> 00:09:10.280
all 10 billion human brains taken
together. And there will not only be one of

00:09:10.280 --> 00:09:13.120
those devices, but
many, many, many. Everything

00:09:13.120 --> 00:09:18.480
is going to change. Already in
2011, computers were fast enough such that

00:09:18.480 --> 00:09:21.920
our deep learning methods for
the first time could achieve

00:09:21.920 --> 00:09:27.720
a superhuman pattern-recognition result. It
was the first superhuman result in

00:09:27.720 --> 00:09:31.720
the history of
computer vision. And back then, computers

00:09:31.720 --> 00:09:36.120
were 20 times more expensive than today, so
today, for the same price, we can do

00:09:36.120 --> 00:09:44.640
20 times as much. And just
five years ago, when computers were

00:09:44.640 --> 00:09:49.800
10 times more expensive than today, we
already could win, for the first time,

00:09:49.800 --> 00:09:54.120
medical imaging competitions. What you see
behind me is a slice through

00:09:54.120 --> 00:09:59.880
the female breast, and the tissue that you
see there has all kinds of

00:09:59.880 --> 00:10:05.440
cells, and normally you need
a trained doctor, a trained histologist, who

00:10:05.440 --> 00:10:09.720
is able to detect
the dangerous cancer cells or

00:10:09.720 --> 00:10:15.480
pre-cancer cells. Now, our stupid network
knows nothing about cancer, knows nothing

00:10:15.480 --> 00:10:18.800
about vision; it knows nothing in
the beginning, but we can train it

00:10:18.800 --> 00:10:25.160
to imitate the human teacher,
the doctor, and it became as good as or better

00:10:25.160 --> 00:10:30.640
than the best competitors. And
very soon, all of medical diagnosis

00:10:30.640 --> 00:10:35.720
is going to be superhuman, and
it's going to be mandatory, because

00:10:35.720 --> 00:10:42.280
it's going to be so much better than
the doctors. After this, all kinds of

00:10:42.280 --> 00:10:47.520
medical imaging startups
were founded, focusing just on this, because

00:10:47.520 --> 00:10:53.880
it's so important. We can also use LSTM
to train robots. One important thing I

00:10:53.880 --> 00:11:01.000
want to say is that we not only have
systems that slavishly imitate what humans

00:11:01.000 --> 00:11:08.960
show them. No, we also have AIs that set
themselves their own goals and,

00:11:08.960 --> 00:11:14.240
like little babies, invent
their own experiments to explore

00:11:14.240 --> 00:11:18.920
the world and to figure out what they
can do in the world without a teacher, and

00:11:19.440 --> 00:11:22.560
becoming more and
more general problem solvers in

00:11:22.560 --> 00:11:27.560
the process, by learning new skills on top
of old skills. And this is going to

00:11:28.240 --> 00:11:34.480
scale. We call that artificial curiosity, or
a recent buzzword is PowerPlay:

00:11:34.480 --> 00:11:39.040
learning to become a more and
more general problem solver by

00:11:39.720 --> 00:11:44.240
learning to invent, like a scientist,
one new interesting goal after

00:11:44.800 --> 00:11:49.840
another. And it's going to scale, and I
think in not so many years from now, for

00:11:49.840 --> 00:11:55.840
the first time, we are going to have
an animal-like

00:11:55.840 --> 00:12:00.920
AI. We don't have that yet, on the level of
a little crow, which

00:12:00.920 --> 00:12:07.280
already can learn to use tools, or, for
example, a little monkey. And once we have

00:12:07.280 --> 00:12:11.240
that, it may take just a few decades to do
the final step towards

00:12:11.240 --> 00:12:17.120
human-level intelligence, because
technological evolution is about

00:12:17.120 --> 00:12:22.080
a million times, a million times faster
than biological evolution. And

00:12:22.080 --> 00:12:30.120
biological evolution needed
3.5 billion years to evolve a monkey

00:12:30.120 --> 00:12:34.400
from scratch, but then just
a few tens of millions of years

00:12:34.400 --> 00:12:38.680
afterwards to evolve
human-level intelligence. We have

00:12:38.680 --> 00:12:42.880
a company called NNAISENSE,
like "birth" in French,

00:12:42.880 --> 00:12:46.800
naissance, but spelled in
a different way, which is trying to make

00:12:46.800 --> 00:12:50.960
this a reality and build
the first true general-purpose AI. At

00:12:52.520 --> 00:12:59.640
the moment, almost all research in AI is
very human-centric, and it's all about

00:12:59.640 --> 00:13:05.640
making human lives longer and
healthier and easier, and making humans

00:13:05.640 --> 00:13:12.080
more addicted to their smartphones. But
in the long run, AIs,

00:13:12.080 --> 00:13:16.760
especially the smart ones, are going to set
themselves their own goals. And I have

00:13:17.440 --> 00:13:21.560
no doubt in my mind that they
are going to become much smarter than we

00:13:21.560 --> 00:13:26.360
are. And what are they going to do? Of
course, they are going to realize what we

00:13:26.360 --> 00:13:31.360
have realized a long time ago, namely that
most of the resources in

00:13:31.360 --> 00:13:38.040
the solar system, or in general, are not in
our little biosphere. They are out there in

00:13:38.040 --> 00:13:44.200
space. And so, of course, they
are going to emigrate, and of course they

00:13:44.200 --> 00:13:53.200
are going to use trillions of
self-replicating robot factories to expand

00:13:53.960 --> 00:13:56.640
in the form of a growing

00:13:56.640 --> 00:14:01.160
AI bubble, which within a few
hundred thousand years is going to cover

00:14:01.160 --> 00:14:07.040
the entire galaxy with senders and receivers,
such that AIs can travel the way they

00:14:07.040 --> 00:14:15.680
are already traveling in my lab: by radio,
from sender to receiver, wirelessly. So what

00:14:15.680 --> 00:14:23.120
we are witnessing now is much more than
just another industrial revolution. This is

00:14:24.040 --> 00:14:29.480
something that transcends
humankind and even life itself.

00:14:29.480 --> 00:14:35.320
The last time something so important
happened was maybe 3.5 billion years

00:14:35.320 --> 00:14:41.520
ago, when life was invented. A new type of
life is going to emerge from

00:14:41.520 --> 00:14:45.800
our little planet, and
it's going to colonize and transform

00:14:46.800 --> 00:14:51.760
the entire universe. The universe is
still young; it's only 13.8 billion years

00:14:51.760 --> 00:14:57.880
old. It's going to become much older than
that, many times older

00:14:57.880 --> 00:15:03.320
than that. So there's plenty of time
to reach all of it, or all of

00:15:03.320 --> 00:15:08.640
the visible parts, totally within
the limits of light speed and physics.

00:15:09.680 --> 00:15:14.920
A new type of life is going to make
the universe intelligent. Now, of course, we

00:15:14.920 --> 00:15:22.000
are not going to remain the crown of
creation. Of course not. But there is still

00:15:22.000 --> 00:15:29.120
beauty in seeing yourself as part of
a grander process that leads the cosmos

00:15:29.120 --> 00:15:35.960
from low complexity towards
higher complexity. It's a privilege to live

00:15:35.960 --> 00:15:40.440
at a time where we can
witness the beginnings of that, and where

00:15:40.440 --> 00:15:49.840
we can contribute something to
that. Thank you for your patience.