[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,0:00:00.00,0:00:18.23,Default,,0000,0000,0000,,{\i1}35C3 preroll music{\i0}\N Dialogue: 0,0:00:18.23,0:00:24.75,Default,,0000,0000,0000,,Herald Angel: Welcome to our introduction\Nto deep learning with Teubi. Deep Dialogue: 0,0:00:24.75,0:00:30.25,Default,,0000,0000,0000,,learning, also often called machine\Nlearning, is a hype word which we hear in Dialogue: 0,0:00:30.25,0:00:37.15,Default,,0000,0000,0000,,the media all the time. It's nearly as bad\Nas blockchain. It's a solution for Dialogue: 0,0:00:37.15,0:00:43.25,Default,,0000,0000,0000,,everything. Today we'll get a sneak peek\Ninto the internals of this mystical black Dialogue: 0,0:00:43.25,0:00:48.82,Default,,0000,0000,0000,,box they are talking about. And Teubi\Nwill show us why people who know what Dialogue: 0,0:00:48.82,0:00:53.04,Default,,0000,0000,0000,,machine learning really is about have to\Nfacepalm so often when they read the Dialogue: 0,0:00:53.04,0:00:58.72,Default,,0000,0000,0000,,news. So please welcome Teubi\Nwith a big round of applause! Dialogue: 0,0:00:58.72,0:01:10.24,Default,,0000,0000,0000,,{\i1}Applause{\i0}\NTeubi: Alright! Good morning and welcome Dialogue: 0,0:01:10.24,0:01:14.47,Default,,0000,0000,0000,,to Introduction to Deep Learning. The\Ntitle will already tell you what this talk Dialogue: 0,0:01:14.47,0:01:19.92,Default,,0000,0000,0000,,is about. I want to give you an\Nintroduction to how deep learning works, Dialogue: 0,0:01:19.92,0:01:27.09,Default,,0000,0000,0000,,what happens inside this black box. But,\Nfirst of all, who am I? I'm Teubi. It's a Dialogue: 0,0:01:27.09,0:01:32.28,Default,,0000,0000,0000,,German nickname, it has nothing to do with\Ntoys or bees. You might have heard my Dialogue: 0,0:01:32.28,0:01:36.48,Default,,0000,0000,0000,,voice before, because I host the\NNussschale podcast. 
There I explain Dialogue: 0,0:01:36.48,0:01:41.56,Default,,0000,0000,0000,,scientific topics in under 10 minutes.\NI'll have to use a little more time today, Dialogue: 0,0:01:41.56,0:01:46.85,Default,,0000,0000,0000,,and you'll also have fancy animations\Nwhich hopefully will help. In my day job Dialogue: 0,0:01:46.85,0:01:52.54,Default,,0000,0000,0000,,I'm a research scientist at an institute\Nfor computer vision. I analyze microscopy Dialogue: 0,0:01:52.54,0:01:58.24,Default,,0000,0000,0000,,images of bone marrow blood cells and try\Nto find ways to teach the computer to Dialogue: 0,0:01:58.24,0:02:04.66,Default,,0000,0000,0000,,understand what it sees. Namely, to\Ndifferentiate between certain cells or, Dialogue: 0,0:02:04.66,0:02:09.45,Default,,0000,0000,0000,,first of all, find cells in an image,\Nwhich is a task that is more complex than Dialogue: 0,0:02:09.45,0:02:17.18,Default,,0000,0000,0000,,it might sound. Let me start with the\Nintroduction to deep learning. We all know Dialogue: 0,0:02:17.18,0:02:22.77,Default,,0000,0000,0000,,how to code. We code in a very simple way.\NWe have some input for our computer Dialogue: 0,0:02:22.77,0:02:27.62,Default,,0000,0000,0000,,algorithm. Then we have an algorithm which\Nsays: Do this, do that. If this, then Dialogue: 0,0:02:28.51,0:02:28.91,Default,,0000,0000,0000,,that. And in that way we generate some\Noutput. This is not how machine learning Dialogue: 0,0:02:29.50,0:02:30.75,Default,,0000,0000,0000,,works. Machine learning assumes you have\Nsome input, and you also have some output. Dialogue: 0,0:02:40.81,0:02:46.18,Default,,0000,0000,0000,,And what you also have is some statistical\Nmodel. This statistical model is flexible. Dialogue: 0,0:02:46.18,0:02:51.55,Default,,0000,0000,0000,,It has certain parameters, which it can\Nlearn from the distribution of inputs and Dialogue: 0,0:02:51.55,0:02:57.43,Default,,0000,0000,0000,,outputs you give it for training. 
So you\Nbasically train the statistical model to Dialogue: 0,0:02:57.43,0:03:03.66,Default,,0000,0000,0000,,generate the desired output from the given\Ninput. Let me give you a really simple Dialogue: 0,0:03:03.66,0:03:09.98,Default,,0000,0000,0000,,example of how this might work. Let's say\Nwe have two animals. Well, we have two Dialogue: 0,0:03:09.98,0:03:15.69,Default,,0000,0000,0000,,kinds of animals: unicorns and rabbits.\NAnd now we want to find an algorithm that Dialogue: 0,0:03:15.69,0:03:24.27,Default,,0000,0000,0000,,tells us whether this animal we have right\Nnow as an input is a rabbit or a unicorn. Dialogue: 0,0:03:24.27,0:03:28.23,Default,,0000,0000,0000,,We can write a simple algorithm to do\Nthat, but we can also do it with machine Dialogue: 0,0:03:28.23,0:03:34.59,Default,,0000,0000,0000,,learning. The first thing we need is some\Ninput. I choose two features that are able Dialogue: 0,0:03:34.59,0:03:42.27,Default,,0000,0000,0000,,to tell me whether this animal is a rabbit\Nor a unicorn. Namely, speed and size. We Dialogue: 0,0:03:42.27,0:03:46.86,Default,,0000,0000,0000,,call these features, and they describe\Nsomething about what we want to classify. Dialogue: 0,0:03:46.86,0:03:52.41,Default,,0000,0000,0000,,And the class is in this case our animal.\NFirst thing I need is some training data, Dialogue: 0,0:03:52.41,0:03:59.17,Default,,0000,0000,0000,,some input. The input here are just pairs\Nof speed and size. What I also need is Dialogue: 0,0:03:59.17,0:04:04.13,Default,,0000,0000,0000,,information about the desired output. The\Ndesired output, of course, being the Dialogue: 0,0:04:04.13,0:04:12.10,Default,,0000,0000,0000,,class. So either unicorn or rabbit, here\Ndenoted by yellow and red X's. 
So let's Dialogue: 0,0:04:12.10,0:04:18.30,Default,,0000,0000,0000,,try to find a statistical model which we\Ncan use to separate this feature space Dialogue: 0,0:04:18.30,0:04:24.15,Default,,0000,0000,0000,,into two halves: One for the rabbits, one\Nfor the unicorns. Looking at this, we can Dialogue: 0,0:04:24.15,0:04:28.66,Default,,0000,0000,0000,,actually find a really simple statistical\Nmodel, and our statistical model in this Dialogue: 0,0:04:28.66,0:04:34.39,Default,,0000,0000,0000,,case is just a straight line. And the\Nlearning process is then to find where in Dialogue: 0,0:04:34.39,0:04:41.08,Default,,0000,0000,0000,,this feature space the line should be.\NIdeally, for example, here. Right in the Dialogue: 0,0:04:41.08,0:04:45.22,Default,,0000,0000,0000,,middle between the two classes rabbit and\Nunicorn. Of course this is an overly Dialogue: 0,0:04:45.22,0:04:50.37,Default,,0000,0000,0000,,simplified example. Real-world\Napplications have feature distributions Dialogue: 0,0:04:50.37,0:04:56.08,Default,,0000,0000,0000,,which look much more like this. So, we\Nhave a gradient, we don't have a perfect Dialogue: 0,0:04:56.08,0:05:00.13,Default,,0000,0000,0000,,separation between those two classes, and\Nthose two classes are definitely not Dialogue: 0,0:05:00.13,0:05:05.56,Default,,0000,0000,0000,,separable by a line. 
If we look again at\Nsome training samples — training samples Dialogue: 0,0:05:05.56,0:05:11.73,Default,,0000,0000,0000,,are the data points we use for the machine\Nlearning process, so, to try to find the Dialogue: 0,0:05:11.73,0:05:17.54,Default,,0000,0000,0000,,parameters of our statistical model — if\Nwe look at the line again, then this will Dialogue: 0,0:05:17.54,0:05:23.00,Default,,0000,0000,0000,,not be able to separate this training set.\NWell, we will have a line that has some Dialogue: 0,0:05:23.00,0:05:27.32,Default,,0000,0000,0000,,errors, some unicorns which will be\Nclassified as rabbits, some rabbits which Dialogue: 0,0:05:27.32,0:05:33.07,Default,,0000,0000,0000,,will be classified as unicorns. This is\Nwhat we call underfitting. Our model is Dialogue: 0,0:05:33.07,0:05:40.15,Default,,0000,0000,0000,,just not able to express what we want it\Nto learn. There is the opposite case. The Dialogue: 0,0:05:40.15,0:05:45.51,Default,,0000,0000,0000,,opposite case being: we just learn all the\Ntraining samples by heart. This is if we Dialogue: 0,0:05:45.51,0:05:50.02,Default,,0000,0000,0000,,have a very complex model and just a few\Ntraining samples to teach the model what Dialogue: 0,0:05:50.02,0:05:55.12,Default,,0000,0000,0000,,it should learn. In this case we have a\Nperfect separation of unicorns and Dialogue: 0,0:05:55.12,0:06:00.70,Default,,0000,0000,0000,,rabbits, at least for the few data points\Nwe have. If we draw another example from Dialogue: 0,0:06:00.70,0:06:07.30,Default,,0000,0000,0000,,the real world, some other data points,\Nthey will most likely be wrong. And this Dialogue: 0,0:06:07.30,0:06:11.38,Default,,0000,0000,0000,,is what we call overfitting. 
The perfect\Nscenario in this case would be something Dialogue: 0,0:06:11.38,0:06:17.34,Default,,0000,0000,0000,,like this: a classifier which is really\Nclose to the distribution we have in the Dialogue: 0,0:06:17.34,0:06:23.35,Default,,0000,0000,0000,,real world and machine learning is tasked\Nwith finding this perfect model and its Dialogue: 0,0:06:23.35,0:06:28.96,Default,,0000,0000,0000,,parameters. Let me show you a different\Nkind of model, something you probably all Dialogue: 0,0:06:28.96,0:06:35.67,Default,,0000,0000,0000,,have heard about: Neural networks. Neural\Nnetworks are inspired by the brain. Dialogue: 0,0:06:35.67,0:06:41.21,Default,,0000,0000,0000,,Or more precisely, by the neurons in our\Nbrain. Neurons are tiny objects, tiny Dialogue: 0,0:06:41.21,0:06:47.25,Default,,0000,0000,0000,,cells in our brain that take some input\Nand generate some output. Sounds familiar, Dialogue: 0,0:06:47.25,0:06:52.68,Default,,0000,0000,0000,,right? We have inputs usually in the form\Nof electrical signals. And if they are Dialogue: 0,0:06:52.68,0:06:57.86,Default,,0000,0000,0000,,strong enough, this neuron will also send\Nout an electrical signal. And this is Dialogue: 0,0:06:57.86,0:07:03.43,Default,,0000,0000,0000,,something we can model in a computer-\Nengineering way. So, what we do is: We Dialogue: 0,0:07:03.43,0:07:09.24,Default,,0000,0000,0000,,take a neuron. The neuron is just a simple\Nmapping from input to output. Input here, Dialogue: 0,0:07:09.24,0:07:17.20,Default,,0000,0000,0000,,just three input nodes. We denote them by\Ni1, i2 and i3 and output denoted by o. And Dialogue: 0,0:07:17.20,0:07:20.84,Default,,0000,0000,0000,,now you will actually see some\Nmathematical equations. There are not many Dialogue: 0,0:07:20.84,0:07:26.70,Default,,0000,0000,0000,,of these in this foundation talk, don't\Nworry, and it's really simple. 
There's one Dialogue: 0,0:07:26.70,0:07:30.25,Default,,0000,0000,0000,,more thing we need first, though, if we\Nwant to map input to output in the way a Dialogue: 0,0:07:30.25,0:07:35.49,Default,,0000,0000,0000,,neuron does. Namely, the weights. The\Nweights are just some arbitrary numbers Dialogue: 0,0:07:35.49,0:07:43.02,Default,,0000,0000,0000,,for now. Let's call them w1, w2 and w3.\NSo, we take those weights and we multiply Dialogue: 0,0:07:43.02,0:07:51.36,Default,,0000,0000,0000,,them with the input. Input1 times weight1,\Ninput2 times weight2, and so on. And this Dialogue: 0,0:07:51.36,0:07:57.55,Default,,0000,0000,0000,,sum will just be our output. Well,\Nnot quite. We make it a little bit more Dialogue: 0,0:07:57.55,0:08:02.43,Default,,0000,0000,0000,,complicated. We also use something called\Nan activation function. The activation Dialogue: 0,0:08:02.43,0:08:08.52,Default,,0000,0000,0000,,function is just a mapping from one scalar\Nvalue to another scalar value. In this Dialogue: 0,0:08:08.52,0:08:14.28,Default,,0000,0000,0000,,case from what we got as an output, the\Nsum, to something that more closely fits Dialogue: 0,0:08:14.28,0:08:19.36,Default,,0000,0000,0000,,what we need. This could for example be\Nsomething binary, where we have all the Dialogue: 0,0:08:19.36,0:08:23.78,Default,,0000,0000,0000,,negative numbers being mapped to zero and\Nall the positive numbers being mapped to Dialogue: 0,0:08:23.78,0:08:30.91,Default,,0000,0000,0000,,one. And then this zero and one can encode\Nsomething. For example: rabbit or unicorn. Dialogue: 0,0:08:30.91,0:08:35.31,Default,,0000,0000,0000,,So, let me give you an example of how we\Ncan make the previous example with the Dialogue: 0,0:08:35.31,0:08:41.73,Default,,0000,0000,0000,,rabbits and unicorns work with such a\Nsimple neuron. We just use speed, size, Dialogue: 0,0:08:41.73,0:08:49.65,Default,,0000,0000,0000,,and the arbitrarily chosen number 10 as\Nour inputs and the weights 1, 1, and -1. 
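As a sketch of what this single neuron computes: the weights 1, 1, -1 and the constant input 10 are the ones from the talk, while the function names and the helper structure are made up for illustration.

```python
def step(x):
    # binary activation function: negative sums map to 0, positive sums to 1
    return 1 if x > 0 else 0

def neuron(inputs, weights):
    # weighted sum of the inputs, passed through the activation function
    return step(sum(i * w for i, w in zip(inputs, weights)))

def classify(speed, size):
    # speed, size, and the constant 10 as inputs; weights 1, 1, -1,
    # so the neuron fires (1, "unicorn") exactly when speed + size > 10
    return "unicorn" if neuron([speed, size, 10], [1, 1, -1]) else "rabbit"
```

With these fixed weights the neuron is exactly the separating line speed + size = 10 from the talk's example.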
Dialogue: 0,0:08:49.65,0:08:54.40,Default,,0000,0000,0000,,If we look at the equations, then we get\Nfor our negative numbers — so, speed plus Dialogue: 0,0:08:54.40,0:09:01.44,Default,,0000,0000,0000,,size being less than 10 — a 0, and a 1 for\Nall positive numbers — speed plus Dialogue: 0,0:09:01.44,0:09:07.68,Default,,0000,0000,0000,,size being greater than 10. This\Nway we again have a separating line Dialogue: 0,0:09:07.68,0:09:14.60,Default,,0000,0000,0000,,between unicorns and rabbits. But again we\Nhave this really simplistic model. We want Dialogue: 0,0:09:14.60,0:09:21.53,Default,,0000,0000,0000,,it to become more and more complex in\Norder to express more complex tasks. So Dialogue: 0,0:09:21.53,0:09:26.28,Default,,0000,0000,0000,,what do we do? We take more neurons. We\Ntake our three input values and put them Dialogue: 0,0:09:26.28,0:09:31.92,Default,,0000,0000,0000,,into one neuron, and into a second neuron,\Nand into a third neuron. And we take the Dialogue: 0,0:09:31.92,0:09:38.33,Default,,0000,0000,0000,,output of those three neurons as input for\Nanother neuron. We also call this a Dialogue: 0,0:09:38.33,0:09:42.14,Default,,0000,0000,0000,,multilayer perceptron, perceptron just\Nbeing a different name for the neuron Dialogue: 0,0:09:42.14,0:09:48.67,Default,,0000,0000,0000,,we have there. And the whole thing is also\Ncalled a neural network. So now the Dialogue: 0,0:09:48.67,0:09:53.30,Default,,0000,0000,0000,,question: How do we train this? How do we\Nlearn what this network should encode? Dialogue: 0,0:09:53.30,0:09:57.62,Default,,0000,0000,0000,,Well, we want a mapping from input to\Noutput, and what we can change are the Dialogue: 0,0:09:57.62,0:10:02.88,Default,,0000,0000,0000,,weights. First, what we do is we take a\Ntraining sample, some input. Put it Dialogue: 0,0:10:02.88,0:10:07.01,Default,,0000,0000,0000,,through the network, get an output. 
But\Nthis might not be the desired output which Dialogue: 0,0:10:07.01,0:10:13.57,Default,,0000,0000,0000,,we know. So, in the binary case there are\Nfour possible cases: computed output, Dialogue: 0,0:10:13.57,0:10:19.86,Default,,0000,0000,0000,,expected output, each two values, 0 and 1.\NThe best case would be: we want a 0, get a Dialogue: 0,0:10:19.86,0:10:27.12,Default,,0000,0000,0000,,0, want a 1 and get a 1. But there is also\Nthe opposite case. In these two cases we Dialogue: 0,0:10:27.12,0:10:31.44,Default,,0000,0000,0000,,can learn something about our model.\NNamely, in which direction to change the Dialogue: 0,0:10:31.44,0:10:37.27,Default,,0000,0000,0000,,weights. It's a little bit simplified, but\Nin principle you just raise the weights if Dialogue: 0,0:10:37.27,0:10:41.25,Default,,0000,0000,0000,,you need a higher number as output and you\Nlower the weights if you need a lower Dialogue: 0,0:10:41.25,0:10:47.35,Default,,0000,0000,0000,,number as output. To tell you how much, we\Nhave two terms. First term being the Dialogue: 0,0:10:47.35,0:10:53.11,Default,,0000,0000,0000,,error, so in this case just the difference\Nbetween desired and computed output – also Dialogue: 0,0:10:53.11,0:10:56.89,Default,,0000,0000,0000,,often called a loss function, especially\Nin deep learning and more complex Dialogue: 0,0:10:56.89,0:11:04.12,Default,,0000,0000,0000,,applications. You also have a second term\Nwe call the learning rate, and the Dialogue: 0,0:11:04.12,0:11:09.17,Default,,0000,0000,0000,,learning rate is what tells us how quickly\Nwe should change the weights, how quickly Dialogue: 0,0:11:09.17,0:11:14.89,Default,,0000,0000,0000,,we should adapt the weights. Okay, this is\Nhow we learn a model. This is almost Dialogue: 0,0:11:14.89,0:11:18.55,Default,,0000,0000,0000,,everything you need to know. 
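The simplified rule described here — raise or lower each weight a little, scaled by the error and the learning rate — can be sketched as follows. This is an illustrative perceptron-style update under the talk's simplifications, not the speaker's own code; the names and the default learning rate are mine.

```python
def update_weights(weights, inputs, desired, computed, learning_rate=0.1):
    # error: the difference between desired and computed output
    error = desired - computed
    # raise a weight if the output was too low, lower it if too high,
    # scaled by the corresponding input and by the learning rate
    return [w + learning_rate * error * i for w, i in zip(weights, inputs)]
```

When desired and computed output agree the error is zero and the weights stay untouched, matching the two "good" cases in the talk.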
There are\Nmathematical equations that tell you how Dialogue: 0,0:11:18.55,0:11:23.77,Default,,0000,0000,0000,,much to change based on the error and the\Nlearning rate. And this is the entire Dialogue: 0,0:11:23.77,0:11:30.34,Default,,0000,0000,0000,,learning process. Let's get back to the\Nterminology. We have the input layer. We Dialogue: 0,0:11:30.34,0:11:34.02,Default,,0000,0000,0000,,have the output layer, which somehow\Nencodes our output either in one value or Dialogue: 0,0:11:34.02,0:11:39.65,Default,,0000,0000,0000,,in several values,\Nif we have multiple classes. We also have Dialogue: 0,0:11:39.65,0:11:45.93,Default,,0000,0000,0000,,the hidden layers, which are actually what\Nmakes our model deep. What we can change, Dialogue: 0,0:11:45.93,0:11:51.98,Default,,0000,0000,0000,,what we can learn, are the weights,\Nthe parameters of this model. But what we Dialogue: 0,0:11:51.98,0:11:55.49,Default,,0000,0000,0000,,also need to keep in mind is the number\Nof layers, the number of neurons per Dialogue: 0,0:11:55.49,0:11:59.59,Default,,0000,0000,0000,,layer, the learning rate, and the\Nactivation function. These are called Dialogue: 0,0:11:59.59,0:12:04.24,Default,,0000,0000,0000,,hyperparameters, and they determine how\Ncomplex our model is, how well it is Dialogue: 0,0:12:04.24,0:12:09.97,Default,,0000,0000,0000,,suited to solve the task at hand. I quite\Noften spoke about solving tasks, so the Dialogue: 0,0:12:09.97,0:12:14.63,Default,,0000,0000,0000,,question is: What can we actually do with\Nneural networks? Mostly classification Dialogue: 0,0:12:14.63,0:12:19.56,Default,,0000,0000,0000,,tasks, for example: Tell me, is this\Nanimal a rabbit or unicorn? Is this text Dialogue: 0,0:12:19.56,0:12:24.69,Default,,0000,0000,0000,,message spam or legitimate? Is this\Npatient healthy or ill? Is this image a Dialogue: 0,0:12:24.69,0:12:30.71,Default,,0000,0000,0000,,picture of a cat or a dog? 
We already saw\Nfor the animal that we need something Dialogue: 0,0:12:30.71,0:12:35.04,Default,,0000,0000,0000,,called features, which somehow encodes\Ninformation about what we want to Dialogue: 0,0:12:35.04,0:12:39.53,Default,,0000,0000,0000,,classify, something we can use as input\Nfor the neural network. Some kind of Dialogue: 0,0:12:39.53,0:12:43.83,Default,,0000,0000,0000,,number that is meaningful. So, for the\Nanimal it could be speed, size, or Dialogue: 0,0:12:43.83,0:12:48.74,Default,,0000,0000,0000,,something like color. Color, of course,\Nbeing more complex again, because we have, Dialogue: 0,0:12:48.74,0:12:55.94,Default,,0000,0000,0000,,for example, RGB, so three values. And a\Ntext message being a more complex case Dialogue: 0,0:12:55.94,0:13:00.06,Default,,0000,0000,0000,,again, because we somehow need to encode\Nthe sender, and whether the sender is Dialogue: 0,0:13:00.06,0:13:04.77,Default,,0000,0000,0000,,legitimate. Same for the recipient, or the\Nnumber of hyperlinks, or where the Dialogue: 0,0:13:04.77,0:13:11.40,Default,,0000,0000,0000,,hyperlinks refer to, or whether certain\Nwords are present in the text. It Dialogue: 0,0:13:11.40,0:13:16.72,Default,,0000,0000,0000,,gets more and more complicated. Even more\Nso for a patient. How do we encode medical Dialogue: 0,0:13:16.72,0:13:22.42,Default,,0000,0000,0000,,history in a proper way for the network to\Nlearn? I mean, temperature is simple. It's Dialogue: 0,0:13:22.42,0:13:26.75,Default,,0000,0000,0000,,a scalar value, we just have a number. But\Nhow do we encode whether certain symptoms Dialogue: 0,0:13:26.75,0:13:32.72,Default,,0000,0000,0000,,are present? And the image, which is\Nactually what I work with every day, is Dialogue: 0,0:13:32.72,0:13:38.35,Default,,0000,0000,0000,,again quite complex. 
We have values, we\Nhave numbers, but only pixel values, which Dialogue: 0,0:13:38.35,0:13:43.45,Default,,0000,0000,0000,,are difficult to\Nuse as input for a neural network. Why? Dialogue: 0,0:13:43.45,0:13:48.35,Default,,0000,0000,0000,,I'll show you. I'll actually show you with\Nthis picture, it's a very famous picture, Dialogue: 0,0:13:48.35,0:13:53.97,Default,,0000,0000,0000,,and everybody uses it in computer vision.\NThey will tell you, it's because there is Dialogue: 0,0:13:53.97,0:14:01.01,Default,,0000,0000,0000,,a multitude of different characteristics\Nin this image: shapes, edges, whatever you Dialogue: 0,0:14:01.01,0:14:07.08,Default,,0000,0000,0000,,desire. The truth is, it's a crop from the\Ncentrefold of the Playboy, and in earlier Dialogue: 0,0:14:07.08,0:14:12.07,Default,,0000,0000,0000,,years, computer vision engineers were a\Nmostly male audience. Anyway, let's take Dialogue: 0,0:14:12.07,0:14:16.85,Default,,0000,0000,0000,,five by five pixels. Let's assume this is\Nfive by five pixels, a really small Dialogue: 0,0:14:16.85,0:14:22.23,Default,,0000,0000,0000,,image. If we take those 25 pixels and use\Nthem as input for a neural network you Dialogue: 0,0:14:22.23,0:14:26.73,Default,,0000,0000,0000,,already see that we have many connections\N- many weights - which means a very Dialogue: 0,0:14:26.73,0:14:32.54,Default,,0000,0000,0000,,complex model. Complex model, of course,\Nprone to overfitting. But there are more Dialogue: 0,0:14:32.54,0:14:38.80,Default,,0000,0000,0000,,problems. First being, we have\Ndisconnected a Dialogue: 0,0:14:38.80,0:14:43.67,Default,,0000,0000,0000,,pixel from its neighbors. We can't encode\Ninformation about the neighborhood Dialogue: 0,0:14:43.67,0:14:47.85,Default,,0000,0000,0000,,anymore, and that really sucks. 
If we just\Ntake the whole picture, and move it to the Dialogue: 0,0:14:47.85,0:14:52.79,Default,,0000,0000,0000,,left or to the right by just one pixel,\Nthe network will see something completely Dialogue: 0,0:14:52.79,0:14:58.47,Default,,0000,0000,0000,,different, even though to us it is exactly\Nthe same. But, we can solve that with some Dialogue: 0,0:14:58.47,0:15:03.40,Default,,0000,0000,0000,,very clever engineering, something we call\Na convolutional layer. It is again a Dialogue: 0,0:15:03.40,0:15:08.86,Default,,0000,0000,0000,,hidden layer in a neural network, but it\Ndoes something special. It actually is a Dialogue: 0,0:15:08.86,0:15:13.97,Default,,0000,0000,0000,,very simple neuron again, just four input\Nvalues - one output value. But the four Dialogue: 0,0:15:13.97,0:15:19.78,Default,,0000,0000,0000,,input values look at two by two pixels,\Nand encode one output value. And then the Dialogue: 0,0:15:19.78,0:15:23.79,Default,,0000,0000,0000,,same network is shifted to the right, and\Nencodes another pixel, and another pixel, Dialogue: 0,0:15:23.79,0:15:30.15,Default,,0000,0000,0000,,and the next row of pixels. And in this\Nway creates another 2D image. We have Dialogue: 0,0:15:30.15,0:15:34.90,Default,,0000,0000,0000,,preserved information about the\Nneighborhood, and we just have a very low Dialogue: 0,0:15:34.90,0:15:41.91,Default,,0000,0000,0000,,number of weights, not the huge number of\Nparameters we saw earlier. We can use this Dialogue: 0,0:15:41.91,0:15:49.64,Default,,0000,0000,0000,,once, or twice, or several hundred times.\NAnd this is actually where we go deep. Dialogue: 0,0:15:49.64,0:15:54.92,Default,,0000,0000,0000,,Deep means: We have several layers, and\Nhaving layers that don't need thousands or Dialogue: 0,0:15:54.92,0:16:01.04,Default,,0000,0000,0000,,millions of connections, but only a few.\NThis is what allows us to go really deep. 
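The sliding two-by-two window just described can be sketched in plain Python. The kernel weights here are placeholders chosen for the example; in a real network they would be learned, and the same four weights are reused at every window position:

```python
def conv2x2(image, kernel):
    # image: 2D list of pixel values; kernel: the 4 shared weights of the
    # tiny neuron, flattened as [top-left, top-right, bottom-left, bottom-right]
    height, width = len(image), len(image[0])
    output = []
    for y in range(height - 1):        # shift the window down, row by row
        row = []
        for x in range(width - 1):     # shift the same window to the right
            patch = [image[y][x],     image[y][x + 1],
                     image[y + 1][x], image[y + 1][x + 1]]
            # one small neuron: weighted sum over the 2x2 neighborhood
            row.append(sum(p * w for p, w in zip(patch, kernel)))
        output.append(row)
    return output
```

Note how a 3x3 input yields a 2x2 output: neighborhood information is preserved while only four weights exist in total, instead of one weight per pixel-to-neuron connection.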
Dialogue: 0,0:16:01.04,0:16:06.25,Default,,0000,0000,0000,,And in this fashion we can encode an\Nentire image in just a few meaningful Dialogue: 0,0:16:06.25,0:16:11.48,Default,,0000,0000,0000,,values. What these values look like, and\Nwhat they encode, is learned through Dialogue: 0,0:16:11.48,0:16:18.24,Default,,0000,0000,0000,,the learning process. And we can then, for\Nexample, use these few values as input for Dialogue: 0,0:16:18.24,0:16:24.71,Default,,0000,0000,0000,,a classification network. \NThe fully connected network we saw earlier. Dialogue: 0,0:16:24.71,0:16:29.56,Default,,0000,0000,0000,,Or we can do something more clever. We can \Ndo the inverse operation and create an image Dialogue: 0,0:16:29.56,0:16:35.17,Default,,0000,0000,0000,,again, for example, the same image, which\Nis then called an auto encoder. Auto Dialogue: 0,0:16:35.17,0:16:40.20,Default,,0000,0000,0000,,encoders are tremendously useful, even\Nthough they don't appear that way. For Dialogue: 0,0:16:40.20,0:16:43.96,Default,,0000,0000,0000,,example, imagine you want to check whether\Nsomething has a defect, or not, a picture Dialogue: 0,0:16:43.96,0:16:51.29,Default,,0000,0000,0000,,of a fabric, or of something. You just\Ntrain the network with normal pictures. Dialogue: 0,0:16:51.29,0:16:56.77,Default,,0000,0000,0000,,And then, if you have a defect picture,\Nthe network is not able to produce this Dialogue: 0,0:16:56.77,0:17:02.15,Default,,0000,0000,0000,,defect. And so the difference between the\Nreproduced picture and the real picture Dialogue: 0,0:17:02.15,0:17:07.42,Default,,0000,0000,0000,,will show you where the errors are. If it\Nworks properly, I have to admit. Dialogue: 0,0:17:07.42,0:17:12.57,Default,,0000,0000,0000,,But we can go even further. Let's say, we\Nwant to encode something else entirely. 
Dialogue: 0,0:17:12.57,0:17:17.40,Default,,0000,0000,0000,,Well, let's encode the image, the\Ninformation in the image, but in another Dialogue: 0,0:17:17.40,0:17:21.86,Default,,0000,0000,0000,,representation. For example, let's say we\Nhave three classes again. The background Dialogue: 0,0:17:21.86,0:17:30.05,Default,,0000,0000,0000,,class in grey, a class called hat or\Nheadwear in blue, and person in green. We Dialogue: 0,0:17:30.05,0:17:34.31,Default,,0000,0000,0000,,can also use this for other applications\Nthan just for pictures of humans. For Dialogue: 0,0:17:34.31,0:17:38.37,Default,,0000,0000,0000,,example, we have a picture of a street and\Nwant to encode: Where is the car, where's Dialogue: 0,0:17:38.37,0:17:44.86,Default,,0000,0000,0000,,the pedestrian? Tremendously useful. Or we\Nhave an MRI scan of a brain: Where in the Dialogue: 0,0:17:44.86,0:17:51.11,Default,,0000,0000,0000,,brain is the tumor? Can we somehow learn\Nthis? Yes, we can do this with methods Dialogue: 0,0:17:51.11,0:17:57.48,Default,,0000,0000,0000,,like these, if they are trained properly.\NMore about that later. Well, we expect Dialogue: 0,0:17:57.48,0:18:01.02,Default,,0000,0000,0000,,something like this to come out, but the\Ntruth looks rather like this – especially Dialogue: 0,0:18:01.02,0:18:05.87,Default,,0000,0000,0000,,if it's not properly trained. We don't get\Nthe real shape we want, but Dialogue: 0,0:18:05.87,0:18:11.98,Default,,0000,0000,0000,,something distorted. So here, again, is\Nwhere we need to do the learning. First we Dialogue: 0,0:18:11.98,0:18:15.79,Default,,0000,0000,0000,,take a picture, put it through the\Nnetwork, get our output representation. Dialogue: 0,0:18:15.79,0:18:21.11,Default,,0000,0000,0000,,And we have the information about how we\Nwant it to look. We again compute some Dialogue: 0,0:18:21.11,0:18:27.04,Default,,0000,0000,0000,,kind of loss value. 
This time for example\Nbeing the overlap between the shape we get Dialogue: 0,0:18:27.04,0:18:34.04,Default,,0000,0000,0000,,out of the model and the shape we want to\Nhave. And we use this error, this loss Dialogue: 0,0:18:34.04,0:18:38.66,Default,,0000,0000,0000,,function, to update the weights of our\Nnetwork. Again – even though it's more Dialogue: 0,0:18:38.66,0:18:43.57,Default,,0000,0000,0000,,complicated here, even though we have more\Nlayers, and even though the layers look Dialogue: 0,0:18:43.57,0:18:48.64,Default,,0000,0000,0000,,slightly different – it is the same\Nprocess all over again as with the binary Dialogue: 0,0:18:48.64,0:18:56.54,Default,,0000,0000,0000,,case. And we need lots of training data.\NThis is something that you'll hear often Dialogue: 0,0:18:56.54,0:19:02.96,Default,,0000,0000,0000,,in connection with deep learning: You need\Nlots of training data to make this work. Dialogue: 0,0:19:02.96,0:19:10.10,Default,,0000,0000,0000,,Images are complex things and in order to\Nmeaningfully extract knowledge from them, Dialogue: 0,0:19:10.10,0:19:17.09,Default,,0000,0000,0000,,the network needs to see a multitude of\Ndifferent images. Well, now I already Dialogue: 0,0:19:17.09,0:19:22.23,Default,,0000,0000,0000,,showed you some things we use in network\Narchitecture, some support networks: The Dialogue: 0,0:19:22.23,0:19:26.68,Default,,0000,0000,0000,,fully convolutional encoder, which takes\Nan image and produces a few meaningful Dialogue: 0,0:19:26.68,0:19:33.11,Default,,0000,0000,0000,,values out of this image; its counterpart,\Nthe fully convolutional decoder – fully Dialogue: 0,0:19:33.11,0:19:36.96,Default,,0000,0000,0000,,convolutional meaning, by the way, that we\Nonly have these convolutional layers with Dialogue: 0,0:19:36.96,0:19:42.98,Default,,0000,0000,0000,,a few parameters that somehow encode\Nspatial information and keep it for the Dialogue: 0,0:19:42.98,0:19:49.36,Default,,0000,0000,0000,,next layers. 
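One common way to turn the overlap mentioned earlier into a loss value is intersection-over-union; the talk does not name a specific formula, so this is just one plausible sketch, assuming the shapes are given as sets of pixel coordinates:

```python
def iou_loss(predicted, target):
    # predicted, target: sets of (x, y) pixel coordinates belonging
    # to the segmented shape
    overlap = len(predicted & target)        # pixels in both shapes
    union = len(predicted | target)          # pixels in either shape
    iou = overlap / union if union else 1.0  # intersection over union
    return 1.0 - iou                         # perfect overlap -> loss 0
```

A loss of 0 means the predicted shape matches the desired one exactly; a loss near 1 means almost no overlap, which is the signal used to update the weights.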
The decoder takes a few\Nmeaningful numbers and reproduces an image Dialogue: 0,0:19:49.36,0:19:55.42,Default,,0000,0000,0000,,– either the same image or another\Nrepresentation of the information encoded Dialogue: 0,0:19:55.42,0:20:01.40,Default,,0000,0000,0000,,in the image. We also already saw the\Nfully connected network. Fully connected Dialogue: 0,0:20:01.40,0:20:06.64,Default,,0000,0000,0000,,meaning every neuron is connected to every\Nneuron in the next layer. This of course Dialogue: 0,0:20:06.64,0:20:12.57,Default,,0000,0000,0000,,can be dangerous because this is where we\Nactually get most of our parameters. If we Dialogue: 0,0:20:12.57,0:20:16.39,Default,,0000,0000,0000,,have a fully connected network, this is\Nwhere the most parameters will be present Dialogue: 0,0:20:16.39,0:20:21.58,Default,,0000,0000,0000,,because connecting every node to every\Nnode … this is just a high number of Dialogue: 0,0:20:21.58,0:20:25.86,Default,,0000,0000,0000,,connections. We can also do other things.\NFor example something called a pooling Dialogue: 0,0:20:25.86,0:20:32.28,Default,,0000,0000,0000,,layer. A pooling layer being basically the\Nsame as one of those convolutional layers, Dialogue: 0,0:20:32.28,0:20:36.37,Default,,0000,0000,0000,,just that we don't have parameters we need\Nto learn. This works without parameters Dialogue: 0,0:20:36.37,0:20:43.74,Default,,0000,0000,0000,,because this neuron just chooses whichever\Nvalue is the highest and takes that value Dialogue: 0,0:20:43.74,0:20:49.60,Default,,0000,0000,0000,,as output. This is really great for\Nreducing the size of your image and also Dialogue: 0,0:20:49.60,0:20:55.15,Default,,0000,0000,0000,,getting rid of information that might not\Nbe that important. We can also do some Dialogue: 0,0:20:55.15,0:20:59.89,Default,,0000,0000,0000,,clever techniques like adding a dropout\Nlayer. 
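The pooling layer described above keeps only the highest value in each window and has nothing to learn. A minimal sketch, assuming non-overlapping two-by-two windows (the usual choice, though the talk does not fix the window layout):

```python
def max_pool2x2(image):
    # take non-overlapping 2x2 windows and keep only the largest value,
    # halving the image in both dimensions without any learnable weights
    return [[max(image[y][x],     image[y][x + 1],
                 image[y + 1][x], image[y + 1][x + 1])
             for x in range(0, len(image[0]) - 1, 2)]
            for y in range(0, len(image) - 1, 2)]
```
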
A dropout layer just being a normal Dialogue: 0,0:20:59.89,0:21:05.80,Default,,0000,0000,0000,,layer in a neural network where we remove\Nsome connections: In one training step Dialogue: 0,0:21:05.80,0:21:10.72,Default,,0000,0000,0000,,these connections, in the next training\Nstep some other connections. This way we Dialogue: 0,0:21:10.72,0:21:18.05,Default,,0000,0000,0000,,teach the other connections to become more\Nresilient against errors. I would like to Dialogue: 0,0:21:18.05,0:21:22.75,Default,,0000,0000,0000,,start with something I call the "Model\NShow" now, and show you some models and Dialogue: 0,0:21:22.75,0:21:28.87,Default,,0000,0000,0000,,how we train those models. And I will\Nstart with a fully convolutional decoder Dialogue: 0,0:21:28.87,0:21:34.74,Default,,0000,0000,0000,,we saw earlier: This thing that takes a\Nnumber and creates a picture. I would like Dialogue: 0,0:21:34.74,0:21:41.42,Default,,0000,0000,0000,,to take this model, put in some number and\Nget out a picture – a picture of a horse Dialogue: 0,0:21:41.42,0:21:46.00,Default,,0000,0000,0000,,for example. If I put in a different\Nnumber I also want to get a picture of a Dialogue: 0,0:21:46.00,0:21:52.39,Default,,0000,0000,0000,,horse, but of a different horse. So what I\Nwant to get is a mapping from some Dialogue: 0,0:21:52.39,0:21:56.73,Default,,0000,0000,0000,,numbers, some features that encode\Nsomething about the horse picture, and get Dialogue: 0,0:21:56.73,0:22:03.45,Default,,0000,0000,0000,,a horse picture out of it. You might see\Nalready why this is problematic. It is Dialogue: 0,0:22:03.45,0:22:08.23,Default,,0000,0000,0000,,problematic because we don't have a\Nmapping from feature to horse or from Dialogue: 0,0:22:08.23,0:22:15.05,Default,,0000,0000,0000,,horse to features. So we don't have a\Ntruth value we can use to learn how to Dialogue: 0,0:22:15.05,0:22:21.79,Default,,0000,0000,0000,,generate this mapping. 
Well computer\Nvision engineers – or deep learning Dialogue: 0,0:22:21.79,0:22:26.80,Default,,0000,0000,0000,,professionals – they're smart and have\Nclever ideas. Let's just assume we have Dialogue: 0,0:22:26.80,0:22:32.87,Default,,0000,0000,0000,,such a network and let's call it a\Ngenerator. Let's take some numbers, put Dialogue: 0,0:22:32.87,0:22:39.24,Default,,0000,0000,0000,,them into the generator and get some\Nhorses. Well it doesn't work yet. We still Dialogue: 0,0:22:39.24,0:22:42.49,Default,,0000,0000,0000,,have to train it. So they're probably not\Nonly horses but also some very special Dialogue: 0,0:22:42.49,0:22:47.97,Default,,0000,0000,0000,,unicorns among the horses; which might be\Nnice for other applications, but I wanted Dialogue: 0,0:22:47.97,0:22:55.48,Default,,0000,0000,0000,,pictures of horses right now. So I can't\Ntrain with this data directly. But what I Dialogue: 0,0:22:55.48,0:23:01.60,Default,,0000,0000,0000,,can do is I can create a second network.\NThis network is called a discriminator and Dialogue: 0,0:23:01.60,0:23:08.82,Default,,0000,0000,0000,,I can give it the input generated from the\Ngenerator as well as the real data I have: Dialogue: 0,0:23:08.82,0:23:13.92,Default,,0000,0000,0000,,the real horse pictures. And then I can\Nteach the discriminator to distinguish Dialogue: 0,0:23:13.92,0:23:22.08,Default,,0000,0000,0000,,between those. Tell me it is a real horse\Nor it's not a real horse. And there I know Dialogue: 0,0:23:22.08,0:23:27.00,Default,,0000,0000,0000,,what is the truth because I either take\Nreal horse pictures or fake horse pictures Dialogue: 0,0:23:27.00,0:23:34.17,Default,,0000,0000,0000,,from the generator. So I have a truth\Nvalue for this discriminator. But in doing Dialogue: 0,0:23:34.17,0:23:39.07,Default,,0000,0000,0000,,this I also have a truth value for the\Ngenerator. Because I want the generator to Dialogue: 0,0:23:39.07,0:23:43.80,Default,,0000,0000,0000,,work against the discriminator.
So I can\Nalso use the information how well the Dialogue: 0,0:23:43.80,0:23:51.01,Default,,0000,0000,0000,,discriminator does to train the generator\Nto become better at fooling it. This is Dialogue: 0,0:23:51.01,0:23:57.47,Default,,0000,0000,0000,,called a generative adversarial network.\NAnd it can be used to generate pictures of Dialogue: 0,0:23:57.47,0:24:02.35,Default,,0000,0000,0000,,an arbitrary distribution. Let's do this\Nwith numbers and I will actually show you Dialogue: 0,0:24:02.35,0:24:07.59,Default,,0000,0000,0000,,the training process. Before I start the\Nvideo, I'll tell you what I did. I took Dialogue: 0,0:24:07.59,0:24:11.55,Default,,0000,0000,0000,,some handwritten digits. There is a\Ndatabase called MNIST of handwritten Dialogue: 0,0:24:11.55,0:24:18.57,Default,,0000,0000,0000,,digits – so the numbers 0 to 9. And I\Ntook those and used them as training data. Dialogue: 0,0:24:18.57,0:24:24.30,Default,,0000,0000,0000,,I trained a generator in the way I showed\Nyou on the previous slide, and then I just Dialogue: 0,0:24:24.30,0:24:30.11,Default,,0000,0000,0000,,took some random numbers. I put those\Nrandom numbers into the network and just Dialogue: 0,0:24:30.11,0:24:35.96,Default,,0000,0000,0000,,stored the image of what came out of the\Nnetwork. And here in the video you'll see Dialogue: 0,0:24:35.96,0:24:43.09,Default,,0000,0000,0000,,how the network improved with ongoing\Ntraining. You will see that we start Dialogue: 0,0:24:43.09,0:24:50.18,Default,,0000,0000,0000,,basically with just noisy images … and\Nthen after some – what we call epochs, Dialogue: 0,0:24:50.18,0:24:55.92,Default,,0000,0000,0000,,so training iterations – the network is\Nable to almost perfectly generate Dialogue: 0,0:24:55.92,0:25:05.68,Default,,0000,0000,0000,,handwritten digits just from noise. Which\NI find truly fascinating. Of course this Dialogue: 0,0:25:05.68,0:25:11.27,Default,,0000,0000,0000,,is an example where it works.
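The "truth value for the generator" derived from the discriminator is usually written as a pair of losses. A sketch of the standard non-saturating GAN objective in plain Python; the function names are mine, and the demo in the talk may have used a slightly different formulation:

```python
import math

def discriminator_loss(d_real, d_fake):
    # d_real / d_fake are the discriminator's probabilities that a
    # real picture and a generated picture are real. It is trained
    # to push d_real toward 1 and d_fake toward 0.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # The generator's only training signal is how well it fools the
    # discriminator: it wants d_fake pushed toward 1.
    return -math.log(d_fake)
```

Early in training the discriminator wins easily (d_fake near 0, so the generator's loss is huge); as the generated digits improve, d_fake climbs toward 0.5 and the two networks approach the "minimax" balance mentioned in the Q&A.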
It highly\Ndepends on your data set and how you train Dialogue: 0,0:25:11.27,0:25:15.60,Default,,0000,0000,0000,,the model whether it is a success or not.\NBut if it works, you can use it to Dialogue: 0,0:25:15.60,0:25:22.56,Default,,0000,0000,0000,,generate fonts. You can generate\Ncharacters, 3D objects, pictures of Dialogue: 0,0:25:22.56,0:25:28.70,Default,,0000,0000,0000,,animals, whatever you want as long as you\Nhave training data. Let's go more crazy. Dialogue: 0,0:25:28.70,0:25:34.54,Default,,0000,0000,0000,,Let's take two of those and let's say we\Nhave pictures of horses and pictures of Dialogue: 0,0:25:34.54,0:25:41.15,Default,,0000,0000,0000,,zebras. I want to convert those pictures\Nof horses into pictures of zebras, and I Dialogue: 0,0:25:41.15,0:25:44.59,Default,,0000,0000,0000,,want to convert pictures of zebras into\Npictures of horses. So I want to have the Dialogue: 0,0:25:44.59,0:25:49.69,Default,,0000,0000,0000,,same picture just with the other animal.\NBut I don't have training data of the same Dialogue: 0,0:25:49.69,0:25:56.27,Default,,0000,0000,0000,,situation just once with a horse and once\Nwith a zebra. Doesn't matter. We can train Dialogue: 0,0:25:56.27,0:26:00.65,Default,,0000,0000,0000,,a network that does that for us. Again we\Njust have a network – we call it the Dialogue: 0,0:26:00.65,0:26:05.73,Default,,0000,0000,0000,,generator – and we have two of those: One\Nthat converts horses to zebras and one Dialogue: 0,0:26:05.73,0:26:14.84,Default,,0000,0000,0000,,that converts zebras to horses. And then\Nwe also have two discriminators that tell Dialogue: 0,0:26:14.84,0:26:21.15,Default,,0000,0000,0000,,us: real horse – fake horse – real zebra –\Nfake zebra. And then we again need to Dialogue: 0,0:26:21.15,0:26:27.21,Default,,0000,0000,0000,,perform some training. So we need to\Nsomehow encode: Did it work what we wanted Dialogue: 0,0:26:27.21,0:26:31.46,Default,,0000,0000,0000,,to do? 
And a very simple way to do this is\Nwe take a picture of a horse put it Dialogue: 0,0:26:31.46,0:26:35.47,Default,,0000,0000,0000,,through the generator that generates a\Nzebra. Take this fake picture of a zebra, Dialogue: 0,0:26:35.47,0:26:39.34,Default,,0000,0000,0000,,put it through the generator that\Ngenerates a picture of a horse. And if Dialogue: 0,0:26:39.34,0:26:43.70,Default,,0000,0000,0000,,this is the same picture as we put in,\Nthen our model worked. And if it didn't, Dialogue: 0,0:26:43.70,0:26:48.55,Default,,0000,0000,0000,,we can use that information to update the\Nweights. I just took a random picture, Dialogue: 0,0:26:48.55,0:26:54.46,Default,,0000,0000,0000,,from a free library in the Internet, of a\Nhorse and generated a zebra and it worked Dialogue: 0,0:26:54.46,0:26:59.47,Default,,0000,0000,0000,,remarkably well. I actually didn't even do\Ntraining. It also doesn't need to be a Dialogue: 0,0:26:59.47,0:27:03.12,Default,,0000,0000,0000,,picture. You can also convert text to\Nimages: You describe something in words Dialogue: 0,0:27:03.12,0:27:09.57,Default,,0000,0000,0000,,and generate images. You can age your face\Nor age a cell; or make a patient healthy Dialogue: 0,0:27:09.57,0:27:15.51,Default,,0000,0000,0000,,or sick – or the image of a patient, not\Nthe patient self, unfortunately. You can Dialogue: 0,0:27:15.51,0:27:20.69,Default,,0000,0000,0000,,do style transfer like take a picture of\NVan Gogh and apply it to your own picture. Dialogue: 0,0:27:20.69,0:27:27.56,Default,,0000,0000,0000,,Stuff like that. Something else that we\Ncan do with neural networks. Let's assume Dialogue: 0,0:27:27.56,0:27:31.03,Default,,0000,0000,0000,,we have a classification network, we have\Na picture of a toothbrush and the network Dialogue: 0,0:27:31.03,0:27:36.77,Default,,0000,0000,0000,,tells us: Well, this is a toothbrush.\NGreat! But how resilient is this network? 
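The horse-to-zebra-to-horse round trip described a little earlier is the cycle-consistency idea behind CycleGAN. A toy sketch with pictures flattened to lists of numbers and stand-in "generators" (all names here are mine, for illustration only):

```python
def cycle_consistency_loss(picture, horse_to_zebra, zebra_to_horse):
    # Push a horse picture through both generators in turn; if the
    # reconstruction matches the original, the pair of generators
    # did what we wanted. The L1 distance is the training signal.
    reconstructed = zebra_to_horse(horse_to_zebra(picture))
    return sum(abs(a - b) for a, b in zip(picture, reconstructed))

# Stand-in "generators" that happen to invert each other exactly:
to_zebra = lambda pic: [v + 1 for v in pic]
to_horse = lambda pic: [v - 1 for v in pic]
loss = cycle_consistency_loss([1, 2, 3], to_zebra, to_horse)   # 0: a perfect cycle
```

This is what makes training possible without paired horse/zebra photos: the truth value is the input picture itself.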
Dialogue: 0,0:27:36.77,0:27:44.53,Default,,0000,0000,0000,,Does it really work in every scenario.\NThere's a second network we can apply: We Dialogue: 0,0:27:44.53,0:27:48.70,Default,,0000,0000,0000,,call it an adversarial network. And that\Nnetwork is trained to do one thing: Look Dialogue: 0,0:27:48.70,0:27:52.29,Default,,0000,0000,0000,,at the network, look at the picture, and\Nthen find the one weak spot in the Dialogue: 0,0:27:52.29,0:27:55.88,Default,,0000,0000,0000,,picture: Just change one pixel slightly so\Nthat the network will tell me this Dialogue: 0,0:27:55.88,0:28:03.60,Default,,0000,0000,0000,,toothbrush is an octopus. Works remarkably\Nwell. Also works with just changing the Dialogue: 0,0:28:03.60,0:28:08.94,Default,,0000,0000,0000,,picture slightly, so changing all the\Npixels, but just slight minute changes Dialogue: 0,0:28:08.94,0:28:12.86,Default,,0000,0000,0000,,that we don't perceive, but the network –\Nthe classification network – is completely Dialogue: 0,0:28:12.86,0:28:19.64,Default,,0000,0000,0000,,thrown off. Well sounds bad. Is bad if you\Ndon't consider it. But you can also for Dialogue: 0,0:28:19.64,0:28:24.20,Default,,0000,0000,0000,,example use this for training your network\Nand make your network resilient. So Dialogue: 0,0:28:24.20,0:28:28.46,Default,,0000,0000,0000,,there's always an upside and downside.\NSomething entirely else: Now I'd like to Dialogue: 0,0:28:28.46,0:28:32.88,Default,,0000,0000,0000,,show you something about text. A word-\Nlanguage model. I want to generate Dialogue: 0,0:28:32.88,0:28:38.10,Default,,0000,0000,0000,,sentences for my podcast. I have a network\Nthat gives me a word, and then if I want Dialogue: 0,0:28:38.10,0:28:42.64,Default,,0000,0000,0000,,to somehow get the next word in the\Nsentence, I also need to consider this Dialogue: 0,0:28:42.64,0:28:47.07,Default,,0000,0000,0000,,word. 
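The attack described above — changing all the pixels by minute amounts so the classifier flips from "toothbrush" to something else — resembles what is known as the fast gradient sign method. A sketch with a numerically estimated gradient and a stand-in scoring function (everything here is a toy assumption, not the exact method from the talk):

```python
def fgsm_attack(pixels, true_class_score, eps=0.01):
    # Estimate the gradient of the classifier's score for the true
    # class w.r.t. each pixel, then nudge every pixel by eps in the
    # direction that LOWERS that score -- a tiny, uniform change.
    h = 1e-5
    base = true_class_score(pixels)
    grad = []
    for i in range(len(pixels)):
        bumped = list(pixels)
        bumped[i] += h
        grad.append((true_class_score(bumped) - base) / h)
    return [p - eps * (1 if g > 0 else -1 if g < 0 else 0)
            for p, g in zip(pixels, grad)]

# Toy "classifier score" standing in for a trained network:
score = lambda img: img[0] - img[1]
adv = fgsm_attack([0.5, 0.5], score)   # score drops although pixels barely move
```

Real attacks use the network's own backpropagated gradient instead of this finite-difference estimate, but the principle is the same — and, as noted, generating such examples during training is exactly how you make a network resilient to them.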
So another network architecture –\Nquite interestingly – just takes the Dialogue: 0,0:28:47.07,0:28:52.18,Default,,0000,0000,0000,,hidden states of the network and uses them\Nas the input for the same network so that Dialogue: 0,0:28:52.18,0:28:58.78,Default,,0000,0000,0000,,in the next iteration we still know what\Nwe did in the previous step. I tried to Dialogue: 0,0:28:58.78,0:29:04.73,Default,,0000,0000,0000,,train a network that generates podcast\Nepisodes for my podcasts. Didn't work. Dialogue: 0,0:29:04.73,0:29:08.45,Default,,0000,0000,0000,,What I learned is I don't have enough\Ntraining data. I really need to produce Dialogue: 0,0:29:08.45,0:29:15.79,Default,,0000,0000,0000,,more podcast episodes in order to train a\Nmodel to do my job for me. And this is Dialogue: 0,0:29:15.79,0:29:21.54,Default,,0000,0000,0000,,very important, a very crucial point:\NTraining data. We need shitloads of Dialogue: 0,0:29:21.54,0:29:26.08,Default,,0000,0000,0000,,training data. And actually the more\Ncomplicated our model and our training Dialogue: 0,0:29:26.08,0:29:30.99,Default,,0000,0000,0000,,process becomes, the more training data we\Nneed. I started with a supervised case – Dialogue: 0,0:29:30.99,0:29:35.99,Default,,0000,0000,0000,,the really simple case where we Dialogue: 0,0:29:35.99,0:29:40.66,Default,,0000,0000,0000,,have a picture and a label that\Ncorresponds to that picture; or a Dialogue: 0,0:29:40.66,0:29:46.28,Default,,0000,0000,0000,,representation of that picture showing\Nentirely what I wanted to learn. But we Dialogue: 0,0:29:46.28,0:29:51.91,Default,,0000,0000,0000,,also saw a more complex task, where I had\Ntwo sets of pictures – horses and zebras – that are Dialogue: 0,0:29:51.91,0:29:56.40,Default,,0000,0000,0000,,from two different domains – but domains\Nwith no direct mapping.
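Feeding the hidden state back in as input — the recurrent architecture sketched above for the word model — can be shown with a one-number hidden state. A minimal sketch (the weights and the tanh nonlinearity are my choice of illustration):

```python
import math

def rnn_step(x, h, w_xh=0.5, w_hh=0.9):
    # One step of a tiny recurrent cell: the new hidden state mixes
    # the current input with the PREVIOUS hidden state, so the next
    # step still "knows" what came before.
    return math.tanh(w_xh * x + w_hh * h)

h = 0.0
for x in [1.0, 0.0, 0.0]:   # a short input sequence
    h = rnn_step(x, h)
# h stays nonzero although the last two inputs were zero:
# the first "word" echoes through the hidden state.
```

In a real word-language model, x would be a vector encoding the current word and h a large vector, but the recurrence is exactly this shape.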
What can also Dialogue: 0,0:29:56.40,0:30:01.02,Default,,0000,0000,0000,,happen – and actually happens quite a lot\N– is weakly annotated data, so data that Dialogue: 0,0:30:01.02,0:30:08.75,Default,,0000,0000,0000,,is not precisely annotated; where we can't\Nrely on the information we get. Or even Dialogue: 0,0:30:08.75,0:30:13.05,Default,,0000,0000,0000,,more complicated: Something called\Nreinforcement learning where we perform a Dialogue: 0,0:30:13.05,0:30:19.38,Default,,0000,0000,0000,,sequence of actions and then in the end\Nare told "yeah that was great". Which is Dialogue: 0,0:30:19.38,0:30:24.08,Default,,0000,0000,0000,,often not enough information to really\Nperform proper training. But of course Dialogue: 0,0:30:24.08,0:30:28.19,Default,,0000,0000,0000,,there are also methods for that. As well\Nas there are methods for the unsupervised Dialogue: 0,0:30:28.19,0:30:33.59,Default,,0000,0000,0000,,case where we don't have annotations,\Nlabeled data – no ground truth at all – Dialogue: 0,0:30:33.59,0:30:41.24,Default,,0000,0000,0000,,just the picture itself. Well I talked\Nabout pictures. I told you that we can Dialogue: 0,0:30:41.24,0:30:45.32,Default,,0000,0000,0000,,learn features and create images from\Nthem. And we can use them for Dialogue: 0,0:30:45.32,0:30:51.64,Default,,0000,0000,0000,,classification. And for this there exist\Nmany databases. There are public data sets Dialogue: 0,0:30:51.64,0:30:56.66,Default,,0000,0000,0000,,we can use. Often they refer to for\Nexample Flickr. They're just hyperlinks Dialogue: 0,0:30:56.66,0:31:00.96,Default,,0000,0000,0000,,which is also why I didn't show you many\Npictures right here, because I am honestly Dialogue: 0,0:31:00.96,0:31:05.69,Default,,0000,0000,0000,,not sure about the copyright in those\Ncases. 
But there are also challenge Dialogue: 0,0:31:05.69,0:31:11.19,Default,,0000,0000,0000,,datasets where you can just sign up, get\Nsome for example medical data sets, and Dialogue: 0,0:31:11.19,0:31:16.65,Default,,0000,0000,0000,,then compete against other researchers.\NAnd of course there are those companies Dialogue: 0,0:31:16.65,0:31:22.09,Default,,0000,0000,0000,,that just have lots of data. And those\Ncompanies also have the means, the Dialogue: 0,0:31:22.09,0:31:28.11,Default,,0000,0000,0000,,capacity to perform intense computations.\NAnd those are also often the companies you Dialogue: 0,0:31:28.11,0:31:36.18,Default,,0000,0000,0000,,hear from in terms of innovation for deep\Nlearning. Well this was mostly to tell you Dialogue: 0,0:31:36.18,0:31:40.20,Default,,0000,0000,0000,,that you can process images quite well\Nwith deep learning if you have enough Dialogue: 0,0:31:40.20,0:31:46.03,Default,,0000,0000,0000,,training data, if you have a proper\Ntraining process and also a little if you Dialogue: 0,0:31:46.03,0:31:52.09,Default,,0000,0000,0000,,know what you're doing. But you can also\Nprocess text, you can process audio and Dialogue: 0,0:31:52.09,0:31:58.52,Default,,0000,0000,0000,,time series like prices on a stock\Nexchange – stuff like that. You can Dialogue: 0,0:31:58.52,0:32:02.93,Default,,0000,0000,0000,,process almost everything if you make it\Nencodable for your network. Sounds like a Dialogue: 0,0:32:02.93,0:32:08.12,Default,,0000,0000,0000,,dream come true. But – as I already told\Nyou – you need data, a lot of it. I told Dialogue: 0,0:32:08.12,0:32:14.02,Default,,0000,0000,0000,,you about those companies that have lots\Nof data sets and the publicly available Dialogue: 0,0:32:14.02,0:32:21.37,Default,,0000,0000,0000,,data sets which you can actually use to\Nget started with your own experiments.
But Dialogue: 0,0:32:21.37,0:32:24.31,Default,,0000,0000,0000,,that also makes it a little dangerous\Nbecause deep learning still is a black box Dialogue: 0,0:32:24.31,0:32:30.82,Default,,0000,0000,0000,,to us. I told you what happens inside the\Nblack box on a level that teaches you how Dialogue: 0,0:32:30.82,0:32:36.53,Default,,0000,0000,0000,,we learn and how the network is\Nstructured, but not really what the Dialogue: 0,0:32:36.53,0:32:42.83,Default,,0000,0000,0000,,network learned. It is for us computer\Nvision engineers really nice that we can Dialogue: 0,0:32:42.83,0:32:48.59,Default,,0000,0000,0000,,visualize the first layers of a neural\Nnetwork and see what is actually encoded Dialogue: 0,0:32:48.59,0:32:53.95,Default,,0000,0000,0000,,in those first layers; what information\Nthe network looks at. But you can't really Dialogue: 0,0:32:53.95,0:32:59.06,Default,,0000,0000,0000,,mathematically prove what happens in a\Nnetwork. Which is one major downside. And Dialogue: 0,0:32:59.06,0:33:02.15,Default,,0000,0000,0000,,so if you want to use it, the numbers may\Nbe really great but be sure to properly Dialogue: 0,0:33:02.15,0:33:08.06,Default,,0000,0000,0000,,evaluate them. In summary I call that\N"easy to learn". Every one – every single Dialogue: 0,0:33:08.06,0:33:12.68,Default,,0000,0000,0000,,one of you – can just start with deep\Nlearning right away. You don't need to do Dialogue: 0,0:33:12.68,0:33:19.44,Default,,0000,0000,0000,,much work. You don't need to do much\Nlearning. The model learns for you. But Dialogue: 0,0:33:19.44,0:33:23.77,Default,,0000,0000,0000,,they're hard to master in a way that makes\Nthem useful for production use cases for Dialogue: 0,0:33:23.77,0:33:29.90,Default,,0000,0000,0000,,example. 
So if you want to use deep\Nlearning for something – if you really Dialogue: 0,0:33:29.90,0:33:34.30,Default,,0000,0000,0000,,want to seriously use it –, make sure that\Nit really does what you wanted to and Dialogue: 0,0:33:34.30,0:33:38.90,Default,,0000,0000,0000,,doesn't learn something else – which also\Nhappens. Pretty sure you saw some talks Dialogue: 0,0:33:38.90,0:33:43.67,Default,,0000,0000,0000,,about deep learning fails – which is not\Nwhat this talk is about. They're quite Dialogue: 0,0:33:43.67,0:33:47.37,Default,,0000,0000,0000,,funny to look at. Just make sure that they\Ndon't happen to you! If you do that Dialogue: 0,0:33:47.37,0:33:53.30,Default,,0000,0000,0000,,though, you'll achieve great things with\Ndeep learning, I'm sure. And that was Dialogue: 0,0:33:53.30,0:34:00.74,Default,,0000,0000,0000,,introduction to deep learning. Thank you!\N{\i1}Applause{\i0} Dialogue: 0,0:34:09.17,0:34:13.45,Default,,0000,0000,0000,,Herald Angel: So now it's question and\Nanswer time. So if you have a question, Dialogue: 0,0:34:13.45,0:34:19.11,Default,,0000,0000,0000,,please line up at the mikes. We have in\Ntotal eight, so it shouldn't be far from Dialogue: 0,0:34:19.11,0:34:26.14,Default,,0000,0000,0000,,you. They are here in the corridors and on\Nthese sides. Please line up! For Dialogue: 0,0:34:26.14,0:34:31.54,Default,,0000,0000,0000,,everybody: A question consists of one\Nsentence with the question mark in the end Dialogue: 0,0:34:31.54,0:34:38.45,Default,,0000,0000,0000,,– not three minutes of rambling. And also\Nif you go to the microphone, speak into Dialogue: 0,0:34:38.45,0:34:53.89,Default,,0000,0000,0000,,the microphone, so you really get close to\Nit. Okay. Where do we have … Number 7! Dialogue: 0,0:34:53.89,0:35:02.20,Default,,0000,0000,0000,,We start with mic number 7:\NQuestion: Hello. My question is: How did Dialogue: 0,0:35:02.20,0:35:13.02,Default,,0000,0000,0000,,you compute the example for the fonts, the\Nnumbers? 
I didn't really understand it, Dialogue: 0,0:35:13.02,0:35:19.77,Default,,0000,0000,0000,,you just said it was made from white\Nnoise. Dialogue: 0,0:35:19.77,0:35:25.58,Default,,0000,0000,0000,,Teubi: I'll give you a really brief recap\Nof what I did. I showed you that we have a Dialogue: 0,0:35:25.58,0:35:31.14,Default,,0000,0000,0000,,model that maps an image to some meaningful\Nvalues, that an image can be encoded in Dialogue: 0,0:35:31.14,0:35:36.86,Default,,0000,0000,0000,,just a few values. What happens here is\Nexactly the other way round. We have some Dialogue: 0,0:35:36.86,0:35:43.27,Default,,0000,0000,0000,,values, just some arbitrary values we\Nactually know nothing about. We can Dialogue: 0,0:35:43.27,0:35:47.48,Default,,0000,0000,0000,,generate pictures out of those. So I\Ntrained this model to just take some Dialogue: 0,0:35:47.48,0:35:54.56,Default,,0000,0000,0000,,random values and show the pictures\Ngenerated from the model. The training Dialogue: 0,0:35:54.56,0:36:03.32,Default,,0000,0000,0000,,process was this "minimax game", as it's\Ncalled. We have two networks that try to Dialogue: 0,0:36:03.32,0:36:08.26,Default,,0000,0000,0000,,compete against each other. One network\Ntrying to distinguish whether a picture Dialogue: 0,0:36:08.26,0:36:12.79,Default,,0000,0000,0000,,it sees is real or one of those fake\Npictures, and the network that actually Dialogue: 0,0:36:12.79,0:36:18.51,Default,,0000,0000,0000,,generates those pictures and in training\Nthe network that is able to distinguish Dialogue: 0,0:36:18.51,0:36:24.60,Default,,0000,0000,0000,,between those, we can also get information\Nfor the training of the network that Dialogue: 0,0:36:24.60,0:36:30.41,Default,,0000,0000,0000,,generates the pictures. So the videos you\Nsaw were just animations of what happens Dialogue: 0,0:36:30.41,0:36:36.44,Default,,0000,0000,0000,,during this training process. At first if\Nwe input noise we get noise.
But as the Dialogue: 0,0:36:36.44,0:36:41.51,Default,,0000,0000,0000,,network is able to better and better\Nrecreate those images from the dataset we Dialogue: 0,0:36:41.51,0:36:47.39,Default,,0000,0000,0000,,used as input, in this case pictures of\Nhandwritten digits, the output also became Dialogue: 0,0:36:47.39,0:36:54.66,Default,,0000,0000,0000,,more lookalike to those numbers, these\Nhandwritten digits. Hope that helped. Dialogue: 0,0:36:54.66,0:37:06.59,Default,,0000,0000,0000,,Herald Angel: Now we go to the\NInternet. – Can we get sound for the signal Dialogue: 0,0:37:06.59,0:37:10.04,Default,,0000,0000,0000,,Angel, please? Teubi: Sounded so great,\N"now we go to the Internet." Dialogue: 0,0:37:10.04,0:37:11.04,Default,,0000,0000,0000,,Herald Angel: Yeah, that sounds like\N"yeeaah". Dialogue: 0,0:37:11.04,0:37:13.04,Default,,0000,0000,0000,,Signal Angel: And now we're finally ready\Nto go to the interwebs. "Schorsch" is Dialogue: 0,0:37:13.04,0:37:18.04,Default,,0000,0000,0000,,asking: Do you have any recommendations\Nfor a beginner regarding the framework or Dialogue: 0,0:37:18.04,0:37:26.46,Default,,0000,0000,0000,,the software?\NTeubi: I, of course, am very biased to Dialogue: 0,0:37:26.46,0:37:34.15,Default,,0000,0000,0000,,recommend what I use everyday. But I also\Nthink that it is a great start. Basically, Dialogue: 0,0:37:34.15,0:37:40.21,Default,,0000,0000,0000,,use python and use pytorch. Many people\Nwill disagree with me and tell you Dialogue: 0,0:37:40.21,0:37:45.93,Default,,0000,0000,0000,,"tensorflow is better." It might be, in my\Nopinion not for getting started, and there Dialogue: 0,0:37:45.93,0:37:51.56,Default,,0000,0000,0000,,are also some nice tutorials on the\Npytorch website. 
What you can also do is Dialogue: 0,0:37:51.56,0:37:57.20,Default,,0000,0000,0000,,look at websites like OpenAI, where they\Nhave a gym to get you started with some Dialogue: 0,0:37:57.20,0:38:02.37,Default,,0000,0000,0000,,training exercises, where you already have\Ndatasets. Yeah, basically my Dialogue: 0,0:38:02.37,0:38:08.60,Default,,0000,0000,0000,,recommendation is get used to Python and\Nstart with a pytorch tutorial, see where Dialogue: 0,0:38:08.60,0:38:13.59,Default,,0000,0000,0000,,to go from there. Often there also some\Ngithub repositories linked with many Dialogue: 0,0:38:13.59,0:38:18.74,Default,,0000,0000,0000,,examples for already established network\Narchitectures like the cycle GAN or the Dialogue: 0,0:38:18.74,0:38:26.25,Default,,0000,0000,0000,,GAN itself or basically everything else.\NThere will be a repo you can use to get Dialogue: 0,0:38:26.25,0:38:29.94,Default,,0000,0000,0000,,started.\NHerald Angel: OK, we stay with the Dialogue: 0,0:38:29.94,0:38:32.59,Default,,0000,0000,0000,,internet. There's some more questions, I\Nheard. Dialogue: 0,0:38:32.59,0:38:37.92,Default,,0000,0000,0000,,Signal Angel: Yes. Rubin8 is asking: Have\Nyou have you ever come across an example Dialogue: 0,0:38:37.92,0:38:42.58,Default,,0000,0000,0000,,of a neural network that deals with audio\Ninstead of images? Dialogue: 0,0:38:42.58,0:38:49.41,Default,,0000,0000,0000,,Teubi: Me personally, no. At least not\Ndirectly. I've heard about examples, like Dialogue: 0,0:38:49.41,0:38:54.86,Default,,0000,0000,0000,,where you can change the voice to sound\Nlike another person, but there is not much Dialogue: 0,0:38:54.86,0:38:59.98,Default,,0000,0000,0000,,I can reliably tell about that. My\Nexpertise really is in image processing, Dialogue: 0,0:38:59.98,0:39:05.55,Default,,0000,0000,0000,,I'm sorry.\NHerald Angel: And I think we have time for Dialogue: 0,0:39:05.55,0:39:12.34,Default,,0000,0000,0000,,one more question. We have one at number\N8. Microphone number 8. 
Dialogue: 0,0:39:12.34,0:39:20.73,Default,,0000,0000,0000,,Question: Is the current Face recognition\Ntechnologies in, for example iPhone X, is Dialogue: 0,0:39:20.73,0:39:26.42,Default,,0000,0000,0000,,it also a deep learning algorithm or is\Nit something more simple? Do you have any Dialogue: 0,0:39:26.42,0:39:31.88,Default,,0000,0000,0000,,idea about that?\NTeubi: As far as I know, yes. That's all I Dialogue: 0,0:39:31.88,0:39:38.63,Default,,0000,0000,0000,,can reliably tell you about that, but it\Nis not only based on images but also uses Dialogue: 0,0:39:38.63,0:39:45.42,Default,,0000,0000,0000,,other information. I think distance\Ninformation encoded with some infrared Dialogue: 0,0:39:45.42,0:39:50.60,Default,,0000,0000,0000,,signals. I don't really know exactly how\Nit works, but at least iPhones already Dialogue: 0,0:39:50.60,0:39:56.00,Default,,0000,0000,0000,,have a neural network\Nprocessing engine built in, so a chip Dialogue: 0,0:39:56.00,0:40:01.19,Default,,0000,0000,0000,,dedicated to just doing those\Ncomputations. You saw that many of those Dialogue: 0,0:40:01.19,0:40:05.82,Default,,0000,0000,0000,,things can be parallelized, and this is\Nwhat those hardware architectures make use Dialogue: 0,0:40:05.82,0:40:10.38,Default,,0000,0000,0000,,of. So I'm pretty confident in saying,\Nyes, they also do it there. Dialogue: 0,0:40:10.38,0:40:12.79,Default,,0000,0000,0000,,How exactly, no clue. Dialogue: 0,0:40:13.76,0:40:15.32,Default,,0000,0000,0000,,\NHerald Angel: OK. I myself have a last Dialogue: 0,0:40:15.39,0:40:20.68,Default,,0000,0000,0000,,completely unrelated question: Did you\Ncreate the design of the slides yourself? Dialogue: 0,0:40:20.68,0:40:29.06,Default,,0000,0000,0000,,Teubi: I had some help. 
We have a really\Ngreat Congress design and I use that as an Dialogue: 0,0:40:29.06,0:40:32.79,Default,,0000,0000,0000,,inspiration to create those slides, yes.\N Dialogue: 0,0:40:32.79,0:40:36.76,Default,,0000,0000,0000,,Herald Angel: OK, yeah, because those are really amazing. I love them.\N Dialogue: 0,0:40:36.76,0:40:38.14,Default,,0000,0000,0000,,Teubi: Thank you! Dialogue: 0,0:40:38.47,0:40:41.20,Default,,0000,0000,0000,,Herald Angel: OK, thank you very much\NTeubi. Dialogue: 0,0:40:45.13,0:40:48.90,Default,,0000,0000,0000,,{\i1}35C3 outro music{\i0} Dialogue: 0,0:40:48.90,0:41:07.00,Default,,0000,0000,0000,,subtitles created by c3subtitles.de\Nin the year 2019. Join, and help us!