And Teubi will show us why people who know what machine learning is really about have to facepalm so often when they read the news. So please welcome Teubi with a big round of applause!

(Applause)

Teubi: Alright! Good morning and welcome to Introduction to Deep Learning. The title already tells you what this talk is about: I want to give you an introduction to how deep learning works, to what happens inside this black box. But first of all, who am I? I'm Teubi. It's a German nickname; it has nothing to do with toys or bees. You might have heard my voice before, because I host the Nussschale podcast, where I explain scientific topics in under 10 minutes. I'll have to use a little more time today, and you'll also get fancy animations, which will hopefully help. In my day job I'm a research scientist at an institute for computer vision. I analyze microscopy images of bone marrow blood cells and try to find ways to teach the computer to understand what it sees.
Namely, to differentiate between certain cells or, first of all, to find cells in an image, which is a task that is more complex than it might sound.

Let me start with the introduction to deep learning. We all know how to code, and we code in a very simple way. We have some input for our computer algorithm. Then we have an algorithm which says: do this, do that; if this, then that. And in that way we generate some output. This is not how machine learning works. Machine learning assumes you have some input, and you also have some output. And what you also have is some statistical model. This statistical model is flexible. It has certain parameters, which it can learn from the distribution of inputs and outputs you give it for training. So you basically teach the statistical model to generate the desired output from the given input. Let me give you a really simple example of how this might work. Let's say we have two kinds of animals: unicorns and rabbits. And now we want to find an algorithm that tells us whether the animal we currently have as input is a rabbit or a unicorn.
We can write a simple algorithm to do that, but we can also do it with machine learning. The first thing we need is some input. I chose two features that are able to tell me whether this animal is a rabbit or a unicorn, namely speed and size. We call these features, and they describe something about what we want to classify. The class, in this case, is our animal. So the first thing I need is some training data, some input. The input here is just pairs of speed and size. What I also need is information about the desired output, the desired output of course being the class: either unicorn or rabbit, here denoted by yellow and red X's. So let's try to find a statistical model which we can use to separate this feature space into two halves: one for the rabbits, one for the unicorns. Looking at this, we can actually find a really simple statistical model: in this case, it is just a straight line. And the learning process is then to find where in this feature space the line should be. Ideally, for example, here: right in the middle between the two classes, rabbit and unicorn.
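What follows is a minimal sketch of "finding where the line should be". It makes two assumptions. First, the (speed, size) values are invented for illustration. Second, it uses one very simple fitting rule, which is not necessarily the one behind the talk's slide: place the line halfway between the two class averages, perpendicular to the segment joining them. This is a nearest-class-mean classifier.

```python
# Hypothetical training data: (speed, size) pairs, invented for illustration.
rabbits = [(2.0, 1.0), (3.0, 1.5), (2.5, 0.5)]
unicorns = [(6.0, 5.0), (7.0, 6.0), (6.5, 5.5)]

def mean(points):
    """Average of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_midline(class_a, class_b):
    """Return (w1, w2, b) for the line w1*speed + w2*size + b = 0 that passes
    through the midpoint of the two class means, perpendicular to the
    segment joining them."""
    ma, mb = mean(class_a), mean(class_b)
    w1, w2 = mb[0] - ma[0], mb[1] - ma[1]             # direction from class a to class b
    mid = ((ma[0] + mb[0]) / 2, (ma[1] + mb[1]) / 2)  # halfway between the means
    b = -(w1 * mid[0] + w2 * mid[1])                  # make the line pass through mid
    return w1, w2, b

def classify(params, speed, size):
    """Which side of the line is the point on?"""
    w1, w2, b = params
    return "unicorn" if w1 * speed + w2 * size + b > 0 else "rabbit"

params = fit_midline(rabbits, unicorns)
print(classify(params, 2.0, 2.0))  # rabbit
print(classify(params, 7.0, 5.0))  # unicorn
```

The "learned" parameters here are just the three numbers `w1`, `w2`, `b`. Other fitting rules would place the line differently.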
Of course, this rabbit-and-unicorn example is overly simplified. Real-world applications have feature distributions which look much more like this: we have a gradient, we don't have a perfect separation between those two classes, and those two classes are definitely not separable by a line. Let's look again at some training samples. Training samples are the data points we use in the machine learning process to find the parameters of our statistical model. If we try the line again, it will not be able to separate this training set. We will have a line that makes some errors: some unicorns will be classified as rabbits, and some rabbits will be classified as unicorns. This is what we call underfitting: our model is just not able to express what we want it to learn. There is also the opposite case, where we just learn all the training samples by heart. This happens if we have a very complex model and just a few training samples to teach the model what it should learn. In this case we have a perfect separation of unicorns and rabbits, at least for the few data points we have.
If we draw more examples from the real world, some other data points, the model will most likely classify them wrongly. And this is what we call overfitting. The perfect scenario in this case would be something like this: a classifier which is really close to the distribution we have in the real world. Machine learning is tasked with finding this perfect model and its parameters.

Let me show you a different kind of model, something you have probably all heard about: neural networks. Neural networks are inspired by the brain, or more precisely, by the neurons in our brain. Neurons are tiny cells in our brain that take some input and generate some output. Sounds familiar, right? The inputs usually come in the form of electrical signals, and if they are strong enough, the neuron will also send out an electrical signal. And this is something we can model in a computer-engineering way. So what we do is: we take a neuron. The neuron is just a simple mapping from input to output. The input here is just three input nodes, which we denote by i1, i2 and i3, and the output is denoted by o. And now you will actually see some mathematical equations.
There are not many of these in this foundation talk, don't worry, and they're really simple. There's one more thing we need first, though, if we want to map input to output in the way a neuron does: namely, the weights. The weights are just some arbitrary numbers for now; let's call them w1, w2 and w3. We take those weights and multiply them with the inputs: input 1 times weight 1, input 2 times weight 2, and so on. And this sum will simply be our output. Well, not quite. We make it a little bit more complicated: we also use something called an activation function. The activation function is just a mapping from one scalar value to another scalar value. In this case, it maps what we got as an output, the sum, to something that more closely fits what we need. This could, for example, be something binary, where all negative numbers are mapped to zero and all positive numbers are mapped to one. And then this zero and one can encode something, for example: rabbit or unicorn. So let me give you an example of how we can make the previous example with the rabbits and unicorns work with such a simple neuron.
We just use speed, size, and the arbitrarily chosen number 10 as our inputs, and the weights 1, 1, and -1. If we look at the equations, we get a 0 for negative sums, that is, speed plus size less than 10. We get a 1 for positive sums, that is, speed plus size greater than 10. This way we again have a separating line between unicorns and rabbits. But again, this is a really simplistic model. We want to make it more and more complex in order to express more complex tasks. So what do we do? We take more neurons. We take our three input values and put them into one neuron, into a second neuron, and into a third neuron. And we take the output of those three neurons as input for another neuron. We call this a multilayer perceptron, perceptron just being a different name for the neuron we have there. And the whole thing is also called a neural network. So now the question: how do we train this? How do we learn what this network should encode? Well, we want a mapping from input to output, and what we can change are the weights. First, we take a training sample, some input.
We put it through the network and get an output. But this might not be the desired output, which we know. So in the binary case there are four possible cases: computed output and expected output, each with two values, 0 and 1. The best cases are: we want a 0 and get a 0, or we want a 1 and get a 1. But there are also the opposite cases, and in these two cases we can learn something about our model, namely in which direction to change the weights. It's a little bit simplified, but in principle you just raise the weights if you need a higher number as output, and you lower the weights if you need a lower number as output. To tell you how much, we have two terms. The first term is the error, in this case just the difference between the desired and the computed output. The error is also often called a loss function, especially in deep learning and more complex applications. The second term is what we call the learning rate, and the learning rate tells us how quickly we should change the weights, how quickly we should adapt them. Okay, this is how we learn a model. This is almost everything you need to know.
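The single neuron and the learning step described above can be sketched in a few lines. The forward pass uses exactly the talk's inputs (speed, size, 10) and weights (1, 1, -1). For the update, the sketch assumes the classic perceptron learning rule, `w_i += learning_rate * (desired - computed) * input_i`. That is one standard form of "raise or lower the weights by error times learning rate", not necessarily the exact equations on the slides. The training samples are invented.

```python
def step(x):
    """Binary activation: negative (and zero) sums -> 0, positive sums -> 1."""
    return 1 if x > 0 else 0

def neuron(inputs, weights):
    """Weighted sum of the inputs, passed through the step activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)))

# The talk's hand-picked neuron: inputs (speed, size, 10), weights (1, 1, -1).
# Output 1 means speed + size > 10 (unicorn); 0 means rabbit.
hand_picked = [1.0, 1.0, -1.0]
print(neuron([3.0, 2.0, 10.0], hand_picked))  # 0 -> rabbit
print(neuron([8.0, 6.0, 10.0], hand_picked))  # 1 -> unicorn

def train(samples, weights, learning_rate=0.5, epochs=20):
    """Perceptron rule: error = desired - computed (0 when correct),
    then every weight moves by learning_rate * error * its input."""
    weights = list(weights)
    for _ in range(epochs):
        for inputs, desired in samples:
            error = desired - neuron(inputs, weights)
            for k in range(len(weights)):
                weights[k] += learning_rate * error * inputs[k]
    return weights

# Hypothetical labelled samples: ((speed, size, 10), class), 0 = rabbit, 1 = unicorn.
samples = [
    ([2.0, 1.0, 10.0], 0),
    ([3.0, 2.0, 10.0], 0),
    ([8.0, 6.0, 10.0], 1),
    ([9.0, 7.0, 10.0], 1),
]
learned = train(samples, [0.0, 0.0, 0.0], learning_rate=0.5)
print(learned)  # [4.5, 4.0, -5.0]
```

Starting from all-zero weights, the rule finds a different separating line than the hand-picked one. Both lines classify these four samples correctly.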
There are mathematical equations that tell you how much to change the weights based on the error and the learning rate. And this is the entire learning process. Let's get back to the terminology. We have the input layer. We have the output layer, which somehow encodes our output, either in one value or in several values if we have multiple classes. We also have the hidden layers, which are actually what makes our model deep. What we can change, what we can learn, are the weights, the parameters of this model. But we also need to keep in mind the number of layers, the number of neurons per layer, the learning rate, and the activation function. These are called hyperparameters, and they determine how complex our model is and how well it is suited to solve the task at hand. I've quite often spoken about solving tasks, so the question is: what can we actually do with neural networks? Mostly classification tasks, for example: tell me, is this animal a rabbit or a unicorn? Is this text message spam or legitimate? Is this patient healthy or ill? Is this image a picture of a cat or a dog?
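To show why hidden layers matter, here is a minimal sketch of a tiny multilayer perceptron with step activations. It assumes hand-picked, hypothetical weights rather than learned ones. It also uses two hidden neurons instead of the three on the talk's slide, plus a constant input of 1 as a bias. The network computes XOR, a pattern that no single straight line (and therefore no single neuron of this kind) can separate. The number of layers, the neurons per layer, and the step activation are the hyperparameters here; the numbers in `HIDDEN` and `OUTPUT` are the weights.

```python
def step(x):
    """Binary activation: positive sums -> 1, everything else -> 0."""
    return 1 if x > 0 else 0

def layer(inputs, weight_rows):
    """One fully connected layer: every neuron sees every input."""
    return [step(sum(i * w for i, w in zip(inputs, row))) for row in weight_rows]

# Hypothetical hand-picked weights; the last input of each layer is a constant 1 (bias).
HIDDEN = [
    [1.0, 1.0, -0.5],   # fires if at least one input is 1 (OR)
    [1.0, 1.0, -1.5],   # fires only if both inputs are 1 (AND)
]
OUTPUT = [
    [1.0, -1.0, -0.5],  # fires for "OR but not AND", i.e. XOR
]

def network(x1, x2):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer([x1, x2, 1.0], HIDDEN)
    return layer(hidden + [1.0], OUTPUT)[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", network(a, b))  # 0, 1, 1, 0
```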
We already saw for the animal that we need something called features, which somehow encode information about what we want to classify. A feature is something we can use as input for the neural network, some kind of meaningful number. For the animal, it could be speed, size, or something like color. Color, of course, is more complex again, because we have, for example, RGB, so three values. A text message is a more complex case again. We somehow need to encode the sender and whether the sender is legitimate, the same for the recipient, the number of hyperlinks, where the hyperlinks point to, or whether certain words are present in the text. It gets more and more complicated. Even more so for a patient: how do we encode medical history in a proper way for the network to learn? I mean, temperature is simple: it's a scalar value, we just have a number. But how do we encode whether certain symptoms are present? And the image, which is actually what I work with every day, is again quite complex.
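To make the text-message case above more concrete, here is a minimal sketch of turning a message into a fixed-length list of numbers that a network could take as input. The sender list, the word list, and the function name are all hypothetical. The word check is a naive substring test, so "free" would also match "freedom".

```python
import re

# Hypothetical lists for illustration; a real system would curate or learn these.
KNOWN_SENDERS = {"alice@example.org", "bob@example.org"}
SUSPICIOUS_WORDS = ("winner", "free", "prize")

def message_features(sender, text):
    """Encode a text message as numbers: known sender, link count, word presence."""
    links = re.findall(r"https?://\S+", text)  # crude hyperlink detection
    lowered = text.lower()
    features = [
        1.0 if sender in KNOWN_SENDERS else 0.0,  # is the sender known/legitimate?
        float(len(links)),                         # number of hyperlinks
    ]
    # One feature per word: 1.0 if the word occurs anywhere in the text.
    features += [1.0 if word in lowered else 0.0 for word in SUSPICIOUS_WORDS]
    return features

print(message_features("spam@example.com",
                       "You are a WINNER! Claim your free prize at http://example.com/claim"))
# [0.0, 1.0, 1.0, 1.0, 1.0]
print(message_features("alice@example.org", "Lunch at noon?"))
# [1.0, 0.0, 0.0, 0.0, 0.0]
```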
Images do give us values, numbers, but only pixel values, which are difficult to use as input for a neural network. Why? I'll actually show you with this picture. It's a very famous picture, and everybody in computer vision uses it. They will tell you it's because there is a multitude of different characteristics in this image: shapes, edges, whatever you desire. The truth is, it's a crop from a Playboy centrefold, and in earlier years, computer vision engineers were a mostly male audience. Anyway, let's take five by five pixels. Let's assume this is a really small image of five by five pixels. If we take those 25 pixels and use them as input for a neural network, you already see that we have many connections, many weights, which means a very complex model. And a complex model is, of course, prone to overfitting. But there are more problems. The first is that we have disconnected each pixel from its neighbors. We can't encode information about the neighborhood anymore, and that really sucks.
If we just take the whole picture and move it to the left or to the right by just one pixel, the network will see something completely different, even though to us it is exactly the same. But we can solve that with some very clever engineering, something we call a convolutional layer. It is again a hidden layer in a neural network, but it does something special. It actually is a very simple neuron again, with just four input values and one output value. The four input values look at two by two pixels and encode one output value. Then the same neuron, with the same weights, is shifted to the right and encodes another pixel, and another pixel, and then the next row of pixels. In this way it creates another 2D image. We have preserved information about the neighborhood, and we only have a very low number of weights, not the huge number of parameters we saw earlier. We can use this once, or twice, or several hundred times. And this is actually where we go deep. Deep means we have several layers. Having layers that don't need thousands or millions of connections, but only a few, is what allows us to go really deep.
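The sliding 2x2 neuron described above can be sketched directly. This is a minimal pure-Python version under a few assumptions. There is no padding, so the output is one pixel smaller in each direction. There is no activation function. As in most deep learning libraries, the kernel is applied without flipping, which is technically cross-correlation. The kernel values are hand-picked to respond to a dark-to-bright vertical edge; in a real network they would be learned. The example also shows the shift problem from above going away: moving the image by one pixel just moves the response by one pixel.

```python
def conv2d_2x2(image, kernel):
    """Slide one 2x2 'neuron' (4 weights) over the image, one pixel at a time,
    producing a new 2D image ('valid' mode: no padding)."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height - 1):
        row = []
        for c in range(width - 1):
            # Weighted sum of the 2x2 neighbourhood with the same 4 weights everywhere.
            s = (image[r][c] * kernel[0][0] + image[r][c + 1] * kernel[0][1]
                 + image[r + 1][c] * kernel[1][0] + image[r + 1][c + 1] * kernel[1][1])
            row.append(s)
        out.append(row)
    return out

# 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked kernel that responds to "dark -> bright" from left to right.
kernel = [[-1, 1],
          [-1, 1]]

print(conv2d_2x2(image, kernel)[0])    # [0, 2, 0, 0]

# Shift the whole picture one pixel to the right: the response shifts too.
shifted = [[0] + row[:-1] for row in image]
print(conv2d_2x2(shifted, kernel)[0])  # [0, 0, 2, 0]
```

The layer above has 4 weights no matter how large the image is. By comparison, a fully connected layer from the 25 pixels to, say, 25 neurons would already need 625 weights.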
By stacking such layers, we can encode an entire image in just a few meaningful values. What these values look like and what they encode is learned through the learning process. We can then, for example, use these few values as input for a classification network, the fully connected network we saw earlier. Or we can do something more clever: we can do the inverse operation and create an image again, for example the same image, which is then called an autoencoder. Autoencoders are tremendously useful, even though they don't appear that way. For example, imagine you want to check whether something, say a fabric in a picture, has a defect or not. You just train the network with normal pictures. Then, if you have a picture with a defect, the network is not able to reproduce this defect. And so the difference between the reproduced picture and the real picture will show you where the errors are. At least if it works properly, I have to admit. But we can go even further. Let's say we want to encode something else entirely.
Well, let's encode the image, the information in the image, but in another representation. For example, let's say we have three classes again: the background class in grey, a class called hat or headwear in blue, and person in green. We can also use this for applications other than pictures of humans. For example, we have a picture of a street and want to encode: where is the car, where is the pedestrian? Tremendously useful. Or we have an MRI scan of a brain: where in the brain is the tumor? Can we somehow learn this? Yes, we can do this with methods like these, if they are trained properly. More about that later. We expect something like this to come out, but the truth looks rather like this, especially if the network is not properly trained. We don't get the real shape we want, but something distorted. So here, again, is where we need to do learning. First we take a picture, put it through the network, and get our output representation. And we have the information about how we want it to look. We again compute some kind of loss value.
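The talk does not say which overlap measure is used. One common choice for shape outputs like these is the Dice coefficient, 2·|overlap| / (|predicted| + |desired|). The loss is 1 minus that value, so a perfect match gives a loss of 0. Below is a minimal sketch of a "soft" variant that also accepts per-pixel probabilities, which is what makes it usable for training. The masks are invented, and `eps` is an assumed small constant that avoids division by zero for empty masks. For several classes (background, hat, person), one would typically compute this per class and average.

```python
def flatten(mask):
    """Turn a 2D mask (list of rows) into one flat list of pixel values."""
    return [v for row in mask for v in row]

def soft_dice_loss(predicted, target, eps=1e-7):
    """1 - Dice overlap; predicted values may be probabilities in [0, 1]."""
    p, t = flatten(predicted), flatten(target)
    overlap = sum(a * b for a, b in zip(p, t))  # pixels that are "on" in both masks
    size = sum(p) + sum(t)                      # total "on" pixels in both masks
    return 1.0 - (2.0 * overlap + eps) / (size + eps)

# Hypothetical 3x3 masks: the desired shape is a vertical bar,
# the prediction is a distorted version that overlaps in 2 of 3 pixels.
target = [[0, 1, 0],
          [0, 1, 0],
          [0, 1, 0]]
predicted = [[0, 1, 0],
             [0, 1, 1],
             [0, 0, 0]]

print(round(soft_dice_loss(predicted, target), 3))  # 0.333
print(round(soft_dice_loss(target, target), 3))     # 0.0
```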
This time, the loss could for example be based on the overlap between the shape we get out of the model and the shape we want to have. And we use this error, this loss function, to update the weights of our network. Again, it is the same process all over again as in the binary case, even though it's more complicated here, even though we have more layers, and even though the layers look slightly different. And we need lots of training data. This is something you'll hear often in connection with deep learning: you need lots of training data to make this work. Images are complex things, and in order to meaningfully extract knowledge from them, the network needs to see a multitude of different images. Now I have already shown you some of the building blocks we use in network architectures. There is the fully convolutional encoder, which takes an image and produces a few meaningful values out of it. And there is its counterpart, the fully convolutional decoder. Fully convolutional, by the way, means that we only have these convolutional layers with few parameters, which encode spatial information and keep it for the next layers.
The decoder takes a few meaningful numbers and reproduces an image: either the same image or another representation of the information encoded in the image. We also already saw the fully connected network, fully connected meaning that every neuron is connected to every neuron in the next layer. This, of course, can be dangerous, because this is where we actually get most of our parameters: connecting every node to every node just gives a high number of connections. We can also do other things, for example add something called a pooling layer. A pooling layer, here a max-pooling layer, is basically the same as one of those convolutional layers, just without parameters we need to learn. It works without parameters because this neuron just picks whichever value is the highest and takes that value as output. This is really great for reducing the size of your image and also for getting rid of information that might not be that important. We can also use clever techniques like adding a dropout layer.
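Before getting to dropout, here is a minimal sketch of the max-pooling step just described. It assumes 2x2 windows that move two pixels at a time (stride 2), a common choice that the talk does not specify. With that choice, the output is half as wide and half as tall, and an odd last row or column is simply dropped. Note that there are no weights anywhere in it.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2: each output pixel is the largest value
    in its 2x2 block. No learnable parameters; halves width and height."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
print(max_pool_2x2(feature_map))  # [[4, 2], [2, 8]]
```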
A dropout layer is just a normal layer in a neural network where we remove some connections: in one training step these connections, in the next training step some other connections. This way we teach the remaining connections to become more resilient against errors.

I would now like to start with something I call the "Model Show", and show you some models and how we train them. I will start with the fully convolutional decoder we saw earlier: this thing that takes a number and creates a picture. I would like to take this model, put in some number, and get out a picture, a picture of a horse, for example. If I put in a different number, I also want to get a picture of a horse, but of a different horse. So what I want is a mapping from some numbers, some features that encode something about the horse picture, to a horse picture. You might already see why this is problematic. It is problematic because we don't have a mapping from features to horses or from horses to features. So we don't have a ground truth we can use to learn how to generate this mapping.
Well, computer vision engineers – or deep learning professionals – are smart and have clever ideas. Let's just assume we have such a network and call it a generator. Let's take some numbers, put them into the generator and get some horses. Well, it doesn't work yet; we still have to train it. So there are probably not only horses but also some very special unicorns among them, which might be nice for other applications, but right now I want pictures of horses. So I can't train with this data directly. But what I can do is create a second network. This network is called a discriminator, and I can give it the output of the generator as well as the real data I have: the real horse pictures. Then I can teach the discriminator to distinguish between those, to tell me: this is a real horse, or this is not a real horse. And there I know the truth, because I take either real horse pictures or fake horse pictures from the generator. So I have a truth value for this discriminator. But in doing this, I also get a truth value for the generator, because I want the generator to work against the discriminator.
So I can also use the information on how well the discriminator does to train the generator to become better at fooling it. This is called a generative adversarial network, and it can be used to generate pictures from an arbitrary distribution. Let's do this with numbers, and I will actually show you the training process. Before I start the video, I'll tell you what I did. I took some handwritten digits. There is a database of handwritten digits called MNIST, so the digits 0 to 9. I took those and used them as training data. I trained a generator in the way I showed you on the previous slide, and then I just took some random numbers, put those random numbers into the network and stored the image that came out of the network. Here in the video you'll see how the network improved with ongoing training. You will see that we start with basically just noisy images … and then after some – what we call epochs, so rounds of training over the data set – the network is able to almost perfectly generate handwritten digits just from noise. Which I find truly fascinating. Of course, this is an example where it works.
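The two "truth values" in the generator/discriminator setup can be written down as two loss functions. In this minimal sketch, the discriminator is scored with binary cross-entropy. The generator uses the common "non-saturating" loss, which rewards it when its fakes are classified as real. The talk does not say which exact losses were used, and the names below are hypothetical. Training would alternate: one gradient step lowering `discriminator_loss`, then one lowering `generator_loss`. The gradient steps themselves are left out here.

```python
import math
from typing import List

EPS = 1e-12  # keeps log() away from log(0)


def discriminator_loss(d_real: List[float], d_fake: List[float]) -> float:
    """Binary cross-entropy for the discriminator. d_real / d_fake are the
    discriminator's probabilities that a picture is real, for real pictures
    and for generator output respectively. The loss is low when real
    pictures score near 1 and fakes score near 0."""
    real_term = -sum(math.log(p + EPS) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p + EPS) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake: List[float]) -> float:
    """'Non-saturating' generator loss: low when the discriminator believes
    the generator's fakes are real (d_fake near 1)."""
    return -sum(math.log(p + EPS) for p in d_fake) / len(d_fake)


# A discriminator that easily spots fakes: low D loss, high G loss.
print(discriminator_loss([0.9, 0.95], [0.1, 0.05]), generator_loss([0.1, 0.05]))
# A fooled discriminator: the generator's loss drops, the discriminator's rises.
print(discriminator_loss([0.9, 0.95], [0.9, 0.8]), generator_loss([0.9, 0.8]))
```

The two losses pull in opposite directions on the same fake pictures. This opposition is the "min-max game" mentioned later in the Q&A.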
Whether it is a success or not depends highly on your data set and on how you train the model. But if it works, you can use it to generate fonts. You can generate characters, 3D objects, pictures of animals, whatever you want, as long as you have training data. Let's get more crazy. Let's take two of those, and let's say we have pictures of horses and pictures of zebras. I want to convert those pictures of horses into pictures of zebras, and I want to convert pictures of zebras into pictures of horses. So I want to have the same picture, just with the other animal. But I don't have training data of the same situation once with a horse and once with a zebra. Doesn't matter. We can train a network that does that for us. Again, we just have a network – we call it the generator – and we have two of those: one that converts horses to zebras and one that converts zebras to horses. Then we also have two discriminators that tell us: real horse, fake horse, real zebra, fake zebra. And then we again need to perform some training. So we need to somehow encode: did it do what we wanted it to do?
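The "did it work?" signal for the two-generator setup is usually written as a cycle-consistency loss. Translate a horse to a zebra and back, and the result should be the original horse. This sketch covers only that round-trip term, not the two discriminators' adversarial losses. It assumes an L1 (mean absolute) distance, which is a common choice. Pictures are tiny flattened lists, and `brighten`, `darken` and `lossy_darken` are hypothetical stand-ins for the two generators.

```python
from typing import Callable, List

Image = List[float]  # a flattened picture, kept tiny for illustration


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute pixel difference between two pictures."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(horse: Image, zebra: Image,
                           horse_to_zebra: Callable[[Image], Image],
                           zebra_to_horse: Callable[[Image], Image]) -> float:
    """Translate each picture to the other domain and back again. If both
    generators are consistent, the round trip reproduces the input and the
    loss is close to 0."""
    horse_again = zebra_to_horse(horse_to_zebra(horse))
    zebra_again = horse_to_zebra(zebra_to_horse(zebra))
    return l1_distance(horse, horse_again) + l1_distance(zebra, zebra_again)


# Hypothetical toy "generators" that shift brightness.
def brighten(img: Image) -> Image:
    return [p + 0.2 for p in img]


def darken(img: Image) -> Image:
    return [p - 0.2 for p in img]


def lossy_darken(img: Image) -> Image:
    return [p - 0.3 for p in img]  # does not undo brighten exactly


horse = [0.1, 0.5, 0.9]
zebra = [0.3, 0.3, 0.7]
print(cycle_consistency_loss(horse, zebra, brighten, darken))        # close to 0
print(cycle_consistency_loss(horse, zebra, brighten, lossy_darken))  # about 0.2
```

A nonzero loss is exactly the information used to update the weights of both generators.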
A very simple way to do this: we take a picture of a horse and put it through the generator that generates a zebra. We take this fake picture of a zebra and put it through the generator that generates a picture of a horse. If this is the same picture as the one we put in, then our model worked. If it isn't, we can use that information to update the weights. I just took a random picture of a horse from a free library on the Internet and generated a zebra, and it worked remarkably well. I actually didn't even do the training. It also doesn't need to be a picture. You can also convert text to images: you describe something in words and generate images. You can age your face or age a cell, or make a patient healthy or sick – or rather the image of a patient, not the patient themselves, unfortunately. You can do style transfer, like taking a painting by Van Gogh and applying its style to your own picture. Stuff like that. Something else we can do with neural networks: let's assume we have a classification network. We have a picture of a toothbrush, and the network tells us: well, this is a toothbrush. Great! But how resilient is this network?
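One way to probe that resilience, before the trained adversarial network described next, is the fast gradient sign method (FGSM). It is a simpler, widely used technique swapped in here. It is not the method from the talk, which uses a second trained network or a one-pixel change. This minimal sketch assumes a hypothetical linear "toothbrush vs. octopus" classifier over 1,000 pixels. It shows how a change of about 0.01 per pixel, invisible to us, adds up across many pixels and flips the decision.

```python
from typing import List


def score(weights: List[float], bias: float, x: List[float]) -> float:
    """Hypothetical linear classifier: score > 0 means 'toothbrush',
    score < 0 means 'octopus'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm_away_from_positive(weights: List[float], x: List[float],
                            epsilon: float) -> List[float]:
    """Fast gradient sign method for a linear score. The gradient of the
    score with respect to the input is just the weight vector, so nudging
    every pixel by epsilon against sign(w) lowers the score as much as any
    change of at most epsilon per pixel can."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


n_pixels = 1000
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(n_pixels)]
bias = 0.5
toothbrush = [0.5] * n_pixels  # a flat grey stand-in 'picture'

adversarial = fgsm_away_from_positive(weights, toothbrush, epsilon=0.01)
print(score(weights, bias, toothbrush))   # positive: 'toothbrush'
print(score(weights, bias, adversarial))  # negative: 'octopus'
print(max(abs(a - b) for a, b in zip(adversarial, toothbrush)))  # about 0.01
```

Feeding such perturbed pictures back in with their correct labels is one simple way to make a network more robust.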
Does it really work in every scenario? There's a second network we can apply; we call it an adversarial network. That network is trained to do one thing: look at the network, look at the picture, and then find the one weak spot in the picture – just change one pixel slightly so that the network will tell me this toothbrush is an octopus. Works remarkably well. It also works by changing the picture only slightly, so changing all the pixels, but with minute changes that we don't perceive, while the network – the classification network – is completely thrown off. Well, sounds bad. It is bad if you don't consider it. But you can also, for example, use this for training your network and make your network resilient. So there's always an upside and a downside. Something entirely different: now I'd like to show you something about text, a word language model. I want to generate sentences for my podcast. I have a network that gives me a word, and if I then want to get the next word in the sentence, I also need to consider this word.
So another network architecture – quite interestingly – just takes the hidden states of the network and uses them as input for the same network, so that in the next iteration we still know what we did in the previous step. I tried to train a network that generates podcast episodes for my podcast. Didn't work. What I learned is that I don't have enough training data. I really need to produce more podcast episodes in order to train a model to do my job for me. And this is a very important, very crucial point: training data. We need shitloads of training data. And actually, the more complicated our model and our training process become, the more training data we need. I started with the supervised case – the really simple case where we have a picture and a label that corresponds to that picture, or a representation of that picture showing exactly what I wanted to learn. But we also saw a more complex task, where I had two sets of pictures – horses and zebras – from two different domains, but domains with no direct mapping.
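The architecture that feeds its own hidden state back in is a recurrent neural network. This minimal sketch shows a single Elman-style recurrent cell. It assumes each word has already been encoded as a single number and uses one scalar hidden state with hand-picked weights. Real language models use embedding vectors and weight matrices, so `rnn_step` and `run_rnn` are illustrative names only.

```python
import math
from typing import List


def rnn_step(x: float, h: float, w_x: float, w_h: float, b: float) -> float:
    """One step of a minimal recurrent cell with a scalar hidden state:
    the new state mixes the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h + b)


def run_rnn(inputs: List[float], w_x: float = 1.0, w_h: float = 0.5,
            b: float = 0.0) -> List[float]:
    """Feed a sequence through the same cell, carrying the hidden state
    from one step to the next, and return every hidden state."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


a = run_rnn([1.0, 0.0])   # history: +1, then 0
b = run_rnn([-1.0, 0.0])  # history: -1, then the same 0
print(a[-1], b[-1])       # same last input, different final states
```

A plain feed-forward network that sees only the last input (0.0) would give the same output for both sequences. The carried hidden state is what lets the recurrent cell "remember" the previous word.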
What can also happen – and actually happens quite a lot – is weakly annotated data, so data that is not precisely annotated, where we can't fully rely on the information we get. Or, even more complicated, something called reinforcement learning, where we perform a sequence of actions and are told only at the end: "yeah, that was great". That is often not enough information to really perform proper training. But of course there are also methods for that, just as there are methods for the unsupervised case, where we don't have annotations or labeled data – no ground truth at all – just the picture itself. Well, I talked about pictures. I told you that we can learn features and create images from them, and that we can use them for classification. For this there exist many databases: there are public data sets we can use. Often they just refer to images on, for example, Flickr. They're just hyperlinks, which is also why I didn't show you many pictures here, because I am honestly not sure about the copyright in those cases.
But there are also challenge data sets where you can just sign up, get, for example, medical data sets, and then compete against other researchers. And of course there are those companies that just have lots of data. Those companies also have the means, the capacity, to perform intense computations, and they are also often the companies you hear from in terms of innovation in deep learning. Well, this was mostly to tell you that you can process images quite well with deep learning if you have enough training data, a proper training process and also, a little, if you know what you're doing. But you can also process text, you can process audio, and time series like prices or the stock exchange – stuff like that. You can process almost everything if you make it encodable for your network. Sounds like a dream come true. But – as I already told you – you need data, a lot of it. I told you about those companies that have lots of data sets, and about the publicly available data sets, which you can actually use to get started with your own experiments.
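"Making it encodable" means turning the data into numbers the network can take as input. For text, one of the simplest schemes is one-hot encoding. Each distinct word gets an index and becomes a vector with a single 1 at that index. This is a minimal sketch with hypothetical helper names. Practical text models usually go on to learn denser embeddings.

```python
from typing import Dict, List


def build_vocabulary(words: List[str]) -> Dict[str, int]:
    """Assign each distinct word an index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def one_hot(word: str, vocab: Dict[str, int]) -> List[int]:
    """A vector of zeros with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


words = "deep learning is easy to learn and hard to master".split()
vocab = build_vocabulary(words)
print(len(vocab))              # 9 distinct words ("to" appears twice)
print(one_hot("learn", vocab)) # [0, 0, 0, 0, 0, 1, 0, 0, 0]
```

The same idea, mapping each category to a position in a vector, works for other discrete data too. Audio or prices are already numeric and mostly need to be cut into fixed-length windows.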
But that also makes it a little dangerous, because deep learning still is a black box to us. I told you what happens inside the black box on a level that teaches you how we learn and how the network is structured, but not really what the network learned. For us computer vision engineers it is really nice that we can visualize the first layers of a neural network and see what is actually encoded in those first layers, what information the network looks at. But you can't really mathematically prove what happens in a network, which is one major downside. So if you want to use it, the numbers may look really great, but be sure to evaluate them properly. In summary, I call that "easy to learn": every one – every single one of you – can start with deep learning right away. You don't need to do much work. You don't need to do much learning; the model learns for you. But it is hard to master in a way that makes it useful for, for example, production use cases.
So if you want to use deep learning for something – if you really want to seriously use it – make sure that it really does what you want it to do and doesn't learn something else, which also happens. I'm pretty sure you saw some talks about deep learning fails, which is not what this talk is about. They're quite funny to look at. Just make sure that they don't happen to you!
Herald Angel: So now it's question and answer time. If you have a question, please line up at the mikes. We have eight in total, so one shouldn't be far from you. They are here in the corridors and on these sides. Please line up! For everybody: a question consists of one sentence with a question mark at the end – not three minutes of rambling. And if you go to the microphone, speak into the microphone, so really get close to it. Okay. Where do we have … Number 7! We start with mic number 7:
Question: Hello. My question is: How did you compute the example for the fonts, the numbers? I didn't really understand it; you just said it was made from white noise.
Teubi: I'll give you a really brief recap of what I did. I showed you that we have a model that maps an image to some meaningful values, that an image can be encoded in just a few values. What happens here is exactly the other way round. We have some values, just some arbitrary values we actually know nothing about, and we can generate pictures out of those. So I trained this model to just take some random values, and I showed the pictures generated by the model. The training process was this "min-max game", as it's called. We have two networks that compete against each other.
One network tries to distinguish whether a picture it sees is real or one of those fake pictures, and the other network actually generates those pictures. By training the network that is able to distinguish between those, we also get information for training the network that generates the pictures. So the videos you saw were just animations of what happens during this training process. At first, if we input noise, we get noise. But as the network becomes better and better at recreating the images from the data set we used as input, in this case pictures of handwritten digits, the output also looks more and more like those handwritten digits. Hope that helped.
Herald Angel: Now we go to the Internet. – Can we get sound for the Signal Angel, please?
Teubi: Sounded so great, "now we go to the Internet."
Herald Angel: Yeah, that sounds like "yeeaah".
Signal Angel: And now we're finally ready to go to the interwebs.
"Schorsch" is Dialogue: 0,0:37:13.04,0:37:18.04,Default,,0000,0000,0000,,asking: Do you have any recommendations\Nfor a beginner regarding the framework or Dialogue: 0,0:37:18.04,0:37:26.46,Default,,0000,0000,0000,,the software?\NTeubi: I, of course, am very biased to Dialogue: 0,0:37:26.46,0:37:34.15,Default,,0000,0000,0000,,recommend what I use everyday. But I also\Nthink that it is a great start. Basically, Dialogue: 0,0:37:34.15,0:37:40.21,Default,,0000,0000,0000,,use python and use pytorch. Many people\Nwill disagree with me and tell you Dialogue: 0,0:37:40.21,0:37:45.93,Default,,0000,0000,0000,,"tensorflow is better." It might be, in my\Nopinion not for getting started, and there Dialogue: 0,0:37:45.93,0:37:51.56,Default,,0000,0000,0000,,are also some nice tutorials on the\Npytorch website. What you can also do is Dialogue: 0,0:37:51.56,0:37:57.20,Default,,0000,0000,0000,,look at websites like OpenAI, where they\Nhave a gym to get you started with some Dialogue: 0,0:37:57.20,0:38:02.37,Default,,0000,0000,0000,,training exercises, where you already have\Ndatasets. Yeah, basically my Dialogue: 0,0:38:02.37,0:38:08.60,Default,,0000,0000,0000,,recommendation is get used to Python and\Nstart with a pytorch tutorial, see where Dialogue: 0,0:38:08.60,0:38:13.59,Default,,0000,0000,0000,,to go from there. Often there also some\Ngithub repositories linked with many Dialogue: 0,0:38:13.59,0:38:18.74,Default,,0000,0000,0000,,examples for already established network\Narchitectures like the cycle GAN or the Dialogue: 0,0:38:18.74,0:38:26.25,Default,,0000,0000,0000,,GAN itself or basically everything else.\NThere will be a repo you can use to get Dialogue: 0,0:38:26.25,0:38:29.94,Default,,0000,0000,0000,,started.\NHerald Angel: OK, we stay with the Dialogue: 0,0:38:29.94,0:38:32.59,Default,,0000,0000,0000,,internet. There's some more questions, I\Nheard. Dialogue: 0,0:38:32.59,0:38:37.92,Default,,0000,0000,0000,,Signal Angel: Yes. 
Rubin8 is asking: Have you ever come across an example of a neural network that deals with audio instead of images?
Teubi: Me personally, no. At least not directly. I've heard about examples where you can change a voice to sound like another person, but there is not much I can reliably tell you about that. My expertise really is in image processing, I'm sorry.
Herald Angel: And I think we have time for one more question. We have one at number 8. Microphone number 8.
Question: Are current face recognition technologies, for example in the iPhone X, also deep learning algorithms, or is it something simpler? Do you have any idea about that?
Teubi: As far as I know, yes. That's all I can reliably tell you about that, but it is not only based on images; it also uses other information, I think distance information encoded with some infrared signals. I don't really know exactly how it works, but at least iPhones already have a neural network processing engine built in, so a chip dedicated to just doing those computations.
You saw that many of those things can be parallelized, and this is what those hardware architectures make use of. So I'm pretty confident in saying yes, they also do it there. How exactly, no clue.
Herald Angel: OK. I myself have one last, completely unrelated question: Did you create the design of the slides yourself?
Teubi: I had some help. We have a really great Congress design, and I used that as an inspiration to create those slides, yes.
Herald Angel: OK, yeah, because those are really amazing. I love them.
Teubi: Thank you!
Herald Angel: OK, thank you very much, Teubi.