
10.4: Neural Networks: Multilayer Perceptron Part 1 - The Nature of Code

  • 0:00 - 0:01
    Hi again!
  • 0:01 - 0:06
    So, maybe you just watched my
    previous videos about coding a perceptron
  • 0:06 - 0:10
    and now, I want to ask the question:
  • 0:10 - 0:12
    Why not just stop here?
  • 0:12 - 0:13
    [laughs]
  • 0:13 - 0:20
    So okay, we have this very simple
    scenario, where we have a canvas,
  • 0:20 - 0:24
    and it has a whole bunch of points in
    that canvas, or Cartesian plane,
  • 0:24 - 0:25
    whatever we want to call it
  • 0:25 - 0:29
    and we drew a line in between, and
    we were trying to classify
  • 0:29 - 0:32
    some points that are on one side of the line,
  • 0:32 - 0:35
    and some other points that are on the
    other side of the line.
  • 0:35 - 0:39
    So, that was a scenario where we had the
    single perceptron, the sort of like,
  • 0:39 - 0:45
    processing unit, we can call it the Neuron™
    or the processor, and it received inputs,
  • 0:45 - 0:51
    it had inputs like x_0 and x_1, which were the x and y
    coordinates of the points,
  • 0:51 - 0:54
    it also had this thing called a bias,
  • 0:54 - 0:57
    and then it generated an output.
  • 0:59 - 1:03
    Each one of these inputs was connected
    to the processor with a weight,
  • 1:05 - 1:08
    You know, weight 1, weight 2, weight 3, and so on,
  • 1:08 - 1:14
    and the processor creates a weighted sum
    of all the inputs multiplied by the weights,
  • 1:14 - 1:18
    that weighted sum is passed through
    an activation function
  • 1:19 - 1:21
    to generate the output.
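
For reference, here is a minimal Python sketch of the feedforward step just described: each input multiplied by its weight, summed together with a bias, and passed through an activation function. The class and function names, and the choice of a sign activation, are illustrative rather than taken from the video's code.

def sign(n):
    # Activation function: +1 for a non-negative weighted sum, -1 otherwise.
    return 1 if n >= 0 else -1

class Perceptron:
    def __init__(self, weights, bias_weight):
        self.weights = weights          # one weight per input, e.g. [w0, w1]
        self.bias_weight = bias_weight  # weight on the constant bias input of 1

    def feedforward(self, inputs):
        # Weighted sum of all the inputs plus the bias input (fixed at 1).
        total = self.bias_weight * 1
        for x, w in zip(inputs, self.weights):
            total += x * w
        # Pass the weighted sum through the activation function.
        return sign(total)

# Example: classify a point (x_0, x_1) relative to a line; output is +1 or -1.
p = Perceptron(weights=[0.5, -1.0], bias_weight=0.2)
print(p.feedforward([3.0, 4.0]))
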
  • 1:21 - 1:23
    So, why isn't this good enough?
  • 1:23 - 1:24
    Now
  • 1:26 - 1:26
    [exhales]
  • 1:26 - 1:30
    Let's first think about what's the limit here.
  • 1:30 - 1:32
    So, the idea is that
  • 1:32 - 1:38
    what if I want any number of inputs
    to generate any number of outputs?
  • 1:39 - 1:45
    That's the essence of what I want to do in
    a lot of different machine learning applications.
  • 1:45 - 1:50
    Let's take a very classic classification algorithm
    which is to say:
  • 1:50 - 1:54
    Okay, what if I have a handwritten digit
    like the number 8
  • 1:55 - 1:58
    and I have all of the pixels of this digit,
  • 1:59 - 2:02
    and I want those to be the inputs to this perceptron,
  • 2:02 - 2:10
    and I want the output to tell me a set of
    probabilities as to which digit it is.
  • 2:10 - 2:13
    So the output should look something like:
  • 2:13 - 2:16
    there's a 0.1 chance it's a 0,
  • 2:16 - 2:18
    there's a 0.2 chance it's a 1,
  • 2:18 - 2:20
    there's a 0.1 chance it's a 2,
  • 2:20 - 2:24
    and so on for 3, 4, 5, 6, and 7,
  • 2:24 - 2:26
    Oh! And there's like a 0.99 chance it's an 8,
  • 2:26 - 2:30
    and a 0.05 chance it's a 10
    (pretty sure Dan meant 9)
  • 2:30 - 2:32
    and I don't think I got those to add up to 1
    (they don't, they add up to 1.44)
  • 2:32 - 2:34
    but you get the idea.
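
For reference, one standard way to turn a list of raw scores like these into probabilities that genuinely add up to 1 is the softmax function (not something the video has introduced; it is just one common choice). A minimal Python sketch, using the numbers rattled off above:

import math

def softmax(scores):
    # Turn arbitrary scores into probabilities that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Raw "chances" like the ones above; on their own they add up to 1.44.
raw = [0.1, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.99, 0.05]
probs = softmax(raw)
print(round(sum(probs), 6))  # 1.0
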
  • 2:34 - 2:39
    So the idea here is that we want to be able
    to have some type of processing unit
  • 2:39 - 2:42
    that can take an arbitrary number of inputs
  • 2:42 - 2:49
    like maybe this is a 28x28 pixel image
    so there are 784 greyscale values
  • 2:49 - 2:52
    and instead, those are coming into the processor
  • 2:52 - 2:54
    which get weighted and summed and all this stuff,
  • 2:54 - 2:57
    and we get an output that has an
    arbitrary number of probabilities
  • 2:57 - 3:01
    to help us guess that this is an 8.
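
To make the shapes concrete, here is a hedged Python sketch of a single layer mapping 784 greyscale values (a flattened 28x28 image) to 10 output probabilities, one per digit. The dimensions come from the example above; the random weights and the softmax normalization are only illustrative.

import math
import random

NUM_INPUTS = 28 * 28   # 784 pixel values
NUM_OUTPUTS = 10       # digits 0 through 9

# One row of weights per output class, plus one bias per output class.
# Random values here, just to show the shapes; a real classifier learns them.
weights = [[random.uniform(-1, 1) for _ in range(NUM_INPUTS)] for _ in range(NUM_OUTPUTS)]
biases = [random.uniform(-1, 1) for _ in range(NUM_OUTPUTS)]

def classify(pixels):
    # One weighted sum (plus bias) per output class.
    scores = [
        sum(w * x for w, x in zip(weights[k], pixels)) + biases[k]
        for k in range(NUM_OUTPUTS)
    ]
    # Normalize the scores into probabilities that sum to 1 (softmax).
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

image = [random.random() for _ in range(NUM_INPUTS)]  # fake greyscale values in [0, 1]
probabilities = classify(image)
print(len(probabilities), round(sum(probabilities), 6))  # 10 probabilities summing to 1
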
  • 3:03 - 3:04
    [exhales]
  • 3:04 - 3:08
    This model, why couldn't I just have a
    whole bunch more inputs,
  • 3:08 - 3:12
    and then a whole bunch more outputs
    but still have one single processing unit?
  • 3:12 - 3:22
    And the reason why I can't stems from an article,
    sorry, a book, that was published in 1969,
  • 3:22 - 3:25
    by Marvin Minsky and Seymour Papert called Perceptrons
  • 3:25 - 3:28
    you know, AI luminaries here.
  • 3:28 - 3:33
    In the book Perceptrons, Marvin Minsky and
    Seymour Papert...
  • 3:35 - 3:44
    point out that a simple perceptron, the thing that I built in the previous 2 videos, can only solve linearly
  • 3:44 - 3:46
    separable problems.
  • 3:46 - 3:49
    So, what does that mean anyway?
    And why should you care about that?
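
As a concrete preview of what "linearly separable" means, here is a small Python sketch that trains a single perceptron with the classic perceptron learning rule on two tiny datasets: AND, which one straight line can separate, and XOR, which no straight line can. The learning rate and epoch count are arbitrary choices for illustration.

import random

def train_and_test(dataset, epochs=100, lr=0.1):
    # One perceptron with two weights and a bias, trained with the perceptron rule.
    w = [random.uniform(-1, 1) for _ in range(2)]
    b = random.uniform(-1, 1)
    for _ in range(epochs):
        for (x0, x1), target in dataset:
            guess = 1 if w[0] * x0 + w[1] * x1 + b >= 0 else 0
            error = target - guess
            w[0] += lr * error * x0
            w[1] += lr * error * x1
            b += lr * error
    # Count how many of the four cases the trained perceptron gets right.
    return sum(
        (1 if w[0] * x0 + w[1] * x1 + b >= 0 else 0) == target
        for (x0, x1), target in dataset
    )

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND correct:", train_and_test(AND), "of 4")  # typically 4 of 4
print("XOR correct:", train_and_test(XOR), "of 4")  # at most 3 of 4
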