-
Hi again!
-
So, maybe you just watched my
previous videos about coding a perceptron
-
and now, I want to ask the question:
-
Why not just stop here?
-
(laughs)
-
So okay, we have this very simple
scenario, where we have a canvas,
-
and it has a whole bunch of points in
that canvas, or Cartesian plane,
-
whatever we want to call it,
-
and we drew a line in between, and
we were trying to classify
-
some points that are on one side of the line,
-
and some other points that are on the
other side of the line.
-
So, that was a scenario where we had the
single perceptron, the sort of like,
-
processing unit, we can call it the Neuron™
or the processor, and it received inputs:
-
x_0 and x_1 were the x and y
coordinates of the points,
-
it also had this thing called a bias,
-
and then it generated an output.
-
Each one of these inputs was connected
to the processor with a weight,
-
you know, weight 1, weight 2, or whatever,
-
and the processor creates a weighted sum
of all the inputs multiplied by the weights,
-
that weighted sum is passed through
an activation function
-
to generate the output.
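-
(Editor's note: as a rough sketch of that weighted-sum-plus-activation step, here's a tiny Python version. The names and the sign() activation are illustrative, not Dan's exact code from the videos.)

```python
# One perceptron: weighted sum of the inputs, passed through an activation function.

def sign(n):
    # Simple activation: +1 for one side of the line, -1 for the other.
    return 1 if n >= 0 else -1

def feed_forward(inputs, weights):
    # Weighted sum: every input multiplied by its corresponding weight, then added up.
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    # The activation function turns that sum into the output.
    return sign(weighted_sum)

# Example: a point at (0.5, -1), plus a bias input fixed at 1,
# with three made-up weights (one per input).
print(feed_forward([0.5, -1, 1], [0.3, -0.8, 0.1]))  # output is +1 or -1
```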
-
So, why isn't this good enough?
-
Now
-
(exhales)
-
Let's first think about what's the limit here.
-
So, the idea is this:
-
what if I want any number of inputs
to generate any number of outputs?
-
That's the essence of what I want to do in
a lot of different machine learning applications.
-
Let's take a very classic classification problem,
which is to say:
-
Okay, what if I have a handwritten digit
like the number 8
-
and I have all of the pixels of this digit,
-
and I want those to be the inputs to this perceptron,
-
and I want the output to tell me a set of
probabilities as to which digit it is.
-
So the output should look something like:
-
there's a 0.1 chance it's a 0,
-
there's a 0.2 chance it's a 1,
-
there's a 0.1 chance it's a 2,
-
0, 3, 4, 5, 6, 7
-
Oh! And there's like a 0.99 chance it's an 8,
-
and a 0.05 chance it's a 10
(pretty sure Dan meant 9)
-
and I don't think I got those to add up to 1
(they don't, they add up to 1.44)
-
but you get the idea.
-
So the idea here is that we want to be able
to have some type of processing unit
-
that can take an arbitrary number of inputs
-
like maybe this is a 28x28 pixel image
so there's 784 greyscale values
-
and instead, those are coming into the processor
-
where they get weighted and summed and all this stuff,
-
and we get an output that has an
arbitrary number of probabilities
-
to help us guess that this is an 8.
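-
(Editor's note: just to show the shapes Dan is describing, here's a small Python sketch: 784 pixel values in, 10 scores out. The weights are random and untrained, so this only illustrates the setup, it's not a working classifier.)

```python
import random

num_inputs = 28 * 28    # 784 greyscale pixel values for a 28x28 image
num_outputs = 10        # one score per digit, 0 through 9

# Stand-in for a real image: 784 random greyscale values between 0 and 1.
pixels = [random.random() for _ in range(num_inputs)]

# One weight for every (pixel, digit) pair, plus one bias per digit.
weights = [[random.uniform(-1, 1) for _ in range(num_inputs)]
           for _ in range(num_outputs)]
biases = [random.uniform(-1, 1) for _ in range(num_outputs)]

# Each of the 10 outputs is its own weighted sum over all 784 inputs.
scores = [sum(p * w for p, w in zip(pixels, weights[d])) + biases[d]
          for d in range(num_outputs)]

print(scores)  # 10 numbers; untrained, so they don't mean anything yet
```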
-
(exhales)
-
This model, why couldn't I just have a
whole bunch more inputs,
-
and then a whole bunch more outputs
but still have one single processing unit?
-
And the reason why I can't stems from an article,
sorry, a book, that was published in 1969,
-
by Marvin Minsky and Seymour Papert called Perceptrons
-
you know, AI luminaries here.
-
In the book Perceptrons, Marvin Minsky and
Seymour Papert...
-
point out that a simple perceptron, the thing
that I built in the previous 2 videos,
-
can only solve linearly separable problems.
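-
(Editor's note: the classic example from the Perceptrons book is XOR. Here's a quick, brute-force Python check, just to illustrate the claim: a single line can separate AND's outputs, but no single line separates XOR's.)

```python
import itertools

def separable(truth_table):
    # Try a coarse grid of weights and biases; return True if any single line
    # (w0*x + w1*y + b >= 0) classifies every point in the table correctly.
    grid = [i / 4 for i in range(-8, 9)]  # -2.0 ... 2.0 in steps of 0.25
    for w0, w1, b in itertools.product(grid, repeat=3):
        if all((w0 * x + w1 * y + b >= 0) == label
               for (x, y), label in truth_table):
            return True
    return False

AND = [((0, 0), False), ((0, 1), False), ((1, 0), False), ((1, 1), True)]
XOR = [((0, 0), False), ((0, 1), True), ((1, 0), True), ((1, 1), False)]

print("AND separable?", separable(AND))  # True: one line does the job
print("XOR separable?", separable(XOR))  # False: no single line works
```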
-
So, what does that mean anyway?
And why should you care about that?