0:00:00.310,0:00:02.620
So the way that I got the result that Chris
0:00:02.620,0:00:05.770
should be wearing the same size t-shirt as Sarah rather than Cameron
0:00:05.770,0:00:08.890
is that I compared these numbers that I computed for each of these people.
0:00:08.890,0:00:13.020
And I asked: is Chris closer to Cameron's number or to Sarah's number? As it
0:00:13.020,0:00:17.980
turns out, he's about 26 away from Sarah and he's about 35 away from Cameron.
0:00:17.980,0:00:19.370
Closer to Sarah.
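
The comparison just described can be sketched in a few lines of Python. This is only an illustration: the heights (in feet) and Chris's weight are assumed values, picked so the differences come out near the 26 and 35 quoted above; the 115 and 175 pound weights are the ones used in this example.

```python
# Assumed illustrative values: heights in feet, weights in pounds.
# Only the 115 and 175 pound weights come from the example itself.
people = {
    "Cameron": {"height": 5.9, "weight": 175},
    "Sarah": {"height": 5.2, "weight": 115},
    "Chris": {"height": 6.1, "weight": 140},
}


def naive_score(person):
    """Unscaled metric: height plus weight."""
    return person["height"] + person["weight"]


scores = {name: naive_score(p) for name, p in people.items()}

# How far is Chris's number from each of the other two?
dist_to_sarah = abs(scores["Chris"] - scores["Sarah"])
dist_to_cameron = abs(scores["Chris"] - scores["Cameron"])

# With the unscaled sum, Chris lands closer to Sarah.
closest = "Sarah" if dist_to_sarah < dist_to_cameron else "Cameron"
print(round(dist_to_sarah, 1), round(dist_to_cameron, 1), closest)
```
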
0:00:19.370,0:00:23.000
Now what went wrong here is that this metric of height plus weight
0:00:23.000,0:00:27.030
has two very imbalanced features in it, height and weight.
0:00:27.030,0:00:30.050
So here's what I mean by that, the height is going to
0:00:30.050,0:00:34.280
be a number that generally falls between, let's say, five and
0:00:34.280,0:00:38.620
seven feet. The weight, on the other hand, takes on much larger values,
0:00:38.620,0:00:42.250
between 115 and 175 pounds in this example.
0:00:42.250,0:00:45.700
So what ends up happening when you compute the sum of the two of them, is that
0:00:45.700,0:00:49.530
the weight will almost always completely dominate the answer that you get.
0:00:49.530,0:00:52.470
And height ends up being effectively a rounding error.
0:00:52.470,0:00:56.750
Whereas what you probably want is something where the two features are equally
0:00:56.750,0:00:59.430
weighted in the sum when you add them together.
0:00:59.430,0:01:01.490
And this is what feature scaling does.
0:01:01.490,0:01:04.180
It's a method for re-scaling features like these ones,
0:01:04.180,0:01:09.980
so that they always span comparable ranges, usually between zero and one.
0:01:09.980,0:01:13.600
So then, the numbers that you get from height will be between zero and one,
0:01:13.600,0:01:16.040
they'll still contain the same information,
0:01:16.040,0:01:18.300
just expressed in different units.
0:01:18.300,0:01:21.630
And the weight will also be expressed between zero and one.
0:01:21.630,0:01:24.970
Again, you'll still have the information there, that Cameron weighs the most and
0:01:24.970,0:01:29.010
Sarah weighs the least, but it'll be expressed over this much smaller range.
0:01:29.010,0:01:30.480
Then when you add them together,
0:01:30.480,0:01:33.420
weight won't completely dominate the equation anymore.
0:01:33.420,0:01:36.390
And when that happens, you should get a much more sensible result for
0:01:36.390,0:01:37.880
Chris's t-shirt size.
0:01:37.880,0:01:41.120
Because even though he's a little bit closer to Sarah in weight,
0:01:41.120,0:01:43.520
he's a lot closer to Cameron in height and so
0:01:43.520,0:01:45.760
he'll probably end up getting grouped with Cameron.
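
As a rough sketch of the rescaling idea, the snippet below maps each feature onto the range zero to one and then repeats the comparison. It assumes a simple min-max rescaling, (value - min) / (max - min), computed over all three people; that formula is an assumption made here for illustration, not one stated above. The heights and Chris's weight are the same assumed illustrative values, not figures from the example.

```python
# Assumed illustrative values: heights in feet, weights in pounds.
heights = {"Cameron": 5.9, "Sarah": 5.2, "Chris": 6.1}
weights = {"Cameron": 175, "Sarah": 115, "Chris": 140}


def min_max_scale(values):
    """Rescale a dict of numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values.values()), max(values.values())
    return {name: (v - lo) / (hi - lo) for name, v in values.items()}


scaled_h = min_max_scale(heights)
scaled_w = min_max_scale(weights)

# Same metric as before, but now both features span comparable ranges.
scores = {name: scaled_h[name] + scaled_w[name] for name in heights}

dist_to_sarah = abs(scores["Chris"] - scores["Sarah"])
dist_to_cameron = abs(scores["Chris"] - scores["Cameron"])

# After rescaling, Chris lands closer to Cameron.
closest = "Sarah" if dist_to_sarah < dist_to_cameron else "Cameron"
print(closest)
```

With these assumed numbers, height now contributes as much to the sum as weight does, so Chris's height pulls him toward Cameron.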
0:01:45.760,0:01:48.340
In the next video I'll show you the equation for feature scaling.