So the way I got the result that Chris should be wearing the same size T-shirt as Sarah rather than Cameron
is that I compared the numbers I computed for each of these people.
I asked: is Chris closer to Cameron's number or to Sarah's number? As it
turns out, he's about 26 away from Sarah and about 35 away from Cameron,
so he's closer to Sarah.
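Here's a minimal sketch of that unscaled comparison. The heights (feet) and weights (pounds) below are hypothetical: they're picked to fall in the ranges described in this lecture and so that the distances land near the 26 and 35 mentioned above, but they are assumptions, not the lecture's actual data. The names `people` and `raw_score` are illustrative.

```python
# Hypothetical measurements: (height in feet, weight in pounds).
# Illustrative values only, not the lecture's data.
people = {
    "Cameron": (6.1, 175.0),
    "Sarah": (5.2, 115.0),
    "Chris": (5.9, 140.0),
}

def raw_score(height, weight):
    """Unscaled metric: simply height plus weight."""
    return height + weight

scores = {name: raw_score(h, w) for name, (h, w) in people.items()}

# Compare Chris's score with each of the other two people.
dist_to_sarah = abs(scores["Chris"] - scores["Sarah"])
dist_to_cameron = abs(scores["Chris"] - scores["Cameron"])
closest = "Sarah" if dist_to_sarah < dist_to_cameron else "Cameron"

# prints: to Sarah: 25.7, to Cameron: 35.2, closest: Sarah
print(f"to Sarah: {dist_to_sarah:.1f}, to Cameron: {dist_to_cameron:.1f}, closest: {closest}")
```

Notice that the heights barely move the scores; almost all of each distance comes from weight.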
Now, what went wrong here is that this metric, height plus weight,
combines two very imbalanced features: height and weight.
Here's what I mean by that. Height is going to
be a number that generally falls somewhere between, let's say, five and
seven feet. Weight, on the other hand, takes on much larger values:
between 115 and 175 pounds in this example.
So what ends up happening when you compute the sum of the two is that
weight almost always completely dominates the answer you get,
and height ends up being effectively a rounding error.
What you probably want instead is something where the two features are
weighted equally in the sum when you add them together.
This is what feature scaling does.
It's a method for rescaling features like these
so that they always span comparable ranges, usually between zero and one.
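The exact equation comes in the next video; as a sketch in the meantime, here's one common way to map a feature onto the range zero to one, min-max scaling, assumed here: x' = (x - x_min) / (x_max - x_min). The function name `min_max_scale` and the sample values are hypothetical, and mapping a constant feature to all zeros is an arbitrary choice made to avoid dividing by zero.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is the same, so there is no spread to preserve;
        # this sketch maps them all to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

heights = [6.1, 5.2, 5.9]        # hypothetical heights, in feet
weights = [175.0, 115.0, 140.0]  # hypothetical weights, in pounds

print(min_max_scale(heights))
print(min_max_scale(weights))
```

The ordering within each feature is unchanged: the tallest person still gets the largest scaled height, and the heaviest still gets the largest scaled weight. Only the range changes.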
Then the numbers you get from height will be between zero and one.
They'll still contain the same information,
just expressed in different units.
And weight will also be expressed between zero and one.
Again, the information will still be there, that Cameron weighed the most and
Sarah weighed the least, but it'll be expressed over this much smaller range.
Then, when you add them together,
weight won't completely dominate the equation anymore,
and you should get a much more sensible result for
Chris's T-shirt size.
Even though he's a little bit closer to Sarah in weight,
he's a lot closer to Cameron in height, so
he'll probably end up getting grouped with Cameron.
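To check that claim, here's a sketch that repeats the comparison after rescaling each feature to the range zero to one. It assumes min-max scaling (one common choice; the lecture's own equation comes in the next video) and uses hypothetical values: heights of 6.1, 5.2, and 5.9 feet and weights of 175, 115, and 140 pounds for Cameron, Sarah, and Chris. These are illustrative, not the lecture's data, and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread to preserve
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical data: illustrative values only.
names = ["Cameron", "Sarah", "Chris"]
heights = [6.1, 5.2, 5.9]        # feet
weights = [175.0, 115.0, 140.0]  # pounds

# Scale each feature separately, then add the scaled features.
scaled_h = min_max_scale(heights)
scaled_w = min_max_scale(weights)
scores = {n: h + w for n, h, w in zip(names, scaled_h, scaled_w)}

dist_to_sarah = abs(scores["Chris"] - scores["Sarah"])
dist_to_cameron = abs(scores["Chris"] - scores["Cameron"])
closest = "Sarah" if dist_to_sarah < dist_to_cameron else "Cameron"

# prints: to Sarah: 1.19, to Cameron: 0.81, closest: Cameron
print(f"to Sarah: {dist_to_sarah:.2f}, to Cameron: {dist_to_cameron:.2f}, closest: {closest}")
```

With these numbers, Chris's scaled weight (about 0.42) is a little closer to Sarah's 0, but his scaled height (about 0.78) is much closer to Cameron's 1, so the combined score groups him with Cameron. Because the minimum and maximum come from the data itself, adding more people to the data set would change the scaled values.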
In the next video I'll show you the equation for feature scaling.