WEBVTT 00:00:00.310 --> 00:00:02.620 So the way that I got the result that Chris, 00:00:02.620 --> 00:00:05.770 should be wearing the same size tee shirt as Sarah rather than Cameron, 00:00:05.770 --> 00:00:08.890 is that I compared these numbers that I computed for each of these people. 00:00:08.890 --> 00:00:13.020 And I said is Chris closer to Cameron's number or to Sarah's number, and as it 00:00:13.020 --> 00:00:17.980 turns out, he's about 26 away from Sarah and he's about 35 away from Cameron. 00:00:17.980 --> 00:00:19.370 Closer to Sarah. 00:00:19.370 --> 00:00:23.000 Now what went wrong here is that this metric of height plus weight 00:00:23.000 --> 00:00:27.030 has two very imbalanced features in it, height and weight. 00:00:27.030 --> 00:00:30.050 So here's what I mean by that, the height is going to 00:00:30.050 --> 00:00:34.280 be a number that generally goes between let's say, the numbers of five and 00:00:34.280 --> 00:00:38.620 seven, the weight, on the other hand, takes on much larger values. 00:00:38.620 --> 00:00:42.250 Between 115 and 175 pounds in this example. 00:00:42.250 --> 00:00:45.700 So what ends up happening when you compute the sum of the two of them, is that 00:00:45.700 --> 00:00:49.530 the weight almost always will completely dominate the answer that you get. 00:00:49.530 --> 00:00:52.470 And height ends up being effectively a rounding error. 00:00:52.470 --> 00:00:56.750 Whereas what you probably want is something where the two features are equally 00:00:56.750 --> 00:00:59.430 weighted in, in the sum when you add them together. 00:00:59.430 --> 00:01:01.490 And this is what feature scaling does. 00:01:01.490 --> 00:01:04.180 It's a method for re-scaling features like these ones, 00:01:04.180 --> 00:01:09.980 so that they always span comparable ranges, usually between zero and one. 00:01:09.980 --> 00:01:13.600 So then, the numbers that you get from height will be between zero and one, 00:01:13.600 --> 00:01:16.040 they'll still contain the same information. 00:01:16.040 --> 00:01:18.300 But just expressed in different units. 00:01:18.300 --> 00:01:21.630 And the weight will also be expressed between zero and one. 00:01:21.630 --> 00:01:24.970 Again, you'll still have the information there, that Cameron raised the most and 00:01:24.970 --> 00:01:29.010 Sarah raised the least, but it'll be expressed over this much smaller range. 00:01:29.010 --> 00:01:30.480 Then when you add them together, 00:01:30.480 --> 00:01:33.420 weight won't completely dominate the equation anymore. 00:01:33.420 --> 00:01:36.390 And when that happens, you should get a much more sensible result for 00:01:36.390 --> 00:01:37.880 Chris's t-shirt size. 00:01:37.880 --> 00:01:41.120 Because even though he's a little bit closer to Sarah in weight, 00:01:41.120 --> 00:01:43.520 he's a lot closer to Cameron in height and so 00:01:43.520 --> 00:01:45.760 he'll probably end up getting grouped with Cameron. 00:01:45.760 --> 00:01:48.340 In the next video I'll show you the equation for feature scaling.