
Title:
Algorithms Requiring Rescaling Solution - Intro to Machine Learning

Description:

So Katie, what do you think?

Which ones are the right answers here?

>> The right answers, so the ones that do need rescaled features, will

be the SVM and k-means clustering.

>> So in both support vector machines and k-means clustering, you're

really trading off one dimension against the other when you calculate the distance.

So take, for example, support vector machines.

And you look at the separation line that maximizes distance.

In there, you calculate a distance.

And that distance calculation trades off one dimension against the other.

So if we make one twice as big as the other, it counts for twice as much.

The same is true, coincidentally, for

k-means clustering, where you have a cluster center.

And you compute the distance from the cluster center to all the data points.

And that distance has exactly the same characteristic.

If you make one variable twice as big, it's going to count for twice as much.

So, as a result, support vector machines and

k-means are both affected by feature rescaling.
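That distance tradeoff can be sketched with a toy example; this is not from the lesson, just a minimal illustration assuming Euclidean distance and two made-up features on very different scales:

```python
import math

# Euclidean distance, the metric used by both SVMs (margin width)
# and k-means (point-to-center distance).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points: feature 1 lives in [0, 1], feature 2 in [0, 1000].
p = (0.0, 0.0)
q = (1.0, 0.0)    # far from p in feature 1, identical in feature 2
r = (0.0, 10.0)   # identical in feature 1, slightly off in feature 2

# Unscaled: the large-range feature dominates the distance,
# so q looks much closer to p than r does.
print(euclidean(p, q))  # 1.0
print(euclidean(p, r))  # 10.0

# Rescale feature 2 by its (assumed) range of 1000: the picture flips,
# and r is now far closer to p than q is.
def rescale(pt):
    return (pt[0], pt[1] / 1000.0)

print(euclidean(rescale(p), rescale(q)))  # 1.0
print(euclidean(rescale(p), rescale(r)))  # 0.01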

So, Katie, tell me about the decision trees and linear regression.

Why aren't they included?

>> Decision trees aren't going to give you a diagonal line like that, right?

They're going to give you a series of vertical and horizontal lines.

So there's no trade-off.

You just make a cut in one direction, and then a cut in another.

So, you don't have to worry about what's going on in one dimension,

when you're doing something with the other one.

>> So if you squeeze this little area over here to half the size,

because you rescale the feature where the line lies,

well, the line will lie in a different place, but

the separation is topologically the same as before.

It scales with it,

so there's no tradeoff between these two different variables.
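That "the cut scales with the feature" idea can be shown with a one-node stump; this is a hypothetical toy fit, not the lesson's code, assuming the split threshold is the midpoint between the two classes:

```python
# A one-node "decision tree" splits on a single threshold in one feature.
# Rescaling that feature rescales the threshold with it, so the same
# points land on the same side of the cut: no cross-feature tradeoff.
def fit_stump(xs, labels):
    # toy fit: put the cut at the midpoint between the two classes
    left = [x for x, y in zip(xs, labels) if y == 0]
    right = [x for x, y in zip(xs, labels) if y == 1]
    return (max(left) + min(right)) / 2.0

xs = [1.0, 2.0, 8.0, 9.0]
labels = [0, 0, 1, 1]

cut = fit_stump(xs, labels)                            # 5.0
scaled_cut = fit_stump([x * 100 for x in xs], labels)  # 500.0

# Predictions are identical either way: the cut moved with the feature.
pred = [int(x > cut) for x in xs]
pred_scaled = [int(x * 100 > scaled_cut) for x in xs]
print(pred == pred_scaled)  # True
```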

And how about, linear regression?

>> Something similar happens in linear regression.

Remember that in linear regression,

each of our features is going to have a coefficient that's associated with it.

And that coefficient and that feature always go together.

What's going on with feature A doesn't affect anything with

the coefficient of feature B.

So they're separated in the same way.

>> In fact, if you were to double the scale of one specific variable,

the coefficient of that feature will just become half as big.

And the output would be exactly the same as before.
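The coefficient absorbing the rescaling can be verified with a tiny least-squares fit; a minimal sketch with made-up data, assuming a one-feature model with no intercept:

```python
# One-feature least squares (no intercept, for simplicity):
# the fitted coefficient absorbs any rescaling of the feature,
# so the model's outputs are unchanged.
def fit_slope(xs, ys):
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = fit_slope(xs, ys)                            # 2.0
w_doubled = fit_slope([2 * x for x in xs], ys)   # 1.0 -- half as big

# Predictions are exactly the same either way.
print([w * x for x in xs])               # [2.0, 4.0, 6.0]
print([w_doubled * 2 * x for x in xs])   # [2.0, 4.0, 6.0]
```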

So it's really interesting to see: for some algorithms,

rescaling really matters, so use it; for others, don't even bother.