Now that I've explained r-squared to you, a question you might be
asking is: this is all well and good, Katie, but how do I get this information?
You haven't given me an equation for it or anything like that.
And what I want to do, instead of giving you a big mathematical equation,
which I don't find that interesting and which you could look up on your own,
is show you how to get this information out of scikit-learn.
This is the code we were looking at a few videos ago when we were building our
net worth predictor.
Now, I filled in these lines that are importing the linear regression and
making some predictions.
Another thing that happened was I printed some information to the screen,
you may remember.
Two of these things I explained to you already.
The slope and the intercept.
I access that information by looking for the coefficients and
the intercept of the regression.
These are just lines of code that I found in an example online.
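For reference, here is a minimal sketch of what reading the slope and the intercept off a fitted regression can look like. The data here is synthetic and made up for illustration (it is not the data from the video), and the names ages, net_worths, and reg are just assumed to match the ones used on screen.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the dataset from the video).
rng = np.random.default_rng(42)
ages = rng.uniform(20, 65, size=100).reshape(-1, 1)  # features must be 2-D
net_worths = 6.25 * ages.ravel() + rng.normal(0, 40, size=100)

# Fit the regression on the ages (input) and net worths (output).
reg = LinearRegression()
reg.fit(ages, net_worths)

# coef_ holds one slope per feature; intercept_ is the fitted offset.
slope = reg.coef_[0]
intercept = reg.intercept_
print("slope:", slope)
print("intercept:", intercept)
```

Because there is only one feature here, coef_ has a single entry, which is the slope of the fitted line.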
But one thing I did promise you we would come back to,
and now we are, is this r-squared score that I was printing out.
And the way I access that is through reg.score.
This is kind of similar to how we computed the accuracy in
our supervised classifier.
So what we do is we pass the ages, which are the features in this case,
the input, and
the net_worths, which are the outputs, the things we're trying to predict.
And then since the regression has already been fit, up here,
it knows what it thinks the relationship between these two quantities is.
So this is all the information that it needs to compute an r-squared score.
And then, I can just print it out.
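As a rough sketch of that step: the snippet below fits a regression on synthetic data (an assumption for illustration, not the data from the video) and then asks reg.score for the r-squared. It also recomputes the same number by hand, as one minus the ratio of the residual sum of squares to the total sum of squares, just to show that the two agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the dataset from the video).
rng = np.random.default_rng(0)
ages = rng.uniform(20, 65, size=100).reshape(-1, 1)
net_worths = 6.25 * ages.ravel() + rng.normal(0, 40, size=100)

# The regression has to be fit before it can be scored.
reg = LinearRegression().fit(ages, net_worths)

# Pass the features and the true outputs; score returns the r-squared.
r_squared = reg.score(ages, net_worths)

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares).
predictions = reg.predict(ages)
ss_res = np.sum((net_worths - predictions) ** 2)
ss_tot = np.sum((net_worths - net_worths.mean()) ** 2)
r_squared_by_hand = 1.0 - ss_res / ss_tot

print("r-squared (reg.score):", r_squared)
print("r-squared (by hand):  ", r_squared_by_hand)
```

The exact value depends on the made-up noise level, so it will not match the number shown in the video.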
So let me take you over here and show you again what that looks like.
I have the same output as I had before,
so this might look a little bit familiar: I'm predicting my own net worth.
I have my slope, my intercept.
But now you understand the importance of the r-squared score.
So my r-squared score is about 0.86, which is actually really good.
I'm doing about 86% of the best I could possibly be doing.
I would say 0.86 is close to one.
It can be a little bit of an art to translate between an r-squared numerically,
and saying whether it's a good fit or not.
And this is something you'll get some intuition for
over time, as you play with things.
I would certainly say that .857 is a good r-squared.
We're doing a good job of capturing the relationship between the age and
the net worth of people here.
I've also seen higher r-squareds in my life.
So it's possible that there could still be variables out there,
additional features, that would make us better able to predict a person's
net worth if we were able to incorporate the information from them.
So in other words, if we were able to use more than one feature,
sometimes we can push up this r-squared even further.
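To make that concrete, here is a small sketch on synthetic data. Everything in it is an assumption for illustration: the second feature, salaries, is hypothetical and does not appear in the video, and the net worths are generated so that they really do depend on both features. The regression is scored once with age alone and once with both features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption): net worth depends on age AND a
# hypothetical second feature, salary, plus some noise.
rng = np.random.default_rng(1)
n = 200
ages = rng.uniform(20, 65, size=n)
salaries = rng.uniform(30, 150, size=n)  # hypothetical second feature
net_worths = 6.25 * ages + 2.0 * salaries + rng.normal(0, 40, size=n)

# One-feature and two-feature versions of the input.
X_one = ages.reshape(-1, 1)
X_two = np.column_stack([ages, salaries])

# Fit and score each regression on the same data.
r2_one = LinearRegression().fit(X_one, net_worths).score(X_one, net_worths)
r2_two = LinearRegression().fit(X_two, net_worths).score(X_two, net_worths)

print("r-squared with age only:      ", r2_one)
print("r-squared with age and salary:", r2_two)
```

One caveat: when you score on the same data you fit on, adding a feature can never lower the r-squared, even if the feature is pure noise. Scoring on held-out data is a fairer check of whether the extra feature really helps.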
On the other hand, there are sometimes really complicated problems where it's
almost impossible to get an r-squared that would be anywhere near this high.
So sometimes, in political science for example, they're trying to
run a regression that will predict whether a country will go to war.