-
♪ [music] ♪
-
- [Thomas] What is an outlier?
-
You probably already have
a good intuition of it.
-
It's that 6'10" kid in high school
who towers over his classmates.
-
It's that little guy
outeating people twice his size.
-
Outliers are those
that don't fit the pattern
-
we're used to seeing.
-
Let's go back to our data set.
-
If you remember,
we're examining the relationship
-
between professors' beauty scores
and their student evaluations.
-
We generally see
that better-looking teachers
-
get better evaluations.
-
Is there an outlier here?
-
This point.
-
It shows a professor
with an extremely high beauty score
-
but a very low evaluation score.
-
So, I guess in this case,
looks aren't everything.
-
A single outlier
can have a major effect.
-
We can see this in our data
by removing or adding points
-
and seeing the effect
on the regression line.
-
Let me show you.
-
If you right click
or control click,
-
you can remove a point
from our regression analysis.
-
Let's remove this outlier,
and see what happens.
-
The dotted line
is our old regression line,
-
and the solid one is our new one.
-
You can see it's gotten steeper.
-
That makes sense,
as we're removing an outlier
-
that is pulling the line down.
-
You can also see down below
-
that our slope value has changed
from 0.2 to 0.3.
-
So removing a single outlier
resulted in us finding
-
a much stronger relationship
-
between beauty
and evaluation scores.
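To try the same experiment outside the interactive chart, here is a minimal sketch. It assumes made-up (beauty, evaluation) pairs, not the data from the video, and a hypothetical helper named `ols_slope` that computes the ordinary least-squares slope.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Illustrative values only (assumption): these are NOT the video's data.
points = [(1, 2.0), (2, 2.3), (3, 2.5), (4, 2.9), (5, 3.1), (6, 3.4)]
outlier = (7, 1.5)  # high beauty score, low evaluation score

slope_with = ols_slope(points + [outlier])
slope_without = ols_slope(points)

print(f"slope with outlier:    {slope_with:.3f}")
print(f"slope without outlier: {slope_without:.3f}")
```

With these invented numbers, the slope is larger once the outlier is dropped, which matches the steeper solid line in the chart.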
-
If you remember, in our last video
we did an exercise
-
where we predicted the expected
change in evaluation score
-
when going from a 2 to a 7
in beauty score.
-
Our old slope was 0.2,
-
which meant a 1-point
predicted increase in evaluation.
-
With this new line,
-
we'd predict an increase
of 1.5 points in evaluation.
-
That's a 50% stronger effect,
-
and that's the power of an outlier.
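The arithmetic behind those numbers can be sketched as follows. The slopes 0.2 and 0.3 and the beauty scores 2 and 7 come from the video; the helper name `predicted_change` is hypothetical. The intercept is left out because it cancels when two predictions from the same line are subtracted.

```python
def predicted_change(slope, x_from, x_to):
    """Predicted change in y when x moves from x_from to x_to on a line."""
    return slope * (x_to - x_from)

old_change = predicted_change(0.2, 2, 7)  # old slope from the video
new_change = predicted_change(0.3, 2, 7)  # slope after removing the outlier

# Relative increase of the new predicted change over the old one.
relative_increase = (new_change - old_change) / old_change

print(f"old: {old_change:.2f}, new: {new_change:.2f}, "
      f"relative increase: {relative_increase:.0%}")
```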
-
That one really good-looking
but really bad teacher
-
is changing our model a lot.
-
The point is, regressions
can be sensitive to outliers.
-
If you want
to get your outlier back,
-
you can just click the undo button.
-
Additionally, we can add
data points to our regression
-
by just clicking
anywhere in the canvas.
-
You can see, again,
the line moves each time,
-
showing my new regression line
with a solid line
-
and the previous line
with a dotted line.
-
Now it's your turn.
-
I'd like you to try two things.
-
First, remove a point
so that the relationship weakens.
-
The line should get flatter,
so the slope should go down.
-
And, second, add some data points
to make the relationship stronger.
-
So pause the video now,
and try it out.
-
- [Narrator] Find the answer
-
by clicking to see
the data in this video.
-
The link is also
in the video description below.
-
If you just want
to see the answer already,
-
hang tight.
-
Hopefully you've had some fun
adding and removing data points.
-
It's good to play with data
to build your intuition.
-
Let's cover the first question --
-
how to remove a point
to make the relationship weaker.
-
For example, remove this point.
-
You would have seen a flatter line
-
and our slope go down
from 0.2 to 0.14.
-
So how about making
the relationship stronger?
-
Let's go back
to our original data set,
-
with a slope of 0.2.
-
If I add a point,
way up in the upper-right,
-
I can see my slope go up from 0.2.
-
I can also add one
to the lower-left,
-
and see another increase.
-
The more points I add
to these two regions,
-
the stronger the relationship gets.
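Here is a minimal sketch of adding points in those two regions. The data are made up for illustration, not taken from the video, and `ols_slope` is a hypothetical helper that computes the ordinary least-squares slope.

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

# Illustrative values only (assumption): these are NOT the video's data.
base = [(2, 3.6), (3, 3.5), (4, 4.1), (5, 3.9), (6, 4.4), (7, 4.3)]
upper_right = (10, 6.5)  # far right, well above the current line
lower_left = (0, 1.5)    # far left, well below the current line

slope_base = ols_slope(base)
slope_one_added = ols_slope(base + [upper_right])
slope_two_added = ols_slope(base + [upper_right, lower_left])

print(f"base slope:           {slope_base:.3f}")
print(f"plus upper-right:     {slope_one_added:.3f}")
print(f"plus lower-left too:  {slope_two_added:.3f}")
```

In this sketch each added point raises the slope, because the new point sits above the line on the right side or below it on the left side.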
-
So now that we've seen
the power of outliers,
-
your next question might be --
can I just remove the outlier?
-
It's important to remember
we're typically using a regression
-
to make a prediction
about the real world.
-
In the real world,
-
we see guys who are 130 pounds
eat 60 hot dogs.
-
So if you want to predict
hot dog eating based on weight,
-
removing an outlier without thought
can make your predictions worse.
-
However, in certain cases,
-
it might make sense
to remove an outlier.
-
What if you are trying to predict
the number of hot dogs
-
an average person would eat
at your upcoming barbecue?
-
Then, including Kobayashi would make
your prediction much worse.
-
In this case,
he's clearly an outlier,
-
and should be omitted
from the model.
-
You might have also noticed
something else changing
-
when we removed the outlier --
these funny numbers.
-
What do those mean?
-
That's what we turn to next.
-
- [Narrator] Congratulations!
-
You're one step closer
to being a data ninja!
-
If you want to master
linear regression,
-
click to train up
-
on everything
from p-values to residuals,
-
and then prepare to unleash fury
-
on whatever poor data set
crosses your path.
-
Or, you can explore other skills
-
in our Understanding Data playlist.
-
♪ [music] ♪