-
- [Instructor] We have a
list of 15 numbers here,
-
and what I want to do is
think about the outliers.
-
And to help us with that,
let's actually visualize this,
-
the distribution of actual numbers.
-
So let us do that.
-
So here, on a number line,
-
I have all the numbers from one to 19.
-
And let's see, we have two ones.
-
So I could say that's one
one and then two ones.
-
We have one six.
-
So let's put that six there.
-
We have got a 13,
-
or we have two 13s.
-
So we're gonna go up
here, one 13 and two 13s.
-
Let's see, we have three 14s.
-
So 14,
-
14,
-
and 14.
-
We have a couple of 15s, 15, 15.
-
So 15,
-
15.
-
We have one 16.
-
So that's our 16 there.
-
We have three 18s.
-
One, two, three.
-
So one,
-
two,
-
and then three.
-
And then we have one 19.
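As a rough sketch of the dot plot just described, here is the same tally in Python. The list itself is reconstructed from the counts read out above (two ones, one six, two 13s, and so on), so treat it as an assumption about the on-screen data rather than a copy of it.

```python
from collections import Counter

# Data reconstructed from the narration's tally (assumed, not copied from the screen).
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

# Count how many times each value occurs.
counts = Counter(data)

# Print a text version of the dot plot: one "o" per occurrence, for 1 through 19.
for value in range(1, 20):
    print(f"{value:>2} | " + "o" * counts[value])
```

Values that never occur simply get an empty row, which makes the gap between six and 13 easy to see.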
-
So when you look visually at
the distribution of numbers,
-
it looks like the meat of the
distribution, so to speak,
-
is in this area, right over here.
-
And so some people might say,
-
"Okay, we have three outliers.
-
"There are these two ones and the six."
-
Some people might say,
-
"Well, the six is kinda close enough.
-
"Maybe only these two ones are outliers."
-
And those would actually be
both reasonable things to say.
-
Now to get on the same page,
-
statisticians will use a rule sometimes.
-
We say, well, anything that is more than
-
one and a half times
the interquartile range
-
below Q-one or above Q-three,
-
well, those are going to be outliers.
-
Well, what am I talking about?
-
Well, let's actually, let's
figure out the median,
-
Q-one and Q-three here.
-
Then we can figure out
the interquartile range.
-
And then we can figure
out by that definition,
-
what is going to be an outlier?
-
And if that all made sense to you so far,
-
I encourage you to pause this video
-
and try to work through it on your own,
-
or I'll do it for you right now.
-
All right, so what's the median here?
-
Well, the median is the middle number.
-
We have 15 numbers, so the
middle number is going to be
-
whatever number has seven on either side.
-
So it's gonna be the eighth number.
-
One, two, three, four, five, six, seven.
-
Is that right?
-
Yep, six, seven, so that's the median, 14.
-
And then you have one, two,
three, four, five, six, seven
-
numbers on the right side too.
-
So that is the median,
sometimes called Q-two.
-
That is our median.
-
Now what is Q-one?
-
Well, Q-one is going to be the
middle of this first group.
-
This first group has seven numbers in it.
-
And so the middle is going
to be the fourth number.
-
It has three and three,
-
three to the left, three to the right.
-
So that is Q-one, which is 13.
-
And then Q-three is going
-
to be the middle of this upper group.
-
Well, that also has seven numbers in it.
-
So the middle is going
to be right over there.
-
It has three on either side.
-
So that is Q-three, which is 18.
-
Now what is the interquartile
range going to be?
-
Interquartile range
-
is going to be equal to
-
Q-three
-
minus Q-one,
-
the difference between 18 and 13.
-
Between 18 and 13,
-
well, that is going to be 18 minus 13,
-
which is equal to five.
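The steps so far can be sketched in Python, assuming the same approach as in the narration: take the median, then take the middle of each half with the median itself left out. The data list is reconstructed from the tally earlier in the video, and the variable names are only illustrative.

```python
from statistics import median

# Data reconstructed from the narration's tally (assumed).
data = sorted([1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19])
n = len(data)

# Split around the median. With an odd count, the median is left out of both halves.
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2 :]

q1 = median(lower_half)   # middle of the lower seven numbers
q2 = median(data)         # the median of the whole list
q3 = median(upper_half)   # middle of the upper seven numbers
iqr = q3 - q1             # interquartile range

print(q1, q2, q3, iqr)  # 13 14 18 5
```

Other quartile conventions exist and can give slightly different values on other data sets, so this is one reasonable method rather than the only one.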
-
Now to figure out outliers,
-
well, outliers are going to be anything less than
-
our Q-one
-
minus 1.5
-
times our interquartile range.
-
And this, once again, this
isn't some rule of the universe.
-
This is something that statisticians
-
have kind of said, well,
-
if we want to have a better
definition for outliers,
-
let's just agree that
it's something that's
-
more than one and a half times
-
the interquartile range below Q-one.
-
Or an outlier could be
greater than Q-three
-
plus one and a half times
the interquartile range.
-
And once again, this is somewhat,
-
you know, people just
decided it felt right.
-
One could argue it should be 1.6.
-
Or one could argue it should
be one, or two, or whatever.
-
But this is what people
have tended to agree on.
-
So let's think about
what these numbers are.
-
Q-one we already know.
-
So this is going to be 13
-
minus 1.5 times our interquartile range.
-
Our interquartile range here is five.
-
So it's 1.5 times five, which is 7.5.
-
So this is 7.5.
-
13 minus 7.5 is what?
-
13 minus seven is six,
-
and then you subtract another .5, is 5.5.
-
So outliers
-
would be less than 5.5.
-
Or
-
the Q-three is 18,
-
this is, once again, 7.5.
-
18 plus 7.5
-
is 25.5,
-
so outliers would also be anything greater than 25.5.
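Here is the same arithmetic as a small Python sketch. The quartile values come from the narration; the names `lower_fence`, `upper_fence`, and `k` are just labels chosen for this example.

```python
# Quartiles as worked out in the narration.
q1, q3 = 13, 18
iqr = q3 - q1  # 18 - 13 = 5

# The conventional multiplier; as the narration says, this is an agreed-upon choice.
k = 1.5

lower_fence = q1 - k * iqr  # 13 - 7.5
upper_fence = q3 + k * iqr  # 18 + 7.5

print(lower_fence, upper_fence)  # 5.5 25.5
```

Changing `k` to something like 1.6 or 2 would move both cutoffs outward, which is the trade-off the narration alludes to.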
-
So based on this, we have a,
-
kind of a numerical definition
for what's an outlier.
-
We're not just subjectively saying,
-
well, this feels right
or that feels right.
-
And based on this, we
only have two outliers:
-
only these two
ones are less than 5.5.
-
This is the cutoff, right over here.
-
So this dot, the six, just happened to make it.
-
And we don't have any
outliers on the high side.
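To check that conclusion, one could filter the list against the two cutoffs. This sketch assumes the data list reconstructed from the earlier tally and the cutoffs of 5.5 and 25.5 from the narration.

```python
# Data reconstructed from the narration's tally (assumed).
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

# Cutoffs from the 1.5 times IQR rule, as computed in the narration.
lower_fence, upper_fence = 5.5, 25.5

# Anything strictly below the lower cutoff or strictly above the upper cutoff is flagged.
low_outliers = [x for x in data if x < lower_fence]
high_outliers = [x for x in data if x > upper_fence]

print(low_outliers)   # [1, 1]
print(high_outliers)  # []
```

The six stays in because 6 is not less than 5.5, which matches the dot that "just happened to make it."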
-
Now another thing to think about
-
is drawing box-and-whiskers plots
-
based on Q-one, our median, Q-three,
-
and the whole range of numbers.
-
And you could do it either
-
taking into consideration your outliers
-
or not taking into
consideration your outliers.
-
So there's a couple of
ways that we can do it.
-
So let me actually clear,
let me clear all of this.
-
We've figured out all of this stuff.
-
So let me clear all of that out.
-
And let's actually draw
a box-and-whiskers plot.
-
So I'll put another,
-
another, actually let me do two here.
-
That's one,
-
and then let me put
another one down there.
-
And then this is another.
-
Now if we were to just draw
-
a classic box-and-whiskers plot here,
-
we would say, all right,
our median's at 14.
-
And actually, I'll do it both ways.
-
Our median's at 14.
-
Median's at 14.
-
Q-one's at 13.
-
Q-one's at 13,
-
and Q-one's at 13.
-
Q-three is at 18.
-
Q-three is at 18,
-
Q-three is 18.
-
So that's the box part.
-
Now let me draw that as an actual,
-
let me actually draw that as a box.
-
So my best attempt,
-
there you go.
-
That's the box.
-
And this is also a box.
-
So far, I'm doing the exact same thing.
-
Now if we don't want to treat outliers separately,
-
we would say, well, what's
the entire range here?
-
Well, we have things that go
from one all the way to 19.
-
So one way to do it is to, hey,
-
we start at one.
-
And so our entire range, we go,
-
actually let me draw it a
little bit better than that.
-
We're going all the way,
-
all the way from one
-
to 19.
-
Now in this one, we're
including everything.
-
We're including even these two outliers.
-
But if we don't want to
include those outliers,
-
we want to make it clear
that they're outliers,
-
well, let's not include them.
-
And what we can do instead is say,
-
all right, including only
(chuckles) our non-outliers,
-
we would start at six
-
'cause six we're saying
is in our data set,
-
but it is not an outlier.
-
Let me make this look better.
-
So we are going to
-
start at six and go all the way to 19.
-
And then to show that
we have these outliers,
-
we would mark the
outliers over there, at one.
-
So once again, this is
a box-and-whiskers plot
-
of the data set without singling out the outliers.
-
And this is one where
-
we make it clear where
the outliers actually are.
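The two versions of the plot differ only in where the whiskers end. A minimal sketch of that difference, assuming the reconstructed data list and the cutoffs from the narration (the variable names are illustrative):

```python
# Data reconstructed from the narration's tally (assumed).
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

# The box is the same in both plots: Q1, median, Q3 from the narration.
q1, med, q3 = 13, 14, 18
lower_fence, upper_fence = 5.5, 25.5

# Plot 1: whiskers span the entire range, outliers included.
whiskers_all = (min(data), max(data))

# Plot 2: whiskers span only the non-outliers; outliers are marked separately.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whiskers_trimmed = (min(inside), max(inside))
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(whiskers_all)      # (1, 19)
print(whiskers_trimmed)  # (6, 19)
print(outliers)          # [1, 1]
```

Plotting libraries often draw the second style by default, but the exact options vary, so this sketch only computes the whisker endpoints.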