-
- [Voiceover] In this video, I wanna do some examples
-
looking at distributions, in particular, different features
-
in distributions like clusters, gaps, and peaks.
-
So over here, I wanna do some examples.
-
Which of the following are accurate descriptions
-
of the distribution below?
-
Select all that apply.
-
So the first statement is the distribution has an outlier.
-
So an outlier is a data point that's way off
-
of where the other data points are,
-
it's way larger or way smaller
-
than where all of the other data points
-
seem to be clustered and if we look over here,
-
we have a lot of data points between zero and six.
-
And let's just think about what they're measuring:
-
this is shelf time for each apple
-
at Gorg's Grocier.
-
So, for example, we see there's one, two, three, four,
-
five, six, seven apples that have a shelf life
-
of zero days, so (laughs), they're about to go bad.
-
You see you have one, two, three, four, five, six, seven,
-
eight apples that are gonna be good for another day.
-
You have two apples that are gonna be good
-
for another six days, and you have one apple
-
that's gonna be good for 10 days, and this is unusual.
-
This is an outlier here, it has a way larger shelf life
-
than all of the other data, so I would say
-
this definitely does have an outlier.
-
We just have this one data point
-
sitting all the way to the right, way larger,
-
way more shelf life than everything else, so it definitely
-
has an outlier, and this one would be the outlier.
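To make the "way larger than everything else" idea concrete, here is a minimal sketch of one common numeric check, the 1.5 x IQR rule. The video itself only judges the outlier by eye, so this rule is a swapped-in technique, not the method used above. The counts at zero, one, six, and 10 days come from the narration; the counts for two through five days are made-up placeholders, and the name find_outliers is hypothetical.

```python
import statistics

# Shelf life in days for each apple. Counts at 0, 1, 6 and 10 days follow the
# narration; counts at 2-5 days are ASSUMED for illustration only.
shelf_days = (
    [0] * 7 + [1] * 8      # from the narration
    + [2] * 5 + [3] * 4    # assumed
    + [4] * 3 + [5] * 3    # assumed
    + [6] * 2 + [10] * 1   # from the narration
)


def find_outliers(values):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


print(find_outliers(shelf_days))
```

With these assumed counts, only the single 10-day apple falls outside the fences, which matches the visual judgment in the video.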
-
The distribution has a cluster from four to six days.
-
And we indeed do see a cluster from four to six days.
-
A cluster, you can imagine, it's a grouping of data
-
that's sitting there, or you have a grouping of apples
-
that have a shelf life between four and six days,
-
and you definitely do see that cluster there.
-
And since I already selected two things,
-
I'm definitely not gonna select none of the above.
-
Let me check my answer.
-
Let me do a few more of these.
-
Which of the following are accurate descriptions
-
of the distribution below?
-
And once again we're going to select all that apply.
-
So the distribution has an outlier.
-
So let's see this distribution.
-
I do have a data point here that is at the high end
-
and I have another data point here that's at the low end,
-
but I don't have any data points that are sitting
-
far above or far below the bulk of the data.
-
If I had a data point that was out here, then yeah,
-
I would say that was an outlier to the right,
-
or a positive outlier, if I had a data point way to the left
-
off the screen over here, maybe that would be an outlier,
-
but I don't really see any obvious outliers.
-
All of the data, it's pretty clustered together.
-
So I would not say that the distribution has an outlier.
-
The distribution has a peak at 22 degrees.
-
Yeah, it does indeed look like we have,
-
and let's just look at what we're actually measuring:
-
high temperature each day in Edgeton, Iowa in July.
-
So it does indeed look like we have the most number
-
of days that had a high temperature at 22,
-
most number of days in July had a high temperature
-
at 22 degrees Celsius, so that is a peak.
-
You can see it, if you imagine this as kind of a mountain
-
this is a peak right here, this is a high point.
-
You have, at least locally, the most number of days
-
at 22 degrees Celsius.
-
So I would say it definitely has a peak there.
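The "mountain" picture of a peak can be sketched in code as a value whose count is higher than the counts right next to it. This is only an illustration: the daily counts below are invented (chosen to add up to the 31 days of July, with the most days at 22 degrees, as the narration describes), and local_peaks is a hypothetical helper name.

```python
# Number of July days at each high temperature in degrees Celsius.
# ASSUMED counts for illustration; only "most days at 22" comes from the video.
days_at_temp = {18: 1, 19: 2, 20: 4, 21: 6, 22: 8, 23: 5, 24: 3, 25: 1, 26: 1}


def local_peaks(counts):
    """Return values whose count is strictly higher than both neighbors.

    Assumes whole-number values one unit apart; a missing neighbor counts as 0.
    """
    peaks = []
    for value in sorted(counts):
        here = counts[value]
        left = counts.get(value - 1, 0)
        right = counts.get(value + 1, 0)
        if here > left and here > right:
            peaks.append(value)
    return peaks


print(local_peaks(days_at_temp))
```

Requiring the count to be strictly higher than both neighbors is a design choice; it means a flat stretch of equal counts is not reported as a peak.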
-
Since I selected something, I'm not gonna select
-
none of the above.
-
Let's do a couple more of these.
-
Which of the following are accurate descriptions
-
of the distribution below?
-
So the first one, the distribution has an outlier.
-
So...
-
number of guests by day at Seth's Sandwich Shop.
-
So, let's see, the lowest...
-
They have no days...
-
No days where he had between zero and 19 guests,
-
no days where he had between 20 and 39 guests,
-
looks like there's about nine days
-
where he had between 40 and 59 guests,
-
looks like 20 days where he had between 60 and 79 guests,
-
all the way to where it looks like maybe eight days
-
that he had between 180 and 199 guests.
-
But the question of outliers, there doesn't seem to be
-
any day where he had an unusual number of guests.
-
There's not a day that's way out here,
-
where he had, like, 500 guests.
-
So I would say this distribution does not have an outlier.
-
The distribution has a cluster from zero to 39 guests.
-
So zero to 39 guests is right over here, zero to 39 guests.
-
And there are no days where he had between zero and 39 guests,
-
neither zero to 19 nor 20 to 39.
-
So there's definitely not a cluster there.
-
I would say that the cluster covers the days
-
that had between 40 and 199 guests.
-
Definitely not zero to 39, there were no days
-
that were between zero and 39 guests.
-
So I would say none of the above very confidently.
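One way to picture a cluster in a histogram is as an unbroken run of bins that each contain at least one day. The sketch below applies that idea to bins of 20 guests. The first four counts and the last one follow the narration; the counts for 80 to 179 guests are not read out in the video, so they are placeholders, and find_clusters is a hypothetical name.

```python
BIN_WIDTH = 20  # each bin covers 20 guest counts: 0-19, 20-39, ...

# Days per bin. Bins 0-19, 20-39, 40-59, 60-79 and 180-199 follow the
# narration; the five bins in between are ASSUMED for illustration.
days_per_bin = [0, 0, 9, 20, 15, 12, 10, 9, 9, 8]


def find_clusters(counts, bin_width):
    """Return (low, high) guest ranges for each run of non-empty bins."""
    clusters = []
    start = None  # index of the first bin in the current run, if any
    for i, count in enumerate(counts):
        if count > 0 and start is None:
            start = i
        elif count == 0 and start is not None:
            clusters.append((start * bin_width, i * bin_width - 1))
            start = None
    if start is not None:  # the last run reaches the final bin
        clusters.append((start * bin_width, len(counts) * bin_width - 1))
    return clusters


print(find_clusters(days_per_bin, BIN_WIDTH))
```

With these counts the only run of non-empty bins spans 40 to 199 guests, and the two empty bins from zero to 39 are not part of any cluster.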
-
Let's do one more of these.
-
Which of the following are accurate descriptions
-
of the distribution below?
-
(laughs) Alright.
-
The distribution has a peak from 12 to 13 points.
-
Let me see what this is measuring, what this data is about.
-
Test scores by student in Mrs. Frine's class.
-
So you had one student who got between a zero and a one
-
on a 20-point scale, so got between,
-
I guess out of 20 questions, got between zero and one point.
-
And then you see that no students got
-
between two and three, or four and five, or six and seven.
-
Then we have another student who got between eight and nine,
-
looks like three students got between 10 and 11,
-
and then we keep increasing, this looks like about
-
12 students got either a 16 or a 17,
-
or something in between maybe,
-
if you could get decimal points on that test.
-
And then it looks like 10 students got from 18 to 19.
-
Alright, so this says the distribution has a peak
-
from 12 to 13 points, 12 to 13 points,
-
there were five students, but this isn't a peak.
-
If you just go to 14 to 15 points, you have more students.
-
So this is definitely not a peak.
-
If you were looking at this as a mountain of some kind,
-
you definitely wouldn't describe this point as a peak.
-
You would say this distribution has a peak,
-
it has the most number of students
-
who got between 16 and 17 points,
-
so that's the peak right there, not 12 to 13 points.
-
So I would not select that first choice.
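The reasoning about why 12 to 13 points is not a peak can be checked bin by bin. In this sketch the counts follow the narration where it gives them; the 14 to 15 bin is only described as having "more students" than five, so the value 8 is an assumption, and the "about 12" for 16 to 17 is taken as exactly 12. The helper is_peak is a hypothetical name.

```python
BIN_WIDTH = 2  # bins are 0-1, 2-3, ..., 18-19 points

# Students per bin, read from the narration. The 14-15 bin (8) is ASSUMED;
# the video only says it holds more than the five students at 12-13.
students_per_bin = [1, 0, 0, 0, 1, 3, 5, 8, 12, 10]


def is_peak(counts, index):
    """True if the bin has more students than each neighboring bin."""
    left = counts[index - 1] if index > 0 else 0
    right = counts[index + 1] if index < len(counts) - 1 else 0
    return counts[index] > left and counts[index] > right


bin_12_13 = 12 // BIN_WIDTH  # index 6
bin_16_17 = 16 // BIN_WIDTH  # index 8

print(is_peak(students_per_bin, bin_12_13))  # 5 is below the 14-15 bin
print(is_peak(students_per_bin, bin_16_17))  # 12 is above both neighbors
```

The 12 to 13 bin fails the check because the bin to its right is taller, while the 16 to 17 bin passes it.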
-
The distribution has an outlier.
-
Well, yeah, look at this: you have this outlier.
-
Most of the students scored between eight and 19 points,
-
and then you have this one student
-
who got between zero and one, it's really an outlier.
-
You even see this when you look at it visually,
-
it's not even connected to the rest of the distribution.
-
It's way to the left.
-
If something is way to the left or way to the right,
-
that's an outlier if it's unusually low or unusually high.
-
So I would say this distribution definitely does
-
have an outlier, and I'm not gonna pick none of the above
-
since I found a choice.
-
And I think we're all done.