[Voiceover] What I wanna do with this video is look at some examples of data represented in different ways, and think about which representation is the best, or which can help us answer different questions. So we see this first example. A statistician recorded the length of each of Pixar's first 14 films. The statistician made a dot plot (each dot is a film), a histogram, and a box plot to display the running time data. Which display could be used to find the median?
All right, so let's look at these displays. Over here is the dot plot. We have a dot for each of the 14 films. So one film had a running time of 81 minutes. We see that there. One film had a running time of 92. One had a running time of 93. We see one had a running time of 95. We see two had running times of 96 minutes, and so on and so forth. So I claim that I could use this to figure out the median, because I could make a list of all of the running times of the films, I could order them, and then I could find the middle value. I could literally make a list. I could write down 81, then 92, then 93, then 95, then I could write down 96 twice, then 98, then 100. I think you see where this is going. I could write out the entire list, and then, since there are 14 values, I could find the two middle values. So the dot plot, I could definitely use to find the median.
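The list-making idea can be sketched in a few lines of Python. This is only an illustration: the first eight running times are the ones read aloud from the dot plot, while the last six are hypothetical placeholders, chosen on the assumption that every remaining dot sits at 100 minutes or more.

```python
from statistics import median

# Running times read aloud from the dot plot in the transcript.
read_from_plot = [81, 92, 93, 95, 96, 96, 98, 100]
# Hypothetical placeholders for the six films that are not read aloud.
# Assumption: each of them runs at least 100 minutes.
hypothetical_rest = [101, 102, 103, 104, 105, 106]

running_times = sorted(read_from_plot + hypothetical_rest)


def middle_values(values):
    """Return the one or two middle entries of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 0:
        return values[mid - 1:mid + 1]
    return values[mid:mid + 1]


print(middle_values(running_times))  # [98, 100]
print(median(running_times))         # 99.0
```

With 14 values the median is the average of the 7th and 8th entries, so under that assumption the placeholder values do not affect the result.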
Now, what about the histogram? This is the histogram right over here. And the key here is, to figure out a median, I need to be able to make a list of the numbers. Here, they say I have one film that's between 80 and 85, but I don't know its exact running time. Its running time might have been 81 minutes, or it might have been 84 minutes. So I don't know here, and so I can't really make a list of the running times of the films and find the middle values, so I don't think I'm gonna be able to do it using the histogram.
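A small sketch of why the bins lose the information the median needs: two different data sets can produce exactly the same histogram and still have different medians. Both data sets and the `histogram` helper below are hypothetical, made up for illustration, and are not the film data.

```python
from collections import Counter
from statistics import median


def histogram(values, width=5):
    """Count values per bin, keyed by each bin's lower edge."""
    return Counter((v // width) * width for v in values)


# Two hypothetical data sets that land in the same 5-minute bins.
set_a = [81, 90, 91, 95]
set_b = [84, 93, 94, 99]

print(histogram(set_a) == histogram(set_b))  # True
print(median(set_a), median(set_b))          # 90.5 93.5
```

Given only the bin counts, there is no way to tell which of the two medians is the right one.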
So I'm not gonna click histogram. Now, with the box plot right over here, I might not be able to make a list of all the values, but the box plot explicitly tells us what the median is. This line in the middle of the box, that tells us the median. What is this? If this is 100, this is 99. So this is 95, 96, 97, 98, 99. It explicitly tells us the median is 99. This is actually the easiest for finding the median. So I'll go with the box plot too. So the histogram is of no use to me if I wanna calculate the median.
Let's do a couple more of these. Nam owns a used car lot. He checked the odometers of the cars and recorded how far they had driven. He then created both a histogram and a box plot to display the same data. Both diagrams are shown below. Which display can be used to find how many vehicles had driven more than 200,000 kilometers?
So it looks like here in this histogram, I have three vehicles that were between 200,000 and 250,000, and then I have two vehicles that are between 250,000 and 300,000. So it looks pretty clear that I have five vehicles: three that had a mileage between 200,000 and 250,000, and two that had a mileage between 250,000 and 300,000. So I am able to answer the question. Five vehicles had a mileage more than 200,000, and so I would say that the histogram is pretty useful.
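Counting from a histogram amounts to adding up the bars at or above the threshold. The sketch below assumes bins that are 50,000 kilometers wide; the counts for 0 to 50, 200 to 250, and 250 to 300 (in thousands) are the ones mentioned in the transcript, and the other three counts are hypothetical placeholders.

```python
# Bin counts keyed by (lower, upper) edges, in thousands of kilometers.
bins = {
    (0, 50): 3,
    (50, 100): 4,    # hypothetical count
    (100, 150): 5,   # hypothetical count
    (150, 200): 4,   # hypothetical count
    (200, 250): 3,
    (250, 300): 2,
}


def count_at_or_above(bins, threshold):
    """Sum the counts of every bin whose lower edge is at or above threshold."""
    return sum(count for (lower, _), count in bins.items() if lower >= threshold)


print(count_at_or_above(bins, 200))  # 5
```

This only works because 200,000 falls exactly on a bin edge. For a threshold inside a bin, such as 220,000, the histogram could not give an exact count.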
But let's verify that the box plot isn't so useful. So I wanna know how many vehicles had a mileage more than 200,000. Well, I know that if I have a mileage more than 200,000, I'm going to be in the fourth quartile, but I don't know how many values I have sitting there in the fourth quartile just looking at this plot over here, so that's not gonna be useful for answering that question.
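To illustrate why a box plot cannot give counts, here is a sketch with two hypothetical data sets of different sizes that share the same five-number summary. The quartile convention used (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; a plotting tool may compute quartiles slightly differently.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))


# Two hypothetical data sets: the second repeats every value of the first.
small = [0, 1, 2, 3, 4]
large = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

print(five_number_summary(small) == five_number_summary(large))  # True

# Same box plot, but a different number of values above the upper quartile.
print(sum(v > 3 for v in small), sum(v > 3 for v in large))  # 1 2
```

Both data sets would draw the same box and whiskers, yet the number of values in the top section differs.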
Let's look at the second question. Which display can be used to find that the median distance was approximately 140,000 kilometers?
Well, to calculate the median, you essentially wanna be able to list all of the numbers and then find the middle number. And over here, in the histogram, I can't list all of the numbers. I know that there are three values that are between zero and 50,000 kilometers, but I don't know what they are. They could be 10,000, 10,000, 10,000. They could be 10,000, 15,000, and 40,000. I don't know what they are, and so if I can't list all of these things and put them in order, I really am going to have trouble finding the middle value. The middle value, it's going to be in this range right around here, but I don't know exactly what it's going to be. The histogram is not useful, because it's throwing all the values into these buckets.
While on the box plot, it directly tells me the median value. This line right over here, the middle of the box, this tells us the median value, and we see that the median value here is 140,000 kilometers. Right, this is 100, 110, 120, 130, 140,000 kilometers is the median mileage for the cars. And so the box plot clearly gives us that data.