Comparing dot plots, histograms and box plots

  • 0:00 - 0:02
    - [Voiceover] What I wanna do with this video is look
  • 0:02 - 0:05
    at some examples of data represented in different ways,
  • 0:05 - 0:08
    and think about which representation is the best,
  • 0:08 - 0:11
    or can help us answer different questions.
  • 0:11 - 0:13
    So we see this first example.
  • 0:13 - 0:15
    A statistician recorded the length of each
  • 0:15 - 0:18
    of Pixar's first 14 films.
  • 0:18 - 0:22
    The statistician made a dot plot, each dot is a film,
  • 0:22 - 0:24
    a histogram, and a box plot
  • 0:24 - 0:26
    to display the running time data.
  • 0:26 - 0:30
    Which display could be used to find the median?
  • 0:30 - 0:32
    To find the median.
  • 0:32 - 0:35
    All right, so let's look at these displays.
  • 0:35 - 0:38
    So over here we see, this is the dot plot.
  • 0:38 - 0:40
    We have a dot for each of the 14 films.
  • 0:40 - 0:44
    So one film had a running time of 81 minutes.
  • 0:44 - 0:45
    We see that there.
  • 0:45 - 0:47
    One film had a running time of 92.
  • 0:47 - 0:50
    One had a running time of 93.
  • 0:50 - 0:53
    We see one had a running time of 95.
  • 0:53 - 0:56
    We see two had running times of 96 minutes,
  • 0:56 - 0:58
    and so on and so forth.
  • 0:58 - 1:01
    So I claim that I could use this to figure out the median,
  • 1:01 - 1:04
    because I could make a list of all of the running times
  • 1:04 - 1:06
    of the films, I could order them,
  • 1:06 - 1:08
    and then I could find the middle value.
  • 1:08 - 1:09
    I could literally make a list.
  • 1:09 - 1:12
    I could write down 81, and then write down 92,
  • 1:12 - 1:15
    then write down 93, then write down 95,
  • 1:15 - 1:17
    then I could write down 96 twice,
  • 1:17 - 1:19
    and then I could write down 98,
  • 1:19 - 1:20
    then I could write down 100.
  • 1:20 - 1:23
    I think you see where this is going.
  • 1:23 - 1:25
    I could write out the entire list,
  • 1:25 - 1:27
    and then I could find the middle values.
  • 1:27 - 1:31
    So the dot plot, I could definitely use to find the median.
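The listing procedure described above can be sketched in Python. This is a minimal sketch and not part of the video: `dot_plot` and `median_from_dot_plot` are hypothetical names. The first eight running times are the ones read off the dot plot in the transcript; the six longest running times are made-up placeholders, because the transcript does not state them. With 14 values, the median depends only on the 7th and 8th values in sorted order, so the placeholders do not change the result as long as they are at least 100.

```python
from statistics import median


def median_from_dot_plot(dot_plot):
    """Expand a {value: number_of_dots} mapping into a sorted list
    and return the median of that list."""
    values = []
    for value, count in sorted(dot_plot.items()):
        values.extend([value] * count)
    return median(values)


# Running time in minutes -> number of dots at that value.
dot_plot = {
    # Read off the dot plot in the transcript (8 films).
    81: 1, 92: 1, 93: 1, 95: 1, 96: 2, 98: 1, 100: 1,
    # Hypothetical placeholders for the remaining 6 films.
    103: 1, 107: 1, 111: 1, 115: 2, 116: 1,
}

# The 7th and 8th sorted values are 98 and 100, so the median is their average.
print(median_from_dot_plot(dot_plot))
```

Because the 7th and 8th values are 98 and 100, the sketch agrees with the median of 99 that the box plot shows later in the video.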
  • 1:31 - 1:33
    Now, what about the histogram?
  • 1:33 - 1:35
    This is the histogram right over here.
  • 1:35 - 1:38
    And the key here is, for a median, to figure out a median,
  • 1:38 - 1:40
    I just need to figure out a list of numbers.
  • 1:40 - 1:42
    I need to figure out a list of numbers.
  • 1:42 - 1:45
    So here, I don't know, they say I have one film
  • 1:45 - 1:47
    that's between 80 and 85,
  • 1:47 - 1:49
    but I don't know its exact running time.
  • 1:49 - 1:52
    Its running time might have been 81 minutes,
  • 1:52 - 1:55
    its running time might have been 84 minutes.
  • 1:55 - 1:58
    So I don't know here, and so I can't really make a list
  • 1:58 - 2:00
    of the running times of the films
  • 2:00 - 2:02
    and find the middle values,
  • 2:02 - 2:03
    so I don't think I'm gonna be able
  • 2:03 - 2:05
    to do it using the histogram.
  • 2:05 - 2:09
    Now, with the box plot right over here,
  • 2:09 - 2:10
    so I'm not gonna click histogram.
  • 2:10 - 2:12
    With the box plot over here,
  • 2:12 - 2:14
    I might not be able to make a list of all the values,
  • 2:14 - 2:18
    but the box plot explicitly tells us what the median is.
  • 2:18 - 2:20
    This middle line in the middle of the box,
  • 2:20 - 2:23
    that tells us the median is, what is this,
  • 2:23 - 2:27
    this median is, if this is 100, this is 99.
  • 2:27 - 2:30
    So this is 95, 96, 97, 98, 99.
  • 2:30 - 2:32
    It explicitly tells us the median is 99.
  • 2:32 - 2:35
    This is actually the easiest for calculating the median.
  • 2:35 - 2:36
    So I'll go with the box plot.
  • 2:36 - 2:38
    So the histogram is of no use to me
  • 2:38 - 2:40
    if I wanna calculate the median.
  • 2:40 - 2:42
    Let's do a couple more of these.
  • 2:42 - 2:45
    Nam owns a used car lot.
  • 2:45 - 2:47
    He checked the odometers of the cars
  • 2:47 - 2:49
    and recorded how far they had driven.
  • 2:49 - 2:52
    He then created both a histogram and a box plot
  • 2:52 - 2:55
    to display the same data. Both diagrams are shown below.
  • 2:56 - 2:58
    Which display can be used
  • 2:58 - 3:01
    to find how many vehicles had driven
  • 3:01 - 3:04
    more than 200,000 kilometers?
  • 3:04 - 3:06
    So how many vehicles had driven
  • 3:06 - 3:10
    more than 200,000 kilometers?
  • 3:10 - 3:13
    So it looks like here in this histogram,
  • 3:13 - 3:17
    I have three vehicles that were between 200 and 250,
  • 3:17 - 3:20
    and then I have two vehicles that are between 250 and 300.
  • 3:20 - 3:22
    So it looks pretty clear that I have five vehicles,
  • 3:22 - 3:26
    three that had a mileage between 200,000 and 250,000,
  • 3:26 - 3:28
    and then I had two that had mileage
  • 3:28 - 3:30
    between 250,000 and 300,000.
  • 3:30 - 3:32
    So I may be able to answer the question.
  • 3:32 - 3:36
    Five vehicles had a mileage more than 200,000,
  • 3:36 - 3:40
    and so I would say that the histogram is pretty useful.
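The counting step above amounts to adding up the bars at or beyond the 200,000 mark. A minimal sketch, assuming the histogram is stored as a mapping from bin edges (in thousands of kilometers) to counts: `histogram` and `count_from_edge` are hypothetical names, the counts for the 0 to 50, 200 to 250, and 250 to 300 bins come from the transcript, and the counts for the middle bins are made-up placeholders.

```python
def count_from_edge(histogram, threshold):
    """Count the values in all bins that start at or above `threshold`.

    This only works when `threshold` is one of the bin edges; otherwise
    the histogram cannot say how a bin's values split around it.
    """
    edges = {edge for bin_range in histogram for edge in bin_range}
    if threshold not in edges:
        raise ValueError("threshold must coincide with a bin edge")
    return sum(count for (low, _high), count in histogram.items()
               if low >= threshold)


# (low, high) in thousands of kilometers -> number of vehicles.
histogram = {
    (0, 50): 3,      # from the transcript
    (50, 100): 4,    # hypothetical placeholder
    (100, 150): 5,   # hypothetical placeholder
    (150, 200): 4,   # hypothetical placeholder
    (200, 250): 3,   # from the transcript
    (250, 300): 2,   # from the transcript
}

print(count_from_edge(histogram, 200))  # 3 + 2 vehicles
```

The question is answerable only because 200,000 falls exactly on a bin edge; for a cutoff such as 220,000 the sketch raises an error rather than guessing.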
  • 3:40 - 3:43
    But let's verify that the box plot isn't so useful.
  • 3:43 - 3:45
    So I wanna know how many vehicles had a mileage
  • 3:45 - 3:47
    more than 200,000.
  • 3:47 - 3:51
    Well, I know that if I have a mileage more than 200,000,
  • 3:51 - 3:55
    I'm going to be in the fourth quartile,
  • 3:55 - 3:58
    but I don't know how many values I have sitting there
  • 3:58 - 4:02
    in the fourth quartile just looking at this data over here,
  • 4:02 - 4:05
    so that's not gonna be useful for answering that question.
  • 4:05 - 4:06
    Let's look at the second question.
  • 4:06 - 4:08
    Which display can be used
  • 4:08 - 4:10
    to find that the median distance,
  • 4:10 - 4:11
    which display can be used to find
  • 4:11 - 4:12
    that the median distance
  • 4:12 - 4:15
    was approximately 140,000 kilometers?
  • 4:15 - 4:17
    Well, to calculate the median,
  • 4:17 - 4:19
    you essentially wanna be able to list all of the numbers
  • 4:19 - 4:20
    and then find the middle number.
  • 4:20 - 4:23
    And over here, I can't list all of the numbers.
  • 4:23 - 4:25
    I know that there's three values that are
  • 4:25 - 4:28
    between zero and 50,000 kilometers,
  • 4:28 - 4:29
    but I don't know what they are.
  • 4:29 - 4:31
    Could be 10,000, 10,000, 10,000.
  • 4:31 - 4:34
    It could be 10,000, 15,000, and 40,000.
  • 4:34 - 4:37
    I don't know what they are, and so if I can't list all
  • 4:37 - 4:39
    of these things and put them in order,
  • 4:39 - 4:42
    I really am going to have trouble finding the middle value.
  • 4:42 - 4:45
    The middle value, it's going to be
  • 4:45 - 4:48
    in this range right around here,
  • 4:48 - 4:50
    but I don't know exactly what it's going to be.
  • 4:50 - 4:51
    The histogram is not useful,
  • 4:51 - 4:54
    because it's throwing all the values into these buckets.
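What the histogram can still provide is the range that must contain the median, found by accumulating bar heights until the middle position is reached. A minimal sketch under stated assumptions: `histogram` and `median_bounds` are hypothetical names, the counts for the 0 to 50, 200 to 250, and 250 to 300 bins come from the transcript, and the middle counts are made-up placeholders, so the printed range is illustrative only.

```python
def median_bounds(histogram):
    """Return (low, high) bounds that must contain the median,
    using only the bin counts of a histogram."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram is empty")
    # 1-based positions of the middle value(s) in sorted order.
    lower_pos = (total + 1) // 2
    upper_pos = total // 2 + 1
    found = []
    for pos in (lower_pos, upper_pos):
        running = 0
        for (low, high), count in sorted(histogram.items()):
            running += count
            if running >= pos:
                found.append((low, high))
                break
    # Lower edge of the first middle bin, upper edge of the second.
    return found[0][0], found[1][1]


# (low, high) in thousands of kilometers -> number of vehicles.
histogram = {
    (0, 50): 3,      # from the transcript
    (50, 100): 4,    # hypothetical placeholder
    (100, 150): 5,   # hypothetical placeholder
    (150, 200): 4,   # hypothetical placeholder
    (200, 250): 3,   # from the transcript
    (250, 300): 2,   # from the transcript
}

print(median_bounds(histogram))
```

With these placeholder counts the bounds are 100 to 150 thousand kilometers, which is as precise as a histogram gets; the exact value inside that range is lost once the data is bucketed.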
  • 4:54 - 4:56
    While on the box plot, it explicitly,
  • 4:56 - 4:58
    it directly tells me the median value.
  • 4:58 - 5:01
    This line right over here, the middle of the box,
  • 5:01 - 5:03
    this tells us the median value,
  • 5:03 - 5:05
    and we see that the median value here,
  • 5:05 - 5:08
    this is 140,000 kilometers.
  • 5:08 - 5:11
    Right, this is 100, 110, 120, 130,
  • 5:11 - 5:16
    140,000 kilometers is the median mileage for the cars.
  • 5:16 - 5:19
    And so the box plot clearly...
  • 5:21 - 5:23
    clearly gives us that data.
Video Language:
English
Duration:
05:26
