0:00:00.147,0:00:01.935 - [Voiceover] What I wanna do with this video is look 0:00:01.935,0:00:04.907 at some examples of data represented in different ways, 0:00:04.907,0:00:07.670 and think about which representation is the best, 0:00:07.670,0:00:10.921 or can help us answer different questions? 0:00:10.921,0:00:12.779 So we see this first example. 0:00:12.779,0:00:14.984 A statistician recorded the length of each 0:00:14.984,0:00:17.623 of Pixar's first 14 films. 0:00:17.623,0:00:21.757 The statistician made a dot plot, each dot is a film, 0:00:21.757,0:00:23.961 a histogram, and a box plot 0:00:23.961,0:00:26.423 to display the running time data. 0:00:26.423,0:00:30.254 Which display could be used to find the median? 0:00:30.254,0:00:31.531 To find the median. 0:00:31.531,0:00:34.572 All right, so let's look at these displays. 0:00:34.572,0:00:38.171 So over here we see, this is the dot plot. 0:00:38.171,0:00:40.400 We have a dot for each of the 14 films. 0:00:40.400,0:00:43.996 So one film had a running time of 81 minutes. 0:00:43.996,0:00:44.882 We see that there. 0:00:44.882,0:00:47.064 One film had a running time of 92. 0:00:47.064,0:00:50.083 One had a running time of 93. 0:00:50.083,0:00:52.544 We see one had a running time of 95. 0:00:52.544,0:00:55.841 We see two had running times of 96 minutes, 0:00:55.841,0:00:57.931 and so on and so forth. 0:00:57.931,0:01:01.042 So I claim that I could use this to figure out the median, 0:01:01.042,0:01:03.875 because I could make a list of all of the running times 0:01:03.875,0:01:05.941 of the films, I could order them, 0:01:05.941,0:01:07.503 and then I could find the middle value. 0:01:07.503,0:01:08.565 I could literally make a list. 
0:01:08.565,0:01:12.234 I could write down 81, and then write down 92, 0:01:12.234,0:01:14.927 then write down 93, then write down 95, 0:01:14.927,0:01:16.947 then I could write down 96 twice, 0:01:16.947,0:01:19.302 and then I could write down 98, 0:01:19.302,0:01:20.430 then I could write down 100. 0:01:20.430,0:01:22.682 I think you see where this is going. 0:01:22.682,0:01:24.540 I could write out the entire list, 0:01:24.540,0:01:26.537 and then I could find the middle values. 0:01:26.537,0:01:31.134 So the dot plot, I could definitely use to find the median. 0:01:31.134,0:01:32.574 Now, what about the histogram? 0:01:32.574,0:01:35.313 This is the histogram right over here. 0:01:35.313,0:01:38.146 And the key here is, for a median, to figure out a median, 0:01:38.146,0:01:40.415 I just need to figure out a list of numbers. 0:01:40.415,0:01:41.722 I need to figure out a list of numbers. 0:01:41.722,0:01:45.089 So here, I don't know, they say I have one film 0:01:45.089,0:01:47.410 that's between 80 and 85, 0:01:47.410,0:01:49.339 but I don't know its exact running time. 0:01:49.339,0:01:52.101 Its running time might have been 81 minutes, 0:01:52.101,0:01:54.608 its running time might have been 84 minutes. 0:01:54.608,0:01:58.299 So I don't know here, and so I can't really make a list 0:01:58.299,0:02:00.111 of the running times of the films 0:02:00.111,0:02:01.504 and find the middle values, 0:02:01.504,0:02:02.683 so I don't think I'm gonna be able 0:02:02.683,0:02:04.609 to do it using the histogram. 0:02:04.609,0:02:08.586 Now, with the box plot right over here, 0:02:08.586,0:02:10.164 so I'm not gonna click histogram. 0:02:10.164,0:02:11.721 With the box plot over here, 0:02:11.721,0:02:14.358 I might not be able to make a list of all the values, 0:02:14.358,0:02:17.781 but the box plot explicitly tells us what the median is. 
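The dot plot argument above (write down every value, put them in order, take the middle) can be sketched in a few lines of Python. This is only an illustration: the dictionary holds just the eight running times read aloud in the video, the other six films are not listed in the transcript, and the names `dot_plot` and `values_from_dot_plot` are made up for this example. The result is therefore the median of that partial list, not the median of all 14 films.

```python
from statistics import median

# Dot plot as {running time in minutes: number of dots}.
# Only the eight values read aloud in the video are included; the
# remaining six films are not given in the transcript (partial data).
dot_plot = {81: 1, 92: 1, 93: 1, 95: 1, 96: 2, 98: 1, 100: 1}


def values_from_dot_plot(counts):
    """Expand a dot plot into the sorted list of individual values."""
    values = []
    for value in sorted(counts):
        # One list entry per dot stacked above this value.
        values.extend([value] * counts[value])
    return values


running_times = values_from_dot_plot(dot_plot)

# With an even number of values, median() averages the two middle ones.
partial_median = median(running_times)

print(running_times)
print(partial_median)
```

Because every dot corresponds to one exact value, nothing is lost when the plot is turned back into a list, which is why the dot plot is enough to find the median.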
0:02:17.781,0:02:20.428 This middle line in the middle of the box, 0:02:20.428,0:02:23.168 that tells us the median is, what is this, 0:02:23.168,0:02:26.744 this median is, if this is 100, this is 99. 0:02:26.744,0:02:29.832 So this is 95, 96, 97, 98, 99. 0:02:29.832,0:02:32.061 It explicitly tells us the median is 99. 0:02:32.061,0:02:34.894 This is actually the easiest for calculating the median. 0:02:34.894,0:02:36.287 So I'll go with the box plot. 0:02:36.287,0:02:38.353 So the histogram is of no use to me 0:02:38.353,0:02:40.001 if I wanna calculate the median. 0:02:40.001,0:02:41.929 Let's do a couple more of these. 0:02:41.929,0:02:44.808 Nam owns a used car lot. 0:02:44.808,0:02:46.596 He checked the odometers of the cars 0:02:46.596,0:02:49.127 and recorded how far they had driven. 0:02:49.127,0:02:51.936 He then created both a histogram and a box plot 0:02:51.936,0:02:55.268 to display the same data, both diagrams are shown below. 0:02:55.998,0:02:58.194 Which display can be used 0:02:58.194,0:03:00.889 to find how many vehicles had driven 0:03:00.889,0:03:04.139 more than 200,000 kilometers? 0:03:04.139,0:03:06.158 So how many vehicles had driven 0:03:06.158,0:03:09.525 more than 200,000 kilometers? 0:03:09.525,0:03:13.264 So it looks like here in this histogram, 0:03:13.264,0:03:16.700 I have three vehicles that were between 200 and 250, 0:03:16.700,0:03:20.415 and then I have two vehicles that are between 250 and 300. 0:03:20.415,0:03:22.158 So it looks pretty clear that I have five vehicles, 0:03:22.158,0:03:26.105 three that had a mileage between 200,000 and 250,000, 0:03:26.105,0:03:27.938 and then I had two that had mileage 0:03:27.938,0:03:30.115 between 250,000 and 300,000. 0:03:30.115,0:03:31.507 So I may be able to answer the question. 0:03:31.507,0:03:36.368 Five vehicles had a mileage more than 200,000, 0:03:36.368,0:03:40.036 and so I would say that the histogram is pretty useful. 
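The counting step above can be sketched as a sum over histogram bins. This is a rough illustration under some assumptions: the bin counts are only the ones stated in the video (the bins between 50 and 200 thousand kilometers are not given and are left out), the helper name `count_from_bin_edge` is invented here, and a car at exactly 200,000 kilometers would fall into the 200 to 250 bin, so the count is only exact if no car sits precisely on that edge.

```python
# Histogram as {(lower, upper) in thousands of km: number of vehicles}.
# Only bins whose counts are stated in the video are listed; the bins
# between 50 and 200 are omitted because their counts are not given.
histogram = {
    (0, 50): 3,
    (200, 250): 3,
    (250, 300): 2,
}


def count_from_bin_edge(hist, threshold):
    """Add up the counts of every bin that starts at or above threshold."""
    return sum(
        count
        for (lower, _upper), count in hist.items()
        if lower >= threshold
    )


vehicles_over_200k = count_from_bin_edge(histogram, 200)
print(vehicles_over_200k)
```

This works only because 200 is a bin boundary. For a threshold inside a bin, such as 220, the histogram could not answer the question.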
0:03:40.036,0:03:42.915 But let's verify that the box plot isn't so useful. 0:03:42.915,0:03:45.029 So I wanna know how many vehicles had a mileage 0:03:45.029,0:03:47.141 more than 200,000. 0:03:47.141,0:03:50.880 Well, I know that if I have a mileage more than 200,000, 0:03:50.880,0:03:54.943 I'm going to be in the fourth quartile, 0:03:54.943,0:03:58.194 but I don't know how many values I have sitting there 0:03:58.194,0:04:01.583 in the fourth quartile just looking at this data over here, 0:04:01.583,0:04:04.671 so that's not gonna be useful for answering that question. 0:04:04.671,0:04:06.227 Let's look at the second question. 0:04:06.227,0:04:07.690 Which display can be used 0:04:07.690,0:04:09.518 to find that the median distance, 0:04:09.518,0:04:11.478 which display can be used to find 0:04:11.478,0:04:12.478 that the median distance 0:04:12.478,0:04:15.420 was approximately 140,000 kilometers? 0:04:15.420,0:04:16.744 Well, to calculate the median, 0:04:16.744,0:04:18.716 you essentially wanna be able to list all of the numbers 0:04:18.716,0:04:20.250 and then find the middle number. 0:04:20.250,0:04:23.315 And over here, I can't list all of the numbers. 0:04:23.315,0:04:25.405 I know that there's three values that are 0:04:25.405,0:04:27.541 between zero and 50,000 kilometers, 0:04:27.541,0:04:28.585 but I don't know what they are. 0:04:28.585,0:04:30.792 Could be 10,000, 10,000, 10,000. 0:04:30.792,0:04:34.483 It could be 10,000, 15,000, and 40,000. 0:04:34.483,0:04:37.020 I don't know what they are, and so if I can't list all 0:04:37.020,0:04:39.142 of these things and put them in order, 0:04:39.146,0:04:42.228 I really am going to have trouble finding the middle value. 0:04:42.228,0:04:45.392 The middle value, it's going to be 0:04:45.392,0:04:47.783 in this range right around here, 0:04:47.783,0:04:49.687 but I don't know exactly what it's going to be. 
0:04:49.687,0:04:50.964 The histogram is not useful, 0:04:50.964,0:04:53.918 because it's throwing all the values into these buckets. 0:04:53.918,0:04:56.258 While on the box plot, it explicitly, 0:04:56.258,0:04:58.116 it directly tells me the median value. 0:04:58.116,0:05:00.508 This line right over here, the middle of the box, 0:05:00.508,0:05:02.737 this tells us the median value, 0:05:02.737,0:05:05.034 and we see that the median value here, 0:05:05.034,0:05:08.425 this is 140,000 kilometers. 0:05:08.425,0:05:11.341 Right, this is 100, 110, 120, 130, 0:05:11.341,0:05:16.011 140,000 kilometers is the median mileage for the cars. 0:05:16.011,0:05:18.808 And so the box plot clearly... 0:05:20.835,0:05:23.242 clearly gives us that data.
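One way to see why the histogram cannot pin down the median is to build two data sets that land in exactly the same buckets but have different middle values. The sketch below does that with invented numbers: `lot_a` and `lot_b` are hypothetical odometer readings in thousands of kilometers, not the data from the video, and the 50-wide bins are an assumption chosen to match the bucket size mentioned in the video.

```python
from collections import Counter
from statistics import median

BIN_WIDTH = 50  # thousands of km per bucket (assumed)


def histogram(values, bin_width=BIN_WIDTH):
    """Count how many values fall in each bucket, keyed by lower edge."""
    return Counter((value // bin_width) * bin_width for value in values)


# Hypothetical odometer readings in thousands of km (not from the video).
lot_a = [10, 10, 10, 140, 160, 170, 210]
lot_b = [10, 15, 40, 105, 155, 190, 240]

same_buckets = histogram(lot_a) == histogram(lot_b)
median_a = median(lot_a)
median_b = median(lot_b)

print(same_buckets)
print(median_a, median_b)
```

Both lots produce the same histogram, and both medians fall in the 100 to 150 bucket, yet the medians themselves differ. The histogram narrows the median down to a range, while the box plot marks the value directly.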