WEBVTT 00:00:00.147 --> 00:00:01.935 - [Voiceover] What I wanna do with this video is look 00:00:01.935 --> 00:00:04.907 at some examples of data represented in different ways, 00:00:04.907 --> 00:00:07.670 and think about which representation is the best, 00:00:07.670 --> 00:00:10.921 or which one can help us answer different questions. 00:00:10.921 --> 00:00:12.779 So let's look at this first example. 00:00:12.779 --> 00:00:14.984 A statistician recorded the length of each 00:00:14.984 --> 00:00:17.623 of Pixar's first 14 films. 00:00:17.623 --> 00:00:21.757 The statistician made a dot plot (each dot is a film), 00:00:21.757 --> 00:00:23.961 a histogram, and a box plot 00:00:23.961 --> 00:00:26.423 to display the running time data. 00:00:26.423 --> 00:00:30.254 Which display could be used to find the median? 00:00:30.254 --> 00:00:31.531 To find the median. 00:00:31.531 --> 00:00:34.572 All right, so let's look at these displays. 00:00:34.572 --> 00:00:38.171 So over here, this is the dot plot. 00:00:38.171 --> 00:00:40.400 We have a dot for each of the 14 films. 00:00:40.400 --> 00:00:43.996 So one film had a running time of 81 minutes. 00:00:43.996 --> 00:00:44.882 We see that there. 00:00:44.882 --> 00:00:47.064 One film had a running time of 92. 00:00:47.064 --> 00:00:50.083 One had a running time of 93. 00:00:50.083 --> 00:00:52.544 We see one had a running time of 95. 00:00:52.544 --> 00:00:55.841 We see two had running times of 96 minutes, 00:00:55.841 --> 00:00:57.931 and so on and so forth. 00:00:57.931 --> 00:01:01.042 So I claim that I could use this to figure out the median, 00:01:01.042 --> 00:01:03.875 because I could make a list of all of the running times 00:01:03.875 --> 00:01:05.941 of the films, I could order them, 00:01:05.941 --> 00:01:07.503 and then I could find the middle value. 00:01:07.503 --> 00:01:08.565 I could literally make a list.
00:01:08.565 --> 00:01:12.234 I could write down 81, and then write down 92, 00:01:12.234 --> 00:01:14.927 then write down 93, then write down 95, 00:01:14.927 --> 00:01:16.947 then I could write down 96 twice, 00:01:16.947 --> 00:01:19.302 and then I could write down 98, 00:01:19.302 --> 00:01:20.430 then I could write down 100. 00:01:20.430 --> 00:01:22.682 I think you see where this is going. 00:01:22.682 --> 00:01:24.540 I could write out the entire list, 00:01:24.540 --> 00:01:26.537 and then, since there are 14 films, I could average the two middle values. 00:01:26.537 --> 00:01:31.134 So I could definitely use the dot plot to find the median. 00:01:31.134 --> 00:01:32.574 Now, what about the histogram? 00:01:32.574 --> 00:01:35.313 This is the histogram right over here. 00:01:35.313 --> 00:01:38.146 And the key here is that to figure out a median, 00:01:38.146 --> 00:01:40.415 I need to be able to make a list of the numbers. 00:01:40.415 --> 00:01:41.722 I need the actual list of numbers. 00:01:41.722 --> 00:01:45.089 So here, they tell me I have one film 00:01:45.089 --> 00:01:47.410 that's between 80 and 85 minutes, 00:01:47.410 --> 00:01:49.339 but I don't know its exact running time. 00:01:49.339 --> 00:01:52.101 Its running time might have been 81 minutes, 00:01:52.101 --> 00:01:54.608 or it might have been 84 minutes. 00:01:54.608 --> 00:01:58.299 I just don't know, and so I can't really make a list 00:01:58.299 --> 00:02:00.111 of the running times of the films 00:02:00.111 --> 00:02:01.504 and find the middle values, 00:02:01.504 --> 00:02:02.683 so I don't think I'm gonna be able 00:02:02.683 --> 00:02:04.609 to do it using the histogram. 00:02:04.609 --> 00:02:08.586 So I'm not gonna select the histogram. 00:02:08.586 --> 00:02:10.164 Now, on to the box plot right over here.
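The dot-plot reasoning can be sketched in a few lines of Python. This is a minimal illustration, not part of the original exercise: it assumes, as the spoken list suggests, that the dots are read off in increasing order, so the first eight values named (81 through 100) already include the 7th and 8th of the 14 films, even though the other six dots are never named. The helper `dot_plot_to_list` is hypothetical.

```python
def dot_plot_to_list(dots):
    """Expand a dot plot given as {value: number_of_dots} into a sorted list."""
    values = []
    for value in sorted(dots):
        values.extend([value] * dots[value])
    return values

# The first eight running times (minutes) named in the video, in increasing order.
# The other six dots are not named; we only assume they are 100 minutes or more.
read_so_far = dot_plot_to_list({81: 1, 92: 1, 93: 1, 95: 1, 96: 2, 98: 1, 100: 1})

n_films = 14
# Even count: the median is the mean of the 7th and 8th values
# (0-based indices 6 and 7), and both are already in read_so_far.
lower_mid = read_so_far[n_films // 2 - 1]
upper_mid = read_so_far[n_films // 2]
print(read_so_far)                  # [81, 92, 93, 95, 96, 96, 98, 100]
print((lower_mid + upper_mid) / 2)  # 99.0
```

The result, 99 minutes, agrees with the median that the box plot turns out to show. The histogram could not support this calculation, because its bars record only how many films fall in each range, not which running times they have.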
00:02:10.164 --> 00:02:11.721 With the box plot over here, 00:02:11.721 --> 00:02:14.358 I might not be able to make a list of all the values, 00:02:14.358 --> 00:02:17.781 but the box plot explicitly tells us what the median is. 00:02:17.781 --> 00:02:20.428 This line in the middle of the box 00:02:20.428 --> 00:02:23.168 tells us the median. So what is it? 00:02:23.168 --> 00:02:26.744 Well, if this tick mark is 100, this one is 99. 00:02:26.744 --> 00:02:29.832 Counting up from 95: 96, 97, 98, 99. 00:02:29.832 --> 00:02:32.061 It explicitly tells us the median is 99. 00:02:32.061 --> 00:02:34.894 This is actually the easiest display for finding the median. 00:02:34.894 --> 00:02:36.287 So I'll go with the box plot, along with the dot plot. 00:02:36.287 --> 00:02:38.353 The histogram is of no use to me 00:02:38.353 --> 00:02:40.001 if I wanna find the median. 00:02:40.001 --> 00:02:41.929 Let's do a couple more of these. 00:02:41.929 --> 00:02:44.808 Nam owns a used car lot. 00:02:44.808 --> 00:02:46.596 He checked the odometers of the cars 00:02:46.596 --> 00:02:49.127 and recorded how far they had driven. 00:02:49.127 --> 00:02:51.936 He then created both a histogram and a box plot 00:02:51.936 --> 00:02:55.268 to display the same data; both diagrams are shown below. 00:02:55.998 --> 00:02:58.194 Which display can be used 00:02:58.194 --> 00:03:00.889 to find how many vehicles had driven 00:03:00.889 --> 00:03:04.139 more than 200,000 kilometers? 00:03:04.139 --> 00:03:06.158 So how many vehicles had driven 00:03:06.158 --> 00:03:09.525 more than 200,000 kilometers? 00:03:09.525 --> 00:03:13.264 So it looks like here in this histogram, 00:03:13.264 --> 00:03:16.700 I have three vehicles that are between 200,000 and 250,000 kilometers, 00:03:16.700 --> 00:03:20.415 and then I have two vehicles that are between 250,000 and 300,000.
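Counting from a histogram amounts to adding up the bars that lie entirely above the cut-off. Here is a minimal Python sketch, assuming the histogram is stored as (lower edge, upper edge, count) triples. Only the bars mentioned in the video are included (three cars from 0 to 50,000 km, three from 200,000 to 250,000 km, and two from 250,000 to 300,000 km), and the function name `count_above` is illustrative. The sketch also assumes that no car sits exactly on a bar edge.

```python
# Histogram bars as (lower_km, upper_km, count). Only the bars mentioned in the
# video are listed; the bars in between are left out, not assumed to be empty.
bars = [
    (0, 50_000, 3),
    (200_000, 250_000, 3),
    (250_000, 300_000, 2),
]

def count_above(bars, threshold_km):
    """Count values above threshold_km using only the histogram's bar counts.

    Assumes no value sits exactly on a bar edge. If the threshold falls
    strictly inside a bar, the histogram cannot say how many of that bar's
    values lie above it, so the function refuses to guess.
    """
    total = 0
    for lower, upper, count in bars:
        if lower < threshold_km < upper:
            raise ValueError("threshold splits a bar; exact values are unknown")
        if lower >= threshold_km:
            total += count
    return total

print(count_above(bars, 200_000))  # 5 (3 + 2)
```

Asking for cars above 225,000 km raises the error, which reflects the histogram's limitation: it can only answer counting questions whose cut-offs line up with the bar edges.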
00:03:20.415 --> 00:03:22.158 So it looks pretty clear that I have five vehicles, 00:03:22.158 --> 00:03:26.105 three that had a mileage between 200,000 and 250,000 kilometers, 00:03:26.105 --> 00:03:27.938 and then two that had a mileage 00:03:27.938 --> 00:03:30.115 between 250,000 and 300,000. 00:03:30.115 --> 00:03:31.507 So I am able to answer the question. 00:03:31.507 --> 00:03:36.368 Five vehicles had a mileage of more than 200,000, 00:03:36.368 --> 00:03:40.036 and so I would say that the histogram is pretty useful. 00:03:40.036 --> 00:03:42.915 But let's verify that the box plot isn't so useful. 00:03:42.915 --> 00:03:45.029 So I wanna know how many vehicles had a mileage 00:03:45.029 --> 00:03:47.141 of more than 200,000. 00:03:47.141 --> 00:03:50.880 Well, I know that a car with a mileage of more than 200,000 00:03:50.880 --> 00:03:54.943 is going to be in the fourth quartile, 00:03:54.943 --> 00:03:58.194 but I can't tell how many values are sitting there 00:03:58.194 --> 00:04:01.583 in the fourth quartile just by looking at this box plot, 00:04:01.583 --> 00:04:04.671 so that's not gonna be useful for answering that question. 00:04:04.671 --> 00:04:06.227 Let's look at the second question. 00:04:06.227 --> 00:04:07.690 Which display can be used 00:04:07.690 --> 00:04:09.518 to find that the median distance... 00:04:09.518 --> 00:04:11.478 which display can be used to find 00:04:11.478 --> 00:04:12.478 that the median distance 00:04:12.478 --> 00:04:15.420 was approximately 140,000 kilometers? 00:04:15.420 --> 00:04:16.744 Well, to calculate the median, 00:04:16.744 --> 00:04:18.716 you essentially wanna be able to list all of the numbers 00:04:18.716 --> 00:04:20.250 and then find the middle number. 00:04:20.250 --> 00:04:23.315 And over here in the histogram, I can't list all of the numbers.
00:04:23.315 --> 00:04:25.405 I know that there are three values 00:04:25.405 --> 00:04:27.541 between zero and 50,000 kilometers, 00:04:27.541 --> 00:04:28.585 but I don't know what they are. 00:04:28.585 --> 00:04:30.792 They could be 10,000, 10,000, and 10,000. 00:04:30.792 --> 00:04:34.483 They could be 10,000, 15,000, and 40,000. 00:04:34.483 --> 00:04:37.020 I don't know what they are, and so if I can't list all 00:04:37.020 --> 00:04:39.142 of these values and put them in order, 00:04:39.146 --> 00:04:42.228 I'm really going to have trouble finding the middle value. 00:04:42.228 --> 00:04:45.392 The middle value is going to be 00:04:45.392 --> 00:04:47.783 somewhere in this range right around here, 00:04:47.783 --> 00:04:49.687 but I don't know exactly what it's going to be. 00:04:49.687 --> 00:04:50.964 The histogram is not useful, 00:04:50.964 --> 00:04:53.918 because it throws all the values into these buckets. 00:04:53.918 --> 00:04:56.258 The box plot, on the other hand, 00:04:56.258 --> 00:04:58.116 directly tells me the median value. 00:04:58.116 --> 00:05:00.508 This line right over here, in the middle of the box, 00:05:00.508 --> 00:05:02.737 tells us the median value, 00:05:02.737 --> 00:05:05.034 and we see that the median value here 00:05:05.034 --> 00:05:08.425 is 140,000 kilometers. 00:05:08.425 --> 00:05:11.341 Right, this is 100, 110, 120, 130, 00:05:11.341 --> 00:05:16.011 so 140,000 kilometers is the median mileage for the cars. 00:05:16.011 --> 00:05:18.808 And so the box plot 00:05:20.835 --> 00:05:23.242 clearly gives us that value.
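The point about the 0 to 50,000 km bar can be checked directly. This Python sketch takes the two possibilities named in the video and treats each set of three cars as a tiny standalone dataset, so the resulting medians are not the lot's 140,000 km median. It then bins each set into 50,000 km wide bars, as the histogram does. `bin_counts` is an illustrative helper, not part of any library.

```python
from statistics import median

# Two possible sets of exact mileages for the three cars in the 0-50,000 km bar,
# both taken from the video. A histogram draws them identically.
option_a = [10_000, 10_000, 10_000]
option_b = [10_000, 15_000, 40_000]

def bin_counts(values, width=50_000):
    """Count values per bar [k*width, (k+1)*width), keyed by the bar's lower edge."""
    counts = {}
    for v in values:
        lower = (v // width) * width
        counts[lower] = counts.get(lower, 0) + 1
    return counts

print(bin_counts(option_a) == bin_counts(option_b))  # True: same histogram
print(median(option_a), median(option_b))            # 10000 15000: different medians
```

The binned counts are identical, yet the medians differ. This is why the histogram alone cannot pin down a median, while a display that keeps (or summarizes) the exact values can.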