1
00:00:00,147 --> 00:00:01,935
- [Voiceover] What I wanna do with this video is look

2
00:00:01,935 --> 00:00:04,907
at some examples of data represented in different ways,

3
00:00:04,907 --> 00:00:07,670
and think about which representation is the best,

4
00:00:07,670 --> 00:00:10,921
or can help us answer different questions?

5
00:00:10,921 --> 00:00:12,779
So we see this first example.

6
00:00:12,779 --> 00:00:14,984
A statistician recorded the length of each

7
00:00:14,984 --> 00:00:17,623
of Pixar's first 14 films.

8
00:00:17,623 --> 00:00:21,757
The statistician made a dot plot, each dot is a film,

9
00:00:21,757 --> 00:00:23,961
a histogram, and a box plot

10
00:00:23,961 --> 00:00:26,423
to display the running time data.

11
00:00:26,423 --> 00:00:30,254
Which display could be used to find the median?

12
00:00:30,254 --> 00:00:31,531
To find the median.

13
00:00:31,531 --> 00:00:34,572
All right, so let's look at these displays.

14
00:00:34,572 --> 00:00:38,171
So over here we see, this is the dot plot.

15
00:00:38,171 --> 00:00:40,400
We have a dot for each of the 14 films.

16
00:00:40,400 --> 00:00:43,996
So one film had a running time of 81 minutes.

17
00:00:43,996 --> 00:00:44,882
We see that there.

18
00:00:44,882 --> 00:00:47,064
One film had a running time of 92.

19
00:00:47,064 --> 00:00:50,083
One had a running time of 93.

20
00:00:50,083 --> 00:00:52,544
We see one had a running time of 95.

21
00:00:52,544 --> 00:00:55,841
We see two had running times of 96 minutes,

22
00:00:55,841 --> 00:00:57,931
and so on and so forth.

23
00:00:57,931 --> 00:01:01,042
So I claim that I could use this to figure out the median,

24
00:01:01,042 --> 00:01:03,875
because I could make a list of all of the running times

25
00:01:03,875 --> 00:01:05,941
of the films, I could order them,

26
00:01:05,941 --> 00:01:07,503
and then I could find the middle value.

27
00:01:07,503 --> 00:01:08,565
I could literally make a list.

28
00:01:08,565 --> 00:01:12,234
I could write down 81, and then write down 92,

29
00:01:12,234 --> 00:01:14,927
then write down 93, then write down 95,

30
00:01:14,927 --> 00:01:16,947
then I could write down 96 twice,

31
00:01:16,947 --> 00:01:19,302
and then I could write down 98,

32
00:01:19,302 --> 00:01:20,430
then I could write down 100.

33
00:01:20,430 --> 00:01:22,682
I think you see where this is going.

34
00:01:22,682 --> 00:01:24,540
I could write out the entire list,

35
00:01:24,540 --> 00:01:26,537
and then I could find the middle values.

36
00:01:26,537 --> 00:01:31,134
So the dot plot, I could definitely use to find the median.

37
00:01:31,134 --> 00:01:32,574
Now, what about the histogram?

38
00:01:32,574 --> 00:01:35,313
This is the histogram right over here.

39
00:01:35,313 --> 00:01:38,146
And the key here is, for a median, to figure out a median,

40
00:01:38,146 --> 00:01:40,415
I just need to figure out a list of numbers.

41
00:01:40,415 --> 00:01:41,722
I need to figure out a list of numbers.

42
00:01:41,722 --> 00:01:45,089
So here, I don't know, they say I have one film

43
00:01:45,089 --> 00:01:47,410
that's between 80 and 85,

44
00:01:47,410 --> 00:01:49,339
but I don't know its exact running time.

45
00:01:49,339 --> 00:01:52,101
Its running time might have been 81 minutes,

46
00:01:52,101 --> 00:01:54,608
its running time might have been 84 minutes.

47
00:01:54,608 --> 00:01:58,299
So I don't know here, and so I can't really make a list

48
00:01:58,299 --> 00:02:00,111
of the running times of the films

49
00:02:00,111 --> 00:02:01,504
and find the middle values,

50
00:02:01,504 --> 00:02:02,683
so I don't think I'm gonna be able

51
00:02:02,683 --> 00:02:04,609
to do it using the histogram.

52
00:02:04,609 --> 00:02:08,586
Now, with the box plot right over here,

53
00:02:08,586 --> 00:02:10,164
so I'm not gonna click histogram.

54
00:02:10,164 --> 00:02:11,721
With the box plot over here,

55
00:02:11,721 --> 00:02:14,358
I might not be able to make a list of all the values,

56
00:02:14,358 --> 00:02:17,781
but the box plot explicitly tells us what the median is.

57
00:02:17,781 --> 00:02:20,428
This middle line in the middle of the box,

58
00:02:20,428 --> 00:02:23,168
that tells us the median is, what is this,

59
00:02:23,168 --> 00:02:26,744
this median is, if this is 100, this is 99.

60
00:02:26,744 --> 00:02:29,832
So this is 95, 96, 97, 98, 99.

61
00:02:29,832 --> 00:02:32,061
It explicitly tells us the median is 99.

62
00:02:32,061 --> 00:02:34,894
This is actually the easiest for calculating the median.

63
00:02:34,894 --> 00:02:36,287
So I'll go with the box plot.

64
00:02:36,287 --> 00:02:38,353
So the histogram is of no use to me

65
00:02:38,353 --> 00:02:40,001
if I wanna calculate the median.

66
00:02:40,001 --> 00:02:41,929
Let's do a couple more of these.

67
00:02:41,929 --> 00:02:44,808
Nam owns a used car lot.

68
00:02:44,808 --> 00:02:46,596
He checked the odometers of the cars

69
00:02:46,596 --> 00:02:49,127
and recorded how far they had driven.

70
00:02:49,127 --> 00:02:51,936
He then created both a histogram and a box plot

71
00:02:51,936 --> 00:02:55,268
to display the same data, both diagrams are shown below.

72
00:02:55,998 --> 00:02:58,194
Which display can be used

73
00:02:58,194 --> 00:03:00,889
to find how many vehicles had driven

74
00:03:00,889 --> 00:03:04,139
more than 200,000 kilometers?

75
00:03:04,139 --> 00:03:06,158
So how many vehicles had driven

76
00:03:06,158 --> 00:03:09,525
more than 200,000 kilometers?

77
00:03:09,525 --> 00:03:13,264
So it looks like here in this histogram,

78
00:03:13,264 --> 00:03:16,700
I have three vehicles that were between 200 and 250,

79
00:03:16,700 --> 00:03:20,415
and then I have two vehicles that are between 250 and 300.

80
00:03:20,415 --> 00:03:22,158
So it looks pretty clear that I have five vehicles,

81
00:03:22,158 --> 00:03:26,105
three that had a mileage between 200,000 and 250,000,

82
00:03:26,105 --> 00:03:27,938
and then I had two that had mileage

83
00:03:27,938 --> 00:03:30,115
between 250,000 and 300,000.

84
00:03:30,115 --> 00:03:31,507
So I am able to answer the question.

85
00:03:31,507 --> 00:03:36,368
Five vehicles had a mileage more than 200,000,

86
00:03:36,368 --> 00:03:40,036
and so I would say that the histogram is pretty useful.

87
00:03:40,036 --> 00:03:42,915
But let's verify that the box plot isn't so useful.

88
00:03:42,915 --> 00:03:45,029
So I wanna know how many vehicles had a mileage

89
00:03:45,029 --> 00:03:47,141
more than 200,000.

90
00:03:47,141 --> 00:03:50,880
Well, I know that if I have a mileage more than 200,000,

91
00:03:50,880 --> 00:03:54,943
I'm going to be in the fourth quartile,

92
00:03:54,943 --> 00:03:58,194
but I don't know how many values I have sitting there

93
00:03:58,194 --> 00:04:01,583
in the fourth quartile just looking at this data over here,

94
00:04:01,583 --> 00:04:04,671
so that's not gonna be useful for answering that question.

95
00:04:04,671 --> 00:04:06,227
Let's look at the second question.

96
00:04:06,227 --> 00:04:07,690
Which display can be used

97
00:04:07,690 --> 00:04:09,518
to find that the median distance,

98
00:04:09,518 --> 00:04:11,478
which display can be used to find

99
00:04:11,478 --> 00:04:12,478
that the median distance

100
00:04:12,478 --> 00:04:15,420
was approximately 140,000 kilometers?

101
00:04:15,420 --> 00:04:16,744
Well, to calculate the median,

102
00:04:16,744 --> 00:04:18,716
you essentially wanna be able to list all of the numbers

103
00:04:18,716 --> 00:04:20,250
and then find the middle number.

104
00:04:20,250 --> 00:04:23,315
And over here, I can't list all of the numbers.

105
00:04:23,315 --> 00:04:25,405
I know that there's three values that are

106
00:04:25,405 --> 00:04:27,541
between zero and 50,000 kilometers,

107
00:04:27,541 --> 00:04:28,585
but I don't know what they are.

108
00:04:28,585 --> 00:04:30,792
Could be 10,000, 10,000, 10,000.

109
00:04:30,792 --> 00:04:34,483
It could be 10,000, 15,000, and 40,000.

110
00:04:34,483 --> 00:04:37,020
I don't know what they are, and so if I can't list all

111
00:04:37,020 --> 00:04:39,142
of these things and put them in order,

112
00:04:39,146 --> 00:04:42,228
I really am going to have trouble finding the middle value.

113
00:04:42,228 --> 00:04:45,392
The middle value, it's going to be

114
00:04:45,392 --> 00:04:47,783
in this range right around here,

115
00:04:47,783 --> 00:04:49,687
but I don't know exactly what it's going to be.

116
00:04:49,687 --> 00:04:50,964
The histogram is not useful,

117
00:04:50,964 --> 00:04:53,918
because it's throwing all the values into these buckets.

118
00:04:53,918 --> 00:04:56,258
While on the box plot, it explicitly,

119
00:04:56,258 --> 00:04:58,116
it directly tells me the median value.

120
00:04:58,116 --> 00:05:00,508
This line right over here, the middle of the box,

121
00:05:00,508 --> 00:05:02,737
this tells us the median value,

122
00:05:02,737 --> 00:05:05,034
and we see that the median value here,

123
00:05:05,034 --> 00:05:08,425
this is 140,000 kilometers.

124
00:05:08,425 --> 00:05:11,341
Right, this is 100, 110, 120, 130,

125
00:05:11,341 --> 00:05:16,011
140,000 kilometers is the median mileage for the cars.

126
00:05:16,011 --> 00:05:18,808
And so the box plot clearly...

127
00:05:20,835 --> 00:05:23,242
clearly gives us that data.
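
The reasoning in the video can be sketched in a few lines of Python. A dot plot keeps every individual value, so it can be expanded into an ordered list and the middle found. A histogram keeps only a count per bucket, which is enough to count vehicles above 200,000 kilometers but only narrows the median down to a bucket. This is a rough sketch, not the method used to build the displays: the function names are hypothetical, the dot plot values after 100 are made up, and so are the counts for the three middle histogram buckets. Only the counts of three (0 to 50), three (200 to 250), and two (250 to 300) come from the video.

```python
from statistics import median

# Dot plot as running time (minutes) -> number of dots.
# 81, 92, 93, 95, 96 (twice), 98 and 100 are read out in the video;
# every value after 100 is a made-up placeholder to reach 14 films.
dot_plot = {81: 1, 92: 1, 93: 1, 95: 1, 96: 2, 98: 1, 100: 1,
            102: 1, 103: 1, 106: 1, 111: 1, 115: 1, 116: 1}

# Histogram as (low, high) bucket in thousands of km -> number of cars.
# The three middle counts are assumptions; the video does not state them.
histogram = {(0, 50): 3, (50, 100): 4, (100, 150): 5,
             (150, 200): 3, (200, 250): 3, (250, 300): 2}


def expand_dot_plot(dots):
    """Turn value -> count into a sorted list of individual values."""
    values = []
    for value, count in sorted(dots.items()):
        values.extend([value] * count)
    return values


def count_at_or_above(bins, threshold):
    """Add up the counts of every bucket whose lower edge is >= threshold.

    A car sitting exactly on the edge cannot be told apart from one just
    above it, so this matches "more than" only up to that ambiguity.
    """
    return sum(count for (low, _high), count in bins.items() if low >= threshold)


def median_buckets(bins):
    """Return the bucket(s) that hold the middle position(s) of the data."""
    total = sum(bins.values())
    # 0-based positions of the middle value(s) in the sorted data.
    middle = {(total - 1) // 2, total // 2}
    found = []
    seen = 0
    for edges, count in sorted(bins.items()):
        if any(seen <= pos < seen + count for pos in middle):
            found.append(edges)
        seen += count
    return found


running_times = expand_dot_plot(dot_plot)
print(len(running_times))                 # 14 films
print(median(running_times))              # average of the 7th and 8th values
print(count_at_or_above(histogram, 200))  # cars in the top two buckets
print(median_buckets(histogram))          # a bucket, not an exact median
```

With 14 films the median is the average of the 7th and 8th values, which are 98 and 100 in the list read out in the video, so the placeholder values do not affect the result of 99 as long as they are all at least 100.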