1 00:00:00,660 --> 00:00:06,650 We will now begin our journey into the world of statistics, 2 00:00:06,650 --> 00:00:09,750 which is really a way to understand or get 3 00:00:09,750 --> 00:00:11,520 our head around data. 4 00:00:11,520 --> 00:00:14,670 So statistics is all about data. 5 00:00:14,670 --> 00:00:19,000 And as we begin our journey into the world of statistics, 6 00:00:19,000 --> 00:00:20,610 we will be doing a lot of what we 7 00:00:20,610 --> 00:00:23,210 can call descriptive statistics. 8 00:00:23,210 --> 00:00:25,470 So if we have a bunch of data, and if we 9 00:00:25,470 --> 00:00:27,990 want to tell something about all of that data 10 00:00:27,990 --> 00:00:29,890 without giving them all of the data, 11 00:00:29,890 --> 00:00:33,870 can we somehow describe it with a smaller set of numbers? 12 00:00:33,870 --> 00:00:35,720 So that's what we're going to focus on. 13 00:00:35,720 --> 00:00:37,360 And then once we build our toolkit 14 00:00:37,360 --> 00:00:39,260 on the descriptive statistics, then we 15 00:00:39,260 --> 00:00:41,710 can start to make inferences about that data, 16 00:00:41,710 --> 00:00:44,200 start to make conclusions, start to make judgments. 17 00:00:44,200 --> 00:00:49,430 And we'll start to do a lot of inferential statistics, 18 00:00:49,430 --> 00:00:51,160 make inferences. 19 00:00:51,160 --> 00:00:53,110 So with that out of the way, let's think 20 00:00:53,110 --> 00:00:56,390 about how we can describe data. 21 00:00:56,390 --> 00:01:00,710 So let's say we have a set of numbers. 22 00:01:00,710 --> 00:01:02,360 We can consider this to be data. 23 00:01:02,360 --> 00:01:04,580 Maybe we're measuring the heights of our plants 24 00:01:04,580 --> 00:01:05,740 in our garden. 25 00:01:05,740 --> 00:01:07,400 And let's say we have six plants. 26 00:01:07,400 --> 00:01:13,870 And the heights are 4 inches, 3 inches, 1 inch, 6 inches, 27 00:01:13,870 --> 00:01:17,990 and another one's 1 inch, and another one is 7 inches. 
28 00:01:17,990 --> 00:01:20,934 And let's say someone just said-- in another room, not 29 00:01:20,934 --> 00:01:22,350 looking at your plants, just said, 30 00:01:22,350 --> 00:01:24,657 well, you know, how tall are your plants? 31 00:01:24,657 --> 00:01:26,240 And they only want to hear one number. 32 00:01:26,240 --> 00:01:30,560 They want to somehow have one number that 33 00:01:30,560 --> 00:01:33,410 represents all of these different heights of plants. 34 00:01:33,410 --> 00:01:36,580 How would you do that? 35 00:01:36,580 --> 00:01:38,810 Well, you'd say, well, how can I find something 36 00:01:38,810 --> 00:01:40,990 that-- maybe I want a typical number. 37 00:01:40,990 --> 00:01:44,060 Maybe I want some number that somehow represents the middle. 38 00:01:44,060 --> 00:01:46,250 Maybe I want the most frequent number. 39 00:01:46,250 --> 00:01:48,830 Maybe I want the number that somehow represents 40 00:01:48,830 --> 00:01:51,270 the center of all of these numbers. 41 00:01:51,270 --> 00:01:53,220 And if you said any of those things, 42 00:01:53,220 --> 00:01:55,189 you would actually have done the same things 43 00:01:55,189 --> 00:01:57,730 that the people who first came up with descriptive statistics 44 00:01:57,730 --> 00:01:58,230 said. 45 00:01:58,230 --> 00:02:00,150 They said, well, how can we do it? 46 00:02:00,150 --> 00:02:04,960 And we'll start by thinking of the idea of average. 47 00:02:04,960 --> 00:02:07,610 And in everyday terminology, average 48 00:02:07,610 --> 00:02:09,720 has a very particular meaning, as we'll see. 49 00:02:09,720 --> 00:02:11,570 When many people talk about average, 50 00:02:11,570 --> 00:02:13,070 they're talking about the arithmetic 51 00:02:13,070 --> 00:02:14,960 mean, which we'll see shortly. 52 00:02:14,960 --> 00:02:18,100 But in statistics, average means something more general.
53 00:02:18,100 --> 00:02:22,980 It really means give me a typical, 54 00:02:22,980 --> 00:02:29,810 or give me a middle number, or-- and these are or's. 55 00:02:29,810 --> 00:02:31,930 And really it's an attempt to find 56 00:02:31,930 --> 00:02:33,490 a measure of central tendency. 57 00:02:38,550 --> 00:02:40,560 So once again, you have a bunch of numbers. 58 00:02:40,560 --> 00:02:42,970 You're somehow trying to represent these 59 00:02:42,970 --> 00:02:45,840 with one number we'll call the average, that's somehow 60 00:02:45,840 --> 00:02:49,130 typical, or middle, or the center somehow 61 00:02:49,130 --> 00:02:50,450 of these numbers. 62 00:02:50,450 --> 00:02:54,110 And as we'll see, there are many types of averages. 63 00:02:54,110 --> 00:02:56,690 The first is the one that you're probably most familiar with. 64 00:02:56,690 --> 00:02:58,398 It's the one-- and people talk about hey, 65 00:02:58,398 --> 00:03:00,840 the average on this exam or the average height. 66 00:03:00,840 --> 00:03:02,970 And that's the arithmetic mean. 67 00:03:02,970 --> 00:03:05,470 Just let me write it in. 68 00:03:05,470 --> 00:03:13,100 I'll write in yellow, arithmetic mean. 69 00:03:13,100 --> 00:03:16,010 When arithmetic is a noun, we stress the second syllable: a-RITH-me-tic. 70 00:03:16,010 --> 00:03:19,960 When it's an adjective like this, we stress the third syllable: ar-ith-MET-ic, 71 00:03:19,960 --> 00:03:21,620 arithmetic mean. 72 00:03:21,620 --> 00:03:25,300 And this is really just the sum of all the numbers divided 73 00:03:25,300 --> 00:03:28,180 by-- this is a human-constructed definition that we've 74 00:03:28,180 --> 00:03:31,630 found useful-- the sum of all these numbers divided 75 00:03:31,630 --> 00:03:34,460 by the number of numbers we have. 76 00:03:34,460 --> 00:03:36,830 So given that, what is the arithmetic mean 77 00:03:36,830 --> 00:03:39,114 of this data set? 78 00:03:39,114 --> 00:03:40,280 Well, let's just compute it.
79 00:03:40,280 --> 00:03:46,160 It's going to be 4 plus 3 plus 1 plus 6 plus 1 80 00:03:46,160 --> 00:03:51,210 plus 7 over the number of data points we have. 81 00:03:51,210 --> 00:03:53,210 So we have six data points. 82 00:03:53,210 --> 00:03:54,860 So we're going to divide by 6. 83 00:03:54,860 --> 00:04:01,840 And we get 4 plus 3 is 7, plus 1 is 8, plus 6 is 14, 84 00:04:01,840 --> 00:04:04,934 plus 1 is 15, plus 7. 85 00:04:04,934 --> 00:04:07,927 15 plus 7 is 22. 86 00:04:07,927 --> 00:04:09,135 Let me do that one more time. 87 00:04:09,135 --> 00:04:15,180 You have 7, 8, 14, 15, 22, all of that over 6. 88 00:04:15,180 --> 00:04:17,070 And we could write this as a mixed number. 89 00:04:17,070 --> 00:04:21,120 6 goes into 22 three times with a remainder of 4. 90 00:04:21,120 --> 00:04:25,200 So it's 3 and 4/6, which is the same thing as 3 and 2/3. 91 00:04:25,200 --> 00:04:28,670 We could write this as a decimal with 3.6 repeating. 92 00:04:28,670 --> 00:04:32,360 So this is also 3.6 repeating. 93 00:04:32,360 --> 00:04:34,380 We could write it any one of those ways. 94 00:04:34,380 --> 00:04:36,700 But this is kind of a representative number. 95 00:04:36,700 --> 00:04:39,820 This is trying to get at a central tendency. 96 00:04:39,820 --> 00:04:41,620 Once again, these are human-constructed. 97 00:04:41,620 --> 00:04:43,590 No one ever-- it's not like someone just 98 00:04:43,590 --> 00:04:46,140 found some religious document that said, 99 00:04:46,140 --> 00:04:47,990 this is the way that the arithmetic mean 100 00:04:47,990 --> 00:04:49,180 must be defined. 101 00:04:49,180 --> 00:04:52,700 It's not as pure of a computation 102 00:04:52,700 --> 00:04:55,005 as, say, finding the circumference of the circle, 103 00:04:55,005 --> 00:04:56,880 which there really is-- that was kind of-- we 104 00:04:56,880 --> 00:04:57,840 studied the universe. 105 00:04:57,840 --> 00:05:00,600 And that just fell out of our study of the universe. 
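As a rough check of the arithmetic above, here is a minimal Python sketch of the same computation. The names `heights` and `arithmetic_mean` are made up for illustration and do not come from the video; `Fraction` is used only so that the result stays exact, matching the mixed number 3 and 2/3.

```python
from fractions import Fraction

# Plant heights in inches, taken from the example in the transcript.
heights = [4, 3, 1, 6, 1, 7]

def arithmetic_mean(values):
    """Sum of all the numbers divided by the number of numbers."""
    return Fraction(sum(values), len(values))

mean = arithmetic_mean(heights)
print(sum(heights), len(heights))  # the numerator 22 and the denominator 6
print(mean)                        # 22/6 reduced to lowest terms
print(float(mean))                 # the repeating decimal, about 3.667
```

Note that 22/6 reduces to 11/3, which is the same value as 3 and 2/3.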
106 00:05:00,600 --> 00:05:02,250 It's a human-constructed definition 107 00:05:02,250 --> 00:05:04,110 that we found useful. 108 00:05:04,110 --> 00:05:07,260 Now there are other ways to measure the average 109 00:05:07,260 --> 00:05:10,130 or find a typical or middle value. 110 00:05:10,130 --> 00:05:14,470 The other very typical way is the median. 111 00:05:14,470 --> 00:05:15,667 And I will write median. 112 00:05:15,667 --> 00:05:16,750 I'm running out of colors. 113 00:05:16,750 --> 00:05:18,660 I will write median in pink. 114 00:05:18,660 --> 00:05:21,280 So there is the median. 115 00:05:21,280 --> 00:05:25,160 And the median is literally looking for the middle number. 116 00:05:25,160 --> 00:05:27,350 So if you were to order all the numbers in your set 117 00:05:27,350 --> 00:05:31,460 and find the middle one, then that is your median. 118 00:05:31,460 --> 00:05:34,050 So given that, what's the median of this set of numbers 119 00:05:34,050 --> 00:05:35,806 going to be? 120 00:05:35,806 --> 00:05:36,930 Let's try to figure it out. 121 00:05:36,930 --> 00:05:38,170 Let's try to order it. 122 00:05:38,170 --> 00:05:39,810 So we have 1. 123 00:05:39,810 --> 00:05:41,010 Then we have another 1. 124 00:05:41,010 --> 00:05:42,860 Then we have a 3. 125 00:05:42,860 --> 00:05:46,630 Then we have a 4, a 6, and a 7. 126 00:05:46,630 --> 00:05:48,700 So all I did is I reordered this. 127 00:05:48,700 --> 00:05:50,890 And so what's the middle number? 128 00:05:50,890 --> 00:05:52,320 Well, you look here. 129 00:05:52,320 --> 00:05:54,960 Since we have an even number of numbers, we have six numbers, 130 00:05:54,960 --> 00:05:57,260 there's not one middle number. 131 00:05:57,260 --> 00:05:59,650 You actually have two middle numbers here. 132 00:05:59,650 --> 00:06:02,050 You have two middle numbers right over here. 133 00:06:02,050 --> 00:06:03,160 You have the 3 and the 4. 
134 00:06:03,160 --> 00:06:05,940 And in this case, when you have two middle numbers, 135 00:06:05,940 --> 00:06:09,640 you actually go halfway between these two numbers. 136 00:06:09,640 --> 00:06:12,080 You're essentially taking the arithmetic mean of these two 137 00:06:12,080 --> 00:06:14,272 numbers to find the median. 138 00:06:14,272 --> 00:06:16,230 So the median is going to be halfway in-between 139 00:06:16,230 --> 00:06:19,190 3 and 4, which is going to be 3.5. 140 00:06:19,190 --> 00:06:24,424 So the median in this case is 3.5. 141 00:06:24,424 --> 00:06:26,590 So if you have an even number of numbers, the median 142 00:06:26,590 --> 00:06:28,714 is essentially the arithmetic 143 00:06:28,714 --> 00:06:31,329 mean of the middle two, or halfway between the middle two. 144 00:06:31,329 --> 00:06:32,870 If you have an odd number of numbers, 145 00:06:32,870 --> 00:06:34,270 it's a little bit easier to compute. 146 00:06:34,270 --> 00:06:35,644 And just so that we see that, let 147 00:06:35,644 --> 00:06:36,920 me give you another data set. 148 00:06:36,920 --> 00:06:39,030 Let's say our data set-- and I'll 149 00:06:39,030 --> 00:06:41,740 order it for us-- let's say our data set 150 00:06:41,740 --> 00:06:55,689 was 0, 7, 50, I don't know, 10,000, and 1 million. 151 00:06:55,689 --> 00:06:56,980 Let's say that is our data set. 152 00:06:56,980 --> 00:06:58,450 Kind of a crazy data set. 153 00:06:58,450 --> 00:07:02,400 But in this situation, what is our median? 154 00:07:02,400 --> 00:07:04,045 Well, here we have five numbers. 155 00:07:04,045 --> 00:07:05,420 We have an odd number of numbers. 156 00:07:05,420 --> 00:07:07,200 So it's easier to pick out a middle. 157 00:07:07,200 --> 00:07:12,040 The middle is the number that is greater than two of the numbers 158 00:07:12,040 --> 00:07:13,540 and is less than two of the numbers. 159 00:07:13,540 --> 00:07:14,760 It's exactly in the middle.
160 00:07:14,760 --> 00:07:18,840 So in this case, our median is 50. 161 00:07:18,840 --> 00:07:20,742 Now, the third measure of central tendency, 162 00:07:20,742 --> 00:07:22,200 and this is the one that's probably 163 00:07:22,200 --> 00:07:26,426 used least often in life, is the mode. 164 00:07:26,426 --> 00:07:27,800 And people often forget about it. 165 00:07:27,800 --> 00:07:29,852 It sounds like something very complex. 166 00:07:29,852 --> 00:07:31,310 But what we'll see is it's actually 167 00:07:31,310 --> 00:07:33,080 a very straightforward idea. 168 00:07:33,080 --> 00:07:36,180 And in some ways, it is the most basic idea. 169 00:07:36,180 --> 00:07:40,510 So the mode is actually the most common number in a data set, 170 00:07:40,510 --> 00:07:41,885 if there is a most common number. 171 00:07:41,885 --> 00:07:43,801 If all of the numbers are represented equally, 172 00:07:43,801 --> 00:07:45,760 if there's no one single most common number, 173 00:07:45,760 --> 00:07:47,320 then you have no mode. 174 00:07:47,320 --> 00:07:50,240 But given that definition of the mode, 175 00:07:50,240 --> 00:07:54,190 what is the single most common number in our original data 176 00:07:54,190 --> 00:07:58,300 set, in this data set right over here? 177 00:07:58,300 --> 00:08:00,100 Well, we only have one 4. 178 00:08:00,100 --> 00:08:01,490 We only have one 3. 179 00:08:01,490 --> 00:08:03,370 But we have two 1's. 180 00:08:03,370 --> 00:08:04,880 We have one 6 and one 7. 181 00:08:04,880 --> 00:08:08,730 So the number that shows up the most number of times here 182 00:08:08,730 --> 00:08:11,060 is our 1. 183 00:08:11,060 --> 00:08:14,070 So the mode, the most typical number, the most common number 184 00:08:14,070 --> 00:08:17,610 here is a 1. 185 00:08:17,610 --> 00:08:19,590 So, you see, these are all different ways 186 00:08:19,590 --> 00:08:23,320 of trying to get at a typical, or middle, or central tendency. 
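The median and mode rules described above can be sketched in Python roughly as follows. The function names `median` and `mode` and the variable names are illustrative assumptions, not anything shown in the video. Following the video's convention, `mode` returns `None` when there is no single most common number. Both functions assume a non-empty list.

```python
from collections import Counter

def median(values):
    """Order the numbers and take the middle one; with an even count,
    go halfway between the two middle numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def mode(values):
    """Most common number, or None if no single number is most common."""
    counts = Counter(values).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie for most common, so no mode under this convention
    return counts[0][0]

heights = [4, 3, 1, 6, 1, 7]           # six plants: even count
spread = [0, 7, 50, 10_000, 1_000_000]  # five numbers: odd count

print(median(heights))  # halfway between 3 and 4
print(median(spread))   # the single middle number
print(mode(heights))    # the value that appears twice
print(mode(spread))     # None: every value appears once
```

Some textbooks instead report several modes when values tie; the `None` result here is a choice made to match the definition given in this video.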
187 00:08:23,320 --> 00:08:25,600 But they do it in very, very different ways. 188 00:08:25,600 --> 00:08:27,350 And as we study more and more statistics, 189 00:08:27,350 --> 00:08:29,760 we'll see that they're good for different things. 190 00:08:29,760 --> 00:08:31,730 The arithmetic mean is used very frequently. 191 00:08:31,730 --> 00:08:34,574 The median is really good if you have some kind of crazy number 192 00:08:34,574 --> 00:08:35,990 out here that could have otherwise 193 00:08:35,990 --> 00:08:38,100 skewed the arithmetic mean. 194 00:08:38,100 --> 00:08:41,449 The mode could also be useful in situations like that, 195 00:08:41,449 --> 00:08:43,240 especially if you do have one number that's 196 00:08:43,240 --> 00:08:45,960 showing up a lot more frequently. 197 00:08:45,960 --> 00:08:47,570 Anyway, I'll leave you there. 198 00:08:47,570 --> 00:08:51,710 And in the next few videos, we will explore statistics even 199 00:08:51,710 --> 00:08:53,260 deeper.
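The remark about a "crazy number" skewing the arithmetic mean can be checked on the five-number data set from earlier (0, 7, 50, 10,000, and 1 million). This is only a small sketch using Python's standard `statistics` module; the variable names are made up for illustration.

```python
import statistics

# The "kind of crazy" data set from the median example.
spread = [0, 7, 50, 10_000, 1_000_000]

mean_value = statistics.mean(spread)      # pulled far upward by 1,000,000
median_value = statistics.median(spread)  # still the middle number

print(mean_value)    # a bit over 200,000
print(median_value)  # 50

# Four of the five numbers are smaller than the mean,
# so here the mean is not a very "typical" value.
below_mean = [x for x in spread if x < mean_value]
print(len(below_mean))
```

Here the single value of 1 million drags the mean above 200,000, while the median stays at 50, which is the sense in which the median copes better with an extreme value.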