WEBVTT 00:00:00.520 --> 00:00:02.820 Let's say I've got a set of numbers. 00:00:02.820 --> 00:00:08.710 2, say I've got three 3's, I've got a couple of 4's, and 00:00:08.710 --> 00:00:10.620 I've got a 10 there. 00:00:10.620 --> 00:00:14.210 And what we want to do is find the middle of these numbers. 00:00:14.210 --> 00:00:17.820 We want to represent these numbers with the center of the 00:00:17.820 --> 00:00:19.900 numbers, or the middle of the numbers, just so we have a 00:00:19.900 --> 00:00:24.350 sense of where these numbers roughly are. 00:00:24.350 --> 00:00:27.360 And this central tendency that we're going to try to get out 00:00:27.360 --> 00:00:31.020 of these numbers, we're going to call the average. 00:00:31.020 --> 00:00:34.990 The average of this set of numbers. 00:00:34.990 --> 00:00:38.840 And you've, I'm sure, heard the word average before, but 00:00:38.840 --> 00:00:41.110 we're going to get a little bit more detailed on the 00:00:41.110 --> 00:00:44.270 different types of averages in this video. 00:00:44.270 --> 00:00:46.890 The one you're probably most familiar with, although you 00:00:46.890 --> 00:00:49.950 might have not seen it referred to in this way, is 00:00:49.950 --> 00:00:58.290 the arithmetic mean, which literally says, look, I, the 00:00:58.290 --> 00:01:01.270 arithmetic mean of this set of numbers, is literally the sum 00:01:01.270 --> 00:01:03.420 of all of these numbers divided by the number of 00:01:03.420 --> 00:01:04.209 numbers there are. 00:01:04.209 --> 00:01:07.440 So the arithmetic mean for this set right here is going 00:01:07.440 --> 00:01:17.070 to be 2 plus 3 plus 3 plus 3 plus 4 plus 4 plus 10, all of 00:01:17.070 --> 00:01:19.230 that over, how many numbers do I have? 00:01:19.230 --> 00:01:22.870 1, 2, 3, 4, 5, 6, 7. 00:01:22.870 --> 00:01:24.720 All of that over 7. 00:01:24.720 --> 00:01:25.660 And what is this equal to? 
00:01:25.660 --> 00:01:34.560 This is 2 plus 9, which is 11, plus 8, which is 19, plus 10, 00:01:34.560 --> 00:01:35.520 which is 29. 00:01:35.520 --> 00:01:41.290 So this is going to be equal to 29/7, or you could say it's 00:01:41.290 --> 00:01:43.540 equal to 4 and 1/7. 00:01:43.540 --> 00:01:45.240 If I got my calculator out, we could figure out 00:01:45.240 --> 00:01:46.480 the decimal of this. 00:01:46.480 --> 00:01:50.990 But this is a representation of the central tendency, or 00:01:50.990 --> 00:01:52.450 the middle of these numbers. 00:01:52.450 --> 00:01:53.670 And it kind of makes sense. 00:01:53.670 --> 00:01:57.670 4 and 1/7, it's a little bit higher than 4. 00:01:57.670 --> 00:02:01.070 We're kind of close to the middle of our number range 00:02:01.070 --> 00:02:01.810 right there. 00:02:01.810 --> 00:02:03.590 And you might say, well, it's a little skewed to the right 00:02:03.590 --> 00:02:04.700 and what caused that? 00:02:04.700 --> 00:02:06.950 And well, gee, 10 is a little bit larger than all of the 00:02:06.950 --> 00:02:07.840 other numbers. 00:02:07.840 --> 00:02:09.320 It's kind of an outlier. 00:02:09.320 --> 00:02:13.370 Maybe that skewed this average up, the arithmetic mean. 00:02:13.370 --> 00:02:16.680 So there are other types of averages, although this is the 00:02:16.680 --> 00:02:19.200 one that, if people just say, hey, let's take the average of 00:02:19.200 --> 00:02:21.660 these numbers, and they don't really tell you more, they're 00:02:21.660 --> 00:02:23.940 probably talking about the arithmetic mean. 00:02:23.940 --> 00:02:30.270 The other forms of average, though, are the median, and 00:02:30.270 --> 00:02:32.290 this literally is the middle number. 00:02:35.970 --> 00:02:38.360 If there are two middle numbers, you actually take the 00:02:38.360 --> 00:02:40.730 arithmetic mean of those two middle numbers. 
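[Editor's sketch, not part of the spoken audio] The arithmetic-mean calculation worked through above can be checked with a few lines of Python. This is only an illustration: the variable names are made up for this example, and `Fraction` is used purely so that the result stays exact as 29/7 rather than a rounded decimal.

```python
from fractions import Fraction

# The data set from the video: one 2, three 3's, two 4's, and one 10.
numbers = [2, 3, 3, 3, 4, 4, 10]

total = sum(numbers)           # 2 + 9 + 8 + 10
count = len(numbers)           # how many numbers there are
mean = Fraction(total, count)  # exact arithmetic mean as a fraction

# Split the fraction into a whole part and a leftover numerator.
whole, remainder = divmod(mean.numerator, mean.denominator)

print(f"sum = {total}, count = {count}")
print(f"mean = {mean} = {whole} and {remainder}/{mean.denominator}")
print(f"decimal: {float(mean):.3f}")
```

The decimal line stands in for the calculator step that the narration mentions but skips.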
00:02:40.730 --> 00:02:43.310 You actually find the number halfway in between those two 00:02:43.310 --> 00:02:44.330 middle numbers. 00:02:44.330 --> 00:02:47.260 So the median of this set right here-- let me just 00:02:47.260 --> 00:02:48.350 rewrite them. 00:02:48.350 --> 00:02:57.060 So I have a 2, a 3, 3, 3, 4, 4, 10. 00:02:57.060 --> 00:02:59.690 So, let's see, we have seven numbers right here. 00:02:59.690 --> 00:03:02.800 The middle number, if I go 1, 2, 3 to the 00:03:02.800 --> 00:03:04.140 right, we're there. 00:03:04.140 --> 00:03:06.470 If we go 1, 2, 3 to the left, we're there. 00:03:06.470 --> 00:03:09.850 The middle number is that 3 right there. 00:03:09.850 --> 00:03:13.360 I just listed them in order, and I said, well, look, 3, you 00:03:13.360 --> 00:03:16.640 could think of it as the fourth number from the right, 00:03:16.640 --> 00:03:19.550 and it's also the fourth number from the left. 00:03:19.550 --> 00:03:21.360 3 is the middle number. 00:03:21.360 --> 00:03:23.940 And in this case, it is the median. 00:03:23.940 --> 00:03:29.280 So in this case, 3, if you use the median, is our average. 00:03:29.280 --> 00:03:30.360 And that also makes sense. 00:03:30.360 --> 00:03:32.030 I mean, it's literally the middle number, and if you look 00:03:32.030 --> 00:03:35.670 at this set of numbers, it kind of does represent the 00:03:35.670 --> 00:03:37.790 central tendency of this set. 00:03:37.790 --> 00:03:40.880 Now just to be clear, it was very clear what the middle 00:03:40.880 --> 00:03:45.350 number was, because I had an odd number of numbers. 00:03:45.350 --> 00:03:48.560 I had three on each side of the three, so it was very easy 00:03:48.560 --> 00:03:50.790 to figure out the median, the middle number. 00:03:50.790 --> 00:03:53.800 But if I had a situation-- let's say I have the situation 00:03:53.800 --> 00:03:57.850 where I have 2, 3, 4, and 5. 00:03:57.850 --> 00:04:01.510 Let's say that's my set of numbers.
00:04:01.510 --> 00:04:04.350 Well, here, there is no one middle number. 00:04:04.350 --> 00:04:07.460 The 3 is closer to the left than it is to the right. 00:04:07.460 --> 00:04:10.040 The 4 is closer to the right than it is to the left. 00:04:10.040 --> 00:04:12.220 There's actually two middle numbers here. 00:04:12.220 --> 00:04:17.380 The two middle numbers here are the 3 and the 4. 00:04:17.380 --> 00:04:20.640 And here, when you have two middle numbers, which occurs 00:04:20.640 --> 00:04:24.430 when you have an even number of numbers in your data set, there the 00:04:24.430 --> 00:04:27.630 median is halfway in between these two numbers. 00:04:27.630 --> 00:04:31.070 So in this situation, the median is going to be 3 plus 4 00:04:31.070 --> 00:04:35.670 over 2, which is equal to 3.5. 00:04:35.670 --> 00:04:37.610 And if you look at this data set, that's not what our 00:04:37.610 --> 00:04:44.430 original problem was, but if you look at this data set 00:04:44.430 --> 00:04:47.770 right there, you're actually going to find that the 00:04:47.770 --> 00:04:52.070 arithmetic mean and the median here is the exact same thing. 00:04:52.070 --> 00:04:53.230 Let's calculate it. 00:04:53.230 --> 00:04:55.470 What's the arithmetic mean over here? 00:04:55.470 --> 00:05:00.240 It's going to be 2 plus 3 plus 4 plus 5, which is what? 00:05:00.240 --> 00:05:09.490 5 plus 9, which is equal to 14, over 4. 00:05:09.490 --> 00:05:10.910 And what's this equal to? 00:05:10.910 --> 00:05:16.740 14/4 is 3 and 2/4, or 3 and 1/2, the exact same thing. 00:05:16.740 --> 00:05:19.310 So for this data set, they were the same thing. 00:05:19.310 --> 00:05:24.020 For this data set, our median is a little bit lower. 00:05:24.020 --> 00:05:28.090 It's 3, while our arithmetic mean is 4 and 1/7. 00:05:28.090 --> 00:05:30.080 And I really want you to think about why that is. 00:05:30.080 --> 00:05:32.840 And it has a lot to do with this 10 that sits out there.
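[Editor's sketch, not part of the spoken audio] The median rule described above (take the middle number, or go halfway between the two middle numbers when the count is even) can be written out roughly as follows. The function and variable names are hypothetical, chosen only for this illustration.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle number.
        return ordered[mid]
    # Even count: halfway between the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


def arithmetic_mean(values):
    """Sum of the numbers divided by how many numbers there are."""
    return sum(values) / len(values)


first_set = [2, 3, 3, 3, 4, 4, 10]  # seven numbers, one middle number
second_set = [2, 3, 4, 5]           # four numbers, two middle numbers

print(median(first_set), arithmetic_mean(first_set))
print(median(second_set), arithmetic_mean(second_set))
```

For the second set the two printed values agree, matching the 3 and 1/2 found by hand; for the first set the mean comes out above the median.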
00:05:32.840 --> 00:05:37.190 All of these other numbers are pretty close to whichever 00:05:37.190 --> 00:05:39.590 average you want to pick, whether it's the arithmetic 00:05:39.590 --> 00:05:42.560 mean or it's the median. 00:05:42.560 --> 00:05:47.860 But this 10 is kind of an outlier, or it 00:05:47.860 --> 00:05:50.650 skews the data set. 00:05:50.650 --> 00:05:53.790 Maybe it's so much larger than the other numbers, that it 00:05:53.790 --> 00:05:57.680 makes the arithmetic mean seem larger than maybe is 00:05:57.680 --> 00:05:59.670 representative of this data set. 00:05:59.670 --> 00:06:01.500 And that's something important to think about. 00:06:01.500 --> 00:06:07.570 When you're finding the average for something, most 00:06:07.570 --> 00:06:09.740 people will immediately go to the arithmetic mean. 00:06:09.740 --> 00:06:13.520 But in a lot of cases, median will make a lot more sense, if 00:06:13.520 --> 00:06:16.970 you have these really large or really small numbers that 00:06:16.970 --> 00:06:18.460 could skew the data set. 00:06:18.460 --> 00:06:21.530 I mean, you can imagine, if this wasn't a 10-- or let's 00:06:21.530 --> 00:06:22.790 imagine adding another number here. 00:06:22.790 --> 00:06:27.810 If I added the number 1 million, if I added 1 million 00:06:27.810 --> 00:06:30.390 to this data set, if that was the eighth number, the 00:06:30.390 --> 00:06:32.260 arithmetic mean is going to be this huge number. 00:06:32.260 --> 00:06:36.400 It's going to be much larger than what is representative of 00:06:36.400 --> 00:06:38.470 most of the numbers in this data set. 00:06:38.470 --> 00:06:40.240 But the median is still going to work. 00:06:40.240 --> 00:06:43.730 The median is still going to be about 3 and a half, right? 00:06:43.730 --> 00:06:47.750 If you had 1 million here, it would be 1, 2, 3, 4. 00:06:47.750 --> 00:06:49.200 The middle two numbers would be that. 00:06:49.200 --> 00:06:50.580 It would be 3 and 1/2. 
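[Editor's sketch, not part of the spoken audio] To see how much a single extreme value moves each average, the comparison described above can be tried with Python's standard `statistics` module. The list names are assumptions for this example; the data is the seven-number set plus the 1 million as an eighth number.

```python
import statistics

original = [2, 3, 3, 3, 4, 4, 10]
with_outlier = original + [1_000_000]  # add 1 million as the eighth number

for label, data in (("original", original), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:,.3f}, median = {median}")
```

The mean jumps into the hundreds of thousands, while the median only moves from 3 to 3.5.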
00:06:50.580 --> 00:06:53.900 So the median is less sensitive to one or two 00:06:53.900 --> 00:06:58.430 numbers at the extremes that otherwise would skew the mean. 00:06:58.430 --> 00:07:01.130 Now, the last form of average I want to talk 00:07:01.130 --> 00:07:02.840 about is the mode. 00:07:05.400 --> 00:07:08.000 It has nothing to do with ice cream. 00:07:08.000 --> 00:07:10.910 The mode is literally the most frequent number. 00:07:16.800 --> 00:07:19.880 And in this data set, it's pretty clear what the most 00:07:19.880 --> 00:07:21.130 frequent number is. 00:07:21.130 --> 00:07:25.820 I only have one 2, I have three 3's, I have two 4's, I 00:07:25.820 --> 00:07:28.445 have one 10, and even if I want to include the million, I only 00:07:28.445 --> 00:07:29.710 have one million there. 00:07:29.710 --> 00:07:31.990 So here, the number that occurs most 00:07:31.990 --> 00:07:35.360 frequently is the 3. 00:07:35.360 --> 00:07:38.330 So, once again, the mode seems like a pretty good measure of 00:07:38.330 --> 00:07:41.260 central tendency or a pretty good average 00:07:41.260 --> 00:07:43.300 for this data set. 00:07:43.300 --> 00:07:45.950 Now the mode, it's a little tricky to deal with, and you 00:07:45.950 --> 00:07:49.070 won't see it used that often, because it becomes a little 00:07:49.070 --> 00:07:52.920 ambiguous when-- you know, look at this data set: 00:07:52.920 --> 00:07:55.270 2, 3, 4, and 5. 00:07:55.270 --> 00:07:56.370 What is the mode there? 00:07:56.370 --> 00:07:59.200 All of these numbers are equally frequent. 00:07:59.200 --> 00:08:01.360 So if you have a situation like this, then you might 00:08:01.360 --> 00:08:03.930 just-- the mode really loses its meaning. 00:08:03.930 --> 00:08:06.350 It might force you anyway to take the median or the 00:08:06.350 --> 00:08:07.530 mean in some form.
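[Editor's sketch, not part of the spoken audio] The mode, and the ambiguity just described when every number is equally frequent, can be illustrated by counting occurrences. The helper name `modes` is hypothetical; returning every tied value is one possible convention assumed here, not something the video prescribes.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


with_million = [2, 3, 3, 3, 4, 4, 10, 1_000_000]
all_distinct = [2, 3, 4, 5]

print(modes(with_million))  # a single clear mode
print(modes(all_distinct))  # every value ties, so the mode says little
```

When the result has as many entries as the data has distinct values, the mode is not singling anything out, which is the situation where the median or the mean is the more useful summary.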
00:08:07.530 --> 00:08:10.700 But if you really do have numbers that one shows up a 00:08:10.700 --> 00:08:14.640 lot more than the other, then the mode starts to make sense. 00:08:14.640 --> 00:08:18.220 So, hopefully, this has given you a pretty good overview of 00:08:18.220 --> 00:08:25.400 how to represent the central tendency of a data set. 00:08:25.400 --> 00:08:26.430 Very fancy word. 00:08:26.430 --> 00:08:28.180 But it's just saying, look, we're trying to represent with 00:08:28.180 --> 00:08:30.200 one number all of this data. 00:08:30.200 --> 00:08:31.810 And you might say, hey, why do we even worry about that? 00:08:31.810 --> 00:08:34.059 It only has seven numbers here or eight numbers here. 00:08:34.059 --> 00:08:36.770 But you can imagine if you had 7 million numbers or 7 billion 00:08:36.770 --> 00:08:39.440 numbers, and you don't want to show someone all of that data. 00:08:39.440 --> 00:08:42.210 You just want to give someone a sense of what those numbers 00:08:42.210 --> 00:08:45.345 are on average. 00:08:45.345 --> 00:08:49.420 And as we said, the arithmetic mean is what I see being used 00:08:49.420 --> 00:08:53.170 the most. But in situations where you might have numbers 00:08:53.170 --> 00:08:55.940 that would skew the arithmetic mean, because they're so large 00:08:55.940 --> 00:08:58.670 or they're so small, the median might 00:08:58.670 --> 00:09:00.480 make a lot of sense.