0:00:00.000,0:00:02.775 - [Instructor] We have a[br]list of 15 numbers here, 0:00:02.775,0:00:05.634 and what I want to do is[br]think about the outliers. 0:00:05.634,0:00:09.502 And to help us with that,[br]let's actually visualize this, 0:00:09.502,0:00:12.318 the distribution of actual numbers. 0:00:12.318,0:00:13.818 So let us do that. 0:00:14.887,0:00:16.242 So here, on a number line, 0:00:16.242,0:00:19.409 I have all the numbers from one to 19. 0:00:20.494,0:00:23.540 And let's see, we have two ones. 0:00:23.540,0:00:27.666 So I could say that's one[br]one and then two ones. 0:00:27.666,0:00:29.403 We have one six. 0:00:29.403,0:00:31.681 So let's put that six there. 0:00:31.681,0:00:33.098 We have got a 13, 0:00:34.531,0:00:36.009 or we have two 13s. 0:00:36.009,0:00:39.842 So we're gonna go up[br]here, one 13 and two 13s. 0:00:41.367,0:00:43.784 Let's see, we have three 14s. 0:00:45.105,0:00:45.938 So 14, 0:00:46.962,0:00:48.258 14, 0:00:48.258,0:00:49.091 and 14. 0:00:50.153,0:00:53.050 We have a couple of 15s, 15, 15. 0:00:53.050,0:00:53.883 So 15, 0:00:55.148,0:00:56.444 15. 0:00:56.444,0:00:58.471 We have one 16. 0:00:58.471,0:01:01.103 So that's our 16 there. 0:01:01.103,0:01:03.128 We have three 18s. 0:01:03.128,0:01:04.725 One, two, three. 0:01:04.725,0:01:05.558 So one, 0:01:06.680,0:01:08.085 two, 0:01:08.085,0:01:09.996 and then three. 0:01:09.996,0:01:12.575 And then we have a 19. 0:01:12.575,0:01:14.590 Then we have a 19. 0:01:14.590,0:01:15.667 So when you look, 0:01:15.667,0:01:17.819 when you look visually at[br]the distribution of numbers, 0:01:17.819,0:01:20.824 it looks like the meat of the[br]distribution, so to speak, 0:01:20.824,0:01:23.757 is in this area, right over here. 0:01:23.757,0:01:25.012 And so some people might say, 0:01:25.012,0:01:26.610 "Okay, we have three outliers. 0:01:26.610,0:01:28.411 "There are these two ones and the six." 
0:01:28.411,0:01:29.244 Some people might say, 0:01:29.244,0:01:31.333 "Well, the six is kinda close enough. 0:01:31.333,0:01:33.889 "Maybe only these two ones are outliers." 0:01:33.889,0:01:37.758 And those would actually be[br]both reasonable things to say. 0:01:37.758,0:01:40.849 Now to get on the same page, 0:01:40.849,0:01:44.829 statisticians will use a rule sometimes. 0:01:44.829,0:01:46.991 We say, well, anything that is more than 0:01:46.991,0:01:49.440 one and a half times[br]the interquartile range 0:01:49.440,0:01:52.483 from below Q-one or above Q-three, 0:01:52.483,0:01:54.602 well, those are going to be outliers. 0:01:54.602,0:01:56.325 Well, what am I talking about? 0:01:56.325,0:01:58.021 Well, let's actually, let's[br]figure out the median, 0:01:58.021,0:01:59.686 Q-one and Q-three here. 0:01:59.686,0:02:01.715 Then we can figure out[br]the interquartile range. 0:02:01.715,0:02:03.741 And then we can figure[br]out by that definition, 0:02:03.741,0:02:05.950 what is going to be an outlier? 0:02:05.950,0:02:07.563 And if that all made sense to you so far, 0:02:07.563,0:02:08.863 I encourage you to pause this video 0:02:08.863,0:02:10.372 and try to work through it on your own, 0:02:10.372,0:02:13.176 or I'll do it for you right now. 0:02:13.176,0:02:16.350 All right, so what's the median here? 0:02:16.350,0:02:17.551 Well, the median is the middle number. 0:02:17.551,0:02:20.626 We have 15 numbers, so the[br]middle number is going to be 0:02:20.626,0:02:22.698 whatever number has seven on either side. 0:02:22.698,0:02:24.307 So it's gonna be the eighth number. 0:02:24.307,0:02:27.980 One, two, three, four, five, six, seven. 0:02:27.980,0:02:29.013 Is that right? 0:02:29.013,0:02:32.900 Yep, six, seven, so that's the median. 0:02:32.900,0:02:35.619 And then you have one, two,[br]three, four, five, six, seven 0:02:35.619,0:02:36.836 numbers on the right side too. 0:02:36.836,0:02:41.064 So that is the median,[br]sometimes called Q-two. 
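[Editor's sketch] The counting argument above (15 numbers, so the median is the eighth one, with seven on either side) can be checked with a few lines of Python. The list is the data read off the dot plot in the video. The names `data`, `median_of_sorted`, `below` and `above` are illustrative choices, not anything used by the instructor.

```python
# The 15 values placed on the number line in the video.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]


def median_of_sorted(values):
    """Return the median of an already sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return values[mid]
    # Even count: average of the two middle values.
    return (values[mid - 1] + values[mid]) / 2


ordered = sorted(data)
median = median_of_sorted(ordered)

# With 15 values the median sits at position 8 (index 7),
# leaving seven values on each side.
below = ordered[:len(ordered) // 2]
above = ordered[len(ordered) // 2 + 1:]

print(median, len(below), len(above))  # prints: 14 7 7
```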
0:02:41.064,0:02:43.297 That is our median. 0:02:43.297,0:02:44.928 Now what is Q-one? 0:02:44.928,0:02:47.927 Well, Q-one is going to be the[br]middle of this first group. 0:02:47.927,0:02:50.347 This first group has seven numbers in it. 0:02:50.347,0:02:53.163 And so the middle is going[br]to be the fourth number. 0:02:53.163,0:02:54.850 It has three and three, 0:02:54.850,0:02:56.649 three to the left, three to the right. 0:02:56.649,0:02:58.066 So that is Q-one. 0:02:59.507,0:03:00.710 And then Q-three is going 0:03:00.710,0:03:02.674 to be the middle of this upper group. 0:03:02.674,0:03:04.178 Well, that also has seven numbers in it. 0:03:04.178,0:03:06.267 So the middle is going[br]to be right over there. 0:03:06.267,0:03:08.340 It has three on either side. 0:03:08.340,0:03:09.923 So that is Q-three. 0:03:11.670,0:03:14.239 Now what is the interquartile[br]range going to be? 0:03:14.239,0:03:15.822 Interquartile range 0:03:16.911,0:03:19.377 is going to be equal to 0:03:19.377,0:03:20.661 Q-three 0:03:20.661,0:03:21.661 minus Q-one, 0:03:22.631,0:03:24.610 the difference between 18 and 13. 0:03:24.610,0:03:26.110 Between 18 and 13, 0:03:26.991,0:03:30.055 well, that is going to be 18 minus 13, 0:03:30.055,0:03:32.085 which is equal to five. 0:03:32.085,0:03:33.996 Now to figure out outliers, 0:03:33.996,0:03:36.415 well, outliers are gonna[br]be anything that is below. 0:03:36.415,0:03:37.415 So outliers, 0:03:38.579,0:03:39.996 outliers, 0:03:39.996,0:03:42.453 are going to be less than 0:03:42.453,0:03:43.286 our Q-one 0:03:44.554,0:03:45.387 minus 1.5, 0:03:46.620,0:03:49.120 times our interquartile range. 0:03:50.593,0:03:53.048 And this, once again, this[br]isn't some rule of the universe. 
0:03:53.048,0:03:54.246 This is something that statisticians 0:03:54.246,0:03:55.346 have kind of said, well, 0:03:55.346,0:03:57.489 if we want to have a better[br]definition for outliers, 0:03:57.489,0:03:59.143 let's just agree that[br]it's something that's 0:03:59.143,0:04:00.408 more than one and a half times 0:04:00.408,0:04:02.662 the interquartile range below Q-one. 0:04:02.662,0:04:03.926 Or, 0:04:03.926,0:04:07.751 or an outlier could be[br]greater than Q-three 0:04:07.751,0:04:11.629 plus one and a half times[br]the interquartile range, 0:04:11.629,0:04:13.730 interquartile range. 0:04:13.730,0:04:15.102 And once again, this is somewhat, 0:04:15.102,0:04:16.929 you know, people just[br]decided it felt right. 0:04:16.929,0:04:18.488 One could argue it should be 1.6. 0:04:18.488,0:04:22.353 Or one could argue it should[br]be one, or two, or whatever. 0:04:22.353,0:04:25.040 But this is what people[br]have tended to agree on. 0:04:25.040,0:04:26.888 So let's think about[br]what these numbers are. 0:04:26.888,0:04:27.973 Q-one we already know. 0:04:27.973,0:04:30.058 So this is going to be 13 0:04:30.058,0:04:33.927 minus 1.5 times our interquartile range. 0:04:33.927,0:04:36.517 Our interquartile range here is five. 0:04:36.517,0:04:39.600 So it's 1.5 times five, which is 7.5. 0:04:43.020,0:04:44.270 So this is 7.5. 0:04:45.966,0:04:47.716 13 minus 7.5 is what? 0:04:48.891,0:04:50.592 13 minus seven is six, 0:04:50.592,0:04:53.566 and then you subtract another .5, is 5.5. 0:04:53.566,0:04:55.766 So we have outliers, 0:04:55.766,0:04:57.262 outliers. 0:04:57.262,0:04:58.095 Outliers 0:04:59.135,0:05:01.052 would be less than 5.5. 0:05:02.617,0:05:03.868 Or 0:05:03.868,0:05:06.028 the Q-three is 18, 0:05:06.028,0:05:08.111 this is, once again, 7.5. 0:05:09.726,0:05:10.643 18 plus 7.5 0:05:12.118,0:05:12.951 is 25.5, 0:05:14.287,0:05:15.287 or outliers, 0:05:16.855,0:05:18.938 outliers greater than 25, 0:05:20.698,0:05:21.531 25.5.
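[Editor's sketch] The arithmetic just worked through (Q-one = 13, Q-three = 18, interquartile range = 5, cutoffs at 5.5 and 25.5) can be reproduced in Python. This sketch follows the method used in the video: Q-one and Q-three are the medians of the lower and upper halves, with the overall median left out when the count is odd. Other quartile conventions exist and can give slightly different values, so treat this as one reasonable choice. The function names and the terms `lower_cutoff` and `upper_cutoff` are labels chosen here for illustration.

```python
# The 15 values from the video.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]


def median_of_sorted(values):
    """Return the median of an already sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    For an odd number of values the overall median is excluded
    from both halves, as in the video.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


q1, q2, q3 = quartiles(data)
iqr = q3 - q1                    # 18 - 13 = 5
lower_cutoff = q1 - 1.5 * iqr    # 13 - 7.5 = 5.5
upper_cutoff = q3 + 1.5 * iqr    # 18 + 7.5 = 25.5

print(q1, q2, q3, iqr, lower_cutoff, upper_cutoff)
# prints: 13 14 18 5 5.5 25.5
```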
0:05:22.507,0:05:24.331 So based on this, we have a, 0:05:24.331,0:05:26.436 kind of a numerical definition[br]for what's an outlier. 0:05:26.436,0:05:28.373 We're not just subjectively saying, 0:05:28.373,0:05:29.854 well, this feels right[br]or that feels right. 0:05:29.854,0:05:32.595 And based on this, we[br]only have two outliers, 0:05:32.595,0:05:36.795 that only these two[br]ones are less than 5.5. 0:05:36.795,0:05:40.008 Only these two ones are less than 5.5. 0:05:40.008,0:05:42.595 This is the cutoff, right over here. 0:05:42.595,0:05:45.070 So this dot just happened to make it. 0:05:45.070,0:05:48.013 And we don't have any[br]outliers on the high side. 0:05:48.013,0:05:49.722 Now another thing to think about 0:05:49.722,0:05:51.998 is drawing box-and-whiskers plots 0:05:51.998,0:05:54.353 based on Q-one, our median, our range, 0:05:54.353,0:05:55.567 all the range of numbers. 0:05:55.567,0:05:56.650 And you could do it either 0:05:56.650,0:05:58.351 taking in consideration your outliers 0:05:58.351,0:06:02.370 or not taking into[br]consideration your outliers. 0:06:02.370,0:06:05.267 So there's a couple of[br]ways that we can do it. 0:06:05.267,0:06:08.909 So let me actually clear,[br]let me clear all of this. 0:06:08.909,0:06:11.645 We've figured out all of this stuff. 0:06:11.645,0:06:14.187 So let me clear all of that out. 0:06:14.187,0:06:17.587 And let's actually draw[br]a box-and-whiskers plot. 0:06:17.587,0:06:19.254 So I'll put another, 0:06:22.335,0:06:24.305 another, actually let me do two here. 0:06:24.305,0:06:25.222 That's one, 0:06:26.259,0:06:28.852 and then let me put[br]another one down there. 0:06:28.852,0:06:30.779 And then this is another. 0:06:30.779,0:06:32.077 Now if we were to just draw 0:06:32.077,0:06:35.307 a classic box-and-whiskers plot here, 0:06:35.307,0:06:37.745 we would say, all right,[br]our median's at 14. 0:06:37.745,0:06:39.207 And actually, I'll do it both ways. 0:06:39.207,0:06:40.468 Our median's at 14. 
0:06:40.468,0:06:42.073 Median's at 14. 0:06:42.073,0:06:43.631 Q-one's at 13. 0:06:43.631,0:06:45.224 Q-one's at 13, 0:06:45.224,0:06:46.905 and Q-one's at 13. 0:06:46.905,0:06:48.453 Q-three is at 18. 0:06:48.453,0:06:50.015 Q-three is at 18, 0:06:50.015,0:06:51.042 Q-three is 18. 0:06:51.042,0:06:52.729 So that's the box part. 0:06:52.729,0:06:55.151 Now let me draw that as an actual, 0:06:55.151,0:06:57.886 let me actually draw that as a box. 0:06:57.886,0:06:59.469 So my best attempt, 0:07:00.655,0:07:01.651 there you go. 0:07:01.651,0:07:03.146 That's the box. 0:07:03.146,0:07:05.753 And this is also a box. 0:07:05.753,0:07:08.049 So far, I'm doing the exact same thing. 0:07:08.049,0:07:09.607 Now if we don't want to consider outliers, 0:07:09.607,0:07:11.373 we would say, well, what's[br]the entire range here? 0:07:11.373,0:07:14.430 Well, we have things that go[br]from one all the way to 19. 0:07:14.430,0:07:15.955 So one way to do it is to, hey, 0:07:15.955,0:07:17.661 we start at one. 0:07:17.661,0:07:19.940 And so our entire range, we go, 0:07:19.940,0:07:22.471 actually let me draw it a[br]little bit better than that. 0:07:22.471,0:07:24.691 We're going all the way, 0:07:24.691,0:07:26.358 all the way from one 0:07:27.974,0:07:28.807 to 19. 0:07:30.128,0:07:32.256 Now in this one, we're[br]including everything. 0:07:32.256,0:07:35.046 We're including even these two outliers. 0:07:35.046,0:07:37.003 But if we don't want to[br]include those outliers, 0:07:37.003,0:07:38.918 we want to make it clear[br]that they're outliers, 0:07:38.918,0:07:40.454 well, let's not include them. 0:07:40.454,0:07:42.671 And what we can do instead is say, 0:07:42.671,0:07:45.978 all right, including[br](chuckles) our non-outliers, 0:07:45.978,0:07:47.843 we would start at six 0:07:47.843,0:07:50.268 'cause six we're saying[br]is in our data set, 0:07:50.268,0:07:52.069 but it is not an outlier. 0:07:52.069,0:07:54.156 Let me make this look better. 
0:07:54.156,0:07:55.893 So we're gonna, 0:07:55.893,0:07:57.143 we are going to 0:07:59.285,0:08:02.452 start at six and go all the way to 19. 0:08:04.053,0:08:05.689 And then to say that[br]we have these outliers, 0:08:05.689,0:08:09.453 we would put this, we[br]have outliers over there. 0:08:09.453,0:08:11.586 So once again, this is[br]a box-and-whiskers plot 0:08:11.586,0:08:14.167 of the same data set without outliers. 0:08:14.167,0:08:15.875 And this is one where we make specific, 0:08:15.875,0:08:19.875 we make it clear where[br]the outliers actually are.
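[Editor's sketch] The two box-and-whiskers plots differ only in where the whiskers end. A short Python check of those endpoints follows, taking Q-one = 13, the median = 14 and Q-three = 18 from the video instead of recomputing them. The variable names (`plain_whiskers`, `trimmed_whiskers`, and so on) are illustrative, and nothing is drawn here. Only the numbers that define each plot are computed.

```python
# The 15 values from the video.
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

# Quartiles as found in the video (assumed here, not recomputed).
q1, median, q3 = 13, 14, 18
iqr = q3 - q1

# The 1.5 x IQR cutoffs: 5.5 on the low side, 25.5 on the high side.
lower_cutoff = q1 - 1.5 * iqr
upper_cutoff = q3 + 1.5 * iqr

# Points beyond either cutoff are flagged as outliers.
outliers = [x for x in data if x < lower_cutoff or x > upper_cutoff]
non_outliers = [x for x in data if lower_cutoff <= x <= upper_cutoff]

# First plot: whiskers span the whole data set, outliers included.
plain_whiskers = (min(data), max(data))

# Second plot: whiskers stop at the most extreme non-outliers,
# and the outliers are marked as separate points.
trimmed_whiskers = (min(non_outliers), max(non_outliers))

print("box:", (q1, median, q3))              # box: (13, 14, 18)
print("whiskers, all data:", plain_whiskers)  # whiskers, all data: (1, 19)
print("whiskers, no outliers:", trimmed_whiskers)  # whiskers, no outliers: (6, 19)
print("outliers:", outliers)                  # outliers: [1, 1]
```

The six stays inside the whisker because 6 is greater than the 5.5 cutoff, which matches the remark that this dot "just happened to make it."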