WEBVTT 00:00:00.000 --> 00:00:02.775 - [Instructor] We have a list of 15 numbers here, 00:00:02.775 --> 00:00:05.634 and what I want to do is think about the outliers. 00:00:05.634 --> 00:00:09.502 And to help us with that, let's actually visualize this, 00:00:09.502 --> 00:00:12.318 the distribution of actual numbers. 00:00:12.318 --> 00:00:13.818 So let us do that. 00:00:14.887 --> 00:00:16.242 So here, on a number line, 00:00:16.242 --> 00:00:19.409 I have all the numbers from one to 19. 00:00:20.494 --> 00:00:23.540 And let's see, we have two ones. 00:00:23.540 --> 00:00:27.666 So I could say that's one one and then two ones. 00:00:27.666 --> 00:00:29.403 We have one six. 00:00:29.403 --> 00:00:31.681 So let's put that six there. 00:00:31.681 --> 00:00:33.098 We have got a 13, 00:00:34.531 --> 00:00:36.009 or we have two 13s. 00:00:36.009 --> 00:00:39.842 So we're gonna go up here, one 13 and two 13s. 00:00:41.367 --> 00:00:43.784 Let's see, we have three 14s. 00:00:45.105 --> 00:00:45.938 So 14, 00:00:46.962 --> 00:00:48.258 14, 00:00:48.258 --> 00:00:49.091 and 14. 00:00:50.153 --> 00:00:53.050 We have a couple of 15s, 15, 15. 00:00:53.050 --> 00:00:53.883 So 15, 00:00:55.148 --> 00:00:56.444 15. 00:00:56.444 --> 00:00:58.471 We have one 16. 00:00:58.471 --> 00:01:01.103 So that's our 16 there. 00:01:01.103 --> 00:01:03.128 We have three 18s. 00:01:03.128 --> 00:01:04.725 One, two, three. 00:01:04.725 --> 00:01:05.558 So one, 00:01:06.680 --> 00:01:08.085 two, 00:01:08.085 --> 00:01:09.996 and then three. 00:01:09.996 --> 00:01:12.575 And then we have a 19. 00:01:12.575 --> 00:01:14.590 Then we have a 19. 00:01:14.590 --> 00:01:15.667 So when you look, 00:01:15.667 --> 00:01:17.819 when you look visually at the distribution of numbers, 00:01:17.819 --> 00:01:20.824 it looks like the meat of the distribution, so to speak, 00:01:20.824 --> 00:01:23.757 is in this area, right over here. 
00:01:23.757 --> 00:01:25.012 And so some people might say, 00:01:25.012 --> 00:01:26.610 "Okay, we have three outliers. 00:01:26.610 --> 00:01:28.411 "There are these two ones and the six." 00:01:28.411 --> 00:01:29.244 Some people might say, 00:01:29.244 --> 00:01:31.333 "Well, the six is kinda close enough. 00:01:31.333 --> 00:01:33.889 "Maybe only these two ones are outliers." 00:01:33.889 --> 00:01:37.758 And those would actually be both reasonable things to say. 00:01:37.758 --> 00:01:40.849 Now to get on the same page, 00:01:40.849 --> 00:01:44.829 statisticians will use a rule sometimes. 00:01:44.829 --> 00:01:46.991 We say, well, anything that is more than 00:01:46.991 --> 00:01:49.440 one and a half times the interquartile range 00:01:49.440 --> 00:01:52.483 from below Q-one or above Q-three, 00:01:52.483 --> 00:01:54.602 well, those are going to be outliers. 00:01:54.602 --> 00:01:56.325 Well, what am I talking about? 00:01:56.325 --> 00:01:58.021 Well, let's actually, let's figure out the median, 00:01:58.021 --> 00:01:59.686 Q-one and Q-three here. 00:01:59.686 --> 00:02:01.715 Then we can figure out the interquartile range. 00:02:01.715 --> 00:02:03.741 And then we can figure out by that definition, 00:02:03.741 --> 00:02:05.950 what is going to be an outlier? 00:02:05.950 --> 00:02:07.563 And if that all made sense to you so far, 00:02:07.563 --> 00:02:08.863 I encourage you to pause this video 00:02:08.863 --> 00:02:10.372 and try to work through it on your own, 00:02:10.372 --> 00:02:13.176 or I'll do it for you right now. 00:02:13.176 --> 00:02:16.350 All right, so what's the median here? 00:02:16.350 --> 00:02:17.551 Well, the median is the middle number. 00:02:17.551 --> 00:02:20.626 We have 15 numbers, so the middle number is going to be 00:02:20.626 --> 00:02:22.698 whatever number has seven on either side. 00:02:22.698 --> 00:02:24.307 So it's gonna be the eighth number. 00:02:24.307 --> 00:02:27.980 One, two, three, four, five, six, seven. 
00:02:27.980 --> 00:02:29.013 Is that right? 00:02:29.013 --> 00:02:32.900 Yep, six, seven, so that's the median. 00:02:32.900 --> 00:02:35.619 And then you have one, two, three, four, five, six, seven 00:02:35.619 --> 00:02:36.836 numbers on the right side too. 00:02:36.836 --> 00:02:41.064 So that is the median, sometimes called Q-two. 00:02:41.064 --> 00:02:43.297 That is our median. 00:02:43.297 --> 00:02:44.928 Now what is Q-one? 00:02:44.928 --> 00:02:47.927 Well, Q-one is going to be the middle of this first group. 00:02:47.927 --> 00:02:50.347 This first group has seven numbers in it. 00:02:50.347 --> 00:02:53.163 And so the middle is going to be the fourth number. 00:02:53.163 --> 00:02:54.850 It has three and three, 00:02:54.850 --> 00:02:56.649 three to the left, three to the right. 00:02:56.649 --> 00:02:58.066 So that is Q-one. 00:02:59.507 --> 00:03:00.710 And then Q-three is going 00:03:00.710 --> 00:03:02.674 to be the middle of this upper group. 00:03:02.674 --> 00:03:04.178 Well, that also has seven numbers in it. 00:03:04.178 --> 00:03:06.267 So the middle is going to be right over there. 00:03:06.267 --> 00:03:08.340 It has three on either side. 00:03:08.340 --> 00:03:09.923 So that is Q-three. 00:03:11.670 --> 00:03:14.239 Now what is the interquartile range going to be? 00:03:14.239 --> 00:03:15.822 Interquartile range 00:03:16.911 --> 00:03:19.377 is going to be equal to 00:03:19.377 --> 00:03:20.661 Q-three 00:03:20.661 --> 00:03:21.661 minus Q-one, 00:03:22.631 --> 00:03:24.610 the difference between 18 and 13. 00:03:24.610 --> 00:03:26.110 Between 18 and 13, 00:03:26.991 --> 00:03:30.055 well, that is going to be 18 minus 13, 00:03:30.055 --> 00:03:32.085 which is equal to five. 00:03:32.085 --> 00:03:33.996 Now to figure out outliers, 00:03:33.996 --> 00:03:36.415 well, outliers are gonna be anything that is below. 
00:03:36.415 --> 00:03:37.415 So outliers, 00:03:38.579 --> 00:03:39.996 outliers, 00:03:39.996 --> 00:03:42.453 are going to be less than 00:03:42.453 --> 00:03:43.286 our Q-one 00:03:44.554 --> 00:03:45.387 minus 1.5, 00:03:46.620 --> 00:03:49.120 times our interquartile range. 00:03:50.593 --> 00:03:53.048 And this, once again, this isn't some rule of the universe. 00:03:53.048 --> 00:03:54.246 This is something that statisticians 00:03:54.246 --> 00:03:55.346 have kind of said, well, 00:03:55.346 --> 00:03:57.489 if we want to have a better definition for outliers, 00:03:57.489 --> 00:03:59.143 let's just agree that it's something that's 00:03:59.143 --> 00:04:00.408 more than one and a half times 00:04:00.408 --> 00:04:02.662 the interquartile range below Q-one. 00:04:02.662 --> 00:04:03.926 Or, 00:04:03.926 --> 00:04:07.751 or an outlier could be greater than Q-three 00:04:07.751 --> 00:04:11.629 plus one and a half times the interquartile range, 00:04:11.629 --> 00:04:13.730 interquartile range. 00:04:13.730 --> 00:04:15.102 And once again, this is somewhat, 00:04:15.102 --> 00:04:16.929 you know, people just decided it felt right. 00:04:16.929 --> 00:04:18.488 One could argue it should be 1.6. 00:04:18.488 --> 00:04:22.353 Or one could argue it should be one, or two, or whatever. 00:04:22.353 --> 00:04:25.040 But this is what people have tended to agree on. 00:04:25.040 --> 00:04:26.888 So let's think about what these numbers are. 00:04:26.888 --> 00:04:27.973 Q-one we already know. 00:04:27.973 --> 00:04:30.058 So this is going to be 13 00:04:30.058 --> 00:04:33.927 minus 1.5 times our interquartile range. 00:04:33.927 --> 00:04:36.517 Our interquartile range here is five. 00:04:36.517 --> 00:04:39.600 So it's 1.5 times five, which is 7.5. 00:04:43.020 --> 00:04:44.270 So this is 7.5. 00:04:45.966 --> 00:04:47.716 13 minus 7.5 is what? 00:04:48.891 --> 00:04:50.592 13 minus seven is six, 00:04:50.592 --> 00:04:53.566 and then you subtract another .5, is 5.5. 
00:04:53.566 --> 00:04:55.766 So we have outliers, 00:04:55.766 --> 00:04:57.262 outliers. 00:04:57.262 --> 00:04:58.095 Outliers 00:04:59.135 --> 00:05:01.052 would be less than 5.5. 00:05:02.617 --> 00:05:03.868 Or 00:05:03.868 --> 00:05:06.028 the Q-three is 18, 00:05:06.028 --> 00:05:08.111 this is, once again, 7.5. 00:05:09.726 --> 00:05:10.643 18 plus 7.5 00:05:12.118 --> 00:05:12.951 is 25.5, 00:05:14.287 --> 00:05:15.287 or outliers, 00:05:16.855 --> 00:05:18.938 outliers greater than 25, 00:05:20.698 --> 00:05:21.531 25.5. 00:05:22.507 --> 00:05:24.331 So based on this, we have a, 00:05:24.331 --> 00:05:26.436 kind of a numerical definition for what's an outlier. 00:05:26.436 --> 00:05:28.373 We're not just subjectively saying, 00:05:28.373 --> 00:05:29.854 well, this feels right or that feels right. 00:05:29.854 --> 00:05:32.595 And based on this, we only have two outliers, 00:05:32.595 --> 00:05:36.795 that only these two ones are less than 5.5. 00:05:36.795 --> 00:05:40.008 Only these two ones are less than 5.5. 00:05:40.008 --> 00:05:42.595 This is the cutoff, right over here. 00:05:42.595 --> 00:05:45.070 So this dot just happened to make it. 00:05:45.070 --> 00:05:48.013 And we don't have any outliers on the high side. 00:05:48.013 --> 00:05:49.722 Now another thing to think about 00:05:49.722 --> 00:05:51.998 is drawing box-and-whiskers plots 00:05:51.998 --> 00:05:54.353 based on Q-one, our median, our range, 00:05:54.353 --> 00:05:55.567 all the range of numbers. 00:05:55.567 --> 00:05:56.650 And you could do it either 00:05:56.650 --> 00:05:58.351 taking into consideration your outliers 00:05:58.351 --> 00:06:02.370 or not taking into consideration your outliers. 00:06:02.370 --> 00:06:05.267 So there's a couple of ways that we can do it. 00:06:05.267 --> 00:06:08.909 So let me actually clear, let me clear all of this. 00:06:08.909 --> 00:06:11.645 We've figured out all of this stuff. 00:06:11.645 --> 00:06:14.187 So let me clear all of that out. 
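For anyone following along at a keyboard, the outlier check worked through so far can be sketched in Python. This is only an illustration, not something shown in the video: the list `data` is reconstructed from the dots placed on the number line, and the helper name `quartiles` is made up here. It assumes the same quartile convention the instructor uses, where the median is left out of both halves when the count is odd; other tools may use different quartile conventions and give slightly different values.

```python
from statistics import median

# The 15 numbers as read off the dot plot (reconstructed from the narration).
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]


def quartiles(values):
    """Return (Q1, median, Q3), leaving the median out of both halves
    when the number of values is odd (the convention used in the video)."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return median(lower), median(s), median(upper)


q1, q2, q3 = quartiles(data)
iqr = q3 - q1                 # interquartile range: Q3 minus Q1
low_cutoff = q1 - 1.5 * iqr   # anything below this counts as an outlier
high_cutoff = q3 + 1.5 * iqr  # anything above this counts as an outlier

outliers = [x for x in data if x < low_cutoff or x > high_cutoff]

print(q1, q2, q3, iqr)
print(low_cutoff, high_cutoff)
print(outliers)
```

With this data the sketch should reproduce the numbers from the video: Q-one of 13, a median of 14, Q-three of 18, an interquartile range of five, cutoffs of 5.5 and 25.5, and the two ones as the only outliers.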
00:06:14.187 --> 00:06:17.587 And let's actually draw a box-and-whiskers plot. 00:06:17.587 --> 00:06:19.254 So I'll put another, 00:06:22.335 --> 00:06:24.305 another, actually let me do two here. 00:06:24.305 --> 00:06:25.222 That's one, 00:06:26.259 --> 00:06:28.852 and then let me put another one down there. 00:06:28.852 --> 00:06:30.779 And then this is another. 00:06:30.779 --> 00:06:32.077 Now if we were to just draw 00:06:32.077 --> 00:06:35.307 a classic box-and-whiskers plot here, 00:06:35.307 --> 00:06:37.745 we would say, all right, our median's at 14. 00:06:37.745 --> 00:06:39.207 And actually, I'll do it both ways. 00:06:39.207 --> 00:06:40.468 Our median's at 14. 00:06:40.468 --> 00:06:42.073 Median's at 14. 00:06:42.073 --> 00:06:43.631 Q-one's at 13. 00:06:43.631 --> 00:06:45.224 Q-one's at 13, 00:06:45.224 --> 00:06:46.905 and Q-one's at 13. 00:06:46.905 --> 00:06:48.453 Q-three is at 18. 00:06:48.453 --> 00:06:50.015 Q-three is at 18, 00:06:50.015 --> 00:06:51.042 Q-three is 18. 00:06:51.042 --> 00:06:52.729 So that's the box part. 00:06:52.729 --> 00:06:55.151 Now let me draw that as an actual, 00:06:55.151 --> 00:06:57.886 let me actually draw that as a box. 00:06:57.886 --> 00:06:59.469 So my best attempt, 00:07:00.655 --> 00:07:01.651 there you go. 00:07:01.651 --> 00:07:03.146 That's the box. 00:07:03.146 --> 00:07:05.753 And this is also a box. 00:07:05.753 --> 00:07:08.049 So far, I'm doing the exact same thing. 00:07:08.049 --> 00:07:09.607 Now if we don't want to consider outliers, 00:07:09.607 --> 00:07:11.373 we would say, well, what's the entire range here? 00:07:11.373 --> 00:07:14.430 Well, we have things that go from one all the way to 19. 00:07:14.430 --> 00:07:15.955 So one way to do it is to, hey, 00:07:15.955 --> 00:07:17.661 we start at one. 00:07:17.661 --> 00:07:19.940 And so our entire range, we go, 00:07:19.940 --> 00:07:22.471 actually let me draw it a little bit better than that. 
00:07:22.471 --> 00:07:24.691 We're going all the way, 00:07:24.691 --> 00:07:26.358 all the way from one 00:07:27.974 --> 00:07:28.807 to 19. 00:07:30.128 --> 00:07:32.256 Now in this one, we're including everything. 00:07:32.256 --> 00:07:35.046 We're including even these two outliers. 00:07:35.046 --> 00:07:37.003 But if we don't want to include those outliers, 00:07:37.003 --> 00:07:38.918 we want to make it clear that they're outliers, 00:07:38.918 --> 00:07:40.454 well, let's not include them. 00:07:40.454 --> 00:07:42.671 And what we can do instead is say, 00:07:42.671 --> 00:07:45.978 all right, including (chuckles) our non-outliers, 00:07:45.978 --> 00:07:47.843 we would start at six 00:07:47.843 --> 00:07:50.268 'cause six we're saying is in our data set, 00:07:50.268 --> 00:07:52.069 but it is not an outlier. 00:07:52.069 --> 00:07:54.156 Let me make this look better. 00:07:54.156 --> 00:07:55.893 So we're gonna, 00:07:55.893 --> 00:07:57.143 we are going to 00:07:59.285 --> 00:08:02.452 start at six and go all the way to 19. 00:08:04.053 --> 00:08:05.689 And then to say that we have these outliers, 00:08:05.689 --> 00:08:09.453 we would put this, we have outliers over there. 00:08:09.453 --> 00:08:11.586 So once again, this is a box-and-whiskers plot 00:08:11.586 --> 00:08:14.167 of the same data set without outliers. 00:08:14.167 --> 00:08:15.875 And this is one where we make specific, 00:08:15.875 --> 00:08:19.875 we make it clear where the outliers actually are.
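The two ways of drawing the whiskers can be sketched the same way. Again this is an illustration under stated assumptions rather than anything from the video: `data` is reconstructed from the narration, the cutoffs 5.5 and 25.5 are the ones computed earlier, and the variable names are invented for this sketch.

```python
# The 15 numbers from the dot plot (reconstructed from the narration).
data = [1, 1, 6, 13, 13, 14, 14, 14, 15, 15, 16, 18, 18, 18, 19]

# Cutoffs from the 1.5 x IQR rule computed earlier (Q1 = 13, Q3 = 18, IQR = 5).
low_cutoff, high_cutoff = 5.5, 25.5

# Split the data into points inside the cutoffs and points outside them.
inside = [x for x in data if low_cutoff <= x <= high_cutoff]
outliers = [x for x in data if x < low_cutoff or x > high_cutoff]

# First plot: whiskers span the entire range, outliers included.
whiskers_all = (min(data), max(data))

# Second plot: whiskers stop at the most extreme non-outlier values,
# and the outliers are marked as separate points.
whiskers_trimmed = (min(inside), max(inside))

print(whiskers_all)
print(whiskers_trimmed)
print(outliers)
```

The box is the same in both plots (13 to 18, with the median at 14). Only the whiskers differ: one to 19 in the first plot, and six to 19 in the second, with the two ones drawn separately.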