
Examples analyzing clusters, gaps, peaks and outliers for distributions

  • 0:00 - 0:02
    - [Voiceover] In this video, I wanna do some examples
  • 0:02 - 0:05
    looking at distributions, in particular, different features
  • 0:05 - 0:08
    in distributions like clusters, gaps, and peaks.
  • 0:08 - 0:11
    So over here, I wanna do some examples.
  • 0:11 - 0:13
    Which of the following are accurate descriptions
  • 0:13 - 0:15
    of the distribution below?
  • 0:15 - 0:17
    Select all that apply.
  • 0:17 - 0:20
    So the first statement is the distribution has an outlier.
  • 0:20 - 0:23
    So an outlier is a data point that's way off
  • 0:23 - 0:25
    of where the other data points are,
  • 0:25 - 0:27
    it's way larger or way smaller
  • 0:27 - 0:29
    than where all of the other data points
  • 0:29 - 0:31
    seem to be clustered and if we look over here,
  • 0:31 - 0:34
    we have a lot of data points between zero and six.
  • 0:34 - 0:36
    And let's just think about what they're measuring:
  • 0:36 - 0:38
    this is shelf time for each apple
  • 0:38 - 0:42
    at Gorg's Grocier.
  • 0:42 - 0:45
    So, for example, we see there's one, two, three, four,
  • 0:45 - 0:50
    five, six, seven apples that have a shelf life
  • 0:50 - 0:53
    of zero days, so (laughs), they're about to go bad.
  • 0:53 - 0:57
    You see you have one, two, three, four, five, six, seven,
  • 0:57 - 0:59
    eight apples that are gonna be good for another day.
  • 0:59 - 1:01
    You have two apples that are gonna be good
  • 1:01 - 1:04
    for another six days, and you have one apple
  • 1:04 - 1:06
    that's gonna be good for 10 days, and this is unusual.
  • 1:06 - 1:10
    This is an outlier here, it has a way larger shelf life
  • 1:10 - 1:12
    than all of the other data, so I would say
  • 1:12 - 1:14
    this definitely does have an outlier.
  • 1:14 - 1:15
    We just have this one data point
  • 1:15 - 1:18
    sitting all the way to the right, way larger,
  • 1:18 - 1:21
    way more shelf life than everything else, so it definitely
  • 1:21 - 1:24
    has an outlier, and this one would be the outlier.
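The visual judgment above (one dot sitting far from all the others) can be roughly approximated in code. This is only a sketch, not the method used in the video: the function name `isolated_values` and its `min_gap` and `max_count` thresholds are hypothetical, and the counts for two through five days are placeholders, since the narration only gives the counts for zero, one, six and 10 days.

```python
# Dot-plot counts: shelf life in days -> number of apples.
# Counts for 0, 1, 6 and 10 days come from the narration;
# the counts for 2-5 days are made-up placeholders.
shelf_life_counts = {0: 7, 1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 2, 10: 1}


def isolated_values(counts, min_gap=3, max_count=1):
    """Return values that sit at least `min_gap` away from every other
    occupied value and hold at most `max_count` data points."""
    values = sorted(v for v, c in counts.items() if c > 0)
    flagged = []
    for i, v in enumerate(values):
        # Distance to the nearest occupied value on each side.
        left_gap = v - values[i - 1] if i > 0 else float("inf")
        right_gap = values[i + 1] - v if i < len(values) - 1 else float("inf")
        if min(left_gap, right_gap) >= min_gap and counts[v] <= max_count:
            flagged.append(v)
    return flagged


outliers = isolated_values(shelf_life_counts)
print(outliers)  # [10]
```

The thresholds are arbitrary choices for this plot; a different data set may need different values, and the plot itself remains the main guide.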
  • 1:24 - 1:27
    The distribution has a cluster from four to six days.
  • 1:27 - 1:30
    And we indeed do see a cluster from four to six days.
  • 1:30 - 1:33
    A cluster, you can imagine, it's a grouping of data
  • 1:33 - 1:36
    that's sitting there, or you have a grouping of apples
  • 1:36 - 1:38
    that have a shelf life between four and six days,
  • 1:38 - 1:40
    and you definitely do see that cluster there.
  • 1:40 - 1:42
    And since I already selected two things,
  • 1:42 - 1:45
    I'm definitely not gonna select none of the above.
  • 1:45 - 1:46
    Let me check my answer.
  • 1:46 - 1:50
    Let me do a few more of these.
  • 1:50 - 1:53
    Which of the following are accurate descriptions
  • 1:53 - 1:54
    of the distribution below?
  • 1:54 - 1:57
    And once again we're going to select all that apply.
  • 1:57 - 2:00
    So the distribution has an outlier.
  • 2:00 - 2:02
    So let's see this distribution.
  • 2:02 - 2:04
    I do have a data point here that is at the high end
  • 2:04 - 2:07
    and I have another data point here that's at the low end,
  • 2:07 - 2:09
    but I don't have any data points that are sitting
  • 2:09 - 2:12
    far above or far below the bulk of the data.
  • 2:12 - 2:14
    If I had a data point that was out here, then yeah,
  • 2:14 - 2:16
    I would say that was an outlier to the right,
  • 2:16 - 2:19
    or a positive outlier, if I had a data point way to the left
  • 2:19 - 2:22
    off the screen over here, maybe that would be an outlier,
  • 2:22 - 2:23
    but I don't really see any obvious outliers.
  • 2:23 - 2:27
    All of the data, it's pretty clustered together.
  • 2:27 - 2:31
    So I would not say that the distribution has an outlier.
  • 2:31 - 2:35
    The distribution has a peak at 22 degrees.
  • 2:35 - 2:37
    Yeah, it does indeed look like we have,
  • 2:37 - 2:38
    and let's just look at what we're actually measuring:
  • 2:38 - 2:43
    high temperature each day in Edgeton, Iowa in July.
  • 2:43 - 2:45
    So it does indeed look like we have the most number
  • 2:45 - 2:49
    of days that had a high temperature at 22,
  • 2:49 - 2:52
    most number of days in July had a high temperature
  • 2:52 - 2:56
    at 22 degrees Celsius, so that is a peak.
  • 2:56 - 2:58
    You can see it, if you imagine this as kind of a mountain
  • 2:58 - 2:59
    this is a peak right here, this is a high point.
  • 2:59 - 3:04
    You have, at least locally, the most number of days
  • 3:04 - 3:07
    at 22 degrees Celsius.
  • 3:07 - 3:09
    So I would say it definitely has a peak there.
  • 3:09 - 3:11
    Since I selected something, I'm not gonna select
  • 3:11 - 3:12
    none of the above.
  • 3:12 - 3:14
    Let's do a couple more of these.
  • 3:14 - 3:16
    Which of the following are accurate descriptions
  • 3:16 - 3:18
    of the distribution below?
  • 3:18 - 3:21
    So the first one, the distribution has an outlier.
  • 3:21 - 3:21
    So...
  • 3:22 - 3:26
    number of guests by day at Seth's Sandwich Shop.
  • 3:26 - 3:29
    So, let's see, the lowest...
  • 3:30 - 3:32
    They have no days...
  • 3:32 - 3:36
    No days where he had between zero and 19 guests,
  • 3:36 - 3:39
    no days where he had between 20 and 39 guests,
  • 3:39 - 3:40
    looks like there's about nine days
  • 3:40 - 3:42
    where he had between 40 and 59 guests,
  • 3:42 - 3:46
    looks like 20 days where he had between 60 and 79 guests,
  • 3:46 - 3:48
    all the way where it looks like maybe 8 days
  • 3:48 - 3:51
    that he had between 180 and 199 guests.
  • 3:51 - 3:54
    But the question of outliers, there doesn't seem to be
  • 3:54 - 3:58
    any day where he had an unusual number of guests.
  • 3:58 - 4:00
    There's not a day that's way out here,
  • 4:00 - 4:02
    where he had, like, 500 guests.
  • 4:02 - 4:06
    So I would say this distribution does not have an outlier.
  • 4:06 - 4:09
    The distribution has a cluster from zero to 39 guests.
  • 4:09 - 4:14
    So zero to 39 guests is right over here, zero to 39 guests.
  • 4:14 - 4:17
    And there are no days where he had between zero and 39 guests,
  • 4:17 - 4:20
    neither zero to 19 nor 20 to 39.
  • 4:20 - 4:21
    So there's definitely not a cluster there.
  • 4:21 - 4:24
    I would say that the cluster would be between days
  • 4:24 - 4:28
    that had between 40 and 199 guests.
  • 4:28 - 4:30
    Definitely not zero to 39, there were no days
  • 4:30 - 4:32
    that were between zero and 39 guests.
  • 4:32 - 4:36
    So I would say none of the above very confidently.
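One way to make the cluster check concrete is to look for runs of consecutive non-empty bins in the histogram. The sketch below assumes bins of width 20 starting at zero; the name `nonempty_runs` is hypothetical, and the five middle counts are placeholders, since the narration only reads off the first four bins and the last one.

```python
# Histogram of guests per day, in bins of width 20 starting at 0.
# The counts 0, 0, 9, 20 and the final 8 are read from the narration;
# the five middle counts are made-up placeholders.
BIN_WIDTH = 20
day_counts = [0, 0, 9, 20, 15, 12, 10, 9, 9, 8]


def nonempty_runs(counts, bin_width):
    """Return (low, high) ranges covered by consecutive non-empty bins."""
    runs = []
    start = None  # index of the first bin in the current run, if any
    for i, c in enumerate(counts):
        if c > 0 and start is None:
            start = i
        elif c == 0 and start is not None:
            runs.append((start * bin_width, i * bin_width - 1))
            start = None
    if start is not None:
        runs.append((start * bin_width, len(counts) * bin_width - 1))
    return runs


clusters = nonempty_runs(day_counts, BIN_WIDTH)
print(clusters)  # [(40, 199)]
```

With these counts the only run is 40 to 199 guests, and nothing falls in zero to 39, which matches the reading of the plot above.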
  • 4:36 - 4:38
    Let's do one more of these.
  • 4:38 - 4:40
    Which of the following are accurate descriptions
  • 4:40 - 4:41
    of the distribution below?
  • 4:41 - 4:42
    (laughs) Alright.
  • 4:42 - 4:46
    The distribution has a peak from 12 to 13 points.
  • 4:46 - 4:51
    Let me see what this is measuring, what this data is about.
  • 4:51 - 4:55
    Test scores by student in Mrs. Frine's class.
  • 4:55 - 4:58
    So you had one student who got between a zero and a one
  • 4:58 - 5:01
    on a 20-point scale, so got between,
  • 5:01 - 5:05
    I guess out of 20 questions, got between zero and one point.
  • 5:05 - 5:07
    And then you see that no students got
  • 5:07 - 5:10
    between two and three, or four and five, or six and seven.
  • 5:10 - 5:12
    Then we have another student who got between eight and nine,
  • 5:12 - 5:15
    looks like three students got between 10 and 11,
  • 5:15 - 5:16
    and then we keep increasing, this looks like about
  • 5:16 - 5:21
    12 students got either a 16 or a 17,
  • 5:21 - 5:22
    or something in between maybe,
  • 5:22 - 5:25
    if you could get decimal points on that test.
  • 5:25 - 5:29
    And then it looks like 10 students got from 18 to 19.
  • 5:29 - 5:32
    Alright, so this says the distribution has a peak
  • 5:32 - 5:36
    from 12 to 13 points, 12 to 13 points,
  • 5:36 - 5:38
    there were five students, but this isn't a peak.
  • 5:38 - 5:41
    If you just go to 14 to 15 points, you have more students.
  • 5:41 - 5:42
    So this is definitely not a peak.
  • 5:42 - 5:44
    If you were looking at this as a mountain of some kind,
  • 5:44 - 5:46
    you definitely wouldn't describe this point as a peak.
  • 5:46 - 5:48
    You would say this distribution has a peak,
  • 5:48 - 5:49
    it has the most number of students
  • 5:49 - 5:51
    who got between 16 and 17 points,
  • 5:51 - 5:54
    so that's the peak right there, not 12 to 13 points.
  • 5:54 - 5:57
    So I would not select that first choice.
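The "mountain" test for a peak can be sketched as comparing a bin with its two neighbors. This is an illustration under assumptions, not the video's procedure: the helper names `is_local_peak` and `bin_label` are hypothetical, and the count of eight for 14 to 15 points is a placeholder (the narration only says it is more than the five students at 12 to 13).

```python
# Histogram of test scores in two-point bins: 0-1, 2-3, ..., 18-19.
# The count for 14-15 is a made-up placeholder; the other counts
# are the approximate values read off in the narration.
BIN_WIDTH = 2
student_counts = [1, 0, 0, 0, 1, 3, 5, 8, 12, 10]


def is_local_peak(counts, i):
    """True if bin i is taller than both neighbors (a missing neighbor counts as 0)."""
    left = counts[i - 1] if i > 0 else 0
    right = counts[i + 1] if i < len(counts) - 1 else 0
    return counts[i] > left and counts[i] > right


def bin_label(i, bin_width):
    """Label such as '16-17' for the bin at index i."""
    low = i * bin_width
    return f"{low}-{low + bin_width - 1}"


peak_at_12_13 = is_local_peak(student_counts, 6)  # bin index 6 is 12-13
peak_at_16_17 = is_local_peak(student_counts, 8)  # bin index 8 is 16-17
tallest = bin_label(student_counts.index(max(student_counts)), BIN_WIDTH)

print(peak_at_12_13)  # False
print(peak_at_16_17)  # True
print(tallest)        # 16-17
```

By this simple rule the lone bar at 0-1 would also register as a local high point, so the check is only a rough aid alongside looking at the plot.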
  • 5:57 - 6:00
    The distribution has an outlier.
  • 6:00 - 6:01
    Well, yeah, look at this: you have this outlier.
  • 6:01 - 6:05
    Most of the students scored between eight and 19 points,
  • 6:05 - 6:07
    and then you have this one student
  • 6:07 - 6:09
    who got between zero and one, it's really an outlier.
  • 6:09 - 6:11
    You even see this when you look at it visually,
  • 6:11 - 6:14
    it's not even connected to the rest of the distribution.
  • 6:14 - 6:15
    It's way to the left.
  • 6:15 - 6:17
    If something is way to the left or way to the right,
  • 6:17 - 6:22
    that's an outlier if it's unusually low or unusually high.
  • 6:22 - 6:24
    So I would say this distribution definitely does
  • 6:24 - 6:26
    have an outlier, and I'm not gonna pick none of the above
  • 6:26 - 6:28
    since I found a choice.
  • 6:29 - 6:31
    And I think we're all done.
Video Language:
English
Team:
Khan Academy
Duration:
06:32