https:/.../2020-02-21_psy317l_samp_dist_intro.mp4

  • 0:01 - 0:05
    Hello. In this video, I would
    like to introduce you
  • 0:05 - 0:08
    to the concept
    of sampling distributions.
  • 0:08 - 0:14
    Now, sampling distributions are going
    to be fundamental to understanding
  • 0:14 - 0:19
    how we are able to arrive at
    certain knowledge about distributions.
  • 0:19 - 0:21
    So what -- Let's, first of all,
    let's just recap
  • 0:21 - 0:24
    what a sample is
    and what a population is.
  • 0:24 - 0:26
    So we've seen this diagram before.
  • 0:26 - 0:30
    A population is essentially all
    the individuals we could possibly
  • 0:30 - 0:35
    ever get information on,
    get data. And populations --
  • 0:35 - 0:38
    let's just pick a very,
    very boring example of height.
  • 0:38 - 0:43
    Maybe the population is
    all UT students, all UT students.
  • 0:43 - 0:46
    Now, every single student
    enrolled at UT,
  • 0:46 - 0:50
    if we were able to get all of their
    heights, there would be some true
  • 0:50 - 0:54
    measure of the average height.
    The average height would be
  • 0:54 - 0:57
    a population mean, and there would
    be some value that that is.
  • 0:57 - 1:01
    We'd also be able to get some
    other truth, we call these truths.
  • 1:01 - 1:03
    A standard deviation, if we were
    able to measure every
  • 1:03 - 1:05
    however many thousand
    UT students (inaudible).
  • 1:05 - 1:09
    If we were able to measure those things,
    we'd be able to get those values,
  • 1:09 - 1:12
    and they're the population values,
    and you may remember
  • 1:12 - 1:15
    we call them parameters.
  • 1:15 - 1:18
    Now, that's pretty much impossible.
    We would never actually, in practice,
  • 1:18 - 1:21
    ever be able to measure
    the height of all UT students.
  • 1:21 - 1:24
    So what we do, if we were interested,
    if this was something we were
  • 1:24 - 1:27
    interested in for some strange reason,
    we'd be able to just collect a sample.
  • 1:27 - 1:30
    I don't know. Maybe we'd get
    a sample of five people,
  • 1:30 - 1:33
    or maybe we would get
    a sample of 50 people,
  • 1:33 - 1:35
    or maybe we would
    get a sample of 500.
  • 1:35 - 1:38
    In future videos,
    we'll talk about
  • 1:38 - 1:40
    what's an appropriate size
    of sample to collect
  • 1:40 - 1:42
    when we're interested
    in finding out something.
  • 1:42 - 1:46
    But for now, who knows?
    Maybe we picked ten people,
  • 1:46 - 1:48
    but we pick a sample
    from those individuals.
  • 1:48 - 1:51
    Maybe we measure, we measure
    all their heights and we calculate
  • 1:51 - 1:56
    their average height and we call
    that the sample mean, "x-bar."
  • 1:56 - 2:00
    We're also able to calculate their,
    maybe their sample standard deviation.
  • 2:00 - 2:03
    We'd call that "s,"
    something like that.
  • 2:03 - 2:09
    These things we call statistics.
    This is just recap stuff.
  • 2:09 - 2:13
    And the purpose of
    calculating the statistics
  • 2:13 - 2:17
    is because we want
    to estimate these parameters.
  • 2:17 - 2:20
    We want to make an estimation
    of what the true value is.
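The recap above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is 50,000 simulated heights in centimetres (made-up numbers, not real UT data), and the sample size of ten is taken from the "maybe we picked ten people" remark.

```python
import random
import statistics

# Hypothetical population: 50,000 simulated heights in cm.
# The mean of 170 and spread of 10 are assumptions, not real UT data.
rng = random.Random(317)
population = [rng.gauss(170, 10) for _ in range(50_000)]

# Parameters: computed from every member of the population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Statistics: computed from one random sample of ten people.
sample = rng.sample(population, 10)
x_bar = statistics.fmean(sample)   # sample mean, "x-bar"
s = statistics.stdev(sample)       # sample standard deviation, "s"

print(f"parameters: mu = {mu:.1f}, sigma = {sigma:.1f}")
print(f"statistics: x-bar = {x_bar:.1f}, s = {s:.1f}")
```

The statistics land near the parameters but not exactly on them, which is the sense in which they are estimates.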
  • 2:21 - 2:26
    Now the issue is, let's say we're
    just collecting samples of five,
  • 2:26 - 2:30
    and then maybe another day,
    we got a different sample of five,
  • 2:30 - 2:33
    and then on another day,
    we're addicted to measuring
  • 2:33 - 2:36
    the heights of people,
    we get another sample of five.
  • 2:36 - 2:38
    Every time we get
    another sample of five,
  • 2:38 - 2:41
    let's, for now, just consider
    the sample mean.
  • 2:41 - 2:44
    We're not that interested in
    the standard deviation of these.
  • 2:44 - 2:48
    We're just gonna get a different
    average every time we collect a
  • 2:48 - 2:51
    sample of five. It'd be
    very, very unusual
  • 2:51 - 2:55
    if we randomly sampled five individuals
    from all UT students,
  • 2:55 - 2:59
    and it was the same five people.
    It's very unlikely
  • 2:59 - 3:02
    ever to be the same five people,
    and so the average height
  • 3:02 - 3:05
    of the sample is going
    to be different every single time.
  • 3:05 - 3:08
    And I don't know, let's say
    another day we did another five,
  • 3:08 - 3:09
    we got another sample mean.
  • 3:09 - 3:12
    So we're trying
    to estimate these things,
  • 3:12 - 3:15
    but our values are going
    to generally be different,
  • 3:15 - 3:17
    and then they're never going
    to be exactly -- well they could
  • 3:17 - 3:20
    actually be exactly the population
    mean, but it's unlikely.
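A minimal sketch of the "different sample of five on different days" idea, again using a simulated population of heights (the numbers are assumptions for illustration only). Each pass through the loop draws five new people and prints their sample mean.

```python
import random
import statistics

rng = random.Random(2020)
# Hypothetical population of heights in cm (simulated, not real data).
population = [rng.gauss(170, 10) for _ in range(50_000)]

sample_means = []
for day in range(1, 5):
    sample = rng.sample(population, 5)          # five people, drawn at random
    sample_means.append(statistics.fmean(sample))
    print(f"day {day}: x-bar = {sample_means[-1]:.1f}")

print(f"population mean = {statistics.fmean(population):.1f}")
```

The four sample means differ from each other and from the population mean, even though every sample comes from the same population.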
  • 3:20 - 3:24
    So what we've just done, by the way,
    is a sampling distribution.
  • 3:24 - 3:27
    I kind of, by accident, have
    introduced you to it.
  • 3:27 - 3:30
    A sampling distribution is these things:
  • 3:30 - 3:34
    It's when you collect lots,
    and lots, and lots of samples.
  • 3:34 - 3:36
    You collect lots and lots
    of samples, and then you
  • 3:36 - 3:38
    just look at all
    the values that you got.
  • 3:38 - 3:41
    And then you tend to plot
    them as a histogram.
  • 3:41 - 3:44
    So in the next couple of slides,
    I'll just more formally go through this,
  • 3:44 - 3:48
    because it gets very slightly
    more exciting than this.
  • 3:48 - 3:50
    Well this distribution, actually,
  • 3:50 - 3:53
    looks quite nice
    when you collect the data.
  • 3:53 - 3:57
    Okay, so in this case, I don't have --
    my population is not all UT students.
  • 3:57 - 4:01
    My population is four people.
    There's just four people.
  • 4:01 - 4:05
    Maybe my population is four people
    that are roommates together,
  • 4:05 - 4:10
    and I ask them, "How many people
    do you phone in one week?"
  • 4:10 - 4:14
    And one person said they phoned
    four people, one person said seven,
  • 4:14 - 4:16
    one person said five,
    one person said eight.
  • 4:16 - 4:19
    So, if you look at
    our population mean, it's six.
  • 4:19 - 4:24
    That's just 4+5+7+8 divided by
    the number of individuals,
  • 4:24 - 4:27
    which is four. Our mean is six.
  • 4:27 - 4:30
    So, I was really interested
    in knowing how many,
  • 4:30 - 4:33
    this population of this one house,
  • 4:33 - 4:37
    I wanted to know what's
    the average number of people
  • 4:37 - 4:38
    that they phone in a week.
  • 4:38 - 4:41
    But I thought collecting data on
    four people was too much work,
  • 4:41 - 4:44
    so I decided, I'm just going
    to collect samples of three people.
  • 4:44 - 4:47
    And so, I got a five,
    a seven, and an eight.
  • 4:47 - 4:49
    They were the first three people
    that I sampled.
  • 4:49 - 4:52
    And so my sample mean
    here is 6.67,
  • 4:52 - 4:56
    which is 5+7+8 divided by
    my sample size of three.
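The two calculations for the roommate example can be checked directly. The variable names here are just labels chosen for this sketch.

```python
from statistics import fmean

# The whole population: phone contacts per week for the four roommates.
population = [4, 7, 5, 8]
mu = fmean(population)        # (4 + 5 + 7 + 8) / 4

# The first sample of three.
sample = [5, 7, 8]
x_bar = fmean(sample)         # (5 + 7 + 8) / 3

print(f"population mean = {mu:.2f}")    # 6.00
print(f"sample mean     = {x_bar:.2f}") # 6.67
```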
  • 4:56 - 5:00
    So I got one sample mean,
    but this is not a sampling distribution
  • 5:00 - 5:02
    because it's just one sample.
  • 5:02 - 5:05
    For a sampling distribution,
    I need to do this over and over
  • 5:05 - 5:06
    and over again.
  • 5:06 - 5:09
    So I can do something like this,
    where I do, let's say,
  • 5:10 - 5:14
    one sample, two samples, three samples,
    four samples, five samples, six samples,
  • 5:14 - 5:18
    and here are my values:
    5,7,8; 4,5,8; 4,7,8; 4,8,8;
  • 5:18 - 5:21
    and for each of these,
    I calculate the sample mean.
  • 5:21 - 5:24
    This is a sampling distribution.
    I've now got many samples,
  • 5:24 - 5:27
    and I've collected
    the sample mean for each of them.
  • 5:28 - 5:31
    One thing -- there's two things
    you may have noticed about this:
  • 5:31 - 5:34
    one thing I want you to notice is that
    I've actually sampled with replacement.
  • 5:34 - 5:37
    So in the UT example, I said
    it would be really unlikely
  • 5:37 - 5:39
    to ever get
    the same five people.
  • 5:39 - 5:42
    More than that, if you randomly
    selected five people,
  • 5:42 - 5:45
    it's unlikely to get the same
    individual twice in one sample.
  • 5:45 - 5:48
    But for sampling distributions,
    we tend to actually say
  • 5:48 - 5:50
    that we will sample
    with replacement.
  • 5:50 - 5:52
    That's just a -- it's not something
    we actually need to worry
  • 5:52 - 5:54
    too much about right now.
    I just want you to be aware of --
  • 5:54 - 5:57
    in this particular example,
    I sampled with replacement,
  • 5:57 - 6:00
    which means you could sample
    the same individual twice,
  • 6:00 - 6:03
    or even three times, because
    you'll notice the 64th sample
  • 6:03 - 6:06
    had the same individual
    three times and I got 8,8,8,
  • 6:06 - 6:09
    and the mean of
    that sample is 8.
  • 6:09 - 6:11
    The second thing you may
    have noticed is that
  • 6:11 - 6:15
    the sample number
    only goes up to 64 here.
  • 6:15 - 6:17
    I've done 64 samples,
    and that's because
  • 6:17 - 6:21
    when you collect three numbers
    from four potential numbers,
  • 6:21 - 6:25
    there's only 64 possible samples (4 × 4 × 4),
    so I just stopped at 64.
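The count of 64 can be reproduced by listing every way of drawing three values from the four, with replacement and with order counted. This sketch uses `itertools.product`. The order it lists the samples in is its own and will not match the numbering on the slide, although (8, 8, 8) happens to come last here too.

```python
from itertools import product
from statistics import fmean

population = [4, 5, 7, 8]

# Every ordered sample of size 3, drawn with replacement: 4 * 4 * 4 of them.
samples = list(product(population, repeat=3))
sample_means = [fmean(s) for s in samples]

print(len(samples))               # 64
print(samples[0], samples[-1])    # (4, 4, 4) (8, 8, 8)
```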
  • 6:25 - 6:29
    Okay, but what I want you to realize
    is that I have collected 64 samples.
  • 6:29 - 6:32
    These dots just refer to --
    I didn't put all the data
  • 6:32 - 6:36
    between 6 and 64.
    I have 64 sample means.
  • 6:36 - 6:40
    So I have 64 of these things,
    so what should I do with them?
  • 6:40 - 6:43
    Well, I could plot them on a histogram.
  • 6:43 - 6:47
    So, here's my histogram
    of the 64 sample means,
  • 6:48 - 6:51
    and I wonder what you think about it.
  • 6:51 - 6:55
    Well, one thing is I've used a nice color,
    I think, for the bars, but another thing
  • 6:55 - 6:59
    is maybe you see that there's
    potentially a shape in this data.
  • 6:59 - 7:05
    So if you allow me to,
    kind of, draw a curve over it,
  • 7:05 - 7:07
    I haven't done a great job,
    but there's a curve,
  • 7:07 - 7:10
    and it looks normal-ish.
  • 7:10 - 7:12
    I'm using the word, "ish," because
    it's obviously not normal.
  • 7:12 - 7:15
    It's kind of getting there to be
    normally distributed.
  • 7:15 - 7:18
    So this is just a histogram of all
    the possible sample means.
  • 7:18 - 7:23
    And here is the one where we had 8,8,8,
    and actually there's one down here
  • 7:23 - 7:28
    where we got 4,4,4, but there's all
    the ones in between as well.
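A rough text version of that histogram, as a sketch only. It tallies the 64 samples by their total rather than by their mean, to avoid grouping on rounded decimals, and prints one row per distinct sample mean. The bins on the slide may be wider than this, so the picture may look a little different.

```python
from collections import Counter
from itertools import product

population = [4, 5, 7, 8]

# Count how many of the 64 ordered samples share each total (and so each mean).
totals = Counter(sum(s) for s in product(population, repeat=3))

for total in sorted(totals):
    # total / 3 is the sample mean for every sample with this total.
    print(f"{total / 3:5.2f} | {'#' * totals[total]}")
```

The rows for 4.00 and 8.00 each have a single mark (the 4,4,4 and 8,8,8 samples), and the counts are symmetric around the population mean of 6.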
  • 7:30 - 7:33
    Now what we --
    And I'll just put the curve back on it.
  • 7:33 - 7:36
    Maybe I'll actually this time
    do a better job.
  • 7:36 - 7:38
    I'm not sur --
    No, I didn't.
  • 7:38 - 7:39
    I'm gonna undo that because that
    was a terrible job.
  • 7:39 - 7:41
    Let's see if I can --
  • 7:41 - 7:45
    I think going slow...
    no, it will do.
  • 7:45 - 7:46
    Mmm... no it won't.
  • 7:46 - 7:47
    Let's do it again.
  • 7:47 - 7:49
    This time, third time's lucky.
  • 7:50 - 7:52
    Okay I'm happy with that.
  • 7:52 - 7:55
    This is the sampling distribution.
  • 7:55 - 7:57
    I just told you that,
    but I want you to formally know
  • 7:57 - 7:58
    what it means.
  • 7:58 - 8:04
    It's the sampling distribution,
    dot dot dot, for the sample mean,
  • 8:04 - 8:05
    that's what we collected.
  • 8:05 - 8:08
    We collected the sample mean,
    so it's a sampling distribution of the
  • 8:08 - 8:12
    sample mean, and then we say
    for n=3.
  • 8:12 - 8:16
    Because our sample
    could have been 2, n=2,
  • 8:16 - 8:17
    in which case we would have
    had a different distribution.
  • 8:17 - 8:20
    It could've been 1. We could've
    just been really lazy and said,
  • 8:20 - 8:22
    "I only want to just check
    one person and ask them,"
  • 8:22 - 8:25
    and calculated
    the sample mean for n=1.
  • 8:25 - 8:28
    But is the sample --
    we did three people,
  • 8:28 - 8:30
    and so this is technically
    the sampling distribution
  • 8:30 - 8:33
    for the sample mean for n=3.
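To see why the sample size is part of the name, here is a sketch that builds the full set of sample means for n=1, n=2 and n=3 from the same four roommates. The helper name `sampling_distribution` is made up for this example.

```python
from itertools import product
from statistics import fmean, pstdev

population = [4, 5, 7, 8]

def sampling_distribution(n):
    """Sample means of every ordered sample of size n, drawn with replacement."""
    return [fmean(s) for s in product(population, repeat=n)]

for n in (1, 2, 3):
    means = sampling_distribution(n)
    print(f"n={n}: {len(means):2d} sample means, "
          f"centre = {fmean(means):.2f}, spread (SD) = {pstdev(means):.2f}")
```

Each n gives a different list of sample means (4, 16 and 64 of them), all centred on 6 but with a different spread, so each one is a different sampling distribution.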
  • 8:35 - 8:39
    Okay, so that's probably enough for now
    on introducing sampling distribution.
  • 8:39 - 8:43
    I hope you understand
    what it is.
  • 8:43 - 8:44
    It is the --
    you collect many samples,
  • 8:44 - 8:46
    and you collect some information
    about each of those samples --
  • 8:46 - 8:51
    in this case, it was the sample mean --
    and then you plot them as a histogram,
  • 8:51 - 8:53
    and you are able to look at
    the shape of the distribution.
  • 8:53 - 8:55
    And that's what we call
    a sampling distribution.
  • 8:55 - 8:58
    And we're going to extend
    this idea in future videos.
Title:
https:/.../2020-02-21_psy317l_samp_dist_intro.mp4
Video Language:
English
Duration:
09:15
