
Math 1080 Lecture 15 Normal Distribution 3 Sampling Distributions x bar and s squared

  • 0:04 - 0:08
    Okay, let's go back to basics.
  • 0:10 - 0:13
    Let's see sampling distributions.
  • 0:13 - 0:17
    We had something
    called "population."
  • 0:23 - 0:27
    Like, let's take some
    huge population,
  • 0:28 - 0:34
    say, all high school students
    in the United States.
  • 0:35 - 0:37
    That's a huge population.
  • 0:38 - 0:40
    I don't know
    how many there are,
  • 0:41 - 0:46
    and we want to study
    something like weight,
  • 0:46 - 0:51
    obesity of students, and
    we need the weight of them.
  • 0:53 - 0:56
    That's huge data.
  • 0:57 - 1:03
    What we usually do
    when we have huge data--
  • 1:03 - 1:10
    or it might be the case that we
    do not have access to data--
  • 1:10 - 1:15
    [unclear] for example, you
    want to study something about...
  • 1:15 - 1:20
    ...some rabbits, for example,
    in the state of Minnesota.
  • 1:22 - 1:29
    Well, you cannot just gather all the rabbits
    and maybe weigh them, for example.
  • 1:33 - 1:38
    So, what you do is you sample.
    This one makes more sense
  • 1:38 - 1:46
    to just sample some rabbits at random from maybe
    different places in the state and weigh them.
  • 1:47 - 1:53
    So, that's when you really don't
    have access to the whole population.
  • 1:54 - 1:57
    In the other case, you might have
    access to the whole population
  • 1:57 - 2:02
    of all high school students in the
    country, but it's not feasible.
  • 2:03 - 2:11
    So, in that case, too, we choose
    some samples from every state--
  • 2:11 - 2:15
    there are different
    ways to sample them--
  • 2:15 - 2:23
    you can say, based on the population of each
    state, we sample according to that population.
  • 2:23 - 2:29
    So, for a more populous state like
    California, you sample more students
  • 2:29 - 2:33
    from California than, say,
    I don't know...Minnesota.
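
    The idea of sampling each state according to its population can be sketched roughly as below. This is only an illustration: the state figures are made-up numbers, and the helper name `proportional_allocation` is hypothetical, not something from the lecture.

```python
# Hypothetical student counts per state (made-up numbers, for illustration only).
state_students = {"California": 2_000_000, "Minnesota": 300_000, "Wyoming": 30_000}

def proportional_allocation(populations, total_sample):
    """Split total_sample across groups in proportion to each group's size."""
    total = sum(populations.values())
    # Each share is rounded, so the shares may not add up exactly to total_sample.
    return {name: round(total_sample * size / total)
            for name, size in populations.items()}

allocation = proportional_allocation(state_students, 1000)
print(allocation)
```

    A more populous state ends up with a larger share of the sample, which is the point made about California and Minnesota.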
  • 2:36 - 2:41
    So, we have a population and we're
    studying something about this population.
  • 2:42 - 2:47
    Usually we are talking about
    the normal distribution,
  • 2:47 - 2:51
    although this applies
    to any distribution.
  • 2:51 - 2:54
    So, this population
    has some distribution.
  • 3:00 - 3:02
    It has some parameters.
  • 3:07 - 3:11
    Now, parameters for the
    distribution of the population are--
  • 3:11 - 3:21
    usually what we study: mu, which is the mean,
    and sigma squared, which is the variance,
  • 3:23 - 3:27
    and sigma, which is
    the standard deviation.
  • 3:36 - 3:41
    Well, there might be other things that
    we study, depending on the situation,
  • 3:41 - 3:48
    if you're working for an insurance
    company you might even consider other--
  • 3:48 - 3:52
    these are some kind of "moments,"
    we call them "moments"--
  • 3:52 - 4:00
    so you can choose other moments.
    You might need more than just these two.
  • 4:01 - 4:02
    So...
  • 4:04 - 4:08
    For the moment we
    just need these two,
  • 4:08 - 4:11
    in fact this one [sigma/standard deviation]
    and this one [mu/mean].
  • 4:11 - 4:14
    Now, that's for the population.
  • 4:14 - 4:21
    But, I said it's not feasible or it's not
    possible to look at the whole population.
  • 4:22 - 4:25
    So, what we do is we sample.
  • 4:26 - 4:30
    And that's where statistics really
    comes into the picture.
  • 4:31 - 4:34
    Statistics basically starts here.
  • 4:34 - 4:40
    Otherwise, if we know the population,
    we know everybody and the weight
  • 4:40 - 4:48
    of every student in this country,
    then that's it, we have data. So what?
  • 4:49 - 4:57
    Statistics comes in when it is not possible (for
    whatever reason) to study the whole population.
  • 4:58 - 5:02
    In fact, when we do data science,
    that's another story.
  • 5:02 - 5:06
    In data science, say, we
    usually have the population.
  • 5:06 - 5:11
    Well, sometimes we would sample,
    but we usually have the population.
  • 5:11 - 5:17
    And most often what
    we want to do is predict.
  • 5:17 - 5:19
    So, we have to--
  • 5:19 - 5:24
    you want to predict something about
    the population in the future.
  • 5:24 - 5:30
    But, anyways, this is not data science,
    this is statistics, although they are related.
  • 5:30 - 5:37
    We have a sample here and this sample also has
    some kind of mean and some kind of variance.
  • 5:38 - 5:44
    This mean and variance we denote
    them differently. Remember? Sample--
  • 5:44 - 5:51
    This is population, and for sample
    we have x̅ [x-bar] for mean,
  • 5:53 - 5:57
    and remember we have s
    for standard deviation.
  • 6:05 - 6:10
    Okay, s for standard deviation
    and this is for the sample.
  • 6:10 - 6:15
    Oh, it's getting so dark,
    let me turn a light on.
  • 6:23 - 6:25
    Now...
  • 6:26 - 6:32
    First of all, you might say--
    you might intuitively [unclear]--
  • 6:32 - 6:39
    You might feel like having a larger
    sample, so you might say,
  • 6:39 - 6:45
    "A larger sample size will give me
    a better idea of the population."
  • 6:45 - 6:49
    Where does that feeling
    or intuition come from?
  • 6:53 - 7:00
    In fact, it's true. It's true that a larger
    sample will give a much better idea.
  • 7:01 - 7:06
    But, there's something here, this
    mean and this standard deviation,
  • 7:06 - 7:12
    so the larger sample will give me a mean and
    a standard deviation. But, what do we want?
  • 7:12 - 7:15
    We want this [sample] mean and
    this [sample] standard deviation
  • 7:15 - 7:19
    to be close to this [population] mean and
    this [population] standard deviation.
  • 7:19 - 7:27
    In other words, you want these two [from the
    sample] to estimate [the population] for me.
  • 7:27 - 7:30
    Right? So, you want these
    two [from the sample]
  • 7:30 - 7:33
    to estimate this [population] mean
    and this standard deviation.
  • 7:33 - 7:37
    And that's where the
    word "estimators" comes in.
  • 7:37 - 7:41
    So, this x-bar, which is
    the mean of the sample,
  • 7:41 - 7:44
    and s, which is the standard
    deviation of the sample,
  • 7:44 - 7:50
    these two are estimators for
    mu and sigma. In fact, in--
  • 7:51 - 8:00
    Well, the best-- I mean, x-bar is
    an unbiased estimator for mu,
  • 8:01 - 8:07
    and s squared is an unbiased
    estimator for sigma squared.
  • 8:09 - 8:12
    So, this estimates this,
    and this estimates this.
  • 8:12 - 8:18
    Not sigma, not sigma and s,
    that's why we always write variance.
  • 8:19 - 8:26
    This guy x̅ estimates this μ, and
    this guy s^2 estimates this σ^2.
  • 8:27 - 8:36
    And they're unbiased. Now, what "unbiased" is,
    I'm not going to discuss in more detail, but it's...
  • 8:39 - 8:45
    Well...at some point I will
    touch that, but anyways.
  • 8:46 - 8:55
    But, the word "unbiased" itself, should
    give you some idea of what they really are.
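
    One way to get a feel for "unbiased" is a small simulation: draw many samples, compute x-bar and s squared for each, and average them. The sketch below assumes a normal population with mean 150 and standard deviation 20 (values chosen only for illustration) and uses Python's standard `statistics` module.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 150.0, 20.0  # assumed population mean and standard deviation
N = 10                   # sample size
REPS = 5_000             # number of repeated samples

xbars, s2_values, s2_over_n = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    xbars.append(statistics.mean(sample))
    s2_values.append(statistics.variance(sample))   # divides by n - 1
    s2_over_n.append(statistics.pvariance(sample))  # divides by n

avg_xbar = statistics.mean(xbars)          # should land near MU
avg_s2 = statistics.mean(s2_values)        # should land near SIGMA ** 2
avg_s2_over_n = statistics.mean(s2_over_n) # tends to fall below SIGMA ** 2
avg_s = statistics.mean(math.sqrt(v) for v in s2_values)  # tends to fall below SIGMA

print(avg_xbar, avg_s2, avg_s2_over_n, avg_s)
```

    On average, x-bar sits near mu and s squared sits near sigma squared. Dividing by n instead of n minus 1 tends to come out too small, and the average of s itself tends to fall a little below sigma, which is in line with the remark "not sigma and s."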
  • 8:55 - 8:56
    So...
  • 8:58 - 9:00
    Okay, now...
  • 9:01 - 9:02
    This...
  • 9:05 - 9:09
    So, really there's something
    that's coming into the picture.
  • 9:09 - 9:15
    We have the sample, and this
    sample has some distribution.
  • 9:15 - 9:18
    So, the sample has
    some distribution.
  • 9:18 - 9:22
    Now, what do we assume
    for that distribution?
  • 9:22 - 9:28
    Well, most often what we do
    is we graph the histogram
  • 9:28 - 9:33
    and by looking at the histogram
    we guess, and we say,
  • 9:33 - 9:40
    "Okay, the histogram is telling me that
    this distribution looks normal."
  • 9:40 - 9:46
    Or, "This distribution looks like an
    exponential distribution," so on and so forth.
  • 9:47 - 9:54
    The first thing we usually do is we graph
    the histogram and look at the picture
  • 9:54 - 10:01
    and try to just guess
    what the distribution is.
  • 10:01 - 10:05
    Well, anyway, the sample
    comes with some numbers,
  • 10:05 - 10:12
    so if it is weight of students or age of students,
    something [like that], you have some numbers.
  • 10:12 - 10:18
    Those numbers give you x-bar and s,
    so you have at least 2 numbers here.
  • 10:18 - 10:23
    And these two numbers
    definitely help you to--
  • 10:23 - 10:28
    not the distribution, but most often
    you guess the distribution, but anyway--
  • 10:28 - 10:36
    to write down the distribution more clearly
    by including these two in the distribution.
  • 10:37 - 10:38
    Okay, so...
  • 10:39 - 10:41
    The first thing we do is a histogram.
  • 10:41 - 10:46
    So, we have a sample
    and we draw a histogram.
  • 10:49 - 10:51
    So, given a sample
  • 10:57 - 11:01
    graph the histogram first.
  • 11:06 - 11:13
    So, the histogram should give you an idea
    of what the distribution should look like.
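
    As a rough sketch of "graph the histogram first," the snippet below bins a sample and prints a text histogram. The data are simulated weights (assumed normal with mean 150 and standard deviation 20, purely for illustration), and the bin width of 10 is an arbitrary choice.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Simulated weights: an assumption standing in for real measurements.
weights = [random.gauss(150, 20) for _ in range(200)]

BIN_WIDTH = 10
# Map each value to the left edge of its bin, e.g. 153.2 -> 150.
counts = Counter(int(w // BIN_WIDTH) * BIN_WIDTH for w in weights)

for left_edge in sorted(counts):
    # One '#' per observation that fell in the bin.
    print(f"{left_edge:>4}-{left_edge + BIN_WIDTH:<4} {'#' * counts[left_edge]}")
```

    The shape of the printed bars is what you look at when guessing whether the distribution looks normal, exponential, or something else.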
  • 11:14 - 11:21
    When you graph the histogram this means that
    you are using some computer program,
  • 11:21 - 11:27
    whatever program you
    are using, you'll find:
  • 11:29 - 11:35
    x-bar and s.
    What is x-bar and what is s?
  • 11:37 - 11:44
    X-bar is [uppercase] sigma
    [Σ meaning sum] of all x's
  • 11:44 - 11:52
    divided by n. And s squared
    is [uppercase] sigma [Σ] of
  • 11:52 - 11:58
    (x minus x-bar) squared,
    over n minus 1.
  • 12:02 - 12:05
    That is s squared.
    Now, what is n?
  • 12:06 - 12:12
    N is the sample size. Let me
    write here: "n = sample size."
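
    Written out, the two spoken formulas are as follows, with x_i denoting the i-th value in the sample (the index i is notation added here for clarity):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
```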
  • 12:15 - 12:21
    And these two formulas, in fact--
    n is sample size-- these two formulas--
  • 12:21 - 12:24
    First of all, it means that your
    sample size must be more than 1.
  • 12:24 - 12:28
    [He points to the denominator: with n = 1, n minus 1
    equals 0, and division by 0 is undefined.]
  • 12:28 - 12:33
    Definitely, can you say anything just
    by having 1 [item in a] sample?
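
    A minimal sketch of computing x-bar and s squared directly from the definitions, including the requirement that the sample size be more than 1. The weights are made-up numbers, and the function names are hypothetical.

```python
import statistics

def sample_mean(xs):
    """x-bar: the sum of all x's divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """s squared: the sum of (x - x-bar) squared, divided by n - 1."""
    n = len(xs)
    if n < 2:
        # With n = 1 the denominator n - 1 would be 0.
        raise ValueError("need at least two observations")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

weights = [128.0, 142.5, 150.0, 161.0, 173.5]  # made-up sample

xbar = sample_mean(weights)    # 151.0
s2 = sample_variance(weights)  # 302.125
print(xbar, s2)

# The standard library's statistics.variance uses the same n - 1 divisor.
print(statistics.variance(weights))
```

    Calling `sample_variance([150.0])` raises an error, which matches the point that a single observation says nothing about spread.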
  • 12:35 - 12:45
    So, this one and this one give me two numbers,
    and these two numbers theoretically--
  • 12:45 - 12:51
    by theory we know-- that these are unbiased
    estimators for mu and [lowercase] sigma squared.
  • 12:51 - 12:57
    Now, as I said, and I'll say it again, this chapter,
    in fact, is about the normal distribution.
  • 12:57 - 13:07
    So, what we usually do is we just say, "Okay, well
    let's assume that the population is normal."
  • 13:08 - 13:13
    Although there are some theorems which have
    nothing to do with normal distributions,
  • 13:13 - 13:20
    but they work for every distribution.
    But, in order to work with a sample
  • 13:20 - 13:26
    and make some predictions
    and other things,
  • 13:26 - 13:33
    we usually assume that the distribution of
    the population is normal, and also the sample.
  • 13:34 - 13:40
    So, although there are some problems
    with this assumption, but...
  • 13:41 - 13:43
    Well, we have to assume sometimes.
  • 13:46 - 13:50
    Unless we have a large sample,
    if we have a large enough sample,
  • 13:50 - 13:54
    then by looking at the histogram we might say,
    "I think this is not a normal distribution,
  • 13:54 - 13:59
    this is like an exponential distribution,
    or like 'blah-blah' distribution."
  • 14:00 - 14:07
    That idea needs more experience and
    knowledge of statistics and probability.
  • 14:08 - 14:13
    So, these two guys
    are good estimators.
  • 14:21 - 14:22
    Now...
  • 14:23 - 14:26
    Let me erase...
    what should I erase?
  • 14:57 - 15:09
    [unclear]
  • 15:14 - 15:20
    So, it's saying something like this.
    When we study population--
  • 15:20 - 15:23
    So, this is the population...
  • 15:26 - 15:30
    You have to listen carefully
    to what I'm saying.
  • 15:30 - 15:35
    So, we want to
    study a population.
  • 15:36 - 15:45
    Say I'm a stats professor and I choose
    some students from my class,
  • 15:45 - 15:49
    n students, say 10 students,
  • 15:49 - 15:58
    and send them to different places in the country
    to measure the weight of some students.
  • 15:59 - 16:04
    So, to weigh some students,
    high school students.
  • 16:04 - 16:12
    Today I sent them and they go around
    the country in the morning and weigh,
  • 16:12 - 16:18
    and bring me 10 numbers.
    So, today, day 1...
  • 16:21 - 16:26
    they bring me ten numbers.
    [counting]
  • 16:27 - 16:29
    So, they give me ten numbers.
  • 16:29 - 16:34
    These ten numbers, you
    can find using that formula,
  • 16:34 - 16:44
    you can find for these ten numbers
    both x-bar and s squared for day 1.
  • 16:45 - 16:50
    So for day 1 we have
    x-bar and s squared.
  • 16:51 - 16:54
    And, again, day 2,
  • 16:55 - 17:02
    I send students again to give me
    another x-bar and another s squared.
  • 17:03 - 17:07
    Day 3, another one.
  • 17:11 - 17:16
    Day-- well, how many days do
    you think? Let's say 100 days.
  • 17:17 - 17:24
    Day 100, so I have x-bar [subscript] 100,
    and s squared [subscript] 100.
  • 17:26 - 17:32
    Say I send them 100 days, and I have
    no money to do that. [chuckles]
  • 17:32 - 17:35
    That's why I'm saying
    "if it's feasible."
  • 17:35 - 17:41
    Anyway, so I have some x-bars
    here and some s squares.
  • 17:42 - 17:49
    And samples themselves are kind
    of random, the whole set of ten--
  • 17:49 - 17:52
    How many students did I send? Ten.
  • 17:52 - 17:56
    So, this sample has ten students from
    high schools around the country.
  • 17:56 - 18:02
    This sample [day 2] has ten students,
    but they do not have to be the same,
  • 18:02 - 18:10
    they can be at random, so it's unlikely
    to have two samples exactly the same.
  • 18:11 - 18:18
    Right? So, just imagine. You choose some
    students from the high schools in the country.
  • 18:18 - 18:23
    The next day you choose other students,
    but since you are doing it at random,
  • 18:23 - 18:28
    some of these students might be the same.
    But, just imagine out of-- I don't know,
  • 18:28 - 18:34
    like out of a hundred million students--
    I don't know, let's say ten million--
  • 18:34 - 18:40
    out of ten million students you are choosing
    ten in 2 days and they are the same?
  • 18:40 - 18:43
    It's really unlikely to have the same.
  • 18:43 - 18:47
    In fact, we can find the probability of being
    the same, but that's not the point here.
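
    For the curious, that probability can be sketched as follows. This assumes every group of ten students is equally likely and that students are drawn without replacement. The figure of ten million is the rough number used in the lecture.

```python
import math

POPULATION = 10_000_000  # the lecture's rough figure
SAMPLE_SIZE = 10

# Number of distinct groups of 10 that can be drawn from the population.
possible_samples = math.comb(POPULATION, SAMPLE_SIZE)

# Assumption: each group is equally likely, so matching yesterday's group
# exactly has probability 1 / (number of possible groups).
p_same = 1 / possible_samples
print(f"{p_same:.3e}")
```

    The result is so small that, for practical purposes, two days will never give exactly the same ten students.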
  • 18:47 - 18:52
    And then day 3, another ten students,
    up to day 100, ten students.
  • 18:52 - 18:57
    The point is, this sample
    itself is like random.
  • 18:58 - 19:04
    Not only is each one of them random,
    the sample as a whole is also random.
  • 19:05 - 19:11
    Which makes these x-bars
    and s squareds random.
  • 19:12 - 19:16
    Right? This makes them random.
  • 19:16 - 19:21
    So, being random, it means that we can
    talk about some kind of random variable.
  • 19:21 - 19:25
    X-bar, for example, or s squared.
  • 19:28 - 19:34
    And this x-bar has values: x-bar [subscript] 1,
    x-bar [subscript] 2, up to x-bar [subscript] 100.
  • 19:34 - 19:36
    This one [s squared] takes values from
  • 19:36 - 19:41
    s^2 [subscript] 1, s^2 [subscript] 2,
    up to s^2 [subscript] 100.
  • 19:43 - 19:45
    Since they are random variables,
  • 19:45 - 19:53
    with these as values taken by these two
    random variables, they have a distribution.
  • 19:53 - 20:00
    So, the distribution of these two, that's what
    this thing is about: sampling distribution.
  • 20:01 - 20:07
    And that's what we usually graph.
    That's what we graph. And...
  • 20:08 - 20:11
    So, we have this x-bar
    and s squared for ten--
  • 20:11 - 20:17
    Well, ten's not enough, but say
    I send 200 students, okay?
  • 20:17 - 20:23
    And then we graph it-- a histogram for
    x-bar, and a histogram for s squared--
  • 20:23 - 20:28
    those two histograms will give us
    an idea of what the distributions are.
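
    The 100-day experiment can be imitated with a short simulation. This is only a sketch: it assumes the population of weights is normal with mean 150 and standard deviation 20 (made-up values), draws 10 students per day for 100 days, and prints a text histogram of the x-bars and of the s squareds.

```python
import random
import statistics
from collections import Counter

random.seed(2)  # fixed seed so the run is repeatable

MU, SIGMA = 150.0, 20.0            # assumed population parameters
DAYS, STUDENTS_PER_DAY = 100, 10   # as in the lecture's thought experiment

xbars, s_squareds = [], []
for day in range(DAYS):
    # Each day is a fresh random sample of students.
    sample = [random.gauss(MU, SIGMA) for _ in range(STUDENTS_PER_DAY)]
    xbars.append(statistics.mean(sample))
    s_squareds.append(statistics.variance(sample))

def text_histogram(values, bin_width):
    """Print one row per bin, with one '#' per value in that bin."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    for left_edge in sorted(counts):
        print(f"{left_edge:>6} | {'#' * counts[left_edge]}")

print("x-bar:")
text_histogram(xbars, 5)
print("s squared:")
text_histogram(s_squareds, 100)
```

    Each day contributes one x-bar and one s squared, so each histogram is built from 100 values. Those histograms are pictures of the sampling distributions.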
  • 20:28 - 20:34
    Now, next time I will discuss
    the distributions of these...
  • 20:36 - 20:44
    ...random variables and also some amazing
    theorems about those distributions.
  • 20:45 - 20:47
    So, see you next time.
Video Language:
English
Duration:
20:49
