
Hypothesis Test for Difference of Means

  • 0:00 - 0:01
  • 0:01 - 0:05
    In the last video, we came up
    with a 95% confidence interval
  • 0:05 - 0:10
    for the mean weight loss between
    the low-fat group and
  • 0:10 - 0:11
    the control group.
  • 0:11 - 0:14
    In this video, I actually want
    to do a hypothesis test,
  • 0:14 - 0:18
    really to test if this data
    makes us believe that the
  • 0:18 - 0:21
    low-fat diet actually does
    anything at all.
  • 0:21 - 0:23
    And to do that let's set up
    our null and alternative
  • 0:23 - 0:24
    hypotheses.
  • 0:24 - 0:30
    So our null hypothesis
    should be that this
  • 0:30 - 0:32
    low-fat diet does nothing.
  • 0:32 - 0:36
    And if the low-fat diet does
    nothing, that means that the
  • 0:36 - 0:41
    population mean on our low-fat
    diet minus the population mean
  • 0:41 - 0:45
    on our control should
    be equal to zero.
  • 0:45 - 0:50
    And this is a completely
    equivalent statement to saying
  • 0:50 - 0:56
    that the mean of the sampling
    distribution of our low-fat
  • 0:56 - 1:00
    diet minus the mean of the
    sampling distribution of our
  • 1:00 - 1:03
    control should be
    equal to zero.
  • 1:03 - 1:05
    And that's because we've seen
    this multiple times.
  • 1:05 - 1:08
    The mean of your sampling
    distribution is going to be
  • 1:08 - 1:10
    the same thing as your
    population mean.
  • 1:10 - 1:11
    So this is the same
    thing as that.
  • 1:11 - 1:13
    That is the same
    thing as that.
  • 1:13 - 1:19
    Or, another way of saying it is,
    if we think about the mean
  • 1:19 - 1:21
    of the distribution of the
    difference of the sample
  • 1:21 - 1:24
    means, and we focused on this
    in the last video, that that
  • 1:24 - 1:26
    should be equal to zero.
  • 1:26 - 1:31
    Because this thing right over
    here is the same thing as that
  • 1:31 - 1:32
    right over there.
  • 1:32 - 1:34
    So that is our null
    hypothesis.
  • 1:34 - 1:40
    And our alternative hypothesis,
  • 1:40 - 1:42
    I'll write over here.
  • 1:42 - 1:44
    It's just that it actually
    does do something.
  • 1:44 - 1:49
  • 1:49 - 1:52
    And let's say that it actually
    has an improvement.
  • 1:52 - 1:54
    So that would mean that we
    have more weight loss.
  • 1:54 - 1:57
    So if we have the mean of Group
    One, the population mean
  • 1:57 - 2:01
    of Group One minus the
    population mean of Group Two
  • 2:01 - 2:03
    should be greater than zero.
  • 2:03 - 2:07
    So this is going to be a
    one-tailed test.
  • 2:07 - 2:13
    Or another way we can view it,
    is that the mean of the
  • 2:13 - 2:17
    difference of the distributions,
    x1 minus x2 is
  • 2:17 - 2:20
    going to be greater than zero.
  • 2:20 - 2:21
    These are equivalent
    statements.
  • 2:21 - 2:24
    Because we know that this is the
    same thing as this, which
  • 2:24 - 2:26
    is the same thing as this,
    which is what I
  • 2:26 - 2:27
    wrote right over here.
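
Written out in symbols, the two hypotheses described above might look like the following. This is only a sketch of the board work: the subscripts 1 (low-fat group) and 2 (control group) follow the "Group One" and "Group Two" wording in the video, and the bar notation for the sample means is an assumption.

```latex
\begin{align*}
H_0 &: \mu_1 - \mu_2 = 0
  \quad\Longleftrightarrow\quad \mu_{\bar{x}_1 - \bar{x}_2} = 0 \\
H_1 &: \mu_1 - \mu_2 > 0
  \quad\Longleftrightarrow\quad \mu_{\bar{x}_1 - \bar{x}_2} > 0
\end{align*}
```
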
  • 2:27 - 2:30
    Now, to do any type of
    hypothesis test, we have to
  • 2:30 - 2:32
    decide on a level
    of significance.
  • 2:32 - 2:35
  • 2:35 - 2:38
    What we're going to do is, we're
    going to assume that our
  • 2:38 - 2:39
    null hypothesis is correct.
  • 2:39 - 2:43
    And then with that assumption
    that the null hypothesis is
  • 2:43 - 2:47
    correct, we're going to see
    what is the probability of
  • 2:47 - 2:50
    getting this sample data
    right over here.
  • 2:50 - 2:55
    And if that probability is below
    some threshold, we will
  • 2:55 - 2:58
    reject the null hypothesis in
    favor of the alternative
  • 2:58 - 2:59
    hypothesis.
  • 2:59 - 3:01
    Now, that probability threshold,
    and we've seen this
  • 3:01 - 3:03
    before, is called the
    significance level, sometimes
  • 3:03 - 3:05
    called alpha.
  • 3:05 - 3:07
    And here, we're going to decide
    for a significance
  • 3:07 - 3:12
    level of 95%.
  • 3:12 - 3:15
    Or another way to think about
    it, assuming that the null
  • 3:15 - 3:19
    hypothesis is correct, we want
    there to be no more than a 5%
  • 3:19 - 3:21
    chance of getting this
    result here.
  • 3:21 - 3:26
    Or no more than a 5% chance of
    incorrectly rejecting the null
  • 3:26 - 3:28
    hypothesis when it
    is actually true.
  • 3:28 - 3:29
    Or that would be a
    type one error.
  • 3:29 - 3:36
    So if there's less than a 5%
    probability of this happening,
  • 3:36 - 3:39
    we're going to reject
    the null hypothesis.
  • 3:39 - 3:42
    Less than a 5% probability given
    the null hypothesis is
  • 3:42 - 3:44
    true, then we're going to reject
    the null hypothesis in
  • 3:44 - 3:46
    favor of the alternative.
  • 3:46 - 3:47
    So let's think about this.
  • 3:47 - 3:50
    So we have the null
    hypothesis.
  • 3:50 - 3:52
    Let me draw a distribution
    over here.
  • 3:52 - 4:01
    The null hypothesis says that
    the mean of the differences of
  • 4:01 - 4:06
    the sampling distributions
    should be equal to zero.
  • 4:06 - 4:11
    Now, in that situation, what
    is going to be our critical
  • 4:11 - 4:12
    region here?
  • 4:12 - 4:14
    Well, we need a result, so
    we're going to need some
  • 4:14 - 4:21
    critical value here.
  • 4:21 - 4:28
    Because this isn't a
    normalized normal
  • 4:28 - 4:29
    distribution.
  • 4:29 - 4:31
    But there's some critical
    value here.
  • 4:31 - 4:35
  • 4:35 - 4:37
    The hardest thing in statistics
    is getting the
  • 4:37 - 4:38
    wording right.
  • 4:38 - 4:40
    There's some critical value here
    that the probability of
  • 4:40 - 4:46
    getting a sample from this
    distribution above that value
  • 4:46 - 4:47
    is only 5%.
  • 4:47 - 4:50
  • 4:50 - 4:54
    So we just need to figure out
    what this critical value is.
  • 4:54 - 4:57
    And if our value is larger than
    that critical value, then
  • 4:57 - 4:59
    we can reject the
    null hypothesis.
  • 4:59 - 5:01
    Because that means the
    probability of getting this is
  • 5:01 - 5:02
    less than 5%.
  • 5:02 - 5:06
    We could reject the null
    hypothesis and go with the
  • 5:06 - 5:09
    alternative hypothesis.
  • 5:09 - 5:12
    Remember, once again, we can
    use Z-scores, and we can
  • 5:12 - 5:14
    assume this is a normal
    distribution because our
  • 5:14 - 5:16
    sample size is large for either
    of those samples.
  • 5:16 - 5:19
    We have a sample size of 100.
  • 5:19 - 5:26
    And to figure that out, the
    first step, if we just look at
  • 5:26 - 5:33
    a normalized normal distribution
    like this, what
  • 5:33 - 5:35
    is your critical Z value?
  • 5:35 - 5:39
  • 5:39 - 5:42
    Where getting a result
    above that Z value
  • 5:42 - 5:45
    only has a 5% chance.
  • 5:45 - 5:46
    So this is actually
    cumulative.
  • 5:46 - 5:47
    So this whole area right
    over here is
  • 5:47 - 5:49
    going to be 95% chance.
  • 5:49 - 5:51
    We can just look
    at the Z table.
  • 5:51 - 5:55
    We're looking for 95%.
  • 5:55 - 5:57
    We're looking at the
    one-tailed case.
  • 5:57 - 5:59
    So let's look for 95%.
  • 5:59 - 6:01
    This is the closest thing.
  • 6:01 - 6:04
    We want to err on the side of
    being a little bit maybe to
  • 6:04 - 6:05
    the right of this.
  • 6:05 - 6:08
    So let's say 95.05%
    is pretty good.
  • 6:08 - 6:11
    So that's 1.65.
  • 6:11 - 6:15
    So this critical Z value
    is equal to 1.65.
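
The table lookup above can also be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the variable names are made up for illustration and are not from the video.

```python
from statistics import NormalDist

alpha = 0.05  # significance level (5%), as corrected later in the video

# One-tailed (upper-tail) test: find the z with 95% of the area to its left.
z_critical = NormalDist().inv_cdf(1 - alpha)

print(round(z_critical, 3))
```

The exact value comes out a little under the 1.65 read from the table, which is consistent with the video choosing the table entry slightly to the right of 95%.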
  • 6:15 - 6:19
    Or another way to view it is,
    this distance right here is
  • 6:19 - 6:23
    going to be 1.65 standard
    deviations.
  • 6:23 - 6:27
  • 6:27 - 6:28
    I know my writing
    is really small.
  • 6:28 - 6:29
    I'm just saying the standard
    deviation of that
  • 6:29 - 6:30
    distribution.
  • 6:30 - 6:32
    So what is the standard
    deviation of that
  • 6:32 - 6:33
    distribution?
  • 6:33 - 6:35
    We actually calculated it in
    the last video, and I'll
  • 6:35 - 6:36
    recalculate it here.
  • 6:36 - 6:42
    The standard deviation of our
    distribution of the difference
  • 6:42 - 6:47
    of the sample means is going to
    be equal to the square root
  • 6:47 - 6:51
    of the variance of our
    first population.
  • 6:51 - 6:54
    Now, the variance of our first
    population, we don't know it.
  • 6:54 - 6:58
    But we could estimate it with
    our sample standard deviation.
  • 6:58 - 7:02
    If you take your sample standard
    deviation, 4.67 and
  • 7:02 - 7:04
    you square it, you get
    your sample variance.
  • 7:04 - 7:05
    And so this is the variance.
  • 7:05 - 7:10
    This is our best estimate
    of the variance of the
  • 7:10 - 7:13
    population.
  • 7:13 - 7:17
    And we want to divide that
    by the sample size.
  • 7:17 - 7:20
    And then plus our best estimate
    of the variance of
  • 7:20 - 7:26
    the population of group two,
    which is 4.04 squared.
  • 7:26 - 7:28
    The sample standard deviation
    of group two squared.
  • 7:28 - 7:32
    That gives us variance
    divided by 100.
  • 7:32 - 7:34
    I did this before in the last video. Maybe
    it's still sitting on my
  • 7:34 - 7:37
    calculator.
  • 7:37 - 7:39
    Yes, it's still sitting
    on the calculator.
  • 7:39 - 7:40
    It's this quantity
    right up here.
  • 7:40 - 7:43
    4.67 squared divided
    by 100 plus 4.04
  • 7:43 - 7:44
    squared divided by 100.
  • 7:44 - 7:47
    So it's 0.617.
  • 7:47 - 7:58
    So this right here is
    going to be 0.617.
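
For anyone without the calculator handy, the same quantity can be recomputed in a few lines. This is a minimal sketch using the sample standard deviations and sample sizes quoted in the video; the variable names are hypothetical.

```python
import math

# Sample standard deviations and sample sizes quoted in the video.
s1, n1 = 4.67, 100  # group one (low-fat diet)
s2, n2 = 4.04, 100  # group two (control)

# Estimated standard deviation of the difference of the sample means:
# sqrt(s1^2 / n1 + s2^2 / n2), with sample variances standing in for
# the unknown population variances.
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

print(round(se_diff, 3))  # 0.617
```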
  • 7:58 - 8:02
    So this distance right
    here, is going to
  • 8:02 - 8:06
    be 1.65 times 0.617.
  • 8:06 - 8:08
    So let's figure out
    what that is.
  • 8:08 - 8:17
    So let's take 0.617
    times 1.65.
  • 8:17 - 8:23
    So it's 1.02.
  • 8:23 - 8:28
    This distance right
    here is 1.02.
  • 8:28 - 8:36
    So what this tells us is, if
    we assume that the diet
  • 8:36 - 8:43
    actually does nothing, there's
    only a 5% chance of the
  • 8:43 - 8:48
    difference between the means of
    these two samples being
  • 8:48 - 8:50
    more than 1.02.
  • 8:50 - 8:52
    There's only a 5%
    chance of that.
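
The threshold just described is the critical Z value scaled by the standard deviation of the difference of the sample means. A short sketch, using the rounded figures from the video (variable names are illustrative only):

```python
# Rounded values taken from the video.
z_critical = 1.65  # critical Z value from the table
se_diff = 0.617    # standard deviation of the difference of sample means

# Smallest difference in sample means that lands in the critical region.
critical_difference = z_critical * se_diff

print(round(critical_difference, 2))  # 1.02
```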
  • 8:52 - 8:59
    Well, the difference that we
    actually got is 1.91.
  • 8:59 - 9:01
    So that's sitting out
    here someplace.
  • 9:01 - 9:03
    So it definitely falls in
    this critical region.
  • 9:03 - 9:08
    The probability of getting this,
    assuming that the null
  • 9:08 - 9:12
    hypothesis is correct,
    is less than 5%.
  • 9:12 - 9:17
    So it's a smaller probability than
    our significance level.
  • 9:17 - 9:19
    Actually, let me
    be very clear.
  • 9:19 - 9:21
    The significance level,
    this alpha right
  • 9:21 - 9:26
    here, needs to be 5%.
  • 9:26 - 9:28
    Not the 95%.
  • 9:28 - 9:29
    I think I might have
    said 95% here.
  • 9:29 - 9:30
    But I wrote down the
    wrong number there.
  • 9:30 - 9:33
    I subtracted it from
    one by accident.
  • 9:33 - 9:34
    Probably in my head.
  • 9:34 - 9:36
    But anyway, the significance
    level is 5%.
  • 9:36 - 9:40
    The probability given that the
    null hypothesis is true, the
  • 9:40 - 9:44
    probability of getting the
    result that we got, the
  • 9:44 - 9:47
    probability of getting that
    difference, is less than our
  • 9:47 - 9:48
    significance level.
  • 9:48 - 9:50
    It is less than 5%.
  • 9:50 - 9:53
    So based on the rules that we
    set out for ourselves of
  • 9:53 - 9:58
    having a significance level of
    5%, we will reject the null
  • 9:58 - 10:02
    hypothesis in favor of the
    alternative that the diet
  • 10:02 - 10:05
    actually does make you
    lose more weight.
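
The whole test can be put together in one place. The sketch below is not from the video: it takes the same numbers (observed difference 1.91, standard deviations 4.67 and 4.04, 100 people per group, 5% significance level) and, as an assumed alternative to comparing against the critical value, converts the observed difference to a Z-score and a one-tailed p-value. Both routes should lead to the same decision.

```python
import math
from statistics import NormalDist

# Figures quoted in the video.
observed_diff = 1.91  # difference of sample means (low-fat minus control)
s1, n1 = 4.67, 100    # group one: sample standard deviation, sample size
s2, n2 = 4.04, 100    # group two: sample standard deviation, sample size
alpha = 0.05          # significance level

# Standard deviation of the difference of the sample means.
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)

# Under the null hypothesis the difference is centered at zero,
# so the Z-score is just the observed difference over se_diff.
z = observed_diff / se_diff

# One-tailed p-value: chance of a difference at least this large
# if the diet actually does nothing.
p_value = 1 - NormalDist().cdf(z)

reject_null = p_value < alpha
print(round(z, 2), reject_null)  # 3.09 True
```

The Z-score of about 3.09 is well past the critical value of 1.65, which matches the conclusion reached in the video.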
  • 10:05 - 10:06
Video Language:
English
Duration:
10:06
