-
Title:
Welch's Two-Sample t-Test - Intro to Data Science
-
Description:
-
Let's talk more about the two-sample t-test, since we'll want
-
to compare two different samples in our class project. There are
-
a few different versions of the t-test that one might employ,
-
and they really depend on what assumptions we make about the
-
data. So we might want to ask questions such as: do our
-
samples have the same size, and do they have the same
-
variance? Let's discuss a variant of the t-test called Welch's
-
t-test in more depth, since it's the most general. It doesn't assume
-
equal sample size or equal variance. In Welch's
-
t-test, we compute a t-statistic using the following equation.
-
$$ t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} $$
-
where $\mu_i$ is the sample mean for the $i$th sample,
-
$\sigma_i^2$ is the sample variance for the $i$th sample, and $n_i$ is the sample size
-
for the $i$th sample. We'll also want to estimate the number of degrees
-
of freedom, nu, using the following equation.
-
$$ \nu \approx \frac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^2}{\dfrac{\sigma_1^4}{n_1^2 \nu_1} + \dfrac{\sigma_2^4}{n_2^2 \nu_2}} $$
-
where $\nu_i$ is equal to $n_i - 1$, and
-
this is the degrees of freedom associated with the $i$th variance
-
estimate. If you're unfamiliar with degrees of freedom, again, it might
-
be a good idea to brush up on your stats concepts
-
with Udacity's intro to stats course. A link is
-
provided in the instructor comments. All right, so once we have
-
these two values, we can estimate the P-value. Conceptually, the
-
P-value is the probability of obtaining a test statistic at least
-
as extreme as the one that was actually observed,
-
assuming that the null hypothesis is true. The P-value
-
is not the probability that the null hypothesis
-
is true given the data. So again, just as a
-
thought experiment, say we were testing whether left-handed
-
or right-handed baseball players were better batters by looking
-
at their mean batting average. If the P-value
-
is 0.05, this would mean that, even if there is
-
no difference between left-handed and right-handed batters, which
-
is our null hypothesis, we would still see
-
a value of t equal to or greater
-
than the one that we saw 5% of the time.
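To make that reading of the P-value concrete, here is a minimal simulation sketch. It assumes NumPy and SciPy are available, and every number in it (the seed, the group sizes, the mean and spread of the made-up "batting averages") is synthetic and chosen only for illustration. Both groups are drawn from the same distribution, so the null hypothesis is true by construction, and we count how often a two-sided Welch's t-test still returns a P-value below 0.05.

```python
import numpy as np
from scipy import stats

# Fixed seed so the sketch is repeatable; the value itself is arbitrary.
rng = np.random.default_rng(0)

# Hypothetical setup: 2000 repeated experiments, unequal group sizes.
n_trials, n1, n2 = 2000, 30, 40

# Both groups come from the SAME distribution, so the null hypothesis
# (no difference in means) holds by construction. Values are synthetic.
group1 = rng.normal(loc=0.270, scale=0.030, size=(n_trials, n1))
group2 = rng.normal(loc=0.270, scale=0.030, size=(n_trials, n2))

# equal_var=False selects Welch's variant; axis=1 runs one test per row.
result = stats.ttest_ind(group1, group2, axis=1, equal_var=False)

# Fraction of experiments that look "significant" at the 0.05 level
# even though there is no real difference between the groups.
fraction_below = np.mean(result.pvalue < 0.05)
print(f"Fraction of trials with P < 0.05: {fraction_below:.3f}")
```

If the test behaves as described, the printed fraction should land close to 0.05, which is the sense in which a P-value of 0.05 corresponds to "5% of the time" under the null hypothesis.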
-
When performing a statistical test like this, we usually set
-
some critical value of P. Let's call it P critical.
-
If P falls below P critical, then we would reject
-
the null hypothesis. In the two-sample case, this is equivalent
-
to stating that the means of our two samples
-
are not equal. Calculating this P-value for a
-
given set of data can be kind of tedious.
-
Thankfully, we seldom have to perform this calculation explicitly.
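As a rough sketch of what that calculation involves, the snippet below computes the t-statistic, the approximate degrees of freedom, and a two-sided P-value by hand, then compares them with SciPy's `ttest_ind` called with `equal_var=False`. The helper name `welch_t_test`, the two small samples, and the choice of 0.05 for P critical are all assumptions made for illustration; the data are made up and do not come from the course.

```python
import numpy as np
from scipy import stats


def welch_t_test(sample1, sample2):
    """Return (t, nu, p) for a two-sided Welch's two-sample t-test."""
    x1 = np.asarray(sample1, dtype=float)
    x2 = np.asarray(sample2, dtype=float)
    n1, n2 = x1.size, x2.size

    # Sample means and unbiased sample variances (ddof=1).
    mu1, mu2 = x1.mean(), x2.mean()
    var1, var2 = x1.var(ddof=1), x2.var(ddof=1)

    # t = (mu1 - mu2) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)
    v1, v2 = var1 / n1, var2 / n2
    t = (mu1 - mu2) / np.sqrt(v1 + v2)

    # Approximate degrees of freedom, with nu_i = n_i - 1.
    nu = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # Two-sided P-value from the t distribution with nu degrees of freedom.
    p = 2.0 * stats.t.sf(abs(t), nu)
    return t, nu, p


# Made-up samples of unequal size, for illustration only.
a = [0.251, 0.302, 0.275, 0.288, 0.264, 0.310]
b = [0.243, 0.259, 0.281, 0.236, 0.270]

t, nu, p = welch_t_test(a, b)

# Library version of the same test, used here as a cross-check.
t_ref, p_ref = stats.ttest_ind(a, b, equal_var=False)

p_critical = 0.05  # assumed threshold; pick it before looking at the data
print(f"t = {t:.4f}, nu = {nu:.2f}, P = {p:.4f}")
print("reject the null hypothesis" if p < p_critical else "fail to reject the null hypothesis")
```

The hand-rolled version is only there to mirror the two equations; in practice the single library call is usually enough.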