
Title:
Welch's Two-Sample t-Test - Intro to Data Science

Description:

Let's talk more about the two-sample t-test, since we'll want

to compare two different samples in our class project. There are

a few different versions of the t-test that one might employ,

and they really depend on what assumptions we make about the

data. So we might want to ask questions such as: do our

samples have the same size, and do they have the same

variance? Let's discuss a variant of the t-test called Welch's

t-test in more depth, since it's the most general. It doesn't assume

equal sample sizes or equal variances. In Welch's

t-test, we compute a t-statistic using the following equation.

t equals mu1 minus mu2, divided by the

square root of sigma1 squared over n1 plus

sigma2 squared over n2, where mu_i is the sample mean for the i-th sample,

sigma_i squared is the sample variance for

the i-th sample, and n_i is the sample size

for the i-th sample. We'll also want to estimate the number of degrees

of freedom, nu, using the following equation.

Nu is approximately equal to the quantity sigma1

squared over n1 plus sigma2 squared over n2, squared, over sigma1 to the

4th over n1 squared nu1, plus sigma2 to the 4th over n2 squared nu2.
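Written out in symbols, the two spoken formulas above would read roughly as follows. This is only a transcription of the equations as described in the narration, using the same symbols (mu, sigma, n, nu); the exact typesetting on the original slide is an assumption.

```latex
t = \frac{\mu_1 - \mu_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\qquad
\nu \approx \frac{\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{2}}
                 {\dfrac{\sigma_1^4}{n_1^2\,\nu_1} + \dfrac{\sigma_2^4}{n_2^2\,\nu_2}}
```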

Here, nu_i is equal to n_i minus one, and

this is the degrees of freedom associated with the i-th variance

estimate. If you're unfamiliar with degrees of freedom, again, it might

be a good idea to brush up on your stats concepts

with Udacity's intro to stats course. A link is

provided in the instructor comments. All right, so once we have

these two values, we can estimate the p-value. Conceptually, the

p-value is the probability of obtaining a test statistic at least

as extreme as the one that was actually observed,

assuming that the null hypothesis is true. The p-value

is not the probability that the null hypothesis

is true given the data. So again, just as a

thought experiment, say we were testing whether left-handed

or right-handed baseball players were better batters by looking

at their average batting average. If the p-value

is 0.05, this would mean that even if there is

no difference between left-handed and right-handed batters, since

that's our null hypothesis, so even if this were true,

we would still see a value of t equal to or greater

than the one that we saw 5% of the time.
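To make that reading of the p-value concrete, here is a minimal simulation sketch. It repeatedly draws two groups from the same distribution (so the null hypothesis holds by construction) and counts how often Welch's t is at least as extreme as an observed value. The batting-average mean and spread, the group sizes, and the observed t of 2.0 are all hypothetical numbers chosen for illustration, and the helper names are not from the lecture.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (sizes and variances may differ)."""
    var_a = statistics.variance(a)  # sample variance (n - 1 in the denominator)
    var_b = statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b)
    )


def simulated_p_value(t_observed, n_a, n_b, trials=2000, seed=0):
    """Fraction of null-hypothesis experiments with |t| >= |t_observed|."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis both groups share one distribution.
        # Mean 0.270 and standard deviation 0.03 are made-up values.
        a = [rng.gauss(0.270, 0.03) for _ in range(n_a)]
        b = [rng.gauss(0.270, 0.03) for _ in range(n_b)]
        if abs(welch_t(a, b)) >= abs(t_observed):
            hits += 1
    return hits / trials


# Hypothetical observed statistic and group sizes.
p = simulated_p_value(t_observed=2.0, n_a=40, n_b=60)
print(f"simulated two-sided p-value: {p:.3f}")
```

The printed fraction is an estimate of the two-sided p-value: how often a difference this large shows up when there is truly no difference between the groups.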

When performing a statistical test like this, we usually set

some critical value of p. Let's call it p-critical.

If p falls below p-critical, then we would reject

the null hypothesis. In the two-sample case, this is equivalent

to stating that the underlying means for our two samples

are not equal. Calculating this p-value for a

given set of data can be kind of tedious.

Thankfully, we seldom have to perform this calculation explicitly.
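As one possible illustration, the sketch below computes t and nu by hand from the formulas described earlier, then compares the result with SciPy's `ttest_ind` called with `equal_var=False`, which selects Welch's variant. Using SciPy here is an assumption (the narration up to this point does not name a library), and the batting averages and the 0.05 threshold are made-up example values.

```python
import math
import statistics

from scipy import stats


def welch_t_and_df(a, b):
    """Return Welch's t-statistic and the approximate degrees of freedom nu."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se1, se2 = v1 / n1, v2 / n2  # sigma_i^2 / n_i for each sample
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se1 + se2)
    # se_i^2 / (n_i - 1) equals sigma_i^4 / (n_i^2 * nu_i), with nu_i = n_i - 1.
    nu = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, nu


# Hypothetical batting averages, made up for illustration only.
left_handed = [0.251, 0.289, 0.304, 0.262, 0.275, 0.318, 0.243]
right_handed = [0.248, 0.266, 0.259, 0.271, 0.240, 0.255, 0.262, 0.249, 0.268]

t, nu = welch_t_and_df(left_handed, right_handed)
p_by_hand = 2 * stats.t.sf(abs(t), nu)  # two-sided p-value from the t distribution

# equal_var=False asks SciPy for Welch's t-test rather than Student's t-test.
result = stats.ttest_ind(left_handed, right_handed, equal_var=False)

p_critical = 0.05  # example threshold; choose it before looking at the data
print(f"by hand: t = {t:.3f}, nu = {nu:.2f}, p = {p_by_hand:.4f}")
print(f"scipy:   t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
print("reject the null" if result.pvalue < p_critical else "fail to reject the null")
```

The hand computation and the library call should agree to floating-point precision, which is a quick way to check that the formulas were transcribed correctly.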