
Welch's Two-Sample t-Test - Intro to Data Science


Showing Revision 5 created 05/24/2016 by Udacity Robot.

Let's talk more about the two-sample t-test, since we'll want to compare two different samples in our class project. There are a few different versions of the t-test that one might employ, and which one we use really depends on what assumptions we make about the data. So we might want to ask questions such as: do our samples have the same size, and do they have the same variance? Let's discuss a variant of the t-test called Welch's t-test in more depth, since it's the most general: it doesn't assume equal sample sizes or equal variances. In Welch's t-test, we compute a t-statistic using the following equation:

$$ t = \frac{\mu_1 - \mu_2}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}} $$

where $\mu_i$ is the sample mean for the $i$-th sample, $\sigma_i^2$ is the sample variance for the $i$-th sample, and $N_i$ is the sample size for the $i$-th sample. We'll also want to estimate the number of degrees of freedom, $\nu$, using the following equation:

$$ \nu \approx \frac{\left( \frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2} \right)^2}{\frac{\sigma_1^4}{N_1^2 \nu_1} + \frac{\sigma_2^4}{N_2^2 \nu_2}} $$

where $\nu_i = N_i - 1$ is the degrees of freedom associated with the $i$-th variance estimate. If you're unfamiliar with degrees of freedom, it might be a good idea to brush up on your stats concepts with Udacity's intro to stats course. A link is provided in the instructor comments.

All right, so once we have these two values, $t$ and $\nu$, we can estimate the p-value. Conceptually, the p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. The p-value is not the probability that the null hypothesis is true given the data.

So again, just as a thought experiment, say we were testing whether left-handed or right-handed baseball players were better batters by comparing the average batting averages of the two groups. Our null hypothesis is that there is no difference between left-handed and right-handed batters. If the p-value is 0.05, this would mean that, even if that null hypothesis were true, we would see a value of $t$ at least as extreme as the one we observed 5% of the time.
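
To make the formulas and the p-value definition above concrete, here is a minimal Python sketch. It is an illustration only, not the course's code. `welch_t_and_df` applies the equations for $t$ and $\nu$ directly. `simulated_p_value` is a Monte Carlo stand-in for the p-value: it repeatedly draws two samples from populations that share the same mean (so the null hypothesis is true by construction), computes $t$ each time, and reports the fraction of draws with $|t|$ at least as large as the observed value. Both helper names are hypothetical. The normal populations and the standard deviations `sd1` and `sd2` are assumptions, and the two-sided reading of "at least as extreme" (comparing magnitudes) is also an assumption.

```python
import math
import random
import statistics


def welch_t_and_df(sample1, sample2):
    """Hypothetical helper: Welch's t-statistic and its approximate degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    mu1, mu2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance divides by n - 1, i.e. it is the sample variance sigma_i^2.
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    se1, se2 = var1 / n1, var2 / n2  # sigma_i^2 / N_i
    t = (mu1 - mu2) / math.sqrt(se1 + se2)
    nu1, nu2 = n1 - 1, n2 - 1  # nu_i = N_i - 1
    # sigma_i^4 / (N_i^2 * nu_i) equals (sigma_i^2 / N_i)^2 / nu_i
    nu = (se1 + se2) ** 2 / (se1 ** 2 / nu1 + se2 ** 2 / nu2)
    return t, nu


def simulated_p_value(t_observed, n1, n2, sd1=1.0, sd2=1.0, trials=10_000, seed=0):
    """Hypothetical helper: Monte Carlo estimate of P(|t| >= |t_observed|) under the null.

    Assumes both populations are normal with the SAME mean (the null hypothesis).
    Their standard deviations may differ, since Welch's test allows that.
    """
    rng = random.Random(seed)
    extreme = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd1) for _ in range(n1)]
        b = [rng.gauss(0.0, sd2) for _ in range(n2)]
        t, _ = welch_t_and_df(a, b)
        # Two-sided: "at least as extreme" means at least as large in magnitude.
        if abs(t) >= abs(t_observed):
            extreme += 1
    return extreme / trials


if __name__ == "__main__":
    # Toy numbers for illustration only, not real batting averages.
    t, nu = welch_t_and_df([1, 2, 3, 4], [2, 4, 6, 8])
    print(f"t = {t:.3f}, nu = {nu:.3f}")  # t = -1.732, nu = 4.412
    print("simulated p:", simulated_p_value(t, 4, 4))
```

The simulated p-value comes from an assumed null population, so treat it as an illustration of the definition rather than a substitute for the exact calculation. With small samples in particular, the estimate depends on the `sd1` and `sd2` you choose.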

When performing a statistical test like this, we usually set some critical value of p. Let's call it $p_{critical}$. If p falls below $p_{critical}$, then we reject the null hypothesis. In the two-sample case, this is equivalent to concluding that the means of the populations our two samples were drawn from are not equal. Calculating this p-value by hand for a given set of data can be kind of tedious. Thankfully, we seldom have to perform this calculation explicitly.
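
In Python, a statistics library usually does this for us. For example, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's t-test and returns the t-statistic and the p-value. To show roughly what such a call computes, here is a standard-library-only sketch that takes $t$ and $\nu$ as inputs. It relies on the identity $P(|T| \ge |t|) = I_{\nu/(\nu + t^2)}(\nu/2, 1/2)$ for Student's t distribution, where $I$ is the regularized incomplete beta function. Here $I$ is evaluated with a standard continued-fraction method (modified Lentz), which is a numerical technique chosen for this sketch and not something the lecture specifies. `welch_decision` is a hypothetical helper that applies the rule "reject if p < $p_{critical}$". The default $p_{critical} = 0.05$ is an assumption.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function, for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise
    # use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def welch_p_value(t, nu):
    """Two-sided p-value P(|T| >= |t|) for Student's t with nu degrees of freedom.

    nu may be non-integer, as Welch's estimate usually is.
    """
    return regularized_incomplete_beta(nu / 2.0, 0.5, nu / (nu + t * t))


def welch_decision(t, nu, p_critical=0.05):
    """Hypothetical helper: return (p, reject_null) using the rule p < p_critical."""
    p = welch_p_value(t, nu)
    return p, p < p_critical


if __name__ == "__main__":
    # Arbitrary example inputs; in practice t and nu come from the two samples.
    p, reject = welch_decision(t=-1.732, nu=4.412)
    print(f"p = {p:.3f}, reject null at 0.05: {reject}")
```

Writing the p-value through the incomplete beta function means non-integer degrees of freedom, which Welch's formula typically produces, need no special handling.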