## Non-Normal Data - Intro to Data Science

Showing Revision 2 created 05/25/2016 by Udacity Robot.

1. When performing a t-test, we assume that our data
2. is normal. In the wild, you'll often encounter probability distributions
3. that are distinctly not normal. They might look like this, or
4. like this, or completely different. As you'd imagine, there are
5. still statistical tests that we can utilize when our data
6. is not normal. Why don't we briefly discuss what you
7. might do in situations like this. First off, we should
8. have some machinery in place for determining whether or not
9. our data is Gaussian in the first place. A crude, inaccurate
10. way of determining whether or not our data is normal is
11. simply to plot a histogram of our data and ask, does
12. this look like a bell curve? In both of these cases, the
13. answer would definitely be no. But, we can do a little
14. bit better than that. There are some statistical tests that we
15. can use to assess whether a sample is plausibly drawn
16. from a normally distributed population. One such test is the Shapiro-Wilk test.
17. I don't want to go into great depth with
18. regards to the theory behind this test, but I do
19. want to let you know that it's implemented in scipy.
20. You can call it really easily like this. W and
21. P are going to be equal to scipy.stats.shapiro(data), where
22. our data here is just an array or list containing
23. all of our data points. This function's going to return these
24. two values. The first, W, is the Shapiro-Wilk test statistic.
25. The second value in this tuple is going
26. to be our p-value, which should be interpreted
27. in the same way that we would interpret
28. the p-value for our t-test. That is, given the
29. null hypothesis that this data is drawn from
30. a normal distribution, what is the probability that we
31. would observe a value of W that was at least as extreme as the one that we see?
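
The call described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own code: the two synthetic samples, the helper name `looks_normal`, and the 0.05 cutoff are all assumptions made for the example. Only `scipy.stats.shapiro` and its two return values come from the video.

```python
import numpy as np
import scipy.stats

# Synthetic data (assumption): one sample from a normal distribution,
# one from a clearly skewed (exponential) distribution.
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)


def looks_normal(data, alpha=0.05):
    """Run the Shapiro-Wilk test on `data`.

    Returns (w, p, keep_null), where keep_null is True when we fail to
    reject the null hypothesis of normality at significance level `alpha`.
    The helper name and the default alpha are hypothetical choices.
    """
    w, p = scipy.stats.shapiro(data)  # W statistic and p-value
    return w, p, bool(p >= alpha)


for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p, keep_null = looks_normal(sample)
    print(f"{name}: W={w:.3f}, p={p:.4g}, consistent with normal: {keep_null}")
```

A small p-value suggests the data is unlikely to have come from a normal distribution. A large p-value only means the test found no evidence against normality; it does not prove the data is normal.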