## Info Loss and Principal Components - Intro to Machine Learning

Showing Revision 3 created 05/25/2016 by Udacity Robot.

1. So if this red line is our principal component,
2. then the information loss is going to be something like the sum of all
3. these distances that I'm drawing in here.
4. That is, the distances between the points and
5. their new projected spots on the new feature, the line.
6. And we can sum this up over all the points, we'll get some number.
7. And here's the key insight.
8. Let me draw in another principal component that we could have hypothesized
9. as the first principal component, as the one we wanted to use.
10. Let's suppose that instead of the red line,
11. we were looking at this purple line instead.
12. Then we can ask the same question of the purple line:
13. what's the information loss when we project all of the points down onto it?
14. And we'll start to get something that looks like this.
15. I know it's a little bit cluttered, but
16. I hope what you can see is that on average these purple lines are all
17. going to be significantly longer than the red lines.
18. For any given point that might not be true, but for
19. the points in aggregate, it will be true.
20. So when we maximize the variance, we're actually minimizing the distance
21. between the points and their new spots on the line.
22. In other words, it's a mathematical fact that when we do this projection onto
23. the direction of maximal variance, and only onto that direction, we'll be
24. minimizing the distance from the old point to the new transformed point.
25. And what this is necessarily doing is minimizing the information loss.
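
The comparison between the red and purple lines can be checked numerically. The following is a minimal sketch, not the lecture's own code: the synthetic data, the names `red_line` and `purple_line`, and the helper functions are all assumptions made for illustration. It measures information loss as the sum of squared distances from each point to its projection, which is the quantity that the direction of maximal variance minimizes exactly.

```python
import numpy as np

def projection_loss(points, direction):
    """Sum of squared distances from each point to its projection onto
    the line through the data mean that runs along `direction`."""
    centered = points - points.mean(axis=0)
    unit = direction / np.linalg.norm(direction)
    along = centered @ unit              # coordinate of each point along the line
    projected = np.outer(along, unit)    # the "new spot" of each point on the line
    return float(np.sum((centered - projected) ** 2))

def projected_variance(points, direction):
    """Variance of the data after projection onto `direction`."""
    centered = points - points.mean(axis=0)
    unit = direction / np.linalg.norm(direction)
    return float(np.var(centered @ unit))

# Hypothetical 2D data, stretched along one diagonal direction.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
points = np.column_stack([x, 0.5 * x + 0.2 * rng.normal(size=200)])

# "Red line": first principal component, taken here as the eigenvector of the
# covariance matrix with the largest eigenvalue (eigh sorts eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(np.cov(points, rowvar=False))
red_line = eigvecs[:, -1]

# "Purple line": an arbitrary alternative direction chosen for comparison.
purple_line = np.array([1.0, -0.3])

red_loss = projection_loss(points, red_line)
purple_loss = projection_loss(points, purple_line)

print("variance along red line:   ", projected_variance(points, red_line))
print("variance along purple line:", projected_variance(points, purple_line))
print("info loss for red line:    ", red_loss)
print("info loss for purple line: ", purple_loss)
```

For any direction, the projected variance (times the number of points) plus the loss adds up to the same total spread of the data, so the direction with the largest variance has to be the one with the smallest loss. Any single point may still sit closer to the purple line than to the red one; the guarantee is only for the points in aggregate.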