English subtitles

Info Loss and Principal Components - Intro to Machine Learning


Showing Revision 3 created 05/25/2016 by Udacity Robot.

  1. So if this red line is our principal component,
  2. then the information loss is going to be something like the sum of all
  3. these distances that I'm drawing in here.
  4. The distance between the points and
  5. their new projected spots on the new feature on the line.
  6. And we can sum this up over all the points, we'll get some number.
  7. And here's the key insight.
  8. Let me draw in another principal component that we could have
  9. hypothesised as the first principal component, the one we wanted to use.
  10. Let's suppose that instead of the red line,
  11. we were looking at this purple line instead.
  12. Then we can ask the same question of the purple line,
  13. what's the information loss when we project all of the points down onto it?
  14. And we'll start to get something that looks like this.
  15. I know it's a little bit cluttered, but
  16. I hope what you can see is that on average these purple lines are all
  17. going to be significantly longer than the red lines.
  18. For any given point that might not be true, but for
  19. the points in aggregate, it will be true.
  20. So when we maximize the variance, we're actually minimizing the distance
  21. between the points and their new spots on the line.
  22. In other words, it's a mathematical fact that when we do this projection onto
  23. the direction of maximal variance, and only onto that direction, we'll be
  24. minimizing the distance from the old point to the new transformed point.
  25. And what this is necessarily doing is minimizing the information loss.
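
The mathematical fact mentioned above is a consequence of the Pythagorean theorem: for a centered point x and a unit direction w, ‖x‖² = (wᵀx)² + dist², where dist is the distance from x to its projection on the line along w. Since the sum of ‖x‖² over all points is fixed, maximizing the variance Σ(wᵀx)² is the same as minimizing the total squared distance. The sketch below (not from the lecture; the data and the "purple line" direction are illustrative choices) checks this numerically with NumPy, using the SVD to find the first principal component:

```python
import numpy as np

rng = np.random.default_rng(0)

# Correlated 2-D data, centered so lines through the origin are candidates.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.5, 0.5]])
X -= X.mean(axis=0)

# First principal component = top right-singular vector of the data matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = Vt[0]

def info_loss(X, direction):
    """Sum of squared distances from each point to its projection
    onto the line through the origin along `direction`."""
    d = direction / np.linalg.norm(direction)
    proj = np.outer(X @ d, d)        # projected points on the line
    return np.sum((X - proj) ** 2)   # total squared distance "lost"

loss_pc1 = info_loss(X, pc1)                      # the "red line"
loss_other = info_loss(X, np.array([0.0, 1.0]))   # an arbitrary "purple line"

# The max-variance direction loses the least information.
assert loss_pc1 < loss_other
print(loss_pc1, loss_other)
```

For any given point the red-line distance can be larger, but in aggregate the principal-component direction always gives the smallest total squared distance, matching the claim in the lecture.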