English subtitles

Problem with SSE - Intro to Machine Learning


Showing Revision 3 created 05/25/2016 by Udacity Robot.

  1. In this case, the distribution on the right is going to have the larger sum of
  2. squared errors, and it should be fairly straightforward to see why.
  3. You can compare point by point; these would be all the errors on the left.
  4. There's a very similar sum of squared errors on the right for these data points.
  5. But then, on the right, you have all these additional data points.
  6. And each one of those is going to contribute a little bit of error
  7. that'll add to the overall sum of squared errors of the fit here.
  8. So what that means is that the distribution on the right has a larger sum of
  9. squared errors even though we agreed that it's probably not
  10. doing a much worse job of fitting the data than the distribution on the left.
  11. And this is one of the shortcomings of the sum of squared errors in
  12. general as an evaluation metric:
  13. as you add more data, the sum of squared errors will almost certainly
  14. go up, but that doesn't necessarily mean that your fit is doing a worse job.
  15. However, if you're comparing two sets of data that have different numbers of
  16. points in them, then this can be a big problem, because if you're using the sum of
  17. squared errors to figure out which one is being fit better,
  18. then the sum of squared errors can be jerked around by the number of data
  19. points that you're using, even though the fit might be perfectly fine.
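To make that concrete, here is a minimal sketch (not code from the lesson; the line y = 2x + 1 and the data points are made up for illustration) showing how the SSE doubles when the number of points doubles, even though every point sits the same distance from the fitted line:

```python
def sse(actual, predicted):
    """Sum of squared errors between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

# A hypothetical fitted line: y = 2x + 1 (an assumption for this sketch)
def predict(x):
    return 2 * x + 1

# Small dataset: 4 points, each sitting 0.5 above the line
xs_small = [0, 1, 2, 3]
ys_small = [predict(x) + 0.5 for x in xs_small]

# Larger dataset: 8 points with the exact same per-point error of 0.5
xs_large = [0, 1, 2, 3, 4, 5, 6, 7]
ys_large = [predict(x) + 0.5 for x in xs_large]

sse_small = sse(ys_small, [predict(x) for x in xs_small])  # 4 * 0.5**2 = 1.0
sse_large = sse(ys_large, [predict(x) for x in xs_large])  # 8 * 0.5**2 = 2.0

print(sse_small, sse_large)  # 1.0 2.0
```

The fit is equally good in both cases (every residual is 0.5), yet the SSE of the larger dataset is twice as big, which is exactly why comparing SSE across datasets of different sizes is misleading.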
  20. So this motivates me to tell you about one other evaluation metric that's very
  21. popular when evaluating regressions.