Understanding Noise - Age to Age Months - Data Analysis with R

Showing Revision 2 created 05/24/2016 by Udacity Robot.

  1. Let's return to our scatter plot that summarized the relationship between
  2. age and mean friend count. Recall that we ended up creating this
  3. plot from the new data frame that we created using the
  4. dplyr package. The plot looked like this. As you can
  5. see, the black line has a lot of random noise to
  6. it. That is, the mean friend count rises and falls over each
  7. age. Let's print out some of our data frame to have
  8. a closer look. As we can see, the mean friend count increases,
  9. then decreases later. In one particular case, we can see
  10. that for 30-year-olds, the mean friend count is
  11. actually lower compared to the 29-year-olds and the
  12. 31-year-olds. Now, some year-to-year discontinuities might make
  13. sense, such as the spike at age 69. But others
  14. are likely to be just noise around the true, smoother
  15. relationship between age and friend count. That is, they reflect
  16. that we just have a sample from the data-generating process.
  17. And so the estimated mean friend count for each age
  18. is the true mean plus some noise. We can imagine
  19. that the noise for this plot would be worse if
  20. we chose finer bins for age. For example, we could estimate
  21. conditional means for each age, measured in months instead of
  22. years. Over the next few programming exercises, you're going to
  23. do just that. You're going to create a plot just like
  24. this one with a new variable that measures age in months
  25. instead of years. Then you'll plot the conditional mean for
  26. ages in months, and we'll compare this graph to the
  27. one that you create. To start, you're going to create
  28. the age_with_months variable and save it into the data
  29. frame. This variable will have each user's age measured in
  30. months rather than in years. So, if a user is 36
  31. years old and was born in March, the user's age
  32. would be 36.75. Try coding this up in R for yourself.
  33. And then once you have the code, copy and
  34. paste it into the browser and submit. Now, this is
  35. one of the exercises where the grader will automatically
  36. check your output. Don't worry if you don't get this
  37. one right on your first try. It's pretty tough.
  38. I really recommend thinking about ages and people being born
  39. in different months. How would that affect the age_with_months
  40. variable? Working with actual values might help you here.
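
The 36.75 figure from the example above can be reproduced with a little arithmetic. What follows is only a minimal sketch, not the course's graded solution. It assumes that age is measured at the end of the year, so a user born in month m has lived a further (12 - m) / 12 of a year beyond their age in whole years. The toy data frame `users` and the column names `dob_month` and `friend_count` are hypothetical stand-ins for the course data, and base R's `aggregate` is used in place of dplyr so the snippet runs without extra packages.

```r
# Toy data standing in for the course data frame. The names `users`,
# `dob_month`, and `friend_count` are assumptions made for illustration.
users <- data.frame(
  age          = c(36, 36, 29, 30),
  dob_month    = c(3, 12, 6, 1),
  friend_count = c(120, 80, 200, 150)
)

# Assumed convention: age is measured at the end of the year, so a user
# born in month m has lived an extra (12 - m) / 12 of a year.
users$age_with_months <- users$age + (12 - users$dob_month) / 12

# The 36-year-old born in March comes out as 36 + 9/12 = 36.75.
print(users[, c("age", "dob_month", "age_with_months")])

# Conditional mean of friend count for each age measured in months.
means_by_month <- aggregate(friend_count ~ age_with_months,
                            data = users, FUN = mean)
print(means_by_month)
```

With only a handful of users in each monthly bin, every conditional mean rests on fewer observations than the yearly means did, which is why the finer-grained plot is expected to look noisier.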