English subtitles

Box Plots

Showing Revision 2 created 05/24/2016 by Udacity Robot.

  1. It looks like females on average have slightly more friends
  2. than males, since I can see that this median line
  3. is slightly higher. That's what this black line is. It
  4. represents the median, or the 50th percentile, of friend counts for
  5. females and for males. Now this difference isn't very large.
  6. So let's zoom in to take a closer look. This box
  7. for females and this box for males each represent the middle
  8. 50% of values in our sample. So, I think it makes
  9. sense that we zoom in even more to take a closer
  10. look. We should consider any values less than 250. Now, there's no
  11. exact choice here, I'm just choosing something that seems reasonable, since the
  12. bulk of my data is down here. After running this code, we
  13. can now see that the bulk of user friend count is
  14. similar for the middle 50% of men as it is for the
  15. middle 50% of women. It's just that our females are slightly higher for
  16. friend count. Let's look at actual values though and compare the values
  17. to what we see in our box plot. We can look at
  18. those values by using the by command and running a summary of
  19. our friend count split by gender. So first, I want to include
  20. my friend count which is the variable I want a summary of. I
  21. want to split it over gender and I want a summary. Running this
  22. code, I get an output of my table, which shows me the
  23. minimum and maximum values for both genders, as well as the quartiles.
  24. The first quartile for women is 37 and that looks about right
  25. in our graph. The third quartile or the 75%
  26. mark is at 244 and that's all the way up
  27. here. This means that 75% of female users have friend
  28. counts below 244. Or another way to say this is
  29. that 25% of female users have more than 244 friends.
  30. Similarly for the men, we can see how the first
  31. quartiles and the third quartiles match up to the box
  32. plot. Now, you might have remembered that we used coord_cartesian
  33. in the solution video from before. We did this so that
  34. the table output would match our box plots. If we had
  35. just used the ylim parameter inside of qplot, we would have
  36. gotten different quantiles that wouldn't match our picture. This is just a
  37. subtle difference that you should be aware of when working in
  38. R. Now, it's your turn to answer a different question. On average,
  39. who initiated more friendships in our sample? Was it men or was
  40. it women? Use some of the techniques that we just covered and
  41. then write a few sentences explaining how you
  42. came up with your answer. This second question won't
  43. be automatically graded, but it's important that you
  44. know how to communicate your analysis to other people.
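
The by summary described in the transcript can be sketched in R roughly as follows. The code itself is not shown in the subtitles, so this is only an approximation: the data frame name `pf`, the column names `friend_count` and `gender`, and the simulated values are all assumptions, and the printed numbers will not match the 37 and 244 quoted in the video.

```r
# Synthetic stand-in data: the data frame name `pf`, the column names
# `friend_count` and `gender`, and the values themselves are assumptions.
set.seed(42)
n <- 1000
pf <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  friend_count = rnbinom(n, size = 0.6, mu = 200)
)

# Summarize friend_count separately for each gender:
# first argument is the variable, second is the grouping, third is the function.
by_gender <- by(pf$friend_count, pf$gender, summary)
print(by_gender)

# The third quartile splits a group roughly 75% / 25%:
# about a quarter of female users have more friends than this value.
female_counts <- pf$friend_count[pf$gender == "female"]
q3_female <- quantile(female_counts, probs = 0.75)
share_above_q3 <- mean(female_counts > q3_female)
print(share_above_q3)
```

The last two lines check the reading of the third quartile given in the video: the share of female users above it should come out close to one quarter.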
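
The difference between coord_cartesian and the ylim parameter mentioned in the transcript can be illustrated with a small sketch. It assumes synthetic data and the hypothetical names `pf`, `friend_count`, and `gender`, and it uses `ggplot()` with `scale_y_continuous(limits = ...)` in place of the `qplot` call from the video, since setting y limits on the scale is what drops rows before the box plot statistics are computed. The plotting part only runs if ggplot2 is installed.

```r
# Synthetic stand-in data; names and values are assumptions, not the course data.
set.seed(42)
n <- 1000
pf <- data.frame(
  gender = factor(sample(c("female", "male"), n, replace = TRUE)),
  friend_count = rnbinom(n, size = 0.6, mu = 200)
)

# Helper (hypothetical name) returning the three quartiles of a vector.
quartiles <- function(x) quantile(x, probs = c(0.25, 0.5, 0.75))

# Quartiles from all rows: this is what a by() summary reports, and what
# a box plot zoomed with coord_cartesian() still shows.
q_all <- sapply(split(pf$friend_count, pf$gender), quartiles)

# Quartiles after discarding rows above 250: this mimics what happens
# when y limits are set on the scale, which removes out-of-range rows.
kept <- pf[pf$friend_count <= 250, ]
q_dropped <- sapply(split(kept$friend_count, kept$gender), quartiles)

print(q_all)
print(q_dropped)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  base_plot <- ggplot(pf, aes(x = gender, y = friend_count)) + geom_boxplot()
  # Zooms the view only; boxes are computed from all rows.
  p_zoom <- base_plot + coord_cartesian(ylim = c(0, 250))
  # Drops rows outside the limits first; boxes are computed from the rest.
  p_drop <- base_plot + scale_y_continuous(limits = c(0, 250))
}
```

Comparing the two printed tables shows why the zoomed plot is the one that matches the summary output: the quartiles computed after dropping rows are lower than those computed from the full sample. Calling `print(p_zoom)` and `print(p_drop)` would render the two plots for a visual comparison.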