English subtitles

Plotting in Python - Intro to Data Science

Showing Revision 5 created 05/25/2016 by Udacity Robot.

  1. Now, we know a bunch about how you may
  2. encode information into your visualization and how to make an
  3. effective visualization, but we still haven't yet discussed how you
  4. can make graphics like this, short of drawing them with
  5. pen and paper. There are a number of packages
  6. for plotting in Python. One of the most popular is
  7. Matplotlib. For this course, however, I'd like to go over
  8. plotting using a Python library called ggplot, which very closely
  9. recreates the syntax used in R's ggplot2 library.
  10. If Matplotlib is so widely used, why should we
  11. use ggplot? Well, I'd like to use this package
  12. for a few reasons. First, what it produces is
  13. a bit more aesthetically pleasing than Matplotlib. Second, it's
  14. an implementation of a pretty neat concept called the
  15. grammar of graphics, which basically claims that there's a
  16. grammar involved in composing graphical components of statistical graphics.
  17. The gg in ggplot actually comes from
  18. grammar of graphics. It also plays nicely with
  19. the pandas DataFrames we've been using in
  20. this course. To quickly summarize the ideas behind
  21. the grammar of graphics, plots convey information
  22. through their aesthetics such as x-position or y-position.
  23. The elements in a given plot are
  24. geometric shapes, such as points, lines, or bars.
  25. Some of these shapes can have aesthetics of
  26. their own, such as their size or their
  27. color. You can think of creating plots in
  28. ggplot through the grammar of graphics as adding layers
  29. to our plot. The first step in creating
  30. a graphic is always to create our plot, which
  31. is essentially going to be our canvas. This
  32. can be done by calling ggplot(data, aes(xvar, yvar)).
  33. Data here is going to be a pandas DataFrame, and xvar and yvar are going to be
  34. columns in that data frame. So what we're doing
  35. here is saying let's make a ggplot. The data
  36. source is going to be our data frame,
  37. and the quantities that we're interested in plotting are
  38. xvar and yvar. This might be district and number
  39. of Aadhaar enrollments or position and number of players,
  40. something like that. So what we've done here is
  41. we've made our ggplot. We've said that the data source
  42. that it will use is a pandas DataFrame, and that the
  43. variables that we'll look at are xvar and yvar. This
  44. might be district and number of Aadhaar enrollments if we were
  45. using our Aadhaar data or team and total number of
  46. players if we were using our baseball data, something like
  47. that. Okay, so far we've said that we'll
  48. have a plot which is mapping xvar to the
  49. x-axis, yvar to the y-axis, but we haven't said yet
  50. what type of geometric object is going to represent
  51. this data. So if we add + geom_point() to
  52. this statement, we'll create a scatter plot. If we
  53. also add + geom_line() to the graphic, we'll connect
  54. all these points to each other with lines. Now, say
  55. that we wanted these points to have a particular color.
  56. We can pass color='coral' into geom_point,
  57. and also pass color='coral' into geom_line.
  58. And after we do that, both the points and
  59. the lines will have the color coral. This is the
  60. second step of making a plot in ggplot, that
  61. is choosing which type of geometric objects will represent the data.
  62. The final step here is going to be adding
  63. some labels so that our plot will have some context,
  64. like a title or an x-label or a y-label. This can be done much in the same way
  65. that we added the points and lines to our
  66. plot. We can add a ggtitle to title our
  67. plot, an xlab to provide
  68. an x-label, and a ylab to do the same
  69. with the y-axis. Now all I have to do
  70. is precede this entire command with print, and I'll produce
  71. a plot in Python. Why don't you try implementing
  72. these ideas to create a graphic of your own?
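
The three steps described in the transcript (create the plot, add geometric objects, add labels) could be sketched roughly as follows. This is a minimal illustration, not the course's own code: the sample values, the column names `team` and `players`, and the helper name `build_plot` are all assumptions made for the example. It also assumes the `ggplot` Python package and pandas are installed; that package is no longer maintained and may fail to import alongside recent pandas versions, so the imports are kept inside the function.

```python
# Hypothetical sample data standing in for the baseball example in the
# transcript; the values are made up for illustration only.
records = {
    "team": [1, 2, 3, 4],
    "players": [25, 31, 28, 40],
}


def build_plot(raw=records):
    """Compose a plot layer by layer, as described in the transcript.

    Imports are done lazily so this sketch can be loaded even when the
    (assumed) pandas and ggplot packages are not available.
    """
    import pandas as pd
    from ggplot import (ggplot, aes, geom_point, geom_line,
                        ggtitle, xlab, ylab)

    data = pd.DataFrame(raw)

    # Step 1: create the plot (the "canvas"), mapping columns to x and y.
    plot = ggplot(data, aes("team", "players"))

    # Step 2: choose the geometric objects that represent the data.
    plot = plot + geom_point(color="coral") + geom_line(color="coral")

    # Step 3: add labels so the plot has some context.
    plot = plot + ggtitle("Players per team") + xlab("Team") + ylab("Number of players")

    return plot
```

Calling `print(build_plot())` should then render the figure, assuming the packages import cleanly. The transcript's "precede this entire command by print" reflects Python 2 syntax; under Python 3, `print` is a function and needs the parentheses.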