English subtitles

Scatterplot Transformation - Data Analysis with R

Showing Revision 2 created 05/25/2016 by Udacity Robot.

  1. Now that we have a better understanding of our variables, and
  2. the overall demand for diamonds, let's replot the data. This time
  3. we'll put price on a log10 scale, and here's what it
  4. looks like. This plot looks better than before. On the log
  5. scale, the prices look less dispersed at the high end of
  6. carat size and price, but actually we can do better. Let's
  7. try using the cube root of carat in light of our
  8. speculation about flaws being exponentially more
  9. likely in diamonds with more volume.
  10. Remember, volume is on a cubic scale. First, we need
  11. a function to transform the carat variable. If you'd like to
  12. learn more about writing your own functions in R, check out
  13. the links in the instructor notes. This may seem like a
  14. lot of code, but really, there's only one new piece here.
  15. It's this cube root trans function. It's a function that takes the
  16. cube root of any input variable, and it also has an
  17. inverse function to undo that operation, which we need to display
  18. the plot correctly. Then, when we get to
  19. our actual ggplot command, what we'll do is
  20. use the scale_x_continuous function to transform the x
  21. axis with this cube root transformation function. Keep in
  22. mind we're also transforming the y axis with
  23. this log10 transformation that we discussed previously. And, let's
  24. see what this plot looks like. Taking a
  25. look at the plot, we can actually see that,
  26. with the transformations we used to get
  27. our data on this nice scale, things look almost
  28. linear. We can now move forward and see
  29. about modelling our data using just a linear model.
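
The code shown on screen is not part of this transcript, so here is a minimal sketch of what the narration describes. It assumes the ggplot2 and scales packages are installed and that the data is the diamonds data set bundled with ggplot2. The name cuberoot_trans is a guess based on the spoken phrase "cube root trans function", and the choice of geom_point with some transparency is also an assumption.

```r
library(ggplot2)  # ggplot() and the bundled diamonds data set
library(scales)   # trans_new() and log10_trans()

# Assumed name: a transformation object that pairs the cube root
# with its inverse, so ggplot2 can label the axis in original units.
cuberoot_trans <- function() {
  trans_new(
    name      = "cuberoot",
    transform = function(x) x^(1/3),  # applied to the data before plotting
    inverse   = function(x) x^3       # undoes the transform for axis labels
  )
}

# Cube root scale on x (carat), log10 scale on y (price).
# Note: recent ggplot2 releases prefer `transform =` over `trans =`.
p <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.5) +
  scale_x_continuous(trans = cuberoot_trans()) +
  scale_y_continuous(trans = log10_trans()) +
  ggtitle("Price (log10) by cube root of carat")

print(p)
```

The inverse is what lets the axis show carat values rather than their cube roots: the points are positioned on the transformed scale, while the tick labels are mapped back to the original units.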