English subtitles

← Complex Queries 2 - Intro to Data Science

Get Embed Code
4 Languages

Showing Revision 5 created 05/25/2016 by Udacity Robot.

  1. The last thing that I want to quickly
  2. discuss ,are some functions that usually exist in query
  3. languages like Sequel. Such as ,group bys and
  4. aggregate functions. So let's say that I wanted to
  5. create some transformed version of my data. For
  6. example, what are the total number of enrollments per
  7. district? . I could write something like this. Select
  8. district,. Sum (aadhaar - generated) from aadhaar - data,
  9. group by district. Let's talk about this
  10. a little bit more. What's happening exactly?
  11. . Sum is what we call an aggregate function. An aggregate function takes some
  12. set of values, usually numbers and performs
  13. a mathematical operation on them. We've used
  14. sum ,but other aggregate functions include count.
  15. Min, mean, max. You get the idea.
  16. Operations that one could perform on a collection
  17. of numbers. But wait. Every single row is
  18. only one number. So how do we get
  19. to collections of numbers? What we basically say here
  20. is take each distinct district. And then for
  21. all of the different values of aadhaar_generated corresponding
  22. to a row ,for that district. Sum them
  23. up. So, we start with our aadhaar data table.
  24. Take each district ,and sum up the count aadhaar
  25. generated. Now in order for our results to make
  26. sense, we are only going to want one row
  27. in our output for each district. So we throw in
  28. this group by clause on the end, which essentially
  29. says. Let's only have one row per district in
  30. our results. There can be numerous clauses in a
  31. group buy. So we could also say this, select district,
  32. subdistrict, sum aadhaar generated, from aadhaar data, group
  33. by district, subdistrict. Note that whatever columns we
  34. select, that we don't aggregate, we need to
  35. group by. In this case, district and sub-district. We
  36. could also put a where clause here, so
  37. in order to sum up aadhaar generated for people
  38. over 60 in each district, I can just add a where clause in after the table name,
  39. as we discussed earlier. If we were to run this query. Giving us select
  40. district, sub-district, sum aadhaar generated, from aadhaar data,
  41. where age greater than 60, group by district, sub-district. If we
  42. were to run this query, we would have
  43. a row for every combination of district and
  44. subdistrict. And we would also have for each
  45. row, account of how many aadhaar were generated ,for
  46. people over the age of 60.