## ← Complex Queries 2 - Intro to Data Science

• 2 Followers
• 46 Lines

### Get Embed Code x Embed video Use the following code to embed this video. See our usage guide for more details on embedding. Paste this in your document somewhere (closest to the closing body tag is preferable): ```<script type="text/javascript" src='https://amara.org/embedder-iframe'></script> ``` Paste this inside your HTML body, where you want to include the widget: ```<div class="amara-embed" data-url="http://www.youtube.com/watch?v=utx7ObmE25U" data-team="udacity"></div> ``` 4 Languages

Showing Revision 5 created 05/25/2016 by Udacity Robot.

1. The last thing that I want to quickly
2. discuss ,are some functions that usually exist in query
3. languages like Sequel. Such as ,group bys and
4. aggregate functions. So let's say that I wanted to
5. create some transformed version of my data. For
6. example, what are the total number of enrollments per
7. district? . I could write something like this. Select
10. a little bit more. What's happening exactly?
11. . Sum is what we call an aggregate function. An aggregate function takes some
12. set of values, usually numbers and performs
13. a mathematical operation on them. We've used
14. sum ,but other aggregate functions include count.
15. Min, mean, max. You get the idea.
16. Operations that one could perform on a collection
17. of numbers. But wait. Every single row is
18. only one number. So how do we get
19. to collections of numbers? What we basically say here
20. is take each distinct district. And then for
21. all of the different values of aadhaar_generated corresponding
22. to a row ,for that district. Sum them
24. Take each district ,and sum up the count aadhaar
25. generated. Now in order for our results to make
26. sense, we are only going to want one row
27. in our output for each district. So we throw in
28. this group by clause on the end, which essentially
29. says. Let's only have one row per district in
30. our results. There can be numerous clauses in a
31. group buy. So we could also say this, select district,
33. by district, subdistrict. Note that whatever columns we
34. select, that we don't aggregate, we need to
35. group by. In this case, district and sub-district. We
36. could also put a where clause here, so
37. in order to sum up aadhaar generated for people
38. over 60 in each district, I can just add a where clause in after the table name,
39. as we discussed earlier. If we were to run this query. Giving us select
41. where age greater than 60, group by district, sub-district. If we
42. were to run this query, we would have
43. a row for every combination of district and
44. subdistrict. And we would also have for each
45. row, account of how many aadhaar were generated ,for
46. people over the age of 60.