English subtitles

08-13 Mapper and Reducer with Aadhaar Data


Showing Revision 4 created 02/02/2015 by Udacity Robot.

  1. Let's look at the CSV file containing our
  2. Aadhaar enrollment data once again. Each row has
  3. a number of columns, such as registrar, enrollment
  4. agency, state, district, Aadhaar generated, enrollment rejected, and a
  5. bunch of other information. If we want to count
  6. the number of Aadhaar generated per district, the
  7. columns that we're most interested in are district
  8. and Aadhaar generated. Can you fill in the missing
  9. pieces of the mapper? So if we wanted to complete this
  10. job using the MapReduce programming model, we would need to write
  11. a mapper and reducer. Why don't you give it a try?
  12. Here is the skeleton of a mapper for this job. We
  13. go through every single line in the input. In this case,
  14. it's going to be our CSV file containing all of the
  15. rows of our Aadhaar generated data. You're going to have to go
  16. through each line, which will be a list of comma-separated values.
  17. The header row will be included. Tokenize each
  18. row using the commas and emit a key-value pair
  19. containing the district and the number of Aadhaar generated, separated
  20. by a tab. Make sure that each row has the
  21. correct number of tokens and make sure it's not the
  22. header row. In order to count the number of Aadhaar
  23. generated per district using MapReduce, we'll also have to
  24. write a reducer. Here's the skeleton of a reducer function
  25. that you'll fill in. We initialize aadhaar_generated to 0,
  26. and set old_key to None. You'll cycle through the list
  27. of key-value pairs emitted by your mapper, and print
  28. out each key once, along with the total number of
  29. Aadhaar generated, separated by a tab. You can assume that
  30. the list of key-value pairs will be ordered by
  31. key. Make sure that each key-value pair is formatted
  32. correctly before you process it. Here's a sample final
  33. key-value pair: Gujarat\t5.0.
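
The mapper and reducer described in the transcript could be sketched in Python roughly as follows. This is a minimal sketch, not the course's own solution. The six-column layout, the column indices, the "Registrar" header check, and the run_job helper are all assumptions made for illustration; the real CSV has more columns, so the indices would need adjusting.

```python
import io


def mapper(lines, out, expected_tokens=6, district_idx=3, generated_idx=4,
           header_first_token="Registrar"):
    """Emit 'district<TAB>aadhaar_generated' for every valid data row.

    All keyword defaults are hypothetical and describe a simplified
    six-column file, not the real Aadhaar CSV.
    """
    for line in lines:
        tokens = line.rstrip("\n").split(",")
        # Skip rows that do not have the expected number of tokens.
        if len(tokens) != expected_tokens:
            continue
        # Skip the header row (assumed to start with "Registrar").
        if tokens[0] == header_first_token:
            continue
        out.write("{0}\t{1}\n".format(tokens[district_idx], tokens[generated_idx]))


def reducer(lines, out):
    """Sum the values for each key; input must already be ordered by key."""
    aadhaar_generated = 0
    old_key = None
    for line in lines:
        data = line.rstrip("\n").split("\t")
        # Ignore pairs that are not formatted as key<TAB>value.
        if len(data) != 2:
            continue
        this_key, count = data
        try:
            value = float(count)
        except ValueError:
            continue
        # A new key means the previous key's total is complete: print it.
        if old_key is not None and old_key != this_key:
            out.write("{0}\t{1}\n".format(old_key, aadhaar_generated))
            aadhaar_generated = 0
        old_key = this_key
        aadhaar_generated += value
    # Print the total for the last key.
    if old_key is not None:
        out.write("{0}\t{1}\n".format(old_key, aadhaar_generated))


def run_job(csv_text):
    """Hypothetical local driver: map, sort by key, then reduce."""
    mapped = io.StringIO()
    mapper(io.StringIO(csv_text), mapped)
    shuffled = sorted(mapped.getvalue().splitlines())
    reduced = io.StringIO()
    reducer(shuffled, reduced)
    return reduced.getvalue()
```

The sort inside run_job stands in for the ordering by key that the reducer is told it can assume. Because the total is accumulated as a float, the output values take the same form as the sample pair in the transcript (for example, 5.0 rather than 5).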