English subtitles

Reducer Code - Intro to Hadoop and MapReduce

Showing Revision 3 created 05/25/2016 by Udacity Robot.

  1. So here's the reducer code. Let's step through it. We'll
  2. start by setting up a couple of variables. salesTotal,
  3. which will keep the running total, is initialized to 0.
  4. Because we haven't read any keys yet, oldKey starts at
  5. None. Then we start reading from standard input. As with
  6. the mappers, we receive a line at a time that's
  7. tab-delimited. In this case we expect a store name, a
  8. tab, and the value of the sale. So we'll strip out the
  9. newline and separate the key and the value at the
  10. tab. That should give us two items which we'll store
  11. in an array called data. If we don't have two
  12. items, something strange has happened, so we'll skip that line,
  13. but that should never be the case, because we know
  14. that the mappers send us the data in this format. For
  15. the sake of clarity, we'll copy the two items in
  16. the array into variables: thisKey gets the store name,
  17. thisSale the value. So here is the tricky part. We want to
  18. know if the key has changed since the last row we
  19. read. So we'll check if oldKey is even set, because if
  20. it's not, then this is the first line we're reading. And if
  21. it is, we'll see if it's different to the key we
  22. just read in. If that's true, then the key has just
  23. changed. In our example data, we have just switched from the
  24. Miami rows to the New York City rows. So now we need
  25. to write out the result, which will be oldKey, which
  26. is the store name, a tab, and the sales total
  27. for that store. Then we reset salesTotal back
  28. to 0. Now for every row that we process, we'll
  29. set oldKey to the key we are working on,
  30. then add the current sale to the running total. And
  31. then we'll loop back to the next row. Eventually, we'll
  32. run out of data, which will take us out of
  33. the loop. Do you think we're done?
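
The loop described in this walkthrough can be sketched in Python roughly as follows. This is not the course's own file: it is a minimal reconstruction from the narration, assuming Python 3, with the variable names adapted to snake_case. The function name `run_reducer_loop` and the choice to collect output in a list and return the leftover state are illustrative assumptions, made so the loop can be tried out without a Hadoop job.

```python
def run_reducer_loop(lines):
    """Run the loop as narrated and return (emitted, old_key, sales_total).

    `lines` is any iterable of "store<TAB>sale" strings, for example
    sys.stdin in a streaming job. Hypothetical helper, not course code.
    """
    sales_total = 0.0   # running total for the key currently being processed
    old_key = None      # no key has been read yet
    emitted = []        # output lines; a real reducer would print these

    for line in lines:
        # Strip the newline, then split into key and value at the tab.
        data = line.strip().split("\t")
        if len(data) != 2:
            # Something strange has happened; skip this line.
            continue
        this_key, this_sale = data

        # The key changed (and this is not the very first line):
        # write out the previous store's total and reset it.
        if old_key is not None and old_key != this_key:
            emitted.append(f"{old_key}\t{sales_total}")
            sales_total = 0.0

        # For every row: remember the key, add the sale to the running total.
        old_key = this_key
        sales_total += float(this_sale)

    return emitted, old_key, sales_total
```

The sketch stops exactly where the narration stops. Calling it on a few sorted sample lines and looking at the returned `old_key` and `sales_total` shows what is still held in the variables once the loop ends, which is worth keeping in mind for the closing question.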