Reducer Code - Intro to Hadoop and MapReduce

  • 0:00 - 0:03
    So here's the reducer code. Let's step through it. We'll
  • 0:03 - 0:07
    start by setting up a couple of variables. salesTotal,
  • 0:07 - 0:09
    which will keep the running total, is initialized to 0.
  • 0:09 - 0:13
    Because we haven't read any keys yet, oldKey starts at
  • 0:13 - 0:16
    None. Then we start reading from standard input. As with
  • 0:16 - 0:19
    the mappers, we receive a line at a time that's
  • 0:19 - 0:22
    tab-delimited. In this case we expect a store name, a
  • 0:22 - 0:25
    tab, and one of the sales. So we'll strip out the
  • 0:25 - 0:29
    newline, then separate the key and the value at the
  • 0:29 - 0:31
    tab. That should give us two items which we'll store
  • 0:31 - 0:34
    in an array called data. If we don't have two
  • 0:34 - 0:38
    items, something strange has happened so we'll skip that line,
  • 0:38 - 0:40
    but that should never be the case, because we know
  • 0:40 - 0:42
    that the mappers send us the data in this format. For
  • 0:42 - 0:45
    the sake of clarity, we'll store the two items in
  • 0:45 - 0:50
    the array into variables. thisKey will get the store name,
  • 0:50 - 0:54
    thisSale, the value. So here is the tricky part. We want to
  • 0:54 - 0:57
    know if the key has changed since the last row we
  • 0:57 - 1:01
    read. So we'll check if oldKey is even set, because if
  • 1:01 - 1:03
    it's not, then this is the first line we're reading. And if
  • 1:03 - 1:05
    it is, we'll see if it's different to the key we
  • 1:05 - 1:08
    just read in. If that's true, then the key has just
  • 1:08 - 1:12
    changed. In our example data, we have just switched from the
  • 1:12 - 1:15
    Miami row to the New York City row. So now we need
  • 1:15 - 1:18
    to write out the result, which will be the old key, which
  • 1:18 - 1:22
    is the store name, a tab and the sales total
  • 1:22 - 1:25
    for that store. Then we reset salesTotal back
  • 1:25 - 1:29
    to 0. Now for every row that we process, we'll
  • 1:29 - 1:31
    set oldKey to the key we are working on,
  • 1:31 - 1:35
    then add the current sale to the running total. And
  • 1:35 - 1:38
    then we'll loop back to the next row. Eventually, we'll
  • 1:38 - 1:40
    run out of data, which will take us out of
  • 1:40 - 1:42
    the loop. Do you think we're done?
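
The steps narrated above can be sketched in Python roughly as follows. This is a minimal sketch, not the code shown in the video: the function name reduce_sales is hypothetical, the function collects its output in a list instead of printing it so that it is easy to try out, and sales are assumed to parse as floats. The variable names follow the narration. The sketch deliberately stops where the narration stops, so nothing happens after the loop ends.

```python
def reduce_sales(lines):
    """Apply the narrated steps to an iterable of 'store<TAB>sale' lines.

    Returns the lines the loop would write out. Hypothetical helper: the
    narration describes reading standard input and writing results directly.
    """
    output = []
    salesTotal = 0.0   # running total for the current store
    oldKey = None      # no key has been read yet

    for line in lines:
        # Strip the newline and split on the tab.
        data = line.strip().split("\t")
        if len(data) != 2:
            # Something strange has happened: skip this line.
            continue

        thisKey, thisSale = data

        # oldKey is set and differs from the key just read, so the input
        # has moved on to the next store: write out the finished total.
        if oldKey is not None and oldKey != thisKey:
            output.append(f"{oldKey}\t{salesTotal}")
            salesTotal = 0.0

        # For every row: remember the key, then add the sale to the total.
        oldKey = thisKey
        salesTotal += float(thisSale)  # assumption: sales are numeric

    # The narration ends with the loop, so the sketch does too.
    return output
```

Given the three lines "Miami\t10.5", "Miami\t4.5" and "NYC\t3.0", the function returns ["Miami\t15.0"]. Comparing that result with the input is one way to approach the closing question. To use it as a streaming reducer, the same function could be fed sys.stdin and each returned line printed.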
Video Language:
English
Team:
Udacity
Project:
ud617 - Intro to Hadoop and Mapreduce
Duration:
01:43