
Input Data - Intro to Hadoop and MapReduce

  • 0:00 - 0:03
    So, let's take a closer look at our input data.
  • 0:03 - 0:06
    Remember that each mapper processes a portion of the input data.
  • 0:06 - 0:08
    And, each one will be given a line at a
  • 0:08 - 0:12
    time. The lines look like this. The mapper needs to take
  • 0:12 - 0:15
    that line, and extract the information it needs. Often, when
  • 0:15 - 0:18
    we're dealing with text, it's pretty free-form. So we'd use something
  • 0:18 - 0:22
    like a regular expression. But in this case, it's regular, it's
  • 0:22 - 0:26
    tab-delimited. So we can split the line on tab and
  • 0:26 - 0:29
    extract the values. In this example, they're date,
  • 0:29 - 0:34
    time, store name, product type, cost, and method
  • 0:34 - 0:36
    of payment. Let's say you've been asked to
  • 0:36 - 0:40
    find the total sales per store. How would you
  • 0:40 - 0:42
    choose an intermediate key and value? Could it
  • 0:42 - 0:46
    be the time and the store name? Or the
  • 0:46 - 0:51
    cost of the item and the store name? Or rather the store name as a key and
  • 0:51 - 0:54
    the cost? Or the store name and the
  • 0:54 - 0:57
    product type? Remember that our line looks like this.
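
The splitting step described above can be sketched in Python as follows. This is a minimal illustration, not the course's own code: the function names `map_line` and `run_mapper` are hypothetical, and the field order (date, time, store name, product type, cost, method of payment) is taken from the transcript. It assumes the store name is used as the intermediate key and the cost as the value, which is one of the options raised in the video and a natural fit for totalling sales per store.

```python
import sys


def map_line(line):
    """Return (store, cost) for one tab-delimited line, or None if malformed."""
    fields = line.rstrip("\n").split("\t")
    # Assumed layout: date, time, store name, product type, cost, payment method.
    if len(fields) != 6:
        return None
    date, time, store, product_type, cost, payment = fields
    return store, cost


def run_mapper(lines, out=sys.stdout):
    """Write one 'store<TAB>cost' line per well-formed input line."""
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            store, cost = pair
            out.write(f"{store}\t{cost}\n")


if __name__ == "__main__":
    # Read input lines one at a time from standard input.
    run_mapper(sys.stdin)
```

Lines that do not split into exactly six fields are skipped rather than raising an error, so a single malformed record would not stop the mapper.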
Title:
Input Data - Intro to Hadoop and MapReduce
Description:

05-02 Input Data

Video Language:
English
Team:
Udacity
Project:
ud617 - Intro to Hadoop and Mapreduce
Duration:
0:58
