Top 10 - Intro to Hadoop and MapReduce

  • 0:00 - 0:03
    An interesting application of MapReduce and these design patterns
  • 0:03 - 0:07
    is making top N record lists. These are especially useful
  • 0:07 - 0:10
    if you work at a company like BuzzFeed. But even
  • 0:10 - 0:13
    if you don't, you still might want to use these. And
  • 0:13 - 0:16
    in this question we're going to, in particular, find the top
  • 0:16 - 0:19
    ten longest forum posts. Now, with a relational database
  • 0:19 - 0:23
    management system, you just sort your data first, in this
  • 0:23 - 0:27
    case the forum posts, and then pick the top N records.
  • 0:27 - 0:31
    But with MapReduce, this just won't work. The
  • 0:31 - 0:34
    data isn't sorted, and it's processed on several machines.
  • 0:34 - 0:36
    So, instead, what we're going to have to do
  • 0:36 - 0:40
    is have each mapper generate a top N list
  • 0:41 - 0:43
    and then send these local lists to the
  • 0:43 - 0:46
    reducers, which can then find the global top N.
  • 0:48 - 0:49
    It's sort of like what happens in the Olympics. If
  • 0:49 - 0:52
    you want to find the top three, I don't know,
  • 0:52 - 0:57
    swimmers, every country sends its top three swimmers
  • 0:57 - 1:02
    to the Olympics. Then, through a competition, they figure out who globally
  • 1:02 - 1:05
    the top three swimmers are. Anyway, let's go ahead and find
  • 1:05 - 1:08
    the top ten longest forum posts in the Udacity forum data.
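A minimal Python sketch of the local-then-global approach described above. It only simulates the data flow in plain Python rather than running on Hadoop, and it makes a few assumptions that are not in the video: each post is a hypothetical (post_id, body) pair, a post's "length" is len(body), and a single reducer receives every mapper's local list. Each mapper keeps its local top N with a size-N min-heap from Python's standard heapq module, and the reducer merges the local lists and keeps the global top N. This is correct because any post in the global top N must also be in the top N of whichever split it came from, so the union of the local lists always contains the answer.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical record shape: (post_id, body). The real forum data has
# more fields; only the length of the post text matters here.
Post = Tuple[str, str]

N = 10  # size of the top-N list (the top ten longest posts)


def mapper_top_n(posts: Iterable[Post], n: int = N) -> List[Tuple[int, str]]:
    """Return a local top-N list of (length, post_id) for one input split.

    A min-heap of size n holds the n longest posts seen so far; when a
    longer post arrives, it replaces the shortest post on the heap.
    The returned list is in heap order, not sorted.
    """
    heap: List[Tuple[int, str]] = []
    for post_id, body in posts:
        item = (len(body), post_id)
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)  # drop the shortest, add this one
    return heap


def reducer_top_n(local_lists: Iterable[List[Tuple[int, str]]],
                  n: int = N) -> List[Tuple[int, str]]:
    """Merge the mappers' local lists and return the global top N, longest first."""
    candidates = (item for local in local_lists for item in local)
    return heapq.nlargest(n, candidates)


if __name__ == "__main__":
    # Toy data: posts of length 1..30 spread round-robin over three "machines".
    splits = [[(f"p{i}", "x" * i) for i in range(1, 31) if i % 3 == k]
              for k in range(3)]
    local = [mapper_top_n(split, n=3) for split in splits]
    print(reducer_top_n(local, n=3))
    # → [(30, 'p30'), (29, 'p29'), (28, 'p28')]
```

In the toy run, each of the three global winners comes from a different local list, which is why every mapper has to forward its own top N rather than just its single longest post.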
Title:
Top 10 - Intro to Hadoop and MapReduce
Video Language:
English
Team:
Udacity
Project:
ud617 - Intro to Hadoop and Mapreduce
Duration:
01:10
Udacity Robot edited English subtitles for 07-05 Top 10
Cogi-Admin edited English subtitles for 07-05 Top 10
