
Doug Cutting - The Origins of Hadoop - Intro to Hadoop and MapReduce

  • 0:00 - 0:06
    So, let me tell you how Hadoop came to be. About
  • 0:06 - 0:10
    ten years ago, around 2003, I was working on an Open
  • 0:10 - 0:15
    Source web search engine called Nutch, and
  • 0:15 - 0:20
    we knew it needed to be very scalable, because the Web was, you know,
  • 0:20 - 0:26
    billions of pages, terabytes, petabytes of data that we needed
  • 0:26 - 0:30
    to be able to process, and we set about, you know, doing the best job we
  • 0:30 - 0:36
    could, and it was tough. We got things up and running on four or five machines,
  • 0:36 - 0:40
    not very well, and around that time, Google
  • 0:40 - 0:43
    published some papers about how they were doing
  • 0:43 - 0:46
    things internally. They published a paper about their distributed
  • 0:46 - 0:51
    file system, GFS, and one about their processing framework, MapReduce.
  • 0:53 - 0:58
    So my partner on this project at the time, Mike
  • 0:58 - 1:03
    Cafarella, and I set about trying to reimplement these in Open Source, so that
  • 1:03 - 1:08
    more people than just folks at Google could use them. It took us a couple of years,
  • 1:08 - 1:13
    and we had Nutch up and running on, instead of four or five
  • 1:13 - 1:18
    machines, on 20 to 40 machines. It wasn't perfect,
  • 1:18 - 1:22
    it wasn't totally reliable, but it
  • 1:22 - 1:25
    worked. And we realized that to get it to the
  • 1:25 - 1:27
    point where it scaled to thousands of machines
  • 1:27 - 1:30
    and was as bulletproof as it needed to be
  • 1:30 - 1:32
    would take more than just the two of us,
  • 1:32 - 1:36
    working part-time. Around that time, Yahoo approached me and
  • 1:36 - 1:39
    said they were interested in investing in this. So
  • 1:39 - 1:42
    I went to work for Yahoo in January of 2006.
  • 1:43 - 1:47
    The first thing I did there was take the parts of Nutch that were a distributed
  • 1:47 - 1:50
    computing platform and put them into a separate
  • 1:50 - 1:55
    project, a new project christened Hadoop. Over the
  • 1:55 - 1:58
    next couple of years, with Yahoo's help and the
  • 1:58 - 2:03
    help of others, we took Hadoop and really
  • 2:03 - 2:04
    got it to the point where it did
  • 2:04 - 2:09
    scale to petabytes, running on thousands of processors,
  • 2:09 - 2:16
    and doing so quite reliably. It spread to lots of companies, mostly in the
  • 2:16 - 2:21
    Internet sector, and became quite a success. After that, we started to see
  • 2:21 - 2:27
    a bunch of other projects grow up around it. And Hadoop has grown to be the kernel
  • 2:27 - 2:34
    of what is pretty much an operating system for big data. We've got
  • 2:34 - 2:41
    tools that let you do MapReduce programming more easily,
  • 2:41 - 2:48
    so you can develop using SQL or a data flow language called Pig. And
  • 2:48 - 2:50
    we've also got the beginnings of
  • 2:50 - 2:54
    higher-level tools. We've got interactive SQL with
  • 2:54 - 3:00
    Impala. We've got Search. And so we're really seeing this develop into a
  • 3:00 - 3:06
    general-purpose platform for data processing that scales much better and
  • 3:06 - 3:11
    is much more flexible than anything else out there.
Title:
Doug Cutting - The Origins of Hadoop - Intro to Hadoop and MapReduce
Description:

01-18 Doug Cutting: The Origins of Hadoop

Video Language:
English
Team:
Udacity
Project:
ud617 - Intro to Hadoop and Mapreduce
Duration:
03:12

