English subtitles

The 3 Vs - Volume - Intro to Hadoop and MapReduce

Showing Revision 4 created 05/25/2016 by Udacity Robot.

When you read or talk about big data, you often hear people mention the three Vs. Volume refers to the size of the data you're dealing with. Variety refers to the fact that the data often comes from many different sources and in many different formats. And velocity refers to the speed at which it's being generated, and the speed at which it needs to be made available for processing. So let's look at each of these in more detail, starting with volume.

The price to store data has dropped dramatically over the last 60 years. In 1980, the cost per gigabyte was several hundred thousand dollars. In 2013, it was merely $0.10. It's worth saying, though, that to actually store the data reliably, you're going to end up paying more than that. That's particularly the case with more traditional storage devices such as storage area networks, or SANs, which can be extremely expensive.

The high cost of reliable storage puts a cap on the amount of data companies can practically store. At some point they say, okay, it's too expensive to store all that data, and I'm not doing anything with it. Let's just store the critical stuff, like my actual sales. But it turns out, as you'll see, that the data you're currently throwing away can be incredibly useful. What we need is a cheaper way to store it reliably.

And of course, storing the data is only part of the equation. You also need to be able to read and process it efficiently. Storing a terabyte of data on a SAN is not that hard. But streaming that data from the SAN across the network to, say, some central processor can take a long time, and processing can be extremely slow.
  21. store the critical stuff, like my actual sales. But it
  22. turns out, as you'll see, that the data you're currently
  23. throwing away can be incredibly useful. What we need is
  24. a cheaper way to store it reliably. And of course storing
  25. the data is only part of the equation. You also need to be able
  26. read and process it efficiently. Storing a terabyte of data on a SAN, is
  27. not that hard. But streaming the data from the SAN across the network to
  28. say, some central processor, can take a
  29. long time and processing can be extremely slow.