When you read or talk about big data, you often
hear people mention the three Vs: volume, variety, and velocity. Volume refers to the size
of the data you're dealing with. Variety refers to the
fact that the data often comes from a lot of
different sources and in different formats. And velocity refers to the
speed at which it's being generated, and the speed at which
it needs to be made available for processing. So let's look
in more detail at each of these. Let's start with volume.
The price to store data has dropped incredibly
over the last 60 years. In 1980, the cost
per gigabyte was several hundred thousand dollars. In
2013, it's merely $0.10. Although it's worth saying that to
actually store the data reliably, you're going to
end up paying more than that. That's particularly the
case with more traditional storage devices such as storage
area networks, or SANs, which can be extremely expensive.
The high cost of reliable storage puts a cap on
the amount of data companies can practically store. At some
point they say, okay, it's too expensive to store all
that data. And I'm not doing anything with it. Let's just
store the critical stuff, like my actual sales. But it
turns out, as you'll see, that the data you're currently
throwing away can be incredibly useful. What we need is
a cheaper way to store it reliably. And of course storing
the data is only part of the equation. You also need to be able
to read and process it efficiently. Storing a terabyte of data on a SAN is
not that hard. But streaming the data from the SAN across the network to,
say, some central processor can take a
long time, and processing can be extremely slow.
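To get a rough feel for why streaming a terabyte takes a long time, here is a back-of-the-envelope sketch. The throughput figures (a 1 Gbit/s network link and a disk reading about 100 MB/s) are illustrative assumptions, not numbers from this lecture, and the helper name `transfer_seconds` is made up for the example.

```python
def transfer_seconds(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s


TERABYTE = 10**12  # decimal terabyte, in bytes

# Assumed sustained throughputs (illustrative only, not from the lecture):
GIGABIT_LINK = 125 * 10**6  # 1 Gbit/s is 125 MB/s, ignoring protocol overhead
SINGLE_DISK = 100 * 10**6   # roughly 100 MB/s sequential read

for name, rate in [("1 Gbit/s network link", GIGABIT_LINK),
                   ("single disk", SINGLE_DISK)]:
    hours = transfer_seconds(TERABYTE, rate) / 3600
    print(f"{name}: {hours:.1f} hours")
# prints:
# 1 Gbit/s network link: 2.2 hours
# single disk: 2.8 hours
```

Under those assumptions, just moving one terabyte to a central processor takes a couple of hours before any processing has started.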