-
TOBY HEDE: Good morning everybody.
-
Friday. Yes. It's been a long week. I'm excited.
-
I'm highly caffeinated. So without further
ado,
-
I present An Ode to 17 Databases in 33 Minutes.
-
I'm gonna mangle a large number of metaphors.
-
There'll be a lot of animated gifs.
-
I've learned that this week, if you see it
like that,
-
there's Star Wars, Dungeons and Dragons,
-
and all of that's very, unfortunately, stereotypical.
-
So a bit of an indictment.
-
This whole thing started as a joke. Seventeen
databases.
-
I actually did in five minutes. Thirty-three
minutes is
-
worse. The whole thing is just a catastrophe,
really.
-
But anyway.
-
We're gonna cover a whole bunch of different
databases
-
and a little bit of the underlying theory,
and
-
hopefully you'll walk out and you'll understand
why to
-
use PostGres.
-
[laughter]
-
I'm Toby. You can find me on the internet.
-
I work at a company called Nine Fold.
-
V.O.: We're having a problem, there's no screen.
-
T.H.: Oh. No screens. Is that me?
-
Before it was, there was no red. So, now
-
there's no any, anything.
-
V.O.: Nothing.
-
T.H.: Hey.
-
AUDIENCE: Hey!
-
T.H.: I have no slides.
-
Well, you missed my beautiful slides. There's.
You missed
-
the first animation. That's a shame. You missed
the
-
list. It's awesome. You missed me and my excellent
-
job titles. So yes.
-
I work at Nine Fold. They have very kindly
-
flown me over here from Australia, which explains
why
-
I sound like I come from the deep south.
-
Cause I do.
-
Most of this week, this has been me. So
-
today I'm finally over the jetlag just in
time
-
to go home and have it all over again
-
next week.
-
So, a couple of quick facts about Straya.
There
-
are much fewer syllables than you're used
to using.
-
This is an, a genuine Australian politician.
He's a
-
mining magnate billionaire and he is currently
running a
-
MVP Jurrassic theme park with giant fiberglass
dinosaurs. And
-
I, I for one am for it. So I
-
realize there wasn't enough Star Wars references
so this
-
is just completely gratuitous.
-
Anyway. So. The thrust is that distributed
systems are
-
hard and databases are fun. Pictured here
is a
-
distributed system. You can see there's two
app nodes
-
and then there's two, there's like a master/slave
kind
-
of setup going on here as well. So we're
-
gonna talk about some of the complexities
of running
-
these types of systems, and it's really fun
stuff
-
once you get under the cover and start thinking
-
about some of the complexities.
-
So. NoSQL is a thing. We have NewSQL now.
-
I'm gonna be covering some of these things.
We've
-
also got PostSQL, Post-Rock Ambient SQL. And
there's a
-
whole gammit of these things. They all make
my
-
brain explode and the, I think the trick to
-
understanding all of this stuff is to actually
think
-
about some of what's happening underneath.
And you can
-
make decisions about your databases.
-
Hopefully you're all familiar with some of
the concepts
-
of traditional relational databases. We have
Acid, which provides
-
certain guarantees about the way that your
data behaves.
-
You can update data and be sure it was
-
updated. Things are isolated from each other.
Things persist
-
over time.
-
Another thing that you may have heard of,
this
-
is a, this is a leap that I need
-
to another animation, is a thing called the
CAP
-
Theorem. So this gets talked about a lot when
-
we start talking about this new generation
of databases.
-
CAP stands for consistency, availability,
and partition tolerance, and
-
it provides, basically, some strong foundation
for reasoning about
-
the way distributed systems behave and how
they interoperate
-
and how they communicate. So I'm gonna give
you
-
a brief introduction to how that all kind
of
-
works.
-
So, the original CAP Theorem, as stated, was,
is
-
called Brewer's Conjecture. A guy called Brewer
just sort
-
of had this idea. It's actually on some really
-
awesomely-designed PowerPoint slides from
some thing he did. And
-
he was saying that with consistency, availability,
and partition
-
tolerance - so the data can, can only be
-
two of these things at any one time. So
-
the data can be consistent or it can be
-
accessible or it can handle network failures.
-
So people then took this conjecture and actually
made
-
a formal kind of proof in, in much more
-
rigorous computer science terms. And actually
said, it's impossible,
-
in an asynchronous network model, to implement
a read/write
-
data object that is simultaneously available
and is also
-
atomically consistent.
-
And so all of this stuff around NewSQL and
-
NoSQL and bleh, all of that stuff, is about
-
manipulating these different variables. There's
also a thing called
-
Base but I'm not gonna talk about it cause
-
it's actually just a made-up acronym that
has no
-
relevance to anything.
-
So, what, what does CAP actually, what, what
are
-
we talking about here? And why is it important?
-
It's important, actually, because everything
is already distributed. What
-
we do today is inherently a distributed system.
You
-
have a browser talking to a server, an app
-
server, Rails server - cause we're at RailsConf
-
-
and then that's talking to a PostGres database,
or
-
a MySQL database or something even fancier
and shinier.
-
That's a distributed system. And as we move
into
-
more heavy client-based operations, that distribution
is getting much
-
more front-loaded, so you, you've got state
in the
-
browser that's now synchronizing with state
on the server.
-
So we already actually suffer many of these
problems.
-
This is a handy and completely untrue guide
to
-
NoSQL systems and breaking them into this
idea of
-
some things are available and some things
are consistent.
-
So, all of that is almost but not quite
-
entirely untrue.
-
What the actual theorem says is that under
a
-
network failure - so you've got multiple nodes
and
-
they now can no longer communicate - you can
-
choose whether the data is consistent or whether
the
-
data is available. And I have some demonstrations
here
-
to just - it actually ends up being very
-
easy to understand.
-
So, here we have typical cluster of nodes
working
-
together. We're gonna model some communication
between them. So
-
there's a, there's a write on this system.
It
-
comes in, that gets replicated across, and
then on
-
the other system we now have that data coming
-
out. Someone's doing a read. And so this is
-
the kind of situation that we're talking about.
So
-
whether you're doing master/slave setup in
a relational database
-
or something trickier, this is kind of the
way
-
it works. A node gets some data and it
-
gives it to another node, and they have the
-
same information.
-
So when there's a network partition, that,
they no
-
longer can communicate. So a write comes in,
and
-
now we have to make a decision. And all
-
of this is actually just science, as you can
-
tell from this diagram. If those two nodes
can't
-
communicate, you can talk to the one that
got
-
the write - that's consistent. It got the
write.
-
It can now, can read out that same data.
-
That's all cool.
-
Or, you can have both nodes still communicating,
and
-
now you have someone reading data that is
no
-
longer in the write state. So we've got, you
-
know, we have updated a bank account. It's
got
-
a hundred dollars in it. It used to have
-
ten dollars in it. These people are reading
ten.
-
These people are reading a hundred. That's
available. The
-
data is now not consistent. But all of the
-
nodes can send back that data.
-
And so all of the discussion about CAP Theorem
-
and, and you know, people even claiming, we've
defeated
-
the CAP Theorem in our database at, you know,
-
low-low prices is incredibly awesome. Just
remember this image.
-
Two things that cannot communicate cannot
communicate. It's science.
-
And then when they can communicate, we're
back into
-
the realm of normal operations and things
get a
-
lot easier. If you were interested in any
of
-
the guts of how these things work, definitely
have
-
a look at a thing called jepsen, which is
-
this crazy motherfucker who is just analyzing
the network
-
operations of a whole variety of distributed
systems, and
-
it will, it's just, it will blow your mind.
-
OK. Good. That's, that's why. Now I remember.
-
So, here is our cast. We're about to go
-
on an adventure through a tortured maze of
ridiculous
-
Dungeons and Dragons metaphors. But, first
of all, a
-
shout out to the OwlBear. Yeah. The thing
I
-
love about the OwlBear is they've taken the
wrong,
-
the least scary aspects of a bear and an
-
owl, like if that was an owl with, you
-
know, if it had a bears head and wings,
-
that would be way more scary. Anyway.
-
It's just been bugging me for months. So.
-
PostGres. As we all know, it's MySQL for hipsters.
-
It's actually pretty good. So here's its character
reference
-
sheet. We, it's a relational database. It
has a
-
consistent model. So under conditions in network
partition, you
-
know, your, your slave is not in contact with
-
the master, it's, it's essentially unavailable.
That's the way
-
we treat it.
-
PostGres is actually really, really interesting
tic, because it
-
has a bunch of cool stuff hidden underneath
it.
-
So there's a thing called Hstore which is
a
-
key-value store that's baked right in. So
if you
-
need a lightweight key-value store and you're
already running
-
PostGres in production, you, you have one.
You don't
-
need to spin up any other thing. You can
-
actually do that today.
-
The really interesting thing about that is,
you can
-
index those keys. You can do joins across
an
-
Hstore reference into, across multiple tables.
It looks and
-
feels exactly like the kind of thing that
you're
-
already working with.
-
We've got, there's some things already baked
into the
-
Rails ecosystem that make this really easy
if you're
-
doing that kind of information. But the really
exciting
-
thing about what PostGres is up to at the
-
moment is JSON. And 9.2, 9.3, and upcoming
9.4
-
have pretty much a fully baked in JSON document
-
database. And it is crazy awesome. The new
one
-
is super high-performance. If you were sort
of, it's
-
the same thing. If you're thinking, ah, you
know,
-
documents would be easier for this use case,
let's
-
install something else, we're actually, you
already have one,
-
and it, it has all of those same properties.
-
You can index. You can do joins across your
-
normal table into the documents. It's crazy
cool.
-
MySQL. It's pretty much the same as PostGres,
is
-
my answer. But there's a slight caveat. So,
you
-
know, I, I recall, they're a company. Many
of
-
the same things apply. Like, this is why,
you
-
know, they're, they're kind of in the same
bucket.
-
For me, it doesn't particularly matter at
the end
-
of the day. Whatever you happen to have expertise
-
in, it's cool. It's got some kind of interesting
-
things that you can do. You can switch out
-
storage engines to actually get your different
performance profiles.
-
It is everywhere. It's got a thing called
Handler
-
Socket, which is essentially raw, right. Access
through a
-
low-level socket into the table infrastructure.
There's some paper
-
with really high performance kind of things.
-
You can actually just sort of bypass the whole
-
SQL engine, which is kind of interesting.
The other
-
thing that's happened since Oracle took over,
which is
-
kind of a really good thing, is that there's
-
some alternatives. So MariaDB is sort of the,
the
-
more open fork. There's a semi-commercial
addition that has
-
lots of really high-performance features,
and they basically run
-
binary compatible patches, that's Percona.
And they have, like,
-
huge expertise. And this Toku is quite interesting.
It's,
-
they're doing all of this crazy fractal indexing
and
-
things for particular use cases on very large
datasets.
-
But it still just looks and behaves in many
-
ways like the MySQL that you are kind of
-
used to.
-
So, there's some interesting things happening
there. So these,
-
hopefully none of that's a huge surprise.
That's databases.
-
You use it. It comes in the box, and
-
ActiveRecord talks to it.
-
So now we're gonna get slightly off the beaten
-
track. So, a lot of what we know SQL
-
comes from Dynamo, which was actually a paper
that
-
Amazon released years ago. I'm not gonna labor
too
-
much on this one. The paper's quite interesting.
It
-
talks about how you make a distributed system.
-
The interesting thing is actually that Riak
is essentially
-
an implementation of the underlying Dynamo
theory. So Riak
-
is crazy awesome. This is what happens to
you
-
when you run Riak in production.
-
[laughter]
-
I pretty much, like, it's a conversation I,
I
-
often have with people is like, wouldn't it
be
-
awesome to have a problem that needed Riak?
And
-
it was like, yeah, that would be so cool.
-
I'd be like the awesomeness engineer.
-
So Riak is, it's just crazy-well engineered.
They're doing
-
all sorts of interesting stuff. It's inherently,
it just
-
understands clustering. You know, you add
a new node,
-
it just, it's there. You know. With, with
those
-
older kind of databases, it's, it's a pain
in
-
the ass to actually get it working.
-
So, yeah, they're doing some really interesting
things. It's
-
got a cloud storage thing so you've got an
-
S3-compatible API and all of these kind of
stuff.
-
A lot of the magic of the way this
-
works is through consistent hashing. So, my
slides are
-
all mucked up. But anyway.
-
So, basically what it does is it just partitions
-
all of your data into a giant hash ring.
-
Excuse me. Physical nodes then just own parts
of
-
that hash. You add a new node or take
-
a node away and it repartitions all the rest
-
of the data across the remaining nodes. And
all
-
of that is just completely in the background
of
-
how Riak just works operationally.
-
So for large scale data and, you know, you,
-
you get away with, it has some really nice
-
operational characteristics that, that make
it quite cool to
-
manage.
-
And then the other thing is, it's a very
-
simple API. It's key-value store, you can
store JSON
-
documents in it, and it's just a bucket that
-
has keys, and then it's got other stuff on
-
top to retrieve data, do secondary indexes
and searching
-
and all of that kind of stuff.
-
So, it's a very cool piece of tech.
-
So, the other one we've got is, Google. Fucking
-
annoying. And you'll see why in a second.
So,
-
Google had this thing called BigTable that,
again, kind
-
of comes out of the internal research. You
have
-
access to it through some of their cloud properties.
-
As you can see, it's got, it's actually a
-
sparse distributed multidimensional sorted
map, which is good, I
-
guess. I imagine. It's awesome.
-
The stuff they're doing with this is crazy.
So
-
this is actually a, all, a couple years old
-
I think now. Some of these, some of the
-
information, so. Hundreds of petabytes of
data, you know,
-
ridiculous numbers of operations a second.
You do not
-
have any of these problems.
-
So, then they, they took this stuff, they
were
-
like, ah, we've got BigTable. You know, that
was,
-
that was fucking easy. Whatever. And so now
they've
-
got two other things. They've got one called
Spanner
-
and one called F-one, where they're basically
doing, you
-
know, proper, sort of relational looking data
across multiple
-
data centers and, you know, and. They're kind
of
-
really pushing the boundaries of some of that
CAP
-
stuff that's going on.
-
But all you need is a GPS in every
-
server, a couple of atomic clocks in each
data
-
center, and you, great. So, Google's basically
telling everyone
-
to, you know, just fuck off.
-
So, another one that I really, I really like,
-
and have used a long, a long time ago
-
in, in tech land, tech time, is Cassandra.
Cassandra
-
is a column-oriented database. Eventually
it's awesome. It's really
-
all about eventual consistency.
-
And you can see here, this is a man,
-
he eventually gets it right. So that's well
done
-
to him there. So Cassandra's a lot like that.
-
And, again, you know, the cool thing is, it's
-
a sparse distributor multi dimensional sorted
map. It, when
-
I was working with it, you, it was, you
-
had, you described your tables kind of thing
in
-
XML and hated yourself, and then every time
something
-
changed you rebooted the server and that took
awhile
-
and, yeah, the whole thing was really difficult.
-
What it basically does is it takes the availability
-
side of the question. Like, that's its world
model.
-
It has, again, a very simple clustering system.
New
-
nodes, add in, the data gets streamed out.
It
-
has a data model that is really complicated,
and
-
I, even though I've used it, it's really hard
-
to explain how it actually works.
-
So column databases basically kind of invert
the, the
-
whole table structure that you're used to
from the
-
relational world. And the advantage is that,
for some
-
types of data, and for some queries, it is
-
crazy blazing fast, cause you can just. Time
series
-
are always a good one, where you can just
-
have long streams of time series and it will
-
actually put that on disk or next to each
-
other and you can just pull it all out.
-
The cool thing in the new versions of Cassandra
-
is that they've abstracted all of that out,
and
-
you actually just get tables, so you can create
-
a table and give it a primary key, and
-
under the covers, it's setting up rows and
column
-
families and columns and all of, all of these
-
really abstract concepts, and they've completely
made some of
-
that go away. Which is really nice.
-
So you end up with something that looks a
-
lot like just SQL and, you know, a normal
-
table kind of structure. It's just clustering
out lots
-
of nodes. It's very tunable, so you can actually
-
set up, you know, it writes to a node
-
and you can say, actually write to five nodes
-
and that's a quorem and now we're cool. So
-
you can tune how much redundancy you have.
-
So that's kind of cool. That is a reminder.
-
That went cold really fast. Thank you.
-
So, the next one on our list is Memcache.
-
Memcache, there was, there was a talk earlier
in
-
the week that was describing using Memcache
and caching
-
and it, it had a very interesting observation,
which
-
was, it just works. He didn't even know what
-
version he was running in production, cause
neh. Doesn't
-
matter. That API has been stable for ages.
-
And I know, I know what you're saying. It's
-
not a database. It's a cache. Technically
true. But
-
it's interesting to think about, because the
moment you
-
add caching, even if you've been ignoring
the fact
-
that you had a distributed system before,
with caching
-
you now really have a distributed system.
You've got
-
data in one thing that may or may not
-
be fresh, and you've got data in your database
-
that, you know, you assume is up to date,
-
and now you've got a synchronization problem.
-
So, Memcache is actually really, you know,
it's, it's
-
just rock solid, old as the hills technology,
completely
-
simple. The API is everywhere. Lots of people
actually
-
have made their, you know, key-value store
they made
-
in the hacknight, which, you know, is a useful
-
hobby if you want to annoy everyone.
-
You have the, their API is actually the Memcached
-
API. It's got a handful of things. You can
-
set a key, you can replace one. It does
-
have something atomic operations so you can
increment and
-
decrement so that there is some flexibility
to actually
-
do a little bit of data storage in a,
-
in a more traditional sense.
-
It's actually a client-server model. Your,
your driver is
-
responsible for the clustering in a way, so
you
-
can have multiple Memcache nodes and the,
the hashing
-
algorithm determines which node, which node
a particular piece
-
of data is gonna be on.
-
That has the property of making it very, very
-
simple to use. And there's no cluster state.
There's
-
no coordination that nodes have. Like, a lot
of
-
the heavy lifting all of these other things
are
-
doing is about coordinating around all of
that information.
-
There's a whole bunch of awesome stuff just
baked
-
into Rails. So you can just easily cache into
-
Memcache, or your normal Rails fragment mutations.
All of
-
that kind of stuff.
-
And there's even some things we can, you can
-
actually put, push that into ActiveRecord
and have, have
-
caching at that level as well.
-
Redis is an interesting one for the, the Rails
-
community. Cause it's basically a queue, now.
Everyone seems
-
to be running Resq, Sidekiq, and, you know,
Redis
-
is, again, one of those just pieces of technology
-
that is beautifully engineered, incredibly
simple, incredibly robust. The
-
maintainers are just absolute, you know, scientists,
I guess.
-
Just a whole other level of crazy algorithm
stuff.
-
And they make blog posts and, you know, I'm
-
so stupid. I don't understand what you're
talking about.
-
It's really fast, it's slightly hard to distribute.
A
-
lot of that's in the pipeline with Redis.
It's
-
much more, it's much more simple to, to stick
-
it on one node and increase the RAM. It's
-
mu, more complicated then Memcache. It's essentially
just an
-
in-memory cache. It has a bunch of really
interesting
-
data structures, though. I think if you've
been confused
-
all week, now, which country I'm from, whether
I
-
say dayta or dahta, so now I just changed
-
them randomly.
-
So, you can, you have hashes you have lists,
-
you have strings. You've got all sorts of
other
-
interesting things. You can do optimistic
locking and have,
-
you know, a bunch of operations that are essentially
-
batched. You can do sort of, there's long
ways
-
of doing this kind of stuff. It's Resque and
-
Sidekiq both just make this, make it super
simple
-
to do background tasks with Rails and install
the
-
gem, have a worker, and it's all just magic.
-
It is Lua baked in, which is a whole
-
other thing. But Lua is a really cool programming
-
language that is designed for embeddability.
But one of
-
the things that happens if you can actually
write
-
little rule, Lua scripts that end up going
into
-
the Redis server to do more complex operations.
So,
-
in this case, this is a little script that
-
grabs something off a sorted hash and then
deletes
-
them and then returns the first thing, like,
then
-
returns what we had done. But it's, it's an
-
atomic kind of transactional way.
-
And, good news everybody! We've just invented
stored procedures.
-
So that's very exciting. Except now they're
much more
-
hip, because it's an in-memory database with
a language
-
no one's heard of. So. We are rocking it.
-
Also, maybe use a queue. Just, I know it's
-
crazy. But, if you're actually queuing, using
Redis as
-
your queue, maybe you have a queuing problem
and
-
you have queues. They exist. They're a thing.
It's
-
ridiculous. I know.
-
So, RabbitMQ is sort of the gold standard,
and
-
Kafka is another one that was talked about
earlier
-
this week, and it is crazy cool.
-
Where am I? Man. All right. Just gonna stretch.
-
I've lost count, so I don't know, now I'm
-
just gonna talk faster. Cool.
-
Neo4j is really interesting. It's a graph
database. That's.
-
It's slightly hard to explain. But you, the
way
-
I actually think about it, we'll just jump
straight
-
to here, is it's almost but not quite entirely
-
unlike a relational database. The difference,
essentially, is that
-
it is optimize for the connections rather
than aggregated
-
data. So relational database, you, puts things
in, in
-
a way where you can get a sum and
-
a count and like, that's kind of the heritage
-
of that kind of world view.
-
Whereas what the Neo4j people are doing is
actually
-
thinking about connections between pieces
of data, and for
-
some use cases, this is actually really, really
amazing
-
stuff. So you have, a graph is basically a
-
collection of nodes, and those nodes can have
relationships
-
between each other, and then a node just has
-
properties.
-
It's essentially an object database in a way.
It's
-
like very similar to the way that we think
-
about objects. So it has some really nice
properties
-
if you're working in a language like Ruby.
And
-
then it just does stuff that, you know, in
-
a really intuitive way. So if we've got a
-
graph of movies and actors, you actually define
a
-
relationship by name. Then an actor acts in
a
-
movie. And then when you were doing your queries,
-
this is a language called Cypher, you actually,
that's
-
a first-class thing.
-
Whereas in a relational world, you're, you're
using a
-
foreign key, which has no semantic meaning
at all.
-
You, you just have to remember that, you know,
-
an actor, you know, there's a table with an
-
actor id, and a movie id, and we're joining
-
across somewhere. Whereas Neo4j actually makes
those relationships first
-
class citizens. So if you've got problems
that are
-
graph problems, like social network friend
cloud stuff, some
-
of that stuff, Neo4j just makes trivially
easy in
-
a way that you would have had to do
-
a recursive self-join in PostGres and hate
your life
-
and, you know.
-
Couch is cool. I guess. Pretty much that's
my
-
opinion of it. It's really awesome. But, you
can't
-
query it. So cool.
-
That's it. That's a slight disservice to Couch
but,
-
you know, whatever. MongoDB, as we all know,
it
-
is webscale and that's excellent. If you think
of
-
it as Redis for JSON, that's good. Sixty percent
-
of the time, it works every time. Everyone's
familiar
-
with that.
-
So, the thing that's really, I mean, Mongo,
it
-
reminds me of My, MySQL. Like, Mongo is kind
-
of terrible, but MySQL was kind of terrible,
too.
-
Like, when that came out, it didn't do transactions,
-
for example, and I, I was working in enterprise-y
-
land, and transactions are actually a thing.
And, you're
-
like, you script kiddies with your database.
-
So Mongo feels like that, and not, you know,
-
what we learned is, if you make something
that's
-
awesome and useful and everywhere and ubiquitous
and it
-
doesn't work, you can make it work. And eventually,
-
you know, MySQL is a real database. So Mongo
-
feels a bit like that. It's come a massive
-
way, right about really early on with very
early
-
versions.
-
It stores JSON. Well sort of it. It stores
-
BSON, anyway. That's just binary JSON basically.
And it's
-
a, it's a really beautiful model to work with
-
in a development cycle, which is why think
is
-
why there's, why there's so much appeal. You've
just
-
got kind of, people treat it like an object
-
database. You've just got an object that's
in there,
-
and you can pull out objects and manipulate
them
-
and do all of this kind of crazy stuff.
-
The people who know what they're talking about,
though,
-
with distributed systems, if the reason you're
using Mongo
-
is because you think it's a panacea for all
-
of this, you know, we need to be webscale
-
and do all of this kind of stuff, that
-
is not a good reason to use it. Cause
-
there, there's still a lot of operational
problems and,
-
and stuff going on.
-
This, this one is interesting. It's essentially,
RethinkDB is
-
coming from the PostGres world view. Cause
PostGres made,
-
you know, MySQL was like, whatever, we'll
fix it.
-
PostGres was like, we'll do it right and it,
-
you can't use it cause it's so slow, but
-
at least it's correct. And they took lots
of
-
iterations to make it usable. So Rethink is
kind
-
of that school of thought. It's like, we're
gonna
-
make it all correct first, and then we'll
make
-
it usable. So it's very similar idea. JSON,
you
-
know, they're trying to make it operationally
great with
-
automatic clustering and all this kind of
stuff. You
-
know. Who knows what it is and how it's
-
actually gonna behave in the real world. It's
still
-
a very early piece of tech.
-
And that leads me into, there's a whole world
-
of databases around what I'm loosely calling
the commercial
-
fringe. So Couchbase is the Couch guys and
sort
-
of some commercial Memcached guys who got
together to
-
make a hybrid something. Aerospike is, their
marketing is
-
great. That's about the best you can say about
-
it.
-
So there's a whole bunch of people trying
to
-
solve these problems in interesting ways.
But all of
-
these ones cost money and, you know, they're,
the
-
mileage varies and all of that kind of stuff.
-
The cool thing about open sources ones is
you
-
get it and you try it and you hate
-
it and you go back to PostGres so it's
-
all fine.
-
So, Hyperdex. This is my favorite. Because
they have
-
HyperSpace Hashing, and it is so cool. These
guys
-
are making some really broad, amazing claims
about the,
-
the kind of things that they can do. Crazy
-
fast. It's, it's a key-value store but it
will
-
index, you know, it's not just a key but
-
it will index the properties of a value. So
-
now you can do que, you know, genuine queries
-
into the structure of objects that you're
storing.
-
They've got a whole bunch of papers around
what
-
they're doing. So, you can read that as, who
-
knows what it means. It maps objects to coordinates
-
in a multi-dimensioned Euclidean space. HyperSpace.
And I'm like.
-
Take my money!
-
And there's a, there's a picture of HyperSpace.
And,
-
like, I've read that like eight times. I don't
-
understand what's going on. But if, it does
seem
-
to be true. They're trying to solve some of
-
these problems and, you know, they call themselves
like
-
a second generation NoSQL thing, in a similar
way
-
to Google, you know, kind of taking all of
-
this stuff and trying to push the science
underneath
-
it forward.
-
So you can, you know, it's got a Ruby
-
client. You can use it now. It's got, just,
-
normal key-value. It's got atomic stuff. You
can do
-
conditional ports, so this is some code that's
basically
-
is only updating if the, only updating the
current
-
balance if the, updating the balance if the
current
-
balance is what we think it is. Otherwise
some
-
other thread has updated it.
-
So there's some really interesting stuff they
can do.
-
And they're guaranteeing those operations
across the cluster. And
-
it's also got a transactional engine as well,
so
-
that's really exciting.
-
Running out of time. HBase and Hadoop. You
don't
-
have any of these problems. Don't worry about
it.
-
You probably don't want to have any of these
-
problems. Cause this just ends up, you need
to
-
install every fucking thing the Apache foundation
has ever
-
made. And this isn't even the full list. This
-
is like, you probably need those.
-
I have a friend, he's a bit of a
-
dick, and he, he calls it, cause he, he
-
works in an actual big data organization,
and he
-
just, he goes, oh, you people with your small
-
to medium data. So, yeah, like, most of us,
-
we don't have big data in any sense of
-
the word, really. Like, if, if it's got GB
-
on the end of it, meh. You're not there
-
yet.
-
So, again, this is just you know, Facebook
is
-
using the hell out of this stuff, and they're
-
just like, this is all out of date. They're
-
like now just, they can't buy hard disks fast
-
enough. It's crazy. Yeah. There was a punch
line
-
at the end of all of that.
-
But my friend, the guy who I said was
-
a bit of a dick, he, he recommends having
-
a look at this. And this is his quote,
-
if you want to appear really cool and underground,
-
then I reckon the next big thing is the
-
Berkeley Data Analytics Stack. So, there's
a whole bunch
-
of people who are looking at that, you know,
-
crazy big data situation and trying to work
out
-
what that means and what the future is.
-
And so Apache and Berkeley are kind of in
-
a cold war for that at the moment. And
-
then there's heaps of people in the enterprise
space
-
because you can sell lots of products and
or
-
services to large companies who think they
have a
-
big data problem. So that's cool.
-
That's fine. This isn't, this is just a little
-
thing that's an embeddable document key-value
store that you
-
can, it's just kind of a fun team and
-
has an API that looks very similar to the
-
Mongo one. And it just sits in process.
-
Oh, ElasticSearch. Every time I use it, I
think,
-
why can you not be my database? It's awesome.
-
But it loses a couple of points there because
-
of its configurationability. It went, it works
when you
-
know how to make it works, and it's crazy
-
complicated sometimes.
-
So anyway. Thirty. Four minutes over technically,
I think.
-
Yeah. So that's good.
-
That's databases in a nutshell. I'm Toby Hede.
I'm
-
around the conference if you want to talk
about
-
databases. I think of myself as a lapa-, a
-
lap- a butterfly collector, I guess, is what
I'm
-
looking for, of databases.
-
Yeah. So come and say hi. Cool.