CASEY ROSENTHAL: Hi. Hope everybody had a good lunch.

I'd like to start by reading the title of my presentation: Fault Tolerant Data: Surviving the Zombie Apocalypse.

I'm Casey Rosenthal. That's my Twitter handle. If you're interested in some of the stuff I talk about, I figure the most efficient way for me to get links to you is I'll tweet the links after the talk.

Very small bit of background on me - I worked for a company called Basho. Our icons look like this because it's named after a Japanese poet. So that's me. I get to work with a really good team of consultants. That's the consulting team I've had the opportunity to work with.

And we set up distributed databases for critical data for large enterprises. So recording things like internet traffic, transactions, and things that lives depend on, like medical records and video game scores. Really important things like that.

And to give you a sense of what my motivation is - I love working with data. And I'm gonna make a distinction here. You can call databases SQL versus NoSQL, NewSQL, document DBs - there are a bunch of different ways of chopping up the database space right now. But from a software engineer's perspective, I'm gonna draw a primary distinction between SQL and other, OK.

So I work particularly with a distributed key-value database. Most of the data modeling techniques I'm gonna talk about are applicable to distributed key-value and other kinds of distributed databases - not just the one that I happen to work with most, which is Riak.

So the reason that I love this category of other is because I'm gonna generalize - I'm totally gonna generalize about your experience. Most of you have worked with relational databases. I don't really care if that's true, I'm just gonna assume it.

So I also come from a background with SQL, with relational databases, and over many, many years working in other languages, but also Ruby, I got to the point where, OK, I kind of understood what an MVC framework looked like. I'm not gonna come across any new surprises there.

With the decades of institutional knowledge we have about SQL databases, nothing is really gonna surprise me about how to model data on a relational database.

And if you know your design patterns - and you probably have a couple that you use regularly - it's unlikely that you're gonna come across a design pattern that, like, completely blows your mind and changes how you do everything.

In the category of other databases, that happens on a regular basis. New databases come along that completely change not only how you understand modeling your data, but the kinds of applications you're able to build, because they have different properties. They can do new things; they look at data in a different way that you didn't have the capability of accessing before. So it's really cutting edge, and it changes a lot, and it's a good brain exercise.

So part of that is just the computer science aspect. I happen to like reading white papers, and there's a lot of great academic work going on right now in the other-database and distributed-systems sphere. There are a lot of unsolved problems, still. There are cases where we don't know the implications of setting up certain kinds of distributed architectures. And I'll touch on a couple of them.

The other reason this field is exciting to me is because of this principle of high availability. Yeah. Zing.

So think for a moment about Google, right. When you go to search something, you don't necessarily expect that it has everything indexed up to the exact second that you hit the search button, right. We allow a little bit of fuzziness in the results. In fact, we don't even expect that if you search here, you're gonna get the same exact results at the same time as if somebody did the same exact search in LA, right. We allow for a little bit of oddness there.

What we do expect is that Google will show up. If we get a 500 or a 400, we'll probably start checking our internet connection first, before we think Google is down, right. That's a form of high availability. To give you another example - email. If you use email on your phone, and your phone goes out of range, you don't expect that you all of a sudden won't be able to read your email and won't be able to write email. You expect that when the internet connects again, anything you've written gets sent out. So that's another form of high availability. The key ingredient here is that there is not a global state that exists, that is consistent, at any given point in time.

Of course, health records. You don't expect, when you buy insurance or go to the doctor, that immediately that's gonna be reflected on your health record. But you do expect that when you go to the health insurance website, it will be up. And it will have, like, historical data available to you.

So that's really interesting, because that's the expectation that most of us have - most people familiar with the internet have - about using the internet: that it's highly available and not strongly consistent. It doesn't have one global state at any given point in time. And there's kind of this tension between the two.

Well, if that's the expectation, the interesting thing is SQL was designed to be strongly consistent. So the database that we use intrinsically is not the one that follows the paradigm that most of our expectations using online applications follow. Other databases are built with high availability in mind. So they might not be strongly consistent at a given point in time. They might be eventually consistent. There's allowance for data that hasn't propagated to all of the machines in the cluster to get around to it.

So to really hammer home this distinction, I'm gonna focus on two different mindsets when we're building applications. This is your brain on SQL. And nobody's rushing the screen, which is a good sign, so there's no zombies in here.

When you're building - so, as an engineer, you get a use case and you're gonna build this application on SQL. Again, I'm just gonna generalize some experience here. So you take the use case, you figure out what data's in there - say, addresses, companies, users, whatever. And you break those down into tables, figure out what the relationships between those tables are, and normalize your data.

Do a lot of other things that, again, we have decades of institutional knowledge about - how to structure tables and rows and indexes in relational databases. And if we do that right, according to best practices, if we do that well, then we have a pretty good level of confidence that when we go to build an application on top of it, we'll be able to get the data out in a way that we want to present it to the client, whether the client is another computer or a person or whatever. OK. Model your data, do that well, then show it to the client.

By contrast, when you're building an application on top of a key-value system, you have to have a different mindset. And that mindset is kind of the reverse. You look at the use case and you focus on: how is this gonna look to the client? Again, whether the client is another machine, or a user, or a terminal, or whatever - how is it gonna be presented at the end? If you can figure that out, then modeling the data kind of falls out, OK. Figure out how it's gonna be presented, and then with a high degree of confidence you know you'll be able to model the data in a K/V.

It's not that one is better than the other in terms of data modeling. The difference is that SQL is more flexible, but harder to scale. And that's not a principle, that's an observation. So I'm not saying that in principle SQL is harder to scale. But I will make the observation that the more sophisticated the query planner is in your database, the more difficult it is to scale that database in a way that's highly available or fault tolerant in particular.

OK. K/V - that's the simplest kind of database you could have, right. You give a database a key and a value, it'll store the value. You give it just the key, it gives back the value. There's really no query planning going on there, so the design of the database can do other interesting things, like focus on making it highly available and fault tolerant and scaling horizontally.

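As a minimal sketch of that contract in Ruby - the class and the in-memory Hash here are just stand-ins for a real database, not any particular product's API:

    # A toy key-value store: the entire interface is put and get.
    class ToyKV
      def initialize
        @data = {}
      end

      # Give it a key and a value, it stores the value.
      def put(key, value)
        @data[key] = value
      end

      # Give it just the key, it gives back the value.
      def get(key)
        @data[key]
      end
    end

    store = ToyKV.new
    store.put("patient0", { "name" => "..." })
    store.get("patient0")  # => { "name" => "..." }
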
So I want to put this in a perspective that we can all relate to. So the situation we've all had - well, maybe you don't all take the subway, but you know, I get up, go to work, get on the subway, and look over, and there's somebody else who's obviously going to work, but they look a little not right.

We've all had this experience. We see that there's a zombie on the subway and we know that, you know, the apocalypse is upon us. So it's a common theme in our careers. Zombies!

Right. So we've all had this experience, and you know, maybe the acute zombielepsy breaks out, as it does, and you know, maybe the zombies start here in Miami, and you got to admit, some of those zombies can run fast. So they spread, and pretty soon there's, let's say, a million zombies around the country.

So, again, as frequently happens to me, and I'm sure to all of you, the CDC comes and says, hey, we need to get a handle on this. We need to store all of this zombie data. We need to do it quickly, so, you know, we want an application built in Ruby, because developer time is of the essence. We want this to be agile. We don't know exactly what kind of things we're gonna have to do with the data.

And we have this database that we need to store it in. OK. I'm sure anybody can do this. This isn't too sophisticated.

What does the data look like? Well, here's an example of a value that's just in JSON. So we've got some DNA - a DNA sample of the zombie. Their gender, their name, address, city, zip. Weight, height, latitude, longitude. I left some out - blood type, phone number, social security number, stuff like that.

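A value along those lines might look like this; the field names and values here are my own invented placeholders, not the talk's actual slide:

    {
      "dna": "GATTACA...",
      "gender": "f",
      "name": "Zombie Zero",
      "address": "100 Main St",
      "city": "Boynton Beach",
      "zip": "33436",
      "weight_lbs": 140,
      "height_in": 66,
      "latitude": 26.53,
      "longitude": -80.09
    }
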
But the situation's a little bit more complicated. The CDC actually has databases all around the country. And they all need to store this data. And they're kind of hooked up in a weird way because, intertubes, mhmm.

So the data has to replicate between the data centers like this, OK. This is not an uncommon situation with big data. This can be done with a SQL database, but it would kind of be a pain in the ass so far, right, to set up the load balancers and figure out the replication strategies.

Anyway, let's make it a little bit more interesting.

So what else could happen? Well, the CDC knows that this could happen, right. The CDC is not yet immune to the acute zombielepsy. So there could be a scenario in which we lose data centers.

I know what you're thinking - this never happens, right. The whole East coast never goes down. So if that happens, then we lose those connections.

But, you know, if the human race is to survive, we can't just ignore these guys out here. They have to be able to continue to accept reads and writes.

And this is one of the stricter definitions of high availability, which is that if you can connect to any part of the system, any part of the database, it will accept reads and writes. OK, it'll serve you whatever data it has access to, and if you write data, it will take it.

That's very difficult to do with a SQL database. SQL databases just aren't designed to do that sort of thing. But a lot of the other databases are.

OK, so this is getting kind of interesting. So let's take a closer look at that database. Well, it's big.

We're storing DNA, right, so that's about - your genome, well, I don't know about yours. Everybody else's genome is about 1.5 gigabytes per person. So at 1.5 gigs each, times a million zombies, that's getting to be a large database - on the order of 1.5 petabytes. It won't fit on one server.

So we're gonna have to store it on several servers.

Again, other databases are designed to do this. They will automatically do one of two things. They'll either have a logical router that knows, by looking at the key, where the data's supposed to be stored, or they have some sort of metadata server that keeps track of where all of the objects in the database are stored. Those are the two major paradigms for how a distributed database stores data.

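To make the first approach concrete, here's a minimal sketch of a logical router in Ruby, assuming a made-up fixed ring of 64 partitions across four nodes - the numbers and node names are illustrative, not any particular database's configuration, though Riak's consistent-hashing ring works in a similar spirit:

    require 'digest'

    PARTITIONS = 64
    NODES = %w[node1 node2 node3 node4]

    # Hash the key onto the ring of partitions.
    def partition_for(key)
      Digest::SHA1.hexdigest(key).to_i(16) % PARTITIONS
    end

    # Map the partition to an owning node. Every client can compute
    # this locally, so no central lookup has to be consulted.
    def node_for(key)
      NODES[partition_for(key) % NODES.size]
    end

    node_for("patient0")  # every client agrees on the owner
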
And this also has to be fault tolerant.

Let me just put a definition on that phrase, fault tolerant. That's the optimistic view that stuff is gonna happen - that bad stuff is gonna happen. Optimistically, if you have a fault tolerant system, you're expecting bad things to happen. So in this case, a server's gonna die. It's gonna catch on fire. The bearings in the hard drive are gonna give out, the RAID is gonna die - something, something's gonna happen and a server's gonna die.

It
-
continues to run. In some cases worse, a cable
-
comes unplugged. You know, zombies get into
the data
-
center, chasing the ops guys and trip over
a
-
cable, right. And so now we've got part of
-
the cluster is connected and another part
of the
-
cluster is not connected, OK.
-
Again, I'll leave time for questions - we
can
-
talk more about how databases have different
strategies of
-
dealing with fault tolerance and replication
and anti-entropy -
-
entropy being when data sets get out of sync.
-
But let's continue talking about the requirements.
OK.
-
So we're gonna store this as key/value data
model,
-
thank you mister, or CDC person. So this is
-
the value - we need a key.
-
Well, fortunately they have a system for establishing
a
-
UUID for a key, so in this case it'll
-
be patient0. When we want to look up this
-
data, we give the system patient0, and it
gives
-
us back the data that we need to, to
-
do research on.
-
And CDC person says, oh, and also I want
-
to sometimes look it up by zip code. I
-
want all of the zombies in a given zip
-
code.
-
OK. A strict key value doesn't have a second
-
way of looking things up. So here's where
we
-
have to start consider, considering data modeling.
We can
-
always look this record up by the key patient0,
-
that makes sense. How do we look it up
-
by 33436, that zip code?
-
Well, if we've got the data on a bunch
-
of servers, right, the zombie data comes in,
we
-
store it on one machine - in this case,
-
the upper right machine. And the system knows
that
-
patient0 key points to that machine. So that's
how
-
it knows to get that data.
-
But say we have these three zombies in that
-
zip code. We don't have a way of asking
-
key value system how to find those three zombies
-
who are in the zip code.
-
The solution is to create an inverted index.
This
-
is an inverted index because a field from
the
-
data in the value of the object is pointing
-
back to the objects that contain it. So that's
-
the inversion.
-
And the index is really simple, in this case,
-
we'll just say, hey, and object with a key
-
zip underscore 33436 contains the objects
patient0, patient45, and
-
patient3924.
-
We're getting more zombies here.
-
OK.
-
So how do we store that index?
-
We know that this is zip object should point
-
to those three zombie values. So if we represent
-
that index as this green star thing, where
do
-
we put it? We have two options.
-
One is, when the zombie object comes into
the
-
system, we save the index on the same machine
-
with that document. This is called document
based inverted
-
indexing, because we're partitioning the index
with the document,
-
the object that we're indexing.
-
So as time goes on, we now have an
-
index for zip 33, 33436, on the upper right
-
machine, and on this lower left machine, because
those
-
all have zombies in that zip code.
-
OK, let's think about the performance implications
here. We
-
save an object, the system locally indexes
it, and
-
saves that index locally. Super efficient
for a write,
-
right.
-
Save the object, index it yourself. OK, that's
easily
-
done. Now how do we read it?
-
Well we have to go to each of the
-
machines and say, you know, top one - do
-
you have anybody in that zip? Nope. Second
one
-
- yup, I've got these two guys. OK.
-
Nope. Yup, one. Nope. And then we put together
-
the results in one answer. That was a really
-
inefficient read, right. But that's, but it
was inefficient
-
right. SO that's one way to do it.
-
Another way to do it is a zombie comes
-
in, we index them before we save them, and
-
then we save the index someplace else, a specific
-
place in the cluster. OK, this is called term
-
based inverted index. So it's a different
partitioning scheme
-
for the index. We're partitioning by the term
that
-
we're searching on.
-
More zombies come in. And it's always updating
that
-
same object, which is someplace else in the
cluster.
-
So let's think about the performance profile
of this.
-
For every zombie value that we write to the
-
database, we have to fetch the index from
some
-
place else, add to it, and write to it
-
and save it back. So that's an inefficient
write
-
pattern, but now look at the read. When we
-
want to know who's in that zip, we only
-
have to read from one machine.
-
So these are the two. We got document-based
inverted
-
index and term-based inverted index. These
are the two
-
paradigms for inverted indexes that we've
considered.
-
Again, document-based: fast write, inefficient read. And term-based inverted index has a fast read but an inefficient write.

The point being that, when we look at the use case, that's what should determine how we model the data.

We have to understand from the CDC person: is this data gonna be written a lot, this index? Or is it gonna be read a lot?

-
the problem than using a relational database,
where we
-
just would have indexed that as a separate
thing,
-
as a secondary index in a relational database.
-
So, consider- it's an important distinction
in this kind
-
of thing, and if you like charts, I'll link
-
to some charts later that show some comparisons
that
-
we did between different distributed databases
and the way
-
that, or different partitioning schemes in
a distributed database.
-
So back to zombies.
-
Because these guys haven't stopped eating
brains yet.
-
All right, so in this scenario, where we've
got
-
two data centers down, three are still up,
two
-
are connected. Consider the situation where
data comes in
-
to the top data center, writing to record
patient0.
-
And somebody down in Southern California also
writes data
-
to patient0.
-
How would you handle this in a SQL database?
-
Let's make the problem worse. Then because,
you know,
-
the crisis is happening, somebody runs along
cable between
-
those two data centers, and connects them
and so
-
now they have to replicate their data with
each
-
other.
-
How do we reconcile these two different versions
of
-
patient0?
-
First, first attempt might be, OK, let's take
the
-
one with the last time stamp on it. Two
-
problems with that. One, obviously, you might
lose data.
-
Two, time stamps in a distributed system are
entirely
-
unreliable, OK.
-
If you want to sync your clocks in a
-
distributed system, that's a particular kind
of pain that
-
I just wouldn't want to get into.
-
So in the simplest case, we have a system
-
that has what are called siblings for that
particular
-
key/value object. And basically the system
would get two
-
versions of data and just say, oh, I don't
-
know which is which, so I'm just gonna store
-
both. And then when you go to read it,
-
it says, well I've got this value and this
-
value. These siblings. Here.
-
And at the application level, we can do a
-
couple things with that, right. In the simplest
case,
-
none of the data overlaps. So we can just
-
combine them, right.
-
None of that data overlaps. None of it overwrote
-
it, so we just combined them.
-
What if that's not the case? Well then we
-
can do a couple things. We can write our
-
own policy. We could just present both versions
to
-
CDC person on screen, and say I don't know
-
- you pick, right.
-
This is a problem that I didn't have to
-
think about as an engineer with the SQL database,
-
because it was impossible to have siblings
in a
-
SQL database.
-
But with this highly available, huge fault
tolerant database,
-
this kind of stuff has to be considered.
-
There's a whole field of research into what
are
-
called CRDT, commative or convergent replicated
data types, that
-
specifically analyzes the, the, specifies
the kinds of data
-
structures that you can build that you can
automatically
-
converge into one value without human intervention,
right, without
-
any side effects or conflicts.
-
And that's an on-going field of study. We
don't,
-
we haven't, like, numerated all of the possible
data
-
types that we can do that for. I'll give
-
you an example of one simple one.
-
Think of an array that has unique values and
-
only grows. It only gets bigger. YOu never
remove
-
a value from it. This is kind of the
-
simplest case for CRDT. So say up in the
-
north west, we had somebody write this object
with
-
patient0, 45, 3924, and in the southwest,
we had
-
somebody write this object that the zip index
with
-
patient73 and 9217.
-
If we want to combine these two, since we
-
know none of those objects can ever be taken
-
out, we can simply add them all together into
-
the list and call unique on it, right. Problem
-
solved.
-
What if you were gonna allow items to be
-
removed. Problem gets a lot more difficult.
-
And that's a whole other area of research
that,
-
again, is very interesting. Other topics - I
want
-
to take some questions, so I won't get into
-
this too much, but. Other research topics
with distributed
-
systems: GeoHashes, right. If you want to
store longitutde
-
and latitude for an object, and then you want
-
to know, hey, find me all of the things
-
that are within a mile, if that's on one
-
computer, we have algorithms that do that.
-
What if you're trying to contain that data
set
-
on many computers? That becomes a much more
difficult
-
problem, because you can't just ask one computer,
give
-
me all the thigns within a mile, because some
-
of those things might be on another computer.
So
-
what do you do, ask all of the computers?
-
That's an inefficient read. A lot of interesting
research
-
going on there, and acid transactions. The
'I' is
-
lowercase on purpose.
-
So in a highly available system, you know
a
-
couple years ago people - I was asked, can
-
you have acid compliant transactions on top
of a
-
highly available system, and the general consensus
was no
-
- as little as two years ago.
-
Now, we know that that's not the case. It
-
turns out that the 'i' in ACiD actually means
-
a lot of things. And in a sequel database,
-
you probably think it means that one transaction
starts,
-
does stuff, stops, and then another one starts,
does
-
stuff, stops.
-
Most SQL databases that we use here - again,
-
I'm just making a huge generalization - don't
actually
-
work that way. Or they rely on serialization
at
-
the resource level to give you that kind of
-
result. But they have other levels of protection
called,
-
you know, repeatable reads, or read committed,
where, when
-
one transaction starts, maybe another one
starts, too, and
-
does some work before the first one ends.
-
OK, by default, a lot of the databases, the
-
SQL databases that we use do that. That kind
-
of transaction, you can build on top of a
-
highly available system. And you can model
your data
-
in a way to give you those properties if
-
you require them.
-
So to sum up, keep calm. Always bring a
-
towel. The fate of the human race depends
on
-
you. Distributed data, distributed databases
like this, highly available
-
databases, can help you survive the next zombie
apocalypse.
-
We've all been through them before. There's
no reason
-
for us to expect we won't go through more
-
in the future.
-
And, I will, again, Tweet links to this, or
-
you can just copy it - it's zombies dot
-
samples dot basher dot com.
-
We have a, an application that uses the data
-
modeling that I talked about online. It has
a
-
maps that you can see where the zombies in
-
the area are, and it allows you to search
-
using those two different indexing methods
that I mentioned:
-
term based inverted indexing, and document
based inverted indexing
-
for the zombies in a given zip code.
-
You can also search for them by GeoHash. There
-
is a blog post describing how that's done.
And
-
the code is available in Ruby, and a little
-
bit of JavaScript. OK.
-
And that is my presentation. So I would like
-
to thank four of my colleagues - Drew Kerrigan,
-
Dan Kerrigan, for writing the application
that I spoke
-
about. Nathan Aschbacher for some of the material
and
-
John Newman for the artwork. And I will include
-
links to all the things that I just referenced
-
there in Twitter, shortly.
-
So my time is up. Find me in the halls. And good luck.