RailsConf 2014 - Biggish Data With Rails and Postgresql by Starr Horne

  • 0:17 - 0:19
    STARR HORNE: So thank you guys for coming.
  • 0:19 - 0:21
    This is gonna be an awesome talk.
  • 0:21 - 0:23
    Quite possibly the best talk you'll ever hear
  • 0:23 - 0:26
    in your life. What? I actually have
  • 0:26 - 0:28
    a built-in escape hatch here,
  • 0:28 - 0:30
    so if things start going wrong,
  • 0:30 - 0:33
    I might just bail. Yeah.
  • 0:33 - 0:40
    So, anyway. A couple weeks ago, I was, well,
  • 0:40 - 0:44
    I was sort of having a panic attack about
  • 0:44 - 0:48
    this presentation, because I'm a programmer,
    and my natural
  • 0:48 - 0:51
    habitat is in some sort of dark place looking
  • 0:51 - 0:54
    at a screen. It's not, it's not talking to a
  • 0:54 - 0:57
    bunch of people with, like, lights. What's
    up with
  • 0:57 - 1:00
    this?
  • 1:00 - 1:01
    So I, I was in the middle of this
  • 1:01 - 1:03
    panic attack and I went out on a walk,
  • 1:03 - 1:05
    cause that always clears my mind. I was listening
  • 1:05 - 1:08
    to a podcast. My favorite podcast. It's called
    Hardcore
  • 1:08 - 1:13
    History. I highly recommend it. Yeah. Oh.
    Somebody likes
  • 1:13 - 1:17
    it. Yeah!
  • 1:17 - 1:20
    So this, this particular episode of
  • 1:20 - 1:24
    Hardcore History was about the Punic Wars.
    And in
  • 1:24 - 1:27
    case you haven't heard about the Punic Wars,
    don't
  • 1:27 - 1:28
    know the story, I'll just give you, like,
    the
  • 1:28 - 1:32
    brief rundown. Basically, a long, long time
    ago, there
  • 1:32 - 1:35
    was this place called Rome. And they, they
    decided
  • 1:35 - 1:38
    that they should take over the world. And,
    for
  • 1:38 - 1:41
    the most part, they did. But there was this
  • 1:41 - 1:43
    one country that was a thorn in their side
  • 1:43 - 1:45
    for a long time called Carthage. And they
    fought
  • 1:45 - 1:47
    all these wars back and forth. It was kind
  • 1:47 - 1:49
    of a stalemate.
  • 1:49 - 1:51
    And then one day, when, I don't know, maybe
  • 1:51 - 1:54
    he was making a sacrifice to the fire god
  • 1:54 - 1:57
    or whatever, this guy named Hannibal had this
    great
  • 1:57 - 2:00
    idea. He was like, I'm gonna, I'm gonna lead
  • 2:00 - 2:03
    this sneak-attack on Rome, and I'm gonna do
    so
  • 2:03 - 2:07
    by marching my army through the Alps. Which
    is,
  • 2:07 - 2:11
    is pretty cool. I mean, that, that's pretty
    bad
  • 2:11 - 2:13
    ass. But I, I think the most awesome part
  • 2:13 - 2:16
    of the story, for me, at least, is that
  • 2:16 - 2:19
    the dude had war elephants. So, I, I don't
  • 2:19 - 2:21
    know if you can see it in the slide,
  • 2:21 - 2:24
    but there's actually a war elephant. It's,
    it's kind
  • 2:24 - 2:27
    of, I don't know, it's kind of under the
  • 2:27 - 2:31
    G. And a lot of people have doubted this
  • 2:31 - 2:35
    story, you know, over the years. And so, in
  • 2:35 - 2:39
    1959, there was a British engineer who, on
    a
  • 2:39 - 2:44
    bar bet, borrowed a circus elephant named
    Jumbo and
  • 2:44 - 2:47
    marched it across the Alps, too.
  • 2:47 - 2:50
    So, I don't know what my point is here,
  • 2:50 - 2:55
    really. I guess don't underestimate elephants,
  • 2:55 - 2:56
    because they don't
  • 2:56 - 3:00
    like it and they have long memories.
  • 3:00 - 3:07
    So, OK. This talk is really about biggish
    data.
  • 3:08 - 3:10
    So what, what the hell is biggish data? It's
  • 3:10 - 3:13
    not, what is? Whoa, I've got one of these
  • 3:13 - 3:18
    things. It's not big data. It's not about,
    you
  • 3:18 - 3:21
    know, this talk isn't gonna be about Hadoop
    clusters
  • 3:21 - 3:25
    and server farms and all that stuff. That's,
    that's
  • 3:25 - 3:27
    way over my head.
  • 3:27 - 3:31
    It's not about fancy architectures. I'm not
    gonna show
  • 3:31 - 3:36
    you how to make multi-write Postgres clusters
    that do
  • 3:36 - 3:39
    automatic sharding and stuff. I mean, that's
    all like
  • 3:39 - 3:41
    wicked cool, but it's not what we're gonna
    talk
  • 3:41 - 3:41
    about.
  • 3:41 - 3:44
    Instead, I'm gonna talk about something that
    I think
  • 3:44 - 3:47
    is, is more practical, and it's kind of more
  • 3:47 - 3:49
    interesting to me, and that is, how do you
  • 3:49 - 3:52
    keep your app working as your production dataset
    grows?
  • 3:52 - 3:59
    And grows into biggish data territory?
  • 3:59 - 4:01
    And this is very easy. This is very easy
  • 4:01 - 4:04
    to happen. Even if you don't have a firehose
  • 4:04 - 4:06
    of data coming in. If you run a, a
  • 4:06 - 4:10
    popular e-commerce site or just a site with
    a
  • 4:10 - 4:12
    lot of users, over the course of years you
  • 4:12 - 4:15
    can accumulate a ton of data. And as this
  • 4:15 - 4:19
    data accumulates, you find that your site
    performance goes
  • 4:19 - 4:21
    downhill sort of gradually in ways that you
    don't
  • 4:21 - 4:24
    understand.
  • 4:24 - 4:29
    So. What is biggish data? This talk is based
  • 4:29 - 4:32
    on my experience at Honeybadger. In case you
    haven't
  • 4:32 - 4:36
    heard of us, we are an exception, performance,
  • 4:36 - 4:38
    and uptime monitoring company.
  • 4:38 - 4:40
    That means that we essentially
  • 4:40 - 4:42
    have a firehose of new data coming in every
  • 4:42 - 4:46
    day. Right now we have about a terabyte of
  • 4:46 - 4:50
    errors in our database. The world has a lot
  • 4:50 - 4:52
    of errors. You guys need to start, like, doing
  • 4:52 - 4:54
    a better job.
  • 4:54 - 4:58
    And, and we get about, we get two gigabytes
  • 4:58 - 5:01
    of new errors per day, roughly. And all this
  • 5:01 - 5:07
    goes into a plain vanilla Postgres database,
  • 5:07 - 5:09
    which backs a pretty plain
  • 5:09 - 5:12
    vanilla Rails application.
  • 5:12 - 5:15
    Well, the good news is that Postgres can handle
  • 5:15 - 5:20
    this pretty easily. Postgres has got it covered.
  • 5:20 - 5:24
    The bad news is that, unless you've engineered
    your
  • 5:24 - 5:28
    app to deal with this level of data, you're
  • 5:28 - 5:31
    kind of screwed. And the reason is that a
  • 5:31 - 5:34
    hundred megabyte database behaves
  • 5:34 - 5:36
    fundamentally differently from a one
  • 5:36 - 5:42
    terabyte database. And a lot of the conventions,
    a
  • 5:42 - 5:43
    lot of the normal things we do in Rails
  • 5:43 - 5:46
    apps just stop working when you have this
    much
  • 5:46 - 5:50
    data.
  • 5:50 - 5:56
    Yup. This is actually a live feed. This query's
  • 5:56 - 6:03
    been running since this morning.
  • 6:04 - 6:07
    Something I learned, pretty much every pagination
    system breaks
  • 6:07 - 6:13
    on page, like, 2000. Even if you, you're like,
  • 6:13 - 6:16
    dear God, make this stop, I just want to
  • 6:16 - 6:19
    delete half my data. You're, you're just still
    screwed,
  • 6:19 - 6:22
    because deleting data takes a lot of time.
  • 6:22 - 6:26
    Now, I'm gonna explain all of this. I'm gonna
  • 6:26 - 6:28
    explain why this happens, how you can work
    around
  • 6:28 - 6:31
    it, and how you can optimize the queries in
  • 6:31 - 6:35
    your own database and optimize your stack.
    But, to
  • 6:35 - 6:37
    really understand this, we've got to take
    a little
  • 6:37 - 6:44
    trip back in time, to the summer of 1978.
  • 6:45 - 6:49
    The summer of disco, free love, and the VT-100
  • 6:49 - 6:54
    terminal. The VT-100 was the first sort of
    computer-shaped
  • 6:54 - 6:56
    thing that you could buy from an actual company
  • 6:56 - 6:59
    and sit on your desk to prove that you
  • 6:59 - 7:02
    were the alpha nerd of the office. It was
  • 7:02 - 7:05
    also the time that Oracle 1.0 was
  • 7:05 - 7:09
    being developed, which was one of the first
    databases
  • 7:09 - 7:12
    that used the SQL language. It's pretty cool.
    It
  • 7:12 - 7:15
    was written in PDP-11 assembly language.
  • 7:15 - 7:19
    And, and in case you're like too young to
  • 7:19 - 7:23
    have ever seen a PDP-11, this is what a
  • 7:23 - 7:30
    PDP-11 looks like. Yeah. Give it, show some
    love
  • 7:30 - 7:33
    for the PDP-11.
  • 7:33 - 7:39
    Now, this is pretty awesome. Nowadays, in
    our, like,
  • 7:39 - 7:43
    modern day times, right this second, marketers
    down in
  • 7:43 - 7:46
    the, the presentat- what, what do they call
    that?
  • 7:46 - 7:51
    The exhibit hall. Are, right now, throwing
    around all
  • 7:51 - 7:54
    sorts of words about cloud computing and platform
    as
  • 7:54 - 7:58
    a service and, and I don't know, some, some
  • 7:58 - 8:01
    stuff I haven't even heard of, probably. And
    back
  • 8:01 - 8:03
    then it was the same, but the, the buzz
  • 8:03 - 8:05
    words were different. The buzz words on everybody's,
    the
  • 8:05 - 8:10
    buzz word on everybody's lips, at that time,
    was
  • 8:10 - 8:12
    real time computing.
  • 8:12 - 8:18
    And I, I'm using air quotes around real time,
  • 8:18 - 8:22
    because that meant that you could have your,
    your
  • 8:22 - 8:26
    report generated in five minutes instead of
    five days.
  • 8:26 - 8:29
    Which I'm sure was pretty awesome. Like, I
    would
  • 8:29 - 8:33
    have been really happy about that. But, you
    know,
  • 8:33 - 8:34
    looking at this, it doesn't really seem like
    the
  • 8:34 - 8:35
    sort of thing I would want to build a
  • 8:35 - 8:38
    web app on top of, right?
  • 8:38 - 8:44
    Like, I. Yeah. I, I wouldn't stick Honeybadger
    on
  • 8:44 - 8:48
    that. But, a funny thing happened in the past
  • 8:48 - 8:53
    thirty, thirty-six years. A funny thing happened
    is that
  • 8:53 - 8:57
    Moore's Law made this into a web stack. Now,
  • 8:57 - 8:59
    I know that everybody here knows sort of,
    vaguely,
  • 8:59 - 9:01
    what Moore's Law is. But I thought it would
  • 9:01 - 9:06
    be good to go over the technical, precise
    definition
  • 9:06 - 9:12
    of Moore's Law.
  • 9:12 - 9:18
    And that is that, whoa. That is that computers
  • 9:18 - 9:20
    get more awesome as time goes forward. Did
    that
  • 9:20 - 9:23
    come back normally? Yeah. As time goes forward.
    And
  • 9:23 - 9:28
    awesomeness is generally defined as, well,
    technically, being able
  • 9:28 - 9:31
    to do more stuff faster. Being able to process
  • 9:31 - 9:38
    more data faster. So, based on this, I would
  • 9:38 - 9:41
    like to postulate, for the first time in public
  • 9:41 - 9:43
    - you guys are very lucky to be able
  • 9:43 - 9:45
    to hear this - the first time in public,
  • 9:45 - 9:49
    Starr's Corollary to Moore's Law, which is,
    as database
  • 9:49 - 9:56
    growth outpaces Moore's Law, you literally
    travel back in
  • 9:57 - 10:00
    time.
  • 10:00 - 10:02
    That was the TED moment of this, of this
  • 10:02 - 10:05
    talk. It's like, I can hear you guys, your
  • 10:05 - 10:10
    minds being blown. It sounds like bubble wrap
    popping.
  • 10:10 - 10:17
    So, based on my, my extensive research of
    time-travel
  • 10:17 - 10:21
    movies, the, the first rule for surviving
    your new
  • 10:21 - 10:23
    time period is that you need to understand
    the
  • 10:23 - 10:27
    methods and the motivations of the people
    in that
  • 10:27 - 10:29
    time period. We need to start thinking like
    our
  • 10:29 - 10:34
    ancestors. And our ancestors were very interested
    in this
  • 10:34 - 10:38
    thing called hardware.
  • 10:38 - 10:42
    Well, I, I don't mean to be flip, but
  • 10:42 - 10:46
    I just want to bring this up first, because
  • 10:46 - 10:49
    if you happen to be having database scaling
    issues,
  • 10:49 - 10:52
    like, if your app right now is getting slower
  • 10:52 - 10:54
    because the amount of data in your database
    is
  • 10:54 - 10:58
    getting bigger, and your app happens to live
    anywhere
  • 10:58 - 11:04
    named EC2, Linode, I don't know, whatever,
    whatever other
  • 11:04 - 11:08
    companies offer that, you can probably solve
    your scaling
  • 11:08 - 11:11
    issue just, like, right now just by going
    and
  • 11:11 - 11:13
    buying a real damn computer.
  • 11:13 - 11:20
    Yeah. You'll be shocked and amazed. Because
    the two
  • 11:20 - 11:24
    things that databases need to perform well,
    to work
  • 11:24 - 11:26
    at high performance with lots of data, is
    they
  • 11:26 - 11:29
    need lots of ram, and they need really fast
  • 11:29 - 11:32
    disk I/O. And virtual servers give you neither
    of
  • 11:32 - 11:35
    those things. So, just go buy a real damn
  • 11:35 - 11:39
    computer. Yeah. And, and while you're at it,
    throw
  • 11:39 - 11:43
    in a bunch of discs.
  • 11:43 - 11:45
    I'm not talking about a RAID array. You probably
  • 11:45 - 11:48
    want each of your, you probably want each
    of
  • 11:48 - 11:53
    your operating system, your Postgres data,
    your Postgres log
  • 11:53 - 11:55
    file, you probably want all that stuff to
    be
  • 11:55 - 11:58
    on separate hard drives. And that's just gonna
    make
  • 11:58 - 12:03
    the operating system able to more efficiently
    sort of
  • 12:03 - 12:04
    schedule disk I/O.
  • 12:04 - 12:10
    So, if, if that solves your problems, you
    know,
  • 12:10 - 12:14
    great. You're welcome. If you guys want to
    leave
  • 12:14 - 12:17
    the room now, that's fine. I won't be offended.
  • 12:17 - 12:19
    You can go call your local Colo facility.
    Work
  • 12:19 - 12:23
    something out. If, if that doesn't solve your
    problem,
  • 12:23 - 12:25
    or even if it does, you probably want to
  • 12:25 - 12:29
    look at your queries next.
  • 12:29 - 12:34
    Now, most people, most of us develop - oops.
  • 12:34 - 12:39
    Hey. Most of us develop against smallish datasets.
    So,
  • 12:39 - 12:41
    when you develop against a small dataset,
    you don't
  • 12:41 - 12:45
    notice inefficient queries. It's just life.
    And books have
  • 12:45 - 12:48
    been written about query optimization. It's,
    it's a very
  • 12:48 - 12:53
    huge topic, and I can't explain everything
    in
  • 12:53 - 12:54
    thirty minutes. So I'm just going to explain
    one
  • 12:54 - 12:57
    thing, and that's called EXPLAIN.
  • 12:57 - 13:03
    Fortunately, Postgres gives us an awesome
    tool called EXPLAIN,
  • 13:03 - 13:06
    which basically you pass it a query and it
  • 13:06 - 13:13
    gives us a query plan. Oh it, wait a
  • 13:16 - 13:19
    second. That's a, that's a chapter from the
    Iliad.
  • 13:19 - 13:24
    Sorry. It gives us a query plan, which still
  • 13:24 - 13:26
    kind of looks like a chapter from the Iliad.
  • 13:26 - 13:29
    But, fortunately, we only have to, to look
    at
  • 13:29 - 13:31
    one metric here. The only thing we need to
  • 13:31 - 13:37
    worry about is rows. For this talk.
  • 13:37 - 13:39
    And here we have an example of a very
  • 13:39 - 13:42
    efficient query, right. It's gonna use an
    index, and
  • 13:42 - 13:45
    it's gonna look at one row. That's pretty
    sweet.
  • 13:45 - 13:47
    Very fast. This is the type of query that
  • 13:47 - 13:51
    biggish data works with.
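A minimal sketch of what that looks like, assuming a hypothetical errors table with a primary-key index on id (the cost numbers are illustrative):

```sql
-- Hypothetical schema: an "errors" table with a primary key on id.
EXPLAIN SELECT * FROM errors WHERE id = 12345;

-- The plan comes back roughly like this -- the part to read is rows=1:
--  Index Scan using errors_pkey on errors  (cost=0.43..8.45 rows=1 width=64)
--    Index Cond: (id = 12345)
```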
  • 13:51 - 13:53
    But there's one thing you need to know about
  • 13:53 - 13:59
    rows. And that, obviously, the more rows you're
    dealing
  • 13:59 - 14:02
    with, the more data your computer has to crunch
  • 14:02 - 14:04
    to give you the answers you want. And so
  • 14:04 - 14:07
    the whole, the whole name of the game in
  • 14:07 - 14:09
    query optimization is to limit the number
    of rows
  • 14:09 - 14:10
    that you have to touch.
  • 14:10 - 14:16
    Let's go back to that inefficient count query.
    So
  • 14:16 - 14:18
    if you run EXPLAIN on that, it turns out
  • 14:18 - 14:22
    that when you ask Postgres to count all of
  • 14:22 - 14:25
    the rows in a table,
  • 14:25 - 14:27
    it actually literally counts all the rows
    in the
  • 14:27 - 14:32
    table, one by one.
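The same kind of sketch for the count query, again against a hypothetical errors table; the row estimate is made up, but the plan shape is the point:

```sql
EXPLAIN SELECT count(*) FROM errors;

-- Roughly:
--  Aggregate  (cost=... rows=1 ...)
--    ->  Seq Scan on errors  (cost=... rows=250000000 ...)
-- The Seq Scan's rows estimate is the whole table: every row gets touched.
```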
  • 14:32 - 14:35
    And so you wind up with a, one of
  • 14:35 - 14:41
    these things. It's not pretty. And it's often
    hard
  • 14:41 - 14:43
    to know when stuff like this is gonna crop
  • 14:43 - 14:47
    up. For example, I mentioned that most pagination
    systems
  • 14:47 - 14:52
    break at page 1000 or 2000. And that's because,
  • 14:52 - 14:58
    well, the offset and limit operators work
    a lot
  • 14:58 - 15:00
    like count, in that, if you do an offset
  • 15:00 - 15:05
    of 500,000, Postgres is going to count up
    500,000
  • 15:05 - 15:07
    rows, and then if you have a limit of
  • 15:07 - 15:09
    100, it's gonna count up another 100. And
    so
  • 15:09 - 15:12
    what you see is that pages one, two, and
  • 15:12 - 15:17
    three load extremely quickly. And so you,
    as a
  • 15:17 - 15:19
    developer, are actually, when I say you I
    mean
  • 15:19 - 15:22
    me, cause I actually, I actually fell into
    this
  • 15:22 - 15:25
    trap. You test it with, like, one or two
  • 15:25 - 15:28
    pages and it works fine, so you move on
  • 15:28 - 15:30
    and then later on your customers are complaining
    because
  • 15:30 - 15:34
    they can't access page 500. It's timing out.
  • 15:34 - 15:39
    The solution to this is to use a range
  • 15:39 - 15:42
    query. Range queries are very fast, and I'm
    gonna
  • 15:42 - 15:45
    give you a really stupid example here. Here's
    a
  • 15:45 - 15:52
    link with a much more detailed analysis of
    all
  • 15:52 - 15:55
    this. But essentially, if you do a range query,
  • 15:55 - 15:57
    you're gonna be able to use an index. It's
  • 15:57 - 15:59
    gonna touch 100 rows and it's gonna be really
  • 15:59 - 16:00
    fast.
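Here's a stupid-simple sketch of the two approaches, with a hypothetical errors table paged by its primary key:

```sql
-- Offset pagination: Postgres walks past every skipped row first.
SELECT * FROM errors ORDER BY id LIMIT 100 OFFSET 500000;  -- touches 500,100 rows

-- Range ("keyset") pagination: the index seeks straight to the boundary.
-- 500000 here stands in for the last id seen on the previous page.
SELECT * FROM errors WHERE id > 500000 ORDER BY id LIMIT 100;  -- touches ~100 rows
```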
  • 16:00 - 16:03
    Now, I know what you guys are thinking. That's
  • 16:03 - 16:06
    not the same as an offset limit query. And
  • 16:06 - 16:08
    that's the reason I gave you the link in
  • 16:08 - 16:10
    the previous slide, so.
  • 16:10 - 16:15
    Sorting. Sorting is tricky. Sorting is the
    devil. Sometimes
  • 16:15 - 16:17
    it's super fast. Sometimes if you happen to
    be
  • 16:17 - 16:21
    asking for data sorted in exactly the same
    way
  • 16:21 - 16:24
    that an index is sorted, well, it'll be super
  • 16:24 - 16:27
    fast. Other times, even if you have an index
  • 16:27 - 16:29
    on a column, if it's not set up right
  • 16:29 - 16:31
    or if you're asking for the data in a
  • 16:31 - 16:34
    slightly unusual way, you'll wind up sorting
    the whole
  • 16:34 - 16:35
    damn dataset.
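A sketch of both cases, again with a hypothetical errors table:

```sql
CREATE INDEX errors_created_at_idx ON errors (created_at DESC);

-- Matches the index ordering: Postgres can just walk the index.
SELECT * FROM errors ORDER BY created_at DESC LIMIT 50;

-- A slightly different ordering the index doesn't cover can force a
-- sort of every matching row before the LIMIT applies.
SELECT * FROM errors ORDER BY created_at DESC, id ASC LIMIT 50;
```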
  • 16:35 - 16:41
    It's no good. So here's your rundown for query
  • 16:41 - 16:44
    optimization. You want to develop against
    a real dataset,
  • 16:44 - 16:46
    because otherwise you won't know when things
    are going
  • 16:46 - 16:51
    wrong until they go wrong in production. Use,
    use
  • 16:51 - 16:54
    EXPLAIN. Use it a lot. And the whole name
  • 16:54 - 16:55
    of the game here is to limit the number
  • 16:55 - 16:57
    of rows that you touch. Because the more rows
  • 16:57 - 17:00
    you touch, the slower your queries are. You
    know,
  • 17:00 - 17:02
    in general.
  • 17:02 - 17:06
    So, I, I don't know about you, but that
  • 17:06 - 17:09
    just like, that just was a lot of cognitive
  • 17:09 - 17:12
    load happening, right up here. So I want to
  • 17:12 - 17:19
    give you guys a cookie.
  • 17:24 - 17:27
    For the second half of this talk, we're gonna
  • 17:27 - 17:31
    cover a lot of issues that relate to kind
  • 17:31 - 17:36
    of the infrastructure around biggish data,
    around big datasets
  • 17:36 - 17:41
    and firehoses of data coming in. And here's
    a,
  • 17:41 - 17:44
    a legend of, of how we're gonna approach this.
  • 17:44 - 17:48
    We're gonna have a cute picture that, hopefully,
    sort
  • 17:48 - 17:51
    of symbolizes something about the, the topic.
    We're gonna
  • 17:51 - 17:53
    have the name of the topic, and then we're
  • 17:53 - 17:55
    gonna have a link with more info about the
  • 17:55 - 17:58
    topic if, if you want to, you know, find
  • 17:58 - 17:58
    this later.
  • 17:58 - 18:00
    And you don't have to write all these links
  • 18:00 - 18:03
    down right now. At the end of the presentation,
  • 18:03 - 18:06
    at the very last slide, I'll have a url
  • 18:06 - 18:08
    where you can get a list of all of
  • 18:08 - 18:12
    them. And, yeah, and I promise this isn't
    malware.
  • 18:12 - 18:16
    So, you just gotta take my word on that.
  • 18:16 - 18:19
    Our ancestors were really interested in disc
    operating systems.
  • 18:19 - 18:21
    It seems like they were building a new disc
  • 18:21 - 18:25
    operating system like every two seconds, but.
    What, what's
  • 18:25 - 18:27
    up with that? I, I don't know. Now we
  • 18:27 - 18:29
    have, like, three.
  • 18:29 - 18:30
    The first thing that you should do if you're
  • 18:30 - 18:34
    using Linux is to increase your read-ahead
    cache. A
  • 18:34 - 18:37
    read-ahead cache is something that, well,
    I found not,
  • 18:37 - 18:40
    not a ton of people know about, but it's
  • 18:40 - 18:44
    a super easy way to get up to a
  • 18:44 - 18:50
    doubling of your read-throughput. And essentially
    what this means
  • 18:50 - 18:55
    is that Linux examines the request that you
    make
  • 18:55 - 18:57
    to the, to the disc. And if it sees
  • 18:57 - 19:00
    that you're asking for lots of blocks that
    come
  • 19:00 - 19:02
    right after another in a row, it's gonna preload
  • 19:02 - 19:05
    the next set of blocks into RAM.
  • 19:05 - 19:10
    The normal, the, the default for this is,
    like,
  • 19:10 - 19:15
    256K of RAM that it uses to prefetch these blocks.
  • 19:15 - 19:17
    If you update this to use, like, two megs,
  • 19:17 - 19:20
    four megs, you'll get a really big increase
    in
  • 19:20 - 19:25
    read performance. Use a modern file system.
    That means
  • 19:25 - 19:30
    ext3 is not an option. If you want to
  • 19:30 - 19:33
    know why, check out the link.
  • 19:33 - 19:38
    And, and if you happen to be using ext4
  • 19:38 - 19:43
    or XFS. I can never say that unless I
  • 19:43 - 19:46
    say it super slow. You might want to consider,
  • 19:46 - 19:48
    you might want to look into journaling settings.
    If
  • 19:48 - 19:50
    you have your database on a completely separate
    hard
  • 19:50 - 19:55
    drive, and it's running ext4 and you have
    full
  • 19:55 - 20:00
    data journaling enabled, since Postgres does
    its own journaling,
  • 20:00 - 20:02
    you're gonna have an inefficiency there because
    you have
  • 20:02 - 20:03
    two things journaling.
  • 20:03 - 20:09
    Finally, or, wait. Not finally. Anyway. You
    gotta tell
  • 20:09 - 20:15
    PG, tell, tell Paul Graham about all
    the
  • 20:15 - 20:18
    RAM that you bought in that, that fancy new
  • 20:18 - 20:20
    box. A, a really easy way to set this
  • 20:20 - 20:24
    up is to use a pgtune script. It examines
  • 20:24 - 20:29
    your computer and writes a, a Postgres configuration
    file
  • 20:29 - 20:33
    that has some pretty, pretty reasonable values
    in it.
  • 20:33 - 20:36
    And you can tweak them from there. This stuff
  • 20:36 - 20:38
    is all really easy to find.
  • 20:38 - 20:41
    And then, finally, the bane of, of DB administrators
  • 20:41 - 20:47
    is the vacuum command. Postgres needs to vacuum,
    because
  • 20:47 - 20:51
    it's messy. When you, when you run queries,
  • 20:51 - 20:53
    when you delete things, when you update things,
    it
  • 20:53 - 20:55
    leaves sort of a lot of stuff undone. And
  • 20:55 - 21:00
    it does that, it does that
  • 21:00 - 21:04
    in the name of speed, right. It only
  • 21:04 - 21:06
    does what is necessary at the time of query
  • 21:06 - 21:09
    to get you an answer to your query.
  • 21:09 - 21:11
    And then it uses vacuum to go and sort
  • 21:11 - 21:15
    of clean up after itself. The problem is that
  • 21:15 - 21:19
    vacuum can be really resource intensive. And
    so if
  • 21:19 - 21:22
    your, if your server is under a lot of
  • 21:22 - 21:25
    load, and you see that, OK, vacuum is also
  • 21:25 - 21:27
    causing a lot of load, you may be tempted
  • 21:27 - 21:31
    to turn off vacuum or to, or to make
  • 21:31 - 21:35
    autovacuum happen maybe once a night or something.
    And
  • 21:35 - 21:39
    that's generally a bad idea. We actually got
    bitten
  • 21:39 - 21:41
    by this ourselves, so that's why I bring it
  • 21:41 - 21:44
    up. Usually the answer to vacuum problems
    is to
  • 21:44 - 21:46
    vacuum more often, not less often.
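One way to vacuum more often is per-table autovacuum settings; the table name and the numbers here are illustrative, not a recommendation:

```sql
-- Vacuum the hot table after roughly 1% of its rows change,
-- instead of the default ~20%.
ALTER TABLE errors SET (autovacuum_vacuum_scale_factor = 0.01);

-- Or run it by hand during a quiet window:
VACUUM ANALYZE errors;
```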
  • 21:46 - 21:52
    All right. So. Velocity. I really wanted to
    like
  • 21:52 - 21:53
    - I didn't have time - I really wanted
  • 21:53 - 21:57
    to have a little, a little like, Tron guy
  • 21:57 - 22:01
    on the motorcycle going down the little grid
    there.
  • 22:01 - 22:02
    But. Eh.
  • 22:02 - 22:05
    Now we're gonna talk about a, a couple things
  • 22:05 - 22:07
    that are important if you have a ton of
  • 22:07 - 22:11
    data coming in, or a ton of read-requests
    coming
  • 22:11 - 22:14
    in, a ton of queries coming in. The first
  • 22:14 - 22:17
    is too many database connections. Each database
    connection in
  • 22:17 - 22:21
    Postgres is its own process, and each process
    has
  • 22:21 - 22:24
    its own RAM overhead. So there's a limited
    number
  • 22:24 - 22:26
    of connections that you want to have to your,
  • 22:26 - 22:27
    your database server.
  • 22:27 - 22:32
    If you have, I don't know, a thousand workers
  • 22:32 - 22:35
    and, and web app processes and all this, you
  • 22:35 - 22:38
    don't want to open a thousand database connections
    and
  • 22:38 - 22:41
    you probably already know about this. The
    solution is
  • 22:41 - 22:42
    to pool connections.
  • 22:42 - 22:45
    There's ways to do this in Ruby. There's also,
  • 22:45 - 22:48
    if you're interested in a more ops-y approach,
    you
  • 22:48 - 22:50
    can check out something like PgBouncer, which
    is a
  • 22:50 - 22:54
    proxy that sits in between your Ruby app and
  • 22:54 - 22:58
    your database and functions as a, as a connection
  • 22:58 - 23:01
    pool.
  • 23:01 - 23:03
    You can also run into problems with too many
  • 23:03 - 23:06
    locks. And this is, this is the sort of
  • 23:06 - 23:08
    problem that you don't really ever run into
    if
  • 23:08 - 23:11
    you don't have a firehose of database, of
    data
  • 23:11 - 23:16
    coming in.
  • 23:16 - 23:18
    I don't expect you to know, like, everything
    about
  • 23:18 - 23:20
    locks, because it's a, it's a pretty complex
    topic.
  • 23:20 - 23:24
    But you should know that, within a transaction,
    if
  • 23:24 - 23:28
    you go to update a row, that transaction is
  • 23:28 - 23:31
    gonna put a lock on the row until it's
  • 23:31 - 23:33
    done. It's gonna say that nothing else can
    write
  • 23:33 - 23:37
    to that row until it's done. And, you know,
  • 23:37 - 23:39
    that makes sense.
  • 23:39 - 23:41
    But where this can bite you is if you
  • 23:41 - 23:45
    have, say, imagine you have two Rails models.
    You
  • 23:45 - 23:48
    have a parent model and a child model. And
  • 23:48 - 23:50
    the parent model has a counter-cache that
    gets incremented
  • 23:50 - 23:54
    every time you add a child. Normally, this
    is
  • 23:54 - 23:56
    no big deal. People do this sort of thing
  • 23:56 - 24:00
    all the time. But if something crazy happens
    and
  • 24:00 - 24:05
    someone, you know, bombards your API and suddenly
    you
  • 24:05 - 24:10
    have, like, fifteen thousand children created,
    you're, you're gonna
  • 24:10 - 24:11
    have some locking issues.
  • 24:11 - 24:13
    Cause what's gonna happen is your first query
    is
  • 24:13 - 24:17
    going to execute fine. The child's gonna get
    created.
  • 24:17 - 24:20
    It's gonna increment the counter. Everything's
    gonna be fine.
  • 24:20 - 24:22
    And while it was doing that, it put a
  • 24:22 - 24:27
    lock on, on that row. And so, once the
  • 24:27 - 24:29
    first query's done, the lock is, is removed,
    and
  • 24:29 - 24:34
    the second query happens. And if you had infinite
  • 24:34 - 24:36
    time, like, this would all be fine. It would
  • 24:36 - 24:39
    all eventually work itself out. But what happens
    is
  • 24:39 - 24:42
    that by the time you finish query 100, query
  • 24:42 - 24:48
    15,000 has timed out, which causes all
    sorts
  • 24:48 - 24:51
    of fun in your, you know, in your Unicorns
  • 24:51 - 24:55
    and your Sidekiqs and all that. It's a, yeah,
  • 24:55 - 24:57
    it's a huge pain in the neck.
  • 24:57 - 25:01
    And the way you avoid this is, well,
  • 25:01 - 25:04
    it's just an architectural thing. You just
    have to
  • 25:04 - 25:07
    avoid any situation where you could be updating
    the
  • 25:07 - 25:12
    same record in the database like a gillion
    times
  • 25:12 - 25:16
    per second.
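A sketch of that pile-up, using hypothetical parents/children tables with a counter-cache column:

```sql
-- Session A:
BEGIN;
INSERT INTO children (parent_id) VALUES (1);
UPDATE parents SET children_count = children_count + 1 WHERE id = 1;
-- parents row 1 is now locked until this transaction commits.

-- Sessions B, C, D... arriving at the same time:
UPDATE parents SET children_count = children_count + 1 WHERE id = 1;
-- ...each blocks here, queued up behind session A's row lock.
```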
  • 25:16 - 25:21
    Intensive database queries. Like, sometimes,
    sometimes we have our
  • 25:21 - 25:26
    production database, and we need that to be
    very,
  • 25:26 - 25:30
    very performant for our users. But we also
    need
  • 25:30 - 25:32
    to maybe do some preprocessing on data as
    it
  • 25:32 - 25:34
    comes in. And a really simple way to do
  • 25:34 - 25:39
    this is to use Postgres's streaming replication
    facilities to
  • 25:39 - 25:43
    create a read-only replica. And then you
    just do
  • 25:43 - 25:46
    your intensive queries against that and it
    doesn't affect,
  • 25:46 - 25:50
    affect your users. It's super simple. Sorry.
    I, I,
  • 25:50 - 25:53
    I kind of feel dumb even talking about it
  • 25:53 - 25:56
    here. But, there you go.
  • 25:56 - 25:59
    Partitioning is awesome. Partitioning is like
    the best thing
  • 25:59 - 26:05
    ever, because it allows you to dramatically
    speed up
  • 26:05 - 26:11
    data culling and data archiving. What you
    can set
  • 26:11 - 26:14
    up in, in Postgres is a partitioning scheme
    in
  • 26:14 - 26:19
    which data for different days go into different
    physical
  • 26:19 - 26:21
    database tables.
  • 26:21 - 26:24
    And, you know, that's all right. But the really
  • 26:24 - 26:28
    cool thing is that you access that data by
  • 26:28 - 26:34
    querying a parent sort of virtual table that
    then
  • 26:34 - 26:37
    propagates that query across all of its children
    and,
  • 26:37 - 26:39
    you know, magically does its results and,
    and spits
  • 26:39 - 26:40
    them back to you.
  • 26:40 - 26:41
    So you don't have to update your Ruby code
  • 26:41 - 26:45
    at all, which is sweet. And this way, if
  • 26:45 - 26:49
    you want to, say, delete data that's 180 days
  • 26:49 - 26:52
    old, you just drop the table that's associated
    with
  • 26:52 - 26:55
    180 days ago. If you were to do a
  • 26:55 - 27:00
    delete where, you know, created_at is less
    than 180
  • 27:00 - 27:03
    days ago, you're probably gonna be waiting
    for weeks.
  • 27:03 - 27:08
    And last, but not least, backups are a real
  • 27:08 - 27:10
    pain in the neck when you have a huge
  • 27:10 - 27:12
    dataset. Like, that cron job that dumps your
    entire
  • 27:12 - 27:17
    database and uploads it to S3 doesn't work
    so
  • 27:17 - 27:20
    well when your database is, like, two terabytes
    big.
  • 27:20 - 27:25
    And, yeah, that's a big S3 bill, too.
  • 27:25 - 27:28
    Fortunately, we can take advantage of the
    same sort
  • 27:28 - 27:32
    of things that Postgres does to allow it to
  • 27:32 - 27:37
    do streaming replication to do, sort of, on
    the
  • 27:37 - 27:41
    fly, incremental backups. And there's a tool
    called WAL-E,
  • 27:41 - 27:45
    which makes this super easy. And it, it's
    really
  • 27:45 - 27:49
    cool, because it allows, it, it makes it very
  • 27:49 - 27:52
    easy to upload incremental backups to, say,
    S3, and
  • 27:52 - 27:54
    then when you want to restore your database,
    it
  • 27:54 - 27:58
    makes it really easy to, to restore a specific
  • 27:58 - 28:01
    point in time.
  • 28:01 - 28:04
    And so that, WAL-E is really awesome. I love
  • 28:04 - 28:04
    it.
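(A sketch of how WAL-E is typically wired up; the envdir path, data directory, and schedule are assumptions, not something from the talk:)

```shell
# Ship WAL segments to S3 as Postgres archives them (postgresql.conf):
#   archive_mode    = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# Take a full base backup, typically from cron, e.g. nightly:
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main

# To restore: fetch the latest base backup, then let Postgres replay
# WAL up to a chosen point in time via recovery.conf:
envdir /etc/wal-e.d/env wal-e backup-fetch /var/lib/postgresql/9.3/main LATEST
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
```

Because only new WAL segments get uploaded between base backups, the backups are incremental, which is what makes this workable for a multi-terabyte database.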
  • 28:04 - 28:07
    Now, I, I, I stayed in my hotel room
  • 28:07 - 28:11
    last night. I didn't get to go to the
  • 28:11 - 28:13
    Speakeasy thing, which kind of bummed me out.
    But
  • 28:13 - 28:15
    I, I had to work on these slides for
  • 28:15 - 28:20
    you people. And, and, and when I got done
  • 28:20 - 28:22
    with this, when I sort of reached this point,
  • 28:22 - 28:26
    I looked back on all of these, these slides
  • 28:26 - 28:28
    - there's like fifty-four of them - and I
  • 28:28 - 28:31
    was like, holy crap. This is a lot of
  • 28:31 - 28:32
    information.
  • 28:32 - 28:39
    Yeah. This is a lot of information. But the
  • 28:39 - 28:41
    thing I, I want to stress is that, at
  • 28:41 - 28:45
    least in, in my experience, these issues tend
    to
  • 28:45 - 28:48
    come at you one at a time. This isn't
  • 28:48 - 28:49
    the sort of thing where you have to know
  • 28:49 - 28:54
    all of this stuff in advance in order to
  • 28:54 - 28:58
    deal with biggish data in Postgres. Things
    come at
  • 28:58 - 28:59
    you one at a time, and you can deal
  • 28:59 - 29:01
    with them one at a time.
  • 29:01 - 29:03
    And I have faith in you. I think you
  • 29:03 - 29:08
    can do it, because you're awesome, because
    nobody told
  • 29:08 - 29:12
    us that we could turn, like, transistors into
    LOLCats,
  • 29:12 - 29:15
    but we did it. Like, that's the type of
  • 29:15 - 29:18
    people we are, and that's, that's why I'm
    proud
  • 29:18 - 29:22
    to be at RailsConf. If you're interested in,
    like,
  • 29:22 - 29:23
    talking to me about this, if you think I'm
  • 29:23 - 29:26
    full of shit about anything, which I probably
    am
  • 29:26 - 29:31
    on at least one point, just, just say so.
  • 29:31 - 29:33
    Just feel free to come up to me after
  • 29:33 - 29:36
    the conference, or after my talk here. I have
  • 29:36 - 29:41
    delicious hot and spicy Mexican candy, as
    an incentive.
  • 29:41 - 29:43
    So there you go.
  • 29:43 - 29:46
    If you want to learn more about the stuff
  • 29:46 - 29:48
    that I. Are you taking? Do you want to
  • 29:48 - 29:52
    take a picture of the Mexican candy? OK. OK.
  • 29:52 - 29:56
    I can send it to you.
  • 29:56 - 29:59
    If. If you want to learn more, all of
  • 29:59 - 30:01
    the links referenced in this talk are at this
  • 30:01 - 30:04
    url, and if you are interested in having more
  • 30:04 - 30:09
    visibility into your production errors, check
    out Honeybadger, because,
  • 30:09 - 30:13
    yeah. We love Rails devs. And that's it. That's.
  • 30:13 - 30:15
    It says end of show here, so I guess
  • 30:15 - 30:16
    it must be the end of show.
Title:
RailsConf 2014 - Biggish Data With Rails and Postgresql by Starr Horne
Duration:
30:45