RailsConf 2014 - Biggish Data With Rails and Postgresql by Starr Horne

  • 0:17 - 0:19
    STARR HORNE: So thank you guys for coming.
  • 0:19 - 0:21
    This is gonna be an awesome talk.
  • 0:21 - 0:23
    Quite possibly the best talk you'll ever hear
  • 0:23 - 0:26
    in your life. What? I actually have
  • 0:26 - 0:28
    a built-in escape hatch here,
  • 0:28 - 0:30
    so if things start going wrong,
  • 0:30 - 0:33
    I might just bail. Yeah.
  • 0:33 - 0:40
    So, anyway. A couple weeks ago, I was, well,
  • 0:40 - 0:44
    I was sort of having a panic attack about
  • 0:44 - 0:48
    this presentation, because I'm a programmer,
    and my natural
  • 0:48 - 0:51
    habitat is in some sort of dark place looking
  • 0:51 - 0:54
    at a screen. It's not, it's not talking to a
  • 0:54 - 0:57
    bunch of people with, like, lights. What's
    up with
  • 0:57 - 1:00
    this?
  • 1:00 - 1:01
    So I, I was in the middle of this
  • 1:01 - 1:03
    panic attack and I went out on a walk,
  • 1:03 - 1:05
    cause that always clears my mind. I was listening
  • 1:05 - 1:08
    to a podcast. My favorite podcast. It's called
    Hardcore
  • 1:08 - 1:13
    History. I highly recommend it. Yeah. Oh.
    Somebody likes
  • 1:13 - 1:17
    it. Yeah!
  • 1:17 - 1:20
    So this, this particular episode of
  • 1:20 - 1:24
    Hardcore History was about the Punic Wars.
    And in
  • 1:24 - 1:27
    case you haven't heard about the Punic Wars,
    don't
  • 1:27 - 1:28
    know the story, I'll just give you, like,
    the
  • 1:28 - 1:32
    brief rundown. Basically, a long, long time
    ago, there
  • 1:32 - 1:35
    was this place called Rome. And they, they
    decided
  • 1:35 - 1:38
    that they should take over the world. And,
    for
  • 1:38 - 1:41
    the most part, they did. But there was this
  • 1:41 - 1:43
    one country that was a thorn in their side
  • 1:43 - 1:45
    for a long time called Carthage. And they
    fought
  • 1:45 - 1:47
    all these wars back and forth. It was kind
  • 1:47 - 1:49
    of a stalemate.
  • 1:49 - 1:51
    And then one day, when, I don't know, maybe
  • 1:51 - 1:54
    he was making a sacrifice to the fire god
  • 1:54 - 1:57
    or whatever, this guy named Hannibal had this
    great
  • 1:57 - 2:00
    idea. He was like, I'm gonna, I'm gonna lead
  • 2:00 - 2:03
    this sneak-attack on Rome, and I'm gonna do
    so
  • 2:03 - 2:07
    by marching my army through the Alps. Which
    is,
  • 2:07 - 2:11
    is pretty cool. I mean, that, that's pretty
    bad
  • 2:11 - 2:13
    ass. But I, I think the most awesome part
  • 2:13 - 2:16
    of the story, for me, at least, is that
  • 2:16 - 2:19
    the dude had war elephants. So, I, I don't
  • 2:19 - 2:21
    know if you can see it in the slide,
  • 2:21 - 2:24
    but there's actually a war elephant. It's,
    it's kind
  • 2:24 - 2:27
    of, I don't know, it's kind of under the
  • 2:27 - 2:31
    G. And a lot of people have doubted this
  • 2:31 - 2:35
    story, you know, over the years. And so, in
  • 2:35 - 2:39
    1959, there was a British engineer who, on
    a
  • 2:39 - 2:44
    bar bet, borrowed a circus elephant named
    Jumbo and
  • 2:44 - 2:47
    marched it across the Alps, too.
  • 2:47 - 2:50
    So, I don't know what my point is here,
  • 2:50 - 2:55
    really. I guess don't underestimate elephants,
  • 2:55 - 2:56
    because they don't
  • 2:56 - 3:00
    like it and they have long memories.
  • 3:00 - 3:07
    So, OK. This talk is really about biggish
    data.
  • 3:08 - 3:10
    So what, what the hell is biggish data? It's
  • 3:10 - 3:13
    not, what is? Whoa, I've got one of these
  • 3:13 - 3:18
    things. It's not big data. It's not about,
    you
  • 3:18 - 3:21
    know, this talk isn't gonna be about Hadoop
    clusters
  • 3:21 - 3:25
    and server farms and all that stuff. That's,
    that's
  • 3:25 - 3:27
    way over my head.
  • 3:27 - 3:31
    It's not about fancy architectures. I'm not
    gonna show
  • 3:31 - 3:36
    you how to make multi-write Postgres clusters
    that do
  • 3:36 - 3:39
    automatic sharding and stuff. I mean, that's
    all like
  • 3:39 - 3:41
    wicked cool, but it's not what we're gonna
    talk
  • 3:41 - 3:41
    about.
  • 3:41 - 3:44
    Instead, I'm gonna talk about something that
    I think
  • 3:44 - 3:47
    is, is more practical, and it's kind of more
  • 3:47 - 3:49
    interesting to me, and that is, how do you
  • 3:49 - 3:52
    keep your app working as your production dataset
    grows?
  • 3:52 - 3:59
    And grows into biggish data territory?
  • 3:59 - 4:01
    And this is very easy. This is very easy
  • 4:01 - 4:04
    to happen. Even if you don't have a firehose
  • 4:04 - 4:06
    of data coming in. If you run a, a
  • 4:06 - 4:10
    popular e-commerce site or just a site with
    a
  • 4:10 - 4:12
    lot of users, over the course of years you
  • 4:12 - 4:15
    can accumulate a ton of data. And as this
  • 4:15 - 4:19
    data accumulates, you find that your site
    performance goes
  • 4:19 - 4:21
    downhill sort of gradually in ways that you
    don't
  • 4:21 - 4:24
    understand.
  • 4:24 - 4:29
    So. What is biggish data? This talk is based
  • 4:29 - 4:32
    on my experience at Honeybadger. In case you
    haven't
  • 4:32 - 4:36
    heard of us, we are an exception, performance,
  • 4:36 - 4:38
    and uptime monitoring company.
  • 4:38 - 4:40
    That means that we essentially
  • 4:40 - 4:42
    have a firehose of new data coming in every
  • 4:42 - 4:46
    day. Right now we have about a terabyte of
  • 4:46 - 4:50
    errors in our database. The world has a lot
  • 4:50 - 4:52
    of errors. You guys need to start, like, doing
  • 4:52 - 4:54
    a better job.
  • 4:54 - 4:58
    And, and we get about, we get two gigabytes
  • 4:58 - 5:01
    of new errors per day, roughly. And all this
  • 5:01 - 5:07
    goes into a plain vanilla Postgres database,
  • 5:07 - 5:09
    which backs a pretty plain
  • 5:09 - 5:12
    vanilla Rails application.
  • 5:12 - 5:15
    Well, the good news is that Postgres can handle
  • 5:15 - 5:20
    this pretty easily. Postgres has got it covered.
  • 5:20 - 5:24
    The bad news is that, unless you've engineered
    your
  • 5:24 - 5:28
    app to deal with this level of data, you're
  • 5:28 - 5:31
    kind of screwed. And the reason is that a
  • 5:31 - 5:34
    hundred megabyte database behaves
  • 5:34 - 5:36
    fundamentally differently from a one
  • 5:36 - 5:42
    terabyte database. And a lot of the conventions,
    a
  • 5:42 - 5:43
    lot of the normal things we do in Rails
  • 5:43 - 5:46
    apps just stop working when you have this
    much
  • 5:46 - 5:50
    data.
  • 5:50 - 5:56
    Yup. This is actually a live feed. This query's
  • 5:56 - 6:03
    been running since this morning.
  • 6:04 - 6:07
    Something I learned, pretty much every pagination
    system breaks
  • 6:07 - 6:13
    on page, like, 2000. Even if you, you're like,
  • 6:13 - 6:16
    dear God, make this stop, I just want to
  • 6:16 - 6:19
    delete half my data. You're, you're just still
    screwed,
  • 6:19 - 6:22
    because deleting data takes a lot of time.
  • 6:22 - 6:26
    Now, I'm gonna explain all of this. I'm gonna
  • 6:26 - 6:28
    explain why this happens, how you can work
    around
  • 6:28 - 6:31
    it, and how you can optimize the queries in
  • 6:31 - 6:35
    your own database and optimize your stack.
    But, to
  • 6:35 - 6:37
    really understand this, we've got to take
    a little
  • 6:37 - 6:44
    trip back in time, to the summer of 1978.
  • 6:45 - 6:49
    The summer of disco, free love, and the VT-100
  • 6:49 - 6:54
    terminal. The VT-100 was the first sort of
    computer-shaped
  • 6:54 - 6:56
    thing that you could buy from an actual company
  • 6:56 - 6:59
    and sit on your desk to prove that you
  • 6:59 - 7:02
    were the alpha nerd of the office. It was
  • 7:02 - 7:05
    also the time that Oracle 1.0 was
  • 7:05 - 7:09
    being developed, which was one of the first
    databases
  • 7:09 - 7:12
    that used the SQL language. It's pretty cool.
    It
  • 7:12 - 7:15
    was written in PDP-11 assembly language.
  • 7:15 - 7:19
    And, and in case you're like too young to
  • 7:19 - 7:23
    have ever seen a PDP-11, this is what a
  • 7:23 - 7:30
    PDP-11 looks like. Yeah. Give it, show some
    love
  • 7:30 - 7:33
    for the PDP-11.
  • 7:33 - 7:39
    Now, this is pretty awesome. Nowadays, in
    our, like,
  • 7:39 - 7:43
    modern day times, right this second, marketers
    down in
  • 7:43 - 7:46
    the, the presentat- what, what do they call
    that?
  • 7:46 - 7:51
    The exhibit hall. Are, right now, throwing
    around all
  • 7:51 - 7:54
    sorts of words about cloud computing and platform
    as
  • 7:54 - 7:58
    a service and, and I don't know, some, some
  • 7:58 - 8:01
    stuff I haven't even heard of, probably. And
    back
  • 8:01 - 8:03
    then it was the same, but the, the buzz
  • 8:03 - 8:05
    words were different. The buzz words on everybody's,
    the
  • 8:05 - 8:10
    buzz word on everybody's lips, at that time,
    was
  • 8:10 - 8:12
    real time computing.
  • 8:12 - 8:18
    And I, I'm using air quotes around real time,
  • 8:18 - 8:22
    because that meant that you could have your,
    your
  • 8:22 - 8:26
    report generated in five minutes instead of
    five days.
  • 8:26 - 8:29
    Which I'm sure was pretty awesome. Like, I
    would
  • 8:29 - 8:33
    have been really happy about that. But, you
    know,
  • 8:33 - 8:34
    looking at this, it doesn't really seem like
    the
  • 8:34 - 8:35
    sort of thing I would want to build a
  • 8:35 - 8:38
    web app on top of, right?
  • 8:38 - 8:44
    Like, I. Yeah. I, I wouldn't stick Honeybadger
    on
  • 8:44 - 8:48
    that. But, a funny thing happened in the past
  • 8:48 - 8:53
    thirty, thirty-six years. A funny thing happened
    is that
  • 8:53 - 8:57
    Moore's Law made this into a web stack. Now,
  • 8:57 - 8:59
    I know that everybody here knows sort of,
    vaguely,
  • 8:59 - 9:01
    what Moore's Law is. But I thought it would
  • 9:01 - 9:06
    be good to go over the technical, precise
    definition
  • 9:06 - 9:12
    of Moore's Law.
  • 9:12 - 9:18
    And that is that, whoa. That is that computers
  • 9:18 - 9:20
    get more awesome as time goes forward. Did
    that
  • 9:20 - 9:23
    come back normally? Yeah. As time goes forward.
    And
  • 9:23 - 9:28
    awesomeness is generally defined as, well,
    technically, being able
  • 9:28 - 9:31
    to do more stuff faster. Being able to process
  • 9:31 - 9:38
    more data faster. So, based on this, I would
  • 9:38 - 9:41
    like to postulate, for the first time in public
  • 9:41 - 9:43
    - you guys are very lucky to be able
  • 9:43 - 9:45
    to hear this - the first time in public,
  • 9:45 - 9:49
    Starr's Corollary to Moore's Law, which is,
    as database
  • 9:49 - 9:56
    growth outpaces Moore's Law, you literally
    travel back in
  • 9:57 - 10:00
    time.
  • 10:00 - 10:02
    That was the TED moment of this, of this
  • 10:02 - 10:05
    talk. It's like, I can hear you guys, your
  • 10:05 - 10:10
    minds being blown. It sounds like bubble wrap
    popping.
  • 10:10 - 10:17
    So, based on my, my extensive research of
    time-travel
  • 10:17 - 10:21
    movies, the, the first rule for surviving
    your new
  • 10:21 - 10:23
    time period is that you need to understand
    the
  • 10:23 - 10:27
    methods and the motivations of the people
    in that
  • 10:27 - 10:29
    time period. We need to start thinking like
    our
  • 10:29 - 10:34
    ancestors. And our ancestors were very interested
    in this
  • 10:34 - 10:38
    thing called hardware.
  • 10:38 - 10:42
    Well, I, I don't mean to be flip, but
  • 10:42 - 10:46
    I just want to bring this up first, because
  • 10:46 - 10:49
    if you happen to be having database scaling
    issues,
  • 10:49 - 10:52
    like, if your app right now is getting slower
  • 10:52 - 10:54
    because the amount of data in your database
    is
  • 10:54 - 10:58
    getting bigger, and your app happens to live
    anywhere
  • 10:58 - 11:04
    named EC2, Linode, I don't know, whatever,
    whatever other
  • 11:04 - 11:08
    companies offer that, you can probably solve
    your scaling
  • 11:08 - 11:11
    issue just, like, right now just by going
    and
  • 11:11 - 11:13
    buying a real damn computer.
  • 11:13 - 11:20
    Yeah. You'll be shocked and amazed. Because
    the two
  • 11:20 - 11:24
    things that databases need to perform well,
    to work
  • 11:24 - 11:26
    at high performance with lots of data, is
    they
  • 11:26 - 11:29
    need lots of ram, and they need really fast
  • 11:29 - 11:32
    disk I/O. And virtual servers give you neither
    of
  • 11:32 - 11:35
    those things. So, just go buy a real damn
  • 11:35 - 11:39
    computer. Yeah. And, and while you're at it,
    throw
  • 11:39 - 11:43
    in a bunch of discs.
  • 11:43 - 11:45
    I'm not talking about a RAID array. You probably
  • 11:45 - 11:48
    want each of your, you probably want each
    of
  • 11:48 - 11:53
    your operating system, your Postgres data,
    your Postgres log
  • 11:53 - 11:55
    file, you probably want all that stuff to
    be
  • 11:55 - 11:58
    on separate hard drives. And that's just gonna
    make
  • 11:58 - 12:03
    the operating system able to more efficiently
    sort of
  • 12:03 - 12:04
    schedule disk I/O.
  • 12:04 - 12:10
    So, if, if that solves your problems, you
    know,
  • 12:10 - 12:14
    great. You're welcome. If you guys want to
    leave
  • 12:14 - 12:17
    the room now, that's fine. I won't be offended.
  • 12:17 - 12:19
    You can go call your local Colo facility.
    Work
  • 12:19 - 12:23
    something out. If, if that doesn't solve your
    problem,
  • 12:23 - 12:25
    or even if it does, you probably want to
  • 12:25 - 12:29
    look at your queries next.
  • 12:29 - 12:34
    Now, most people, most of us develop - oops.
  • 12:34 - 12:39
    Hey. Most of us develop against smallish datasets.
    So,
  • 12:39 - 12:41
    when you develop against a small dataset,
    you don't
  • 12:41 - 12:45
    notice inefficient queries. It's just life.
    And books have
  • 12:45 - 12:48
    been written about query optimization. It's,
    it's a very
  • 12:48 - 12:53
    huge topic, and I can't explain everything
    in
  • 12:53 - 12:54
    thirty minutes. So I'm just going to explain
    one
  • 12:54 - 12:57
    thing, and that's called EXPLAIN.
  • 12:57 - 13:03
    Fortunately, Postgres gives us an awesome
    tool called EXPLAIN,
  • 13:03 - 13:06
    which basically you pass it a query and it
  • 13:06 - 13:13
    gives us a query plan. Oh it, wait a
  • 13:16 - 13:19
    second. That's a, that's a chapter from the
    Iliad.
  • 13:19 - 13:24
    Sorry. It gives us a query plan, which still
  • 13:24 - 13:26
    kind of looks like a chapter from the Iliad.
  • 13:26 - 13:29
    But, fortunately, we only have to, to look
    at
  • 13:29 - 13:31
    one metric here. The only thing we need to
  • 13:31 - 13:37
    worry about is rows. For this talk.
  • 13:37 - 13:39
    And here we have an example of a very
  • 13:39 - 13:42
    efficient query, right. It's gonna use an
    index, and
  • 13:42 - 13:45
    it's gonna look at one row. That's pretty
    sweet.
  • 13:45 - 13:47
    Very fast. This is the type of query that
  • 13:47 - 13:51
    biggish data works with.
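A minimal sketch of what that looks like, assuming a hypothetical errors table with a primary-key index on id (the cost numbers are illustrative):

```sql
-- Hypothetical schema: an "errors" table with a primary key on id.
EXPLAIN SELECT * FROM errors WHERE id = 12345;

-- The plan comes back roughly like this -- the part to read is rows=1:
--  Index Scan using errors_pkey on errors  (cost=0.43..8.45 rows=1 width=64)
--    Index Cond: (id = 12345)
```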
  • 13:51 - 13:53
    But there's one thing you need to know about
  • 13:53 - 13:59
    rows. And that, obviously, the more rows you're
    dealing
  • 13:59 - 14:02
    with, the more data your computer has to crunch
  • 14:02 - 14:04
    to give you the answers you want. And so
  • 14:04 - 14:07
    the whole, the whole name of the game in
  • 14:07 - 14:09
    query optimization is to limit the number
    of rows
  • 14:09 - 14:10
    that you have to touch.
  • 14:10 - 14:16
    Let's go back to that inefficient count query.
    So
  • 14:16 - 14:18
    if you run EXPLAIN on that, it turns out
  • 14:18 - 14:22
    that when you ask Postgres to count all of
  • 14:22 - 14:25
    the rows in a table,
  • 14:25 - 14:27
    it actually literally counts all the rows
    in the
  • 14:27 - 14:32
    table, one by one.
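The same kind of sketch for the count query, again against a hypothetical errors table; the row estimate is made up, but the plan shape is the point:

```sql
EXPLAIN SELECT count(*) FROM errors;

-- Roughly:
--  Aggregate  (cost=... rows=1 ...)
--    ->  Seq Scan on errors  (cost=... rows=250000000 ...)
-- The Seq Scan's rows estimate is the whole table: every row gets touched.
```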
  • 14:32 - 14:35
    And so you wind up with a, one of
  • 14:35 - 14:41
    these things. It's not pretty. And it's often
    hard
  • 14:41 - 14:43
    to know when stuff like this is gonna crop
  • 14:43 - 14:47
    up. For example, I mentioned that most pagination
    systems
  • 14:47 - 14:52
    break at page 1000 or 2000. And that's because,
  • 14:52 - 14:58
    well, the offset and limit operators work
    a lot
  • 14:58 - 15:00
    like count, in that, if you do an offset
  • 15:00 - 15:05
    of 500,000, Postgres is going to count up
    500,000
  • 15:05 - 15:07
    rows, and then if you have a limit of
  • 15:07 - 15:09
    100, it's gonna count up another 100. And
    so
  • 15:09 - 15:12
    what you see is that pages one, two, and
  • 15:12 - 15:17
    three load extremely quickly. And so you,
    as a
  • 15:17 - 15:19
    developer, are actually, when I say you I
    mean
  • 15:19 - 15:22
    me, cause I actually, I actually fell into
    this
  • 15:22 - 15:25
    trap. You test it with, like, one or two
  • 15:25 - 15:28
    pages and it works fine, so you move on
  • 15:28 - 15:30
    and then later on your customers are complaining
    because
  • 15:30 - 15:34
    they can't access page 500. It's timing out.
  • 15:34 - 15:39
    The solution to this is to use a range
  • 15:39 - 15:42
    query. Range queries are very fast, and I'm
    gonna
  • 15:42 - 15:45
    give you a really stupid example here. Here's
    a
  • 15:45 - 15:52
    link with a much more detailed analysis of
    all
  • 15:52 - 15:55
    this. But essentially, if you do a range query,
  • 15:55 - 15:57
    you're gonna be able to use an index. It's
  • 15:57 - 15:59
    gonna touch 100 rows and it's gonna be really
  • 15:59 - 16:00
    fast.
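Here's a stupid-simple sketch of the two approaches, with a hypothetical errors table paged by its primary key:

```sql
-- Offset pagination: Postgres walks past every skipped row first.
SELECT * FROM errors ORDER BY id LIMIT 100 OFFSET 500000;  -- touches 500,100 rows

-- Range ("keyset") pagination: the index seeks straight to the boundary.
-- 500000 here stands in for the last id seen on the previous page.
SELECT * FROM errors WHERE id > 500000 ORDER BY id LIMIT 100;  -- touches ~100 rows
```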
  • 16:00 - 16:03
    Now, I know what you guys are thinking. That's
  • 16:03 - 16:06
    not the same as an offset limit query. And
  • 16:06 - 16:08
    that's the reason I gave you the link in
  • 16:08 - 16:10
    the previous slide, so.
  • 16:10 - 16:15
    Sorting. Sorting is tricky. Sorting is the
    devil. Sometimes
  • 16:15 - 16:17
    it's super fast. Sometimes if you happen to
    be
  • 16:17 - 16:21
    asking for data sorted in exactly the same
    way
  • 16:21 - 16:24
    that an index is sorted, well, it'll be super
  • 16:24 - 16:27
    fast. Other times, even if you have an index
  • 16:27 - 16:29
    on a column, if it's not set up right
  • 16:29 - 16:31
    or if you're asking for the data in a
  • 16:31 - 16:34
    slightly unusual way, you'll wind up sorting
    the whole
  • 16:34 - 16:35
    damn dataset.
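A sketch of both cases, again with a hypothetical errors table:

```sql
CREATE INDEX errors_created_at_idx ON errors (created_at DESC);

-- Matches the index ordering: Postgres can just walk the index.
SELECT * FROM errors ORDER BY created_at DESC LIMIT 50;

-- A slightly different ordering the index doesn't cover can force a
-- sort of every matching row before the LIMIT applies.
SELECT * FROM errors ORDER BY created_at DESC, id ASC LIMIT 50;
```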
  • 16:35 - 16:41
    It's no good. So here's your rundown for query
  • 16:41 - 16:44
    optimization. You want to develop against
    a real dataset,
  • 16:44 - 16:46
    because otherwise you won't know when things
    are going
  • 16:46 - 16:51
    wrong until they go wrong in production. Use,
    use
  • 16:51 - 16:54
    EXPLAIN. Use it a lot. And the whole name
  • 16:54 - 16:55
    of the game here is to limit the number
  • 16:55 - 16:57
    of rows that you touch. Because the more rows
  • 16:57 - 17:00
    you touch, the slower your queries are. You
    know,
  • 17:00 - 17:02
    in general.
  • 17:02 - 17:06
    So, I, I don't know about you, but that
  • 17:06 - 17:09
    just like, that just was a lot of cognitive
  • 17:09 - 17:12
    load happening, right up here. So I want to
  • 17:12 - 17:19
    give you guys a cookie.
  • 17:24 - 17:27
    For the second half of this talk, we're gonna
  • 17:27 - 17:31
    cover a lot of issues that relate to kind
  • 17:31 - 17:36
    of the infrastructure around biggish data,
    around big datasets
  • 17:36 - 17:41
    and firehoses of data coming in. And here's
    a,
  • 17:41 - 17:44
    a legend of, of how we're gonna approach this.
  • 17:44 - 17:48
    We're gonna have a cute picture that, hopefully,
    sort
  • 17:48 - 17:51
    of symbolizes something about the, the topic.
    We're gonna
  • 17:51 - 17:53
    have the name of the topic, and then we're
  • 17:53 - 17:55
    gonna have a link with more info about the
  • 17:55 - 17:58
    topic if, if you want to, you know, find
  • 17:58 - 17:58
    this later.
  • 17:58 - 18:00
    And you don't have to write all these links
  • 18:00 - 18:03
    down right now. At the end of the presentation,
  • 18:03 - 18:06
    at the very last slide, I'll have a url
  • 18:06 - 18:08
    where you can get a list of all of
  • 18:08 - 18:12
    them. And, yeah, and I promise this isn't
    malware.
  • 18:12 - 18:16
    So, you just gotta take my word on that.
  • 18:16 - 18:19
    Our ancestors were really interested in disc
    operating systems.
  • 18:19 - 18:21
    It seems like they were building a new disc
  • 18:21 - 18:25
    operating system like every two seconds, but.
    What, what's
  • 18:25 - 18:27
    up with that? I, I don't know. Now we
  • 18:27 - 18:29
    have, like, three.
  • 18:29 - 18:30
    The first thing that you should do if you're
  • 18:30 - 18:34
    using Linux is to increase your read-ahead
    cache. A
  • 18:34 - 18:37
    read-ahead cache is something that, well,
    I found not,
  • 18:37 - 18:40
    not a ton of people know about, but it's
  • 18:40 - 18:44
    a super easy way to get up to a
  • 18:44 - 18:50
    doubling of your read-throughput. And essentially
    what this means
  • 18:50 - 18:55
    is that Linux examines the request that you
    make
  • 18:55 - 18:57
    to the, to the disc. And if it sees
  • 18:57 - 19:00
    that you're asking for lots of blocks that
    come
  • 19:00 - 19:02
    right after another in a row, it's gonna preload
  • 19:02 - 19:05
    the next set of blocks into RAM.
  • 19:05 - 19:10
    The normal, the, the default for this is,
    like,
  • 19:10 - 19:15
    256K of RAM that it uses to prefetch these blocks.
  • 19:15 - 19:17
    If you update this to use, like, two megs,
  • 19:17 - 19:20
    four megs, you'll get a really big increase
    in
  • 19:20 - 19:25
    read performance. Use a modern file system.
    That means
  • 19:25 - 19:30
    ext3 is not an option. If you want to
  • 19:30 - 19:33
    know why, check out the link.
  • 19:33 - 19:38
    And, and if you happen to be using ext4
  • 19:38 - 19:43
    or XFS. I can never say that unless I
  • 19:43 - 19:46
    say it super slow. You might want to consider,
  • 19:46 - 19:48
    you might want to look into journaling settings.
    If
  • 19:48 - 19:50
    you have your database on a completely separate
    hard
  • 19:50 - 19:55
    drive, and it's running ext4 and you have
    full
  • 19:55 - 20:00
    data journaling enabled, since Postgres does
    its own journaling,
  • 20:00 - 20:02
    you're gonna have an inefficiency there because
    you have
  • 20:02 - 20:03
    two things journaling.
  • 20:03 - 20:09
    Finally, or, wait. Not finally. Anyway. You
    gotta tell
  • 20:09 - 20:15
    PG, tell, tell Paul Graham about all
    the
  • 20:15 - 20:18
    RAM that you bought in that, that fancy new
  • 20:18 - 20:20
    box. A, a really easy way to set this
  • 20:20 - 20:24
    up is to use a pgtune script. It examines
  • 20:24 - 20:29
    your computer and writes a, a Postgres configuration
    file
  • 20:29 - 20:33
    that has some pretty, pretty reasonable values
    in it.
  • 20:33 - 20:36
    And you can tweak them from there. This stuff
  • 20:36 - 20:38
    is all really easy to find.
  • 20:38 - 20:41
    And then, finally, the bane of, of DB administrators
  • 20:41 - 20:47
    is the vacuum command. Postgres needs to vacuum,
    because
  • 20:47 - 20:51
    it's messy. When you, when you run queries,
  • 20:51 - 20:53
    when you delete things, when you update things,
    it
  • 20:53 - 20:55
    leaves sort of a lot of stuff undone. And
  • 20:55 - 21:00
    it does that, it does that
  • 21:00 - 21:04
    in the name of speed, right. It only
  • 21:04 - 21:06
    does what is necessary at the time of query
  • 21:06 - 21:09
    to get you an answer to your query.
  • 21:09 - 21:11
    And then it uses vacuum to go and sort
  • 21:11 - 21:15
    of clean up after itself. The problem is that
  • 21:15 - 21:19
    vacuum can be really resource intensive. And
    so if
  • 21:19 - 21:22
    your, if your server is under a lot of
  • 21:22 - 21:25
    load, and you see that, OK, vacuum is also
  • 21:25 - 21:27
    causing a lot of load, you may be tempted
  • 21:27 - 21:31
    to turn off vacuum or to, or to make
  • 21:31 - 21:35
    autovacuum happen maybe once a night or something.
    And
  • 21:35 - 21:39
    that's generally a bad idea. We actually got
    bitten
  • 21:39 - 21:41
    by this ourselves, so that's why I bring it
  • 21:41 - 21:44
    up. Usually the answer to vacuum problems
    is to
  • 21:44 - 21:46
    vacuum more often, not less often.
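One way to vacuum more often is per-table autovacuum settings; the table name and the numbers here are illustrative, not a recommendation:

```sql
-- Vacuum the hot table after roughly 1% of its rows change,
-- instead of the default ~20%.
ALTER TABLE errors SET (autovacuum_vacuum_scale_factor = 0.01);

-- Or run it by hand during a quiet window:
VACUUM ANALYZE errors;
```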
  • 21:46 - 21:52
    All right. So. Velocity. I really wanted to
    like
  • 21:52 - 21:53
    - I didn't have time - I really wanted
  • 21:53 - 21:57
    to have a little, a little like, Tron guy
  • 21:57 - 22:01
    on the motorcycle going down the little grid
    there.
  • 22:01 - 22:02
    But. Eh.
  • 22:02 - 22:05
    Now we're gonna talk about a, a couple things
  • 22:05 - 22:07
    that are important if you have a ton of
  • 22:07 - 22:11
    data coming in, or a ton of read-requests
    coming
  • 22:11 - 22:14
    in, a ton of queries coming in. The first
  • 22:14 - 22:17
    is too many database connections. Each database
    connection in
  • 22:17 - 22:21
    Postgres is its own process, and each process
    has
  • 22:21 - 22:24
    its own RAM overhead. So there's a limited
    number
  • 22:24 - 22:26
    of connections that you want to have to your,
  • 22:26 - 22:27
    your database server.
  • 22:27 - 22:32
    If you have, I don't know, a thousand workers
  • 22:32 - 22:35
    and, and web app processes and all this, you
  • 22:35 - 22:38
    don't want to open a thousand database connections
    and
  • 22:38 - 22:41
    you probably already know about this. The
    solution is
  • 22:41 - 22:42
    to pool connections.
  • 22:42 - 22:45
    There's ways to do this in Ruby. There's also,
  • 22:45 - 22:48
    if you're interested in a more ops-y approach,
    you
  • 22:48 - 22:50
    can check out something like PgBouncer, which
    is a
  • 22:50 - 22:54
    proxy that sits in between your Ruby app and
  • 22:54 - 22:58
    your database and functions as a, as a connection
  • 22:58 - 23:01
    pool.
  • 23:01 - 23:03
    You can also run into problems with too many
  • 23:03 - 23:06
    locks. And this is, this is the sort of
  • 23:06 - 23:08
    problem that you don't really ever run into
    if
  • 23:08 - 23:11
    you don't have a firehose of database, of
    data
  • 23:11 - 23:16
    coming in.
  • 23:16 - 23:18
    I don't expect you to know, like, everything
    about
  • 23:18 - 23:20
    locks, because it's a, it's a pretty complex
    topic.
  • 23:20 - 23:24
    But you should know that, within a transaction,
    if
  • 23:24 - 23:28
    you go to update a row, that transaction is
  • 23:28 - 23:31
    gonna put a lock on the row until it's
  • 23:31 - 23:33
    done. It's gonna say that nothing else can
    write
  • 23:33 - 23:37
    to that row until it's done. And, you know,
  • 23:37 - 23:39
    that makes sense.
  • 23:39 - 23:41
    But where this can bite you is if you
  • 23:41 - 23:45
    have, say, imagine you have two Rails models.
    You
  • 23:45 - 23:48
    have a parent model and a child model. And
  • 23:48 - 23:50
    the parent model has a counter-cache that
    gets incremented
  • 23:50 - 23:54
    every time you add a child. Normally, this
    is
  • 23:54 - 23:56
    no big deal. People do this sort of thing
  • 23:56 - 24:00
    all the time. But if something crazy happens
    and
  • 24:00 - 24:05
    someone, you know, bombards your API and suddenly
    you
  • 24:05 - 24:10
    have, like, fifteen thousand children created,
    you're, you're gonna
  • 24:10 - 24:11
    have some locking issues.
  • 24:11 - 24:13
    Cause what's gonna happen is your first query
    is
  • 24:13 - 24:17
    going to execute fine. The child's gonna get
    created.
  • 24:17 - 24:20
    It's gonna increment the counter. Everything's
    gonna be fine.
  • 24:20 - 24:22
    And while it was doing that, it put a
  • 24:22 - 24:27
    lock on, on that row. And so, once the
  • 24:27 - 24:29
    first query's done, the lock is, is removed,
    and
  • 24:29 - 24:34
    the second query happens. And if you had infinite
  • 24:34 - 24:36
    time, like, this would all be fine. It would
  • 24:36 - 24:39
    all eventually work itself out. But what happens
    is
  • 24:39 - 24:42
    that by the time you finish query 100, query
  • 24:42 - 24:48
    15,000 has timed out, which causes all
    sorts
  • 24:48 - 24:51
    of fun in your, you know, in your Unicorns
  • 24:51 - 24:55
    and your Sidekiqs and all that. It's a, yeah,
  • 24:55 - 24:57
    it's a huge pain in the neck.
  • 24:57 - 25:01
    And the way you avoid this is, well,
  • 25:01 - 25:04
    it's just an architectural thing. You just
    have to
  • 25:04 - 25:07
    avoid any situation where you could be updating
    the
  • 25:07 - 25:12
    same record in the database like a gillion
    times
  • 25:12 - 25:16
    per second.
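A sketch of that pile-up, using hypothetical parents/children tables with a counter-cache column:

```sql
-- Session A:
BEGIN;
INSERT INTO children (parent_id) VALUES (1);
UPDATE parents SET children_count = children_count + 1 WHERE id = 1;
-- parents row 1 is now locked until this transaction commits.

-- Sessions B, C, D... arriving at the same time:
UPDATE parents SET children_count = children_count + 1 WHERE id = 1;
-- ...each blocks here, queued up behind session A's row lock.
```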
  • 25:16 - 25:21
    Intensive database queries. Like, sometimes,
    sometimes we have our
  • 25:21 - 25:26
    production database, and we need that to be
    very,
  • 25:26 - 25:30
    very performant for our users. But we also
    need
  • 25:30 - 25:32
    to maybe do some preprocessing on data as
    it
  • 25:32 - 25:34
    comes in. And a really simple way to do
  • 25:34 - 25:39
    this is to use Postgres's streaming replication
    facilities to
  • 25:39 - 25:43
    create a read-only replica. And then you
    just do
  • 25:43 - 25:46
    your intensive queries against that and it
    doesn't affect,
  • 25:46 - 25:50
    affect your users. It's super simple. Sorry.
    I, I,
  • 25:50 - 25:53
    I kind of feel dumb even talking about it
  • 25:53 - 25:56
    here. But, there you go.
  • 25:56 - 25:59
    Partitioning is awesome. Partitioning is like
    the best thing
  • 25:59 - 26:05
    ever, because it allows you to dramatically
    speed up
  • 26:05 - 26:11
    data culling and data archiving. What you
    can set
  • 26:11 - 26:14
    up in, in Postgres is a partitioning scheme
    in
  • 26:14 - 26:19
    which data for different days go into different
    physical
  • 26:19 - 26:21
    database tables.
  • 26:21 - 26:24
    And, you know, that's all right. But the really
  • 26:24 - 26:28
    cool thing is that you access that data by
  • 26:28 - 26:34
    querying a parent sort of virtual table that
    then
  • 26:34 - 26:37
    propagates that query across all of its children
    and,
  • 26:37 - 26:39
    you know, magically does its results and,
    and spits
  • 26:39 - 26:40
    them back to you.
  • 26:40 - 26:41
    So you don't have to update your Ruby code
  • 26:41 - 26:45
    at all, which is sweet. And this way, if
  • 26:45 - 26:49
    you want to, say, delete data that's 180 days
  • 26:49 - 26:52
    old, you just drop the table that's associated
    with
  • 26:52 - 26:55
    180 days ago. If you were to do a
  • 26:55 - 27:00
    delete where, you know, created_at is less
    than 180
  • 27:00 - 27:03
    days ago, you're probably gonna be waiting
    for weeks.
  • 27:03 - 27:08
    And last, but not least, backups are a real
  • 27:08 - 27:10
    pain in the neck when you have a huge
  • 27:10 - 27:12
    dataset. Like, that cron job that dumps your
    entire
  • 27:12 - 27:17
    database and uploads it to S3 doesn't work
    so
  • 27:17 - 27:20
    well when your database is, like, two terabytes
    big.
  • 27:20 - 27:25
    And, yeah, that's a big S3 bill, too.
  • 27:25 - 27:28
    Fortunately, we can take advantage of the
    same sort
  • 27:28 - 27:32
    of things that Postgres does to allow it to
  • 27:32 - 27:37
    do streaming replication to do, sort of, on
    the
  • 27:37 - 27:41
    fly, incremental backups. And there's a tool
    called WAL-E,
  • 27:41 - 27:45
    which makes this super easy. And it, it's
    really
  • 27:45 - 27:49
    cool, because it allows, it, it makes it very
  • 27:49 - 27:52
    easy to upload incremental backups to, say,
    S3, and
  • 27:52 - 27:54
    then when you want to restore your database,
    it
  • 27:54 - 27:58
    makes it really easy to, to restore a specific
  • 27:58 - 28:01
    point in time.
  • 28:01 - 28:04
    And so that, WAL-E is really awesome. I love
  • 28:04 - 28:04
    it.
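(A sketch of how WAL-E is typically wired up; the envdir path, data directory, and schedule are assumptions, not something from the talk:)

```shell
# Ship WAL segments to S3 as Postgres archives them (postgresql.conf):
#   archive_mode    = on
#   archive_command = 'envdir /etc/wal-e.d/env wal-e wal-push %p'

# Take a full base backup, typically from cron, e.g. nightly:
envdir /etc/wal-e.d/env wal-e backup-push /var/lib/postgresql/9.3/main

# To restore: fetch the latest base backup, then let Postgres replay
# WAL up to a chosen point in time via recovery.conf:
envdir /etc/wal-e.d/env wal-e backup-fetch /var/lib/postgresql/9.3/main LATEST
#   restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
```

Because only new WAL segments get uploaded between base backups, the backups are incremental, which is what makes this workable for a multi-terabyte database.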
  • 28:04 - 28:07
    Now, I, I, I stayed in my hotel room
  • 28:07 - 28:11
    last night. I didn't get to go to the
  • 28:11 - 28:13
    Speakeasy thing, which kind of bummed me out.
    But
  • 28:13 - 28:15
    I, I had to work on these slides for
  • 28:15 - 28:20
    you people. And, and, and when I got done
  • 28:20 - 28:22
    with this, when I sort of reached this point,
  • 28:22 - 28:26
    I looked back on all of these, these slides
  • 28:26 - 28:28
    - there's like fifty-four of them - and I
  • 28:28 - 28:31
    was like, holy crap. This is a lot of
  • 28:31 - 28:32
    information.
  • 28:32 - 28:39
    Yeah. This is a lot of information. But the
  • 28:39 - 28:41
    thing I, I want to stress is that, at
  • 28:41 - 28:45
    least in, in my experience, these issues tend
    to
  • 28:45 - 28:48
    come at you one at a time. This isn't
  • 28:48 - 28:49
    the sort of thing where you have to know
  • 28:49 - 28:54
    all of this stuff in advance in order to
  • 28:54 - 28:58
    deal with biggish data in Postgres. Things
    come at
  • 28:58 - 28:59
    you one at a time, and you can deal
  • 28:59 - 29:01
    with them one at a time.
  • 29:01 - 29:03
    And I have faith in you. I think you
  • 29:03 - 29:08
    can do it, because you're awesome, because
    nobody told
  • 29:08 - 29:12
    us that we could turn, like, transistors into
    LOLCats,
  • 29:12 - 29:15
    but we did it. Like, that's the type of
  • 29:15 - 29:18
    people we are, and that's, that's why I'm
    proud
  • 29:18 - 29:22
    to be at RailsConf. If you're interested in,
    like,
  • 29:22 - 29:23
    talking to me about this, if you think I'm
  • 29:23 - 29:26
    full of shit about anything, which I probably
    am
  • 29:26 - 29:31
    on at least one point, just, just say so.
  • 29:31 - 29:33
    Just feel free to come up to me after
  • 29:33 - 29:36
    the conference, or after my talk here. I have
  • 29:36 - 29:41
    delicious hot and spicy Mexican candy, as
    an incentive.
  • 29:41 - 29:43
    So there you go.
  • 29:43 - 29:46
    If you want to learn more about the stuff
  • 29:46 - 29:48
    that I. Are you taking? Do you want to
  • 29:48 - 29:52
    take a picture of the Mexican candy? OK. OK.
  • 29:52 - 29:56
    I can send it to you.
  • 29:56 - 29:59
    If. If you want to learn more, all of
  • 29:59 - 30:01
    the links referenced in this talk are at this
  • 30:01 - 30:04
    url, and if you are interested in having more
  • 30:04 - 30:09
    visibility into your production errors, check
    out Honeybadger, because,
  • 30:09 - 30:13
    yeah. We love Rails devs. And that's it. That's.
  • 30:13 - 30:15
    It says end of show here, so I guess
  • 30:15 - 30:16
    it must be the end of show.
Title:
RailsConf 2014 - Biggish Data With Rails and Postgresql by Starr Horne
Duration:
30:45