RailsConf 2014 - Service Extraction at Groupon Scale by Jason Sisk & Abhishek Pillai

  • 0:19 - 0:20
    ABHISHEK PILLAI: Thanks for coming. I know
    there's
  • 0:20 - 0:23
    some other cool talks right now, but you're
    here so
  • 0:23 - 0:26
    that's awesome. Let's get started. You're
    here to
  • 0:26 - 0:29
    learn about how to tame COBRAs.
  • 0:29 - 0:34
    JASON SISK: My name is Jason Sisk. I work
  • 0:34 - 0:37
    at Groupon. I've been here for a couple of
  • 0:37 - 0:42
    years. I work on predominantly Ruby/Rails
    systems, backend development,
  • 0:42 - 0:44
    et cetera, and I do not like onions.
  • 0:44 - 0:48
    A.P.: My name is Abi, and I'm at, I've
  • 0:48 - 0:51
    been at Groupon for about two years, too.
    And
  • 0:51 - 0:53
    Jason and I work on a team that does
  • 0:53 - 0:58
    backend service, basically managing inventory.
    And I don't like
  • 0:58 - 0:59
    fruits.
  • 0:59 - 1:01
    J.S.: So part of what we're gonna tell you
  • 1:01 - 1:04
    today is a little bit of a history lesson
  • 1:04 - 1:07
    about the early pain of Groupon having site
    outages,
  • 1:07 - 1:11
    et cetera, due to Rails scaling. We want to
  • 1:11 - 1:13
    tell you about the story of the developers
    that
  • 1:13 - 1:16
    actually handled those problems and some of
    the decisions
  • 1:16 - 1:20
    that they made. So that's that.
  • 1:20 - 1:23
    But we want to lead off with one important
  • 1:23 - 1:23
    point.
  • 1:23 - 1:30
    A.P.: Boom! Pause. You don't have to pause
    for
  • 1:30 - 1:32
    that long. And, yeah.
  • 1:32 - 1:37
    J.S.: So. Back, back around 2007, we were
    doing
  • 1:37 - 1:39
    what all the other cool kids were doing. We
  • 1:39 - 1:43
    were using a Rails monolith, and to some degree
  • 1:43 - 1:46
    still are. Rails 2 is a great framework. Who
  • 1:46 - 1:48
    is using Rails 2? Anyone?
  • 1:48 - 1:49
    AUDIENCE: Yeah!
  • 1:49 - 1:50
    J.S.: All right.
  • 1:50 - 1:51
    A.P.: Awesome.
  • 1:51 - 1:55
    J.S.: You and us. Rails is a great framework.
  • 1:55 - 1:58
    We all love Rails. That's why we're here.
    We
  • 1:58 - 2:03
    still love Rails and that's why we're here.
    But
  • 2:03 - 2:05
    what's great about it is that it's great for
  • 2:05 - 2:08
    Agile teams. It's, and for us it was really
  • 2:08 - 2:12
    simple. We could make some really quick decisions.
    We
  • 2:12 - 2:16
    could iterate product very quickly. We could
    iterate new
  • 2:16 - 2:18
    features. And we could do it with a small
  • 2:18 - 2:20
    team of five to ten devs.
  • 2:20 - 2:23
    We had a single repository. We had a single
  • 2:23 - 2:25
    test suite. And we had a single deploy process.
  • 2:25 - 2:26
    Very simple.
  • 2:26 - 2:29
    A.P.: And, most importantly, you, we had like
    one
  • 2:29 - 2:32
    shared, conceptual understanding of the code
    base. When we
  • 2:32 - 2:33
    wanted to make a change, we knew where to
  • 2:33 - 2:37
    put it. And things were simple that way.
  • 2:37 - 2:40
    J.S.: Also what was great was, and still is,
  • 2:40 - 2:44
    about Rails, that integrating components is
    really easy. The
  • 2:44 - 2:48
    convention over configuration, model associations
    - all of that
  • 2:48 - 2:50
    business you can put together things very
    quickly and
  • 2:50 - 2:53
    very easily. But we didn't come here to talk
  • 2:53 - 2:55
    to you about Rails.
  • 2:55 - 2:59
    A.P.: We came here to tell you about cobras,
  • 2:59 - 3:03
    and how to tame them. At Groupon, we actually
  • 3:03 - 3:05
    have a mo- monolith, and we call it the
  • 3:05 - 3:08
    primary web app. But Jason had a thought for
  • 3:08 - 3:10
    the purposes of this talk, we'd come up with
  • 3:10 - 3:13
    a more scientifically accurate name for it.
  • 3:13 - 3:20
    Yeah. So. Centralized Omnipotent Big-ass Rails
    Application.
  • 3:20 - 3:22
    J.S.: Big-ass. So we want to take you back
  • 3:22 - 3:27
    to 2009 for just a minute. So Groupon was
  • 3:27 - 3:29
    about two years old, give or take, and we
  • 3:29 - 3:31
    were still kind of kicking into gear. People
    would
  • 3:31 - 3:34
    come into the office in Chicago we've got,
    open
  • 3:34 - 3:37
    up New Relic, and they'd see stuff like this.
  • 3:37 - 3:39
    A.P.: So as you can see, like, in the
  • 3:39 - 3:42
    middle of the night, it's great. Everything's
    working really
  • 3:42 - 3:44
    well. Soon as people woke up and started using
  • 3:44 - 3:48
    it - damn people - our performance immediately
    started
  • 3:48 - 3:52
    to drop.
  • 3:52 - 3:55
    And then eight months later, we had about
    thirty
  • 3:55 - 4:00
    thousand requests per minute and everything
    was on fire.
  • 4:00 - 4:02
    J.S.: We blame Oprah.
  • 4:02 - 4:04
    A.P.: As you do.
  • 4:04 - 4:08
    J.S.: It's Oprah's fault. Oprah crashed Groupon.
    Oprah crashed
  • 4:08 - 4:13
    Groupon not once, but at least twice. And
    also
  • 4:13 - 4:16
    the Gap crashed Groupon too. Actually, the
    truth is,
  • 4:16 - 4:20
    Groupon crashed Groupon. We were not scaling
    properly. Bad.
  • 4:20 - 4:22
    Bad Groupon.
  • 4:22 - 4:28
    The Cobra was getting fatter and fatter. We
    were
  • 4:28 - 4:29
    up to-
  • 4:29 - 4:34
    A.P.: Yeah. So. We were up to, we started,
  • 4:34 - 4:36
    we had, like, five to fifty devs. We started
  • 4:36 - 4:38
    with about three to five hundred commits per
    month.
  • 4:38 - 4:40
    Slowly, and in a couple of years, as you
  • 4:40 - 4:43
    can see, we were averaging about two thousand
    commits
  • 4:43 - 4:45
    in a single month. We had a lot of
  • 4:45 - 4:47
    developers developing a lot of things.
  • 4:47 - 4:50
    J.S.: This is all one cobra.
  • 4:50 - 4:54
    A.P.: And you know, we started thinking about
    SOA
  • 4:54 - 4:57
    at that point. It was already becoming really
    painful.
  • 4:57 - 5:01
    But we looked at the cobra, directly in the
  • 5:01 - 5:03
    eyes, and it scared the shit out of us.
  • 5:03 - 5:06
    J.S.: We had a lot of scoping problems. And
  • 5:06 - 5:10
    a lot of that had to do with model
  • 5:10 - 5:12
    coupling. So, one of the biggest things that
    was
  • 5:12 - 5:17
    keeping us from extracting services early
    was as the,
  • 5:17 - 5:19
    as the code grew, you had a lot of
  • 5:19 - 5:22
    sort of natural convention coupling that was
    happening in
  • 5:22 - 5:23
    the models.
  • 5:23 - 5:26
    So a little bit of an oversimplified example
    here.
  • 5:26 - 5:30
    But you have a, let's say you have, you're
  • 5:30 - 5:32
    on the My Groupons page. You want to look
    at
  • 5:32 - 5:34
    all of the Groupons that you've bought. And
    you
  • 5:34 - 5:36
    want to see all the titles for all of
  • 5:36 - 5:37
    those. So when we go to render the interface
  • 5:37 - 5:40
    we want to display all these deal titles.
    In
  • 5:40 - 5:42
    the cobra, you might find a set of dependent
  • 5:42 - 5:44
    relationships that are somewhat like this,
    where you can
  • 5:44 - 5:48
    see the cyclical dependencies.
  • 5:48 - 5:51
    But building these types of associations was
    fairly commonplace,
  • 5:51 - 5:56
    which was kind of bad in some ways.
  • 5:56 - 5:59
    So in this case, you would instantiate a user,
  • 5:59 - 6:01
    which would require a database lookup to the
    Users
  • 6:01 - 6:05
    table, select star, and, and you would map
    over
  • 6:05 - 6:08
    that, that user's orders to get all of the
  • 6:08 - 6:10
    deal titles.
  • 6:10 - 6:13
    In this, in this case, there is a Demeter
  • 6:13 - 6:17
    violation. Demeter violations are bad.
  • 6:17 - 6:20
    A.P.: And it looks clean. I mean, it looks
  • 6:20 - 6:23
    good. But, what it does is couples our components.
  • 6:23 - 6:26
    J.S.: Here is an example of what I was
  • 6:26 - 6:30
    talking about. You, you have a basically unnecessarily-
    unnecessary
  • 6:30 - 6:34
    table lookup to Users. Now, if you're designing
    your
  • 6:34 - 6:37
    applications well, you can avoid this right
    out of
  • 6:37 - 6:40
    the gate. But Rails conventions don't, don't
    encourage you
  • 6:40 - 6:42
    to avoid this right out of the gate. And
  • 6:42 - 6:46
    ActiveRecord DSL for, for advanced queries
    aren't something that
  • 6:46 - 6:49
    people just tend to do by default. Or at
  • 6:49 - 6:50
    least they didn't in 2009.
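The slide code is not captured in this transcript, so what follows is only a minimal sketch of the coupling being described, written in plain Ruby with Structs standing in for ActiveRecord models. The names `User`, `Order`, `Deal`, and `PurchasedDealTitles` are assumptions made for illustration, not Groupon's actual classes.

```ruby
# Plain-Ruby stand-ins; in the talk these would be ActiveRecord models.
Deal  = Struct.new(:id, :title)
Order = Struct.new(:id, :user_id, :deal)

User = Struct.new(:id, :orders) do
  # Coupled version: the caller must load a User first, then reach
  # through orders into deals (user -> orders -> deal -> title).
  def deal_titles
    orders.map { |order| order.deal.title }
  end
end

# Decoupled version (hypothetical): one object answers "which titles did
# this user buy?" from the orders data alone, keyed by user id, so no
# User record has to be loaded at all.
class PurchasedDealTitles
  def initialize(orders)
    @orders = orders
  end

  def for_user(user_id)
    @orders.select { |order| order.user_id == user_id }
           .map { |order| order.deal.title }
  end
end

spa    = Deal.new(1, "Spa Day")
pizza  = Deal.new(2, "Half-Price Pizza")
orders = [Order.new(10, 7, spa), Order.new(11, 7, pizza), Order.new(12, 8, spa)]

puts PurchasedDealTitles.new(orders).for_user(7).inspect
```

In a Rails application the second version would more likely be a single query that joins orders to deals and filters by user id, rather than an in-memory filter. The point of the sketch is only that the Users lookup is not needed to answer the question.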
  • 6:50 - 6:55
    A.P.: Yeah. And, I mean. Things got a lot
  • 6:55 - 6:59
    worse, because our code base and cobra was
    just
  • 6:59 - 7:02
    getting bigger and bigger. You can see here
    it's
  • 7:02 - 7:08
    almost two million lines of code at this point.
  • 7:08 - 7:09
    And, oh yeah, we have to stay up 100%
  • 7:09 - 7:14
    of the time. So that's a problem. All right.
  • 7:14 - 7:17
    J.S.: Also, the database is completely on
    fire.
  • 7:17 - 7:22
    A.P.: So yeah. We were in quite a pickle.
  • 7:22 - 7:29
    It was painful. Testing sucked. I mean, we
    had
  • 7:31 - 7:33
    to wait like forty-five minutes for a build
    to
  • 7:33 - 7:36
    run. You basically ran your tests and then
    figured
  • 7:36 - 7:38
    out something else to do, because you had
    to
  • 7:38 - 7:41
    wait while your tests ran. And a lot of
  • 7:41 - 7:44
    our release engineers devoted a lot of effort
    to
  • 7:44 - 7:46
    make those tests run faster.
  • 7:46 - 7:50
    J.S.: Deploys were terrible. Deploy, deploy
    process was somewhere
  • 7:50 - 7:52
    on the, on the scale of three hours to
  • 7:52 - 7:56
    deploy the, the application. Just a really
    bad development
  • 7:56 - 7:58
    experience, especially as you start to have
    teams that,
  • 7:58 - 8:02
    that split, split ownership. They want to
    iterate on
  • 8:02 - 8:04
    features that matter to their team, and they
    don't
  • 8:04 - 8:07
    want to be held up by this gigantic monolithic
  • 8:07 - 8:07
    application.
  • 8:07 - 8:10
    And, and it's, you know, the, the deploy's
    only
  • 8:10 - 8:12
    happening once a week. That really hurts the
    ability
  • 8:12 - 8:14
    of a team that maybe wants to do continuous
  • 8:14 - 8:16
    deployment. So, it sucked.
  • 8:16 - 8:21
    A.P.: Yeah. I mean, and development pace was
    increasing,
  • 8:21 - 8:23
    as you saw, and, I mean, what's the best
  • 8:23 - 8:25
    place to put the next line of code, as
  • 8:25 - 8:27
    I heard in a talk earlier. It's the place
  • 8:27 - 8:30
    that you're changing. Models got bloated, and
    there's a
  • 8:30 - 8:31
    lot of cruft.
  • 8:31 - 8:33
    J.S.: So all of these things were terrible.
    It
  • 8:33 - 8:38
    was very painful. So, we decided to move towards
  • 8:38 - 8:41
    service extraction a little bit more seriously.
  • 8:41 - 8:45
    If there's a big take away from this first
  • 8:45 - 8:47
    section, we just want you to remember that
    cobras
  • 8:47 - 8:54
    are great. They are great. Until they aren't.
  • 8:55 - 8:58
    A.P.: So we needed to alleviate this pain
    immediately.
  • 8:58 - 9:00
    We needed to get that code out of there.
  • 9:00 - 9:04
    We needed a quick extraction. So we decided
    to
  • 9:04 - 9:07
    extract a new service and build it on top
  • 9:07 - 9:10
    of our current schema. We decided to start
    with
  • 9:10 - 9:14
    the order service, because. I mean. It was
    causing
  • 9:14 - 9:17
    a lot of database contention. We had a lot
  • 9:17 - 9:19
    of people buying a lot of Groupons, and, a
  • 9:19 - 9:21
    good problem to have, but it was bringing
    our
  • 9:21 - 9:21
    database down.
  • 9:21 - 9:23
    So we needed to get that code out of
  • 9:23 - 9:28
    it, and also another thing behind the, behind
    choosing
  • 9:28 - 9:30
    orders to start is that, you know, it's gonna
  • 9:30 - 9:32
    be a long-lived model, a long living model
    in
  • 9:32 - 9:35
    our domain. We know that for sure.
  • 9:35 - 9:38
    So, to illustrate, this is what it looks like
  • 9:38 - 9:41
    in the beginning. And this is what we're trying
  • 9:41 - 9:44
    to accomplish. You have an orders, you have
    the
  • 9:44 - 9:46
    cobra, and then we're trying to have a separate
  • 9:46 - 9:49
    orders codebase, which will have its own database.
    But
  • 9:49 - 9:53
    it continues to have re- a read-only access
    to
  • 9:53 - 9:56
    the cobra's database, because we didn't focus
    on completely
  • 9:56 - 10:01
    making the cobra, the order service, re, stopping,
    stopping
  • 10:01 - 10:04
    it from reaching back into the cobra's database.
  • 10:04 - 10:08
    And, I mean, the cobra was really sneaky.
    It
  • 10:08 - 10:11
    was really tough to find all the ways that,
  • 10:11 - 10:14
    with Rails callbacks and model associations,
    all the ways
  • 10:14 - 10:19
    that the components were coupled.
  • 10:19 - 10:22
    So we built some tools to make that easier.
  • 10:22 - 10:23
    This is one of them. The service wall, as we
  • 10:23 - 10:25
    call it. We're trying to, the main goal here
  • 10:25 - 10:30
    is separating the concerns of orders within
    the application.
  • 10:30 - 10:33
    So, you start with having your services in
    a
  • 10:33 - 10:38
    separate directory. Let's see a closer look
    of it.
  • 10:38 - 10:40
    You have the order service in its own directory,
  • 10:40 - 10:43
    and you have its own app, its own lib,
  • 10:43 - 10:45
    its own specs. The way that works is that
  • 10:45 - 10:48
    in the environment.rb file, we iterated through
    these
  • 10:48 - 10:50
    services and added them to the load path.
    So
  • 10:50 - 10:53
    to the application, it looks like it's just
    it's just
  • 10:53 - 10:57
    one big application, but for our purposes,
    the code
  • 10:57 - 10:59
    was separate.
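The directory layout and load-path trick described here can be sketched roughly as follows. This is an assumption-laden illustration, not the code from the slide: the `services/<name>/{app,lib}` layout and the helper names `service_load_paths` and `add_services_to_load_path` are hypothetical.

```ruby
require "tmpdir"
require "fileutils"

# Returns the app/ and lib/ directories of every service under
# <root>/services, in a stable order. Spec directories are deliberately
# left out because they are not needed at runtime.
def service_load_paths(root, subdirs = %w[app lib])
  Dir.glob(File.join(root, "services", "*")).sort.flat_map do |service_dir|
    subdirs.map { |sub| File.join(service_dir, sub) }
           .select { |path| File.directory?(path) }
  end
end

# What an environment.rb might do at boot: put each service's code on the
# load path so the whole thing still loads as one application.
def add_services_to_load_path(root)
  service_load_paths(root).each do |path|
    $LOAD_PATH.unshift(path) unless $LOAD_PATH.include?(path)
  end
end
```

The code still boots as a single application, but each service's files live under their own directory, which makes the later physical split a matter of moving a folder.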
  • 10:59 - 11:03
    So, this is like, a small example of how
  • 11:03 - 11:06
    service wall works. You have this disable
    model access
  • 11:06 - 11:12
    method that basically, if, if you specify
    the models
  • 11:12 - 11:15
    that you want to, if you specify the service
  • 11:15 - 11:19
    that you want to disable or deprecate, and
    it'll
  • 11:19 - 11:23
    figure out the models of that service and
    add
  • 11:23 - 11:29
    it to this do-not-touch list. And basically
    raise these
  • 11:29 - 11:31
    kinds of violations. So if you use the disable
  • 11:31 - 11:34
    model access method, when you run your tests,
    it
  • 11:34 - 11:37
    will put up this message saying, you don't
    have
  • 11:37 - 11:39
    access to this method.
  • 11:39 - 11:41
    When a deal is trying to access an order,
  • 11:41 - 11:43
    we can figure that out just by running our
  • 11:43 - 11:47
    tests. If you use the friendlier deprecate
  • 11:47 - 11:50
    model access method, then you
    can be
  • 11:50 - 11:53
    more permissive and it'll just log it to a
  • 11:53 - 11:55
    file. You can see that in development mode
    or
  • 11:55 - 11:58
    you can have it on staging, and that'll basically,
  • 11:58 - 11:59
    that'll allow you to find all the places where
  • 11:59 - 12:03
    you're having service infractions.
  • 12:03 - 12:05
    You can't do this in production though, because
    it
  • 12:05 - 12:09
    causes a serious produ- performance hit.
  • 12:09 - 12:14
    Oh yeah. So this is how, so this is
  • 12:14 - 12:18
    how you actually use the service wall. Use,
    you,
  • 12:18 - 12:22
    at the top of your controller, you disable,
    use
  • 12:22 - 12:25
    the method disable_model_access or deprecate_model_access,
    depending on what you
  • 12:25 - 12:27
    want to do. You tell it what service, and
  • 12:27 - 12:30
    it even lets you exempt some actions that
    you
  • 12:30 - 12:31
    don't want to raise violations on yet.
  • 12:31 - 12:36
    That way you can comment out that action and
  • 12:36 - 12:38
    tackle one action at a time. Which endpoints
    are
  • 12:38 - 12:41
    actually reaching over and causing the service
    wall infraction.
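The service wall itself is an internal Groupon tool and its code is not shown in the transcript. The sketch below only illustrates the idea in plain Ruby: `disable_model_access` and `deprecate_model_access` are the method names mentioned in the talk, while the `except:` option, the `SERVICE_MODELS` mapping, the `Rule` class, and `check_model_access` are assumptions invented for this example.

```ruby
module ServiceWall
  class Violation < StandardError; end

  # Hypothetical mapping from a service name to the models it owns.
  SERVICE_MODELS = { orders: %w[Order OrderItem] }.freeze

  # One rule: which models are off limits, how strictly, and for which actions.
  class Rule
    def initialize(service, strict:, except:)
      @models = SERVICE_MODELS.fetch(service)
      @strict = strict
      @except = except.map(&:to_sym)
    end

    # Returns nil when access is allowed. Otherwise returns a warning
    # message (deprecate mode) or raises Violation (disable mode).
    def check(model_name, action)
      return nil unless @models.include?(model_name)
      return nil if @except.include?(action.to_sym)

      message = "#{action} accessed #{model_name} across the service wall"
      raise Violation, message if @strict
      message
    end
  end

  def disable_model_access(service, except: [])
    service_wall_rules << Rule.new(service, strict: true, except: except)
  end

  def deprecate_model_access(service, except: [])
    service_wall_rules << Rule.new(service, strict: false, except: except)
  end

  def service_wall_rules
    @service_wall_rules ||= []
  end

  # Collects warnings from every rule; a strict rule raises instead.
  def check_model_access(model_name, action)
    service_wall_rules.map { |rule| rule.check(model_name, action) }.compact
  end
end

# Declared at the top of a controller, as described in the talk.
class DealsController
  extend ServiceWall
  disable_model_access :orders, except: [:legacy_export]
end

class ReportsController
  extend ServiceWall
  deprecate_model_access :orders
end
```

In the real tool the check would presumably be triggered when a model is loaded during a request. Here it is an explicit call so that the sketch runs without Rails.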
  • 12:41 - 12:46
    J.S.: So, in addition to the service wall,
    one,
  • 12:46 - 12:48
    one other problem with this approach, this
    extraction approach
  • 12:48 - 12:52
    is that, because you necessarily fork the
    code, you
  • 12:52 - 12:54
    get a lot of cruft left over from the
  • 12:54 - 13:00
    old, the old domain. So you find yourself
    asking,
  • 13:00 - 13:02
    teams find themselves asking, very often,
    is this endpoint
  • 13:02 - 13:04
    even used? Do we even care about this code
  • 13:04 - 13:05
    anymore?
  • 13:05 - 13:10
    So, a small team of Groupon developers hacked
    together
  • 13:10 - 13:13
    something called Route 66 that we use internally
    to
  • 13:13 - 13:17
    track down cruft in both our old cobra and
  • 13:17 - 13:21
    our new cobra. So it basically answers the
    question,
  • 13:21 - 13:23
    are these endpoints used? I don't know if
    you
  • 13:23 - 13:24
    can see this very well, but this is a
  • 13:24 - 13:25
    little bit of a UI.
  • 13:25 - 13:26
    A.P.: Yeah.
  • 13:26 - 13:30
    J.S.: But what we do is, we analyze log
  • 13:30 - 13:34
    files, we analyze, spelunk logs to come up
    with
  • 13:34 - 13:37
    which controller actions are being hit, what's
    the frequency.
  • 13:37 - 13:39
    Is this a route that is hit once a
  • 13:39 - 13:42
    week, you know. Once a, once a month? And
  • 13:42 - 13:45
    we can very aggressively decruft using this
    tool as
  • 13:45 - 13:47
    well.
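Route 66 is internal to Groupon and the talk does not describe its implementation. As a rough sketch of the log-analysis idea only, the following counts hits per controller action from Rails-style request log lines and reports known actions that never appear. It assumes log lines containing `Processing by Controller#action` (or the older `Processing Controller#action` form); the function names and sample data are hypothetical.

```ruby
# Matches both "Processing by OrdersController#show as HTML" and the
# older "Processing OrdersController#show (for ...)" style of log line.
PROCESSING_LINE = /Processing (?:by )?(?<controller>[\w:]+)#(?<action>\w+)/

# Returns a hash such as { "OrdersController#show" => 2 }.
def count_action_hits(log_lines)
  log_lines.each_with_object(Hash.new(0)) do |line, hits|
    match = PROCESSING_LINE.match(line)
    hits["#{match[:controller]}##{match[:action]}"] += 1 if match
  end
end

# Known actions that never showed up in the logs are decruft candidates.
def unused_actions(known_actions, hits)
  known_actions.reject { |action| hits.key?(action) }
end

sample_log = [
  "Processing by OrdersController#show as HTML",
  "Completed 200 OK in 12ms",
  "Processing by OrdersController#show as JSON",
  "Processing DealsController#index (for 127.0.0.1) [GET]"
]
known = %w[OrdersController#show OrdersController#refund DealsController#index]

hits = count_action_hits(sample_log)
puts hits.inspect
puts unused_actions(known, hits).inspect
```

An action that is absent from a short log window is only a candidate for removal. A route hit once a month needs a correspondingly long window before it can be judged unused.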
  • 13:47 - 13:53
    A.P.: All right. So there's definitely pros
    to this
  • 13:53 - 13:57
    approach. Because you're focusing on just
    separating the models,
  • 13:57 - 14:00
    I mean, just separating the code, you can
    move quickly
  • 14:00 - 14:03
    and not worry about spinning up a separate
    database
  • 14:03 - 14:05
    schema, separate naming, all of that. You
    just worry
  • 14:05 - 14:08
    about separating the code, and that focuses
    the abstraction.
  • 14:08 - 14:12
    It makes it easier to spin up endpoints. But
  • 14:12 - 14:13
    the cons are, you're still tied
  • 14:13 - 14:16
    to that legacy database. Not such a bad thing
  • 14:16 - 14:17
    if you really need to get it out of
  • 14:17 - 14:21
    there. But, because you're forking this code
    now, and
  • 14:21 - 14:23
    now it's being hit through endpoints, there
    is still
  • 14:23 - 14:26
    a lot of cruft in the, in the, in
  • 14:26 - 14:28
    the code base. Because a lot of these endpoints
  • 14:28 - 14:30
    are now not being used.
  • 14:30 - 14:32
    J.S.: So this was the first extraction pattern
    that
  • 14:32 - 14:34
    we used at Groupon to get out of the
  • 14:34 - 14:39
    original cobra, the original Groupon cobra.
    But teams sort
  • 14:39 - 14:41
    of own their own tactics, and there are other
  • 14:41 - 14:44
    ways that they're doing it as well. One way
  • 14:44 - 14:47
    that, one way that service extraction is also
    happening
  • 14:47 - 14:50
    is by using greenfield services that use a
    message
  • 14:50 - 14:54
    bus. Sometimes you just need to keep that
    legacy
  • 14:54 - 14:56
    API running, because there are a lot of client
  • 14:56 - 14:58
    dependencies on it. There's a lot of dependencies
    on
  • 14:58 - 15:01
    the structure of the data.
  • 15:01 - 15:03
    But who likes doing greenfield work in here?
    Raise
  • 15:03 - 15:06
    your hand if you like greenfield work. Right.
    That
  • 15:06 - 15:10
    should be all of you. Whatever.
  • 15:10 - 15:13
    So, it is possible to do greenfield service
    extraction,
  • 15:13 - 15:17
    and we're doing this as well. So, again, we
  • 15:17 - 15:22
    have a similar. Whoops. Juggling between PowerPoint
    and
  • 15:22 - 15:27
    preview. Similar type of situation. You have
    this cobra,
  • 15:27 - 15:30
    and then we get to the scenario that we're,
  • 15:30 - 15:32
    we're trying to reach with the greenfield
    extraction, where
  • 15:32 - 15:35
    you have, in this case the red, the red
  • 15:35 - 15:38
    box represents all new code. There's a gem,
    a
  • 15:38 - 15:41
    client gem that interact, that runs in the
    original
  • 15:41 - 15:43
    cobra, that runs in the green cobra. And when
  • 15:43 - 15:46
    this service writes data to its db, a message
  • 15:46 - 15:50
    is sent that the green cobra consumes and
    sends
  • 15:50 - 15:53
    over to its own data store, thus satisfying
    all
  • 15:53 - 15:57
    of the legacy API requirements.
  • 15:57 - 15:58
    And then what's notable about this is to keep
  • 15:58 - 16:03
    everything in sync for service cut-overs,
    rollouts, et cetera,
  • 16:03 - 16:06
    there is a background sync worker that runs,
    that
  • 16:06 - 16:09
    syncs it one way from the old database to
  • 16:09 - 16:13
    the new database.
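The flow described here (the new service writes to its own store and publishes a message, the legacy side consumes it, and a background worker syncs old data one way into the new store) can be sketched as below. This is an in-memory illustration under stated assumptions: `MessageBus`, `GreenfieldService`, `LegacyConsumer`, `backfill`, and the `:record_written` topic are all hypothetical names, and a real system would use an actual message broker and databases.

```ruby
# Tiny in-process stand-in for a real message bus.
class MessageBus
  def initialize
    @handlers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &handler)
    @handlers[topic] << handler
  end

  def publish(topic, message)
    @handlers[topic].each { |handler| handler.call(message) }
  end
end

# The new service owns its own store and announces every write.
class GreenfieldService
  attr_reader :store

  def initialize(bus)
    @bus = bus
    @store = {}
  end

  def write(id, attrs)
    @store[id] = attrs
    @bus.publish(:record_written, { id: id, attrs: attrs })
  end
end

# Runs on the legacy side: applies each message to the legacy store so
# clients of the legacy API keep seeing current data.
class LegacyConsumer
  def initialize(bus, legacy_store)
    bus.subscribe(:record_written) do |message|
      legacy_store[message[:id]] = message[:attrs]
    end
  end
end

# One-way background sync from the old store to the new one. Records the
# new service already has are left alone (an assumption of this sketch).
def backfill(legacy_store, new_store)
  legacy_store.each { |id, attrs| new_store[id] ||= attrs }
end
```

The "do not overwrite what the new store already has" rule is one simple way to keep the sync one-directional. It does not address the validation and race-condition concerns that real data needs.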
  • 16:13 - 16:16
    There are pros and cons to this approach as
  • 16:16 - 16:19
    well. Some of the better parts are that you
  • 16:19 - 16:22
    can get rid of your legacy data quickly, again.
  • 16:22 - 16:24
    Devs like greenfield stuff. You like to design
    your
  • 16:24 - 16:29
    own systems. You also get to minimize the
    cut-over
  • 16:29 - 16:32
    risk with your data sync. So you're not splitting
  • 16:32 - 16:34
    the table and you have to have all of
  • 16:34 - 16:38
    these API dependencies written on one hand
    so that
  • 16:38 - 16:42
    when you break your database you don't have,
    you
  • 16:42 - 16:43
    don't have failures.
  • 16:43 - 16:45
    So you can phase the, you can phase out
  • 16:45 - 16:48
    your new, your new endpoints, and you can
    own
  • 16:48 - 16:51
    the timing of when you build out new endpoint
  • 16:51 - 16:54
    features. Again. Some of the, or some of the
  • 16:54 - 16:56
    cons are that, it is not trivial to build
  • 16:56 - 17:00
    a synchronization worker, and it is less trivial
    to build
  • 17:00 - 17:04
    a validation engine for the data to make sure
  • 17:04 - 17:05
    that you don't get it out of sync when
  • 17:05 - 17:07
    you're pulling from the original source. And
    then there
  • 17:07 - 17:12
    are race conditions involved in this as well.
  • 17:12 - 17:15
    A.P.: So Jason and I work on a team
  • 17:15 - 17:19
    that manages inventory, as I said earlier.
    One of
  • 17:19 - 17:22
    the, looking a little further down the road,
    one
  • 17:22 - 17:24
    of the things we needed to do was get,
  • 17:24 - 17:26
    now we needed to get vouchers out of the
  • 17:26 - 17:30
    orders service. Another service extraction.
    And vouchers are actually
  • 17:30 - 17:33
    the things that customers redeem.
  • 17:33 - 17:38
    So, a simplified example of what a voucher
    actually
  • 17:38 - 17:41
    like would look like, except that now we have
  • 17:41 - 17:45
    an id, which is stored in our database. We
  • 17:45 - 17:47
    have the price, which is stored in a legacy
  • 17:47 - 17:51
    database, and now, Groupon's grown since orders.
    We now
  • 17:51 - 17:55
    have an international platform codebase that
    serves many different
  • 17:55 - 18:00
    countries. We have offices in Berlin, London,
    Chennai, Korea,
  • 18:00 - 18:03
    and many more places. But yeah. Now we've
    got
  • 18:03 - 18:06
    to make it, but our service's responsibility
    is to
  • 18:06 - 18:07
    make it seem like none of that matters. Anyone
  • 18:07 - 18:10
    asking for voucher data needs to know about
    all
  • 18:10 - 18:11
    voucher data.
  • 18:11 - 18:13
    Our services need to be global as well. So,
  • 18:13 - 18:17
    this is what our world looks like. And this
  • 18:17 - 18:18
    is how our service needs to be built on
  • 18:18 - 18:24
    top of that. What helped, in managing these
    different
  • 18:24 - 18:27
    sources of truth, was this manager accessor
    pattern in
  • 18:27 - 18:32
    our code base. Specifically, oh. Let me check
    if
  • 18:32 - 18:37
    I need to- yeah. Specifically, next slide
    please, this
  • 18:37 - 18:39
    is what, this is how it helped our code
  • 18:39 - 18:41
    base. Because in the controller, you could
    just specify,
  • 18:41 - 18:44
    you could talk, talk to this manager object,
    and
  • 18:44 - 18:46
    you'd say, find me this voucher.
  • 18:46 - 18:49
    And the manager, can you jump to that? All
  • 18:49 - 18:50
    right, it's gonna look like a lot of code,
  • 18:50 - 18:53
    but let's go step-by-step. In the manager,
    that's where
  • 18:53 - 18:56
    all the complexity lies. You have the accessor
    that
  • 18:56 - 18:58
    accesses local data. You have an accessor,
    a separate
  • 18:58 - 19:01
    accessor - and accessors are just simply,
    all they
  • 19:01 - 19:06
    do is persistence and finding, and finding
    data -
  • 19:06 - 19:09
    so the accessors for the legacy database here,
    the
  • 19:09 - 19:12
    cobra accessor, you get that price information,
    and then
  • 19:12 - 19:16
    you have an international accessor that goes,
    it could
  • 19:16 - 19:19
    be a database call or, in our case, that's
  • 19:19 - 19:23
    a HTTP call across the ocean.
  • 19:23 - 19:25
    And then you bring all that together, wrap
    it
  • 19:25 - 19:27
    in a model and have it return that back
  • 19:27 - 19:31
    to your controller. Hang on.
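The manager code from the slide is not in the transcript, so this is a minimal sketch of the manager/accessor idea only. It assumes three data sources held in plain hashes, and the way they are combined (local first, price from the legacy store, international as a fallback) is an assumption of this example rather than something the talk specifies. All class names are hypothetical.

```ruby
Voucher = Struct.new(:id, :price, :country)

# Accessors only find (or persist) data; each knows about one source.
class LocalAccessor
  def initialize(rows)
    @rows = rows # e.g. { 1 => { country: "US" } }
  end

  def find(id)
    @rows[id]
  end
end

# Knows the legacy (cobra) schema, where the price still lives.
class CobraAccessor
  def initialize(prices)
    @prices = prices
  end

  def price_for(id)
    @prices[id]
  end
end

# Stands in for an HTTP call to the international platform.
class InternationalAccessor
  def initialize(remote_rows)
    @remote_rows = remote_rows
  end

  def find(id)
    @remote_rows[id]
  end
end

# The manager hides where each piece of data comes from and hands the
# controller a single model object, or nil when nothing is found.
class VoucherManager
  def initialize(local:, cobra:, international:)
    @local = local
    @cobra = cobra
    @international = international
  end

  def find(id)
    if (row = @local.find(id))
      Voucher.new(id, @cobra.price_for(id), row[:country])
    elsif (remote = @international.find(id))
      Voucher.new(id, remote[:price], remote[:country])
    end
  end
end
```

A controller would then only call something like `manager.find(params[:id])` and never learn which store answered.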
  • 19:31 - 19:35
    All right. So, definitely pros and cons to
    this
  • 19:35 - 19:37
    approach. One of the things was, it's easy
    to
  • 19:37 - 19:40
    incorporate many different data sources. We
    call that a
  • 19:40 - 19:43
    facade because it kind of hides all of that.
  • 19:43 - 19:46
    But the, behind the backend of it is really
  • 19:46 - 19:47
    more complex.
  • 19:47 - 19:52
    And, but you hide that complexity. That your
    accessors
  • 19:52 - 19:54
    are bound to the schema changes. So, our cobra
  • 19:54 - 19:57
    accessor still has to know about the legacy
    schema.
  • 19:57 - 20:00
    And you're, you, you can't really, making
    changes there
  • 20:00 - 20:02
    is not trivial.
  • 20:02 - 20:05
    And, sometimes you can use that as a crutch.
  • 20:05 - 20:07
    So if someone asks you, can you give me
  • 20:07 - 20:09
    this piece of data about a voucher, I really
  • 20:09 - 20:11
    need it, and you want to expose it to
  • 20:11 - 20:13
    the endpoints, you're like, well, I do have
    access
  • 20:13 - 20:15
    to the database or I could just make a
  • 20:15 - 20:17
    call. And now you, now you're serving the
    end-
  • 20:17 - 20:20
    that data, and you're tied to serving that
    data
  • 20:20 - 20:21
    in your API.
  • 20:21 - 20:24
    But the important thing there is to be diligent,
  • 20:24 - 20:26
    and as soon as you start serving that,
  • 20:26 - 20:31
    put a strategy together to actually own that
    data.
  • 20:31 - 20:34
    Otherwise you're, the complexity in the manager,
    which is
  • 20:34 - 20:37
    both a pro and a con, will always be
  • 20:37 - 20:40
    there. The purpose of the manager is that
    it
  • 20:40 - 20:43
    hides that complexity, but as you start owning
    more
  • 20:43 - 20:46
    data, it should become simpler.
  • 20:46 - 20:50
    J.S.: So, these, these three extraction patterns
    that we've
  • 20:50 - 20:55
    gone through are just a little bit of, a
  • 20:55 - 20:57
    little bit of what's going on. There are different
  • 20:57 - 21:01
    service extraction patterns going on, both
    at Groupon and
  • 21:01 - 21:06
    probably in your worlds too. So, again, this
    is
  • 21:06 - 21:08
    just a example of some of the ways that
  • 21:08 - 21:11
    we've chosen to do things. There are other
    interesting
  • 21:11 - 21:13
    talks about this this week at RailsConf going
    on,
  • 21:13 - 21:16
    so be, it'd be neat to check those out,
  • 21:16 - 21:17
    too, if you want to talk to us about
  • 21:17 - 21:18
    them.
  • 21:18 - 21:21
    But, you should definitely consider letting
    your teams own
  • 21:21 - 21:23
    their tactics if you're trying to make decisions
    about
  • 21:23 - 21:27
    doing SOA, because you might find some neat
    things
  • 21:27 - 21:28
    that you didn't know about.
  • 21:28 - 21:30
    A.P.: Yeah. So I'm gonna stand over here cause
  • 21:30 - 21:33
    I feel like I'm just talking to these guys.
  • 21:33 - 21:35
    But yeah. So, there's definitely a lot of
    things
  • 21:35 - 21:38
    that we learned from doing these different
    service extractions.
  • 21:38 - 21:39
    Like Jason said, there are a lot of other
  • 21:39 - 21:43
    service extractions that happened at Groupon
    and continue to
  • 21:43 - 21:45
    happen today.
  • 21:45 - 21:49
    But, taming a cobra is serious business. I
    mean,
  • 21:49 - 21:52
    like I always say, YPAGNIRN. You probably
    ain't gonna
  • 21:52 - 21:57
    need it right now. But, but the, but, like,
  • 21:57 - 21:59
    the tipping point on which you need to start
  • 21:59 - 22:04
    going towards service-oriented architecture
    isn't just black or white.
  • 22:04 - 22:07
    It's, it's more of an art than a science.
  • 22:07 - 22:08
    But as soon as you start talking about service-oriented
  • 22:08 - 22:11
    architecture, once you start feeling the pains,
    you need
  • 22:11 - 22:14
    to put, put together a strategy to accomplish
    that.
  • 22:14 - 22:15
    J.S.: Yeah. You don't want to sit around and
  • 22:15 - 22:17
    wait for Oprah to blow your site up.
  • 22:17 - 22:21
    A.P.: But there's also the importance of allowing
    your
  • 22:21 - 22:25
    domain to actually evolve. Models that you
    think are
  • 22:25 - 22:27
    important in the beginning aren't gonna be
    important later
  • 22:27 - 22:31
    on. And it, that's the big benefit of a
  • 22:31 - 22:34
    cobra, is that it allows you to iterate quickly.
  • 22:34 - 22:36
    J.S.: Something else that we have also learned
    is
  • 22:36 - 22:38
    that when you go into service extraction,
    it's really
  • 22:38 - 22:42
    important that you actually have a strategy.
    Know what
  • 22:42 - 22:45
    you need to break apart. Know what you need
  • 22:45 - 22:48
    to leave in the monolith. These are important
    things
  • 22:48 - 22:51
    to consider. Know what the priorities are
    between those
  • 22:51 - 22:54
    things. It's very, it's very tricky to just
    go
  • 22:54 - 22:58
    about service extraction very scattershot
    and not really understanding
  • 22:58 - 23:01
    your business model or what benefits you derive
    from
  • 23:01 - 23:04
    extracting certain pieces over others.
  • 23:04 - 23:05
    You should prefer the things that are clearly
    like
  • 23:05 - 23:09
    their own thing, their own components, or
    things that
  • 23:09 - 23:13
    are particular maintenance problems or represent
    some sort of
  • 23:13 - 23:17
    legacy design or, or strange behavior. But
    the other
  • 23:17 - 23:20
    important part of having a strategy is that
    you
  • 23:20 - 23:24
    should expect the unexpected. Scope creep
    will bite you,
  • 23:24 - 23:26
    and you know, as these, as these code bases
  • 23:26 - 23:29
    get bigger, pulling out of them becomes a
    lot
  • 23:29 - 23:34
    more of a tricky process than you might envision.
  • 23:34 - 23:36
    Another thing that's important is that you,
    you think
  • 23:36 - 23:39
    about your entire service stack. And you should
    know
  • 23:39 - 23:42
    your business, and so you should know, or
    you
  • 23:42 - 23:45
    should at least conceptualize how all of those
    parts
  • 23:45 - 23:47
    of your business are gonna fit together.
  • 23:47 - 23:49
    How does the data flow between them? What
    are
  • 23:49 - 23:53
    the service agreements between those, those
    compartments? That's all
  • 23:53 - 23:55
    important to know. You're gonna need to be
    caching
  • 23:55 - 23:59
    between services for, for load. You're gonna
    need to
  • 23:59 - 24:06
    be caching services for, for latency requirements.
    So you
  • 24:06 - 24:08
    have to serve upstream to some kind of complex
  • 24:08 - 24:11
    algorithm. That algorithm is gonna need zero
    latency return
  • 24:11 - 24:12
    from your service.
  • 24:12 - 24:13
    You need to be thinking about all of these
  • 24:13 - 24:17
    kinds of things when you're doing service
    extraction.
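
A minimal Ruby sketch of the caching point above: a small time-to-live cache in front of a call to another service, so repeated reads do not add load or latency. The class name `CachedClient`, the `ttl_seconds` and `clock` parameters, and the fake deal lookup are all assumptions for illustration, not anything shown in the talk.

```ruby
# Hypothetical wrapper: caches results of a slow cross-service call for a
# fixed number of seconds. The block passed to `new` stands in for the real
# network request.
class CachedClient
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &fetcher)
    @ttl = ttl_seconds
    @clock = clock       # injectable so the expiry logic can be exercised without sleeping
    @fetcher = fetcher
    @entries = {}
  end

  def get(key)
    now = @clock.call
    entry = @entries[key]
    # Serve from the cache while the entry is younger than the TTL.
    return entry[:value] if entry && now - entry[:stored_at] < @ttl

    value = @fetcher.call(key)
    @entries[key] = { value: value, stored_at: now }
    value
  end
end

upstream_calls = 0
fake_time = 0.0
client = CachedClient.new(ttl_seconds: 30, clock: -> { fake_time }) do |deal_id|
  upstream_calls += 1
  { "deal_id" => deal_id, "remaining" => 5 }
end

client.get("deal-1") # goes to the upstream service
client.get("deal-1") # served from the cache
fake_time = 31.0
client.get("deal-1") # entry expired, goes upstream again
puts upstream_calls
```

The trade-off in a sketch like this is staleness: a longer TTL protects the upstream service more but lets callers see older data, which is one reason to agree on it as part of the service agreement.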
  • 24:17 - 24:20
    A.P.: And the way Jason's saying it is, is
  • 24:20 - 24:23
    definitely makes it seem like, oh, it's one
    slide
  • 24:23 - 24:25
    on our deck. But each of those topics could
  • 24:25 - 24:29
    be a separate talk. And they are. So, definitely,
  • 24:29 - 24:30
    there's a lot to learn in that, in that
  • 24:30 - 24:31
    domain.
  • 24:31 - 24:35
    J.S.: Right. Just in terms of actual topics
    in
  • 24:35 - 24:37
    it, another thing you want to think about
    is
  • 24:37 - 24:40
    messaging. Inter-service messaging, when you're
    pulling these services apart,
  • 24:40 - 24:42
    they do need to talk to each other. You
  • 24:42 - 24:45
    should definitely think about what do those
    messages look
  • 24:45 - 24:50
    like. What are their delivery SLAs? Do you
    guarantee
  • 24:50 - 24:52
    that they're delivered? Do you guarantee the
    order that
  • 24:52 - 24:55
    they're delivered in? What do the payloads look like?
    look like?
  • 24:55 - 24:58
    Think about all of this stuff.
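
One way to make the messaging questions above concrete is a rough Ruby sketch. It assumes at-least-once delivery with no ordering guarantee, so the consumer has to tolerate duplicates and late arrivals. The envelope fields (`id`, `type`, `sequence`, `payload`) and the `InventoryConsumer` class are hypothetical, not a format described in the talk.

```ruby
require "json"
require "securerandom"

# Hypothetical message envelope: a unique id for de-duplication, a sequence
# number for ordering, and the payload itself.
def build_message(type, sequence, payload)
  JSON.generate(
    "id" => SecureRandom.uuid,
    "type" => type,
    "sequence" => sequence,
    "payload" => payload
  )
end

# Consumer that assumes messages may arrive twice or out of order.
class InventoryConsumer
  attr_reader :applied

  def initialize
    @seen_ids = {}
    @last_sequence = 0
    @applied = []
  end

  # Returns :applied, :duplicate, or :stale.
  def handle(raw)
    message = JSON.parse(raw)
    return :duplicate if @seen_ids.key?(message["id"])

    @seen_ids[message["id"]] = true
    # Anything at or below the newest sequence already applied is dropped.
    return :stale if message["sequence"] <= @last_sequence

    @last_sequence = message["sequence"]
    @applied << message["payload"]
    :applied
  end
end

consumer = InventoryConsumer.new
first = build_message("inventory.updated", 1, { "sku" => "abc", "quantity" => 10 })
second = build_message("inventory.updated", 2, { "sku" => "abc", "quantity" => 9 })

# Deliver out of order, with one duplicate.
p [second, second, first].map { |raw| consumer.handle(raw) }
```

Dropping stale messages only makes sense when each message carries the full current state; if messages were deltas, the consumer would need to buffer and reorder instead.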
  • 24:58 - 25:02
    And, you also need to consider your, concern
    yourself
  • 25:02 - 25:06
    with authentication and authorization. These
    are, these are important
  • 25:06 - 25:08
    topics. I think like, there was a talk about
  • 25:08 - 25:09
    this yesterday-
  • 25:09 - 25:10
    A.P.: There were two.
  • 25:10 - 25:12
    J.S.: Oh, there were two talks about this
    yesterday.
  • 25:12 - 25:14
    But you should know what your, know what
    your
  • 25:14 - 25:17
    users are doing. Your site's getting bigger.
    Your users
  • 25:17 - 25:20
    are getting more complicated. Know, know what
    they need
  • 25:20 - 25:22
    access to. Know how they get into your, how
  • 25:22 - 25:23
    they get into your services, how they get
    through
  • 25:23 - 25:26
    your services. And know what they can do at
  • 25:26 - 25:29
    each step of the way.
  • 25:29 - 25:32
    A.P.: And you need to create like a supportive,
  • 25:32 - 25:36
    supporting environment for services. We were
    lucky, we had
  • 25:36 - 25:40
    entire teams devoted to building tools, to,
    that make
  • 25:40 - 25:43
    it easier to spin up services easily. And
    a
  • 25:43 - 25:48
    release engineering team that made it easier
    to re,
  • 25:48 - 25:52
    deploy these services. All those became really
    easy for
  • 25:52 - 25:55
    us, but if, in your company, you need to
  • 25:55 - 25:57
    make sure that, or in your application, you
    need
  • 25:57 - 25:58
    to make sure that you think about these things
  • 25:58 - 26:02
    and devote tools and time to making those
    things
  • 26:02 - 26:02
    simpler.
  • 26:02 - 26:06
    Also, now is the time to start considering
    UUIDs.
  • 26:06 - 26:10
    As soon as you start talking about service-oriented
    architecture,
  • 26:10 - 26:15
    go to UUIDs from the start. This will immediately
  • 26:15 - 26:18
    separate you from your database, and that's
    gonna be
  • 26:18 - 26:20
    really important, because you're gonna be
    moving data from
  • 26:20 - 26:23
    one source to another.
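
A small Ruby sketch of why UUIDs help when data moves between stores: the identifier is generated by the application rather than by a database sequence, so rows copied from one store into another do not collide with rows that already exist there. With auto-increment keys, both stores would typically have handed out id 1. The `Store` class and `copy_all` helper are hypothetical in-memory stand-ins for two separate databases.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a database table keyed by id.
class Store
  def initialize
    @rows = {}
  end

  def insert(id, attrs)
    raise ArgumentError, "duplicate id #{id}" if @rows.key?(id)

    @rows[id] = attrs
  end

  def find(id)
    @rows[id]
  end

  def ids
    @rows.keys
  end
end

# Copies every row across, keeping its original identifier.
def copy_all(from, to)
  from.ids.each { |id| to.insert(id, from.find(id)) }
end

monolith_db = Store.new
service_db = Store.new

# The id is created in application code, independent of either store.
voucher_id = SecureRandom.uuid
monolith_db.insert(voucher_id, { title: "Half-price pizza" })

# The new service already has rows of its own.
service_db.insert(SecureRandom.uuid, { title: "Spa day" })

copy_all(monolith_db, service_db)
puts service_db.find(voucher_id)[:title]
```

Because the identifier survives the move unchanged, anything that stored `voucher_id` before the extraction can keep using it afterwards.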
  • 26:23 - 26:26
    And, you need to write code good. You know,
  • 26:26 - 26:29
    like, it's hard to. I mean, it's easy to
  • 26:29 - 26:31
    say, say that, but it's hard to do. Think
  • 26:31 - 26:34
    about the SOLID principles. Think about where
    things belong.
  • 26:34 - 26:37
    Ask yourself, am I coupling these two components
    together
  • 26:37 - 26:41
    for the fu- and is that useful enough that
  • 26:41 - 26:43
    it's gonna cause me a lot of pain later
  • 26:43 - 26:43
    in the future?
  • 26:43 - 26:46
    J.S.: So when you're writing your code good,
    you
  • 26:46 - 26:49
    should be thinking about your models. Those
    models are
  • 26:49 - 26:52
    gonna become your APIs. They're gonna become
    your service
  • 26:52 - 26:56
    APIs. So consider your public methods. What
    are you
  • 26:56 - 26:59
    putting in the public space of that model?
    Is
  • 26:59 - 27:01
    it named well? Does it represent what your
    service
  • 27:01 - 27:03
    should be doing?
  • 27:03 - 27:06
    Make sure that, while you're building up your
    cobras,
  • 27:06 - 27:09
    that your models are reflective of the way
    you
  • 27:09 - 27:12
    intend for your service APIs to look like,
    should
  • 27:12 - 27:15
    you ever need to go down that road.
  • 27:15 - 27:19
    A.P.: And, like I said earlier, avoid tangling
    those
  • 27:19 - 27:24
    components together. Specifically in Rails,
    when you introduce associations,
  • 27:24 - 27:26
    you're kind of expanding that API that Jason
    was
  • 27:26 - 27:30
    talking about. All those, now you're creating
    ways for
  • 27:30 - 27:34
    developers to reach through these models and
    get data,
  • 27:34 - 27:36
    and that'll couple them together and make
    it harder
  • 27:36 - 27:39
    for you to separate them.
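
To illustrate the point about associations in plain Ruby (no Rails required), here is a hedged sketch: instead of reaching through an association chain such as `order.deal.title`, the order holds only a `deal_id` and asks a narrow boundary object for what it needs. `DealCatalog`, `Order`, and `title_for` are hypothetical names chosen for this example.

```ruby
# Hypothetical boundary object: the only way other code asks about deals.
# Today it reads from a local hash; after an extraction, the same method
# could be backed by a client for the deals service.
class DealCatalog
  def initialize(deals_by_id)
    @deals_by_id = deals_by_id
  end

  def title_for(deal_id)
    @deals_by_id.fetch(deal_id).fetch(:title)
  end
end

# The order knows a deal only by its id, not as an associated object,
# so callers cannot reach through it into deal internals.
class Order
  attr_reader :deal_id

  def initialize(deal_id:, catalog:)
    @deal_id = deal_id
    @catalog = catalog
  end

  def summary
    "Order for #{@catalog.title_for(deal_id)}"
  end
end

catalog = DealCatalog.new("deal-1" => { title: "Half-price pizza" })
order = Order.new(deal_id: "deal-1", catalog: catalog)
puts order.summary
```

The public surface here is one method, `title_for`, which is roughly what the eventual service API would need to offer; an association would instead expose every attribute and every further association of the deal.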
  • 27:39 - 27:43
    J.S.: Test. Who's here, who here tests? Anyone
    test?
  • 27:43 - 27:44
    A.P.: Not DHH.
  • 27:44 - 27:47
    J.S.: Nope. You don't test anymore. You should
    be
  • 27:47 - 27:50
    testing. You should be testing at high levels.
    Avoid
  • 27:50 - 27:54
    the unit tests. If you can avoid the unit
  • 27:54 - 27:57
    tests. Especially because once you start doing
    service extraction,
  • 27:57 - 28:01
    you will break assloads of unit tests.
  • 28:01 - 28:02
    Make sure you write your high-level tests
    first. Make
  • 28:02 - 28:05
    sure you've got solid coverage on those high-level
    end
  • 28:05 - 28:10
    to end tests. Secondly, as you are doing service
  • 28:10 - 28:12
    extraction, it is not trivial to be spinning
    up
  • 28:12 - 28:15
    other services quickly in order to test end
    to
  • 28:15 - 28:18
    end, but you should be thinking about how
    you
  • 28:18 - 28:20
    might be doing that. Because otherwise you're
    going to
  • 28:20 - 28:23
    be doing a lot of stubbing, and that gets
  • 28:23 - 28:25
    very painful and gets error-prone.
  • 28:25 - 28:29
    A.P.: I mean, when we talked to the developers
  • 28:29 - 28:30
    who had to do some of the tougher service
  • 28:30 - 28:34
    extractions, they were like, I wish we had
    more
  • 28:34 - 28:36
    integration specs. Because we're gonna be
    changing a lot
  • 28:36 - 28:38
    of this stuff, and we need to know if
  • 28:38 - 28:40
    it works. If you've got a good set of
  • 28:40 - 28:43
    integrations, integration tests, you can be
    a lot more
  • 28:43 - 28:46
    confident about making those changes.
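
A sketch of what a high-level check that survives an extraction might look like in Ruby, under the assumption that the in-process code and the future service client expose the same method. The flow is written once against the behavior (`reserve`) and run against both. All class and method names here are hypothetical, and `RemoteInventoryClient` simply delegates where a real client would make a network call.

```ruby
# In-process implementation, as it might live inside the monolith.
class LocalInventory
  def initialize(stock)
    @stock = stock.dup
  end

  def reserve(sku, quantity)
    return false if @stock.fetch(sku, 0) < quantity

    @stock[sku] -= quantity
    true
  end
end

# Hypothetical stand-in for a client of the extracted service.
class RemoteInventoryClient
  def initialize(backend)
    @backend = backend
  end

  def reserve(sku, quantity)
    @backend.reserve(sku, quantity)
  end
end

# The behavior under test: it only depends on `reserve`.
def checkout(inventory, sku, quantity)
  inventory.reserve(sku, quantity) ? :confirmed : :sold_out
end

# One high-level flow, reusable against any implementation.
def check_checkout_flow(inventory)
  [checkout(inventory, "pizza", 2), checkout(inventory, "pizza", 2)]
end

p check_checkout_flow(LocalInventory.new("pizza" => 3))
p check_checkout_flow(RemoteInventoryClient.new(LocalInventory.new("pizza" => 3)))
```

Because the check never mentions which implementation it is talking to, swapping the monolith code for a service client does not require rewriting it, unlike unit tests tied to the internals being moved.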
  • 28:47 - 28:49
    Next, over there?
  • 28:49 - 28:49
    J.S.: Yup.
  • 28:50 - 28:53
    A.P.: Yeah. So, you need to communicate. I
    mean,
  • 28:53 - 28:57
    everyone always says this, but like, when
    you solve
  • 28:57 - 29:00
    a problem, when you're spinning up a service,
    you're
  • 29:00 - 29:02
    gonna, and as more teams are spinning up services,
  • 29:02 - 29:04
    a lot of you are gonna be encountering the
  • 29:04 - 29:07
    same problems. So when you solve a problem,
    share
  • 29:07 - 29:09
    it. Make it a gem, write it down, put
  • 29:09 - 29:11
    it in a wiki, and tell people about it.
  • 29:11 - 29:15
    Give talks. Because it's gonna be hard to,
    I
  • 29:15 - 29:18
    mean, you don't want people solving the same
    problems.
  • 29:18 - 29:22
    At Groupon, we have this, Core Architecture
    Forum, it's
  • 29:22 - 29:24
    called, and basically it's got a bunch of
    people
  • 29:24 - 29:27
    who meet, and you can say, I'm gonna spin
  • 29:27 - 29:29
    up a new service, or I'm gonna solve this
  • 29:29 - 29:32
    problem. Have you seen this before? They're
    gonna help
  • 29:32 - 29:35
    you answer questions like, what's, has someone
    else solved
  • 29:35 - 29:38
    this already? Is there a similar problem?
    Is there
  • 29:38 - 29:40
    a particular technology that would help you
    solve that
  • 29:40 - 29:44
    problem better? All those questions are really
    important to
  • 29:44 - 29:49
    ask so that you don't reinvent the wheel over
  • 29:49 - 29:51
    and over again.
  • 29:51 - 29:55
    What else? Oh yeah. One more thing. One more
  • 29:55 - 29:55
    thing. That sounds like Steve Jobs. One more
    thing.
  • 29:55 - 29:58
    We have the interest, we have interest leagues
    at
  • 29:58 - 30:02
    Groupon, which are just internal user groups
    for Clojure,
  • 30:02 - 30:05
    Java. We even have one for onboarding. You
    know,
  • 30:05 - 30:07
    these are really cool. And that's another
    way to
  • 30:07 - 30:10
    help communicate, like, what's happening.
    Once your company gets
  • 30:10 - 30:15
    big enough, that's really important.
  • 30:15 - 30:21
    J.S.: So. In conclusion, cobras are great.
  • 30:21 - 30:21
    A.P.: Yeah. They're awesome.
  • 30:21 - 30:24
    J.S.: Rails is great. And cobras do serve
    a
  • 30:24 - 30:26
    useful purpose.
  • 30:26 - 30:32
    A.P.: Oh. But beware. It's not so simple.
  • 30:32 - 30:36
    J.S.: Once you decide that you're gonna start
    raising
  • 30:36 - 30:41
    up a baby cobra, be ready for what comes
  • 30:41 - 30:41
    next.
  • 30:41 - 30:47
    A.P.: Oh. Yeah. And. OK, so. Got this part.
  • 30:47 - 30:51
    We're hiring. I mean, if you want to come
  • 30:51 - 30:54
    help us solve some of these problems, come
    talk
  • 30:54 - 30:56
    to us after the talk. There's a booth downstairs.
  • 30:56 - 31:01
    You can go to this website. Tweet at us.
  • 31:01 - 31:04
    I'd like that. But yeah. Join us.
  • 31:04 - 31:07
    J.S.: And we are standing on other people's
    shoulders
  • 31:07 - 31:07
    here.
  • 31:07 - 31:07
    A.P.: Yeah.
  • 31:07 - 31:10
    J.S.: A lot of these folks are people who
  • 31:10 - 31:12
    helped with the talk or who helped actually
    do
  • 31:12 - 31:15
    a lot of this service extraction work. This
    does
  • 31:15 - 31:19
    not comprise the total list, but we definitely
    wanted
  • 31:19 - 31:20
    to bring attention to these people.
  • 31:20 - 31:22
    A.P.: Yeah, and I mean. People like these
    guys,
  • 31:22 - 31:24
    they gave us a lot of feedback when we
  • 31:24 - 31:28
    did the talk at, at Groupon. And having people
  • 31:28 - 31:31
    who will mentor and, like, spend time to help
  • 31:31 - 31:34
    you understand things, I mean, that's the
    reason I
  • 31:34 - 31:36
    work at Groupon.
  • 31:36 - 31:36
    J.S.: Thank you all.
  • 31:39 - 31:42
    A.P.: [drowned out by applause]
Duration:
32:05
