RailsConf 2014 - Service Extraction at Groupon Scale by Jason Sisk & Abhishek Pillai

0:19 - 0:20

ABHISHEK PILLAI: Thanks for coming. I know
there's
0:20 - 0:23

some other cool talks right now, but you're
here so
0:23 - 0:26

that's awesome. Let's get started. You're
here to
0:26 - 0:29

learn about how to tame COBRAs.
0:29 - 0:34

JASON SISK: My name is Jason Sisk. I work
0:34 - 0:37

at Groupon. I've been here for a couple of
0:37 - 0:42

years. I work on predominantly Ruby/Rails
systems, backend development,
0:42 - 0:44

et cetera, and I do not like onions.
0:44 - 0:48

A.P.: My name is Abi, and I'm at, I've
0:48 - 0:51

been at Groupon for about two years, too.
And
0:51 - 0:53

Jason and I work on a team that does
0:53 - 0:58

backend service, basically managing inventory.
And I don't like
0:58 - 0:59

fruits.
0:59 - 1:01

J.S.: So part of what we're gonna tell you
1:01 - 1:04

today is a little bit of a history lesson
1:04 - 1:07

about the early pain of Groupon having site
outages,
1:07 - 1:11

et cetera, due to Rails scaling. We want to
1:11 - 1:13

tell you about the story of the developers
that
1:13 - 1:16

actually handled those problems and some of
the decisions
1:16 - 1:20

that they made. So that's that.
1:20 - 1:23

But we want to lead off with one important
1:23 - 1:23

point.
1:23 - 1:30

A.P.: Boom! Pause. You don't have to pause
for
1:30 - 1:32

that long. And, yeah.
1:32 - 1:37

J.S.: So. Back, back around 2007, we were
doing
1:37 - 1:39

what all the other cool kids were doing. We
1:39 - 1:43

were using a Rails monolith, and to some degree
1:43 - 1:46

still are. Rails 2 is a great framework. Who
1:46 - 1:48

is using Rails 2? Anyone?
1:48 - 1:49

AUDIENCE: Yeah!
1:49 - 1:50

J.S.: All right.
1:50 - 1:51

A.P.: Awesome.
1:51 - 1:55

J.S.: You and us. Rails is a great framework.
1:55 - 1:58

We all love Rails. That's why we're here.
We
1:58 - 2:03

still love Rails and that's why we're here.
But
2:03 - 2:05

what's great about it is that it's great for
2:05 - 2:08

Agile teams. It's, and for us it was really
2:08 - 2:12

simple. We could make some really quick decisions.
We
2:12 - 2:16

could iterate product very quickly. We could
iterate new
2:16 - 2:18

features. And we could do it with a small
2:18 - 2:20

team of five to ten devs.
2:20 - 2:23

We had a single repository. We had a single
2:23 - 2:25

test suite. And we had a single deploy process.
2:25 - 2:26

Very simple.
2:26 - 2:29

A.P.: And, most importantly, you, we had like
one
2:29 - 2:32

shared, conceptual understanding of the code
base. When we
2:32 - 2:33

wanted to make a change, we knew where to
2:33 - 2:37

put it. And things were simple that way.
2:37 - 2:40

J.S.: Also what was great was, and still is,
2:40 - 2:44

about Rails, that integrating components is
really easy. The
2:44 - 2:48

convention over configuration, model associations
- all of that
2:48 - 2:50

business you can put together things very
quickly and
2:50 - 2:53

very easily. But we didn't come here to talk
2:53 - 2:55

to you about Rails.
2:55 - 2:59

A.P.: We came here to tell you about cobras,
2:59 - 3:03

and how to tame them. At Groupon, we actually
3:03 - 3:05

have a mo- monolith, and we call it the
3:05 - 3:08

primary web app. But Jason had a thought for
3:08 - 3:10

the purposes of this talk, we'd come up with
3:10 - 3:13

a more scientifically accurate name for it.
3:13 - 3:20

Yeah. So. Centralized Omnipotent Big-ass Rails
Application.
3:20 - 3:22

J.S.: Big-ass. So we want to take you back
3:22 - 3:27

to 2009 for just a minute. So Groupon was
3:27 - 3:29

about two years old, give or take, and we
3:29 - 3:31

were still kind of kicking into gear. People
would
3:31 - 3:34

come into the office in Chicago we've got,
open
3:34 - 3:37

up New Relic, and they'd see stuff like this.
3:37 - 3:39

A.P.: So as you can see, like, in the
3:39 - 3:42

middle of the night, it's great. Everything's
working really
3:42 - 3:44

well. Soon as people woke up and started using
3:44 - 3:48

it - damn people - our performance immediately
started
3:48 - 3:52

to drop.
3:52 - 3:55

And then eight months later, we had about
thirty
3:55 - 4:00

thousand requests per minute and everything
was on fire.
4:00 - 4:02

J.S.: We blame Oprah.
4:02 - 4:04

A.P.: As you do.
4:04 - 4:08

J.S.: It's Oprah's fault. Oprah crashed Groupon.
Oprah crashed
4:08 - 4:13

Groupon not once, but at least twice. And
also
4:13 - 4:16

the Gap crashed Groupon too. Actually, the
truth is,
4:16 - 4:20

Groupon crashed Groupon. We were not scaling
properly. Bad.
4:20 - 4:22

Bad Groupon.
4:22 - 4:28

The Cobra was getting fatter and fatter. We
were
4:28 - 4:29

up to-
4:29 - 4:34

A.P.: Yeah. So. We were up to, we started,
4:34 - 4:36

we had, like, five to fifty devs. We started
4:36 - 4:38

with about three to five hundred commits per
month.
4:38 - 4:40

Slowly, and in a couple of years, as you
4:40 - 4:43

can see, we were averaging about two thousand
commits
4:43 - 4:45

in a single month. We had a lot of
4:45 - 4:47

developers developing a lot of things.
4:47 - 4:50

J.S.: This is all one cobra.
4:50 - 4:54

A.P.: And you know, we started thinking about
SOA
4:54 - 4:57

at that point. It was already becoming really
painful.
4:57 - 5:01

But we looked at the cobra, directly in the
5:01 - 5:03

eyes, and it scared the shit out of us.
5:03 - 5:06

J.S.: We had a lot of scoping problems. And
5:06 - 5:10

a lot of that had to do with model
5:10 - 5:12

coupling. So, one of the biggest things that
was
5:12 - 5:17

keeping us from extracting services early
was as the,
5:17 - 5:19

as the code grew, you had a lot of
5:19 - 5:22

sort of natural convention coupling that was
happening in
5:22 - 5:23

the models.
5:23 - 5:26

So a little bit of an over-simplified example
here.
5:26 - 5:30

But you have a, let's say you have, you're
5:30 - 5:32

on the MyGroupon's page. You want to look
at
5:32 - 5:34

all of the Groupons that you've bought. And
you
5:34 - 5:36

want to see all the titles for all of
5:36 - 5:37

those. So when we go to render the interface
5:37 - 5:40

we want to display all these deal titles.
In
5:40 - 5:42

the cobra, you might find a set of dependent
5:42 - 5:44

relationships that are somewhat like this,
where you can
5:44 - 5:48

see the cyclical dependencies.
5:48 - 5:51

But building these types of associations was
fairly commonplace,
5:51 - 5:56

which was kind of bad in some ways.
5:56 - 5:59

So in this case, you would instantiate a user,
5:59 - 6:01

which would require a database lookup to the
Users
6:01 - 6:05

table, select star, and, and you would map
over
6:05 - 6:08

that, that user's orders to get all of the
6:08 - 6:10

deal titles.
6:10 - 6:13

In this, in this case, there is a Demeter
6:13 - 6:17

violation. Demeter violations are bad.
6:17 - 6:20

A.P.: And it looks clean. I mean, it looks
6:20 - 6:23

good. But, what it does is couples our components.
6:23 - 6:26

J.S.: Here is an example of what I was
6:26 - 6:30

talking about. You, you have a basically unnecessarily-
unnecessary
6:30 - 6:34

table lookup to Users. Now, if you're designing
your
6:34 - 6:37

applications well, you can avoid this right
out of
6:37 - 6:40

the gate. But Rails conventions don't, don't
encourage you
6:40 - 6:42

to avoid this right out of the gate. And
6:42 - 6:46

ActiveRecord DSL for, for advanced queries
aren't something that
6:46 - 6:49

people just tend to do by default. Or at
6:49 - 6:50

least they didn't in 2009.
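[Editor's sketch] The coupling described above can be illustrated without Rails. This is a minimal sketch, not Groupon's code: the `FakeDb` class, the struct names, and both helper methods are assumptions made up for illustration. It counts table reads to show that reaching through a user to get deal titles pays for a Users lookup the page never needed.

```ruby
# Hypothetical in-memory "tables"; in the talk these would be ActiveRecord models.
Deal  = Struct.new(:id, :title)
Order = Struct.new(:id, :user_id, :deal_id)
User  = Struct.new(:id, :name)

class FakeDb
  attr_reader :lookups

  def initialize(users:, orders:, deals:)
    @tables = { users: users, orders: orders, deals: deals }
    @lookups = Hash.new(0) # table name => number of times it was read
  end

  def table(name)
    @lookups[name] += 1
    @tables.fetch(name)
  end
end

# Coupled version: start from the user and reach through orders to deals.
def deal_titles_via_user(db, user_id)
  user = db.table(:users).find { |u| u.id == user_id } # not needed for titles
  orders = db.table(:orders).select { |o| o.user_id == user.id }
  deals = db.table(:deals)
  orders.map { |o| deals.find { |d| d.id == o.deal_id }.title }
end

# Decoupled version: the user id is already known, so skip the users table.
def deal_titles_for(db, user_id)
  deal_ids = db.table(:orders).select { |o| o.user_id == user_id }.map(&:deal_id)
  db.table(:deals).select { |d| deal_ids.include?(d.id) }.map(&:title)
end
```

Both methods return the same titles; only the first one reads the users table, which is the kind of hidden dependency that makes the orders code hard to pull out later.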
6:50 - 6:55

A.P.: Yeah. And, I mean. Things got a lot
6:55 - 6:59

worse, because our code base and cobra was
just
6:59 - 7:02

getting bigger and bigger. You can see here
it's
7:02 - 7:08

almost two million lines of code at this point.
7:08 - 7:09

And, oh yeah, we have to stay up 100%
7:09 - 7:14

of the time. So that's a problem. All right.
7:14 - 7:17

J.S.: Also, the database is completely on
fire.
7:17 - 7:22

A.P.: So yeah. We were in quite a pickle.
7:22 - 7:29

It was painful. Testing sucked. I mean, we
had
7:31 - 7:33

to wait like forty-five minutes for a build
to
7:33 - 7:36

run. You basically ran your tests and then
figure
7:36 - 7:38

out something else to do, because you had
to
7:38 - 7:41

wait while your tests ran. And a lot of
7:41 - 7:44

our release engineers devoted a lot of effort
to
7:44 - 7:46

make those tests run faster.
7:46 - 7:50

J.S.: Deploys were terrible. Deploy, deploy
process was somewhere
7:50 - 7:52

on the, on the scale of three hours to
7:52 - 7:56

deploy the, the application. Just a really
bad development
7:56 - 7:58

experience, especially as you start to have
teams that,
7:58 - 8:02

that split, split ownership. They want to
iterate on
8:02 - 8:04

features that matter to their team, and they
don't
8:04 - 8:07

want to be held up by this gigantic monolithic
8:07 - 8:07

application.
8:07 - 8:10

And, and it's, you know, the, the deploy's
only
8:10 - 8:12

happening once a week. That really hurts the
team's
8:12 - 8:14

ability to set, that maybe wants to do continuous
8:14 - 8:16

deployment. So, it sucked.
8:16 - 8:21

A.P.: Yeah. I mean, and development pace was
increasing,
8:21 - 8:23

as you saw, and, I mean, what's the best
8:23 - 8:25

place to put the next line of code, as
8:25 - 8:27

I heard in a talk earlier. It's the place
8:27 - 8:30

that you're changing. Models got bloated, and
there's a
8:30 - 8:31

lot of cruft.
8:31 - 8:33

J.S.: So all of these things were terrible.
It
8:33 - 8:38

was very painful. So, we decided to move towards
8:38 - 8:41

service extraction a little bit more seriously.
8:41 - 8:45

If there's a big take away from this first
8:45 - 8:47

section, we just want you to remember that
cobras
8:47 - 8:54

are great. They are great. Until they aren't.
8:55 - 8:58

A.P.: So we needed to alleviate this pain
immediately.
8:58 - 9:00

We needed to get that code out of there.
9:00 - 9:04

We needed a quick extraction. So we decided
to
9:04 - 9:07

extract a new service and build it on top
9:07 - 9:10

of our current schema. We decided to start
with
9:10 - 9:14

the order service, because. I mean. It was
causing
9:14 - 9:17

a lot of database contention. We had a lot
9:17 - 9:19

of people buying a lot of Groupons, and, a
9:19 - 9:21

good problem to have, but it was bringing
our
9:21 - 9:21

database down.
9:21 - 9:23

So we needed to get that code out of
9:23 - 9:28

it, and also another thing behind the, behind
choosing
9:28 - 9:30

orders to start is that, you know, it's gonna
9:30 - 9:32

be a long-lived model, a long living model
in
9:32 - 9:35

our domain. We know that for sure.
9:35 - 9:38

So, to illustrate, this is what it looks like
9:38 - 9:41

in the beginning. And this is what we're trying
9:41 - 9:44

to accomplish. You have an orders, you have
the
9:44 - 9:46

cobra, and then we're trying to have a separate
9:46 - 9:49

orders codebase, which will have its own database.
But
9:49 - 9:53

it continues to have re- a read-only access
to
9:53 - 9:56

the cobra's database, because we didn't focus
on completely
9:56 - 10:01

making the cobra, the order service, re, stopping,
stopping
10:01 - 10:04

it from reaching back into the cobra's database.
10:04 - 10:08

And, I mean, the cobra was really sneaky.
It
10:08 - 10:11

was really tough to find all the ways that,
10:11 - 10:14

with Rails callbacks and model associations,
all the ways
10:14 - 10:19

that the components were coupled.
10:19 - 10:22

So we built some tools to make that easier.
10:22 - 10:23

This is one of them. The service wall, as
10:23 - 10:25

we call it. We're trying to, the main goal here
10:25 - 10:30

is separating the concerns of orders within
the application.
10:30 - 10:33

So, you start with having your services in
a
10:33 - 10:38

separate directory. Let's see a closer look
of it.
10:38 - 10:40

You have the order service in its own directory,
10:40 - 10:43

and you have its own app, its own lib,
10:43 - 10:45

its own specs. The way that works is that
10:45 - 10:48

in environment dot rb file, we iterated through
these
10:48 - 10:50

services and added them to the load path.
So
10:50 - 10:53

the application to the application looks like
it's just
10:53 - 10:57

one big application, but for our purposes,
the code
10:57 - 10:59

was separate.
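[Editor's sketch] The load-path trick just described might look roughly like the following. The `services/<name>/{app,lib,spec}` layout comes from the talk; the helper names are assumptions, and the real `environment.rb` code was not shown in the transcript.

```ruby
# Returns the app/ and lib/ directories of every service under services_root.
# spec/ is left out on purpose: tests do not belong on the runtime load path.
def service_load_paths(services_root)
  Dir.glob(File.join(services_root, "*")).sort.flat_map do |service_dir|
    %w[app lib].map { |sub| File.join(service_dir, sub) }
               .select { |path| File.directory?(path) }
  end
end

# In the monolith this loop would presumably run from config/environment.rb.
def add_services_to_load_path(services_root, load_path = $LOAD_PATH)
  service_load_paths(services_root).each do |path|
    load_path.unshift(path) unless load_path.include?(path)
  end
  load_path
end
```

To the running application everything still resolves as one codebase, while on disk each service's files sit together and can later be moved out as a unit.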
10:59 - 11:03

So, this is like, a small example of how
11:03 - 11:06

service wall works. You have this disable
model access
11:06 - 11:12

method that basically, if, if you specify
the models
11:12 - 11:15

that you want to, if you specify the service
11:15 - 11:19

that you want to disable or deprecate, and
it'll
11:19 - 11:23

figure out the models of that service and
add
11:23 - 11:29

it to this do-not-touch list. And basically
raise these
11:29 - 11:31

kinds of violations. So if you use the disable
11:31 - 11:34

model access method, when you run your tests,
it
11:34 - 11:37

will put up this message saying, you don't
have
11:37 - 11:39

access to this method.
11:39 - 11:41

When a deal is trying to access an order,
11:41 - 11:43

we can figure that out just by running our
11:43 - 11:47

tests. If you use the more friendlier, deprecate
service
11:47 - 11:50

mo- deprecate model access method, then you
can be
11:50 - 11:53

more permissive and it'll just log it to a
11:53 - 11:55

file. You can see that in development mode
or
11:55 - 11:58

you can have it on staging, and that'll basically,
11:58 - 11:59

that'll allow you to find all the places where
11:59 - 12:03

you're having service infractions.
12:03 - 12:05

You can't do this in production though, because
it
12:05 - 12:09

causes a serious produ- performance hit.
12:09 - 12:14

Oh yeah. So this is how, so this is
12:14 - 12:18

how you actually use the service wall. Use,
you,
12:18 - 12:22

at the top of your controller, you disable,
use
12:22 - 12:25

the method disable_model_access or deprecate_model_access,
depending on what you
12:25 - 12:27

want to do. You tell it what service, and
12:27 - 12:30

it even lets you exempt some actions that
you
12:30 - 12:31

don't want to raise violations on yet.
12:31 - 12:36

That way you can comment out that action and
12:36 - 12:38

tackle one action at a time. Which endpoints
are
12:38 - 12:41

actually reaching over and causing the service
wall infraction.
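[Editor's sketch] The service wall itself was only shown on slides, so the following is a guess at its shape rather than Groupon's implementation. The method names `disable_model_access` and `deprecate_model_access` come from the talk; the registry, the `check!` hook, the `current_action` bookkeeping, and `OrderRecord` are assumptions. For brevity the rules here are global rather than declared per controller.

```ruby
module ServiceWall
  class Violation < StandardError; end

  # Hypothetical registry: service name => models that service owns.
  SERVICE_MODELS = { orders: %w[OrderRecord] }.freeze

  @rules = {}            # model name => { mode:, except: }
  @log = []              # stands in for the log file mentioned in the talk
  @current_action = nil  # which controller action is currently running

  class << self
    attr_reader :log
    attr_accessor :current_action

    # Strict mode: crossing the wall raises, so the test suite finds it.
    def disable_model_access(service, except: [])
      add_rules(service, :raise, except)
    end

    # Permissive mode: crossing the wall is only recorded.
    def deprecate_model_access(service, except: [])
      add_rules(service, :log, except)
    end

    # Guarded models call this before touching their data.
    def check!(model_name)
      rule = @rules[model_name]
      return if rule.nil? || rule[:except].include?(current_action)

      message = "#{current_action} touched #{model_name} across the service wall"
      raise Violation, message if rule[:mode] == :raise

      @log << message
    end

    def reset!
      @rules = {}
      @log = []
      @current_action = nil
    end

    private

    def add_rules(service, mode, except)
      SERVICE_MODELS.fetch(service).each do |model|
        @rules[model] = { mode: mode, except: except }
      end
    end
  end
end

# A model owned by the orders service, guarded by the wall.
class OrderRecord
  def self.find(id)
    ServiceWall.check!("OrderRecord")
    { id: id }
  end
end
```

With `ServiceWall.disable_model_access(:orders, except: [:index])`, a call to `OrderRecord.find` during the `:index` action passes, while the same call during `:show` raises `ServiceWall::Violation`. Shrinking the `except:` list one action at a time mirrors the workflow described above.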
12:41 - 12:46

J.S.: So, in addition to the service wall,
one,
12:46 - 12:48

one other problem with this approach, this
extraction approach
12:48 - 12:52

is that, because you necessarily fork the
code, you
12:52 - 12:54

get a lot of cruft left over from the
12:54 - 13:00

old, the old domain. So you find yourself
asking,
13:00 - 13:02

teams find themselves asking, very often,
is this endpoint
13:02 - 13:04

even used? Do we even care about this code
13:04 - 13:05

anymore?
13:05 - 13:10

So, a small team of Groupon developers hacked
together
13:10 - 13:13

something called Route 66 that we use internally
to
13:13 - 13:17

track down cruft in both our old cobra and
13:17 - 13:21

our new cobra. So it basically answers the
question,
13:21 - 13:23

are these endpoints used? I don't know if
you
13:23 - 13:24

can see this very well, but this is a
13:24 - 13:25

little bit of a UI.
13:25 - 13:26

A.P.: Yeah.
13:26 - 13:30

J.S.: But what we do is, we analyze log
13:30 - 13:34

files, we analyze, spelunk logs to come up
with
13:34 - 13:37

which controller actions are being hit, what's
the frequency.
13:37 - 13:39

Is this a route that is hit once a
13:39 - 13:42

week, you know. Once a, once a month? And
13:42 - 13:45

we can very aggressively decruft using this
tool as
13:45 - 13:47

well.
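[Editor's sketch] Route 66 is an internal Groupon tool, so its code is not available; the idea of mining logs for controller-action hits can be sketched as below. The regular expression targets the usual Rails "Processing ..." request line and is an assumption about the log format, as are the method names.

```ruby
# Matches both "Processing by OrdersController#show as HTML" and the older
# "Processing OrdersController#show (for 127.0.0.1 ...) [GET]" style lines.
PROCESSING_LINE = /Processing (?:by )?([\w:]+)#(\w+)/

# Returns a hash of "Controller#action" => number of requests seen.
def count_action_hits(log_lines)
  log_lines.each_with_object(Hash.new(0)) do |line, hits|
    match = PROCESSING_LINE.match(line)
    hits[[match[1], match[2]].join("#")] += 1 if match
  end
end

# Actions the application defines that never appeared in the logs.
def unused_actions(known_actions, hits)
  known_actions.reject { |action| hits.key?(action) }
end
```

Feeding it a month of logs plus the list of routed actions gives a first list of decruft candidates; an action with zero hits still deserves a human check before deletion, since some routes are only hit quarterly or by batch jobs.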
13:47 - 13:53

A.P.: All right. So there's definitely pros
to this
13:53 - 13:57

approach. Because you're focusing on just
separating the models,
13:57 - 14:00

I mean, just separating the code, you can
quickly
14:00 - 14:03

and not worry about spinning up a separate
database
14:03 - 14:05

schema, separate naming, all of that. You
just worry
14:05 - 14:08

about separating the code, and that focuses
the abstraction.
14:08 - 14:12

It makes it easier to spin up endpoints. But
14:12 - 14:13

the cons are, you're still tied to that legac,
legac,
14:13 - 14:16

to that legacy database. Not such a bad thing
14:16 - 14:17

if you really need to get it out of
14:17 - 14:21

there. But, because you're forking this code
now, and
14:21 - 14:23

now it's being hit through endpoints, there
is still
14:23 - 14:26

a lot of cruft in the, in the, in
14:26 - 14:28

the code base. Because a lot of these endpoints
14:28 - 14:30

are now not being used.
14:30 - 14:32

J.S.: So this was the first extraction pattern
that
14:32 - 14:34

we used at Groupon to get out of the
14:34 - 14:39

original cobra, the original Groupon cobra.
But teams sort
14:39 - 14:41

of own their own tactics, and there are other
14:41 - 14:44

ways that they're doing it as well. One way
14:44 - 14:47

that, one way that service extraction is also
happening
14:47 - 14:50

is by using greenfield services that use a
message
14:50 - 14:54

bus. Sometimes you just need to keep that
legacy
14:54 - 14:56

API running, because there are a lot of client
14:56 - 14:58

dependencies on it. There's a lot of dependencies
on
14:58 - 15:01

the structure of the data.
15:01 - 15:03

But who likes doing greenfield work in here?
Raise
15:03 - 15:06

your hand if you like greenfield work. Right.
That
15:06 - 15:10

should be all of you. Whatever.
15:10 - 15:13

So, it is possible to do greenfield service
extraction,
15:13 - 15:17

and we're doing this as well. So, again, we
15:17 - 15:22

have a similar. Whoops. Juggling between PowerPoint
and
15:22 - 15:27

preview. Similar type of situation. You have
this cobra,
15:27 - 15:30

and then we get to the scenario that we're,
15:30 - 15:32

we're trying to reach with the greenfield
extraction, where
15:32 - 15:35

you have, in this case the red, the red
15:35 - 15:38

box represents all new code. There's a gem,
a
15:38 - 15:41

client gem that interact, that runs in the
original
15:41 - 15:43

cobra, that runs in the green cobra. And when
15:43 - 15:46

this service writes data to its db, a message
15:46 - 15:50

is sent that the green cobra consumes and
sends
15:50 - 15:53

over to its own data store, thus satisfying
all
15:53 - 15:57

of the legacy API requirements.
15:57 - 15:58

And then what's notable about this is to keep
15:58 - 16:03

everything in sync for service cut-overs,
rollouts, et cetera,
16:03 - 16:06

there is a background sync worker that runs,
that
16:06 - 16:09

syncs it one way from the old database to
16:09 - 16:13

the new database.
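[Editor's sketch] The greenfield flow above, where the new service writes to its own store and the legacy side consumes a message to stay current, can be sketched with an in-memory bus. Every class and topic name here is hypothetical; a real setup would use an actual message broker and real databases, and the "existing rows win" rule in the sync helper is a design choice of this sketch, not something stated in the talk.

```ruby
# Tiny synchronous stand-in for a message bus.
class MessageBus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, payload)
    @subscribers[topic].each { |handler| handler.call(payload) }
  end
end

# New service: owns its own store and announces every write.
class NewOrderService
  attr_reader :store

  def initialize(bus)
    @bus = bus
    @store = {}
  end

  def create_order(id, attrs)
    @store[id] = attrs
    @bus.publish(:order_written, { id: id, attrs: attrs })
  end
end

# Legacy side: consumes messages and updates its own store, so the legacy
# API keeps answering from the data it has always read.
class LegacyOrderConsumer
  attr_reader :store

  def initialize(bus)
    @store = {}
    bus.subscribe(:order_written) { |msg| @store[msg[:id]] = msg[:attrs] }
  end
end

# One-way backfill from the old store into the new one; rows already present
# in the new store are left alone.
def sync_old_to_new(old_store, new_store)
  old_store.each { |id, attrs| new_store[id] ||= attrs }
  new_store
end
```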
16:13 - 16:16

There are pros and cons to this approach as
16:16 - 16:19

well. Some of the better parts are that you
16:19 - 16:22

can get rid of your legacy data quickly, again.
16:22 - 16:24

Devs like greenfield stuff. You like to design
your
16:24 - 16:29

own systems. You also get to minimize the
cut-over
16:29 - 16:32

risk with your data sync. So you're not splitting
16:32 - 16:34

the table and you have to have all of
16:34 - 16:38

these API dependencies written on one hand
so that
16:38 - 16:42

when you break your database you don't have,
you
16:42 - 16:43

don't have failures.
16:43 - 16:45

So you can phase the, you can phase out
16:45 - 16:48

your new, your new endpoints, and you can
own
16:48 - 16:51

the timing of when you build out new endpoint
16:51 - 16:54

features. Again. Some of the, or some of the
16:54 - 16:56

cons are that, it is not trivial to build
16:56 - 17:00

a synchronization worker, and it is less trivial
to build
17:00 - 17:04

a validation engine for the data to make sure
17:04 - 17:05

that you don't get it out of sync when
17:05 - 17:07

you're pulling from the original source. And
then there
17:07 - 17:12

are race conditions involved in this as well.
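[Editor's sketch] The talk does not describe how the validation engine works. One common approach, shown here purely as an assumption, is to checksum each row on both sides and report the ids that are missing or differ. The method names are made up, and rows are assumed to be flat hashes whose keys are all of one type.

```ruby
require "digest"

# Stable fingerprint of a row: sort the fields so key order does not matter.
def row_checksum(row)
  canonical = row.sort.map { |key, value| "#{key}=#{value}" }.join("&")
  Digest::SHA256.hexdigest(canonical)
end

# Ids from the source that are missing from the target or differ in content.
def sync_mismatches(source, target)
  source.keys.reject do |id|
    target.key?(id) && row_checksum(source[id]) == row_checksum(target[id])
  end
end
```

A check like this only catches drift after the fact. It does nothing about the race conditions mentioned above, where a row changes between the read on one side and the write on the other.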
17:12 - 17:15

A.P.: So Jason and I work on a team
17:15 - 17:19

that manages inventory, as I said earlier.
One of
17:19 - 17:22

the, looking a little further down the road,
one
17:22 - 17:24

of the things we needed to do was get,
17:24 - 17:26

now we needed to get vouchers out of the
17:26 - 17:30

orders service. Another service extraction.
And vouchers are actually
17:30 - 17:33

the things that customers redeem.
17:33 - 17:38

So, a simplified example of what a voucher
actually
17:38 - 17:41

would look like, except that now we have
17:41 - 17:45

an id, which is stored in our database. We
17:45 - 17:47

have the price, which is stored in a legacy
17:47 - 17:51

database, and now, Groupon's grown since orders.
We now
17:51 - 17:55

have an international platform codebase that
serves many different
17:55 - 18:00

countries. We have offices in Berlin, London,
Chennai, Korea,
18:00 - 18:03

and many more places. But yeah. Now we've
got
18:03 - 18:06

to make it, but our service's responsibility
is to
18:06 - 18:07

make it seem like none of that matters. Anyone
18:07 - 18:10

asking for voucher data needs to know about
all
18:10 - 18:11

voucher data.
18:11 - 18:13

Our services need to be global as well. So,
18:13 - 18:17

this is what our world looks like. And this
18:17 - 18:18

is how our service needs to be built on
18:18 - 18:24

top of that. What helped, in managing these
different
18:24 - 18:27

sources of truth, was this manager accessor
pattern in
18:27 - 18:32

our code base. Specifically, oh. Let me check
if
18:32 - 18:37

I need to- yeah. Specifically, next slide
please, this
18:37 - 18:39

is what, this is how it helped our code
18:39 - 18:41

base. Because in the controller, you could
just specify,
18:41 - 18:44

you could talk, talk to this manager object,
and
18:44 - 18:46

you'd say, find me this voucher.
18:46 - 18:49

And the manager, can you jump to that? All
18:49 - 18:50

right, it's gonna look like a lot of code,
18:50 - 18:53

but let's go step-by-step. In the manager,
that's where
18:53 - 18:56

all the complexity lies. You have the accessor
that
18:56 - 18:58

accesses local data. You have an accessor,
a separate
18:58 - 19:01

accessor - and accessors are just simply,
all they
19:01 - 19:06

do is persistence and finding, and finding
data -
19:06 - 19:09

so the accessors for the legacy database here,
the
19:09 - 19:12

cobra accessor, you get that price information,
and then
19:12 - 19:16

you have an international accessor that goes,
it could
19:16 - 19:19

be a database call or, in our case, that's
19:19 - 19:23

a HTTP call across the ocean.
19:23 - 19:25

And then you bring all that together, wrap
it
19:25 - 19:27

in a model and have it return that back
19:27 - 19:31

to your controller. Hang on.
19:31 - 19:35

All right. So, definitely pros and cons to
this
19:35 - 19:37

approach. One of the things was, it's easy
to
19:37 - 19:40

incorporate many different data sources. We
call that a
19:40 - 19:43

facade because it kind of hides all of that.
19:43 - 19:46

But the, behind the backend of it is really
19:46 - 19:47

more complex.
19:47 - 19:52

And, but you hide that complexity. That your
accessors
19:52 - 19:54

are bound to the schema changes. So, our cobra
19:54 - 19:57

accessor still has to know about the legacy
schema.
19:57 - 20:00

And you're, you, you can't really, making
changes there
20:00 - 20:02

is not trivial.
20:02 - 20:05

And, sometimes you can use that as a crutch.
20:05 - 20:07

So if someone asks you, can you give me
20:07 - 20:09

this piece of data about a voucher, I really
20:09 - 20:11

need it, and you want to expose it to
20:11 - 20:13

the endpoints, you're like, well, I do have
access
20:13 - 20:15

to the database or I could just make a
20:15 - 20:17

call. And now you, now you're serving the
end-
20:17 - 20:20

that data, and you're tied to serving that
data
20:20 - 20:21

in your API.
20:21 - 20:24

But the important thing there is to be diligent,
20:24 - 20:26

and as soon as you start serving that, you
20:26 - 20:31

put a strategy together to actually own that
data.
20:31 - 20:34

Otherwise you're, the complexity in the manager,
which is
20:34 - 20:37

both a pro and a con, will always be
20:37 - 20:40

there. The purpose of the manager is that
it
20:40 - 20:43

hides that complexity, but as you start owning
more
20:43 - 20:46

data, it should become simpler.
20:46 - 20:50

J.S.: So, these, these three extraction patterns
that we've
20:50 - 20:55

gone through are just a little bit of, a
20:55 - 20:57

little bit of what's going on. There are different
20:57 - 21:01

service extraction patterns going on, both
at Groupon and
21:01 - 21:06

probably in your worlds too. So, again, this
is
21:06 - 21:08

just an example of some of the ways that
21:08 - 21:11

we've chosen to do things. There are other
interesting
21:11 - 21:13

talks about this this week at RailsConf going
on,
21:13 - 21:16

so be, it'd be neat to check those out,
21:16 - 21:17

too, if you want to talk to us about
21:17 - 21:18

them.
21:18 - 21:21

But, you should definitely consider letting
your teams own
21:21 - 21:23

their tactics if you're trying to make decisions
about
21:23 - 21:27

doing SOA, because you might find some neat
things
21:27 - 21:28

that you didn't know about.
21:28 - 21:30

A.P.: Yeah. So I'm gonna stand over here cause
21:30 - 21:33

I feel like I'm just talking to these guys.
21:33 - 21:35

But yeah. So, there's definitely a lot of
things
21:35 - 21:38

that we learned from doing these different
service extractions.
21:38 - 21:39

Like Jason said, there are a lot of other
21:39 - 21:43

service extractions that happened at Groupon
and continue to
21:43 - 21:45

happen today.
21:45 - 21:49

But, taming a cobra is serious business. I
mean,
21:49 - 21:52

like I always say, YPAGNIRN. You probably
ain't gonna
21:52 - 21:57

need it right now. But, but the, but, like,
21:57 - 21:59

the tipping point on which you need to start
21:59 - 22:04

going towards service-oriented architecture
isn't just black or white.
22:04 - 22:07

It's, it's more of an art than a science.
22:07 - 22:08

But as soon as you start talking about service-oriented
22:08 - 22:11

architecture, once you start feeling the pains,
you need
22:11 - 22:14

to put, put together a strategy to accomplish
that.
22:14 - 22:15

J.S.: Yeah. You don't want to sit around and
22:15 - 22:17

wait for Oprah to blow your site up.
22:17 - 22:21

A.P.: But there's also the importance of allowing
your
22:21 - 22:25

domain to actually evolve. Models that you
think are
22:25 - 22:27

important in the beginning aren't gonna be
important later
22:27 - 22:31

on. And it, that's the big benefit of a
22:31 - 22:34

cobra, is that it allows you to iterate quickly.
22:34 - 22:36

J.S.: Something else that we have also learned
is
22:36 - 22:38

that when you go into service extraction,
it's really
22:38 - 22:42

important that you actually have a strategy.
Know what
22:42 - 22:45

you need to break apart. Know what you need
22:45 - 22:48

to leave in the monolith. These are important
things
22:48 - 22:51

to consider. Know what the priorities are
between those
22:51 - 22:54

things. It's very, it's very tricky to just
go
22:54 - 22:58

about service extraction very scattershot
and not really understanding
22:58 - 23:01

your business model or what benefits you derive
from
23:01 - 23:04

extracting certain pieces over others.
23:04 - 23:05

You should prefer the things that are clearly
like
23:05 - 23:09

their own thing, their own components, or
things that
23:09 - 23:13

are particular maintenance problems or represent
some sort of
23:13 - 23:17

legacy design or, or strange behavior. But
the other
23:17 - 23:20

important part of having a strategy is that
you
23:20 - 23:24

should expect the unexpected. Scope creep
will bite you,
23:24 - 23:26

and you know, as these, as these code bases
23:26 - 23:29

get bigger, pulling out of them becomes a
lot
23:29 - 23:34

more of a tricky process than you might envision.
23:34 - 23:36

Another thing that's important is that you,
you think
23:36 - 23:39

about your entire service stack. And you should
know
23:39 - 23:42

your business, and so you should know, or
you
23:42 - 23:45

should at least conceptualize how all of those
parts
23:45 - 23:47

of your business are gonna fit together.
23:47 - 23:49

How does the data flow between them? What
are
23:49 - 23:53

the service agreements between those, those
compartments? That's all
23:53 - 23:55

important to know. You're gonna need to be
caching
23:55 - 23:59

between services for, for load. You're gonna
need to
23:59 - 24:06

be caching services for, for latency requirements.
So you
24:06 - 24:08

have to serve upstream to some kind of complex
24:08 - 24:11

algorithm. That algorithm is gonna need zero
latency return
24:11 - 24:12

from your service.
24:12 - 24:13

You need to be thinking about all of these
24:13 - 24:17

kinds of things when you're doing service
extraction.
24:17 - 24:20

A.P.: And the way Jason's saying it is, is
24:20 - 24:23

definitely makes it seem like, oh, it's one
slide
24:23 - 24:25

on our deck. But each of those topics could
24:25 - 24:29

be a separate talk. And they are. So, definitely,
24:29 - 24:30

there's a lot to learn in that, in that
24:30 - 24:31

domain.
24:31 - 24:35

J.S.: Right. Just in terms of actual topics
in
24:35 - 24:37

it, another thing you want to think about
is
24:37 - 24:40

messaging. Inter-service messaging, when you're
pulling these services apart,
24:40 - 24:42

they do need to talk to each other. You
24:42 - 24:45

should definitely think about what do those
messages look
24:45 - 24:50

like. What are their delivery SLAs? Do you
guarantee
24:50 - 24:52

that they're delivered? Do you guarantee the
order that
24:52 - 24:55

they're delivered in? What do the payloads
look like?
24:55 - 24:58

Think about all of this stuff.
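[Editor's sketch] The questions about delivery guarantees, ordering, and payloads tend to show up in code as a message envelope plus a defensive consumer. The following sketch assumes at-least-once delivery, so duplicates are possible, and a per-stream sequence number. The field names and the `IdempotentConsumer` class are illustrative and not taken from the talk.

```ruby
require "securerandom"
require "json"

# Envelope: a unique id for de-duplication, a sequence for ordering,
# and the payload itself.
def build_message(type, sequence, payload)
  { "id" => SecureRandom.uuid, "type" => type,
    "sequence" => sequence, "payload" => payload }
end

class IdempotentConsumer
  attr_reader :applied

  def initialize
    @seen_ids = {}
    @last_sequence = 0
    @applied = []
  end

  # Returns :applied, :duplicate, or :stale.
  def handle(json)
    message = JSON.parse(json)
    return :duplicate if @seen_ids[message["id"]]

    @seen_ids[message["id"]] = true
    return :stale if message["sequence"] <= @last_sequence

    @last_sequence = message["sequence"]
    @applied << message["payload"]
    :applied
  end
end
```

Dropping stale messages is only one possible policy. A service that needs every update would buffer and reorder instead, which is exactly the kind of decision these questions are meant to force early.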
24:58 - 25:02

And, you also need to consider your, concern
yourself
25:02 - 25:06

with authentication and authorization. These
are, these are important
25:06 - 25:08

topics. I think like, there was a talk about
25:08 - 25:09

this yesterday-
25:09 - 25:10

A.P.: There were two.
25:10 - 25:12

J.S.: Oh, there were two talks about this
yesterday.
25:12 - 25:14

But you should know what you're, know what
your
25:14 - 25:17

users are doing. Your site's getting bigger.
Your users
25:17 - 25:20

are getting more complicated. Know, know what
they need
25:20 - 25:22

access to. Know how they get into your, how
25:22 - 25:23

they get into your services, how they get
through
25:23 - 25:26

your services. And know what they can do at
25:26 - 25:29

each step of the way.
25:29 - 25:32

A.P.: And you need to create like a supportive,
25:32 - 25:36

supporting environment for services. We were
lucky, we had
25:36 - 25:40

entire teams devoted to building tools, to,
that make
25:40 - 25:43

it easier to spin up services easily. And
a
25:43 - 25:48

release engineering team that made it easier
to re,
25:48 - 25:52

deploy these services. All those became really
easy for
25:52 - 25:55

us, but if, in your company, you need to
25:55 - 25:57

make sure that, or in your application, you
need
25:57 - 25:58

to make sure that you think about these things
25:58 - 26:02

and devote tools and time to making those
things
26:02 - 26:02

simpler.
26:02 - 26:06

Also, now is the time to start considering
UUIDs.
26:06 - 26:10

As soon as you start talking about service-oriented
architecture,
26:10 - 26:15

go to UUIDs from the start. This will immediately
26:15 - 26:18

separate you from your database, and that's
gonna be
26:18 - 26:20

really important, because you're gonna be
moving data from
26:20 - 26:23

one source to another.
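[Editor's sketch] The reason UUIDs help when moving data between stores is that the id is minted by the application, not by one database's auto-increment counter. Records created in different places can therefore be merged without renumbering. `SecureRandom.uuid` is part of Ruby's standard library; the `VoucherRecord` class and `merge_stores` helper are made up for illustration.

```ruby
require "securerandom"

class VoucherRecord
  attr_reader :id, :code

  # The id is generated in application code, before any database is involved.
  def initialize(code:, id: SecureRandom.uuid)
    @id = id
    @code = code
  end
end

# Combine records from several stores keyed by id; fail loudly on a collision.
def merge_stores(*stores)
  stores.reduce({}) do |merged, store|
    merged.merge(store) { |id, _old, _new| raise "id collision on #{id}" }
  end
end
```

With integer primary keys, two databases would both have a row 1 and the merge would need an id-mapping table; with UUIDs the records keep their identity wherever they end up.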
26:23 - 26:26

And, you need to write code good. You know,
26:26 - 26:29

like, it's hard to. I mean, it's easy to
26:29 - 26:31

say, say that, but it's hard to do. Think
26:31 - 26:34

about the SOLID principles. Think about where
things belong.
26:34 - 26:37

Ask yourself, am I coupling these two components
together
26:37 - 26:41

for the fu- and is that useful enough that
26:41 - 26:43

it's gonna cause me a lot of pain later
26:43 - 26:43

in the future?
26:43 - 26:46

J.S.: So when you're writing your code good,
you
26:46 - 26:49

should be thinking about your models. Those
models are
26:49 - 26:52

gonna become your APIs. They're gonna become
your service
26:52 - 26:56

APIs. So consider your public methods. What
are you
26:56 - 26:59

putting in the public space of that model?
Is
26:59 - 27:01

it named well? Does it represent what your
service
27:01 - 27:03

should be doing?
27:03 - 27:06

Make sure that, while you're building up your
cobras,
27:06 - 27:09

that your models are reflective of the way
you
27:09 - 27:12

intend for your service APIs to look like,
should
27:12 - 27:15

you ever need to go down that road.
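
One way to picture the "models become your service APIs" advice is a model whose public methods already read like service operations. This is only a sketch in plain Ruby; the Inventory class and its method names are assumptions made for illustration, not code from the talk.

```ruby
# Hypothetical model for illustration only.
class Inventory
  def initialize(units)
    @units = units # e.g. { "sku-1" => 3 }
  end

  # Public surface: the operations a future service endpoint might expose.
  def available?(sku)
    quantity_for(sku) > 0
  end

  def reserve(sku)
    return false unless available?(sku)

    @units[sku] -= 1
    true
  end

  private

  # Storage detail; callers should not depend on it.
  def quantity_for(sku)
    @units.fetch(sku, 0)
  end
end
```

If this model were later extracted, available? and reserve are the only calls that callers would need to see replaced by remote requests; the private lookup could change freely.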
27:15 - 27:19

A.P.: And, like I said earlier, avoid tangling
those
27:19 - 27:24

components together. Specifically in Rails,
when you introduce associations,
27:24 - 27:26

you're kind of expanding that API that Jason
was
27:26 - 27:30

talking about. All those, now you're creating
ways for
27:30 - 27:34

developers to reach through these models and
get data,
27:34 - 27:36

and that'll couple them together and make
it harder
27:36 - 27:39

for you to separate them.
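
A rough sketch of the association point, in plain Ruby rather than ActiveRecord so it runs on its own. Order, Merchant, and the lookup callable are hypothetical names. The idea shown is that Order keeps only a merchant id and one narrow method, instead of handing callers a whole Merchant object to reach through.

```ruby
# Hypothetical stand-ins for two models. With ActiveRecord, a belongs_to
# association would instead add an `order.merchant` reader to the public API.
Merchant = Struct.new(:id, :name)

class Order
  attr_reader :merchant_id

  # merchant_lookup is any callable that takes an id. It could wrap a
  # database query today and a call to an extracted service later.
  def initialize(merchant_id, merchant_lookup)
    @merchant_id = merchant_id
    @merchant_lookup = merchant_lookup
  end

  # The only merchant data that Order exposes to its callers.
  def merchant_name
    @merchant_lookup.call(@merchant_id).name
  end
end

lookup = ->(id) { Merchant.new(id, "Acme Spa") }
order  = Order.new(7, lookup)
```

Because callers can only ask for merchant_name, swapping the lookup for a remote call touches one place rather than every site that reached through the association.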
27:39 - 27:43

J.S.: Test. Who's here, who here tests? Anyone
test?
27:43 - 27:44

A.P.: Not DHH.
27:44 - 27:47

J.S.: Nope. You don't test anymore. You should
be
27:47 - 27:50

testing. You should be testing at high levels.
Avoid
27:50 - 27:54

the unit tests. If you can avoid the unit
27:54 - 27:57

tests. Especially because once you start doing
service extraction,
27:57 - 28:01

you will break assloads of unit tests.
28:01 - 28:02

Make sure you write your high-level tests
first. Make
28:02 - 28:05

sure you've got solid coverage on those high-level
end
28:05 - 28:10

to end tests. Secondly, as you are doing service
28:10 - 28:12

extraction, it is not trivial to be spinning
up
28:12 - 28:15

other services quickly in order to test end
to
28:15 - 28:18

end, but you should be thinking about how
you
28:18 - 28:20

might be doing that. Because otherwise you're
going to
28:20 - 28:23

be doing a lot of stubbing, and that gets
28:23 - 28:25

very painful and gets error-prone.
28:25 - 28:29

A.P.: I mean, when we talked to the developers
28:29 - 28:30

who had to do some of the tougher service
28:30 - 28:34

extractions, they were like, I wish we had
more
28:34 - 28:36

integration specs. Because we're gonna be
changing a lot
28:36 - 28:38

of this stuff, and we need to know if
28:38 - 28:40

it works. If you've got a good set of
28:40 - 28:43

integrations, integration tests, you can be
a lot more
28:43 - 28:46

confident about making those changes.
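
To illustrate why high-level tests tend to survive an extraction, here is a small Ruby sketch. All class names, the transport callable, and the "/inventory/..." path are assumptions for illustration. The same behavioral check runs against an in-process implementation and against a client that stands in for an extracted service.

```ruby
# In-process implementation, as it might exist inside the monolith.
class LocalInventory
  def initialize(stock)
    @stock = stock
  end

  def in_stock?(sku)
    @stock.fetch(sku, 0) > 0
  end
end

# Stand-in for a client of an extracted service. The transport is any
# callable that takes a path and returns a parsed response hash.
class RemoteInventoryClient
  def initialize(transport)
    @transport = transport
  end

  def in_stock?(sku)
    @transport.call("/inventory/#{sku}")["available"]
  end
end

class Checkout
  def initialize(inventory)
    @inventory = inventory
  end

  def purchase(sku)
    @inventory.in_stock?(sku) ? :confirmed : :sold_out
  end
end

# One high-level check that does not care which backend is plugged in.
def checkout_behaves?(inventory)
  checkout = Checkout.new(inventory)
  checkout.purchase("deal-1") == :confirmed &&
    checkout.purchase("deal-2") == :sold_out
end

local  = LocalInventory.new({ "deal-1" => 2 })
remote = RemoteInventoryClient.new(
  ->(path) { { "available" => path.end_with?("deal-1") } }
)
```

A unit test written directly against LocalInventory would break once that class is removed, while checkout_behaves? keeps passing for either backend. The lambda transport here is still a fake; running the check against a real service instance is the harder problem the speakers mention.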
28:47 - 28:49

Next, over there?
28:49 - 28:49

J.S.: Yup.
28:50 - 28:53

A.P.: Yeah. So, you need to communicate. I
mean,
28:53 - 28:57

everyone always says this, but like, when
you solve
28:57 - 29:00

a problem, when you're spinning up a service,
you're
29:00 - 29:02

gonna, and as more teams are spinning up services,
29:02 - 29:04

a lot of you are gonna be encountering the
29:04 - 29:07

same problems. So when you solve a problem,
share
29:07 - 29:09

it. Make it a gem, write it down, put
29:09 - 29:11

it in a wiki, and tell people about it.
29:11 - 29:15

Give talks. Because it's gonna be hard to,
I
29:15 - 29:18

mean, you don't want people solving the same
problems.
29:18 - 29:22

At Groupon, we have this, Core Architecture
Forum, it's
29:22 - 29:24

called, and basically it's got a bunch of
people
29:24 - 29:27

who meet, and you can say, I'm gonna spin
29:27 - 29:29

up a new service, or I'm gonna solve this
29:29 - 29:32

problem. Have you seen this before? They're
gonna help
29:32 - 29:35

you answer questions like, what's, has someone
else solved
29:35 - 29:38

this already? Is there a similar problem?
Is there
29:38 - 29:40

a particular technology that would help you
solve that
29:40 - 29:44

problem better? All those questions are really
important to
29:44 - 29:49

ask so that you don't reinvent the wheel over
29:49 - 29:51

and over again.
29:51 - 29:55

What else? Oh yeah. One more thing. One more
29:55 - 29:55

thing. That sounds like Steve Jobs. One more
thing.
29:55 - 29:58

We have the interest, we have interest leagues
at
29:58 - 30:02

Groupon, which are just internal user groups
for Clojure,
30:02 - 30:05

Java. We even have one for onboarding. You
know,
30:05 - 30:07

these are really cool. And that's another
way to
30:07 - 30:10

help communicate, like, what's happening.
Once your company gets
30:10 - 30:15

big enough, that's really important.
30:15 - 30:21

J.S.: So. In conclusion, cobras are great.
30:21 - 30:21

A.P.: Yeah. They're awesome.
30:21 - 30:24

J.S.: Rails is great. And cobras do serve
a
30:24 - 30:26

useful purpose.
30:26 - 30:32

A.P.: Oh. But beware. It's not so simple.
30:32 - 30:36

J.S.: Once you decide that you're gonna start
raising
30:36 - 30:41

up a baby cobra, be ready for what comes
30:41 - 30:41

next.
30:41 - 30:47

A.P.: Oh. Yeah. And. OK, so. Got his part.
30:47 - 30:51

We're hiring. I mean, if you want to come
30:51 - 30:54

help us solve some of these problems, come
talk
30:54 - 30:56

to us after the talk. There's a booth downstairs.
30:56 - 31:01

You can go to this website. Tweet at us.
31:01 - 31:04

I'd like that. But yeah. Join us.
31:04 - 31:07

J.S.: And we are standing on other people's
shoulders
31:07 - 31:07

here.
31:07 - 31:07

A.P.: Yeah.
31:07 - 31:10

J.S.: A lot of these folks are people who
31:10 - 31:12

helped with the talk or who helped actually
do
31:12 - 31:15

a lot of this service extraction work. This
does
31:15 - 31:19

not comprise the total list, but we definitely
wanted
31:19 - 31:20

to bring attention to these people.
31:20 - 31:22

A.P.: Yeah, and I mean. People like these
guys,
31:22 - 31:24

they gave us a lot of feedback when we
31:24 - 31:28

did the talk at, at Groupon. And having people
31:28 - 31:31

who will mentor and, like, spend time to help
31:31 - 31:34

you understand things, I mean, that's the
reason I
31:34 - 31:36

work at Groupon.
31:36 - 31:36

J.S.: Thank you all.
31:39 - 31:42

A.P.: [drowned out by applause]

Title:: RailsConf 2014 - Service Extraction at Groupon Scale by Jason Sisk & Abhishek Pillai
Duration:: 32:05
