ABHISHEK PILLAI: Thanks for coming. I know there's some other cool talks right now, but you're here so that's awesome. Let's get started. You're here to learn about how to tame COBRAs. JASON SISK: My name is Jason Sisk. I work at Groupon. I've been here for a couple of years. I work on predominantly Ruby/Rails systems, backend development, et cetera, and I do not like onions. A.P.: My name is Abi, and I'm at, I've been at Groupon for about two years, too. And Jason and I work on a team that does backend service, basically managing inventory. And I don't like fruits. J.S.: So part of what we're gonna tell you today is a little bit of a history lesson about the early pain of Groupon having site outages, et cetera, due to Rails scaling. We want to tell you about the story of the developers that actually handled those problems and some of the decisions that they made. So that's that. But we want to lead off with one important point. A.P.: Boom! Pause. You don't have to pause for that long. And, yeah. J.S.: So. Back, back around 2007, we were doing what all the other cool kids were doing. We were using a Rails monolith, and to some degree still are. Rails 2 is a great framework. Who is using Rails 2? Anyone? AUDIENCE: Yeah! J.S.: All right. A.P.: Awesome. J.S.: You and us. Rails is a great framework. We all love Rails. That's why we're here. We still love Rails and that's why we're here. But what's great about it is that it's great for Agile teams. It's, and for us it was really simple. We could make some really quick decisions. We could iterate product very quickly. We could iterate new features. And we could do it with a small team of five to ten devs. We had a single repository. We had a single test suite. And we had a single deploy process. Very simple. A.P.: And, most importantly, you, we had like one shared, conceptual understanding of the code base. When we wanted to make a change, we knew where to put it. And things were simple that way. 
J.S.: Also what was great was, and still is, about Rails, that integrating components is really easy. The convention over configuration, model associations - all of that business you can put together things very quickly and very easily. But we didn't come here to talk to you about Rails. A.P.: We came here to tell you about cobras, and how to tame them. At Groupon, we actually have a mo- monolith, and we call it the primary web app. But Jason had a thought for the purposes of this talk, we'd come up with a more scientifically accurate name for it. Yeah. So. Centralized Omnipotent Big-ass Rails Application. J.S.: Big-ass. So we want to take you back to 2009 for just a minute. So Groupon was about two years old, give or take, and we were still kind of kicking into gear. People would come into the office in Chicago we've got, open up New Relic, and they'd see stuff like this. A.P.: So as you can see, like, in the middle of the night, it's great. Everything's working really well. Soon as people woke up and started using it - damn people - our performance immediately started to drop. And then eight months later, we had about thirty thousand requests per minute and everything was on fire. J.S.: We blame Oprah. A.P.: As you do. J.S.: It's Oprah's fault. Oprah crashed Groupon. Oprah crashed Groupon not once, but at least twice. And also the Gap crashed Groupon too. Actually, the truth is, Groupon crashed Groupon. We were not scaling properly. Bad. Bad Groupon. The Cobra was getting fatter and fatter. We were up to- A.P.: Yeah. So. We were up to, we started, we had, like, five to fifty devs. We started with about three to five hundred commits per month. Slowly, and in a couple of years, as you can see, we were averaging about two thousand commits in a single month. We had a lot of developers developing a lot of things. J.S.: This is all one cobra. A.P.: And you know, we started thinking about SOA at that point. It was already becoming really painful. 
But we looked at the cobra, directly in the eyes, and it scared the shit out of us. J.S.: We had a lot of scoping problems. And a lot of that had to do with model coupling. So, one of the biggest things that was keeping us from extracting services early was as the, as the code grew, you had a lot of sort of natural convention coupling that was happening in the models. So a little bit of a over-simplified example here. But you have a, let's say you have, you're on the MyGroupon's page. You want to look at all of the Groupons that you've bought. And you want to see all the titles for all of those. So when we go to render the interface we want to display all these deal titles. In the cobra, you might find a set of dependent relationships that are somewhat like this, where you can see the cyclical dependencies. But building these types of associations was fairly common place, which was kind of bad in some ways. So in this case, you would instantiate a user, which would require a database lookup to the Users table, select star, and, and you would map over that, that user's orders to get all of the deal titles. In this, in this case, there is a Demeter violation. Demeter violations are bad. A.P.: And it looks clean. I mean, it looks good. But, what it does is couples our components. J.S.: Here is an example of what I was talking about. You, you have a basically unnecessarily- unnecessary table lookup to Users. Now, if you're designing your applications well, you can avoid this right out of the gate. But Rails conventions don't, don't encourage you to avoid this right out of the gate. And ActiveRecord DSL for, for advanced queries aren't something that people just tend to do by default. Or at least they didn't in 2009. A.P.: Yeah. And, I mean. Things got a lot worse, because our code base and cobra was just getting bigger and bigger. You can see here it's almost two million lines of code at this point. And, oh yeah, we have to stay up 100% of the time. 
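Editor's sketch of the MyGroupons example J.S. describes above. It contrasts reaching through a user's associations with asking an order-side object directly by user id. The slides are not in the transcript, so every class, field, and method name here is a hypothetical stand-in, and plain Ruby structs stand in for the ActiveRecord models.

```ruby
# Hypothetical stand-ins for the models in the talk (not Groupon's code).
Deal = Struct.new(:id, :title)
Order = Struct.new(:id, :user_id, :deal) do
  # Expose the title so callers do not reach through order.deal themselves.
  def deal_title
    deal.title
  end
end
User = Struct.new(:id, :orders)

deals  = [Deal.new(1, "Half-off pizza"), Deal.new(2, "Spa day")]
orders = [
  Order.new(10, 7, deals[0]),
  Order.new(11, 7, deals[1]),
  Order.new(12, 8, deals[0])
]

# Coupled version: load the user first, then walk user -> orders -> deal.
# In a Rails app the first step would be the extra Users table lookup.
user = User.new(7, orders.select { |order| order.user_id == 7 })
coupled_titles = user.orders.map { |order| order.deal.title }

# Decoupled version: the caller already knows the user id, so it asks an
# order-side object directly and never touches User.
class OrderLookup
  def initialize(orders)
    @orders = orders
  end

  def deal_titles_for(user_id)
    @orders.select { |order| order.user_id == user_id }.map(&:deal_title)
  end
end

decoupled_titles = OrderLookup.new(orders).deal_titles_for(7)
```

Both versions return the same titles. The difference is that only the second can move behind a service boundary without dragging the User model along.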
So that's a problem. All right. J.S.: Also, the database is completely on fire. A.P.: So yeah. We were in quite a pickle. It was painful. Testing sucked. I mean, we had to wait like forty-five minutes for a build to run. You basically ran your tests and then figured out something else to do, because you had to wait while your tests ran. And a lot of our release engineers devoted a lot of effort to making those tests run faster. J.S.: Deploys were terrible. The deploy process was somewhere on the scale of three hours to deploy the application. Just a really bad development experience, especially as you start to have teams that split ownership. They want to iterate on features that matter to their team, and they don't want to be held up by this gigantic monolithic application. And, you know, the deploy's only happening once a week. That really hurts a team that maybe wants to do continuous deployment. So, it sucked. A.P.: Yeah. I mean, development pace was increasing, as you saw, and, I mean, what's the best place to put the next line of code, as I heard in a talk earlier? It's the place that you're changing. Models got bloated, and there was a lot of cruft. J.S.: So all of these things were terrible. It was very painful. So, we decided to move towards service extraction a little bit more seriously. If there's a big takeaway from this first section, we just want you to remember that cobras are great. They are great. Until they aren't. A.P.: So we needed to alleviate this pain immediately. We needed to get that code out of there. We needed a quick extraction. So we decided to extract a new service and build it on top of our current schema. We decided to start with the order service because, I mean, it was causing a lot of database contention. We had a lot of people buying a lot of Groupons, and, a good problem to have, but it was bringing our database down.
So we needed to get that code out of it, and another thing behind choosing orders to start is that, you know, it's gonna be a long-lived model in our domain. We know that for sure. So, to illustrate, this is what it looks like in the beginning. And this is what we're trying to accomplish. You have the cobra, and then we're trying to have a separate orders codebase, which will have its own database. But it continues to have read-only access to the cobra's database, because we didn't focus on completely stopping the order service from reaching back into the cobra's database. And, I mean, the cobra was really sneaky. It was really tough to find all the ways that, with Rails callbacks and model associations, the components were coupled. So we built some tools to make that easier. This is one of them. The service wall, as we call it. The main goal here is separating the concerns of orders within the application. So, you start with having your services in a separate directory. Let's take a closer look at it. You have the order service in its own directory, and it has its own app, its own lib, its own specs. The way that works is that in the environment.rb file, we iterated through these services and added them to the load path. So to the application it looks like it's just one big application, but for our purposes, the code was separate. So, this is a small example of how the service wall works. You have this disable_model_access method: you specify the service that you want to disable or deprecate, and it'll figure out the models of that service and add them to this do-not-touch list. And basically raise these kinds of violations.
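Editor's sketch of the service wall idea as A.P. describes it: service directories go on the load path, and a do-not-touch list raises when code outside the service touches one of its models. The real tool is internal to Groupon and its code is not in the transcript, so the module name, the registry constant, and the check! hook are all assumptions made for illustration.

```ruby
# Sketch only: every name below is hypothetical, not Groupon's actual tool.

# Put each service's app/ and lib/ directories on the load path, in the
# spirit of what the talk describes for environment.rb. This is a no-op
# when no services/ directory exists.
Dir.glob(File.join("services", "*", "{app,lib}")).each do |dir|
  $LOAD_PATH.unshift(File.expand_path(dir))
end

module ServiceWall
  class Violation < StandardError; end

  # Assumed registry: which model names belong to which service.
  SERVICE_MODELS = { orders: %w[Order OrderItem] }.freeze

  @do_not_touch = []

  class << self
    # Add every model owned by the service to the do-not-touch list.
    def disable_model_access(service)
      @do_not_touch |= SERVICE_MODELS.fetch(service)
    end

    # Raise if code outside the service touches a walled-off model.
    def check!(caller_name, model_name)
      return unless @do_not_touch.include?(model_name)

      raise Violation, "#{caller_name} may not access #{model_name}"
    end
  end
end

ServiceWall.disable_model_access(:orders)

# A deal reaching for an order now fails loudly instead of silently coupling.
message =
  begin
    ServiceWall.check!("Deal", "Order")
    nil
  rescue ServiceWall::Violation => e
    e.message
  end
```

In a Rails app the check would have to be wired into model access itself rather than called by hand. That wiring is left out here because the talk does not say how it was done.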
So if you use the disable_model_access method, when you run your tests, it will put up this message saying, you don't have access to this method. When a deal is trying to access an order, we can figure that out just by running our tests. If you use the friendlier deprecate_model_access method, then you can be more permissive and it'll just log it to a file. You can see that in development mode or you can have it on staging, and that'll allow you to find all the places where you're having service infractions. You can't do this in production though, because it causes a serious performance hit. Oh yeah. So this is how you actually use the service wall. At the top of your controller, you use the method disable_model_access or deprecate_model_access, depending on what you want to do. You tell it what service, and it even lets you exempt some actions that you don't want to raise violations on yet. That way you can comment out that action and tackle one action at a time, and see which endpoints are actually reaching over and causing the service wall infraction. J.S.: So, in addition to the service wall, one other problem with this extraction approach is that, because you necessarily fork the code, you get a lot of cruft left over from the old domain. So teams find themselves asking, very often, is this endpoint even used? Do we even care about this code anymore? So, a small team of Groupon developers hacked together something called Route 66 that we use internally to track down cruft in both our old cobra and our new cobra. So it basically answers the question, are these endpoints used? I don't know if you can see this very well, but this is a little bit of a UI. A.P.: Yeah. J.S.: But what we do is, we analyze log files, we spelunk logs to come up with which controller actions are being hit, what's the frequency.
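Editor's sketch of the kind of log analysis J.S. describes for Route 66: count how often each controller action appears in a log, then list known actions that never appear. Route 66 itself is an internal Groupon tool, so this is not its implementation. The log line format, the sample log, and the list of known actions are all assumptions.

```ruby
# Sample log text; the "Processing ... Controller#action" shape is assumed.
SAMPLE_LOG = <<~LOG
  Processing by OrdersController#show as HTML
  Completed 200 OK in 12ms
  Processing by OrdersController#show as JSON
  Processing by DealsController#index as HTML
LOG

# Matches "Processing by Foo#bar" and the older "Processing Foo#bar" form.
ACTION_PATTERN = /Processing (?:by )?([\w:]+)#(\w+)/

# Count hits per "Controller#action" across an enumerable of log lines.
def action_hits(lines)
  lines.each_with_object(Hash.new(0)) do |line, hits|
    match = ACTION_PATTERN.match(line)
    hits["#{match[1]}##{match[2]}"] += 1 if match
  end
end

hits = action_hits(SAMPLE_LOG.each_line)

# Hypothetical list of routed actions; in a real app this would come from
# the route table rather than being typed in by hand.
known_actions = %w[
  OrdersController#show
  DealsController#index
  DealsController#legacy_export
]

# Actions that never showed up in the logs are candidates for decrufting.
unused_actions = known_actions.reject { |action| hits.key?(action) }
```

An action that is absent from a short log window is not necessarily dead, which is why the frequency question (once a week, once a month) matters before deleting anything.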
Is this a route that is hit once a week, you know? Once a month? And we can very aggressively decruft using this tool as well. A.P.: All right. So there's definitely pros to this approach. Because you're focusing on just separating the code, you can move quickly and not worry about spinning up a separate database schema, separate naming, all of that. You just worry about separating the code, and that focuses the abstraction. It makes it easier to spin up endpoints. But the cons are, you're still tied to that legacy database. Not such a bad thing if you really need to get it out of there. But, because you're forking this code now, and now it's being hit through endpoints, there is still a lot of cruft in the code base. Because a lot of these endpoints are now not being used. J.S.: So this was the first extraction pattern that we used at Groupon to get out of the original cobra, the original Groupon cobra. But teams sort of own their own tactics, and there are other ways that they're doing it as well. One way that service extraction is also happening is by using greenfield services that use a message bus. Sometimes you just need to keep that legacy API running, because there are a lot of client dependencies on it. There's a lot of dependencies on the structure of the data. But who likes doing greenfield work in here? Raise your hand if you like greenfield work. Right. That should be all of you. Whatever. So, it is possible to do greenfield service extraction, and we're doing this as well. So, again, we have a similar. Whoops. Juggling between PowerPoint and Preview. Similar type of situation. You have this cobra, and then we get to the scenario that we're trying to reach with the greenfield extraction, where, in this case, the red box represents all new code.
There's a gem, a client gem that interact, that runs in the original cobra, that runs in the green cobra. And when this service writes data to its db, a message is sent that the green cobra consumes and sends over to its own data store, thus satisfying all of the legacy API requirements. And then what's notable about this is to keep everything in sync for service cut-overs, rollouts, et cetera, there is a background sync worker that runs, that syncs it one way from the old database to the new database. There are pros and cons to this approach as well. Some of the better parts are that you can get rid of your legacy data quickly, again. Devs like greenfield stuff. You like to design your own systems. You also get to minimize the cut-over risk with your data sync. So you're not splitting the table and you have to have all of these API dependencies written on one hand so that when you break your database you don't have, you don't have failures. So you can phase the, you can phase out your new, your new endpoints, and you can own the timing of when you build out new endpoint features. Again. Some of the, or some of the cons are that, it is not trivial to build synchronization worker, and it is less trivial to build a validation engine for the data to make sure that you don't get it out of sync when you're pulling from the original source. And then there are race conditions involved in this as well. A.P.: So Jason and I work on a team that manages inventory, as I said earlier. One of the, looking a little further down the road, one of the things we needed to do was get, now we needed to get vouchers out of the orders service. Another service extraction. And vouchers are actually the things that customers redeem. So, a simplified example of what a voucher actually like would look like, except that now we have an id, which is stored in our database. We have the price, which is stored in a legacy database, and now, Groupon's grown since orders. 
We now have an international platform codebase that serves many different countries. We have offices in Berlin, London, Chennai, Korea, and many more places. But yeah. Our service's responsibility is to make it seem like none of that matters. Anyone asking for voucher data needs to know about all voucher data. Our services need to be global as well. So, this is what our world looks like. And this is how our service needs to be built on top of that. What helped, in managing these different sources of truth, was this manager accessor pattern in our code base. Specifically, oh. Let me check if I need to- yeah. Specifically, next slide please, this is how it helped our code base. Because in the controller, you could just talk to this manager object, and you'd say, find me this voucher. And the manager, can you jump to that? All right, it's gonna look like a lot of code, but let's go step-by-step. In the manager, that's where all the complexity lies. You have the accessor that accesses local data. You have a separate accessor - and accessors are simple, all they do is persistence and finding data - so with the accessor for the legacy database here, the cobra accessor, you get that price information, and then you have an international accessor that goes, it could be a database call or, in our case, that's an HTTP call across the ocean. And then you bring all that together, wrap it in a model and have it return that back to your controller. Hang on. All right. So, definitely pros and cons to this approach. One of the things was, it's easy to incorporate many different data sources. We call that a facade because it kind of hides all of that. Behind it, the backend is really more complex, but you hide that complexity. The con is that your accessors are bound to the schema changes. So, our cobra accessor still has to know about the legacy schema.
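Editor's sketch of the manager accessor pattern A.P. walks through: the controller asks one manager for a voucher, and the manager combines a local accessor, a cobra (legacy database) accessor for the price, and an international accessor. The slide code is not in the transcript, so all class names, fields, and the fallback order are assumptions. In-memory hashes stand in for the databases and for the HTTP call.

```ruby
# The model handed back to the controller (field names are hypothetical).
Voucher = Struct.new(:id, :status, :price_cents, :source, keyword_init: true)

# Accessors only find (or persist) data; they hold no business logic.
class LocalAccessor
  ROWS = { 42 => { status: "unredeemed" } }.freeze

  def find(id)
    ROWS[id]
  end
end

# Stands in for read access to the legacy cobra database.
class CobraAccessor
  PRICES = { 42 => 2500 }.freeze

  def price_for(id)
    PRICES[id]
  end
end

# Stands in for the HTTP call to the international platform.
class InternationalAccessor
  ROWS = { 99 => { status: "redeemed", price_cents: 1800 } }.freeze

  def find(id)
    ROWS[id]
  end
end

# The manager is the facade: it knows which source owns which piece of data.
class VoucherManager
  def initialize(local:, cobra:, international:)
    @local = local
    @cobra = cobra
    @international = international
  end

  # Assumed lookup order: local first, then the international platform.
  def find(id)
    if (row = @local.find(id))
      Voucher.new(id: id, status: row[:status],
                  price_cents: @cobra.price_for(id), source: :local)
    elsif (row = @international.find(id))
      Voucher.new(id: id, status: row[:status],
                  price_cents: row[:price_cents], source: :international)
    end
  end
end

manager = VoucherManager.new(
  local: LocalAccessor.new,
  cobra: CobraAccessor.new,
  international: InternationalAccessor.new
)

domestic = manager.find(42) # price comes from the cobra accessor
overseas = manager.find(99) # everything comes from the international accessor
```

The controller only ever sees manager.find. As the service takes ownership of more data, accessors can be dropped from the manager without changing that call.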
And making changes there is not trivial. And, sometimes you can use that as a crutch. So if someone asks you, can you give me this piece of data about a voucher, I really need it, and you want to expose it to the endpoints, you're like, well, I do have access to the database or I could just make a call. And now you're serving that data, and you're tied to serving that data in your API. But the important thing there is to be diligent, and as soon as you start serving that, put a strategy together to actually own that data. Otherwise the complexity in the manager, which is both a pro and a con, will always be there. The purpose of the manager is that it hides that complexity, but as you start owning more data, it should become simpler. J.S.: So, these three extraction patterns that we've gone through are just a little bit of what's going on. There are different service extraction patterns going on, both at Groupon and probably in your worlds too. So, again, this is just an example of some of the ways that we've chosen to do things. There are other interesting talks about this this week at RailsConf going on, so it'd be neat to check those out, too, if you want to talk to us about them. But, you should definitely consider letting your teams own their tactics if you're trying to make decisions about doing SOA, because you might find some neat things that you didn't know about. A.P.: Yeah. So I'm gonna stand over here cause I feel like I'm just talking to these guys. But yeah. So, there's definitely a lot of things that we learned from doing these different service extractions. Like Jason said, there are a lot of other service extractions that happened at Groupon and continue to happen today. But, taming a cobra is serious business. I mean, like I always say, YPAGNIRN. You probably ain't gonna need it right now.
But, like, the tipping point at which you need to start going towards service-oriented architecture isn't just black or white. It's more of an art than a science. But as soon as you start talking about service-oriented architecture, once you start feeling the pains, you need to put together a strategy to accomplish that. J.S.: Yeah. You don't want to sit around and wait for Oprah to blow your site up. A.P.: But there's also the importance of allowing your domain to actually evolve. Models that you think are important in the beginning aren't gonna be important later on. And that's the big benefit of a cobra, is that it allows you to iterate quickly. J.S.: Something else that we have also learned is that when you go into service extraction, it's really important that you actually have a strategy. Know what you need to break apart. Know what you need to leave in the monolith. These are important things to consider. Know what the priorities are between those things. It's very tricky to just go about service extraction very scattershot and not really understanding your business model or what benefits you derive from extracting certain pieces over others. You should prefer the things that are clearly like their own thing, their own components, or things that are particular maintenance problems or represent some sort of legacy design or strange behavior. But the other important part of having a strategy is that you should expect the unexpected. Scope creep will bite you, and you know, as these code bases get bigger, pulling out of them becomes a lot more of a tricky process than you might envision. Another thing that's important is that you think about your entire service stack. And you should know your business, and so you should know, or you should at least conceptualize how all of those parts of your business are gonna fit together. How does the data flow between them?
What are the service agreements between those compartments? That's all important to know. You're gonna need to be caching between services for load. You're gonna need to be caching services for latency requirements. So you have to serve upstream to some kind of complex algorithm. That algorithm is gonna need zero latency return from your service. You need to be thinking about all of these kinds of things when you're doing service extraction. A.P.: And the way Jason's saying it definitely makes it seem like, oh, it's one slide on our deck. But each of those topics could be a separate talk. And they are. So, definitely, there's a lot to learn in that domain. J.S.: Right. Just in terms of actual topics in it, another thing you want to think about is messaging. Inter-service messaging: when you're pulling these services apart, they do need to talk to each other. You should definitely think about what those messages look like. What are their delivery SLAs? Do you guarantee that they're delivered? Do you guarantee the order that they're delivered in? What do the payloads look like? Think about all of this stuff. And, you also need to concern yourself with authentication and authorization. These are important topics. I think like, there was a talk about this yesterday- A.P.: There were two. J.S.: Oh, there were two talks about this yesterday. But you should know what your users are doing. Your site's getting bigger. Your users are getting more complicated. Know what they need access to. Know how they get into your services, how they get through your services. And know what they can do at each step of the way. A.P.: And you need to create like a supportive environment for services. We were lucky, we had entire teams devoted to building tools that make it easier to spin up services easily.
And a release engineering team that made it easier to re, deploy these services. All those became really easy for us, but if, in your company, you need to make sure that, or in your application, you need to make sure that you think about these things and devote tools and time to making those things simpler. Also, now is the time to start considering uuids. As soon as you start talking about service-oriented architecture, go to uuids from the start. This will immediately separate you from your database, and that's gonna be really important, because you're gonna be moving data from one source to another. And, you need to write code good. You know, like, it's hard to. I mean, it's easy to say, say that, but it's hard to do. Think about the solid principles. Think about where things belong. Ask yourself, am I coupling these two components together for the fu- and is that useful enough that it's gonna cause me a lot of pain later in the future? J.S.: So when you're writing your code good, you should be thinking about your models. Those models are gonna become your APIs. They're gonna become your service APIs. So consider your public methods. What are you putting in the public space of that model? Is it named well? Does it represent what your service should be doing? Make sure that, while you're building up your cobras, that your models are reflective of the way you intend for your service APIs to look like, should you ever need to go down that road. A.P.: And, like I said earlier, avoid tangling those components together. Specifically in Rails, when you introduce associations, you're kind of expanding that API that Jason was talking about. All those, now you're creating ways for developers to reach through these models and get data, and that'll couple them together and make it harder for you to separate them. J.S.: Test. Who's here, who here tests? Anyone test? A.P.: Not DHH. J.S.: Nope. You don't test anymore. You should be testing. 
You should be testing at high levels. Avoid the unit tests. If you can avoid the unit tests. Especially because once you start doing service extraction, you will break assloads of unit tests. Make sure you write your high-level tests first. Make sure you've got solid coverage on those high-level end to end tests. Secondly, as you are doing service extraction, it is not trivial to be spinning up other services quickly in order to test end to end, but you should be thinking about how you might be doing that. Because otherwise you're going to be doing a lot of stubbing, and that gets very painful and gets error-prone. A.P.: I mean, when we talked to the developers who had to do some of the tougher service extractions, they were like, I wish we had more integration specs. Because we're gonna be changing a lot of this stuff, and we need to know if it works. If you've got a good set of integrations, integration tests, you can be a lot more confident about making those changes. Next, over there? J.S.: Yup. A.P.: Yeah. So, you need to communicate. I mean, everyone always says this, but like, when you solve a problem, when you're spinning up a service, you're gonna, and as more teams are spinning up services, a lot of you are gonna be encountering the same problems. So when you solve a problem, share it. Make it a gem, write it down, put it in a wiki, and tell people about it. Give talks. Because it's gonna be hard to, I mean, you don't want people solving the same problems. At Groupon, we have this, Core Architecture Forum, it's called, and basically it's got a bunch of people who meet, and you can say, I'm gonna spin up a new service, or I'm gonna solve this problem. Have you seen this before? They're gonna help you answer questions like, what's, has someone else solved this already? Is there a similar problem? Is there a particular technology that would help you solve that problem better? 
All those questions are really important to ask so that you don't reinvent the wheel over and over again. What else? Oh yeah. One more thing. One more thing. That sounds like Steve Jobs. One more thing. We have interest leagues at Groupon, which are just internal user groups for Clojure, Java. We even have one for onboarding. You know, these are really cool. And that's another way to help communicate, like, what's happening. Once your company gets big enough, that's really important. J.S.: So. In conclusion, cobras are great. A.P.: Yeah. They're awesome. J.S.: Rails is great. And cobras do serve a useful purpose. A.P.: Oh. But beware. It's not so simple. J.S.: Once you decide that you're gonna start raising up a baby cobra, be ready for what comes next. A.P.: Oh. Yeah. And. OK, so. Got his part. We're hiring. I mean, if you want to come help us solve some of these problems, come talk to us after the talk. There's a booth downstairs. You can go to this website. Tweet at us. I'd like that. But yeah. Join us. J.S.: And we are standing on other people's shoulders here. A.P.: Yeah. J.S.: A lot of these folks are people who helped with the talk or who helped actually do a lot of this service extraction work. This does not comprise the total list, but we definitely wanted to bring attention to these people. A.P.: Yeah, and I mean. People like these guys, they gave us a lot of feedback when we did the talk at Groupon. And having people who will mentor and, like, spend time to help you understand things, I mean, that's the reason I work at Groupon. J.S.: Thank you all. A.P.: [drowned out by applause]