Chad: Yes, hello, thank you.
Audience member: Hello!
Chad: Hello! I am Chad, as he said. He said I need no introduction, so I won't introduce myself any further. I may be the biggest non-Indian fan of India. [Hindi speech] I'll switch back now, sorry. If you don't understand Hindi: I said nothing of value and it was all wrong. I was saying that my Hindi is bad, and it's because I'm now learning German, so I mix them together — but I know not everyone here speaks Hindi. I just had to show off, you know.

So, I currently work at 6Wunderkinder on a product called Wunderlist. It's a productivity application. It runs on every client you can think of: we have native clients, we have a back-end, we have millions of active users. I'm telling you this not so that you'll go download it — you can do that too — but because I want to tell you about the challenges I have and the way I'm starting to think about systems architecture and design. That's what I'm going to talk about today. I'm going to show you some things that are real, that we're really doing, and some things that are just a fantasy and maybe don't make any sense at all. But hopefully I'll get you to think about how we think about system architecture and how we build things that can last for a long time.

The first thing I want to mention: this is a graph from the Standish Chaos Report. I've taken the years out and some of the raw data out because it doesn't matter. Each bar in this graph is a year, and each bar shows successful software projects in green, challenged projects in silver or white in the middle, and failed ones in red. "Challenged" means significantly over time or budget, which to me means failed too. So basically we're terrible — all of us here. We call ourselves engineers, but it's a disgrace: we very rarely actually launch things that work. Kind of sad, and I am here to bring you down.

Then once you launch software — anecdotally, and you've probably seen this in your own work lives too — business software gets killed after about five years. So you barely ever get to launch it, at least successfully, in a way that you're proud of, and then in about five years you end up doing a big rewrite, throwing everything away and replacing it. There's always that project to get rid of the junky old Java code or whatever you wrote five years ago and replace it with Ruby; five years from now you'll be replacing your old junk Ruby code that didn't work with something else.

We create this thing — probably all of you know the term legacy software, right? Am I right? You know what legacy software is, and you probably think of it as a negative thing: that ugly code that doesn't work, that's brittle, that you can't change, that you're all afraid of. But there's also a positive connotation of the word legacy: leaving behind something that future generations can benefit from. Yet if we rarely ever launch successful projects, and the ones we do launch tend to die within five years, then none of us are actually creating a legacy with our work. We're just creating stuff that gets thrown away. Kind of sad. So we create this stuff that's legacy software.
It's hard to change — that's why it ends up getting thrown away. If the software worked and you could keep changing it to meet the needs of the business, you wouldn't need to do a big rewrite and throw it away. We create these huge, tightly-coupled systems, and I don't just mean one application: many applications are all tightly coupled. You've got this thing over here talking to the database of that system over there, so if you change the columns to update the view of a web page, you ruin your billing system. That kind of thing is what makes it so hard to change.

The sad thing is that, the way we work, the way we develop software, this is the default setting. What I mean is: if we were robots churning out software and we had a preferences panel, the default preferences would lead to us creating terrible software that gets thrown away in five years. That's just how we all work as human beings. When we sit down to write code, our default instincts lead us to create systems that are tightly coupled, hard to change, that can't scale, and that ultimately get thrown away.

We try doing tests, we try doing TDD, but we create test suites that take forty-five minutes to run. Every team has had to deal with this, I'm sure, if you've written any kind of meaningful application. It gets to where you have a project to speed up the test suite — you start focusing your company's resources on making the test suite faster, or on making it pass ninety percent of the time, and then you say, well, if it only fails now and then, that's OK, right? And right now it takes forty-five minutes; we want to get it to where it only takes ten minutes to run. So the test suite ends up being a liability instead of a benefit, because of the way you do it — because you have this architecture where everything is so coupled that you can't change anything without spending hours working on the stupid test suite.

And you're terrified to deploy. On the last big Java project I worked on, we deployed once a week, and it would take fifteen people all night to deploy the thing — usually it was copying class files around and restarting servers. It's much better today, but it's still terrifying. You deploy code, you change it in production, and you're not sure what might break, because it's really hard to test these big integrated things together. And actually upgrading a technology component is terrifying. So, how many of you have been doing Rails for more than three years? Do you have, like, a Rails 2 app in production, anyone? Yeah?
That's a lot of people. Wow, that's terrifying. I've been in situations recently where we had Rails 2 apps in production, security patches were coming out, and we were applying our own versions of those security patches because we were afraid to upgrade Rails. We would rather hack it than upgrade the thing, because you just don't know what's going to happen. And then, as you're re-implementing all this stuff yourself, you end up burning yourself out, wasting your time hacking on stupid Rails 2 or some old Struts version when you should just be taking advantage of the new patches. But you can't, because you're afraid to upgrade the software, because you don't know what's going to happen, because the system is too big and too scary.

Then — and this is really bad; I think this is something Ruby messes up for all of us, and I say this as someone who's been happily using Ruby for thirteen years now — we create these mountains of abstractions, and the logic ends up buried inside them. In Java it was statics and factories and design-pattern soup; in Ruby it's modules and mixins. We have all these crazy ways of hiding what's actually happening from ourselves, but when you go look at the code, it's completely opaque. You have no idea where the stuff actually gets done, because it's in some magic library somewhere. And we do all that because we're trying to save ourselves from the complexity of these big, nasty systems.

But if you look at the rest of the world, this is a software-specific problem. These cars are old — older than any software you would ever run — and they're still driving down the street. They're older than software itself, right? But these things still function, they still work. How? Why? Why do they work?

Bodies! My body should not work. I have abused it. I should not be standing here today. I shouldn't have been able to come here from Berlin without dying somehow — by being in the air, you know, by the air pressure changes. But our bodies somehow survive even when we don't take care of them. It's just a system that works. So how do our bodies work?
How do we stay alive despite all this, even though we haven't done some great design? We don't have design patterns mixed into our bodies. In biology there's a term called homeostasis, and I literally don't know what it means beyond this definition, so you won't learn much about it from me — there's probably at least one biologist in the room who can correct me later. But basically, the idea of homeostasis is that an organism has all these different components that serve different purposes and regulate it, so they're all kind of in balance, and they work together to regulate the system. If one component, like a liver, does too much or does the wrong thing, another component kicks in and fixes it. So our bodies are this well-designed system for staying alive, because we have what are almost autonomous agents internally that take care of the many things that can and do go wrong on a regular basis. You have your brain; your liver, which metabolizes toxic substances; your kidneys, which deal with blood, water levels, et cetera. All these things work in concert to keep you alive. The inability to keep doing that is known as homeostatic imbalance: homeostasis is balancing, and not being able to do that means you're out of balance, which leads to really bad health problems, or probably death.

The good news is that you're already dying. We're all dying, all the time. This is the beautiful thing about death. There's an estimate that there are fifty trillion cells in your body and that three million die per second. It's an estimate because it's actually impossible to count, but scientists have somehow figured out that this is probably the right number. You've probably heard this all your life: physically, after some amount of time, you aren't the same human being you were. After some period of time you're literally not the same organism anymore — but you're the same system. Kind of interesting, isn't it?

So in a way you can think about software like this. You can think about software as a system whose components can be replaced, like these cells. If you focus on making death — constant death — OK on a small level, then the system can live on a large level. That's what this talk is about: the solution being to mimic living organisms.

As an aside, I will say the words "small" or "tiny" many times in this talk, because I think I'm learning, as I age, that small is good. Small projects are good — you know how to estimate them. Small commitments are good, because you know you can make them. Small methods are good. Small classes are good. Small applications are good. Small teams are good. So, I don't know, this is sort of a non sequitur. But if we're going to think about software like an organism, what is a cell in that context?
This is sort of the key question you have to ask yourself, and I say that a cell is a tiny component. Now, "tiny" and "component" are both subjective words, so you can do what you want with that, but it's a good frame of thinking. If you make your software a system of tiny components, each one can be like a cell: each one can die, and the system is a collection of those tiny components. What you want is not for your code to live forever. You don't care that each line of code lives forever, right? If you're trying to build a legacy in software, it's not important to you that your System.out.println statement lives for ten years; it's important that the function of the system lives for ten years.

About exactly ten years ago we created RubyGems, at RubyConf 2003 in Austin, Texas. I haven't touched RubyGems myself in four or five years, but people are still using it. They hate it, because it's software, and everybody hates software, right? So if you can create software that people hate, you've succeeded. But it still exists. I have no idea if any of the code is the same — I would assume not. I'm sure my name is still in it in a copyright notice, but that's about it. And that's a beautiful thing: people are still using it to install Ruby libraries and software, and I don't care whether any of my initial code is still in the system, because the system still lives.

Quite a long time ago now, I was researching this question about legacy software and I asked a question on Twitter, as I often do when preparing for conferences: what are some of the old surviving software systems you regularly use? If you look at the answers, one thing is obvious: everyone who answered gave some sort of Unix-related answer. But basically everything on this list is either a system that is a collection of really well-known, split-up components, or a tiny, tiny program. grep is a tiny program that only does one thing well. make is arguably also an operating system, but I won't get into that. Emacs is obviously an operating system, right? But it's well designed out of these tiny little pieces. So a lot of the old systems I know about follow this pattern — this metaphor that I'm proposing.

From my own career: when I was here in Bangalore before, I worked for GE, and some of the people we hired even worked on the system there. We had a system called the Bull — a Honeywell Bull mainframe. I doubt any of you have worked on that, but this one I know you didn't work on, because it had a custom operating system with our own RDBMS; we had created a TCP stack for it using custom hardware that we plugged into a Windows NT computer with some sort of NT queuing system, back in the day. It was this terrifying thing. When I started working there, the system was already something like twenty-five years old, and even though there have been many, many projects to try to kill it — we had a team called the Bull exit team — I believe the system is still in production. Not as much as it used to be; there are fewer and fewer functions in production, but I believe it's still there. The reason is that the system was actually made up of these tiny little components with really clear interfaces between them, and we kept it alive because every time we tried to replace it with some fancy new web thing or GUI app, the replacement wasn't as good, and the users hated it. It just didn't work. So we had to use this old, crazy,
modified mainframe for a long time as a result.

So the question I ask myself now is: how do I approach a problem like this and build a system that can survive for a long time? How many of you know of Fred George? This is Fred George. He was at ThoughtWorks for a while — I think he lived in Bangalore for some time with ThoughtWorks, in fact — and he's now running a start-up in Silicon Valley. He has a talk you can watch online, from the Barcelona Ruby Conference the year before last, called Microservice Architectures, where he talks in great detail about how he implemented a concept at Forward that's very much like what I'm talking about: tiny components that only do one thing and can be thrown away. Microservice architecture is the core of what I'm going to talk about now.

I've put together some rules for 6Wunderkinder — the company I work for, where we're working on Wunderlist — which I'm going to share with you. The goals of these rules are to reduce coupling; to make it possible to do fear-free deployments; to reduce the chance of cruft in our code — the nasty stuff you're afraid of and leave lying there, broken-windows kinds of problems; to make it literally trivial to change code, so you never have to ask "how do I do that?", you just find it easy; and most importantly, to give ourselves the freedom to go fast. I think no developer ever wants to be slow — that's one of the worst things, toiling away and not actually accomplishing anything — but we go slow because we're constrained by the system, and sometimes by projects and other management-related things, but most often by the mess of the system that we've created.

So, some of the rules. One rule that is less controversial than it used to be — though maybe I'm going to get some push-back from this crowd — is that comments are a design smell. Does anyone strongly disagree with that? No? Does anyone strongly agree with that? OK, so the rest of you have no idea what I'm talking about. Let me quickly define "design smell": a design smell is something you see in your code or your system that doesn't necessarily mean something is bad, but you look at it and think, hmm, I should look into this a little bit and ask myself, why are there so many comments in this code? Especially the bottom kind — inline comments? Those are definitely bad, definitely a sign that you should have another method, right?
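To make that concrete, here is a small made-up before/after — the order, paid?, total, and payment_gateway names are invented for illustration, not code from the talk — showing how an inline comment usually points at a method that wants to exist:

```ruby
# Hypothetical example: the names below are stand-ins, not real Wunderlist code.

# Before: the inline comment does the explaining.
def charge_before(order)
  # skip orders that were already paid or are free
  return if order.paid? || order.total.zero?
  payment_gateway.charge(order.total)
end

# After: the condition is extracted into a method whose name says the same
# thing, so the comment becomes unnecessary.
def charge_after(order)
  return unless chargeable?(order)
  payment_gateway.charge(order.total)
end

def chargeable?(order)
  !order.paid? && !order.total.zero?
end
```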
So it's pretty easy to convince people that comments are a design smell, and I think a lot of people in the industry are starting to agree — maybe not for a public library, where you really do need to tell someone "here's how you use this class and here's what it's for" — but you shouldn't have to document every method and every argument, because the method name and the argument names should speak for themselves, right?

So here's one you probably won't agree with: tests are a design smell. This one is probably a little more controversial, especially in an environment where you're maybe still struggling with people to get them to write tests in the first place. I went through a period around 2000 and 2001 when I was really heavily into evangelizing TDD, and it was really stressful that you couldn't get anyone to do it. I think you do have to go through that period, and I'm not saying you shouldn't write any tests. But that picture I showed you earlier of the slow, brittle test suite — that's bad, right? That's a bad state to be in, and you're in that state because your tests suck. Your tests suck because you're writing bad tests that don't exercise the right things in your system. What I've found is that whenever I look into one of these big, slow, brittle test suites, the tests themselves — and the sheer proliferation of tests — are indications that the system is bad, and that the developers are desperately, fearfully trying to run the code in every way they can, because it's the only way they can manage to even think about the complexity. But if you had a tiny, trivial system, you would never need hundreds of test files that take ten minutes to run. If you did, you'd be doing something stupid; you'd be wasting your time working on tests. And we as software developers obsess about this kind of thing — because we had to fight so hard to get our peers to do it in the first place and to understand it — to the point where we focus on the wrong thing. None of us are in the business of writing tests for customers. We're not launching our tests on the web and hoping people will buy them, right? Tests don't provide value by themselves; they're a side effect that we have focused on too heavily, and we've lost sight of what the actual goal is.

So, this one actually requires a visual. I tell the people on my team now: you can write code in any language you want, any framework you want, anything you want to do, as long as the code is this big. If you want to write the new service in Haskell and it's this big in a normal-size font, you can do it. If you want to do it in Clojure or Elixir or Scala or Ruby or whatever you want — even Python, for god's sake — you can do it, if it's this big and no bigger.
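For a rough sense of that scale, here is a hedged, hypothetical sketch of a complete, stand-alone "component this big": a made-up counter service in plain Ruby, using WEBrick from the standard library (a separate gem on Ruby 3+), speaking JSON over a single HTTP port. The service and its routes are invented for illustration — this is not Wunderlist's actual code.

```ruby
# A hypothetical "tiny component": one stand-alone process, one job, one port.
require 'webrick'
require 'json'

store = Hash.new(0) # in-memory counts, keyed by user id

server = WEBrick::HTTPServer.new(Port: ENV.fetch('PORT', 8080).to_i)

# POST /counts/:user_id -> increment and return the new count
# GET  /counts/:user_id -> return the current count
server.mount_proc '/counts' do |req, res|
  user_id = req.path.split('/').last
  store[user_id] += 1 if req.request_method == 'POST'
  res['Content-Type'] = 'application/json'
  res.body = JSON.generate(user_id: user_id, count: store[user_id])
end

trap('INT') { server.shutdown }
server.start
```

The whole thing fits on a slide, which is the point: if it turns out to be wrong, you rewrite it rather than refactor it.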
Why this big and no bigger? Because it means I can look at it and understand it — or, if I don't, I'll just throw it away, because if it's this big it doesn't do very much, so the risk is really low. And I really do mean that the component is that big. In my world a component means a service that's running and probably listening on an HTTP port, or speaking some sort of Thrift or RPC protocol. It's a standalone thing, its own application, probably in its own git repository; people do pull requests against it, but it's just tiny — this big. At the top of the slide, by the way, is some code by Konstantin Haase, who also lives in Berlin, where I live. It's a rewrite of Sinatra, the web framework — Konstantin is actually the maintainer of Sinatra. It's not fully compatible, but it's amazingly close, and it all fits right in that space. The font size is kind of small, though, so I cheated.

Another rule: our systems are heterogeneous by default. I say you can write in any language you want, and that's not just because I want the developers to be excited — although I think most of you, if you worked in an environment where your boss told you you could use any programming language or tool you wanted, would be pretty happy about that, right? Anyone unhappy about that? I don't think so — unless it's one of the bosses here going "don't tell people that." So that's one thing. The other is that it leads to good system design, because think about it: if I write one component in Erlang and one component in Ruby, I have to work really, really hard to create tight coupling between those things. I'd basically have to use computer science to do it. I don't even know what I would do — maybe implement Ruby in Erlang so it could run in the same VM, or vice versa. It's just silly; I wouldn't do it. So if my system is heterogeneous by default, my coupling is very low by default, at least at a certain level, because the path of least resistance is to keep the system decoupled. It's easier to keep things decoupled than coupled if they're all running in different languages. In the past three months I have written production code in Objective-C, Ruby, Scala, Clojure, Node, Java — I don't know, more stuff — all these different languages, real code for work, and yes, they are not tightly coupled. I haven't installed JRuby so that I could reach into the internals of my Scala code, because that would be a pain; I don't want to do that.

Another very important rule: server nodes are disposable. Back when I was at GE, for example, I remember being really proud when I looked at the uptime of one of my servers and it was something like four hundred days. Wow, this is awesome: I have this big server, it has all these apps on it, and we've kept it running for four hundred days. The problem is that I was afraid to ever touch it. I was really happy it was alive, but I didn't want to do anything to it. I was afraid to update the operating system — in fact, you could not upgrade Solaris then without restarting the machine, so that meant I had not upgraded the operating system. I probably shouldn't have been so proud of it. Nodes that are alive for a long time lead to fear, and what I want is less fear, so I throw them away. This doesn't mean I have physical servers that I throw away — that would be fun, but I'm not that rich yet. We use AWS right now; you could do it with any kind of cloud service, or even an internal cloud provider. But every node is disposable.
We never upgrade software on an existing server. Whenever you want to deploy a new version of a service, you create new servers, you deploy that version to them, and then you replace the old ones in the load balancer. That's it. You never have to wonder what's on a server, because it was deployed through an automated process, and there's no fear there: you know exactly what it is and exactly how to recreate it, because you have a golden master image — in our case an Amazon machine image that you can just boot more of. Scaling is a problem? You just boot ten more servers. Boom, done, no problem.

So yeah, I tell the team: pick your technology, but everything must be automated — that's another piece. If you're going to deploy a Clojure service for the first time, you have to be responsible for figuring out how it fits into our deployment system, so that you have immutable deployments and disposable nodes. If you can do that, and you're willing to maintain it and teach someone else about the little piece of code you wrote, then cool — you can do it, at any level you want.

And then, once you deploy stuff — a lot of us like to just SSH into the machines, twiddle with things, replace files, try fixing bugs live on production. Why not just throw away the keys? You're going to throw away the system eventually anyway. You don't even need root access to it; you don't need to be able to get to it except through the port your service is listening on. You can't screw it up, you can't introduce entropy and mess things up, if you throw away the keys. This is actually a practice you can follow: deploy the servers, remove all the credentials for logging in, and the only option you have left is to destroy them when you're done with them.

Provisioning new services in our world must also be trivial. We have actually now thrown away our Chef repository, because Chef is obsolete, and we have replaced it with shell scripts. That sounds like I'm an idiot, I know, and when I say Chef is obsolete I don't really mean it — I like to say it to make people think — because a lot of you are probably thinking "we should move to Chef." Chef is great when you have a bunch of servers that run for a long time and you need to keep them up to date; it's really good at that. Chef is also good at booting a new server, but it's really overkill for that. If you're always throwing stuff away, I don't think you need Chef. Do something really, really simple, and that's what we've done. Whenever we deploy a new type of service — I set up ZooKeeper recently, which is a complete change from the other stuff we're deploying — it was something like a five-line shell script. I added it to a git repo, ran a command, and I had a cluster of ZooKeeper servers running.

You want to always be deploying your software. This is something I learned from Kent Beck early on in the agile, extreme-programming world: if something is hard, or you perceive it to be hard, and you have to do that thing all the time, the best thing you can do is to just do it constantly, non-stop, all the time. In our old world, where deploying took all night once a week, if we had instituted a policy on that team that said any change that goes to master must be deployed within five minutes, I guarantee you we would have fixed that process, right? And if you're deploying constantly, all day every day, you're never going to be afraid of deployments, because each one is always a small change.
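As a hedged illustration of the shape of one of these immutable, throw-away deploys — not 6Wunderkinder's actual tooling, and with every cloud call stubbed out rather than using any real AWS SDK method — it might look roughly like this:

```ruby
# Shape of an immutable deploy: boot fresh nodes from a golden image, check
# they're healthy, swap them into the load balancer, throw the old ones away.
# Every method below is a stub standing in for a cloud provider API call.

def boot_from_image(image_id, count)
  Array.new(count) { |i| "#{image_id}-node-#{i}" } # pretend instance ids
end

def healthy?(node)
  true # stub: really you'd hit the service's health-check endpoint
end

def swap_into_load_balancer(lb, new_nodes, old_nodes)
  puts "attach #{new_nodes.join(', ')} to #{lb}; detach #{old_nodes.join(', ')}"
end

def terminate(nodes)
  puts "terminate #{nodes.join(', ')}" # no SSH keys were ever handed out
end

def immutable_deploy(image_id, lb:, old_nodes:, count: 3)
  new_nodes = boot_from_image(image_id, count)
  raise 'new nodes never became healthy' unless new_nodes.all? { |n| healthy?(n) }

  swap_into_load_balancer(lb, new_nodes, old_nodes)
  terminate(old_nodes) # the old version is simply gone; there is nothing to patch
  new_nodes
end

immutable_deploy('ami-golden-master', lb: 'api-lb', old_nodes: %w[api-old-1 api-old-2])
```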
So always be deploying. Every new deploy means you're throwing away old servers and replacing them with new ones. In our world, I'd say the average uptime of one of our servers is probably something like seventeen hours — and that's only because we don't tend to work much on the weekend.

When you have these sorts of distributed systems and you're trying to reduce the fear of change, the big thing you're afraid of is failure: you're afraid the service is going to fail, the system is going to go down, one component won't be reachable, that sort of thing. So you just have to assume that that's going to happen. You are not going to build a system that never fails, ever — I hope you don't try, because you will have wasted much of your life trying to make that happen. Instead, assume the components are going to fail and build resiliency in. I have a picture here of Joe Armstrong, one of the inventors of Erlang. If you have not studied the Erlang philosophy around failure and recovery, you should, and it won't take you long, so I'm just going to leave that as homework for you.

And then — I said tests are a design smell. I don't mean don't write any tests, but I also want to be responsible here and say: you should monitor everything. You want to favor measurement over testing. I use measurement as a surrogate for testing, or as an enhancement. The reason I say this is that you can focus on one of two things. I said assume failure, right? Mean time between failures and mean time to resolution are two metrics in the ops world that people talk about for measuring their success and effectiveness. Mean time between failures means you're trying to increase the time between failures of the system — basically, trying to make failures never happen. Mean time to resolution means that when failures happen, you focus on bringing things back as fast as you possibly can. A perfect example: a system fails, and another one is already up and just takes over its work — mean time to resolution is essentially zero, right? If you're always assuming that every component can and will fail, then your mean time to resolution is going to be really good, because you've baked it into the process, and then you don't care so much when things fail.

Back to this idea of favoring measurement over testing: if you're monitoring everything — everything, with intelligence — then you're actually focusing on mean time to resolution and acknowledging that the software is going to be broken sometimes. And when I say monitor everything, I mean everything. I don't just mean your disk space and your memory and that kind of thing; I'm talking about business metrics. At LivingSocial we created this thing called Rearview, which is now open source, which lets you do aberration detection — aberration meaning strange behavior, a strange change in behavior. Rearview can do aberration detection on arbitrary data sets, which means that in the LivingSocial world, where user sign-ups were constantly streaming in — it was a very high-volume site — if user sign-ups looked weird, we would get an alert.
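Rearview itself works against real metric streams; as a hedged, toy illustration of the general idea — not Rearview's actual implementation — aberration detection on a business metric can be as simple as flagging a value that drifts too far from its recent history:

```ruby
# Toy aberration detection: alert when the latest value of a business metric
# (e.g. sign-ups per minute) falls outside a band around its recent history.
def aberrant?(history, latest, sigmas: 3.0)
  mean = history.sum.to_f / history.size
  variance = history.sum { |x| (x - mean)**2 } / history.size
  stddev = Math.sqrt(variance)
  band = sigmas * [stddev, 1.0].max # floor avoids a zero-width band on flat data
  (latest - mean).abs > band
end

signups_per_minute = [118, 122, 125, 119, 121, 124, 120, 123]

puts aberrant?(signups_per_minute, 121) # => false, business as usual
puts aberrant?(signups_per_minute, 12)  # => true, page somebody
```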
Why might sign-ups be weird? One thing could be that the user service is down, right? Then we would get two alerts — user sign-ups have dropped, and so has the service — so obviously the problem is that the service is down; let's bring it back up. But it could also be something like a front-end developer or a designer making a change that was intentional, but it just didn't work and no one liked it, so people stopped signing up to the site. That's more important than just knowing that the service is down, because what you care about isn't whether the service is up or down. If you could crash the entire system and still be making money, you wouldn't care, right? That's better — throw it away and stop paying for the servers. But if your system is up 100% of the time and performs excellently and no one's using it, that's bad. So monitoring business metrics gives you a lot more than unit tests could ever give you.

And then in our world we focused on experiencing — no, you have to come up to the front and say "ten!" OK, ten minutes left. When I got to 6Wunderkinder in Berlin, everyone was terrified to touch the system, because they had created a really well-designed but traditional monolithic API. There were layers of abstractions, it was all kind of in one big thing, they had a huge database, and they were really, really scared to do anything. There was basically one person who would deploy anything, and everyone else was trying to work on other projects and not touch it — but it was the production system, you know, so that wasn't really an option. The first thing I did in my first week was get these graphs going — response time, that kind of thing — and then I started turning off servers and just watching the graphs. And as I was turning off the servers, I went to the production database and ran SELECT COUNT(*) FROM tasks — we're a task-management app, so we have hundreds of millions of tasks — and the whole thing crashed, and all the people were like, "AAAAH, what's going on?" And I said, it's no problem, I did this on purpose, I'll just make it come back — which I did.

From that point on, really every day, I would do something that would basically crash the system for just a moment. We had way too many servers in production — we were spending tens of thousands more euros per month than we should have on infrastructure — and I just started taking things away. And I would usually do it not in the responsible way, one server at a time; I would remove all of them and start adding them back. So for a moment everything was down, but after that we got to a point where everyone on the team was absolutely comfortable with the worst-case scenario of the system being completely down, so that we could, in a panic-free way, just focus on bringing it back up when things went bad. So now, when you do a deployment, you have your business metrics being measured, you know the important stuff is happening, and you know what to do when everything is down, because you've experienced the worst thing that can happen. Well, the worst thing is someone breaking in and stealing all your stuff — stealing all your users' phone numbers and posting them online, like Snapchat or something — but you've experienced all these potentially horrible things and realized, eh, it's not so bad, I can deal with this, I know what to do. It allows you to start making bold moves, and that's what we all want, right? We all want to be able to bravely go into our systems and do anything we think is right. So that's what I've been focusing on.
We also do something called canary-in-the-coal-mine deployments, which likewise removes fear. "Canary in the coal mine" refers to a kind of sad practice among coal miners in the US: they would send canaries into the mines at various levels, and if the canary died, they knew there was a problem with the air. In the software world, what this means is that you have a bunch of servers — or a bunch of clients — running a certain version, and you introduce the new version incrementally while watching the effects. Once you're measuring and monitoring everything, you can do these canary deployments where you say: OK, I have a new version of this service to deploy, and I've got thirty servers running it, but I'm going to change only five of them now and see — does my error rate increase, does my performance drop on those servers, do people actually fail to complete the task they're trying to do on those servers? The combination of monitoring everything and these immutable deployments gives us the ability to gradually effect change and not be afraid. We roll out changes all day, every day, because we don't fear that we're going to destroy the entire system all at once.

So, I think I have about five minutes left. These are some things we're not necessarily doing yet, but they're ideas I have that, given some free time, I will work on — and they're probably more exciting. One: I talked about homeostatic regulation and homeostasis. I think we all understand the idea now — systems have different parts that play different roles and can protect each other from each other. This diagram is actually just some random diagram I copied and pasted off the AWS website, so it's not all that meaningful except to show that every architecture — especially a server-based architecture — is a collection of services that play different roles. It almost looks like a person: you've got a brain and a heart and a liver and all these things, right? What would it mean to actually implement homeostatic regulation in a web service, so that you have some controlling system where, for example, the database will actually kill an app server that is hurting it — just kill it? I don't know yet; I don't know what that is. But here are some ideas around this stuff.

I don't know if you've heard of these. Netflix — do you have Netflix in India yet? Probably not, unless you have a VPN, right? Netflix has a really great cloud-based architecture, and they've created this thing called Chaos Monkey, which goes through their system and randomly destroys nodes — just crashes servers. They did this because they were early users of AWS, and when they initially went live on it, servers were crashing; it was still immature. So they said: OK, we still want to use this, and we'll build in enough resilience to deal with the crashes — but we have to know it's going to work when things crash, so let's make crashing part of production. They've gotten really sophisticated now, and they will crash entire regions — they're in multiple data centers — so they'll ask: what would happen if this data center went down? Does the site still stay up? And they do this in production all the time; they're crashing servers right now. It's really neat.
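Chaos Monkey is Netflix's own tool; the gist of the exercise, sketched here with made-up node names and a stubbed terminate call, is just to make killing a random node a routine, survivable event:

```ruby
# The gist of a Chaos Monkey-style exercise: routinely kill a random node so
# the system's recovery path gets used every day. The node list and the
# terminate! call are stand-ins, not Netflix's actual tool.
NODES = %w[api-1 api-2 sync-1 sync-2 reminders-1].freeze

def terminate!(node)
  puts "terminating #{node}" # stub: a cloud provider API call would go here
end

def unleash_the_monkey(every: 3600)
  loop do
    victim = NODES.sample
    terminate!(victim)       # if this causes an outage, that is the lesson
    sleep every
  end
end

# unleash_the_monkey(every: 3600) # run only where a dead node is survivable
```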
Another company that's inspirational in this way is Pinterest. They use AWS as well, and AWS has this thing called Spot Instances. I won't go into too much detail because I don't have time, but Spot Instances let you effectively bid on servers at a price you're willing to pay. If a regular server costs $0.20 per hour, you can say, "I'll give $0.15," and when excess capacity comes open — it's almost like a stock market — if $0.15 is the going price, you get a server, it starts up, and it runs what you want. But here's the thing: if the market moves and the price goes higher than you're willing to pay, Amazon just turns those servers off. They're just dead — no warning, just dead. Pinterest uses Spot Instances for their production servers, which means they save a lot of money; they pay way under the average Amazon cost for hosting. But the really cool thing, in my opinion, is not the money they save but the question it forces: what would you have to do to build a full system where any node can and will die at any moment, and it's not even under your control? That's really exciting.

A simple thing you can do for homeostasis, though, is to just adjust. In our world we have multiple nodes and all these little services, we can scale each one independently, and we're measuring everything. Amazon has a thing called Auto Scaling; we don't use it, we do our own scaling, based on volume and performance. Now, when you have a bunch of services like this — I don't know, maybe we have fifty different services now, each playing a tiny little role — it becomes difficult to figure out where things are. So we've started implementing ZooKeeper for service resolution, which means a service can come online and say, "I'm the reminder service, version 2.3," register with the central coordination service, and traffic can then be routed to it. That's probably too detailed for now, so I'm going to skip over some stuff real quick.

But I want to talk about this one. If — did the Nordic Ruby... no, Nordic Ruby talks never go online, so you can never see this talk, sorry. At Nordic Ruby, Reginald Braithwaite did a really cool talk on the challenges of the Ruby language, and he made this statement: Ruby has beautiful but static coupling. Which sounds strange, but basically he was making the same point I was talking about earlier — Ruby gives you a bunch of beautiful ways to couple your system together that kind of screw you in the end. Ruby can really lead to some deep, crazy coupling. So he presented this idea of bind-by-contract. Bind-by-contract, in a Ruby sense, would mean: I have a class with a method that takes these parameters under these conditions, and I can put it into my VM, and whenever someone needs functionality like that, it gets automatically bound by the fact that it can do that thing — instead of how we tend to use Ruby and Java and other languages, where I have a class with a method name and I'm going to call it. That's coupling. He proposed a decoupled system where you just say: I need a functionality like this, that works under the conditions I have present. That led me to an idea — and this may be way too weird, I don't know: what if, in your web application, your routes file for your services read like a functional pattern-matching syntax — like in Erlang or Haskell or Scala, any of these languages that have functional pattern matching — so that you could route to different services, across a bunch of different services, based on contract?
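One hedged way to imagine that in Ruby — with all service names and request shapes invented for illustration, not anything that exists — is a routes table made of contracts (predicates) rather than fixed names, with each request dispatched to whichever backend's contract it satisfies:

```ruby
# A toy "routes file" that binds by contract instead of by name: each backend
# declares a predicate describing the requests it can handle, and the router
# dispatches to the first contract a request satisfies. Everything here is
# made up for illustration.
ROUTES = [
  { contract: ->(req) { req[:path].start_with?('/reminders') && req[:version] >= 2 },
    backend:  'reminders-v2' },
  { contract: ->(req) { req[:path].start_with?('/reminders') },
    backend:  'reminders-v1' },
  { contract: ->(req) { req[:method] == 'GET' && req[:path].start_with?('/tasks') },
    backend:  'task-read-service' },
]

def route(request)
  match = ROUTES.find { |r| r[:contract].call(request) }
  match ? match[:backend] : 'default-backend'
end

puts route(path: '/reminders/42', version: 2, method: 'GET') # => reminders-v2
puts route(path: '/tasks/7',      version: 1, method: 'GET') # => task-read-service
```

The interesting property is that the router never names a class or a method on the other side; it only states the conditions under which a backend applies.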
Now I have zero time left, but I'm just going to keep talking, 'cause I'm mean — oh wait, I'm not allowed to be mean because of the code of conduct, so I'll wrap up. This is an idea I've started working on as well: I would write an Erlang service that does this sort of functional pattern matching, but have it route, in really fast real time, to the back-end services that support it.

One more thing I want to show you real quick that I'm working on — and I'm showing you because I want you to help me: has anyone used JSON Schema? OK, you people are my friends for the rest of the conference. In a system where you have all these things talking to each other, you do need a way to validate the inputs and outputs. But I don't want to generate code that parses and creates JSON, and I don't want something that intercepts my traffic in real time. So there's this thing called JSON Schema that allows you, in a completely decoupled way, to specify JSON documents and how they should interact. I'm working on a new thing called Klagen, which is the German word for "complain." It's written in Scala, so if anyone wants to pair up on some Scala stuff — what it will be is a high-performance, asynchronous JSON Schema validation middleware. If that's interesting to anyone, even if you don't know Scala or JSON Schema, please let me know.

And I believe I'm out of time, so I'm just going to end there — am I right? I'm right, yes. So thank you very much, and let's talk during the conference.