
Garden City Ruby 2014 - Keynote by Chad Fowler

  • 0:25 - 0:27
    Chad: Yes, hello, thank you.
  • 0:27 - 0:29
    Audience member: Hello!
  • 0:29 - 0:31
    Chad: Hello!
  • 0:31 - 0:34
    I am Chad, as he said.
  • 0:34 - 0:35
    He said I need no introduction
  • 0:35 - 0:38
    so I won't introduce myself any further.
  • 0:38 - 0:45
    I may be the biggest non-Indian fan of India
  • 1:02 - 1:06
    [Hindi speech]
  • 1:06 - 1:13
  • 1:15 - 1:18
    I'll now switch back, sorry.
  • 1:18 - 1:20
    If you don't understand Hindi, I said nothing
    of value
  • 1:20 - 1:22
    and it was all wrong.
  • 1:22 - 1:24
    But I was saying that my Hindi is bad
  • 1:24 - 1:26
    and it's because now I'm learning German
  • 1:26 - 1:28
    so I mixed them together, but I know not everyone
  • 1:28 - 1:29
    speaks Hindi here.
  • 1:29 - 1:32
    I just had to show off, you know
  • 1:32 - 1:37
    So, I am currently working at 6Wunderkinder,
  • 1:37 - 1:40
    and I'm working on a product called Wunderlist.
  • 1:40 - 1:42
    It is a productivity application.
  • 1:42 - 1:46
    It runs on every client you can think of.
  • 1:46 - 1:48
    We have native clients, we have a back-end,
  • 1:48 - 1:50
    we have millions of active users,
  • 1:50 - 1:52
    and I'm telling you this not so that you'll
    go download it -
  • 1:52 - 1:53
    you can do that too -
  • 1:53 - 1:57
    but I want to tell you about the challenges
    that I have
  • 1:57 - 2:01
    and the way I'm starting to think about systems
    architecture and design.
  • 2:01 - 2:03
    That's what I'm gonna talk about today
  • 2:03 - 2:06
    I'm going to show you some things that are
    real
  • 2:06 - 2:07
    and that we're really doing.
  • 2:07 - 2:09
    I'm going to show you some things that are
  • 2:09 - 2:13
    just a fantasy that maybe don't make any sense
    at all.
  • 2:13 - 2:14
    But hopefully I'll get you to think about
  • 2:14 - 2:16
    how we think about system architecture
  • 2:16 - 2:18
    and how we build things that can last for
    a long time.
  • 2:18 - 2:21
    So the first thing that I want to mention:
  • 2:21 - 2:23
    this is a graph from the Standish Chaos report
  • 2:23 - 2:25
    and I've taken the years out
  • 2:25 - 2:27
    and I've taken some of the raw data out
  • 2:27 - 2:29
    because it doesn't matter.
  • 2:29 - 2:31
    If you look at these, this graph,
  • 2:31 - 2:33
    each one of these bars is a year,
  • 2:33 - 2:38
    and each bar represents successful projects
    in green -
  • 2:38 - 2:40
    software projects.
  • 2:40 - 2:42
    Challenged projects are in silver or white
    in the middle
  • 2:42 - 2:44
    and then failed ones are in red.
  • 2:44 - 2:47
    But challenged means significantly over time
    or budget
  • 2:47 - 2:49
    which to me means failed too.
  • 2:49 - 2:51
    So basically we're terrible,
  • 2:51 - 2:54
    all of us here, we're terrible.
  • 2:54 - 2:57
    We call ourselves engineers but it's a disgrace.
  • 2:57 - 3:01
    We very rarely actually launch things that
    work.
  • 3:01 - 3:01
    Kind of sad,
  • 3:01 - 3:04
    and I am here to bring you down.
  • 3:04 - 3:07
    Then once you launch software, anecdotally,
  • 3:07 - 3:12
    and you probably would see this in your own
    work lives, too,
  • 3:12 - 3:16
    anecdotally, software gets killed after about
    five years -
  • 3:16 - 3:18
    business software.
  • 3:18 - 3:20
    So you barely ever get to launch it, because,
  • 3:20 - 3:23
    or at least successfully, in a way that you're
    proud of,
  • 3:23 - 3:25
    and then in about five years
  • 3:25 - 3:28
    you end up in that situation where you're
    doing a big rewrite
  • 3:28 - 3:30
    and throwing everything away and replacing
    it.
  • 3:30 - 3:33
    You know there's always that project to get
    rid of the junk,
  • 3:33 - 3:36
    old Java code or whatever that you wrote five
    years ago,
  • 3:36 - 3:37
    replace it with Ruby now,
  • 3:37 - 3:40
    five years from now you'll be replacing your
    old junk Ruby code
  • 3:40 - 3:46
    that didn't work with something else.
  • 3:46 - 3:49
    We create this thing, probably all of you
    know the term legacy software -
  • 3:49 - 3:53
    Right, am I right? You know what legacy software
    is,
  • 3:53 - 3:56
    and you probably think of it as a negative
    thing.
  • 3:56 - 3:58
    You think of it as that ugly code that doesn't
    work,
  • 3:58 - 4:03
    that's brittle, that you can't change, that
    you're all afraid of.
  • 4:03 - 4:07
    But there's actually also a positive connotation
    of the word legacy:
  • 4:07 - 4:14
    it's leaving behind something that future
    generations can benefit from.
  • 4:14 - 4:17
    But if we're rarely ever launching successful
    projects
  • 4:17 - 4:21
    and then the ones we do launch tend to die
    within five years
  • 4:21 - 4:25
    none of us are actually creating a legacy
    in our work.
  • 4:25 - 4:27
    We're just creating stuff that gets thrown
    away.
  • 4:27 - 4:29
    Kind of sad.
  • 4:29 - 4:32
    So we create this stuff that's legacy software.
  • 4:32 - 4:35
    It's hard to change, that's why it ends up
    getting thrown away
  • 4:35 - 4:37
    right, that's, if the software worked
  • 4:37 - 4:40
    and you could keep changing it to meet the
    needs of the business
  • 4:40 - 4:44
    you wouldn't need to do a big rewrite and
    throw it away.
  • 4:44 - 4:48
    We create these huge tightly-coupled systems,
  • 4:48 - 4:49
    and I don't just mean one application,
  • 4:49 - 4:51
    but like many applications are all tightly
    coupled.
  • 4:51 - 4:56
    You've got this thing over here talking to
    the database of this system over here
  • 4:56 - 4:59
    so if you change the columns to update the
    view of a webpage
  • 4:59 - 5:03
    you ruin your billing system, that kind of
    thing
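As an illustration of the kind of coupling described here, a minimal Ruby sketch (the names and numbers are invented, not code from the talk). Billing reads the web app's data directly, so renaming a field to change a view breaks invoicing; the decoupled version only touches a small interface the web app promises to keep stable:

```ruby
# Hypothetical data the web app owns; imagine columns in its database.
WEB_APP_USERS = [
  { id: 1, name: "Asha", plan: "pro" }
]

# Tightly coupled: billing reaches straight into the other system's
# storage and depends on its column names.
def invoice_amount_coupled(user_id)
  user = WEB_APP_USERS.find { |u| u[:id] == user_id }
  user[:plan] == "pro" ? 100 : 0   # breaks if :plan is renamed for a view change
end

# Decoupled: billing only talks to a public interface, so the web app
# can reshape its columns freely.
module UserService
  def self.plan_for(user_id)
    user = WEB_APP_USERS.find { |u| u[:id] == user_id }
    user && user[:plan]
  end
end

def invoice_amount(user_id)
  UserService.plan_for(user_id) == "pro" ? 100 : 0
end
```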
  • 5:03 - 5:06
    this is what makes it so hard to change
  • 5:06 - 5:10
    and the sad thing about this is the way we
    work
  • 5:10 - 5:14
    the way we develop software, this is the default
    setting
  • 5:14 - 5:18
    and, what I mean is, if we were robots churning
    out software
  • 5:18 - 5:21
    and we had a preferences panel
  • 5:21 - 5:25
    the default preferences would lead to us creating
    terrible software that gets thrown away in
  • 5:25 - 5:26
    five years
  • 5:26 - 5:27
    that's just how we all work
  • 5:27 - 5:30
    as human beings when we sit down to write
    code
  • 5:30 - 5:35
    our default instincts lead us to create
    systems that are tightly coupled
  • 5:35 - 5:42
    and hard to change and ultimately get thrown
    away and can't scale
  • 5:42 - 5:46
    we create, we try doing tests, we try doing
    TDD
  • 5:46 - 5:51
    but we create test suites that take forty-five
    minutes to run
  • 5:51 - 5:53
    every team has had to deal with this I'm sure
  • 5:53 - 5:56
    if you've written any kind of meaningful application
  • 5:56 - 5:58
    and it gets to where you have like a project
  • 5:58 - 6:00
    to speed up the test suite
  • 6:00 - 6:03
    like you start focusing your company's resources
  • 6:03 - 6:05
    on making the test suite faster
  • 6:05 - 6:09
    or making it like only fail ninety percent
    of the time
  • 6:09 - 6:11
    and then you say well if it only fails ninety
    percent that's OK
  • 6:11 - 6:15
    right, and right now it's taking forty-five
    minutes
  • 6:15 - 6:18
    we want to get it to where it only takes ten
    minutes to run
  • 6:18 - 6:24
    so the test suite ends up being a liability
    instead of a benefit
  • 6:24 - 6:26
    because of the way you do it
  • 6:26 - 6:29
    because you have this architecture where everything
    is so coupled
  • 6:29 - 6:35
    you can't change anything without spending
    hours working on the stupid test suite
  • 6:35 - 6:38
    and you're terrified to deploy
  • 6:38 - 6:43
    I know like the last big Java project I was
    working on
  • 6:43 - 6:46
    it would take, once a week we did a deploy
  • 6:46 - 6:50
    it would take fifteen people all night to
    deploy the thing
  • 6:50 - 6:52
    and usually it was like copying class files
    around
  • 6:52 - 6:54
    and restarting servers
  • 6:54 - 6:57
    it's much better today but it's still terrifying
  • 6:57 - 6:59
    you deploy code, you change it in production
  • 6:59 - 7:01
    you're not sure what might break
  • 7:01 - 7:04
    cause it's really hard to test these big integrated
    things together
  • 7:04 - 7:09
    and actually upgrading the technology component
    is terrifying
  • 7:09 - 7:13
    so, how many of you have been doing Rails
    for more than three years?
  • 7:13 - 7:18
    do you have, like a Rails 2 app in production,
    anyone? Yeah?
  • 7:18 - 7:22
    that's a lot of people, wow, that's terrifying
  • 7:22 - 7:26
    and I've been in situations, recently, where
    we had Rails 2 apps in production
  • 7:26 - 7:30
    security patches are coming out, we were applying
    our own versions
  • 7:30 - 7:31
    of those security patches
  • 7:31 - 7:32
    because we were afraid to upgrade Rails
  • 7:32 - 7:35
    we would rather hack it than upgrade the thing
  • 7:35 - 7:38
    because you just don't know what's gonna happen
  • 7:38 - 7:42
    and then you end up, as you're re-implementing
    all this stuff yourself
  • 7:42 - 7:45
    you end up burning yourself out, wasting your
    time
  • 7:45 - 7:48
    because you're hacking on stupid Rails 2
  • 7:48 - 7:50
    or some old struts version
  • 7:50 - 7:53
    when you should be just taking advantage of
    the new patches
  • 7:53 - 7:55
    but you can't because you're afraid to upgrade
    the software
  • 7:55 - 7:56
    because you don't know what's going to happen
  • 7:56 - 8:03
    because the system is too big and too scary
  • 8:03 - 8:05
    then, and this is really bad, I think this
    is something
  • 8:05 - 8:07
    Ruby messes up for all of us
  • 8:07 - 8:11
    I say this as someone who's been using Ruby
    for thirteen years now
  • 8:11 - 8:13
    happily
  • 8:13 - 8:16
    we create these mountains of abstractions
  • 8:16 - 8:18
    and the logic ends up being buried inside
    them
  • 8:18 - 8:23
    I mean in Java it was like static, or, you
    know, factories
  • 8:23 - 8:25
    and design pattern soup
  • 8:25 - 8:27
    in Ruby its modules and mixins and you know
  • 8:27 - 8:31
    we have all these crazy ways of hiding what's
    actually happening from us
  • 8:31 - 8:33
    but when you go look at the code
  • 8:33 - 8:34
    it's completely opaque
  • 8:34 - 8:37
    you have no idea where the stuff actually
    gets done
  • 8:37 - 8:41
    because it's in some magic library somewhere
  • 8:41 - 8:45
    and we do all that because we're trying to
    save ourselves from the complexity of these
  • 8:45 - 8:47
    big nasty systems
  • 8:47 - 8:51
    but like if you look at the rest of the world
  • 8:51 - 8:54
    this is a software specific problem
  • 8:54 - 8:59
    these cars are old, they're older than any
    software that you would ever run
  • 8:59 - 9:00
    and they're still driving down the street
  • 9:00 - 9:03
    they're older than software itself, right
  • 9:03 - 9:06
    but these things still function, they still
    work
  • 9:06 - 9:09
    how? why? why do they work?
  • 9:09 - 9:11
    bodies! my body should not work
  • 9:11 - 9:13
    I have abused it
  • 9:13 - 9:14
    I should not be standing here today
  • 9:14 - 9:17
    I shouldn't have been able to come from Berlin
    here
  • 9:17 - 9:19
    without dying somehow by being in the air
  • 9:19 - 9:24
    you know, by the air pressure changes
  • 9:24 - 9:26
    but our bodies somehow can survive even when
  • 9:26 - 9:31
    we don't take care of them
  • 9:31 - 9:35
    and like it's just the system that works,
    right
  • 9:35 - 9:38
    so how do our bodies work?
  • 9:38 - 9:39
    how do we stay alive
  • 9:39 - 9:41
    despite this fact
  • 9:41 - 9:42
    even though we haven't done like some
  • 9:42 - 9:45
    great design, we don't have any design patterns
  • 9:45 - 9:50
    like mixed up into our bodies
  • 9:50 - 9:54
    in biology there is a term called homeostasis
  • 9:54 - 9:56
    and I literally don't know what this means
  • 9:56 - 9:57
    other than this definition
  • 9:57 - 9:59
    so you won't learn about this from me
  • 9:59 - 10:01
    there's probably at least one biologist in
    the room
  • 10:01 - 10:04
    so you can correct me later
  • 10:04 - 10:08
    but basically the idea of homeostasis is
  • 10:08 - 10:11
    that an organism has all these different components
  • 10:11 - 10:14
    that serve different purposes
  • 10:14 - 10:16
    that regulate it
  • 10:16 - 10:18
    so they're all kind of in balance
  • 10:18 - 10:21
    and they work together to regulate the system
  • 10:21 - 10:24
    if one component, like a liver, does too much
  • 10:24 - 10:25
    or does the wrong thing
  • 10:25 - 10:28
    another component kicks in and fixes it
  • 10:28 - 10:30
    and so our bodies are this well designed system
  • 10:30 - 10:32
    for staying alive
  • 10:32 - 10:35
    because we have almost like autonomous agents
  • 10:35 - 10:39
    internally that take care of the many things
    that can and do go wrong
  • 10:39 - 10:42
    on a regular basis
  • 10:42 - 10:44
    so you have, you know, your brain, your liver
  • 10:44 - 10:47
    your liver, of course, metabolizes toxic substances
  • 10:47 - 10:50
    your kidney deals with blood, water level,
    et cetera
  • 10:50 - 10:56
    you know all these things work in concert
    to make you live
  • 10:56 - 11:01
    the inability to continue to do that is known
    as homeostatic imbalance
  • 11:01 - 11:04
    so I was saying, homeostasis is balancing
  • 11:04 - 11:07
    not being able to do that is when you're out
    of balance
  • 11:07 - 11:10
    and that will actually lead to really bad
    health problems
  • 11:10 - 11:16
    or probably death, if you fall into homeostatic
    imbalance
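One way to carry the homeostasis metaphor into code is a supervisor loop: each component reports health, and a regulator restarts whatever has failed, keeping the whole in balance. This is a made-up sketch of the idea, not anything from Wunderlist:

```ruby
# Hypothetical "homeostasis" for software: components can fail,
# and another part of the system kicks in and fixes them.
Component = Struct.new(:name, :healthy) do
  def restart!
    self.healthy = true   # stand-in for respawning a real process
  end
end

# The regulator: scans every component and repairs the unhealthy ones,
# so the system as a whole stays in balance.
def regulate(components)
  components.each { |c| c.restart! unless c.healthy }
  components
end

organs = [Component.new("liver", true), Component.new("kidney", false)]
regulate(organs)   # the failed "kidney" is brought back
```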
  • 11:16 - 11:20
    so the good news is you're already dying
  • 11:20 - 11:22
    like we're all dying all the time
  • 11:22 - 11:26
    this is the beautiful thing about death
  • 11:26 - 11:29
    there is, there is an estimate that fifty
    trillion cells
  • 11:29 - 11:32
    are in your body, and three million die per
    second
  • 11:32 - 11:36
    it's an estimate because it's actually impossible
    to count
  • 11:36 - 11:40
    but scientists have figured out somehow that
    this is probably the right number
  • 11:40 - 11:42
    so your cells, you've probably heard this
    all your life
  • 11:42 - 11:45
    like physically, after some amount of time,
  • 11:45 - 11:47
    you aren't the same human being that you were,
    physically
  • 11:47 - 11:53
    you know, I don't know, some period of
    time ago
  • 11:53 - 11:56
    you're literally not the same organism anymore
  • 11:56 - 11:58
    but you're the same system
  • 11:58 - 12:01
    kind of interesting, isn't it
  • 12:01 - 12:07
    so in a way you can think about software this way
  • 12:07 - 12:08
    you can think about software as a system
  • 12:08 - 12:11
    if the components could be replaced like these
    cells
  • 12:11 - 12:18
    like, if you focus on making death, constant
    death OK
  • 12:19 - 12:20
    on a small level
  • 12:20 - 12:25
    then the system can live on a large level
  • 12:25 - 12:26
    that's what this talk is about
  • 12:26 - 12:29
    solution, the solution being to mimic living
    organisms
  • 12:29 - 12:36
    and as an aside, I will say many times the
    word small or tiny in this talk
  • 12:36 - 12:38
    because I think I'm learning, as I age
  • 12:38 - 12:40
    that small is good
  • 12:40 - 12:43
    its, small projects are good
  • 12:43 - 12:44
    you know how to estimate them
  • 12:44 - 12:45
    small commitments are good
  • 12:45 - 12:47
    because you know you can make them
  • 12:47 - 12:48
    small methods are good
  • 12:48 - 12:49
    small classes are good
  • 12:49 - 12:50
    small applications are good
  • 12:50 - 12:52
    small teams are good
  • 12:52 - 12:55
    so I don't know, this is sort of a non sequitur
  • 12:55 - 12:58
    so if we're going to think about software
  • 12:58 - 13:00
    as like an organism
  • 13:00 - 13:03
    what is a cell in that context?
  • 13:03 - 13:06
    this is sort of the key question that you
    have to ask yourself
  • 13:06 - 13:09
    and I say that a cell is a tiny component
  • 13:09 - 13:13
    now, tiny and component are both subjective
    words
  • 13:13 - 13:15
    so you can kind of do what you want with that
  • 13:15 - 13:18
    but it's a good frame of thinking
  • 13:18 - 13:21
    if you make your software system of tiny components
  • 13:21 - 13:23
    each one can be like a cell
  • 13:23 - 13:28
    each one can die and the system is a collection
    of those tiny components
  • 13:28 - 13:32
    and what you want is not for your code to
    live forever
  • 13:32 - 13:36
    you don't care that each line of code lives
    forever, right
  • 13:36 - 13:39
    like if you're trying to develop a legacy
    in software
  • 13:39 - 13:43
    it's not important to you that your
    System.out.println statement
  • 13:43 - 13:44
    lives for ten years
  • 13:44 - 13:48
    it's important to you that the function of
    the system lives for ten years
  • 13:48 - 13:50
    so like, about exactly ten years ago
  • 13:50 - 13:57
    we created RubyGems at RubyConf 2003
    in Austin, Texas
  • 13:59 - 14:04
    I haven't touched RubyGems myself in like
    four or five years
  • 14:04 - 14:05
    but people are still using it
  • 14:05 - 14:06
    they hate it because it's software
  • 14:06 - 14:08
    everybody hates software right
  • 14:08 - 14:10
    so if you can create software that people
    hate
  • 14:10 - 14:13
    you've succeeded
  • 14:13 - 14:14
    but it still exists
  • 14:14 - 14:17
    I have no idea if any of the code is the same
  • 14:17 - 14:17
    I would assume not
  • 14:17 - 14:21
    you know I think, I'm sure that my name is
    still in it in a copyright notice
  • 14:21 - 14:24
    but that's about it
  • 14:24 - 14:25
    and that's a beautiful thing
  • 14:25 - 14:28
    people are still using it to install Ruby
    libraries
  • 14:28 - 14:30
    and software
  • 14:30 - 14:36
    and I don't care if any of my existing, or
    my initial code is still in the system
  • 14:36 - 14:37
    because the system still lives
  • 14:37 - 14:43
    so, quite a long time ago now I was researching
    this kind of question
  • 14:43 - 14:45
    about legacy software
  • 14:45 - 14:48
    and I asked a question on Twitter as I often
    do at conferences
  • 14:48 - 14:50
    when I'm preparing
  • 14:50 - 14:56
    what are some of the old surviving software
    systems you regularly use
  • 14:56 - 14:58
    and if you look at this, I mean, one thing
    is obviously
  • 14:58 - 15:03
    everyone who answered gave some sort of Unix
    related answer
  • 15:03 - 15:07
    but basically all of these things on this
    list
  • 15:07 - 15:13
    are either systems that are collections of
    really well-known split-up components
  • 15:13 - 15:16
    or they're tiny, tiny programs
  • 15:16 - 15:19
    so, like, grep is a tiny program, make
  • 15:19 - 15:20
    it only does one thing
  • 15:20 - 15:24
    well make is actually also arguably an operating
    system
  • 15:24 - 15:27
    but I won't get into that
  • 15:27 - 15:29
    emacs is obviously an operating system, right
  • 15:29 - 15:33
    but it's well designed of these tiny little
    pieces
  • 15:33 - 15:37
    so a lot of the old systems I know about follow
    this pattern
  • 15:37 - 15:40
    this metaphor that I'm proposing
  • 15:40 - 15:42
    and from my own career
  • 15:42 - 15:44
    when I was here before in Bangalore
  • 15:44 - 15:47
    I worked for GE and some of the people
  • 15:47 - 15:49
    we hired even worked on the system there
  • 15:49 - 15:51
    we had a system called the Bull
  • 15:51 - 15:54
    and it was a Honeywell Bull mainframe
  • 15:54 - 15:57
    I doubt any of you have worked on that
  • 15:57 - 15:58
    but this one I know you didn't work on
  • 15:58 - 16:01
    because it had a custom operating system
  • 16:01 - 16:03
    with our own RDBMS
  • 16:03 - 16:06
    we had created a TCP stack for it
  • 16:06 - 16:11
    using like custom hardware that we plugged
    into a Windows NT computer
  • 16:11 - 16:15
    with some sort of NT queuing system back in
    the day
  • 16:15 - 16:17
    it was this terrifying thing
  • 16:17 - 16:23
    when I started working there the system was
    already something like twenty-five years old
  • 16:23 - 16:26
    and I believe even though there have been
    many, many projects
  • 16:26 - 16:30
    to try to kill it, like we had a team called
    the Bull exit team
  • 16:30 - 16:33
    I believe the system is still in production
  • 16:33 - 16:37
    not as much as it used to be, there are less
    and less functions in production
  • 16:37 - 16:39
    but I believe the system is still in production
  • 16:39 - 16:46
    the reason for this is that the system was
    actually made up of these tiny little components
  • 16:47 - 16:51
    and like really clear interfaces between them
  • 16:51 - 16:54
    and we kept the system live because every
    time we tried to replace it
  • 16:54 - 16:57
    with some fancy new gem, web thing, or GUI
    app
  • 16:57 - 16:59
    it wasn't as good, and the users hated it
  • 16:59 - 17:01
    it just didn't work
  • 17:01 - 17:05
    so we had to use this old, crazy, modified
    mainframe
  • 17:05 - 17:08
    for a long time as a result
  • 17:08 - 17:11
    so, the question I ask myself is now
  • 17:11 - 17:13
    how do I, how do I approach a problem like
    this
  • 17:13 - 17:19
    and build a system that can survive for a
    long time
  • 17:19 - 17:20
    I would encourage you
  • 17:20 - 17:23
    how many of you know of Fred George
  • 17:23 - 17:25
    this is Fred George
  • 17:25 - 17:26
    he was at ThoughtWorks for a while
  • 17:26 - 17:28
    so he may have, I think he lived in Bangalore
  • 17:28 - 17:31
    for some time with ThoughtWorks, in fact
  • 17:31 - 17:35
    he is now running a start-up in Silicon Valley
  • 17:35 - 17:39
    but he has this talk that you can watch online
  • 17:39 - 17:42
    from the Barcelona Ruby Conference the year
    before last
  • 17:42 - 17:45
    called Microservice Architectures
  • 17:45 - 17:48
    and he talks in great detail about
  • 17:48 - 17:50
    how he implemented a concept at Forward
  • 17:50 - 17:52
    that's very much like what I'm talking about
  • 17:52 - 17:55
    tiny components that only do one thing and
    can be thrown away
  • 17:55 - 18:00
    so Microservice Architecture is kind of the
    core of what I'm gonna talk about
  • 18:00 - 18:02
    now I've put together some rules for 6Wunderkinder
  • 18:02 - 18:04
    which I am going to share with you
  • 18:04 - 18:07
    6Wunderkinder is the company I work for
  • 18:07 - 18:09
    where we're working on Wunderlist
  • 18:09 - 18:12
    and the rules of the, the goals of these rules
  • 18:12 - 18:17
    are to reduce coupling, to make it where we
    can do fear-free deployments
  • 18:17 - 18:19
    we reduce the chance of "cruft" in our code
  • 18:19 - 18:21
    like nasty stuff that you're afraid of
  • 18:21 - 18:25
    that you leave there, kind of broken window
    problems
  • 18:25 - 18:29
    we make it literally trivial to change code
  • 18:29 - 18:33
    so you just never have to ask how do I do
    that
  • 18:33 - 18:34
    you just find it easy
  • 18:34 - 18:39
    and most importantly we give ourselves the
    freedom to go fast
  • 18:39 - 18:44
    because I think no developer ever wants to
    be slow
  • 18:44 - 18:45
    that's one of the worst things
  • 18:45 - 18:48
    just toiling away and not actually accomplishing
    anything
  • 18:48 - 18:51
    but we go slow because we're constrained by
    the system
  • 18:51 - 18:54
    and we're constrained by, sometimes projects
  • 18:54 - 18:56
    and other, you know, management related things
  • 18:56 - 19:01
    but often times its the mess of the system
    that we've created
  • 19:01 - 19:04
    so some of the rules
  • 19:04 - 19:09
    I think one thing, and maybe, maybe I'm going
    to get some push back from this crowd
  • 19:09 - 19:13
    one rule that is less controversial than it
    used to be
  • 19:13 - 19:15
    is that comments are a design smell
  • 19:15 - 19:19
    does anyone strongly disagree with that?
  • 19:19 - 19:21
    no?
  • 19:21 - 19:24
    does anyone strongly agree with that?
  • 19:24 - 19:27
    OK, so the rest of you have no idea what I'm
    talking about
  • 19:27 - 19:33
    so a design smell, I want to define this really
    quickly
  • 19:33 - 19:37
    a design smell is something you see in your
    code or your system
  • 19:37 - 19:40
    where it doesn't necessarily mean it's bad
  • 19:40 - 19:41
    but you look at it and you think
  • 19:41 - 19:43
    hmm, I should look into this a little bit
  • 19:43 - 19:46
    and ask myself, why are there so many comments
    in this code?
  • 19:46 - 19:48
    you know, especially the bottom one
  • 19:48 - 19:51
    inline comments?
  • 19:51 - 19:57
    definitely bad, definitely a sign that you
    should have another method, right
  • 19:57 - 19:59
    so it's pretty easy to convince people
  • 19:59 - 20:00
    that comments are a design smell
  • 20:00 - 20:02
    and I think a lot of people in the industry
  • 20:02 - 20:03
    are starting to agree
  • 20:03 - 20:05
    maybe not for like a public library
  • 20:05 - 20:07
    where you really need to tell someone
  • 20:07 - 20:10
    here's how you use this class and this is
    what it's for
  • 20:10 - 20:12
    but you shouldn't have to document every method
  • 20:12 - 20:15
    and every argument because the method name
    and the argument name
  • 20:15 - 20:18
    should speak for themselves, right
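The "you should have another method" point fits in a few lines of Ruby. A sketch with invented names, not code from the talk:

```ruby
# Before: an inline comment papering over an unnamed chunk of logic.
def total(order)
  sum = order[:items].sum { |i| i[:price] * i[:qty] }
  # apply 10% discount for orders over 1000
  sum > 1000 ? sum * 0.9 : sum
end

# After: the comment becomes a method whose name speaks for itself.
def discounted(sum)
  sum > 1000 ? sum * 0.9 : sum
end

def total_extracted(order)
  discounted(order[:items].sum { |i| i[:price] * i[:qty] })
end
```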
  • 20:18 - 20:21
    so here's one that you probably won't agree
    with
  • 20:21 - 20:22
    tests are a design smell
  • 20:22 - 20:29
    so this one is probably a little more controversial
  • 20:29 - 20:33
    especially in an environment where you're
    maybe still struggling with people
  • 20:33 - 20:38
    to actually get them
    to write tests to begin with, right
  • 20:38 - 20:41
    you know I went through this period in, like,
    2000 and 2001
  • 20:41 - 20:44
    where I was really heavily into evangelizing
    TDD
  • 20:44 - 20:47
    and it was really stressful that you couldn't
    get anyone to do it
  • 20:47 - 20:50
    I think you do have to go through that period
  • 20:50 - 20:52
    and I'm not saying you shouldn't write any
    tests
  • 20:52 - 20:57
    but that picture I showed you earlier of the
    slow, brittle test suite
  • 20:57 - 20:58
    that's bad, right
  • 20:58 - 21:01
    that is a bad state to be in
  • 21:01 - 21:04
    and you're in that state because your tests
    suck
  • 21:04 - 21:06
    that's why you get in that state
  • 21:06 - 21:10
    your tests suck because you're writing bad
    tests
  • 21:10 - 21:16
    that don't exercise the right things in your
    system
  • 21:16 - 21:19
    and what I've found is whenever I look into
    one of these
  • 21:19 - 21:22
    big slow brittle test suites
  • 21:22 - 21:25
    the tests themselves are indications
  • 21:25 - 21:28
    and the sheer proliferation of tests
  • 21:28 - 21:31
    are indications that the system is bad
  • 21:31 - 21:34
    and the developers are like desperately
  • 21:34 - 21:37
    fearfully trying to run the code
  • 21:37 - 21:39
    in every way they can
  • 21:39 - 21:41
    because it's the only way they can manage
  • 21:41 - 21:44
    to even think about the complexity
  • 21:44 - 21:48
    but if you think about it, if you had a tiny
    trivial system
  • 21:48 - 21:50
    you wouldn't need to have hundreds of test
    files
  • 21:50 - 21:53
    that take ten minutes to run, ever
  • 21:53 - 21:54
    if you did, you're doing something stupid
  • 21:54 - 21:57
    you're wasting your time working on tests
  • 21:57 - 22:00
    and we as software developers obsess about
    this kind of thing
  • 22:00 - 22:05
    because we have to fight so hard to get our
    peers to do it in the first place
  • 22:05 - 22:06
    and to understand it
  • 22:06 - 22:10
    we obsess to the point where we focus on the
    wrong thing
  • 22:10 - 22:15
    none of us are in the business of writing
    tests for customers
  • 22:15 - 22:18
    like we're not launching our tests on the
    web
  • 22:18 - 22:20
    and hoping people will buy them, right
  • 22:20 - 22:24
    it doesn't provide value, it's just a side-effect
  • 22:24 - 22:26
    that we have focused too heavily on
  • 22:26 - 22:30
    and we've lost sight of what the actual goal
    is
  • 22:30 - 22:34
    so, this one actually requires a visual
  • 22:34 - 22:37
    I tell the people on my team now
  • 22:37 - 22:40
    you can write code in any language you want
  • 22:40 - 22:43
    any framework you want, anything you want
    to do
  • 22:43 - 22:45
    as long as the code is this big
  • 22:45 - 22:47
    so if you want to write the new service in
    Haskell
  • 22:47 - 22:50
    and it's this big in a normal size font
  • 22:50 - 22:51
    you can do it
  • 22:51 - 22:54
    if you want to do it in Clojure or Elixir
    or Scala or Ruby
  • 22:54 - 22:55
    or whatever you want to do
  • 22:55 - 22:57
    even Python for god's sake
  • 22:57 - 22:59
    you can do it if it's this big and no bigger
  • 22:59 - 23:04
    why? because it means I can look at it
  • 23:04 - 23:06
    and I can understand it
  • 23:06 - 23:09
    or if I don't I'll just throw it away
  • 23:09 - 23:12
    because if it's this big it doesn't do very
    much, right
  • 23:12 - 23:14
    so the risk is really low
  • 23:14 - 23:17
    and I really mean the system is that
  • 23:17 - 23:19
    the component is that big
  • 23:19 - 23:21
    and in my world a component means a service
  • 23:21 - 23:25
    that's running and probably listening on an
    HTTP port
  • 23:25 - 23:28
    or some sort of Thrift or RPC protocol
  • 23:28 - 23:30
    so it's a standalone thing
  • 23:30 - 23:31
    it's its own application
  • 23:31 - 23:33
    it's probably in its own git repository
  • 23:33 - 23:35
    people do pull requests against it
  • 23:35 - 23:36
    but it's just tiny
  • 23:36 - 23:39
    so this big
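A component "this big" might look like the following sketch: one standalone process, one invented /ping route, listening on an HTTP port using only the Ruby standard library. This is an illustration of the scale he means, not the slide's actual code:

```ruby
require "socket"
require "json"

# The whole "business logic" of the component, kept as a pure method so
# it is trivial to read, test, or simply throw away.
def handle(path)
  case path
  when "/ping" then { status: "ok" }.to_json
  else              { error: "not found" }.to_json
  end
end

# A bare-bones HTTP loop: accept a connection, read the request line,
# answer, close. No framework, nothing hidden.
def serve(port)
  server = TCPServer.new(port)
  loop do
    client = server.accept
    request_line = client.gets          # e.g. "GET /ping HTTP/1.1"
    path = request_line.split[1]
    body = handle(path)
    client.print "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}"
    client.close
  end
end

# serve(8080) would start the component; everything above fits on a slide.
```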
  • 23:39 - 23:41
    at the top of this, by the way
  • 23:41 - 23:46
    is some code by Konstantin Haase
  • 23:46 - 23:49
    who also lives in Berlin, where I live
  • 23:49 - 23:51
    this is a rewrite of Sinatra
  • 23:51 - 23:52
    the web framework
  • 23:52 - 23:55
    and Konstantin is actually the maintainer
    of Sinatra
  • 23:55 - 23:59
    it's not fully compatible, but it's amazingly
    close
  • 23:59 - 24:00
    and it all fits right in that
  • 24:00 - 24:05
    but the font size is kind of small, so I cheated
  • 24:05 - 24:09
    another rule, our systems are heterogeneous
    by default
  • 24:09 - 24:11
    so I say you can write in any language you
    want
  • 24:11 - 24:14
    that's not just because I want the developers
    to be excited
  • 24:14 - 24:17
    although I think, most of you, if you worked
  • 24:17 - 24:19
    in an environment where your boss told you
  • 24:19 - 24:22
    you can use any programming language or tool
    you want
  • 24:22 - 24:24
    you would be pretty happy about that, right
  • 24:24 - 24:27
    anyone unhappy about that? I don't think so
  • 24:27 - 24:28
    unless it's one of the bosses here
  • 24:28 - 24:32
    that's like don't tell people that
  • 24:32 - 24:33
    so that's one thing
  • 24:33 - 24:37
    the other one is, it leads to a good system
    design
  • 24:37 - 24:39
    because think about this
  • 24:39 - 24:42
    if I write one program in Erlang, one component
    in Erlang
  • 24:42 - 24:44
    one program in Ruby
  • 24:44 - 24:48
    I have to work really, really hard to make
    tight coupling
  • 24:48 - 24:50
    between those things
  • 24:50 - 24:53
    like I have to basically use computer science
    to do that
  • 24:53 - 24:54
    I don't even know what I would do
  • 24:54 - 24:56
    you know it's hard
  • 24:56 - 24:59
    like I would have to maybe implement Ruby
    in Erlang
  • 24:59 - 25:01
    so that it can run in the same VM or vice
    versa
  • 25:01 - 25:04
    it's just silly, I wouldn't do it
  • 25:04 - 25:07
    so if my system is heterogeneous by default
  • 25:07 - 25:12
    my coupling is very low, at least at a certain
    level by default
  • 25:12 - 25:14
    because it's the path of least resistance
  • 25:14 - 25:17
    is to make the system decoupled
  • 25:17 - 25:19
    it's easier to make things decoupled than
    coupled
  • 25:19 - 25:22
    if they're all running in different languages
  • 25:22 - 25:25
    so in the past three months, I'll say
  • 25:25 - 25:30
    I have written production code in Objective-C,
    Ruby, Scala, Clojure, Node
  • 25:30 - 25:34
    I don't know, more stuff, Java
  • 25:34 - 25:36
    all these different languages
  • 25:36 - 25:39
    real code for work
  • 25:39 - 25:41
    and yes, they are not tightly coupled
  • 25:41 - 25:45
    like I haven't installed JRuby so that I could
    reach into the internals of my Scala code
  • 25:45 - 25:46
    because that would be a pain
  • 25:46 - 25:51
    I don't want to do that
  • 25:51 - 25:53
    another very important one is
  • 25:53 - 25:56
    server nodes are disposable
  • 25:56 - 25:59
    so, back when I was at GE, for example
  • 25:59 - 26:03
    I remember being really proud when I looked
    at the up time of one of my servers
  • 26:03 - 26:05
    and it was like four hundred days or something
  • 26:05 - 26:07
    it's like, wow, this is awesome
  • 26:07 - 26:10
    I have this big server, it had all these apps
    on it
  • 26:10 - 26:13
    we kept it running for four hundred days
  • 26:13 - 26:15
    the problem with that is I was afraid to ever
    touch it
  • 26:15 - 26:18
    I was really happy it was alive
  • 26:18 - 26:19
    but I didn't want to do anything to it
  • 26:19 - 26:21
    I was afraid to update the operating system
  • 26:21 - 26:24
    in fact you could not upgrade Solaris then
    without restarting it
  • 26:24 - 26:28
    so that meant I was not upgrading the operating
    system
  • 26:28 - 26:32
    I probably shouldn't have been too proud about
    it
  • 26:32 - 26:35
    Nodes that are alive for a long time lead
    to fear
  • 26:35 - 26:37
    and what I want is less fear
  • 26:37 - 26:39
    so I throw them away
  • 26:39 - 26:43
    and this means I don't have physical servers
    that I throw away
  • 26:43 - 26:46
    that would be fun but I'm not that rich yet
  • 26:46 - 26:49
    we use AWS right now, you could do it with
    any kind of cloud service
  • 26:49 - 26:53
    or even an internal cloud provider
  • 26:53 - 26:54
    but every node is disposable
  • 26:54 - 27:01
    so, we never upgrade software on an existing
    server
  • 27:01 - 27:03
    whenever you want to deploy a new version
    of a service
  • 27:03 - 27:04
    you create new servers
  • 27:04 - 27:05
    and you deploy that version
  • 27:05 - 27:09
    and then you replace them in the load balancer
    or somewhere
  • 27:09 - 27:10
    that's it
  • 27:10 - 27:13
    so, you never have to wonder what's on a server
  • 27:13 - 27:16
    because it was deployed through an automated
    process
  • 27:16 - 27:17
    and there's no fear there
  • 27:17 - 27:18
    you know exactly what it is
  • 27:18 - 27:19
    you know exactly how to recreate it
  • 27:19 - 27:22
    because you have a golden master image
  • 27:22 - 27:24
    and in our case it's actually an Amazon image
  • 27:24 - 27:26
    that you can just boot more of
  • 27:26 - 27:27
    if scaling is a problem
  • 27:27 - 27:29
    you just boot ten more servers
  • 27:29 - 27:33
    boom, done, no problem
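The deploy-fresh-and-swap flow can be sketched in plain Ruby — a toy model of the pattern, not the actual tooling; every class name here is invented:

```ruby
# Sketch of immutable deployments: never upgrade a running node.
# Boot a fresh fleet at the new version, point traffic at it, and
# dispose of the old fleet. All names are hypothetical.
Node = Struct.new(:version)

class LoadBalancer
  attr_reader :nodes

  def initialize(nodes)
    @nodes = nodes
  end

  # Returns the retired fleet so the caller can terminate it;
  # nothing is ever modified in place.
  def deploy(version, count)
    fresh = Array.new(count) { Node.new(version) }
    old, @nodes = @nodes, fresh
    old
  end
end

lb = LoadBalancer.new(Array.new(3) { Node.new("v1") })
retired = lb.deploy("v2", 10) # need to scale? just boot ten
```

Because the old servers are simply thrown away, there is never a question of what state they accumulated.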
  • 27:33 - 27:35
    so yeah I tell the team, you know, pick your
    technology
  • 27:35 - 27:38
    everything must be automated, that's another
    piece
  • 27:38 - 27:43
    if you're going to deploy a Clojure service
    for the first time
  • 27:43 - 27:47
    you have to be responsible for figuring out
    how it fits into our deployment system
  • 27:47 - 27:50
    so that you have immutable deployments and
    disposable nodes
  • 27:50 - 27:54
    if you can do that and you're willing to also
    maintain it and teach someone else
  • 27:54 - 27:56
    about the little piece of code that you wrote,
    then cool
  • 27:56 - 27:59
    you can do it, any level you want
  • 27:59 - 28:03
    and then once you deploy stuff
  • 28:03 - 28:05
    like a lot of us like to just SSH into the machines
  • 28:05 - 28:08
    and then twiddle with things and replace files
  • 28:08 - 28:12
    and like try like fixing bugs live on production
  • 28:12 - 28:14
    why not just throw away the SSH keys
  • 28:14 - 28:17
    because you're going to throw away the system
    eventually
  • 28:17 - 28:19
    you don't even need root access to it
  • 28:19 - 28:21
    you don't need to be able to get to it
  • 28:21 - 28:25
    except through the port that your service
    is listening on
  • 28:25 - 28:27
    so you can't screw it up
  • 28:27 - 28:29
    you can't introduce entropy and mess things
    up
  • 28:29 - 28:31
    if you throw away the keys
  • 28:31 - 28:34
    so this is actually a practice that you can
    do
  • 28:34 - 28:36
    deploy the servers, remove all the credentials
  • 28:36 - 28:39
    for logging in and the only option you have
  • 28:39 - 28:44
    is to destroy them when you're done with them
  • 28:44 - 28:45
    provisioning new services in our world
  • 28:45 - 28:47
    must also be trivial
  • 28:47 - 28:51
    so we have actually now thrown away our chef
    repository
  • 28:51 - 28:54
    because chef is obsolete and
  • 28:54 - 28:56
    we have replaced it with shell scripts
  • 28:56 - 29:01
    and that sounds like I'm an idiot
  • 29:01 - 29:04
    I know, but when I say chef is obsolete
  • 29:04 - 29:05
    I don't really mean that
  • 29:05 - 29:07
    I like to say that so that people will think
  • 29:07 - 29:08
    because a lot of you are probably thinking
  • 29:08 - 29:11
    we should move to chef
  • 29:11 - 29:12
    that would be great
  • 29:12 - 29:14
    because what you have is a bunch of servers
  • 29:14 - 29:15
    that are running for a long time
  • 29:15 - 29:17
    and you need to be able to continue to keep
    them up to date
  • 29:17 - 29:19
    chef is really great at that
  • 29:19 - 29:22
    chef is also good at booting a new server
  • 29:22 - 29:24
    but really it's just overkill for that
  • 29:24 - 29:25
    yeah
  • 29:25 - 29:26
    so if you're always throwing stuff away
  • 29:26 - 29:28
    I don't think you need chef
  • 29:28 - 29:29
    do something really, really simple
  • 29:29 - 29:30
    and that's what we've done
  • 29:30 - 29:33
    so like whenever we deploy a new type of service
  • 29:33 - 29:38
    I set up ZooKeeper recently, which is a complete
    change from the other stuff we're deploying
  • 29:38 - 29:40
    I think it was a five line shell script to
    do that
  • 29:40 - 29:43
    I just added it to a git repo and ran a command
  • 29:43 - 29:47
    I've got a cluster of ZooKeeper servers running
  • 29:47 - 29:51
    you want to always be deploying your software
  • 29:51 - 29:56
    this is something I learned from Kent Beck
    early on in the agile extreme programming
  • 29:56 - 29:56
    world
  • 29:56 - 29:58
    that if something is hard
  • 29:58 - 30:00
    or you perceive it to be hard or difficult
  • 30:00 - 30:02
    the best thing you can do
  • 30:02 - 30:04
    if you have to do that thing all the time
  • 30:04 - 30:07
    is to just do it constantly
  • 30:07 - 30:09
    non-stop all the time
  • 30:09 - 30:11
    so like deploying in our old world
  • 30:11 - 30:15
    where it would take all night once a week
  • 30:15 - 30:18
    if we instituted a new policy
  • 30:18 - 30:19
    in that team that said
  • 30:19 - 30:23
    any change that goes to master must be deployed
    within five minutes
  • 30:23 - 30:28
    I guarantee you we would have fixed that process,
    right
  • 30:28 - 30:30
    and if you're deploying constantly
  • 30:30 - 30:31
    all day every day
  • 30:31 - 30:33
    you're never going to be afraid of deployments
  • 30:33 - 30:36
    because it's always a small change
  • 30:36 - 30:38
    so always be deploying
  • 30:38 - 30:40
    every new deploy means you're throwing away
    old servers
  • 30:40 - 30:43
    and replacing them with new ones
  • 30:43 - 30:46
    in our world I would say that the average
    uptime
  • 30:46 - 30:48
    of one of our servers is probably something
    like
  • 30:48 - 30:55
    seventeen hours and that's because we don't
    tend to work on the weekend very much
  • 30:55 - 30:57
    you also, when you have these sorts of systems
  • 30:57 - 30:59
    that are distributed like this
  • 30:59 - 31:02
    and you're trying to reduce the fear of change
  • 31:02 - 31:04
    the big thing that you're afraid of is failure
  • 31:04 - 31:06
    you're afraid that the service is going to
    fail
  • 31:06 - 31:07
    the system is going to go down
  • 31:07 - 31:10
    one component won't be reachable, that sort
    of thing
  • 31:10 - 31:12
    so you just have to assume that that's going
    to happen
  • 31:12 - 31:17
    you are not going to build a system that never
    fails, ever
  • 31:17 - 31:20
    I hope you don't, because you will have wasted
    much of your life
  • 31:20 - 31:21
    trying to get that to happen
  • 31:21 - 31:24
    instead, assume that the thing, the components
    are going to fail
  • 31:24 - 31:26
    and build resiliency in
  • 31:26 - 31:28
    I have a picture here of Joe Armstrong
  • 31:28 - 31:30
    who is one of the inventors of Erlang
  • 31:30 - 31:35
    if you have not studied Erlang philosophy
    around failure and recovery
  • 31:35 - 31:35
    you should
  • 31:35 - 31:36
    and it won't take you long
  • 31:36 - 31:39
    so I'm just going to leave that as homework
    for you
  • 31:39 - 31:42
    and then, you know, I said, the tests are
    a design pattern
  • 31:42 - 31:44
    I don't mean don't write any tests
  • 31:44 - 31:46
    but I also want to be further responsible
    here
  • 31:46 - 31:51
    and say you should monitor everything
  • 31:51 - 31:53
    you want to favor measurement over testing
  • 31:53 - 31:57
    so I use measurement as a surrogate for testing
  • 31:57 - 31:58
    or as an enhancement
  • 31:58 - 32:04
    and the reason I say this is
  • 32:04 - 32:06
    you can either focus on one of two things
  • 32:06 - 32:08
    I said assume failure right, so
  • 32:08 - 32:12
    mean time between failures or mean time to
    resolution
  • 32:12 - 32:16
    those are kind of two metrics in the ops world
  • 32:16 - 32:17
    that people talk about
  • 32:17 - 32:20
    for measuring their success and their effectiveness
  • 32:20 - 32:22
    mean time between failures means
  • 32:22 - 32:25
    you're trying to increase the time between
    failures
  • 32:25 - 32:29
    of the system, so basically you're trying
    to make failures never happen, right
  • 32:29 - 32:31
    mean time to resolution means
  • 32:31 - 32:35
    when they happen, I'm gonna focus on bringing
    them back
  • 32:35 - 32:37
    as fast as I possibly can
  • 32:37 - 32:41
    so a perfect example would be a system fails
  • 32:41 - 32:44
    and another one is already up and just takes
    over its work
  • 32:44 - 32:47
    mean time to resolution is essentially zero,
    right
  • 32:47 - 32:51
    if you're always assuming that every component
    can and will fail
  • 32:51 - 32:54
    then mean time to resolution is going to be
    really good
  • 32:54 - 32:56
    because you're going to bake it into the process
  • 32:56 - 32:59
    if you do that, you don't care about when
    things fail
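The two metrics are simple arithmetic over incident records; a rough sketch, with illustrative numbers only:

```ruby
# Rough MTBF / MTTR over a list of [failure_start, recovery] pairs
# (in hours since some epoch). Purely illustrative arithmetic.
def mttr(incidents)
  # average time spent broken per incident
  incidents.sum { |start, fix| fix - start }.to_f / incidents.size
end

def mtbf(incidents, window_hours)
  # time the system spent up, divided by number of failures
  downtime = incidents.sum { |start, fix| fix - start }
  (window_hours - downtime).to_f / incidents.size
end

incidents = [[10, 10.5], [40, 40.1], [90, 90.4]] # three short outages
mttr(incidents)      # about a third of an hour: fast recovery
mtbf(incidents, 100) # about 33 hours between failures
```

With automatic failover the recovery term shrinks toward zero, which is exactly the focus-on-resolution strategy described above.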
  • 32:59 - 33:03
    and back to this idea of favoring measurement
    over testing
  • 33:03 - 33:07
    if you're monitoring everything, everything
    with intelligence
  • 33:07 - 33:10
    then you're actually focusing on mean time
    to resolution
  • 33:10 - 33:16
    and acknowledging that the software is going
    to be broken sometimes, right
  • 33:16 - 33:18
    and when I say monitor everything, I mean
    everything
  • 33:18 - 33:22
    I don't mean, like your disk space and your
    memory and stuff there
  • 33:22 - 33:24
    I'm talking about business metrics
  • 33:24 - 33:28
    so, at LivingSocial we created this thing
    called Rearview
  • 33:28 - 33:29
    which is now opensource
  • 33:29 - 33:33
    which allows you to do aberration detection
  • 33:33 - 33:38
    and aberration means strange behavior, strange
    change in behavior
  • 33:38 - 33:42
    so Rearview can do aberration detection
  • 33:42 - 33:45
    on data sets, arbitrary data sets
  • 33:45 - 33:47
    which means, like in the LivingSocial world
  • 33:47 - 33:48
    we had user sign ups
  • 33:48 - 33:49
    constantly streaming in
  • 33:49 - 33:52
    it was a very high volume site
  • 33:52 - 33:54
    if user sign-ups were weird
  • 33:54 - 33:56
    we would get an alert
  • 33:56 - 33:58
    why might they be weird?
  • 33:58 - 34:01
    one thing could be like the user service is
    down, right
  • 34:01 - 34:02
    so then we would get two alerts
  • 34:02 - 34:04
    user sign ups have gone down
  • 34:04 - 34:05
    and so has the service
  • 34:05 - 34:08
    so obviously the problem is the service is
    down
  • 34:08 - 34:10
    let's bring it back up
  • 34:10 - 34:11
    but it could be something like
  • 34:11 - 34:13
    a front-end developer or a designer
  • 34:13 - 34:16
    made a change that was intentional
  • 34:16 - 34:18
    but it just didn't work and no one liked it
  • 34:18 - 34:21
    so they didn't sign up to the site anymore
  • 34:21 - 34:24
    that's more important than just knowing that
    the service is down
  • 34:24 - 34:25
    right, because what you care about
  • 34:25 - 34:27
    isn't that the service is up or down
  • 34:27 - 34:31
    if you could crash the entire system and still
    be making money
  • 34:31 - 34:32
    you don't care, right, that's better
  • 34:32 - 34:35
    throw it away and stop paying for the servers
  • 34:35 - 34:41
    but if your system is up 100% of the time
    and performs excellently
  • 34:41 - 34:43
    but no one's using it, that's bad
  • 34:43 - 34:49
    so monitoring business metrics gives you a
    lot more than unit tests could ever give you
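A crude stand-in for that kind of aberration detection is a z-score check on the metric stream — not how Rearview actually works, just the shape of the idea; the threshold and numbers are invented:

```ruby
# Crude aberration detection: flag the latest value if it sits more
# than `threshold` standard deviations from the historical mean.
# A toy stand-in for real aberration detection; data is invented.
def aberrant?(history, latest, threshold: 3.0)
  mean = history.sum.to_f / history.size
  var  = history.sum { |x| (x - mean)**2 } / history.size
  std  = Math.sqrt(var)
  return false if std.zero?
  ((latest - mean) / std).abs > threshold
end

signups = [120, 118, 125, 121, 119, 123] # sign-ups per minute
aberrant?(signups, 122) # normal traffic, no alert
aberrant?(signups, 3)   # sign-ups fell off a cliff: alert
```

The point is that the check runs on a business metric (sign-ups), not just on disk and memory, so it fires whether the cause is a dead service or a design change nobody liked.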
  • 34:49 - 34:51
    and then in our world
  • 34:51 - 34:52
    we focused on experiencing failure
  • 34:52 - 34:56
    no, you have to come up to front and say ten!
  • 34:56 - 34:59
    ok, ten minutes left
  • 34:59 - 35:02
    when I got to 6WunderKinder in Berlin
  • 35:02 - 35:04
    everyone was terrified to touch the system
  • 35:04 - 35:09
    because they had created a really well-designed
  • 35:09 - 35:12
    but traditional monolithic API
  • 35:12 - 35:14
    so they had layers of abstractions
  • 35:14 - 35:15
    it was all kind of in one big thing
  • 35:15 - 35:17
    they had a huge database
  • 35:17 - 35:20
    and they were really, really scared to do
    anything
  • 35:20 - 35:22
    so there's like one person who would deploy
    anything
  • 35:22 - 35:24
    and everyone else was trying to work on other
    projects
  • 35:24 - 35:26
    and not touch it
  • 35:26 - 35:28
    but it was like the production system
  • 35:28 - 35:30
    you know so it wasn't really an option
  • 35:30 - 35:32
    so the first thing I did in my first week
  • 35:32 - 35:35
    is I got these graphs going
  • 35:35 - 35:39
    and this was, yeah, response time
  • 35:39 - 35:43
    and the first thing I did is I started turning
    off servers
  • 35:43 - 35:44
    and just watching the graphs
  • 35:44 - 35:48
    and then, as I was turning off the servers
  • 35:48 - 35:49
    I went to the production database
  • 35:49 - 35:54
    and I did select, count, star from tasks
  • 35:54 - 35:56
    and we're a task management app
  • 35:56 - 35:58
    so we have hundreds of millions of tasks
  • 35:58 - 36:01
    and the whole thing crashed
  • 36:01 - 36:04
    and all the people were like AAAAH what's
    going on
  • 36:04 - 36:06
    you know, and I said, it's no problem
  • 36:06 - 36:09
    I did this on purpose, I'll just make it come
    back
  • 36:09 - 36:10
    which I did
  • 36:10 - 36:11
    and from that point on
  • 36:11 - 36:13
    like, really every day I would do something
  • 36:13 - 36:17
    which basically crashed the system for just
    a moment
  • 36:17 - 36:20
    and really, like, we had way too many servers
    in production
  • 36:20 - 36:23
    we were spending tens of thousands more Euros
    per month
  • 36:23 - 36:25
    than we should have on the infrastructure
  • 36:25 - 36:27
    and I just started taking things away
  • 36:27 - 36:29
    and I would usually do it
  • 36:29 - 36:31
    instead of the responsible way,
  • 36:31 - 36:32
    like one server at a time
  • 36:32 - 36:34
    I would just remove all of them and start
    adding them back
  • 36:34 - 36:36
    so for a moment everything was down
  • 36:36 - 36:39
    but after that we got to a point where
  • 36:39 - 36:41
    everyone on the team was absolutely comfortable
  • 36:41 - 36:43
    with the worst case scenario
  • 36:43 - 36:45
    of the system being completely down
  • 36:45 - 36:48
    so that we could, in a panic free way
  • 36:48 - 36:51
    just focus on bringing it up when it was bad
  • 36:51 - 36:53
    so now when you do a deployment
  • 36:53 - 36:55
    and you have your business metrics being measured
  • 36:55 - 36:57
    you know the important stuff is happening
  • 36:57 - 37:01
    and you know what to do when everything is
    down
  • 37:01 - 37:03
    you've experienced the worst thing that can
    happen
  • 37:03 - 37:05
    well the worst thing is like someone breaks
    in
  • 37:05 - 37:08
    and steals all your stuff, steals all your
    users' phone numbers
  • 37:08 - 37:10
    and posts them online like SnapChat or something
  • 37:10 - 37:14
    but you've experienced all these potentially
    horrible things
  • 37:14 - 37:17
    and realized, eh, it's not so bad, I can deal
    with this
  • 37:17 - 37:19
    I know what to do
  • 37:19 - 37:22
    it allows you to start making bold moves
  • 37:22 - 37:24
    and that's what we all want right
  • 37:24 - 37:29
    we all want to be able to bravely go into
    our systems
  • 37:29 - 37:30
    and do anything we think is right
  • 37:30 - 37:34
    so that's what I've been focusing on
  • 37:34 - 37:37
    we also do this thing called Canary in the
    Coal Mine deployments
  • 37:37 - 37:39
    which removes the fear, also
  • 37:39 - 37:43
    canary in the coalmine refers to a kind of
    sad thing
  • 37:43 - 37:47
    about coal miners in the US
  • 37:47 - 37:49
    where they would send canaries into the mines
  • 37:49 - 37:50
    at various levels
  • 37:50 - 37:54
    and if the canary died they knew there was
    a problem
  • 37:54 - 37:58
    with the air
  • 37:58 - 37:59
    but in the software world
  • 37:59 - 38:03
    what this means is you have bunch of servers
    running
  • 38:03 - 38:06
    or a bunch of, I don't know, clients running
    a certain version
  • 38:06 - 38:10
    and you start introducing the new version incrementally
  • 38:10 - 38:12
    and watching the effects
  • 38:12 - 38:13
    so once you're measuring everything
  • 38:13 - 38:15
    and monitoring everything
  • 38:15 - 38:17
    you can also start doing these canary in the
    coalmine things
  • 38:17 - 38:19
    where you say OK I have a new version of this
    service
  • 38:19 - 38:20
    that I'm going to deploy
  • 38:20 - 38:23
    and I've got thirty servers running for it
  • 38:23 - 38:26
    but I'm going to change only five of them
    now
  • 38:26 - 38:28
    and see, like, does my error rate increase
  • 38:28 - 38:30
    or does my performance drop on those servers
  • 38:30 - 38:34
    or do people actually not successfully complete
    the task they're trying to do
  • 38:34 - 38:35
    on those servers
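The canary split can be sketched as a comparison of error rates between the canary slice and the baseline fleet — a toy sketch with an invented tolerance, not production code:

```ruby
# Canary check: the new version runs on a slice of the fleet
# (say 5 of 30 servers); promote it only if the canary's error
# rate is not meaningfully worse than the baseline's.
# Tolerance and traffic numbers are illustrative.
def promote_canary?(baseline_errors, baseline_reqs,
                    canary_errors, canary_reqs, tolerance: 0.01)
  baseline_rate = baseline_errors.to_f / baseline_reqs
  canary_rate   = canary_errors.to_f / canary_reqs
  canary_rate <= baseline_rate + tolerance
end

promote_canary?(50, 10_000, 12, 2_000)  # 0.5% vs 0.6%: roll forward
promote_canary?(50, 10_000, 100, 2_000) # 0.5% vs 5%: roll back
```

The same comparison works for latency or task-completion rate; the measurement layer described earlier is what makes the check possible at all.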
  • 38:35 - 38:40
    so, this also allows us the combination of
    monitoring everything
  • 38:40 - 38:42
    and these immutable deployments and everything
  • 38:42 - 38:47
    gives us the ability to gradually affect change
    and not be afraid
  • 38:47 - 38:48
    so we roll out changes all day every day
  • 38:48 - 38:54
    because we don't fear that we're just going
    to destroy the entire system all at once
  • 38:54 - 38:56
    so I think I have like five minutes left
  • 38:56 - 39:00
    uh, these are some things we're not necessarily
    doing yet
  • 39:00 - 39:02
    but they're some ideas that I have
  • 39:02 - 39:05
    that given some free time I will work on
  • 39:05 - 39:09
    and, they're probably more exciting
  • 39:09 - 39:11
    one is I talked about homeostatic regulation
  • 39:11 - 39:14
    and homeostasis
  • 39:14 - 39:17
    so I think we all understand the idea of you
    know homeostasis
  • 39:17 - 39:20
    and the fact that systems have different parts
    that do different roles
  • 39:20 - 39:22
    and can protect each other from each other
  • 39:22 - 39:28
    but, so this diagram is actually just some
    random diagram
  • 39:28 - 39:31
    I copied and pasted off the AWS website
  • 39:31 - 39:34
    so it's not necessarily all that meaningful
  • 39:34 - 39:36
    except to show that every architecture
  • 39:36 - 39:39
    especially server based architectures
  • 39:39 - 39:43
    has a collection of services that play different
    roles
  • 39:43 - 39:45
    and it almost looks like a person
  • 39:45 - 39:47
    you've got a brain and a heart and a liver
  • 39:47 - 39:51
    and all these things, right
  • 39:51 - 39:53
    what would it mean to actually implement
  • 39:53 - 39:57
    homeostatic regulation in a web service?
  • 39:57 - 40:00
    so that you have some controlling system
  • 40:00 - 40:03
    where the database will actually kill an app
    server
  • 40:03 - 40:05
    that is hurting it, for example
  • 40:05 - 40:07
    just kill it
  • 40:07 - 40:09
    I don't know yet, I don't know what that is
  • 40:09 - 40:14
    but some ideas about this stuff
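One toy reading of that homeostatic-regulation idea: a regulator watches load on a shared resource and kills the worst offender when a set point is exceeded — entirely speculative, like the idea itself; every name here is invented:

```ruby
# Toy homeostatic regulator: components report the load they place
# on a shared resource (say, a database); when total load exceeds
# the set point, the regulator kills the worst offender.
# Entirely speculative, as in the talk.
class Regulator
  def initialize(set_point)
    @set_point  = set_point
    @components = {} # name => load
  end

  def report(name, load)
    @components[name] = load
  end

  # Returns the names of any components that were killed.
  def regulate
    return [] if @components.values.sum <= @set_point
    worst, _load = @components.max_by { |_, load| load }
    @components.delete(worst)
    [worst] # "just kill it"
  end
end

reg = Regulator.new(100)
reg.report("app-1", 30)
reg.report("app-2", 90)      # misbehaving app server
killed = reg.regulate        # kills "app-2"
```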
  • 40:14 - 40:16
    I don't know if you've heard of these
  • 40:16 - 40:20
    Netflix, do you have Netflix in India yet?
  • 40:20 - 40:23
    probably not, unless you have a VPN, right
  • 40:23 - 40:27
    Netflix has a really great cloud-based architecture
  • 40:27 - 40:30
    they have this thing called Chaos Monkey they've
    created
  • 40:30 - 40:34
    which goes through their system and randomly
    destroys nodes
  • 40:34 - 40:36
    just crashes servers
  • 40:36 - 40:40
    and they did this because, when they were,
    they were early users of AWS
  • 40:40 - 40:42
    and when they went out initially with AWS,
    servers were crashing
  • 40:42 - 40:44
    like it was still immature
  • 40:44 - 40:46
    so they said OK we still want to use this
  • 40:46 - 40:50
    and we'll build in stuff so that we can deal
    with the crashes
  • 40:50 - 40:52
    but we have to know it's gonna work when it
    crashes
  • 40:52 - 40:55
    so let's make crashing be part of production
  • 40:55 - 40:58
    so they actually have gotten really sophisticated
    now
  • 40:58 - 41:00
    and they will crash entire regions
  • 41:00 - 41:02
    cause they're in multiple data centers
  • 41:02 - 41:04
    so they'll say like, what would happen if
    this
  • 41:04 - 41:06
    data center went down, does the site still
    stay up?
  • 41:06 - 41:08
    and they do this in production all the time
  • 41:08 - 41:10
    like they're crashing servers right now
  • 41:10 - 41:11
    it's really neat
  • 41:11 - 41:14
    another one that is inspirational in this
    way
  • 41:14 - 41:19
    is Pinterest, they use AWS as well
  • 41:19 - 41:22
    and they have, AWS has this thing called Spot
    Instances
  • 41:22 - 41:24
    and I won't go into too much detail
  • 41:24 - 41:26
    because I don't have time
  • 41:26 - 41:30
    but Spot Instances allow you to effectively
  • 41:30 - 41:36
    bid on servers at a price that you are willing
    to pay
  • 41:36 - 41:40
    so like if a usual server costs $0.20 per
    minute
  • 41:40 - 41:42
    you can say, I'll give $0.15 per minute
  • 41:42 - 41:45
    and when excess capacity comes open
  • 41:45 - 41:48
    it's almost like a stock market
  • 41:48 - 41:50
    if $0.15 is the going price, you'll get a
    server
  • 41:50 - 41:52
    and it starts up and it runs what you want
  • 41:52 - 41:54
    but here's the cool thing
  • 41:54 - 42:00
    if the stock market goes and the price goes
    higher than you're willing to pay
  • 42:00 - 42:03
    Amazon will just turn off those servers
  • 42:03 - 42:05
    they're just dead, you don't have any warning
  • 42:05 - 42:07
    they're just dead
  • 42:07 - 42:11
    so Pinterest uses this for their production
    servers
  • 42:11 - 42:14
    which means they save a lot of money
  • 42:14 - 42:17
    they're paying way under the average Amazon
    cost for hosting
  • 42:17 - 42:19
    but the really cool thing in my opinion
  • 42:19 - 42:21
    is not the money they save but the fact that
  • 42:21 - 42:26
    like, what would you have to do to build a
    full system
  • 42:26 - 42:29
    where any node can and will die at any moment
  • 42:29 - 42:31
    and it's not even under your control
  • 42:31 - 42:34
    that's really exciting
  • 42:34 - 42:36
    so a simple thing you can do for homeostasis
    though
  • 42:36 - 42:38
    is you can just adjust
  • 42:38 - 42:39
    so in our world we have multiple nodes
  • 42:39 - 42:41
    and all these little services
  • 42:41 - 42:43
    we can scale each one independently
  • 42:43 - 42:45
    we're measuring everything
  • 42:45 - 42:46
    so Amazon has a thing called Auto Scaling
  • 42:46 - 42:49
    we don't use it, we do our own scaling
  • 42:49 - 42:54
    and we just do it based on volume and performance
  • 42:54 - 42:58
    now when you have a bunch of services like
    this
  • 42:58 - 43:01
    like, I don't know, maybe we have fifty different
    services now
  • 43:01 - 43:03
    that each play tiny little roles
  • 43:03 - 43:07
    it becomes difficult to figure out, like,
    where things are
  • 43:07 - 43:11
    so we've started implementing ZooKeeper for
    service resolution
  • 43:11 - 43:14
    which means a service can come online and
    say
  • 43:14 - 43:18
    I'm the reminder service version 2.3
  • 43:18 - 43:19
    and then tell a central guardian
  • 43:19 - 43:22
    and the zookeeper can then route traffic to
    it
  • 43:22 - 43:24
    probably too detailed for now
  • 43:24 - 43:28
    I'm gonna skip over some stuff real quick
  • 43:28 - 43:29
    but I want to talk about this one
  • 43:29 - 43:34
    if, did the Nordic Ruby, no, Nordic Ruby talks
    never go online
  • 43:34 - 43:35
    so you can never see this talk
  • 43:35 - 43:37
    sorry
  • 43:37 - 43:41
    at Nordic Ruby Reginald Braithwaite did a
    really cool talk
  • 43:41 - 43:44
    on like challenges of the Ruby language
  • 43:44 - 43:45
    and he made this statement
  • 43:45 - 43:49
    Ruby has beautiful but static coupling
  • 43:49 - 43:51
    which was really strange
  • 43:51 - 43:53
    but basically he was making the same point
    that
  • 43:53 - 43:54
    I was talking about earlier
  • 43:54 - 43:59
    that, like Ruby creates a bunch of ways that
    you can couple
  • 43:59 - 44:01
    your system together
  • 44:01 - 44:03
    that kind of screw you in the end
  • 44:03 - 44:04
    but they're really beautiful to use
  • 44:04 - 44:10
    but, like, Ruby can really lead to some deep
    crazy coupling
  • 44:10 - 44:14
    and so he presented this idea of bind by contract
  • 44:14 - 44:18
    and bind by contract, in a Ruby sense
  • 44:18 - 44:23
    would be, like, I have a class that has a
    method
  • 44:23 - 44:26
    that takes these parameters under these conditions
  • 44:26 - 44:29
    and I can kind of put it into my VM
  • 44:29 - 44:32
    and whenever someone needs to have a functionality
    like that
  • 44:32 - 44:35
    it will be automatically bound together
  • 44:35 - 44:37
    by the fact that it can do that thing
  • 44:37 - 44:41
    and instead of how we tend to use Ruby and
    Java and other languages
  • 44:41 - 44:43
    I have a class with a method name I'm going
    to call it
  • 44:43 - 44:45
    right, that's coupling
  • 44:45 - 44:48
    but he proposed this idea of this decoupled
    system
  • 44:48 - 44:51
    where you just say I need a functionality
    like this
  • 44:51 - 44:53
    that works under the conditions that I have
    present
  • 44:53 - 44:55
    so this lead me to this idea
  • 44:55 - 44:59
    and this may be like way too weird, I don't
    know
  • 44:59 - 45:03
    what if in your web application your routes
    file
  • 45:03 - 45:08
    for your services read like a functional pattern
    matching syntax
  • 45:08 - 45:11
    so like if you've ever used Erlang or Haskell
    or Scala
  • 45:11 - 45:15
    any of these things that have functional pattern
    matching
  • 45:15 - 45:19
    what if you could then route to different
    services
  • 45:19 - 45:21
    across a bunch of different services
  • 45:21 - 45:23
    based on contract
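A toy version of bind-by-contract in Ruby: handlers register a predicate describing the requests they can serve, and the router dispatches to whichever contract matches, instead of calling a named method on a named class — a sketch of the idea, not Reg's proposal verbatim:

```ruby
# Toy bind-by-contract router: handlers declare a contract (here,
# a predicate lambda) and requests are bound to whichever handler's
# contract they satisfy. All route shapes are invented.
class ContractRouter
  def initialize
    @routes = [] # [contract, handler] pairs, checked in order
  end

  def bind(contract, &handler)
    @routes << [contract, handler]
  end

  def route(request)
    _contract, handler = @routes.find { |contract, _| contract.call(request) }
    raise "no contract matches #{request.inspect}" unless handler
    handler.call(request)
  end
end

router = ContractRouter.new
router.bind(->(r) { r[:verb] == :get && r[:path] =~ %r{\A/tasks/\d+\z} }) do |r|
  "task #{r[:path].split('/').last}"
end
router.bind(->(r) { r[:verb] == :post }) { |_r| "created" }

router.route(verb: :get, path: "/tasks/42") # dispatched by contract
```

Spread the same matching across processes and you get the pattern-matching routes file imagined above, with services bound by what they can do rather than by name.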
  • 45:23 - 45:27
    now I have zero time left
  • 45:27 - 45:29
    but I'm just gonna keep talking, cause I'm
    mean
  • 45:29 - 45:30
    oh wait I'm not allowed to be mean
  • 45:30 - 45:32
    because of the code of conduct
  • 45:32 - 45:35
    so I'll wrap up
  • 45:35 - 45:39
    so this is an idea that I've started working
    on as well
  • 45:39 - 45:41
    where I would actually write an Erlang service
  • 45:41 - 45:43
    with this sort of functional pattern matching
  • 45:43 - 45:46
    but have it be routing in really fast real
    time
  • 45:46 - 45:49
    through back end services that support it
  • 45:49 - 45:51
    one more thing I just want to show you real
    quick
  • 45:51 - 45:54
    that I am working on and I want to show you
  • 45:54 - 45:58
    because I want you to help me
  • 45:58 - 46:01
    has anyone used JSON schema?
  • 46:01 - 46:06
    OK, you people are my friends for the rest
    of the conference
  • 46:06 - 46:08
    in a system where you have all these things
    talking to each other
  • 46:08 - 46:11
    you do need a way to validate the inputs and
    outputs
  • 46:11 - 46:16
    but I don't want to generate code that parses
    and creates JSON
  • 46:16 - 46:21
    I don't want to do something in real time
    that intercepts my
  • 46:21 - 46:24
    kind of traffic, so there's this thing called
    JSON schema
  • 46:24 - 46:27
    that allows you to, in a completely decoupled
    way
  • 46:27 - 46:31
    specify JSON documents and how they should
    interact
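A hand-rolled subset of that kind of validation might look like this — this is not Klagen and not an existing JSON Schema library, just a minimal illustration of checking documents against a schema-like contract:

```ruby
require "json"

# Minimal JSON-Schema-like check: required keys plus type checks.
# A hand-rolled subset for illustration only; real JSON Schema
# supports far more keywords.
TYPES = {
  "string"  => [String],
  "integer" => [Integer],
  "boolean" => [TrueClass, FalseClass]
}

def valid?(doc, schema)
  schema.fetch("required", []).all? { |k| doc.key?(k) } &&
    schema.fetch("properties", {}).all? do |key, spec|
      next true unless doc.key?(key) # absent optional keys pass
      TYPES.fetch(spec["type"], []).any? { |t| doc[key].is_a?(t) }
    end
end

schema = {
  "required"   => ["title"],
  "properties" => { "title" => { "type" => "string" },
                    "done"  => { "type" => "boolean" } }
}

doc = JSON.parse('{"title": "buy milk", "done": false}')
valid?(doc, schema)             # conforming document
valid?({ "done" => 1 }, schema) # title missing, done wrong type
```

Because the schema lives outside the services, each side can validate its inputs and outputs without generating parsing code or coupling to the other's internals.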
  • 46:31 - 46:36
    and I am working on a new thing that's called
    Klagen
  • 46:36 - 46:38
    which is the German word for complain
  • 46:38 - 46:42
    it's written in Scala, so if anyone wants
    to pair up on some Scala stuff
  • 46:42 - 46:48
    what it will be is a high performance asynchronous
    JSON schema validation middleware
  • 46:48 - 46:53
    so if that's interesting to anyone, even if
    you don't know Scala or JSON schema
  • 46:53 - 46:54
    please let me know
  • 46:54 - 46:57
    and I believe I'm out of time so I'm just
    gonna end there
  • 46:57 - 46:59
    am I right? I'm right, yes
  • 46:59 - 47:02
    so thank you very much, and let's talk during
    the conference
Duration:
47:37

English subtitles
