
Garden City Ruby 2014 - Keynote by Chad Fowler

  • 0:25 - 0:27
    Chad: Yes, hello, thank you.
  • 0:27 - 0:29
    Audience member: Hello!
  • 0:29 - 0:31
    Chad: Hello!
  • 0:31 - 0:34
    I am Chad, as he said.
  • 0:34 - 0:35
    He said I need no introduction
  • 0:35 - 0:38
    so I won't introduce myself any further.
  • 0:38 - 0:45
    I may be the biggest non-Indian fan of India
  • 1:02 - 1:06
    [Hindi speech]
  • 1:06 - 1:13
  • 1:15 - 1:18
    I'll now switch back, sorry.
  • 1:18 - 1:20
    If you don't understand Hindi, I said nothing
    of value
  • 1:20 - 1:22
    and it was all wrong.
  • 1:22 - 1:24
    But I was saying that my Hindi is bad
  • 1:24 - 1:26
    and it's because now I'm learning German
  • 1:26 - 1:28
    so I mixed them together, but I know not everyone
  • 1:28 - 1:29
    speaks Hindi here.
  • 1:29 - 1:32
    I just had to show off, you know
  • 1:32 - 1:37
    So, I am currently working at 6Wunderkinder,
  • 1:37 - 1:40
    and I'm working on a product called Wunderlist.
  • 1:40 - 1:42
    It is a productivity application.
  • 1:42 - 1:46
    It runs on every client you can think of.
  • 1:46 - 1:48
    We have native clients, we have a back-end,
  • 1:48 - 1:50
    we have millions of active users,
  • 1:50 - 1:52
    and I'm telling you this not so that you'll
    go download it -
  • 1:52 - 1:53
    you can do that too -
  • 1:53 - 1:57
    but I want to tell you about the challenges
    that I have
  • 1:57 - 2:01
    and the way I'm starting to think about systems
    architecture and design.
  • 2:01 - 2:03
    That's what I'm gonna talk about today
  • 2:03 - 2:06
    I'm going to show you some things that are
    real
  • 2:06 - 2:07
    and that we're really doing.
  • 2:07 - 2:09
    I'm going to show you some things that are
  • 2:09 - 2:13
    just a fantasy that maybe don't make any sense
    at all.
  • 2:13 - 2:14
    But hopefully I'll get you to think about
  • 2:14 - 2:16
    how we think about system architecture
  • 2:16 - 2:18
    and how we build things that can last for
    a long time.
  • 2:18 - 2:21
    So the first thing that I want to mention:
  • 2:21 - 2:23
    this is a graph from the Standish Chaos report
  • 2:23 - 2:25
    and I've taken the years out
  • 2:25 - 2:27
    and I've taken some of the raw data out
  • 2:27 - 2:29
    because it doesn't matter.
  • 2:29 - 2:31
    If you look at these, this graph,
  • 2:31 - 2:33
    each one of these bars is a year,
  • 2:33 - 2:38
    and each bar represents successful projects
    in green -
  • 2:38 - 2:40
    software projects.
  • 2:40 - 2:42
    Challenged projects are in silver or white
    in the middle
  • 2:42 - 2:44
    and then failed ones are in red.
  • 2:44 - 2:47
    But challenged means significantly over time
    or budget
  • 2:47 - 2:49
    which to me means failed too.
  • 2:49 - 2:51
    So basically we're terrible,
  • 2:51 - 2:54
    all of us here, we're terrible.
  • 2:54 - 2:57
    We call ourselves engineers but it's a disgrace.
  • 2:57 - 3:01
    We very rarely actually launch things that
    work.
  • 3:01 - 3:01
    Kind of sad,
  • 3:01 - 3:04
    and I am here to bring you down.
  • 3:04 - 3:07
    Then once you launch software, anecdotally,
  • 3:07 - 3:12
    and you probably would see this in your own
    work lives, too,
  • 3:12 - 3:16
    anecdotally, software gets killed after about
    five years -
  • 3:16 - 3:18
    business software.
  • 3:18 - 3:20
    So you barely ever get to launch it, because,
  • 3:20 - 3:23
    or at least successfully, in a way that you're
    proud of,
  • 3:23 - 3:25
    and then in about five years
  • 3:25 - 3:28
    you end up in that situation where you're
    doing a big rewrite
  • 3:28 - 3:30
    and throwing everything away and replacing
    it.
  • 3:30 - 3:33
    You know there's always that project to get
    rid of the junk,
  • 3:33 - 3:36
    old Java code or whatever that you wrote five
    years ago,
  • 3:36 - 3:37
    replace it with Ruby now,
  • 3:37 - 3:40
    five years from now you'll be replacing your
    old junk Ruby code
  • 3:40 - 3:46
    that didn't work with something else.
  • 3:46 - 3:49
    We create this thing, probably all of you
    know the term legacy software -
  • 3:49 - 3:53
    Right, am I right? You know what legacy software
    is,
  • 3:53 - 3:56
    and you probably think of it as a negative
    thing.
  • 3:56 - 3:58
    You think of it as that ugly code that doesn't
    work,
  • 3:58 - 4:03
    that's brittle, that you can't change, that
    you're all afraid of.
  • 4:03 - 4:07
    But there's actually also a positive connotation
    of the word legacy:
  • 4:07 - 4:14
    it's leaving behind something that future
    generations can benefit from.
  • 4:14 - 4:17
    But if we're rarely ever launching successful
    projects
  • 4:17 - 4:21
    and then the ones we do launch tend to die
    within five years
  • 4:21 - 4:25
    none of us are actually creating a legacy
    in our work.
  • 4:25 - 4:27
    We're just creating stuff that gets thrown
    away.
  • 4:27 - 4:29
    Kind of sad.
  • 4:29 - 4:32
    So we create this stuff that's legacy software.
  • 4:32 - 4:35
    It's hard to change, that's why it ends up
    getting thrown away
  • 4:35 - 4:37
    right, that's, if the software worked
  • 4:37 - 4:40
    and you could keep changing it to meet the
    needs of the business
  • 4:40 - 4:44
    you wouldn't need to do a big rewrite and
    throw it away.
  • 4:44 - 4:48
    We create these huge tightly-coupled systems,
  • 4:48 - 4:49
    and I don't just mean one application,
  • 4:49 - 4:51
    but like many applications are all tightly
    coupled.
  • 4:51 - 4:56
    You've got this thing over here talking to
    the database of this system over here
  • 4:56 - 4:59
    so if you change the columns to update the
    view of a webpage
  • 4:59 - 5:03
    you ruin your billing system, that kind of
    thing
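As an illustration of the kind of coupling described here, a minimal Ruby sketch (the names and numbers are invented, not code from the talk). Billing reads the web app's data directly, so renaming a field to change a view breaks invoicing; the decoupled version only touches a small interface the web app promises to keep stable:

```ruby
# Hypothetical data the web app owns; imagine columns in its database.
WEB_APP_USERS = [
  { id: 1, name: "Asha", plan: "pro" }
]

# Tightly coupled: billing reaches straight into the other system's
# storage and depends on its column names.
def invoice_amount_coupled(user_id)
  user = WEB_APP_USERS.find { |u| u[:id] == user_id }
  user[:plan] == "pro" ? 100 : 0   # breaks if :plan is renamed for a view change
end

# Decoupled: billing only talks to a public interface, so the web app
# can reshape its columns freely.
module UserService
  def self.plan_for(user_id)
    user = WEB_APP_USERS.find { |u| u[:id] == user_id }
    user && user[:plan]
  end
end

def invoice_amount(user_id)
  UserService.plan_for(user_id) == "pro" ? 100 : 0
end
```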
  • 5:03 - 5:06
    this is what makes it so hard to change
  • 5:06 - 5:10
    and the sad thing about this is the way we
    work
  • 5:10 - 5:14
    the way we develop software, this is the default
    setting
  • 5:14 - 5:18
    and, what I mean is, if we were robots churning
    out software
  • 5:18 - 5:21
    and we had a preferences panel
  • 5:21 - 5:25
    the default preferences would lead to us creating
    terrible software that gets thrown away in
  • 5:25 - 5:26
    five years
  • 5:26 - 5:27
    that's just how we all work
  • 5:27 - 5:30
    as human beings when we sit down to write
    code
  • 5:30 - 5:35
    our default instincts lead us to create
    systems that are tightly coupled
  • 5:35 - 5:42
    and hard to change and ultimately get thrown
    away and can't scale
  • 5:42 - 5:46
    we create, we try doing tests, we try doing
    TDD
  • 5:46 - 5:51
    but we create test suites that take forty-five
    minutes to run
  • 5:51 - 5:53
    every team has had to deal with this I'm sure
  • 5:53 - 5:56
    if you've written any kind of meaningful application
  • 5:56 - 5:58
    and it gets to where you have like a project
  • 5:58 - 6:00
    to speed up the test suite
  • 6:00 - 6:03
    like you start focusing your company's resources
  • 6:03 - 6:05
    on making the test suite faster
  • 6:05 - 6:09
    or making it like only fail ninety percent
    of the time
  • 6:09 - 6:11
    and then you say well if it only fails ninety
    percent that's OK
  • 6:11 - 6:15
    right, and right now it's taking forty-five
    minutes
  • 6:15 - 6:18
    we want to get it to where it only takes ten
    minutes to run
  • 6:18 - 6:24
    so the test suite ends up being a liability
    instead of a benefit
  • 6:24 - 6:26
    because of the way you do it
  • 6:26 - 6:29
    because you have this architecture where everything
    is so coupled
  • 6:29 - 6:35
    you can't change anything without spending
    hours working on the stupid test suite
  • 6:35 - 6:38
    and you're terrified to deploy
  • 6:38 - 6:43
    I know like the last big Java project I was
    working on
  • 6:43 - 6:46
    it would take, once a week we did a deploy
  • 6:46 - 6:50
    it would take fifteen people all night to
    deploy the thing
  • 6:50 - 6:52
    and usually it was like copying class files
    around
  • 6:52 - 6:54
    and restarting servers
  • 6:54 - 6:57
    it's much better today but it's still terrifying
  • 6:57 - 6:59
    you deploy code, you change it in production
  • 6:59 - 7:01
    you're not sure what might break
  • 7:01 - 7:04
    cause it's really hard to test these big integrated
    things together
  • 7:04 - 7:09
    and actually upgrading the technology component
    is terrifying
  • 7:09 - 7:13
    so, how many of you have been doing Rails
    for more than three years?
  • 7:13 - 7:18
    do you have, like a Rails 2 app in production,
    anyone? Yeah?
  • 7:18 - 7:22
    that's a lot of people, wow, that's terrifying
  • 7:22 - 7:26
    and I've been in situations, recently, where
    we had Rails 2 apps in production
  • 7:26 - 7:30
    security patches are coming out, we were applying
    our own versions
  • 7:30 - 7:31
    of those security patches
  • 7:31 - 7:32
    because we were afraid to upgrade Rails
  • 7:32 - 7:35
    we would rather hack it than upgrade the thing
  • 7:35 - 7:38
    because you just don't know what's gonna happen
  • 7:38 - 7:42
    and then you end up, as you're re-implementing
    all this stuff yourself
  • 7:42 - 7:45
    you end up burning yourself out, wasting your
    time
  • 7:45 - 7:48
    because you're hacking on stupid Rails 2
  • 7:48 - 7:50
    or some old struts version
  • 7:50 - 7:53
    when you should be just taking advantage of
    the new patches
  • 7:53 - 7:55
    but you can't because you're afraid to upgrade
    the software
  • 7:55 - 7:56
    because you don't know what's going to happen
  • 7:56 - 8:03
    because the system is too big and too scary
  • 8:03 - 8:05
    then, and this is really bad, I think this
    is something
  • 8:05 - 8:07
    Ruby messes up for all of us
  • 8:07 - 8:11
    I say this as someone who's been using Ruby
    for thirteen years now
  • 8:11 - 8:13
    happily
  • 8:13 - 8:16
    we create these mountains of abstractions
  • 8:16 - 8:18
    and the logic ends up being buried inside
    them
  • 8:18 - 8:23
    I mean in Java it was like static, or, you
    know, factories
  • 8:23 - 8:25
    and design pattern soup
  • 8:25 - 8:27
    in Ruby its modules and mixins and you know
  • 8:27 - 8:31
    we have all these crazy ways of hiding what's
    actually happening from us
  • 8:31 - 8:33
    but when you go look at the code
  • 8:33 - 8:34
    it's completely opaque
  • 8:34 - 8:37
    you have no idea where the stuff actually
    gets done
  • 8:37 - 8:41
    because it's in some magic library somewhere
  • 8:41 - 8:45
    and we do all that because we're trying to
    save ourselves from the complexity of these
  • 8:45 - 8:47
    big nasty systems
  • 8:47 - 8:51
    but like if you look at the rest of the world
  • 8:51 - 8:54
    this is a software specific problem
  • 8:54 - 8:59
    these cars are old, they're older than any
    software that you would ever run
  • 8:59 - 9:00
    and they're still driving down the street
  • 9:00 - 9:03
    they're older than software itself, right
  • 9:03 - 9:06
    but these things still function, they still
    work
  • 9:06 - 9:09
    how? why? why do they work?
  • 9:09 - 9:11
    bodies! my body should not work
  • 9:11 - 9:13
    I have abused it
  • 9:13 - 9:14
    I should not be standing here today
  • 9:14 - 9:17
    I shouldn't have been able to come from Berlin
    here
  • 9:17 - 9:19
    without dying somehow by being in the air
  • 9:19 - 9:24
    you know, by the air pressure changes
  • 9:24 - 9:26
    but our bodies somehow can survive even when
  • 9:26 - 9:31
    we don't take care of them
  • 9:31 - 9:35
    and like it's just the system that works,
    right
  • 9:35 - 9:38
    so how do our bodies work?
  • 9:38 - 9:39
    how do we stay alive
  • 9:39 - 9:41
    despite this fact
  • 9:41 - 9:42
    even though we haven't done like some
  • 9:42 - 9:45
    great design, we don't have any design patterns
  • 9:45 - 9:50
    like mixed up into our bodies
  • 9:50 - 9:54
    in biology there is a term called homeostasis
  • 9:54 - 9:56
    and I literally don't know what this means
  • 9:56 - 9:57
    other than this definition
  • 9:57 - 9:59
    so you won't learn about this from me
  • 9:59 - 10:01
    there's probably at least one biologist in
    the room
  • 10:01 - 10:04
    so you can correct me later
  • 10:04 - 10:08
    but basically the idea of homeostasis is
  • 10:08 - 10:11
    that an organism has all these different components
  • 10:11 - 10:14
    that serve different purposes
  • 10:14 - 10:16
    that regulate it
  • 10:16 - 10:18
    so they're all kind of in balance
  • 10:18 - 10:21
    and they work together to regulate the system
  • 10:21 - 10:24
    if one component, like a liver, does too much
  • 10:24 - 10:25
    or does the wrong thing
  • 10:25 - 10:28
    another component kicks in and fixes it
  • 10:28 - 10:30
    and so our bodies are this well designed system
  • 10:30 - 10:32
    for staying alive
  • 10:32 - 10:35
    because we have almost like autonomous agents
  • 10:35 - 10:39
    internally that take care of the many things
    that can and do go wrong
  • 10:39 - 10:42
    on a regular basis
  • 10:42 - 10:44
    so you have, you know, your brain, your liver
  • 10:44 - 10:47
    your liver, of course, metabolizes toxic substances
  • 10:47 - 10:50
    your kidney deals with blood, water level,
    et cetera
  • 10:50 - 10:56
    you know all these things work in concert
    to make you live
  • 10:56 - 11:01
    the inability to continue to do that is known
    as homeostatic imbalance
  • 11:01 - 11:04
    so I was saying, homeostasis is balancing
  • 11:04 - 11:07
    not being able to do that is when you're out
    of balance
  • 11:07 - 11:10
    and that will actually lead to really bad
    health problems
  • 11:10 - 11:16
    or probably death, if you fall into homeostatic
    imbalance
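One way to carry the homeostasis metaphor into code is a supervisor loop: each component reports health, and a regulator restarts whatever has failed, keeping the whole in balance. This is a made-up sketch of the idea, not anything from Wunderlist:

```ruby
# Hypothetical "homeostasis" for software: components can fail,
# and another part of the system kicks in and fixes them.
Component = Struct.new(:name, :healthy) do
  def restart!
    self.healthy = true   # stand-in for respawning a real process
  end
end

# The regulator: scans every component and repairs the unhealthy ones,
# so the system as a whole stays in balance.
def regulate(components)
  components.each { |c| c.restart! unless c.healthy }
  components
end

organs = [Component.new("liver", true), Component.new("kidney", false)]
regulate(organs)   # the failed "kidney" is brought back
```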
  • 11:16 - 11:20
    so the good news is you're already dying
  • 11:20 - 11:22
    like we're all dying all the time
  • 11:22 - 11:26
    this is the beautiful thing about death
  • 11:26 - 11:29
    there is, there is an estimate that fifty
    trillion cells
  • 11:29 - 11:32
    are in your body, and three million die per
    second
  • 11:32 - 11:36
    it's an estimate because it's actually impossible
    to count
  • 11:36 - 11:40
    but scientists have figured out somehow that
    this is probably the right number
  • 11:40 - 11:42
    so your cells, you've probably heard this
    all your life
  • 11:42 - 11:45
    like physically, after some amount of time,
  • 11:45 - 11:47
    you aren't the same human being that you were,
    physically
  • 11:47 - 11:53
    you know, I don't know, some period of
    time ago
  • 11:53 - 11:56
    you're literally not the same organism anymore
  • 11:56 - 11:58
    but you're the same system
  • 11:58 - 12:01
    kind of interesting, isn't it
  • 12:01 - 12:07
    so in a way you can think about software this way
  • 12:07 - 12:08
    you can think about software as a system
  • 12:08 - 12:11
    if the components could be replaced like these
    cells
  • 12:11 - 12:18
    like, if you focus on making death, constant
    death OK
  • 12:19 - 12:20
    on a small level
  • 12:20 - 12:25
    then the system can live on a large level
  • 12:25 - 12:26
    that's what this talk is about
  • 12:26 - 12:29
    solution, the solution being to mimic living
    organisms
  • 12:29 - 12:36
    and as an aside, I will say many times the
    word small or tiny in this talk
  • 12:36 - 12:38
    because I think I'm learning, as I age
  • 12:38 - 12:40
    that small is good
  • 12:40 - 12:43
    its, small projects are good
  • 12:43 - 12:44
    you know how to estimate them
  • 12:44 - 12:45
    small commitments are good
  • 12:45 - 12:47
    because you know you can make them
  • 12:47 - 12:48
    small methods are good
  • 12:48 - 12:49
    small classes are good
  • 12:49 - 12:50
    small applications are good
  • 12:50 - 12:52
    small teams are good
  • 12:52 - 12:55
    so I don't know, this is sort of a non sequitur
  • 12:55 - 12:58
    so if we're going to think about software
  • 12:58 - 13:00
    as like an organism
  • 13:00 - 13:03
    what is a cell in that context?
  • 13:03 - 13:06
    this is sort of the key question that you
    have to ask yourself
  • 13:06 - 13:09
    and I say that a cell is a tiny component
  • 13:09 - 13:13
    now, tiny and component are both subjective
    words
  • 13:13 - 13:15
    so you can kind of do what you want with that
  • 13:15 - 13:18
    but it's a good frame of thinking
  • 13:18 - 13:21
    if you make your software system of tiny components
  • 13:21 - 13:23
    each one can be like a cell
  • 13:23 - 13:28
    each one can die and the system is a collection
    of those tiny components
  • 13:28 - 13:32
    and what you want is not for your code to
    live forever
  • 13:32 - 13:36
    you don't care that each line of code lives
    forever, right
  • 13:36 - 13:39
    like if you're trying to develop a legacy
    in software
  • 13:39 - 13:43
    it's not important to you that your
    System.out.println statement
  • 13:43 - 13:44
    lives for ten years
  • 13:44 - 13:48
    it's important to you that the function of
    the system lives for ten years
  • 13:48 - 13:50
    so like, about exactly ten years ago
  • 13:50 - 13:57
    we created RubyGems at RubyConf 2003
    in Austin, Texas
  • 13:59 - 14:04
    I haven't touched RubyGems myself in like
    four or five years
  • 14:04 - 14:05
    but people are still using it
  • 14:05 - 14:06
    they hate it because it's software
  • 14:06 - 14:08
    everybody hates software right
  • 14:08 - 14:10
    so if you can create software that people
    hate
  • 14:10 - 14:13
    you've succeeded
  • 14:13 - 14:14
    but it still exists
  • 14:14 - 14:17
    I have no idea if any of the code is the same
  • 14:17 - 14:17
    I would assume not
  • 14:17 - 14:21
    you know I think, I'm sure that my name is
    still in it in a copyright notice
  • 14:21 - 14:24
    but that's about it
  • 14:24 - 14:25
    and that's a beautiful thing
  • 14:25 - 14:28
    people are still using it to install Ruby
    libraries
  • 14:28 - 14:30
    and software
  • 14:30 - 14:36
    and I don't care if any of my existing, or
    my initial code is still in the system
  • 14:36 - 14:37
    because the system still lives
  • 14:37 - 14:43
    so, quite a long time ago now I was researching
    this kind of question
  • 14:43 - 14:45
    about legacy software
  • 14:45 - 14:48
    and I asked a question on Twitter as I often
    do at conferences
  • 14:48 - 14:50
    when I'm preparing
  • 14:50 - 14:56
    what are some of the old surviving software
    systems you regularly use
  • 14:56 - 14:58
    and if you look at this, I mean, one thing
    is obviously
  • 14:58 - 15:03
    everyone who answered gave some sort of Unix
    related answer
  • 15:03 - 15:07
    but basically all of these things on this
    list
  • 15:07 - 15:13
    are either systems that are collections of
    really well-known split-up components
  • 15:13 - 15:16
    or they're tiny, tiny programs
  • 15:16 - 15:19
    so, like, grep is a tiny program, make
  • 15:19 - 15:20
    it only does one thing
  • 15:20 - 15:24
    well make is actually also arguably an operating
    system
  • 15:24 - 15:27
    but I won't get into that
  • 15:27 - 15:29
    emacs is obviously an operating system, right
  • 15:29 - 15:33
    but it's well designed of these tiny little
    pieces
  • 15:33 - 15:37
    so a lot of the old systems I know about follow
    this pattern
  • 15:37 - 15:40
    this metaphor that I'm proposing
  • 15:40 - 15:42
    and from my own career
  • 15:42 - 15:44
    when I was here before in Bangalore
  • 15:44 - 15:47
    I worked for GE and some of the people
  • 15:47 - 15:49
    we hired even worked on the system there
  • 15:49 - 15:51
    we had a system called the Bull
  • 15:51 - 15:54
    and it was a Honeywell Bull mainframe
  • 15:54 - 15:57
    I doubt any of you have worked on that
  • 15:57 - 15:58
    but this one I know you didn't work on
  • 15:58 - 16:01
    because it had a custom operating system
  • 16:01 - 16:03
    with our own RDBMS
  • 16:03 - 16:06
    we had created a TCP stack for it
  • 16:06 - 16:11
    using like custom hardware that we plugged
    into a Windows NT computer
  • 16:11 - 16:15
    with some sort of NT queuing system back in
    the day
  • 16:15 - 16:17
    it was this terrifying thing
  • 16:17 - 16:23
    when I started working there the system was
    already something like twenty-five years old
  • 16:23 - 16:26
    and I believe even though there have been
    many, many projects
  • 16:26 - 16:30
    to try to kill it, like we had a team called
    the Bull exit team
  • 16:30 - 16:33
    I believe the system is still in production
  • 16:33 - 16:37
    not as much as it used to be, there are less
    and less functions in production
  • 16:37 - 16:39
    but I believe the system is still in production
  • 16:39 - 16:46
    the reason for this is that the system was
    actually made up of these tiny little components
  • 16:47 - 16:51
    and like really clear interfaces between them
  • 16:51 - 16:54
    and we kept the system live because every
    time we tried to replace it
  • 16:54 - 16:57
    with some fancy new gem, web thing, or GUI
    app
  • 16:57 - 16:59
    it wasn't as good, and the users hated it
  • 16:59 - 17:01
    it just didn't work
  • 17:01 - 17:05
    so we had to use this old, crazy, modified
    mainframe
  • 17:05 - 17:08
    for a long time as a result
  • 17:08 - 17:11
    so, the question I ask myself is now
  • 17:11 - 17:13
    how do I, how do I approach a problem like
    this
  • 17:13 - 17:19
    and build a system that can survive for a
    long time
  • 17:19 - 17:20
    I would encourage you
  • 17:20 - 17:23
    how many of you know of Fred George
  • 17:23 - 17:25
    this is Fred George
  • 17:25 - 17:26
    he was at ThoughtWorks for a while
  • 17:26 - 17:28
    so he may have, I think he lived in Bangalore
  • 17:28 - 17:31
    for some time with ThoughtWorks, in fact
  • 17:31 - 17:35
    he is now running a start-up in Silicon Valley
  • 17:35 - 17:39
    but he has this talk that you can watch online
  • 17:39 - 17:42
    from the Barcelona Ruby Conference the year
    before last
  • 17:42 - 17:45
    called Microservice Architectures
  • 17:45 - 17:48
    and he talks in great detail about
  • 17:48 - 17:50
    how he implemented a concept at Forward
  • 17:50 - 17:52
    that's very much like what I'm talking about
  • 17:52 - 17:55
    tiny components that only do one thing and
    can be thrown away
  • 17:55 - 18:00
    so Microservice Architecture is kind of the
    core of what I'm gonna talk about
  • 18:00 - 18:02
    now I've put together some rules for 6Wunderkinder
  • 18:02 - 18:04
    which I am going to share with you
  • 18:04 - 18:07
    6Wunderkinder is the company I work for
  • 18:07 - 18:09
    where we're working on Wunderlist
  • 18:09 - 18:12
    and the rules of the, the goals of these rules
  • 18:12 - 18:17
    are to reduce coupling, to make it where we
    can do fear-free deployments
  • 18:17 - 18:19
    we reduce the chance of "cruft" in our code
  • 18:19 - 18:21
    like nasty stuff that you're afraid of
  • 18:21 - 18:25
    that you leave there, kind of broken window
    problems
  • 18:25 - 18:29
    we make it literally trivial to change code
  • 18:29 - 18:33
    so you just never have to ask how do I do
    that
  • 18:33 - 18:34
    you just find it easy
  • 18:34 - 18:39
    and most importantly we give ourselves the
    freedom to go fast
  • 18:39 - 18:44
    because I think no developer ever wants to
    be slow
  • 18:44 - 18:45
    that's one of the worst things
  • 18:45 - 18:48
    just toiling away and not actually accomplishing
    anything
  • 18:48 - 18:51
    but we go slow because we're constrained by
    the system
  • 18:51 - 18:54
    and we're constrained by, sometimes projects
  • 18:54 - 18:56
    and other, you know, management related things
  • 18:56 - 19:01
    but often times its the mess of the system
    that we've created
  • 19:01 - 19:04
    so some of the rules
  • 19:04 - 19:09
    I think one thing, and maybe, maybe I'm going
    to get some push back from this crowd
  • 19:09 - 19:13
    one rule that is less controversial than it
    used to be
  • 19:13 - 19:15
    is that comments are a design smell
  • 19:15 - 19:19
    does anyone strongly disagree with that?
  • 19:19 - 19:21
    no?
  • 19:21 - 19:24
    does anyone strongly agree with that?
  • 19:24 - 19:27
    OK, so the rest of you have no idea what I'm
    talking about
  • 19:27 - 19:33
    so a design smell, I want to define this really
    quickly
  • 19:33 - 19:37
    a design smell is something you see in your
    code or your system
  • 19:37 - 19:40
    where it doesn't necessarily mean it's bad
  • 19:40 - 19:41
    but you look at it and you think
  • 19:41 - 19:43
    hmm, I should look into this a little bit
  • 19:43 - 19:46
    and ask myself, why are there so many comments
    in this code?
  • 19:46 - 19:48
    you know, especially the bottom one
  • 19:48 - 19:51
    inline comments?
  • 19:51 - 19:57
    definitely bad, definitely a sign that you
    should have another method, right
  • 19:57 - 19:59
    so it's pretty easy to convince people
  • 19:59 - 20:00
    that comments are a design smell
  • 20:00 - 20:02
    and I think a lot of people in the industry
  • 20:02 - 20:03
    are starting to agree
  • 20:03 - 20:05
    maybe not for like a public library
  • 20:05 - 20:07
    where you really need to tell someone
  • 20:07 - 20:10
    here's how you use this class and this is
    what it's for
  • 20:10 - 20:12
    but you shouldn't have to document every method
  • 20:12 - 20:15
    and every argument because the method name
    and the argument name
  • 20:15 - 20:18
    should speak for themselves, right
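The "you should have another method" point fits in a few lines of Ruby. A sketch with invented names, not code from the talk:

```ruby
# Before: an inline comment papering over an unnamed chunk of logic.
def total(order)
  sum = order[:items].sum { |i| i[:price] * i[:qty] }
  # apply 10% discount for orders over 1000
  sum > 1000 ? sum * 0.9 : sum
end

# After: the comment becomes a method whose name speaks for itself.
def discounted(sum)
  sum > 1000 ? sum * 0.9 : sum
end

def total_extracted(order)
  discounted(order[:items].sum { |i| i[:price] * i[:qty] })
end
```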
  • 20:18 - 20:21
    so here's one that you probably won't agree
    with
  • 20:21 - 20:22
    tests are a design smell
  • 20:22 - 20:29
    so this one is probably a little more controversial
  • 20:29 - 20:33
    especially in an environment where you're
    maybe still struggling with people
  • 20:33 - 20:38
    to actually get them
    to write tests to begin with, right
  • 20:38 - 20:41
    you know I went through this period in, like,
    2000 and 2001
  • 20:41 - 20:44
    where I was really heavily into evangelizing
    TDD
  • 20:44 - 20:47
    and it was really stressful that you couldn't
    get anyone to do it
  • 20:47 - 20:50
    I think you do have to go through that period
  • 20:50 - 20:52
    and I'm not saying you shouldn't write any
    tests
  • 20:52 - 20:57
    but that picture I showed you earlier of the
    slow, brittle test suite
  • 20:57 - 20:58
    that's bad, right
  • 20:58 - 21:01
    that is a bad state to be in
  • 21:01 - 21:04
    and you're in that state because your tests
    suck
  • 21:04 - 21:06
    that's why you get in that state
  • 21:06 - 21:10
    your tests suck because you're writing bad
    tests
  • 21:10 - 21:16
    that don't exercise the right things in your
    system
  • 21:16 - 21:19
    and what I've found is whenever I look into
    one of these
  • 21:19 - 21:22
    big slow brittle test suites
  • 21:22 - 21:25
    the tests themselves are indications
  • 21:25 - 21:28
    and the sheer proliferation of tests
  • 21:28 - 21:31
    are indications that the system is bad
  • 21:31 - 21:34
    and the developers are like desperately
  • 21:34 - 21:37
    fearfully trying to run the code
  • 21:37 - 21:39
    in every way they can
  • 21:39 - 21:41
    because it's the only way they can manage
  • 21:41 - 21:44
    to even think about the complexity
  • 21:44 - 21:48
    but if you think about it, if you had a tiny
    trivial system
  • 21:48 - 21:50
    you wouldn't need to have hundreds of test
    files
  • 21:50 - 21:53
    that take ten minutes to run, ever
  • 21:53 - 21:54
    if you did, you're doing something stupid
  • 21:54 - 21:57
    you're wasting your time working on tests
  • 21:57 - 22:00
    and we as software developers obsess about
    this kind of thing
  • 22:00 - 22:05
    because we have to fight so hard to get our
    peers to do it in the first place
  • 22:05 - 22:06
    and to understand it
  • 22:06 - 22:10
    we obsess to the point where we focus on the
    wrong thing
  • 22:10 - 22:15
    none of us are in the business of writing
    tests for customers
  • 22:15 - 22:18
    like we're not launching our tests on the
    web
  • 22:18 - 22:20
    and hoping people will buy them, right
  • 22:20 - 22:24
    it doesn't provide value, it's just a side-effect
  • 22:24 - 22:26
    that we have focused too heavily on
  • 22:26 - 22:30
    and we've lost sight of what the actual goal
    is
  • 22:30 - 22:34
    so, this one actually requires a visual
  • 22:34 - 22:37
    I tell the people on my team now
  • 22:37 - 22:40
    you can write code in any language you want
  • 22:40 - 22:43
    any framework you want, anything you want
    to do
  • 22:43 - 22:45
    as long as the code is this big
  • 22:45 - 22:47
    so if you want to write the new service in
    Haskell
  • 22:47 - 22:50
    and it's this big in a normal size font
  • 22:50 - 22:51
    you can do it
  • 22:51 - 22:54
    if you want to do it in Clojure or Elixir
    or Scala or Ruby
  • 22:54 - 22:55
    or whatever you want to do
  • 22:55 - 22:57
    even Python for god's sake
  • 22:57 - 22:59
    you can do it if it's this big and no bigger
  • 22:59 - 23:04
    why? because it means I can look at it
  • 23:04 - 23:06
    and I can understand it
  • 23:06 - 23:09
    or if I don't I'll just throw it away
  • 23:09 - 23:12
    because if it's this big it doesn't do very
    much, right
  • 23:12 - 23:14
    so the risk is really low
  • 23:14 - 23:17
    and I really mean the system is that
  • 23:17 - 23:19
    the component is that big
  • 23:19 - 23:21
    and in my world a component means a service
  • 23:21 - 23:25
    that's running and probably listening on an
    HTTP port
  • 23:25 - 23:28
    or some sort of Thrift or RPC protocol
  • 23:28 - 23:30
    so it's a standalone thing
  • 23:30 - 23:31
    it's its own application
  • 23:31 - 23:33
    it's probably in its own git repository
  • 23:33 - 23:35
    people do pull requests against it
  • 23:35 - 23:36
    but it's just tiny
  • 23:36 - 23:39
    so this big
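A component "this big" might look like the following sketch: one standalone process, one invented /ping route, listening on an HTTP port using only the Ruby standard library. This is an illustration of the scale he means, not the slide's actual code:

```ruby
require "socket"
require "json"

# The whole "business logic" of the component, kept as a pure method so
# it is trivial to read, test, or simply throw away.
def handle(path)
  case path
  when "/ping" then { status: "ok" }.to_json
  else              { error: "not found" }.to_json
  end
end

# A bare-bones HTTP loop: accept a connection, read the request line,
# answer, close. No framework, nothing hidden.
def serve(port)
  server = TCPServer.new(port)
  loop do
    client = server.accept
    request_line = client.gets          # e.g. "GET /ping HTTP/1.1"
    path = request_line.split[1]
    body = handle(path)
    client.print "HTTP/1.1 200 OK\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}"
    client.close
  end
end

# serve(8080) would start the component; everything above fits on a slide.
```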
  • 23:39 - 23:41
    at the top of this, by the way
  • 23:41 - 23:46
    is some code by Konstantin Haase
  • 23:46 - 23:49
    who also lives in Berlin, where I live
  • 23:49 - 23:51
    this is a rewrite of Sinatra
  • 23:51 - 23:52
    the web framework
  • 23:52 - 23:55
    and Konstantin is actually the maintainer
    of Sinatra
  • 23:55 - 23:59
    it's not fully compatible, but it's amazingly
    close
  • 23:59 - 24:00
    and it all fits right in that
  • 24:00 - 24:05
    but the font size is kind of small, so I cheated
  • 24:05 - 24:09
    another rule, our systems are heterogeneous
    by default
  • 24:09 - 24:11
    so I say you can write in any language you
    want
  • 24:11 - 24:14
    that's not just because I want the developers
    to be excited
  • 24:14 - 24:17
    although I think, most of you, if you worked
  • 24:17 - 24:19
    in an environment where your boss told you
  • 24:19 - 24:22
    you can use any programming language or tool
    you want
  • 24:22 - 24:24
    you would be pretty happy about that, right
  • 24:24 - 24:27
    anyone unhappy about that? I don't think so
  • 24:27 - 24:28
    unless it's one of the bosses here
  • 24:28 - 24:32
    that's like don't tell people that
  • 24:32 - 24:33
    so that's one thing
  • 24:33 - 24:37
    the other one is, it leads to a good system
    design
  • 24:37 - 24:39
    because think about this
  • 24:39 - 24:42
    if I write one program in Erlang, one component
    in Erlang
  • 24:42 - 24:44
    one program in Ruby
  • 24:44 - 24:48
    I have to work really, really hard to make
    tight coupling
  • 24:48 - 24:50
    between those things
  • 24:50 - 24:53
    like I have to basically use computer science
    to do that
  • 24:53 - 24:54
    I don't even know what I would do
  • 24:54 - 24:56
    you know it's hard
  • 24:56 - 24:59
    like I would have to maybe implement Ruby
    in Erlang
  • 24:59 - 25:01
    so that it can run in the same VM or vice
    versa
  • 25:01 - 25:04
    it's just silly, I wouldn't do it
  • 25:04 - 25:07
    so if my system is heterogeneous by default
  • 25:07 - 25:12
    my coupling is very low, at least at a certain
    level by default
  • 25:12 - 25:14
    because it's the path of least resistance
  • 25:14 - 25:17
    is to make the system decoupled
  • 25:17 - 25:19
    it's easier to make things decoupled than
    coupled
  • 25:19 - 25:22
    if they're all running in different languages
  • 25:22 - 25:25
    so in the past three months, I'll say
  • 25:25 - 25:30
    I have written production code in Objective-C,
    Ruby, Scala, Clojure, Node
  • 25:30 - 25:34
    I don't know, more stuff, Java
  • 25:34 - 25:36
    all these different languages
  • 25:36 - 25:39
    real code for work
  • 25:39 - 25:41
    and yes, they are not tightly coupled
  • 25:41 - 25:45
    like I haven't installed JRuby so that I could
    reach into the internals of my Scala code
  • 25:45 - 25:46
    because that would be a pain
  • 25:46 - 25:51
    I don't want to do that
  • 25:51 - 25:53
    another very important one is
  • 25:53 - 25:56
    server nodes are disposable
  • 25:56 - 25:59
    so, back when I was at GE, for example
  • 25:59 - 26:03
    I remember being really proud when I looked
    at the up time of one of my servers
  • 26:03 - 26:05
    and it was like four hundred days or something
  • 26:05 - 26:07
    it's like, wow, this is awesome
  • 26:07 - 26:10
    I have this big server, it had all these apps
    on it
  • 26:10 - 26:13
    we kept it running for four hundred days
  • 26:13 - 26:15
    the problem with that is I was afraid to ever
    touch it
  • 26:15 - 26:18
    I was really happy it was alive
  • 26:18 - 26:19
    but I didn't want to do anything to it
  • 26:19 - 26:21
    I was afraid to update the operating system
  • 26:21 - 26:24
    in fact you could not upgrade Solaris then
    without restarting it
  • 26:24 - 26:28
    so that meant I was not upgrading the operating
    system
  • 26:28 - 26:32
    I probably shouldn't have been too proud about
    it
  • 26:32 - 26:35
    Nodes that are alive for a long time lead
    to fear
  • 26:35 - 26:37
    and what I want is less fear
  • 26:37 - 26:39
    so I throw them away
  • 26:39 - 26:43
    and this means I don't have physical servers
    that I throw away
  • 26:43 - 26:46
    that would be fun but I'm not that rich yet
  • 26:46 - 26:49
    we use AWS right now, you could do it with
    any kind of cloud service
  • 26:49 - 26:53
    or even an internal cloud provider
  • 26:53 - 26:54
    but every node is disposable
  • 26:54 - 27:01
    so, we never upgrade software on an existing
    server
  • 27:01 - 27:03
    whenever you want to deploy a new version
    of a service
  • 27:03 - 27:04
    you create new servers
  • 27:04 - 27:05
    and you deploy that version
  • 27:05 - 27:09
    and then you replace them in the load balancer
    or somewhere
  • 27:09 - 27:10
    that's it
  • 27:10 - 27:13
    so, you never have to wonder what's on a server
  • 27:13 - 27:16
    because it was deployed through an automated
    process
  • 27:16 - 27:17
    and there's no fear there
  • 27:17 - 27:18
    you know exactly what it is
  • 27:18 - 27:19
    you know exactly how to recreate it
  • 27:19 - 27:22
    because you have a golden master image
  • 27:22 - 27:24
    and in our case it's actually an Amazon image
  • 27:24 - 27:26
    that you can just boot more of
  • 27:26 - 27:27
    if scaling is a problem
  • 27:27 - 27:29
    you just boot ten more servers
  • 27:29 - 27:33
    boom, done, no problem
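The deploy-fresh-and-swap flow can be sketched in plain Ruby — a toy model of the pattern, not the actual tooling; every class name here is invented:

```ruby
# Sketch of immutable deployments: never upgrade a running node.
# Boot a fresh fleet at the new version, point traffic at it, and
# dispose of the old fleet. All names are hypothetical.
Node = Struct.new(:version)

class LoadBalancer
  attr_reader :nodes

  def initialize(nodes)
    @nodes = nodes
  end

  # Returns the retired fleet so the caller can terminate it;
  # nothing is ever modified in place.
  def deploy(version, count)
    fresh = Array.new(count) { Node.new(version) }
    old, @nodes = @nodes, fresh
    old
  end
end

lb = LoadBalancer.new(Array.new(3) { Node.new("v1") })
retired = lb.deploy("v2", 10) # need to scale? just boot ten
```

Because the old servers are simply thrown away, there is never a question of what state they accumulated.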
  • 27:33 - 27:35
    so yeah I tell the team, you know, pick your
    technology
  • 27:35 - 27:38
    everything must be automated, that's another
    piece
  • 27:38 - 27:43
    if you're going to deploy a Clojure service
    for the first time
  • 27:43 - 27:47
    you have to be responsible for figuring out
    how it fits into our deployment system
  • 27:47 - 27:50
    so that you have immutable deployments and
    disposable nodes
  • 27:50 - 27:54
    if you can do that and you're willing to also
    maintain it and teach someone else
  • 27:54 - 27:56
    about the little piece of code that you wrote,
    then cool
  • 27:56 - 27:59
    you can do it, any level you want
  • 27:59 - 28:03
    and then once you deploy stuff
  • 28:03 - 28:05
    like a lot of us like to just SSH into the machines
  • 28:05 - 28:08
    and then twiddle with things and replace files
  • 28:08 - 28:12
    and like try like fixing bugs live on production
  • 28:12 - 28:14
    why not just throw away the SSH keys
  • 28:14 - 28:17
    because you're going to throw away the system
    eventually
  • 28:17 - 28:19
    you don't even need root access to it
  • 28:19 - 28:21
    you don't need to be able to get to it
  • 28:21 - 28:25
    except through the port that your service
    is listening on
  • 28:25 - 28:27
    so you can't screw it up
  • 28:27 - 28:29
    you can't introduce entropy and mess things
    up
  • 28:29 - 28:31
    if you throw away the keys
  • 28:31 - 28:34
    so this is actually a practice that you can
    do
  • 28:34 - 28:36
    deploy the servers, remove all the credentials
  • 28:36 - 28:39
    for logging in and the only option you have
  • 28:39 - 28:44
    is to destroy them when you're done with them
  • 28:44 - 28:45
    provisioning new services in our world
  • 28:45 - 28:47
    must also be trivial
  • 28:47 - 28:51
    so we have actually now thrown away our chef
    repository
  • 28:51 - 28:54
    because chef is obsolete and
  • 28:54 - 28:56
    we have replaced it with shell scripts
  • 28:56 - 29:01
    and that sounds like I'm an idiot
  • 29:01 - 29:04
    I know, but when I say chef is obsolete
  • 29:04 - 29:05
    I don't really mean that
  • 29:05 - 29:07
    I like to say that so that people will think
  • 29:07 - 29:08
    because a lot of you are probably thinking
  • 29:08 - 29:11
    we should move to chef
  • 29:11 - 29:12
    that would be great
  • 29:12 - 29:14
    because what you have is a bunch of servers
  • 29:14 - 29:15
    that are running for a long time
  • 29:15 - 29:17
    and you need to be able to continue to keep
    them up to date
  • 29:17 - 29:19
    chef is really great at that
  • 29:19 - 29:22
    chef is also good at booting a new server
  • 29:22 - 29:24
    but really it's just overkill for that
  • 29:24 - 29:25
    yeah
  • 29:25 - 29:26
    so if you're always throwing stuff away
  • 29:26 - 29:28
    I don't think you need chef
  • 29:28 - 29:29
    do something really, really simple
  • 29:29 - 29:30
    and that's what we've done
  • 29:30 - 29:33
    so like whenever we deploy a new type of service
  • 29:33 - 29:38
    I set up ZooKeeper recently, which is a complete
    change from the other stuff we're deploying
  • 29:38 - 29:40
    I think it was a five line shell script to
    do that
  • 29:40 - 29:43
    I just added it to a git repo and ran a command
  • 29:43 - 29:47
    I've got a cluster of ZooKeeper servers running
  • 29:47 - 29:51
    you want to always be deploying your software
  • 29:51 - 29:56
    this is something I learned from Kent Beck
    early on in the agile extreme programming
  • 29:56 - 29:56
    world
  • 29:56 - 29:58
    that if something is hard
  • 29:58 - 30:00
    or you perceive it to be hard or difficult
  • 30:00 - 30:02
    the best thing you can do
  • 30:02 - 30:04
    if you have to do that thing all the time
  • 30:04 - 30:07
    is to just do it constantly
  • 30:07 - 30:09
    non-stop all the time
  • 30:09 - 30:11
    so like deploying in our old world
  • 30:11 - 30:15
    where it would take all night once a week
  • 30:15 - 30:18
    if we instituted a new policy
  • 30:18 - 30:19
    in that team that said
  • 30:19 - 30:23
    any change that goes to master must be deployed
    within five minutes
  • 30:23 - 30:28
    I guarantee you we would have fixed that process,
    right
  • 30:28 - 30:30
    and if you're deploying constantly
  • 30:30 - 30:31
    all day every day
  • 30:31 - 30:33
    you're never going to be afraid of deployments
  • 30:33 - 30:36
    because it's always a small change
  • 30:36 - 30:38
    so always be deploying
  • 30:38 - 30:40
    every new deploy means you're throwing away
    old servers
  • 30:40 - 30:43
    and replacing them with new ones
  • 30:43 - 30:46
    in our world I would say that the average
    uptime
  • 30:46 - 30:48
    of one of our servers is probably something
    like
  • 30:48 - 30:55
    seventeen hours and that's because we don't
    tend to work on the weekend very much
  • 30:55 - 30:57
    you also, when you have these sorts of systems
  • 30:57 - 30:59
    that are distributed like this
  • 30:59 - 31:02
    and you're trying to reduce the fear of change
  • 31:02 - 31:04
    the big thing that you're afraid of is failure
  • 31:04 - 31:06
    you're afraid that the service is going to
    fail
  • 31:06 - 31:07
    the system is going to go down
  • 31:07 - 31:10
    one component won't be reachable, that sort
    of thing
  • 31:10 - 31:12
    so you just have to assume that that's going
    to happen
  • 31:12 - 31:17
    you are not going to build a system that never
    fails, ever
  • 31:17 - 31:20
    I hope you don't, because you will have wasted
    much of your life
  • 31:20 - 31:21
    trying to get that to happen
  • 31:21 - 31:24
    instead, assume that the thing, the components
    are going to fail
  • 31:24 - 31:26
    and build resiliency in
  • 31:26 - 31:28
    I have a picture here of Joe Armstrong
  • 31:28 - 31:30
    who is one of the inventors of Erlang
  • 31:30 - 31:35
    if you have not studied Erlang philosophy
    around failure and recovery
  • 31:35 - 31:35
    you should
  • 31:35 - 31:36
    and it won't take you long
  • 31:36 - 31:39
    so I'm just going to leave that as homework
    for you
  • 31:39 - 31:42
    and then, you know, I said, the tests are
    a design pattern
  • 31:42 - 31:44
    I don't mean don't write any tests
  • 31:44 - 31:46
    but I also want to be further responsible
    here
  • 31:46 - 31:51
    and say you should monitor everything
  • 31:51 - 31:53
    you want to favor measurement over testing
  • 31:53 - 31:57
    so I use measurement as a surrogate for testing
  • 31:57 - 31:58
    or as an enhancement
  • 31:58 - 32:04
    and the reason I say this is
  • 32:04 - 32:06
    you can either focus on one of two things
  • 32:06 - 32:08
    I said assume failure right, so
  • 32:08 - 32:12
    mean time between failures or mean time to
    resolution
  • 32:12 - 32:16
    those are kind of two metrics in the ops world
  • 32:16 - 32:17
    that people talk about
  • 32:17 - 32:20
    for measuring their success and their effectiveness
  • 32:20 - 32:22
    mean time between failures means
  • 32:22 - 32:25
    you're trying to increase the time between
    failures
  • 32:25 - 32:29
    of the system, so basically you're trying
    to make failures never happen, right
  • 32:29 - 32:31
    mean time to resolution means
  • 32:31 - 32:35
    when they happen, I'm gonna focus on bringing
    them back
  • 32:35 - 32:37
    as fast as I possibly can
  • 32:37 - 32:41
    so a perfect example would be a system fails
  • 32:41 - 32:44
    and another one is already up and just takes
    over its work
  • 32:44 - 32:47
    mean time to resolution is essentially zero,
    right
  • 32:47 - 32:51
    if you're always assuming that every component
    can and will fail
  • 32:51 - 32:54
    then mean time to resolution is going to be
    really good
  • 32:54 - 32:56
    because you're going to bake it into the process
  • 32:56 - 32:59
    if you do that, you don't care about when
    things fail
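The two metrics are simple arithmetic over incident records; a rough sketch, with illustrative numbers only:

```ruby
# Rough MTBF / MTTR over a list of [failure_start, recovery] pairs
# (in hours since some epoch). Purely illustrative arithmetic.
def mttr(incidents)
  # average time spent broken per incident
  incidents.sum { |start, fix| fix - start }.to_f / incidents.size
end

def mtbf(incidents, window_hours)
  # time the system spent up, divided by number of failures
  downtime = incidents.sum { |start, fix| fix - start }
  (window_hours - downtime).to_f / incidents.size
end

incidents = [[10, 10.5], [40, 40.1], [90, 90.4]] # three short outages
mttr(incidents)      # about a third of an hour: fast recovery
mtbf(incidents, 100) # about 33 hours between failures
```

With automatic failover the recovery term shrinks toward zero, which is exactly the focus-on-resolution strategy described above.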
  • 32:59 - 33:03
    and back to this idea of favoring measurement
    over testing
  • 33:03 - 33:07
    if you're monitoring everything, everything
    with intelligence
  • 33:07 - 33:10
    then you're actually focusing on mean time
    to resolution
  • 33:10 - 33:16
    and acknowledging that the software is going
    to be broken sometimes, right
  • 33:16 - 33:18
    and when I say monitor everything, I mean
    everything
  • 33:18 - 33:22
    I don't mean, like your disk space and your
    memory and stuff there
  • 33:22 - 33:24
    I'm talking about business metrics
  • 33:24 - 33:28
    so, at LivingSocial we created this thing
    called Rearview
  • 33:28 - 33:29
    which is now opensource
  • 33:29 - 33:33
    which allows you to do aberration detection
  • 33:33 - 33:38
    and aberration means strange behavior, strange
    change in behavior
  • 33:38 - 33:42
    so Rearview can do aberration detection
  • 33:42 - 33:45
    on data sets, arbitrary data sets
  • 33:45 - 33:47
    which means, like in the LivingSocial world
  • 33:47 - 33:48
    we had user sign ups
  • 33:48 - 33:49
    constantly streaming in
  • 33:49 - 33:52
    it was a very high volume site
  • 33:52 - 33:54
    if user sign-ups were weird
  • 33:54 - 33:56
    we would get an alert
  • 33:56 - 33:58
    why might they be weird?
  • 33:58 - 34:01
    one thing could be like the user service is
    down, right
  • 34:01 - 34:02
    so then we would get two alerts
  • 34:02 - 34:04
    user sign ups have gone down
  • 34:04 - 34:05
    and so has the service
  • 34:05 - 34:08
    so obviously the problem is the service is
    down
  • 34:08 - 34:10
    let's bring it back up
  • 34:10 - 34:11
    but it could be something like
  • 34:11 - 34:13
    a front-end developer or a designer
  • 34:13 - 34:16
    made a change that was intentional
  • 34:16 - 34:18
    but it just didn't work and no one liked it
  • 34:18 - 34:21
    so they didn't sign up to the site anymore
  • 34:21 - 34:24
    that's more important than just knowing that
    the service is down
  • 34:24 - 34:25
    right, because what you care about
  • 34:25 - 34:27
    isn't that the service is up or down
  • 34:27 - 34:31
    if you could crash the entire system and still
    be making money
  • 34:31 - 34:32
    you don't care, right, that's better
  • 34:32 - 34:35
    throw it away and stop paying for the servers
  • 34:35 - 34:41
    but if your system is up 100% of the time
    and performs excellently
  • 34:41 - 34:43
    but no one's using it, that's bad
  • 34:43 - 34:49
    so monitoring business metrics gives you a
    lot more than unit tests could ever give you
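A crude stand-in for that kind of aberration detection is a z-score check on the metric stream — not how Rearview actually works, just the shape of the idea; the threshold and numbers are invented:

```ruby
# Crude aberration detection: flag the latest value if it sits more
# than `threshold` standard deviations from the historical mean.
# A toy stand-in for real aberration detection; data is invented.
def aberrant?(history, latest, threshold: 3.0)
  mean = history.sum.to_f / history.size
  var  = history.sum { |x| (x - mean)**2 } / history.size
  std  = Math.sqrt(var)
  return false if std.zero?
  ((latest - mean) / std).abs > threshold
end

signups = [120, 118, 125, 121, 119, 123] # sign-ups per minute
aberrant?(signups, 122) # normal traffic, no alert
aberrant?(signups, 3)   # sign-ups fell off a cliff: alert
```

The point is that the check runs on a business metric (sign-ups), not just on disk and memory, so it fires whether the cause is a dead service or a design change nobody liked.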
  • 34:49 - 34:51
    and then in our world
  • 34:51 - 34:52
    we focused on experiencing failure
  • 34:52 - 34:56
    no, you have to come up to front and say ten!
  • 34:56 - 34:59
    ok, ten minutes left
  • 34:59 - 35:02
    when I got to 6WunderKinder in Berlin
  • 35:02 - 35:04
    everyone was terrified to touch the system
  • 35:04 - 35:09
    because they had created a really well-designed
  • 35:09 - 35:12
    but traditional monolithic API
  • 35:12 - 35:14
    so they had layers of abstractions
  • 35:14 - 35:15
    it was all kind of in one big thing
  • 35:15 - 35:17
    they had a huge database
  • 35:17 - 35:20
    and they were really, really scared to do
    anything
  • 35:20 - 35:22
    so there's like one person who would deploy
    anything
  • 35:22 - 35:24
    and everyone else was trying to work on other
    projects
  • 35:24 - 35:26
    and not touch it
  • 35:26 - 35:28
    but it was like the production system
  • 35:28 - 35:30
    you know so it wasn't really an option
  • 35:30 - 35:32
    so the first thing I did in my first week
  • 35:32 - 35:35
    is I got these graphs going
  • 35:35 - 35:39
    and this was, yeah, response time
  • 35:39 - 35:43
    and the first thing I did is I started turning
    off servers
  • 35:43 - 35:44
    and just watching the graphs
  • 35:44 - 35:48
    and then, as I was turning off the servers
  • 35:48 - 35:49
    I went to the production database
  • 35:49 - 35:54
    and I did select, count, star from tasks
  • 35:54 - 35:56
    and we're a task management app
  • 35:56 - 35:58
    so we have hundreds of millions of tasks
  • 35:58 - 36:01
    and the whole thing crashed
  • 36:01 - 36:04
    and all the people were like AAAAH what's
    going on
  • 36:04 - 36:06
    you know, and I said, it's no problem
  • 36:06 - 36:09
    I did this on purpose, I'll just make it come
    back
  • 36:09 - 36:10
    which I did
  • 36:10 - 36:11
    and from that point on
  • 36:11 - 36:13
    like, really every day I would do something
  • 36:13 - 36:17
    which basically crashed the system for just
    a moment
  • 36:17 - 36:20
    and really, like, we had way too many servers
    in production
  • 36:20 - 36:23
    we were spending tens of thousands more Euros
    per month
  • 36:23 - 36:25
    than we should have on the infrastructure
  • 36:25 - 36:27
    and I just started taking things away
  • 36:27 - 36:29
    and I would usually do it
  • 36:29 - 36:31
    instead of the responsible way,
  • 36:31 - 36:32
    like one server at a time
  • 36:32 - 36:34
    I would just remove all of them and start
    adding them back
  • 36:34 - 36:36
    so for a moment everything was down
  • 36:36 - 36:39
    but after that we got to a point where
  • 36:39 - 36:41
    everyone on the team was absolutely comfortable
  • 36:41 - 36:43
    with the worst case scenario
  • 36:43 - 36:45
    of the system being completely down
  • 36:45 - 36:48
    so that we could, in a panic free way
  • 36:48 - 36:51
    just focus on bringing it up when it was bad
  • 36:51 - 36:53
    so now when you do a deployment
  • 36:53 - 36:55
    and you have your business metrics being measured
  • 36:55 - 36:57
    you know the important stuff is happening
  • 36:57 - 37:01
    and you know what to do when everything is
    down
  • 37:01 - 37:03
    you've experienced the worst thing that can
    happen
  • 37:03 - 37:05
    well the worst thing is like someone breaks
    in
  • 37:05 - 37:08
    and steals all your stuff, steals all your
    users' phone numbers
  • 37:08 - 37:10
    and posts them online like SnapChat or something
  • 37:10 - 37:14
    but you've experienced all these potentially
    horrible things
  • 37:14 - 37:17
    and realized, eh, it's not so bad, I can deal
    with this
  • 37:17 - 37:19
    I know what to do
  • 37:19 - 37:22
    it allows you to start making bold moves
  • 37:22 - 37:24
    and that's what we all want right
  • 37:24 - 37:29
    we all want to be able to bravely go into
    our systems
  • 37:29 - 37:30
    and do anything we think is right
  • 37:30 - 37:34
    so that's what I've been focusing on
  • 37:34 - 37:37
    we also do this thing called Canary in the
    Coal Mine deployments
  • 37:37 - 37:39
    which removes the fear, also
  • 37:39 - 37:43
    canary in the coalmine refers to a kind of
    sad thing
  • 37:43 - 37:47
    about coal miners in the US
  • 37:47 - 37:49
    where they would send canaries into the mines
  • 37:49 - 37:50
    at various levels
  • 37:50 - 37:54
    and if the canary died they knew there was
    a problem
  • 37:54 - 37:58
    with the air
  • 37:58 - 37:59
    but in the software world
  • 37:59 - 38:03
    what this means is you have bunch of servers
    running
  • 38:03 - 38:06
    or a bunch of, I don't know, clients running
    a certain version
  • 38:06 - 38:10
    and you start introducing the new version incrementally
  • 38:10 - 38:12
    and watching the effects
  • 38:12 - 38:13
    so once you're measuring everything
  • 38:13 - 38:15
    and monitoring everything
  • 38:15 - 38:17
    you can also start doing these canary in the
    coalmine things
  • 38:17 - 38:19
    where you say OK I have a new version of this
    service
  • 38:19 - 38:20
    that I'm going to deploy
  • 38:20 - 38:23
    and I've got thirty servers running for it
  • 38:23 - 38:26
    but I'm going to change only five of them
    now
  • 38:26 - 38:28
    and see, like, does my error rate increase
  • 38:28 - 38:30
    or does my performance drop on those servers
  • 38:30 - 38:34
    or do people actually not successfully complete
    the task they're trying to do
  • 38:34 - 38:35
    on those servers
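The canary split can be sketched as a comparison of error rates between the canary slice and the baseline fleet — a toy sketch with an invented tolerance, not production code:

```ruby
# Canary check: the new version runs on a slice of the fleet
# (say 5 of 30 servers); promote it only if the canary's error
# rate is not meaningfully worse than the baseline's.
# Tolerance and traffic numbers are illustrative.
def promote_canary?(baseline_errors, baseline_reqs,
                    canary_errors, canary_reqs, tolerance: 0.01)
  baseline_rate = baseline_errors.to_f / baseline_reqs
  canary_rate   = canary_errors.to_f / canary_reqs
  canary_rate <= baseline_rate + tolerance
end

promote_canary?(50, 10_000, 12, 2_000)  # 0.5% vs 0.6%: roll forward
promote_canary?(50, 10_000, 100, 2_000) # 0.5% vs 5%: roll back
```

The same comparison works for latency or task-completion rate; the measurement layer described earlier is what makes the check possible at all.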
  • 38:35 - 38:40
    so, this also allows us the combination of
    monitoring everything
  • 38:40 - 38:42
    and these immutable deployments and everything
  • 38:42 - 38:47
    gives us the ability to gradually affect change
    and not be afraid
  • 38:47 - 38:48
    so we roll out changes all day every day
  • 38:48 - 38:54
    because we don't fear that we're just going
    to destroy the entire system all at once
  • 38:54 - 38:56
    so I think I have like five minutes left
  • 38:56 - 39:00
    uh, these are some things we're not necessarily
    doing yet
  • 39:00 - 39:02
    but they're some ideas that I have
  • 39:02 - 39:05
    that given some free time I will work on
  • 39:05 - 39:09
    and, they're probably more exciting
  • 39:09 - 39:11
    one is I talked about homeostatic regulation
  • 39:11 - 39:14
    and homeostasis
  • 39:14 - 39:17
    so I think we all understand the idea of you
    know homeostasis
  • 39:17 - 39:20
    and the fact that systems have different parts
    that do different roles
  • 39:20 - 39:22
    and can protect each other from each other
  • 39:22 - 39:28
    but, so this diagram is actually just some
    random diagram
  • 39:28 - 39:31
    I copied and pasted off the AWS website
  • 39:31 - 39:34
    so it's not necessarily all that meaningful
  • 39:34 - 39:36
    except to show that every architecture
  • 39:36 - 39:39
    especially server based architectures
  • 39:39 - 39:43
    has a collection of services that play different
    roles
  • 39:43 - 39:45
    and it almost looks like a person
  • 39:45 - 39:47
    you've got a brain and a heart and a liver
  • 39:47 - 39:51
    and all these things, right
  • 39:51 - 39:53
    what would it mean to actually implement
  • 39:53 - 39:57
    homeostatic regulation in a web service?
  • 39:57 - 40:00
    so that you have some controlling system
  • 40:00 - 40:03
    where the database will actually kill an app
    server
  • 40:03 - 40:05
    that is hurting it, for example
  • 40:05 - 40:07
    just kill it
  • 40:07 - 40:09
    I don't know yet, I don't know what that is
  • 40:09 - 40:14
    but some ideas about this stuff
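One toy reading of that homeostatic-regulation idea: a regulator watches load on a shared resource and kills the worst offender when a set point is exceeded — entirely speculative, like the idea itself; every name here is invented:

```ruby
# Toy homeostatic regulator: components report the load they place
# on a shared resource (say, a database); when total load exceeds
# the set point, the regulator kills the worst offender.
# Entirely speculative, as in the talk.
class Regulator
  def initialize(set_point)
    @set_point  = set_point
    @components = {} # name => load
  end

  def report(name, load)
    @components[name] = load
  end

  # Returns the names of any components that were killed.
  def regulate
    return [] if @components.values.sum <= @set_point
    worst, _load = @components.max_by { |_, load| load }
    @components.delete(worst)
    [worst] # "just kill it"
  end
end

reg = Regulator.new(100)
reg.report("app-1", 30)
reg.report("app-2", 90)      # misbehaving app server
killed = reg.regulate        # kills "app-2"
```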
  • 40:14 - 40:16
    I don't know if you've heard of these
  • 40:16 - 40:20
    Netflix, do you have Netflix in India yet?
  • 40:20 - 40:23
    probably not, unless you have a VPN, right
  • 40:23 - 40:27
    Netflix has a really great cloud-based architecture
  • 40:27 - 40:30
    they have this thing called Chaos Monkey they've
    created
  • 40:30 - 40:34
    which goes through their system and randomly
    destroys nodes
  • 40:34 - 40:36
    just crashes servers
  • 40:36 - 40:40
    and they did this because, when they were,
    they were early users of AWS
  • 40:40 - 40:42
    and when they went out initially with AWS,
    servers were crashing
  • 40:42 - 40:44
    like it was still immature
  • 40:44 - 40:46
    so they said OK we still want to use this
  • 40:46 - 40:50
    and we'll build in stuff so that we can deal
    with the crashes
  • 40:50 - 40:52
    but we have to know it's gonna work when it
    crashes
  • 40:52 - 40:55
    so let's make crashing be part of production
  • 40:55 - 40:58
    so they actually have gotten really sophisticated
    now
  • 40:58 - 41:00
    and they will crash entire regions
  • 41:00 - 41:02
    cause they're in multiple data centers
  • 41:02 - 41:04
    so they'll say like, what would happen if
    this
  • 41:04 - 41:06
    data center went down, does the site still
    stay up?
  • 41:06 - 41:08
    and they do this in production all the time
  • 41:08 - 41:10
    like they're crashing servers right now
  • 41:10 - 41:11
    it's really neat
  • 41:11 - 41:14
    another one that is inspirational in this
    way
  • 41:14 - 41:19
    is Pinterest, they use AWS as well
  • 41:19 - 41:22
    and they have, AWS has this thing called Spot
    Instances
  • 41:22 - 41:24
    and I won't go into too much detail
  • 41:24 - 41:26
    because I don't have time
  • 41:26 - 41:30
    but Spot Instances allow you to effectively
  • 41:30 - 41:36
    bid on servers at a price that you are willing
    to pay
  • 41:36 - 41:40
    so like if a usual server costs $0.20 per
    minute
  • 41:40 - 41:42
    you can say, I'll give $0.15 per minute
  • 41:42 - 41:45
    and when excess capacity comes open
  • 41:45 - 41:48
    it's almost like a stock market
  • 41:48 - 41:50
    if $0.15 is the going price, you'll get a
    server
  • 41:50 - 41:52
    and it starts up and it runs what you want
  • 41:52 - 41:54
    but here's the cool thing
  • 41:54 - 42:00
    if the stock market goes and the price goes
    higher than you're willing to pay
  • 42:00 - 42:03
    Amazon will just turn off those servers
  • 42:03 - 42:05
    they're just dead, you don't have any warning
  • 42:05 - 42:07
    they're just dead
  • 42:07 - 42:11
    so Pinterest uses this for their production
    servers
  • 42:11 - 42:14
    which means they save a lot of money
  • 42:14 - 42:17
    they're paying way under the average Amazon
    cost for hosting
  • 42:17 - 42:19
    but the really cool thing in my opinion
  • 42:19 - 42:21
    is not the money they save but the fact that
  • 42:21 - 42:26
    like, what would you have to do to build a
    full system
  • 42:26 - 42:29
    where any node can and will die at any moment
  • 42:29 - 42:31
    and it's not even under your control
  • 42:31 - 42:34
    that's really exciting
  • 42:34 - 42:36
    so a simple thing you can do for homeostasis
    though
  • 42:36 - 42:38
    is you can just adjust
  • 42:38 - 42:39
    so in our world we have multiple nodes
  • 42:39 - 42:41
    and all these little services
  • 42:41 - 42:43
    we can scale each one independently
  • 42:43 - 42:45
    we're measuring everything
  • 42:45 - 42:46
    so Amazon has a thing called Auto Scaling
  • 42:46 - 42:49
    we don't use it, we do our own scaling
  • 42:49 - 42:54
    and we just do it based on volume and performance
  • 42:54 - 42:58
    now when you have a bunch of services like
    this
  • 42:58 - 43:01
    like, I don't know, maybe we have fifty different
    services now
  • 43:01 - 43:03
    that each play tiny little roles
  • 43:03 - 43:07
    it becomes difficult to figure out, like,
    where things are
  • 43:07 - 43:11
    so we've started implementing ZooKeeper for
    service resolution
  • 43:11 - 43:14
    which means a service can come online and
    say
  • 43:14 - 43:18
    I'm the reminder service version 2.3
  • 43:18 - 43:19
    and then tell a central guardian
  • 43:19 - 43:22
    and the zookeeper can then route traffic to
    it
  • 43:22 - 43:24
    probably too detailed for now
  • 43:24 - 43:28
    I'm gonna skip over some stuff real quick
  • 43:28 - 43:29
    but I want to talk about this one
  • 43:29 - 43:34
    if, did the Nordic Ruby, no, Nordic Ruby talks
    never go online
  • 43:34 - 43:35
    so you can never see this talk
  • 43:35 - 43:37
    sorry
  • 43:37 - 43:41
    at Nordic Ruby Reginald Braithwaite did a
    really cool talk
  • 43:41 - 43:44
    on like challenges of the Ruby language
  • 43:44 - 43:45
    and he made this statement
  • 43:45 - 43:49
    Ruby has beautiful but static coupling
  • 43:49 - 43:51
    which was really strange
  • 43:51 - 43:53
    but basically he was making the same point
    that
  • 43:53 - 43:54
    I was talking about earlier
  • 43:54 - 43:59
    that, like Ruby creates a bunch of ways that
    you can couple
  • 43:59 - 44:01
    your system together
  • 44:01 - 44:03
    that kind of screw you in the end
  • 44:03 - 44:04
    but they're really beautiful to use
  • 44:04 - 44:10
    but, like, Ruby can really lead to some deep
    crazy coupling
  • 44:10 - 44:14
    and so he presented this idea of bind by contract
  • 44:14 - 44:18
    and bind by contract, in a Ruby sense
  • 44:18 - 44:23
    would be, like, I have a class that has a
    method
  • 44:23 - 44:26
    that takes these parameters under these conditions
  • 44:26 - 44:29
    and I can kind of put it into my VM
  • 44:29 - 44:32
    and whenever someone needs to have a functionality
    like that
  • 44:32 - 44:35
    it will be automatically bound together
  • 44:35 - 44:37
    by the fact that it can do that thing
  • 44:37 - 44:41
    and instead of how we tend to use Ruby and
    Java and other languages
  • 44:41 - 44:43
    I have a class with a method name I'm going
    to call it
  • 44:43 - 44:45
    right, that's coupling
  • 44:45 - 44:48
    but he proposed this idea of this decoupled
    system
  • 44:48 - 44:51
    where you just say I need a functionality
    like this
  • 44:51 - 44:53
    that works under the conditions that I have
    present
  • 44:53 - 44:55
    so this lead me to this idea
  • 44:55 - 44:59
    and this may be like way too weird, I don't
    know
  • 44:59 - 45:03
    what if in your web application your routes
    file
  • 45:03 - 45:08
    for your services read like a functional pattern
    matching syntax
  • 45:08 - 45:11
    so like if you've ever used Erlang or Haskell
    or Scala
  • 45:11 - 45:15
    any of these things that have functional pattern
    matching
  • 45:15 - 45:19
    what if you could then route to different
    services
  • 45:19 - 45:21
    across a bunch of different services
  • 45:21 - 45:23
    based on contract
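A toy version of bind-by-contract in Ruby: handlers register a predicate describing the requests they can serve, and the router dispatches to whichever contract matches, instead of calling a named method on a named class — a sketch of the idea, not Reg's proposal verbatim:

```ruby
# Toy bind-by-contract router: handlers declare a contract (here,
# a predicate lambda) and requests are bound to whichever handler's
# contract they satisfy. All route shapes are invented.
class ContractRouter
  def initialize
    @routes = [] # [contract, handler] pairs, checked in order
  end

  def bind(contract, &handler)
    @routes << [contract, handler]
  end

  def route(request)
    _contract, handler = @routes.find { |contract, _| contract.call(request) }
    raise "no contract matches #{request.inspect}" unless handler
    handler.call(request)
  end
end

router = ContractRouter.new
router.bind(->(r) { r[:verb] == :get && r[:path] =~ %r{\A/tasks/\d+\z} }) do |r|
  "task #{r[:path].split('/').last}"
end
router.bind(->(r) { r[:verb] == :post }) { |_r| "created" }

router.route(verb: :get, path: "/tasks/42") # dispatched by contract
```

Spread the same matching across processes and you get the pattern-matching routes file imagined above, with services bound by what they can do rather than by name.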
  • 45:23 - 45:27
    now I have zero time left
  • 45:27 - 45:29
    but I'm just gonna keep talking, cause I'm
    mean
  • 45:29 - 45:30
    oh wait I'm not allowed to be mean
  • 45:30 - 45:32
    because of the code of conduct
  • 45:32 - 45:35
    so I'll wrap up
  • 45:35 - 45:39
    so this is an idea that I've started working
    on as well
  • 45:39 - 45:41
    where I would actually write an Erlang service
  • 45:41 - 45:43
    with this sort of functional pattern matching
  • 45:43 - 45:46
    but have it be routing in really fast real
    time
  • 45:46 - 45:49
    through back end services that support it
  • 45:49 - 45:51
    one more thing I just want to show you real
    quick
  • 45:51 - 45:54
    that I am working on and I want to show you
  • 45:54 - 45:58
    because I want you to help me
  • 45:58 - 46:01
    has anyone used JSON schema?
  • 46:01 - 46:06
    OK, you people are my friends for the rest
    of the conference
  • 46:06 - 46:08
    in a system where you have all these things
    talking to each other
  • 46:08 - 46:11
    you do need a way to validate the inputs and
    outputs
  • 46:11 - 46:16
    but I don't want to generate code that parses
    and creates JSON
  • 46:16 - 46:21
    I don't want to do something in real time
    that intercepts my
  • 46:21 - 46:24
    kind of traffic, so there's this thing called
    JSON schema
  • 46:24 - 46:27
    that allows you to, in a completely decoupled
    way
  • 46:27 - 46:31
    specify JSON documents and how they should
    interact
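A hand-rolled subset of that kind of validation might look like this — this is not Klagen and not an existing JSON Schema library, just a minimal illustration of checking documents against a schema-like contract:

```ruby
require "json"

# Minimal JSON-Schema-like check: required keys plus type checks.
# A hand-rolled subset for illustration only; real JSON Schema
# supports far more keywords.
TYPES = {
  "string"  => [String],
  "integer" => [Integer],
  "boolean" => [TrueClass, FalseClass]
}

def valid?(doc, schema)
  schema.fetch("required", []).all? { |k| doc.key?(k) } &&
    schema.fetch("properties", {}).all? do |key, spec|
      next true unless doc.key?(key) # absent optional keys pass
      TYPES.fetch(spec["type"], []).any? { |t| doc[key].is_a?(t) }
    end
end

schema = {
  "required"   => ["title"],
  "properties" => { "title" => { "type" => "string" },
                    "done"  => { "type" => "boolean" } }
}

doc = JSON.parse('{"title": "buy milk", "done": false}')
valid?(doc, schema)             # conforming document
valid?({ "done" => 1 }, schema) # title missing, done wrong type
```

Because the schema lives outside the services, each side can validate its inputs and outputs without generating parsing code or coupling to the other's internals.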
  • 46:31 - 46:36
    and I am working on a new thing that's called
    Klagen
  • 46:36 - 46:38
    which is the German word for complain
  • 46:38 - 46:42
    it's written in Scala, so if anyone wants
    to pair up on some Scala stuff
  • 46:42 - 46:48
    what it will be is a high performance asynchronous
    JSON schema validation middleware
  • 46:48 - 46:53
    so if that's interesting to anyone, even if
    you don't know Scala or JSON schema
  • 46:53 - 46:54
    please let me know
  • 46:54 - 46:57
    and I believe I'm out of time so I'm just
    gonna end there
  • 46:57 - 46:59
    am I right? I'm right, yes
  • 46:59 - 47:02
    so thank you very much, and let's talk during
    the conference
Duration:
47:37

English subtitles
