MARK MENARD: So, many thanks to the organizers
here at RailsConf. This is my first time
talking at RailsConf. It's, frankly,
kind of intimidating to be up here
and see so many people out there.
My name is Mark Menard. I'm gonna be talking
about small code today. And I've got a lot
of code. About seventy-nine slides. A hundred
and thirty-seven
transitions. Not quite as much as Sandi had,
but
it's a lot to get through. So let's get
going.
So, I'm just gonna let this quote up there
sink in.
So. All of us have that file filled with
code that you just don't want to open. As
you heard earlier, maybe it's your User class.
That
class that has comments like, woe to ye who
edit here. The problem with this code is that
it does live forever. It encapsulates business
logic that
ends up getting duplicated elsewhere, cause
no one wants
to go in there and look at that code.
It's also very hard to understand.
I'm gonna be talking about ways to avoid this
situation. I'm gonna be talking about code.
Code at
the class level and the method level. Having
small
code at the class and method level is fundamental
to being able to create systems that are composed,
composed of small, understandable parts.
I'm gonna lay out a few base concepts so
that we can start with a clean sheet and
on the same page. I think there's a lot
of problems with what people conceive of as
small
or well-designed code.
It's not about the actual amount of code you
write, but how the code is organized and the
size of the units of code. Fundamentally,
writing small
code is really design discipline, because
the only way
you can write small code is use good design
and refactoring.
Design and refactoring the way we write small
code.
You can't just sit down and write small code,
perfectly well-designed code on the first
draft. It doesn't
work that way. It's iterative process.
So what do I mean by small? It's not
about total line count. Well-designed code
will typically have
more lines of code than bad code. Just the
overhead of declaring methods and classes
is gonna increase
your line count.
It's not about method count. Well-factored
code's gonna have
more, smaller methods. It's not about class
count. Well-designed
code is almost definitely going to have more
classes
than what I call undesigned code.
Although I've seen some cases of over-abstraction,
I find
that's pretty rare unless someone goes pattern
crazy. So
small code is definitely not about decreasing
the number
of classes in your system. It's about well-designed,
it's
about well-designed classes that aren't poorly
designed.
So what do I mean by small? Small methods,
small classes. Small methods are the foundation
of writing
small code. Without the ability to decompose
large methods
into small methods, we cannot write small
code. And
without small methods, we can't raise the
level of
abstraction.
To write small code, we have to be able
to decompose large classes into smaller classes,
and abstract
responsibilities out of them and separate
them on higher-level,
and base them on higher-level abstractions.
It's important that our classes are small,
because small
classes are what lead to reusability and composability.
So, why should we strive for small code? Why
is it important?
We don't know what the future is going to
bring. Your software requirements are going
to change. Software
must be amenable to change. Any system of
software
that's going to have a long, successful life,
is
going to change significantly. Small code
is simply easier
to work with than large, complex code. If
the
requirements of your software are never gonna
change, you
can ignore everything that I have to say here.
But I doubt that that's the case.
We should write small code because it helps
us
raise the level of abstraction in our code.
It's
one of the most important things we do to
create readable, understandable code. All
good design is really
driving toward expressing ubiquitous language
of our problem domain
in our code.
The combination of small methods and small
classes is
going to help us raise that level of abstraction
and express those higher-level domain concepts.
We should also write small code so we can
effectively use composition. Small classes
and small methods compose
together well. As we compose instances of
small objects
together, our systems will become message-based.
In order to
build systems that are message-based, we have
to use
delegation. And small, composable parts. Small
code makes small
composable parts. It's gonna help our software
have flexibility
and lead to a suppleness over time, and allow
us to follow those messages. And eventually
we're gonna
see, find our duck types.
And all this is about enabling future change.
And
accommodate the future requirements without
a forklift replacement.
So the goal: small units of understandable
code that
are amenable to change.
Our primary tools are extract method and extract
class.
Longer methods are harder to understand than
short methods.
And most of the time, we can shorten a
method simply by using the abstract method
refactoring. I
use this thing all the time when I'm coding.
And once we have a set of methods that
are coherent around a concept, then we can
look
to abstract those into a separate class and
move
the methods to that new class.
So, I'm gonna be using the example of a
command line option parser that handles booleans
to start
with, and then we're gonna see where the future
takes us.
So, with the command line, I want to be
able to run some Ruby program dash v. And
handle boolean options. That's where we're
gonna start.
In my Ruby program, I want to define what
options I'm looking for, using this simple
DSL. And
then I want to be able to consume it
like this. If options.has and then a particular
option,
I do something.
Putting it all together. The DSL, the program
at
the top, the DSL, and then how we actually
consume that options object. Pretty simple.
Here's my spec. It's pretty simple. It's true
if
the option is defined and it's present on
the
command line. And it's false if it's not.
So I run my specs and I get two
failures. Yes, I used TDD. So, here's my implementation
that fits on one slide. Pretty simply, I store
the defined options in an array, and I store
the arguments, the argv for later reference.
Then I
have a has method that checks to see if
the option is defined. If it's present in
the
argv.
And then I've got my option method, which
implements
my simple DSL. Nice and readable. Fits on
one
slide. Probably very comprehendable.
So I run my tests. Zero failures. They pass.
I'm done. I get to go home until the
future comes along. And my workmate comes
along and
says, hey, I really like that library. But,
could
we handle string options?
Sounds pretty simple. Pretty straightforward.
So I think about
that, and I come up with a small extension
to the DSL, to just pass a second argument
as an option with a symbol representation
of the
option type. String, in this case.
I also default to being a boolean so I
don't have to change the code that other people
have done.
So, a string option. It's a little different
than
a boolean. It actually requires content. So
now I
need the concept of validation. If the string
option
is missing the content, it's not valid. There's
no
string there.
So, then I'm gonna normalize how I get the
values out of both those string options and
those
boolean options. You know, that value. This
is gonna
change the API, but sometimes you actually
need to
break the API to enable the future.
And I'm doing it pretty early. I've only got
one guy in my office using the library at
the moment. So, again, putting it all together.
I
can pass the options on the command line.
I
define the options with the DSL, and here's
how
I use my valid? and my value methods to
find out, get, find out if it's valued and
get my values out.
So, now here's the class that implements it.
Again,
on one slide. Probably not as readable. Probably
not
as comprehensible. We're going down what I
call kind
of the undesigned path. It's not too big.
Thirty-one
lines. But it's got issues.
It's got a method that's definitely large.
One that's
looking on the verge of being large. It's
got,
for only handling booleans and strings, it
has quite
a bit of, of conditional complexity in it
already.
And as we're soon gonna see, it's not very
amenable to change.
So we'll look at the pieces and how they
work, just so yo understand it.
That's my initialize method. It creates a
hash to
store the options. Because we have to store
the
type now, not just that we have an option.
It's either boolean or string. And the rest
of
the initialization is the same as it was before.
And the valid method, we gotta iterate over
the
options, looking to see which ones are strings.
So
we're doing checking on type here. And, and
trying
to see whether they're present and they actually
have
content.
Currently, string options are the only ones
that need
to validate. Boolean options, there's nothing
really to validate.
Either it's there or it's not. No validation.
But
strings, we have to.
And the value method, it does a lot of
stuff. Let's just pretend for a moment this
method
is a black box. We're gonna come back to
it later. Cause this is, by far, the worst
code in this current example.
But, everything is spec'd. And all my specs
are
green.
So let's talk about methods. Cause we've got
some
big ones and we need to clean them up.
I call it the first rule of method, of
methods. Do one thing. Do it well. Do only
one thing.
Harkens back to that Unix philosophy of tools
that
you string together with standard in, standard
out. But
how do we determine if a method is actually
only doing one thing? This is where your level
of abstraction and the abstract, abstractions
in your code
come into play. And you need to develop a
feel for this over time. That you want one
level of abstraction per method.
If all of our statements are the same level
of abstraction, and they're coherent around
a purpose, then
I consider that to be doing one thing. Doesn't
mean it has to be one line in a
method.
I can't tell you how many times I've looked
at code and seen a comment on a method
that was, like, an excellent description of
what the
method did, and if you just took those words,
bound together, they'd make a fantastic method
name. But
yet the method is named something else that
isn't
that descriptive. So use descriptive names.
It's really critical.
And the fewer arguments, the better. My personal
goal
is zero arguments on methods. One is OK. Two
or three. That's when I start to think I've
probably missed an abstraction in my code
and I
should go back and look at it.
Separate queries from commands. If you query
something and
it looks like a query method and it changes
the state of your object, it's hard to reason
about, and people who consume your library
will be
confused by that. So separate those.
And, don't repeat yourself. I know Sandi talked
about
this earlier, and it does take some judgment
to
know when it is time to remove the repetition.
But you don't want to leave repetition over
the
long term, because it will come back to bite
you.
So, let's look at our methods. We've got repetition
here. Both valid? and value are digging through
the
argv array to find the options from the command
line. This is the perfect candidate for an
extract
method, abstraction. Refactoring.
We have magic constants scattered around,
and those are
a strong indication that we've missed something.
An abstraction.
We're violating some other rules. It's hard
to say
either of these methods is really doing one
thing.
The code is definitely not at the same level
of abstraction. Values digging, valid? is
digging into the
argv array and value is figuring out different
divergent
types and how to return their values. So we're
gonna eliminate some of the repetition with
the extract
method refactoring.
The extract method refactoring entails moving
a part of
a method into a new method, with a descriptive
name, that's the naming part. And then calling
the
new method.
This, this refactoring helps us keep the level
of
abstraction consistent in the method we're
abstracting from. Here
we have one expression on a method that's
a
high level of abstraction, and two statements
that are
a low level of abstraction. So we move the
less abstract code to a new method with a
descriptive name, and then we call the new
method.
And this results in the old method, method
having
a consistent level of abstraction. So back
to our
CommandLineOptions class, both valid? and
value are digging through
the argv collection to find the option value.
So
we're gonna abstract that code and get the
raw
value out of argv.
Then we call the method from where the original
logic was abstracted. Pretty simple. But now
the code
left behind in valid? and value says what
I
want. Not how to do it.
The how has been moved to the abstracted method,
raising the level of abstraction just a little
bit
in valid? and value.
I'm going to do two more abstractions. I've
abstracted
the string option value method and the abstract
content
method. The naming of the abstracted methods
is very
important. They say what they do.
But overall, I'm not happy with this code.
It
is more explanatory, but it's fairly complex
and hard
to understand. It's also not as small as it
could be. The methods are large because I
missed
an abstraction. And we're gonna go find that
now.
I'm referencing the option type symbol to
see if
it's a string, which, that's a big smell.
Then
there are the magic constants used to dig
into
the argv element to find the constant within
that
particular string, the substring. If I was
confident that
I'd have no future added requirements for
this class,
I might leave this alone. It works. It's tested.
Until my buddy comes to me and says, hey,
I really like that library, but could we handle
integers now?
I could keep driving down this undesigned
path I've
been following, and complicate the valid?
and value methods
by switching on the type of the option and
digging into those argv elements to find the
value.
But, this is our chance to make a break.
And make our code more amenable to change.
But, to illustrate the point, I'm gonna show
you
that undesign method, to show you the OO design
actually matters. So we're gonna look at this.
This is the undesigned, non OO version of
this
code. Is it horrible? I'll leave that to you
to decide. Is it small? In my opinion, definitely
not. It is not small, by any measure. The
class is growing due to changes in specification.
The
valid? and value methods are being changed
in lock
step. That's a sure sign we've missed an abstraction
or a duck type. And those methods are getting
big and complicated. And now they're doing
even more
things.
And we're just doing booleans, strings, and
integers. Not
that much.
The code has tests. They all pass. That's
good.
But it's not satisfying. We've got those large
methods
and complex conditional logic. It's time to
refactor now.
To make the change easy.
And now we've got the tests that are back,
so we can do it without fear.
And, I want to call your attention to a
pattern that clearly emerges when we go down
the
non OO path here. We see checking the option
type and divergent behavior based on the type.
Don't
reinvent the type system. If you have ducks,
let
them quack.
In this example, the option types of boolean,
string,
and integer, those are our ducks. And I'll
bet
there's ducks in your code yearning to be
free.
And just a further confirmation that we're
dealing with
an abstraction or a duck, we see the testing
option type again in the value method. Hidden
inside
the valid? and value method, there's a case
statement
here. It just didn't evolve that way as I
was writing the code.
I'm gonna show you that. You're gonna see
that
it's really clear now.
Now it should be really obvious what the duck
type is. If you have case statements like
this
in your code, you've missed an abstraction.
Here, again,
we clearly see the duck type.
Now, I would guess, if I was writing this,
as soon as I had the string type, I
would have gone down the OO path. I just
wanted to illustrate to you what an undesigned,
non
OO mess you can get yourself into if you
keep riding the horse until it's dead.
My dad had a saying hanging on his wall
in his office. When the horse is dead, get
off.
But sometimes we don't realize the horse is
dead
and we just keep trying to go. Now it's
time to take a fresh look at this. So,
since class is the fundamental organizational
unit we have
to work with, it's time to work at what
constitutes a good class. Which principles
are gonna lead
us to be able to write small classes.
So, how do we write small classes? To make
small classes, I think, and this is not just
my opinion. It's a lot of peoples' opinion.
The
most important thing we should assure is that
our
class has one responsibility. And that it
has small
methods.
All the properties of a class should be cohesive
to the abstraction that the class is modeling.
If
you have properties that you only use in one
or two methods, that's probably something
else that shouldn't
be in there.
Finding a good name for a class will also
help us keep it focused on a single responsibility.
I sometimes talk to the class. Have you ever
heard the concept of talking to the rubber
duck?
Or just explaining your problem to someone?
They don't
even have to respond, and it helps you figure
it out.
Sometimes I just ask my class, hey class,
what
do you do? And if it comes out with
a long list, you've got a problem.
So, the main tools we're gonna use to create
new classes from existing code, not from scratch,
but
from existing code, is the extract class and
move
method refactorings, which we're gonna go
through here.
So, those characteristics of well-designed
class. Single responsibility. Cohesive
around a set of properties. Additionally,
it has a
small public interface that, preferably, handles
a handful of
methods at the most. That it implements a
single
use-case, if possible, and that the primary
logic is
expressed in a composed method.
That last one, I'm not gonna be covering the
composed method. That's a whole nother talk.
But you
should check that practice out. It can really
clarify
code and make it much, much more understandable.
So, let's look at the code we should have
been driving towards as soon as the string
option
type showed up. We're gonna imagine right
now that
we have a string sheet, and we can write
CommandLineOptions the way we would have with
the knowledge
that we have now.
That needs to support boolean, string, and
integer options.
And remember, we have our tests at our back,
making sure that we don't break anything.
And, here was my first take at it on
what I'd write. The class is twenty-eight
lines long.
It is cohesive around the properties. When
we're done,
most of the methods are gonna deal with the,
the hash of options and the array of args.
It has a single primary responsibility. Manage
a collection
of option objects.
So now we've introduced a collaborator. It
also manufactures
the option objects, which I could abstract
to another
class. But for the moment, I'm gonna leave
it.
If I find it hurts in the future, then
I'll change it. That's my general rule. My
guideline.
Is I refactor when it hurts. When making a
change hurts, that's the time to refactor.
My CommandLineOptions class has a small public
interface. Just
two methods, valid? and value. And it has
no
hard-coded external dependencies yet. I could
mess that up
and introduce those, but we're gonna avoid
that.
Another interesting characteristic is that,
is that there are
no conditional statements in this class, and
we're gonna
keep it that way. In Sandi Metz's 2009 Gerupo??
talk, on the Solid Principles, she said something
along
the lines of, a conditional in an OO language
is a smell.
And that's a really powerful statement. I
don't think
Sandi's saying that we can't use conditionals
in our
code, but that we use conditionals to hide
abstractions.
To hide our ducks.
The first time I saw that talk, I don't
even know if I heard her say it. It
was when I went back and rewatched it. I
thought, really? Then, as the years have gone
on
and I've been working, I've gotten to the
point
where I agree with her.
If you have a lot of conditionals in a
class, you have probably missed a concept
that should
be abstracted out of it.
So the initialize and option method from our
previous
implementation carry over unchanged. Except
that we're gonna store
the options in a hash instead of just the
type.
My valid? method now simply asks all the options
if they're valid, and the value method simply
looks
up the option hash and asks it for its
value. So, now we need to build the options.
We have to implement this. And this is where
we're gonna instantiate the objects that represent
the boolean,
string, and integer options.
So, now we have the CommandLineOption class,
we need
collaborators. In order to get anything done,
CommandLineOption needs
option classes to manage. It's gonna have
those objects.
So this is creating a dependency. And if we're
gonna create a dependency in our code, we
can
do it in a way that's amenable to change,
or we can do it in a way that's
gonna make it hurt in the future.
You don't want to depend, or, excuse me, you
want to depend on abstractions, ot concretions.
Depend on
the duck type, not the concrete type. In our
case, depend on the concept, the concept of
an
option. Not on the concrete types that implement
that
abstraction.
In our case, option is the duck type. This
is the abstraction that I missed earlier,
when I
just kept going down the conditional logic
path.
It's really simple. It has a valid? method
and
a value method. String option, integer option,
and boolean
option, those are the concrete implementation
of the option
abstraction. All they need is a valid? and
a
value method, and a consistent method of construction,
and
I can depend on the abstraction, not on the
concretions.
So, how do I do that? I could go
down the case statement road again and check
the
option type, instantiating the correct type
of the option
based upon the symbol. But I'm not gonna do
that, cause that would tie CommandLineClass
to those concrete
types, which is what we're trying to avoid.
That creates a hard dependency between CommandLineOptions
class and
those various classes. Instead, I'm gonna
use the dynamic
capabilities of Ruby to instantiate those
objects for us
using naming conventions. For string, we're
going to have
a string option. For booleans, boolean option.
Et cetera.
I could do this even in many static languages.
So this isn't something that's specific to
Ruby. And
this is a very. This very simple change takes
out CommandLineOption class from depending
on those concrete implementations
and flips it to depending on the abstraction.
This is dependency inversion from the Solid
Principles, in
practice. Alternately, some other people have
suggested, you could
use a hash and map from the string, boolean,
and integer symbols to the concrete classes,
kind of
like what Sandi did in her Gilded Rose Coda??
solution earlier.
That's OK. But, it is an additional thing
that
I have to maintain over time. It's a reason
to open the CommandLineOptions and change
it if I
have to add a new type of option. If
using the dynamic ability of Ruby bothers
you, then
make a hash. Personally, I'm fine with using
the
dynamic capabilities of my language.
So, in my case, I've inoculated CommandLineOptions
class from
needing to change to support new option types.
And
at this point, this class should be closed
for
modification, but open for extension.
So, now we need to move the logic for
the various option types to the appropriate
option classes.
I decided to make a base class of option
for my concrete types to inherit from, because
the
manner of initialization needs to be the same
for
all of them. No sense of repeating that code.
And the subtypes have a cohesion around the
flag
attribute, and the wrong, excuse me, the flag
and
the raw value properties that in the code.
Here's the boolean option. This one I just
wrote
because the requirements are so simple. Booleans
are always
valid, and they just return the raw_value
from the
command line. If it's present, it's truthy.
If it's
nil it's falsey. Very simple.
But now we need to implement string option
and
integer option. And the logic for their validation
and
value extraction is in the old CommandLineOptions
class. So,
on the left are the original CommandLineOptions'
valid? and
value methods. On the right are those new
string
option and integer option classes.
As you can see, the process of creating the
option class was simply picking apart and
disassembling the
old command line option class. Moving the
logic to
where it belongs, using a combination of extract
class
and move method refactorings, we've really
cleaned up the
command option, CommandLineOptions.
Frankly, there's not much code left there
anymore. So,
now we can replace that nasty, hard to understand
valid? method with this. And the large value
method
with this.
To create the specs for the various option
classes,
I moved the corresponding section from the
CommandLineOptions spec
to the corresponding area for the particular
type of
option, and then lightly reworked them and
then I
worked them from red to green, as I went
through the process of extracting those classes
and moving
the code to those methods.
We've isolated abstractions here. And how
do we do
that? We separate the what from the how, like
we've done in CommandLineOptions. We want
to move from
code that looks like this to code that looks
like this.
The original CommandLineOptions' valid? method
contained all of the
how. The refactored valid? method says what
we want
done for us. That's it. All of the how
has moved to the collaborators of our main
class,
in this case, StringOption, boolean option,
and IntegerOption.
We want to move from that looks like this
to code that looks like this. Move the nitty
gritty details of your code out to the leaves
of your system. And let the center be a
coordinator.
So, when we're done with this, this is what
our CommandLineOptions class looks like. These
are our public
methods. It provides a very small surface.
And it
fulfills the use case. And these are the private
implementation cruft. It's necessary, but
no one really needs
to go poking around in here, and I've made
it obvious by declaring these methods private.
They're for me. Not for you.
So in the end, the sum total of the
implementation of the public interface, and
it's all delegated.
All delegated.
So in the process of making the specs pass,
I commented out that dreamed up code as I
went through the process, and then one by
one
I wrote the examples and uncommented the code
and
made them pass, working from red to green.
Then, because nothing is ever really done,
my buddy
says, hey. Any chance you could add the ability
for me to pass an array of values for
an option?
So, to implement this new requirement, I only
need
the new array option class. So I write a
spec example. Make it fail. Then create the
ArrayOption
class, and I'm done. And this particular example,
my
OptionClass is inheriting from the OptionWithContent
superclass. And, cause
I actually went through this and realized
that strings,
integers, and arrays all have content, so
I abstracted
that superclass and, in this case, all I have
to do is write the value method of that
particular type and I'm done.
And it works.
So, we now have a CommandLineOption class
that's closed
for modification, but open for extension.
I could all
float types, decimal types, other types of
options, and
I don't have to go back and touch that
class again.
We have small, easy to understand option classes
that
have a single responsibility. Oops, excuse
me.
We can. So, we have a easy to understand
option classes that have a single responsibility,
and easy
to compose together with that CommandLineOption
class. And we
can simply create new option types and have
them
instantiated by convention.
My name is Mark Menard. My company's Enable
Labs.
We do full lifecycle business productivity
and sass app
development, from napkin to production, as
I say. And,
I'm gonna be around the conference, so let's
get
together and talk about some code.
And we can do some questions.