Few things draw my attention
like a looming deadline.
Some part of my brain
watches in morbid fascination
as the deadline approaches
wondering whether to
call emergency services;
or maybe just settle in,
make popcorn,
and watch the show.
I can't help but keep a wary eye on it.
A wary eye that is no longer
paying attention to the code at hand.
I need all the wary eyes I can get.
Without them, forget best practices.
I revert to less successful strategies
like guessing;
and desperately copying code off
of Stack Overflow.
Sometimes I'm happy.
There are moments when
the world melts away.
Your sense of self dissipates.
You become
completely absorbed in what you're doing.
Time appears to slow down
and speed up simultaneously.
Being awesome feels effortless.
I've taken to coming into
the office early in the morning
and committing random acts of refactoring.
Some people are calling this
guilt-driven development.
But really it's not.
Refactoring just makes me happy.
More refactoring is an optimization.
So today I am going to tell you a story.
The arrow needs to point the other way
because this is a refactoring story.
Anyway it has a middle...
Actually it has a beginning,
two middles and an end.
And a moral.
So to be clear, before I get started,
when I say refactoring, I mean
small careful steps that improve the
structure and the readability of the code
without changing its behavior.
Tests are implied.
So.
Once upon a time
there was an application that
performed some incredibly dirty hacks
in order to deliver data straight
into a number of real-world,
old-school, hardcopy publishing systems.
I found this particular specimen
in the dark recesses of that codebase.
It was in a module that
was 300-or-so lines long
most of which was in a single method.
This module talked to code
all over the application.
It dealt in temporary files, FTP;
it shelled out;
used Exif 2 to write
metadata into JPEG files;
it handled encoding.
It had a comment in it that read:
"A kitten dies every time
this code is run."
We had it running on a cron job.
It had no tests
and there was no documentation.
This is essentially
a bucket of business logic.
It jumps through hoops
in order to name files
so that they end up where they need to be.
Now there's a comment
but it's almost as bad as the code.
Also it's wrong.
The input to the method is a target,
which is the God object
in this application.
It's an ActiveRecord model.
That class is 500 lines long.
The big picture here is
that information is being shoved
onto a string in order to
build up the file name.
There is a bit of code that's not actually
shovelling things onto the string.
This draws the eye.
It appears to be a chunk of something.
And it's probably a good
candidate for extraction.
Another striking aspect of the code
is that it's littered
with a bunch of low-level details.
It makes it harder to see what's going on.
But amidst all the clutter
there does seem to be things...
chunks that could probably be named.
So what are we looking at?
We've got a very large method.
It appears to be doing things
at 2 different levels of abstraction.
And nothing at the lower level
of abstraction is being named.
These are not
characteristics of good code.
The thing to do with big ugly code
is to break it apart into small ugly code.
There's a critical difference
between breaking code down,
and breaking code.
The first middle of this story
is gonna be about adding
characterization tests to the method.
And then the second middle is
the actual refactoring.
Where do you even begin
testing something like this?
We don't really know
what the inputs look like.
We certainly don't know
what the output looks like.
We do know that it's in production,
and it appears to be working
because we haven't had any complaints
from the customer about it.
The easiest way to discover inputs
is to just send something—
anything really—into the method
and then see what comes back out.
We need an assertion to work against.
And again, it doesn't matter
what that assertion is.
It is going to be wrong.
The nice thing about being wrong
is it tells you what 'right' looks like.
Running this fails, obviously.
The error message tells us
that we're missing an input.
It also points us to the exact spot
in the code where this is happening.
What we find on line 6—
it comes as no surprise at this point—
the message publish_on
is being sent to target
and it returns a date.
Meanwhile back in our tests,
we know that the stub represents a target.
And we are ready to start
fleshing out those inputs.
The one we need right now is publish_on,
which we know is a date.
And the failure tells us
what our next input is.
Digging around in the code reveals
that xyz_category_prefix is a string.
Next up is kind, which is also a string.
personal is a boolean.
id is an integer.
And title is a string.
And this gives us our first real failure.
Meaning that we've gotten
all the inputs right.
These are the inputs.
The messages that we need
to send to target
in order to build up a file name.
And with the inputs in place,
we're also given the output,
which looks like this.
And now all that practice
copying-and-pasting from Stack Overflow
comes in handy.
Copy that string from the failure into
the assertion and the test should pass.
It doesn't.
The fix for this is to use a regex.
This gives us our first passing test.
We're not quite done yet.
There are a bunch of low-level details
that aren't being exercised.
And there are alternate paths
through the method.
We need to improve the inputs
for these 3 lines of code
and add test cases for the lines
that have conditionals in them.
At every step, failing tests tell us
how to tweak our assertions.
These are trivial to fix; boring to watch.
So I'll show you some highlights.
publish_on gets zero-padded,
and pi doesn't actually
get affected by this
so we'll fall back on e.
kind gsubs out underscores.
We need an underscore
so we can see it being removed.
title appears to strip out everything.
And the existing input
doesn't have very much cruft in it.
So we need some numbers
and some funky characters
and a couple of upper-case letters.
There's something fishy about that regex.
What's with the square brackets?
I'm too lazy to reason about this
so I'm adding a test case,
overriding the stub's input for title,
giving it some square brackets.
And that proves that
brackets get left in place.
This is probably not
the intended behavior.
This test now serves as documentation
until we can clarify this
with the customer.
The first conditional deals in
personalization
and our stub doesn't personalize.
So we need a test for the case
that actually does.
We're informed about yet another input
that's missing.
It's trivial to supply.
The other conditional on that line of code
provides a fallback in case age is nil.
And the test for this stubs out
the same inputs as the previous one.
The final conditional is
a ternary statement
that is trying to determine
where to truncate the title.
Our default input for the title is
neither long nor short.
So by making it longer
we can make sure that it gets truncated.
And then the only case left to test for
is a title that is too short
to get truncated.
At this point we have
protection against regressions.
What have we actually accomplished so far?
We took a piece of
undocumented, untested code.
And with a bit of hand-waving,
we got fake assertions
to give us the inputs.
The inputs gave us the outputs.
And the outputs
gave us the real assertions.
It's almost karmic.
Then we had to reason about the code.
We inspected every line.
Made sure that we had inputs
that would actually exercise those lines
and then we made sure that
every branch of every conditional
was called from a test.
There's no specification here.
There's no design being done.
We're getting a regression test suite.
We are also getting documentation
up to and including the fact
that we probably have a bug.
The biggest win at this point
is that we can start changing things
without breaking into a cold sweat.
So here's what we're gonna do.
We're gonna isolate the method.
And then we're going to
extract a bunch of smaller methods.
I'm gonna borrow a detailed prescription
from Martin Fowler's book Refactoring,
called Replace Method with Method Object.
Traditionally you do this refactoring
when you have big computation
and a bunch of temporary variables
and you don't want to be passing
those temporary variables around
as you extract methods.
Here, we pretty much only have
this one target to worry about.
So first of all,
we need to make a home for this code.
Add an initializer that takes the target.
Add an attribute for it.
And just copy-paste
the whole thing in there.
The method shouldn't be called
on the class.
It needs a better name.
And it no longer takes any arguments.
Back in the old module,
we need to reference the new method
from the old one,
so just delete the whole body
of the method,
reference the new file,
instantiate the class,
pass in the target,
call name on it,
and we're green.
Which means that we have
a license to go to town on this code.
Where to begin?
You could start anywhere.
I always like to delete something.
Let's go ahead and
get rid of that comment.
And now somewhat arbitrarily,
I've chosen to go
from biggest to smallest.
The biggest chunk
that we've seen for extraction
is the truncated title bit.
The actual mechanics of
a method extraction is as follows:
Create an empty method
and name it by what it does
or what it is,
not by how it does it.
Coming up with a good name
is often the hardest thing
you'll ever do in a refactoring.
The next step is to copy the lines of code
from the source method into the new one.
Scan the new method for local variables.
filename here is declared
in the source method,
so we have a choice to make.
We can either do the mutation here,
which means passing the filename in,
or making it an instance variable.
Or we could make this a query method.
Now the whole point of the refactoring is
to separate the 2 levels of abstraction—
building up the filename in one place
and putting together all the little pieces
in another.
So we're gonna just return
the value that we need.
The other temporary variables here
are all local to this method.
We need to assign the result
of our new query method
and delete the temporary variables
that are no longer used
in the source method.
And then we're green again.
Before we move on
I'd like to tidy this up a bit.
There's some unnecessary work
going on in the regex.
We're doing a case-insensitive match,
and then we're downcasing afterwards.
That's kinda backwards.
There's also something in the
match-if brackets that drives me nuts.
The spurious parentheses have got to go.
Another thing that bugs me is
the fact that we have a ternary statement
to decide the range for the match-if.
A match-if is not going
to raise an exception.
If you try to truncate a 4-character
string to 9 characters,
it'll just return 4 characters.
The ternary statement can go.
Since we no longer need truncate_to,
we don't need to worry
about the length of the title.
So that can go too.
And with those 2 lines gone,
there seems very little point
in having a temporary variable,
so we can ditch that as well
effectively leaving us with
a single line of code in truncated_title.
The longest line of code
is now the hexdigest stuff.
Now, hexdigest is a terrible name
in this context.
Copy stuff in.
Assign the result.
And we're green again.
The personalization line
is now the longest.
Perform a quick extraction.
publication_day
and xyz_category_prefix.
Inside this class, xyz is redundant.
The fact that it's a prefix is irrelevant.
So we can drop the xyz prefix
as well as the prefix suffix
and just call the new method category.
And then we've got kind.
This leaves us with
a fairly readable method.
We could totally leave it at this
but there's still
some spurious stuff in here.
publication_day for instance.
publication_day returns a string.
So, why are we interpolating it?
Same goes for category
and kind.
target.id is an integer
but string interpolation
implicitly calls to_s on it.
This next move might seem
a little bit subtle
until you recognize that kids' game
Five Differences.
Which line of code here is different
from all of the other pieces?
The first line was the only one that's
not being shovelled onto the string,
so we can start off with an empty string
and then make the first piece
just like all the others.
Up to a point.
The extension is also different.
If you conceptually separate
this into 2 jobs—
building up the filename,
and then slapping the extension on it
—we're left with 2 distinct sections:
one where everything is smooshed together,
and another where the pieces
are separated by underscores.
If we extract
the smooshed together pieces,
then all the remaining pieces
are separated by an underscore.
We can now shovel them into an array
rather than a string.
This gets rid of a bunch more
of the interpolation syntax,
joining them with an underscore.
Done.
Is it perfect?
Of course not.
Is it better?
Hell yeah!
We went from this to this
in less than 30 steps.
Let me just show you that again.
Before.
After.
Before.
After.
So first we quarantined the method.
Then we extracted stuff.
Extracting stuff basically boils down to
identifying a piece of code
that performs a subtask
and then giving that code a name.
Back when we started out,
we identified 3 code smells:
a large method,
2 different levels of abstraction,
unnamed abstractions.
Abstracting methods addressed
all 3 of those issues.
At that point we still had
some ragged edges though.
So we kinda picked through it
and removed pointless cruft.
This concludes the second middle
of the refactoring story
and I'm gonna end by taking a closer look
at pointless cruft.
In the field of information design,
chartjunk refers to visual elements
in charts and graphs
that add noise.
They don't help you understand the data.
They often get in the way of
comprehension.
To give you an idea of
what we're talking about
this graph gets just about
everything wrong.
This is the exact same data
presented without chartjunk.
Codejunk is the Ruby equivalent
of chartjunk.
This is not about coding practices.
It's not about naming and SRP
and small methods and DRY.
It's about noise.
I've tried to classify codejunk
and I came up with this list.
#10
Does a bear shit in the woods?
Comments shouldn't echo
the implementation.
They shouldn't be wrong.
They shouldn't be imprecise.
And they shouldn't be misspelled.
#9
It is my conviction
that everyone should make their editor
show them trailing whitespace.
If you leave it in,
you have noise in your code.
If you then take it out,
you have noise in your diffs,
which is just wrong on so many levels.
#8 If it's dead, let it rest in peace.
What were you saving it for?
You need to learn to let go.
What is this doing in source control?
#7
More parentheses.
This is vile.
When I see this,
I secretly wonder if you
wash your hands after you
go to the bathroom.
#6 Intelligent defaults.
Yes. 4 characters saved are 4 characters
that you can spend another day.
Or something.
Let's talk about dependencies.
In particular, dependencies that
you're not actually depending on.
This may or may not impact performance.
It probably will impact the performance
of your tests.
A whole different story.
Mostly you just added noise to the code.
#4 This is familiar territory.
Recognize this?
Not to beat a dead horse,
but ew.
#3 Don't do work that the computer
is already doing for you.
More string interpolation.
I could've technicallly folded this into
the previous one
but I'm trying to get to 10.
This is the same thing.
The call to map is superfluous here.
The match-if brackets
don't need your help.
We've seen that one.
In this example the computer isn't
doing the work for you but should be.
#2 Test suites.
They should be curated, maintained, fed,
watered, coddled and minimized.
The only exception I can think of is if
documenting behavior outweighs the cost.
Choose wisely.
And finally, the compounding sin.
You know the old saying,
"1 + 1 = 3 for very large values of 1"?
This is revolting.
In fact, it bears a horrifying resemblance
to the code that
we started out with today.
When people say that
software is grown, not built,
I'm pretty sure that this
is what they're talking about.
It's kinda like fungus.
Earlier I said about the refactoring
that after we extracted methods
we kinda polished up the code a little.
I'm gonna restate that
in light of the previous section.
First, we extracted methods.
Then we eliminated codejunk.
That was the ending of
the refactoring story.
Before I get to the moral,
I'd like to mention
the whole thing lives on Github.
It was committed step-by-step,
so if you wanna look
at any of the mechanics
or run the tests or anything,
just dig through the commit history.
It's all there.
I have a vague recollection of the night
when that code was written.
Let me rephrase that:
I vaguely recall writing it.
Yeah.
When I panic, I write godawful code.
Other people do yoga, or meditate,
or go white water rafting.
I refactor.
It soothes me.
Science—if you believe that sorta thing—
actually explains why.
The key player in the game
is working memory.
Working memory is the thing
that allows you to do mental arithmetic.
You store temporary results off in memory
as you work through
other parts of the problem.
The main differentiator in performance
of tasks that rely on working memory,
is the type of strategy
you're able to apply to a problem.
If you have better working memory
you can apply more complex,
more successful strategies.
Here's the deal:
when you worry in ways
that involve mental chatter,
you use up available slots
in your working memory.
You no longer have the swap space
necessary to use complex strategies
and you revert to less successful ones
like copying stuff off of Stack Overflow.
This is where refactoring comes in.
Refactoring makes you smarter.
Refactoring basically gives you
an exobrain.
You offload a bunch of those
little details—that
under normal circumstances
go into working memory
—into your tests.
Once you start refactoring,
you start reclaiming your brain.
It actively counteracts panic.
One of the really unexpected things
that happened when I started refactoring
simply for the act
of refactoring itself was
that I started optimizing
for happiness.
I would specifically put something
into its own class
just so that I could load
just that class and nothing else.
The feedback loop you get
when you have subsecond test suites
for the thing that you're working on
is unbelievable.
It's a huge enabler for flow.
Flow is nice.
Flow is more than nice.
It's like a bliss cocktail.
Selfishly isolating code
to improve my own happiness
turned out to be a pretty good deal
for the code as well.
Suddenly I was avoiding dependencies.
I was constantly asking myself
"Can this be a separate class?"
"How about a gem?"
"What is the unrelated subproblem here?"
It made a dramatic difference.
In summary, refactoring makes you smarter
by giving you an exobrain.
It is preventative and curative
with respect to panic.
If you optimize for developer happiness
by making your tests really fast,
you get loosely coupled code that adheres
to the Single Responsibility Principle.
Thank you.