-
Few things draw my attention
like a looming deadline.
-
Some part of my brain
watches in morbid fascination
-
as the deadline approaches
-
wondering whether to
call emergency services;
-
or maybe just settle in,
-
make popcorn,
and watch the show.
-
I can't help but keep a wary eye on it.
-
A wary eye that is no longer
paying attention to the code at hand.
-
I need all the wary eyes I can get.
-
Without them, forget best practices.
-
I revert to less successful strategies
-
like guessing;
-
and desperately copying code off
of Stack Overflow.
-
Sometimes I'm happy.
-
There are moments when
the world melts away.
-
Your sense of self dissipates.
-
You become
completely absorbed in what you're doing.
-
Time appears to slow down
and speed up simultaneously.
-
Being awesome feels effortless.
-
I've taken to coming into
the office early in the morning
-
and committing random acts of refactoring.
-
Some people are calling this
guilt-driven development.
-
But really it's not.
-
Refactoring just makes me happy.
-
More refactoring is an optimization.
-
So today I am going to tell you a story.
-
The arrow needs to point the other way
because this is a refactoring story.
-
Anyway it has a middle...
-
Actually it has a beginning,
two middles and an end.
-
And a moral.
-
So to be clear, before I get started,
-
when I say refactoring, I mean
-
small careful steps that improve the
structure and the readability of the code
-
without changing its behavior.
-
Tests are implied.
-
So.
-
Once upon a time
-
there was an application that
performed some incredibly dirty hacks
-
in order to deliver data straight
into a number of real-world,
-
old-school, hardcopy publishing systems.
-
I found this particular specimen
in the dark recesses of that codebase.
-
It was in a module that
was 300-or-so lines long
-
most of which was in a single method.
-
This module talked to code
all over the application.
-
It dealt in temporary files, FTP;
it shelled out;
-
used Exif 2 to write
metadata into JPEG files;
-
it handled encoding.
-
It had a comment in it that read:
-
"A kitten dies every time
this code is run."
-
We had it running on a cron job.
-
It had no tests
-
and there was no documentation.
-
This is essentially
a bucket of business logic.
-
It jumps through hoops
in order to name files
-
so that they end up where they need to be.
-
Now there's a comment
-
but it's almost as bad as the code.
-
Also it's wrong.
-
The input to the method is a target,
-
which is the God object
in this application.
-
It's an ActiveRecord model.
-
That class is 500 lines long.
-
The big picture here is
that information is being shoved
-
onto a string in order to
build up the file name.
-
There is a bit of code that's not actually
shovelling things onto the string.
-
This draws the eye.
-
It appears to be a chunk of something.
-
And it's probably a good
candidate for extraction.
-
Another striking aspect of the code
is that it's littered
-
with a bunch of low-level details.
-
It makes it harder to see what's going on.
-
But amidst all the clutter
-
there does seem to be things...
-
chunks that could probably be named.
-
So what are we looking at?
-
We've got a very large method.
-
It appears to be doing things
at 2 different levels of abstraction.
-
And nothing at the lower level
of abstraction is being named.
-
These are not
characteristics of good code.
-
The thing to do with big ugly code
is to break it apart into small ugly code.
-
There's a critical difference
between breaking code down,
-
and breaking code.
-
The first middle of this story
is gonna be about adding
-
characterization tests to the method.
-
And then the second middle is
the actual refactoring.
-
Where do you even begin
testing something like this?
-
We don't really know
what the inputs look like.
-
We certainly don't know
what the output looks like.
-
We do know that it's in production,
-
and it appears to be working
-
because we haven't had any complaints
from the customer about it.
-
The easiest way to discover inputs
is to just send something—
-
anything really—into the method
-
and then see what comes back out.
-
We need an assertion to work against.
-
And again, it doesn't matter
what that assertion is.
-
It is going to be wrong.
-
The nice thing about being wrong
is it tells you what 'right' looks like.
-
Running this fails, obviously.
-
The error message tells us
that we're missing an input.
-
It also points us to the exact spot
in the code where this is happening.
-
What we find on line 6—
it comes as no surprise at this point—
-
the message publish_on
is being sent to target
-
and it returns a date.
-
Meanwhile back in our tests,
-
we know that the stub represents a target.
-
And we are ready to start
fleshing out those inputs.
-
The one we need right now is publish_on,
-
which we know is a date.
-
And the failure tells us
what our next input is.
-
Digging around in the code reveals
-
that xyz_category_prefix is a string.
-
Next up is kind, which is also a string.
-
personal is a boolean.
-
id is an integer.
-
And title is a string.
-
And this gives us our first real failure.
-
Meaning that we've gotten
all the inputs right.
-
These are the inputs.
-
The messages that we need
to send to target
-
in order to build up a file name.
-
And with the inputs in place,
-
we're also given the output,
-
which looks like this.
-
And now all that practice
copying-and-pasting from Stack Overflow
-
comes in handy.
-
Copy that string from the failure into
the assertion and the test should pass.
-
It doesn't.
-
The fix for this is to use a regex.
-
This gives us our first passing test.
-
We're not quite done yet.
-
There are a bunch of low-level details
that aren't being exercised.
-
And there are alternate paths
through the method.
-
We need to improve the inputs
for these 3 lines of code
-
and add test cases for the lines
that have conditionals in them.
-
At every step, failing tests tell us
how to tweak our assertions.
-
These are trivial to fix; boring to watch.
-
So I'll show you some highlights.
-
publish_on gets zero-padded,
-
and pi doesn't actually
get affected by this
-
so we'll fall back on e.
-
kind gsubs out underscores.
-
We need an underscore
so we can see it being removed.
-
title appears to strip out everything.
-
And the existing input
doesn't have very much cruft in it.
-
So we need some numbers
and some funky characters
-
and a couple of upper-case letters.
-
There's something fishy about that regex.
-
What's with the square brackets?
-
I'm too lazy to reason about this
-
so I'm adding a test case,
-
overriding the stub's input for title,
-
giving it some square brackets.
-
And that proves that
brackets get left in place.
-
This is probably not
the intended behavior.
-
This test now serves as documentation
-
until we can clarify this
with the customer.
-
The first conditional deals in
personalization
-
and our stub doesn't personalize.
-
So we need a test for the case
that actually does.
-
We're informed about yet another input
that's missing.
-
It's trivial to supply.
-
The other conditional on that line of code
provides a fallback in case age is nil.
-
And the test for this stubs out
the same inputs as the previous one.
-
The final conditional is
a ternary statement
-
that is trying to determine
where to truncate the title.
-
Our default input for the title is
neither long nor short.
-
So by making it longer
we can make sure that it gets truncated.
-
And then the only case left to test for
is a title that is too short
-
to get truncated.
-
At this point we have
protection against regressions.
-
What have we actually accomplished so far?
-
We took a piece of
undocumented, untested code.
-
And with a bit of hand-waving,
-
we got fake assertions
to give us the inputs.
-
The inputs gave us the outputs.
-
And the outputs
gave us the real assertions.
-
It's almost karmic.
-
Then we had to reason about the code.
-
We inspected every line.
-
Made sure that we had inputs
that would actually exercise those lines
-
and then we made sure that
every branch of every conditional
-
was called from a test.
-
There's no specification here.
-
There's no design being done.
-
We're getting a regression test suite.
-
We are also getting documentation
up to and including the fact
-
that we probably have a bug.
-
The biggest win at this point
is that we can start changing things
-
without breaking into a cold sweat.
-
So here's what we're gonna do.
-
We're gonna isolate the method.
-
And then we're going to
extract a bunch of smaller methods.
-
I'm gonna borrow a detailed prescription
from Martin Fowler's book Refactoring,
-
called Replace Method with Method Object.
-
Traditionally you do this refactoring
when you have big computation
-
and a bunch of temporary variables
-
and you don't want to be passing
those temporary variables around
-
as you extract methods.
-
Here, we pretty much only have
this one target to worry about.
-
So first of all,
we need to make a home for this code.
-
Add an initializer that takes the target.
-
Add an attribute for it.
-
And just copy-paste
the whole thing in there.
-
The method shouldn't be called
on the class.
-
It needs a better name.
-
And it no longer takes any arguments.
-
Back in the old module,
-
we need to reference the new method
from the old one,
-
so just delete the whole body
of the method,
-
reference the new file,
-
instantiate the class,
-
pass in the target,
call name on it,
-
and we're green.
-
Which means that we have
a license to go to town on this code.
-
Where to begin?
-
You could start anywhere.
-
I always like to delete something.
-
Let's go ahead and
get rid of that comment.
-
And now somewhat arbitrarily,
-
I've chosen to go
from biggest to smallest.
-
The biggest chunk
that we've seen for extraction
-
is the truncated title bit.
-
The actual mechanics of
a method extraction is as follows:
-
Create an empty method
and name it by what it does
-
or what it is,
-
not by how it does it.
-
Coming up with a good name
is often the hardest thing
-
you'll ever do in a refactoring.
-
The next step is to copy the lines of code
from the source method into the new one.
-
Scan the new method for local variables.
-
filename here is declared
in the source method,
-
so we have a choice to make.
-
We can either do the mutation here,
-
which means passing the filename in,
-
or making it an instance variable.
-
Or we could make this a query method.
-
Now the whole point of the refactoring is
to separate the 2 levels of abstraction—
-
building up the filename in one place
-
and putting together all the little pieces
in another.
-
So we're gonna just return
the value that we need.
-
The other temporary variables here
are all local to this method.
-
We need to assign the result
of our new query method
-
and delete the temporary variables
that are no longer used
-
in the source method.
-
And then we're green again.
-
Before we move on
-
I'd like to tidy this up a bit.
-
There's some unnecessary work
going on in the regex.
-
We're doing a case-insensitive match,
-
and then we're downcasing afterwards.
-
That's kinda backwards.
-
There's also something in the
match-if brackets that drives me nuts.
-
The spurious parentheses have got to go.
-
Another thing that bugs me is
the fact that we have a ternary statement
-
to decide the range for the match-if.
-
A match-if is not going
to raise an exception.
-
If you try to truncate a 4-character
string to 9 characters,
-
it'll just return 4 characters.
-
The ternary statement can go.
-
Since we no longer need truncate_to,
-
we don't need to worry
about the length of the title.
-
So that can go too.
-
And with those 2 lines gone,
-
there seems very little point
in having a temporary variable,
-
so we can ditch that as well
-
effectively leaving us with
a single line of code in truncated_title.
-
The longest line of code
is now the hexdigest stuff.
-
Now, hexdigest is a terrible name
in this context.
-
Copy stuff in.
-
Assign the result.
-
And we're green again.
-
The personalization line
is now the longest.
-
Perform a quick extraction.
-
publication_day
-
and xyz_category_prefix.
-
Inside this class, xyz is redundant.
-
The fact that it's a prefix is irrelevant.
-
So we can drop the xyz prefix
-
as well as the prefix suffix
-
and just call the new method category.
-
And then we've got kind.
-
This leaves us with
a fairly readable method.
-
We could totally leave it at this
-
but there's still
some spurious stuff in here.
-
publication_day for instance.
-
publication_day returns a string.
-
So, why are we interpolating it?
-
Same goes for category
-
and kind.
-
target.id is an integer
-
but string interpolation
implicitly calls to_s on it.
-
This next move might seem
a little bit subtle
-
until you recognize that kids' game
Five Differences.
-
Which line of code here is different
from all of the other pieces?
-
The first line was the only one that's
not being shovelled onto the string,
-
so we can start off with an empty string
-
and then make the first piece
just like all the others.
-
Up to a point.
-
The extension is also different.
-
If you conceptually separate
this into 2 jobs—
-
building up the filename,
-
and then slapping the extension on it
-
—we're left with 2 distinct sections:
-
one where everything is smooshed together,
-
and another where the pieces
are separated by underscores.
-
If we extract
the smooshed together pieces,
-
then all the remaining pieces
are separated by an underscore.
-
We can now shovel them into an array
rather than a string.
-
This gets rid of a bunch more
of the interpolation syntax,
-
joining them with an underscore.
-
Done.
-
Is it perfect?
-
Of course not.
-
Is it better?
-
Hell yeah!
-
We went from this to this
in less than 30 steps.
-
Let me just show you that again.
-
Before.
-
After.
-
Before.
-
After.
-
So first we quarantined the method.
-
Then we extracted stuff.
-
Extracting stuff basically boils down to
-
identifying a piece of code
that performs a subtask
-
and then giving that code a name.
-
Back when we started out,
-
we identified 3 code smells:
-
a large method,
-
2 different levels of abstraction,
-
unnamed abstractions.
-
Abstracting methods addressed
all 3 of those issues.
-
At that point we still had
some ragged edges though.
-
So we kinda picked through it
-
and removed pointless cruft.
-
This concludes the second middle
of the refactoring story
-
and I'm gonna end by taking a closer look
at pointless cruft.
-
In the field of information design,
-
chartjunk refers to visual elements
in charts and graphs
-
that add noise.
-
They don't help you understand the data.
-
They often get in the way of
comprehension.
-
To give you an idea of
what we're talking about
-
this graph gets just about
everything wrong.
-
This is the exact same data
presented without chartjunk.
-
Codejunk is the Ruby equivalent
of chartjunk.
-
This is not about coding practices.
-
It's not about naming and SRP
and small methods and DRY.
-
It's about noise.
-
I've tried to classify codejunk
and I came up with this list.
-
#10
-
Does a bear shit in the woods?
-
Comments shouldn't echo
the implementation.
-
They shouldn't be wrong.
-
They shouldn't be imprecise.
-
And they shouldn't be misspelled.
-
#9
-
It is my conviction
-
that everyone should make their editor
show them trailing whitespace.
-
If you leave it in,
you have noise in your code.
-
If you then take it out,
you have noise in your diffs,
-
which is just wrong on so many levels.
-
#8 If it's dead, let it rest in peace.
-
What were you saving it for?
-
You need to learn to let go.
-
What is this doing in source control?
-
#7
-
More parentheses.
-
This is vile.
-
When I see this,
I secretly wonder if you
-
wash your hands after you
go to the bathroom.
-
#6 Intelligent defaults.
-
Yes. 4 characters saved are 4 characters
that you can spend another day.
-
Or something.
-
Let's talk about dependencies.
-
In particular, dependencies that
you're not actually depending on.
-
This may or may not impact performance.
-
It probably will impact the performance
of your tests.
-
A whole different story.
-
Mostly you just added noise to the code.
-
#4 This is familiar territory.
-
Recognize this?
-
Not to beat a dead horse,
-
but ew.
-
#3 Don't do work that the computer
is already doing for you.
-
More string interpolation.
-
I could've technicallly folded this into
the previous one
-
but I'm trying to get to 10.
-
This is the same thing.
-
The call to map is superfluous here.
-
The match-if brackets
don't need your help.
-
We've seen that one.
-
In this example the computer isn't
doing the work for you but should be.
-
#2 Test suites.
-
They should be curated, maintained, fed,
watered, coddled and minimized.
-
The only exception I can think of is if
documenting behavior outweighs the cost.
-
Choose wisely.
-
And finally, the compounding sin.
-
You know the old saying,
"1 + 1 = 3 for very large values of 1"?
-
This is revolting.
-
In fact, it bears a horrifying resemblance
to the code that
-
we started out with today.
-
When people say that
software is grown, not built,
-
I'm pretty sure that this
is what they're talking about.
-
It's kinda like fungus.
-
Earlier I said about the refactoring
that after we extracted methods
-
we kinda polished up the code a little.
-
I'm gonna restate that
in light of the previous section.
-
First, we extracted methods.
-
Then we eliminated codejunk.
-
That was the ending of
the refactoring story.
-
Before I get to the moral,
I'd like to mention
-
the whole thing lives on Github.
-
It was committed step-by-step,
-
so if you wanna look
at any of the mechanics
-
or run the tests or anything,
-
just dig through the commit history.
It's all there.
-
I have a vague recollection of the night
when that code was written.
-
Let me rephrase that:
-
I vaguely recall writing it.
-
Yeah.
-
When I panic, I write godawful code.
-
Other people do yoga, or meditate,
or go white water rafting.
-
I refactor.
-
It soothes me.
-
Science—if you believe that sorta thing—
-
actually explains why.
-
The key player in the game
is working memory.
-
Working memory is the thing
that allows you to do mental arithmetic.
-
You store temporary results off in memory
as you work through
-
other parts of the problem.
-
The main differentiator in performance
of tasks that rely on working memory,
-
is the type of strategy
you're able to apply to a problem.
-
If you have better working memory
-
you can apply more complex,
more successful strategies.
-
Here's the deal:
-
when you worry in ways
that involve mental chatter,
-
you use up available slots
in your working memory.
-
You no longer have the swap space
necessary to use complex strategies
-
and you revert to less successful ones
like copying stuff off of Stack Overflow.
-
This is where refactoring comes in.
-
Refactoring makes you smarter.
-
Refactoring basically gives you
an exobrain.
-
You offload a bunch of those
little details—that
-
under normal circumstances
go into working memory
-
—into your tests.
-
Once you start refactoring,
you start reclaiming your brain.
-
It actively counteracts panic.
-
One of the really unexpected things
that happened when I started refactoring
-
simply for the act
of refactoring itself was
-
that I started optimizing
for happiness.
-
I would specifically put something
into its own class
-
just so that I could load
just that class and nothing else.
-
The feedback loop you get
when you have subsecond test suites
-
for the thing that you're working on
is unbelievable.
-
It's a huge enabler for flow.
-
Flow is nice.
-
Flow is more than nice.
-
It's like a bliss cocktail.
-
Selfishly isolating code
to improve my own happiness
-
turned out to be a pretty good deal
for the code as well.
-
Suddenly I was avoiding dependencies.
-
I was constantly asking myself
-
"Can this be a separate class?"
-
"How about a gem?"
-
"What is the unrelated subproblem here?"
-
It made a dramatic difference.
-
In summary, refactoring makes you smarter
by giving you an exobrain.
-
It is preventative and curative
with respect to panic.
-
If you optimize for developer happiness
by making your tests really fast,
-
you get loosely coupled code that adheres
to the Single Responsibility Principle.
-
Thank you.