Few things draw my attention like a looming deadline. Some part of my brain watches in morbid fascination as the deadline approaches wondering whether to call emergency services; or maybe just settle in, make popcorn, and watch the show. I can't help but keep a wary eye on it. A wary eye that is no longer paying attention to the code at hand. I need all the wary eyes I can get. Without them, forget best practices. I revert to less successful strategies like guessing; and desperately copying code off of Stack Overflow. Sometimes I'm happy. There are moments when the world melts away. Your sense of self dissipates. You become completely absorbed in what you're doing. Time appears to slow down and speed up simultaneously. Being awesome feels effortless. I've taken to coming into the office early in the morning and committing random acts of refactoring. Some people are calling this guilt-driven development. But really it's not. Refactoring just makes me happy. More refactoring is an optimization. So today I am going to tell you a story. The arrow needs to point the other way because this is a refactoring story. Anyway it has a middle... Actually it has a beginning, two middles and an end. And a moral. So to be clear, before I get started, when I say refactoring, I mean small careful steps that improve the structure and the readability of the code without changing its behavior. Tests are implied. So. Once upon a time there was an application that performed some incredibly dirty hacks in order to deliver data straight into a number of real-world, old-school, hardcopy publishing systems. I found this particular specimen in the dark recesses of that codebase. It was in a module that was 300-or-so lines long most of which was in a single method. This module talked to code all over the application. It dealt in temporary files, FTP; it shelled out; used Exif 2 to write metadata into JPEG files; it handled encoding. It had a comment in it that read: "A kitten dies every time this code is run." We had it running on a cron job. It had no tests and there was no documentation. This is essentially a bucket of business logic. It jumps through hoops in order to name files so that they end up where they need to be. Now there's a comment but it's almost as bad as the code. Also it's wrong. The input to the method is a target, which is the God object in this application. It's an ActiveRecord model. That class is 500 lines long. The big picture here is that information is being shoved onto a string in order to build up the file name. There is a bit of code that's not actually shovelling things onto the string. This draws the eye. It appears to be a chunk of something. And it's probably a good candidate for extraction. Another striking aspect of the code is that it's littered with a bunch of low-level details. It makes it harder to see what's going on. But amidst all the clutter there does seem to be things... chunks that could probably be named. So what are we looking at? We've got a very large method. It appears to be doing things at 2 different levels of abstraction. And nothing at the lower level of abstraction is being named. These are not characteristics of good code. The thing to do with big ugly code is to break it apart into small ugly code. There's a critical difference between breaking code down, and breaking code. The first middle of this story is gonna be about adding characterization tests to the method. And then the second middle is the actual refactoring. Where do you even begin testing something like this? We don't really know what the inputs look like. We certainly don't know what the output looks like. We do know that it's in production, and it appears to be working because we haven't had any complaints from the customer about it. The easiest way to discover inputs is to just send something— anything really—into the method and then see what comes back out. We need an assertion to work against. And again, it doesn't matter what that assertion is. It is going to be wrong. The nice thing about being wrong is it tells you what 'right' looks like. Running this fails, obviously. The error message tells us that we're missing an input. It also points us to the exact spot in the code where this is happening. What we find on line 6— it comes as no surprise at this point— the message publish_on is being sent to target and it returns a date. Meanwhile back in our tests, we know that the stub represents a target. And we are ready to start fleshing out those inputs. The one we need right now is publish_on, which we know is a date. And the failure tells us what our next input is. Digging around in the code reveals that xyz_category_prefix is a string. Next up is kind, which is also a string. personal is a boolean. id is an integer. And title is a string. And this gives us our first real failure. Meaning that we've gotten all the inputs right. These are the inputs. The messages that we need to send to target in order to build up a file name. And with the inputs in place, we're also given the output, which looks like this. And now all that practice copying-and-pasting from Stack Overflow comes in handy. Copy that string from the failure into the assertion and the test should pass. It doesn't. The fix for this is to use a regex. This gives us our first passing test. We're not quite done yet. There are a bunch of low-level details that aren't being exercised. And there are alternate paths through the method. We need to improve the inputs for these 3 lines of code and add test cases for the lines that have conditionals in them. At every step, failing tests tell us how to tweak our assertions. These are trivial to fix; boring to watch. So I'll show you some highlights. publish_on gets zero-padded, and pi doesn't actually get affected by this so we'll fall back on e. kind gsubs out underscores. We need an underscore so we can see it being removed. title appears to strip out everything. And the existing input doesn't have very much cruft in it. So we need some numbers and some funky characters and a couple of upper-case letters. There's something fishy about that regex. What's with the square brackets? I'm too lazy to reason about this so I'm adding a test case, overriding the stub's input for title, giving it some square brackets. And that proves that brackets get left in place. This is probably not the intended behavior. This test now serves as documentation until we can clarify this with the customer. The first conditional deals in personalization and our stub doesn't personalize. So we need a test for the case that actually does. We're informed about yet another input that's missing. It's trivial to supply. The other conditional on that line of code provides a fallback in case age is nil. And the test for this stubs out the same inputs as the previous one. The final conditional is a ternary statement that is trying to determine where to truncate the title. Our default input for the title is neither long nor short. So by making it longer we can make sure that it gets truncated. And then the only case left to test for is a title that is too short to get truncated. At this point we have protection against regressions. What have we actually accomplished so far? We took a piece of undocumented, untested code. And with a bit of hand-waving, we got fake assertions to give us the inputs. The inputs gave us the outputs. And the outputs gave us the real assertions. It's almost karmic. Then we had to reason about the code. We inspected every line. Made sure that we had inputs that would actually exercise those lines and then we made sure that every branch of every conditional was called from a test. There's no specification here. There's no design being done. We're getting a regression test suite. We are also getting documentation up to and including the fact that we probably have a bug. The biggest win at this point is that we can start changing things without breaking into a cold sweat. So here's what we're gonna do. We're gonna isolate the method. And then we're going to extract a bunch of smaller methods. I'm gonna borrow a detailed prescription from Martin Fowler's book Refactoring, called Replace Method with Method Object. Traditionally you do this refactoring when you have big computation and a bunch of temporary variables and you don't want to be passing those temporary variables around as you extract methods. Here, we pretty much only have this one target to worry about. So first of all, we need to make a home for this code. Add an initializer that takes the target. Add an attribute for it. And just copy-paste the whole thing in there. The method shouldn't be called on the class. It needs a better name. And it no longer takes any arguments. Back in the old module, we need to reference the new method from the old one, so just delete the whole body of the method, reference the new file, instantiate the class, pass in the target, call name on it, and we're green. Which means that we have a license to go to town on this code. Where to begin? You could start anywhere. I always like to delete something. Let's go ahead and get rid of that comment. And now somewhat arbitrarily, I've chosen to go from biggest to smallest. The biggest chunk that we've seen for extraction is the truncated title bit. The actual mechanics of a method extraction is as follows: Create an empty method and name it by what it does or what it is, not by how it does it. Coming up with a good name is often the hardest thing you'll ever do in a refactoring. The next step is to copy the lines of code from the source method into the new one. Scan the new method for local variables. filename here is declared in the source method, so we have a choice to make. We can either do the mutation here, which means passing the filename in, or making it an instance variable. Or we could make this a query method. Now the whole point of the refactoring is to separate the 2 levels of abstraction— building up the filename in one place and putting together all the little pieces in another. So we're gonna just return the value that we need. The other temporary variables here are all local to this method. We need to assign the result of our new query method and delete the temporary variables that are no longer used in the source method. And then we're green again. Before we move on I'd like to tidy this up a bit. There's some unnecessary work going on in the regex. We're doing a case-insensitive match, and then we're downcasing afterwards. That's kinda backwards. There's also something in the match-if brackets that drives me nuts. The spurious parentheses have got to go. Another thing that bugs me is the fact that we have a ternary statement to decide the range for the match-if. A match-if is not going to raise an exception. If you try to truncate a 4-character string to 9 characters, it'll just return 4 characters. The ternary statement can go. Since we no longer need truncate_to, we don't need to worry about the length of the title. So that can go too. And with those 2 lines gone, there seems very little point in having a temporary variable, so we can ditch that as well effectively leaving us with a single line of code in truncated_title. The longest line of code is now the hexdigest stuff. Now, hexdigest is a terrible name in this context. Copy stuff in. Assign the result. And we're green again. The personalization line is now the longest. Perform a quick extraction. publication_day and xyz_category_prefix. Inside this class, xyz is redundant. The fact that it's a prefix is irrelevant. So we can drop the xyz prefix as well as the prefix suffix and just call the new method category. And then we've got kind. This leaves us with a fairly readable method. We could totally leave it at this but there's still some spurious stuff in here. publication_day for instance. publication_day returns a string. So, why are we interpolating it? Same goes for category and kind. target.id is an integer but string interpolation implicitly calls to_s on it. This next move might seem a little bit subtle until you recognize that kids' game Five Differences. Which line of code here is different from all of the other pieces? The first line was the only one that's not being shovelled onto the string, so we can start off with an empty string and then make the first piece just like all the others. Up to a point. The extension is also different. If you conceptually separate this into 2 jobs— building up the filename, and then slapping the extension on it —we're left with 2 distinct sections: one where everything is smooshed together, and another where the pieces are separated by underscores. If we extract the smooshed together pieces, then all the remaining pieces are separated by an underscore. We can now shovel them into an array rather than a string. This gets rid of a bunch more of the interpolation syntax, joining them with an underscore. Done. Is it perfect? Of course not. Is it better? Hell yeah! We went from this to this in less than 30 steps. Let me just show you that again. Before. After. Before. After. So first we quarantined the method. Then we extracted stuff. Extracting stuff basically boils down to identifying a piece of code that performs a subtask and then giving that code a name. Back when we started out, we identified 3 code smells: a large method, 2 different levels of abstraction, unnamed abstractions. Abstracting methods addressed all 3 of those issues. At that point we still had some ragged edges though. So we kinda picked through it and removed pointless cruft. This concludes the second middle of the refactoring story and I'm gonna end by taking a closer look at pointless cruft. In the field of information design, chartjunk refers to visual elements in charts and graphs that add noise. They don't help you understand the data. They often get in the way of comprehension. To give you an idea of what we're talking about this graph gets just about everything wrong. This is the exact same data presented without chartjunk. Codejunk is the Ruby equivalent of chartjunk. This is not about coding practices. It's not about naming and SRP and small methods and DRY. It's about noise. I've tried to classify codejunk and I came up with this list. #10 Does a bear shit in the woods? Comments shouldn't echo the implementation. They shouldn't be wrong. They shouldn't be imprecise. And they shouldn't be misspelled. #9 It is my conviction that everyone should make their editor show them trailing whitespace. If you leave it in, you have noise in your code. If you then take it out, you have noise in your diffs, which is just wrong on so many levels. #8 If it's dead, let it rest in peace. What were you saving it for? You need to learn to let go. What is this doing in source control? #7 More parentheses. This is vile. When I see this, I secretly wonder if you wash your hands after you go to the bathroom. #6 Intelligent defaults. Yes. 4 characters saved are 4 characters that you can spend another day. Or something. Let's talk about dependencies. In particular, dependencies that you're not actually depending on. This may or may not impact performance. It probably will impact the performance of your tests. A whole different story. Mostly you just added noise to the code. #4 This is familiar territory. Recognize this? Not to beat a dead horse, but ew. #3 Don't do work that the computer is already doing for you. More string interpolation. I could've technicallly folded this into the previous one but I'm trying to get to 10. This is the same thing. The call to map is superfluous here. The match-if brackets don't need your help. We've seen that one. In this example the computer isn't doing the work for you but should be. #2 Test suites. They should be curated, maintained, fed, watered, coddled and minimized. The only exception I can think of is if documenting behavior outweighs the cost. Choose wisely. And finally, the compounding sin. You know the old saying, "1 + 1 = 3 for very large values of 1"? This is revolting. In fact, it bears a horrifying resemblance to the code that we started out with today. When people say that software is grown, not built, I'm pretty sure that this is what they're talking about. It's kinda like fungus. Earlier I said about the refactoring that after we extracted methods we kinda polished up the code a little. I'm gonna restate that in light of the previous section. First, we extracted methods. Then we eliminated codejunk. That was the ending of the refactoring story. Before I get to the moral, I'd like to mention the whole thing lives on Github. It was committed step-by-step, so if you wanna look at any of the mechanics or run the tests or anything, just dig through the commit history. It's all there. I have a vague recollection of the night when that code was written. Let me rephrase that: I vaguely recall writing it. Yeah. When I panic, I write godawful code. Other people do yoga, or meditate, or go white water rafting. I refactor. It soothes me. Science—if you believe that sorta thing— actually explains why. The key player in the game is working memory. Working memory is the thing that allows you to do mental arithmetic. You store temporary results off in memory as you work through other parts of the problem. The main differentiator in performance of tasks that rely on working memory, is the type of strategy you're able to apply to a problem. If you have better working memory you can apply more complex, more successful strategies. Here's the deal: when you worry in ways that involve mental chatter, you use up available slots in your working memory. You no longer have the swap space necessary to use complex strategies and you revert to less successful ones like copying stuff off of Stack Overflow. This is where refactoring comes in. Refactoring makes you smarter. Refactoring basically gives you an exobrain. You offload a bunch of those little details—that under normal circumstances go into working memory —into your tests. Once you start refactoring, you start reclaiming your brain. It actively counteracts panic. One of the really unexpected things that happened when I started refactoring simply for the act of refactoring itself was that I started optimizing for happiness. I would specifically put something into its own class just so that I could load just that class and nothing else. The feedback loop you get when you have subsecond test suites for the thing that you're working on is unbelievable. It's a huge enabler for flow. Flow is nice. Flow is more than nice. It's like a bliss cocktail. Selfishly isolating code to improve my own happiness turned out to be a pretty good deal for the code as well. Suddenly I was avoiding dependencies. I was constantly asking myself "Can this be a separate class?" "How about a gem?" "What is the unrelated subproblem here?" It made a dramatic difference. In summary, refactoring makes you smarter by giving you an exobrain. It is preventative and curative with respect to panic. If you optimize for developer happiness by making your tests really fast, you get loosely coupled code that adheres to the Single Responsibility Principle. Thank you.