5.8 - 5.11 - Coverage, Unit vs. Integration Tests, Other Testing Concepts, and Perspectives

  • 0:00 - 0:01
    So we spent a bunch of time
  • 0:01 - 0:03
    in the last couple of lectures
  • 0:03 - 0:05
    talking about different kinds of testing
  • 0:05 - 0:08
    about unit testing versus integration testing
  • 0:08 - 0:10
    We talked about how you use RSpec
  • 0:10 - 0:12
    to really isolate the parts of your code you want to test
  • 0:12 - 0:14
    you’ve also, you know, because of homework 3,
  • 0:14 - 0:18
    and other stuff, we have been doing BDD,
  • 0:18 - 0:20
    where we’ve been using Cucumber to turn user stories
  • 0:20 - 0:22
    into, essentially, integration and acceptance tests
  • 0:22 - 0:25
    So you’ve seen testing in a couple of different levels
  • 0:25 - 0:27
    and the goal here is sort of to make a few remarks
  • 0:27 - 0:29
    to, you know, let’s back up a little bit
  • 0:29 - 0:33
    and see the big picture, and tie those things together
  • 0:33 - 0:34
    So this sort of spans material
  • 0:34 - 0:37
    that covers three or four sections in the book
  • 0:37 - 0:39
    and I want to just hit the high points in lecture
  • 0:39 - 0:41
    So a question that comes up
  • 0:41 - 0:43
    I’m sure it’s come up for all of you
  • 0:43 - 0:44
    as you have been doing homework
  • 0:44 - 0:45
    is: “How much testing is enough?”
  • 0:45 - 0:48
    And, sadly, for a long time
  • 0:48 - 0:51
    kind of if you asked this question in industry
  • 0:51 - 0:52
    the answer was basically
  • 0:52 - 0:53
    “Well, we have a shipping deadline,
  • 0:53 - 0:54
    so however much testing we can do
  • 0:54 - 0:56
    before that deadline, that’s how much.”
  • 0:56 - 0:58
    That’s what you have time for.
  • 0:58 - 1:00
    So, you know, that’s a little flip
  • 1:00 - 1:01
    obviously not very good
  • 1:01 - 1:02
    So you can do a bit better, right?
  • 1:02 - 1:03
    There’re some static measures
  • 1:03 - 1:06
    like how many lines of code does your app have
  • 1:06 - 1:08
    and how many lines of tests do you have?
  • 1:08 - 1:10
    And it’s not unusual in industry
  • 1:10 - 1:12
    in a well-tested piece of software
  • 1:12 - 1:14
    for the number of lines of tests
  • 1:14 - 1:17
    to go far beyond the number of lines of code
  • 1:17 - 1:19
    So, integer multiples are not unusual
  • 1:19 - 1:21
    And I think even for sort of, you know,
  • 1:21 - 1:23
    research code or classwork
  • 1:23 - 1:26
    a ratio of, you know, maybe 1.5 is not unreasonable
  • 1:26 - 1:30
    so one and a half times the amount of test code
  • 1:30 - 1:32
    as you have application code
  • 1:32 - 1:34
    And in a lot of production systems
  • 1:34 - 1:35
    where they really care about testing
  • 1:35 - 1:36
    it is much higher than that
  • 1:36 - 1:38
    So maybe a better question to ask:
  • 1:38 - 1:39
    Rather than saying “How much testing is enough?”
  • 1:39 - 1:42
    is to ask “How good is the testing I am doing now?
  • 1:42 - 1:44
    How thorough is it?”
  • 1:44 - 1:45
    Later in this semester
  • 1:45 - 1:46
    Professor Sen will talk
  • 1:46 - 1:48
    a little bit about formal methods
  • 1:48 - 1:50
    and sort of what’s at the frontiers of testing and debugging
  • 1:50 - 1:52
    But a couple of things that we can talk about
  • 1:52 - 1:54
    based on what you already know
  • 1:54 - 1:57
    is some basic concepts about test coverage
  • 1:57 - 1:59
    And although I would say
  • 1:59 - 2:01
    you know, we’ve been saying all along
  • 2:01 - 2:03
    formal methods, they don’t really work on big systems
  • 2:03 - 2:05
    I think that statement, in my personal opinion
  • 2:05 - 2:07
    is actually a lot less true than it used to be
  • 2:07 - 2:09
    I think there are a number of specific places
  • 2:09 - 2:10
    especially in testing and debugging
  • 2:10 - 2:12
    where formal methods are actually making fast progress
  • 2:12 - 2:15
    and Koushik Sen is one of the leaders in that
  • 2:15 - 2:17
    So you’ll have the opportunity to hear more about that later
  • 2:17 - 2:21
    but for the moment I think, kind of bread and butter
  • 2:21 - 2:22
    is let’s talk about coverage measurement
  • 2:22 - 2:24
    because this is where the rubber meets the road
  • 2:24 - 2:26
    in terms of how you’d be evaluated
  • 2:26 - 2:28
    if you are doing this for real
  • 2:28 - 2:29
    So, what are some basics?
  • 2:29 - 2:30
    Here’s a really simple class you can use
  • 2:30 - 2:32
    to talk about different ways to measure
  • 2:32 - 2:34
    how our test covers this code
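
The slide code itself isn’t captured in these subtitles; a minimal Ruby sketch consistent with the discussion (two methods, one calling the other, and a compound conditional) might look like this:

      # Hypothetical stand-in for the class on the slide
      class CoverageExample
        def foo(x, y, z)
          bar(x, y, z)        # bar is also called from inside foo (matters for S1)
        end

        def bar(x, y, z)
          if x || (y && z)    # one statement for C0, two branches for C1
            "taken"
          else
            "not taken"
          end
        end
      end
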
  • 2:34 - 2:36
    And there’re a few different levels
  • 2:36 - 2:37
    with different terminologies
  • 2:37 - 2:40
    It’s not really universal across all software houses
  • 2:40 - 2:42
    But one common set of terminology
  • 2:42 - 2:43
    that the book exposes
  • 2:43 - 2:44
    is we could talk about S0
  • 2:44 - 2:47
    where we’d just mean you’ve called every method once
  • 2:47 - 2:50
    So you know, if you call foo, and you call bar, you’re done
  • 2:50 - 2:52
    That’s S0 coverage: not terribly thorough
  • 2:52 - 2:54
    A little more stringent, S1, is
  • 2:54 - 2:56
    you could say, we’re calling every method
  • 2:56 - 2:57
    from every place that it could be called
  • 2:57 - 2:58
    So what does that mean?
  • 2:58 - 3:00
    It means, for example
  • 3:00 - 3:01
    it’s not enough to call bar
  • 3:01 - 3:02
    You have to make sure that you call it
  • 3:02 - 3:05
    at least once from in here
  • 3:05 - 3:07
    as well as calling it once
  • 3:07 - 3:10
    from any exterior function that might call it
  • 3:10 - 3:12
    C0 which is what SimpleCov measures
  • 3:12 - 3:15
    (those of you who’ve gotten SimpleCov up and running)
  • 3:15 - 3:18
    basically says you’ve executed every statement
  • 3:18 - 3:20
    you’ve touched every statement in your code once
  • 3:20 - 3:22
    But the caveat there is that
  • 3:22 - 3:25
    conditionals really just count as a single statement
  • 3:25 - 3:28
    So, no matter which branch of this “if” you took
  • 3:28 - 3:31
    as long as you touched one or the other branch
  • 3:31 - 3:33
    you’ve executed the “if” statement
  • 3:33 - 3:35
    So even C0 is still, you know, sort of superficial coverage
  • 3:35 - 3:37
    But, as we will see
  • 3:37 - 3:39
    the way that you will want to read this information is:
  • 3:39 - 3:41
    if you are getting bad coverage at the C0 level
  • 3:41 - 3:44
    then you have really really bad coverage
  • 3:44 - 3:46
    So if you are not even meeting
  • 3:46 - 3:47
    this simple level of superficial coverage
  • 3:47 - 3:50
    then your testing is probably deficient
  • 3:50 - 3:51
    C1 is the next step up from that
  • 3:51 - 3:53
    We could say:
  • 3:53 - 3:55
    Well, we have to take every branch in both directions
  • 3:55 - 3:56
    So, when we are doing this “if” statement
  • 3:56 - 3:58
    we have to make sure that
  • 3:58 - 3:59
    we do the “if x” part once
  • 3:59 - 4:05
    and the “if not x” part at least once to meet C1
  • 4:05 - 4:08
    You can augment that with decision coverage
  • 4:08 - 4:09
    saying: Well, if we’re gonna…
  • 4:09 - 4:12
    If we have “if” statements where the condition
  • 4:12 - 4:13
    is made up of multiple terms
  • 4:13 - 4:15
    we have to make sure that every subexpression
  • 4:15 - 4:17
    has been evaluated both directions
  • 4:17 - 4:19
    In other words, that means that
  • 4:19 - 4:22
    if we’re going to fail this “if” statement
  • 4:22 - 4:24
    we have to make sure to fail it at least once
  • 4:24 - 4:26
    because y was false and at least once because z was false
  • 4:26 - 4:28
    In other words, any subexpression that could
  • 4:28 - 4:31
    independently change the outcome of the condition
  • 4:31 - 4:34
    has to be exercised in both directions
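
Concretely, for the hypothetical CoverageExample sketched above, decision coverage might be demonstrated with inputs chosen so that each subterm flips the outcome on its own:

      # Each subexpression of `x || (y && z)` independently determines
      # the outcome in at least one example below
      describe CoverageExample do
        subject { CoverageExample.new }

        it "takes the branch because x is true" do
          expect(subject.bar(true, false, false)).to eq("taken")
        end
        it "takes the branch because y and z are true" do
          expect(subject.bar(false, true, true)).to eq("taken")
        end
        it "fails the condition because y is false" do
          expect(subject.bar(false, false, true)).to eq("not taken")
        end
        it "fails the condition because z is false" do
          expect(subject.bar(false, true, false)).to eq("not taken")
        end
      end
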
  • 4:34 - 4:36
    And then,
  • 4:36 - 4:38
    kind of, the one that, you know, a lot of people aspire to
  • 4:38 - 4:41
    but there is disagreement on how much more valuable it is
  • 4:41 - 4:42
    is C2: you take every path through the code
  • 4:42 - 4:45
    Obviously, this is kind of difficult because
  • 4:45 - 4:48
    it tends to be exponential in the number of conditions
  • 4:48 - 4:53
    And in general it’s difficult
  • 4:53 - 4:55
    to evaluate if you’ve taken every path through the code
  • 4:55 - 4:57
    There are formal techniques that you can use
  • 4:57 - 4:58
    to tell you where the holes are
  • 4:58 - 5:01
    but the bottom line is that
  • 5:01 - 5:03
    in most commercial software houses
  • 5:03 - 5:04
    there is, I would say, not complete consensus
  • 5:04 - 5:06
    on how much more valuable C2 is
  • 5:06 - 5:08
    compared to C0 or C1
  • 5:08 - 5:10
    So, I think, for the purpose of our class
  • 5:10 - 5:11
    you get exposed to the idea
  • 5:11 - 5:13
    of how you use coverage information
  • 5:13 - 5:16
    SimpleCov takes advantage of some built-in Ruby features
  • 5:16 - 5:18
    to give you C0 coverage
  • 5:18 - 5:19
    [It] does really nice reports
  • 5:19 - 5:21
    We can sort of see it
  • 5:21 - 5:22
    at the level of individual lines in your file
  • 5:22 - 5:24
    You can see what your coverage is
  • 5:24 - 5:27
    and I think that’s kind of a, you know
  • 5:27 - 5:31
    a good start for where we are
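
For reference, the usual setup (assuming the standard SimpleCov gem) is two lines at the very top of spec/spec_helper.rb, before any application code is loaded:

      # spec/spec_helper.rb -- must run before the app code is required,
      # or those files won't show up instrumented in the coverage report
      require 'simplecov'
      SimpleCov.start 'rails'
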
  • 5:31 - 5:33
    So, having seen a sort of different flavours of tests
  • 5:33 - 5:37
    stepping back and looking at the big picture
  • 5:37 - 5:38
    what are the different kind of tests
  • 5:38 - 5:40
    that we’ve seen concretely?
  • 5:40 - 5:42
    and what are the tradeoffs
  • 5:42 - 5:43
    between using those different kinds of tests?
  • 5:43 - 5:47
    So we’ve seen at the level of individual classes or methods
  • 5:47 - 5:50
    we use RSpec, with extensive use of mocking and stubbing
  • 5:50 - 5:53
    So, for example, when we test methods in the model
  • 5:53 - 5:55
    that will be an example of unit testing
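
A minimal sketch of such a unit spec, assuming a hypothetical Movie.for_kids method and modern rspec-mocks syntax; the database call is stubbed out so only the method’s own logic is exercised:

      # Unit spec: isolate Movie.for_kids by stubbing its ActiveRecord
      # collaborator, so the test never touches the database
      describe Movie do
        it "selects only G and PG movies" do
          kids = double('Movie')
          allow(Movie).to receive(:where).with(rating: %w(G PG)).
                            and_return([kids])
          expect(Movie.for_kids).to eq([kids])
        end
      end
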
  • 5:55 - 5:59
    We also did something that is pretty similar to
  • 5:59 - 6:00
    functional or module testing
  • 6:00 - 6:02
    where there is more than one module participating
  • 6:02 - 6:04
    So, for example when we did controller specs
  • 6:04 - 6:07
    we saw that—we simulate a POST action
  • 6:07 - 6:09
    but remember that the POST action
  • 6:09 - 6:10
    has to go through the routing subsystem
  • 6:10 - 6:12
    before it gets to the controller
  • 6:12 - 6:14
    Once the controller is done it will try to render a view
  • 6:14 - 6:16
    So in fact there’s other pieces
  • 6:16 - 6:17
    that collaborate with the controller
  • 6:17 - 6:19
    that have to be working in order for controller specs to pass
  • 6:19 - 6:21
    So that’s somewhere in between:
  • 6:21 - 6:23
    where we’re doing more than a single method
  • 6:23 - 6:25
    touching more than a single class
  • 6:25 - 6:27
    but we’re still concentrating [our] attention
  • 6:27 - 6:28
    on a fairly narrow slice of the system at a time
  • 6:28 - 6:31
    and we’re still using mocking and stubbing extensively
  • 6:31 - 6:35
    to sort of isolate that behaviour that we want to test
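
A sketch of such a controller spec, with a hypothetical MoviesController and the positional-argument style of rspec-rails from this course’s era:

      # Controller spec: the simulated POST still exercises routing and
      # view rendering, but the model collaborator is mocked away
      describe MoviesController do
        it "creates a movie and redirects to the index" do
          allow(Movie).to receive(:create!).
            and_return(double('Movie', title: 'Up'))
          post :create, movie: { 'title' => 'Up' }
          expect(response).to redirect_to(movies_path)
        end
      end
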
  • 6:35 - 6:36
    And then at the level of Cucumber scenarios
  • 6:36 - 6:38
    these are more like integration or system tests
  • 6:38 - 6:41
    They exercise complete paths throughout the application
  • 6:41 - 6:43
    They probably touch a lot of different modules
  • 6:43 - 6:46
    They make minimal use of mocks and stubs
  • 6:46 - 6:48
    because part of the goal of an integration test
  • 6:48 - 6:50
    is exactly to test the interaction between pieces
  • 6:50 - 6:53
    So you don’t want to stub or control those interactions
  • 6:53 - 6:54
    You actually want to let the system do
  • 6:54 - 6:56
    what it would really do
  • 6:56 - 6:58
    if this was a scenario happening in production
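
For contrast, hypothetical Cucumber step definitions (plain Ruby driving Capybara) run the full stack with no mocks at all:

      # Steps behind a scenario like:
      #   When I add the movie "Up"
      #   Then I should see "Up"
      # No stubs anywhere: the real routing, controller, model, and
      # database all participate
      When /^I add the movie "(.*)"$/ do |title|
        visit new_movie_path
        fill_in 'Title', with: title
        click_button 'Save Changes'
      end

      Then /^I should see "(.*)"$/ do |text|
        expect(page).to have_content(text)
      end
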
  • 6:58 - 7:00
    So how would we compare these different kinds of tests?
  • 7:00 - 7:02
    There’s a few different axes we can look at
  • 7:02 - 7:05
    One of them is how long they take to run
  • 7:05 - 7:06
    Now, both RSpec and Cucumber
  • 7:06 - 7:09
    have, kind of, high startup times and stuff like that
  • 7:09 - 7:10
    But, as you’ll see
  • 7:10 - 7:11
    as you start adding more and more RSpec tests
  • 7:11 - 7:14
    and using autotest to run them in the background
  • 7:14 - 7:17
    by and large, once RSpec kind of gets off the launching pad
  • 7:17 - 7:19
    it runs specs really fast
  • 7:19 - 7:21
    whereas running Cucumber features just takes a long time
  • 7:21 - 7:24
    as it essentially fires up your entire application
  • 7:24 - 7:26
    And later in this semester
  • 7:26 - 7:28
    we’ll see a way to make Cucumber even slower—
  • 7:28 - 7:30
    which is to have it fire up an entire browser
  • 7:30 - 7:33
    basically act like a puppet, remote-controlling Firefox
  • 7:33 - 7:35
    so you can test Javascript code
  • 7:35 - 7:37
    We’ll do that when we actually—
  • 7:37 - 7:40
    I think we’ll be able to work with our friends at Sauce Labs
  • 7:40 - 7:42
    so you can do that in the cloud—That will be exciting
  • 7:42 - 7:45
    So, “run fast” versus “run slow”
  • 7:45 - 7:46
    Resolution:
  • 7:46 - 7:48
    If an error happens in your unit tests
  • 7:48 - 7:49
    it’s usually pretty easy
  • 7:49 - 7:52
    to figure out and track down what the source of that error is
  • 7:52 - 7:53
    because the tests are so isolated
  • 7:53 - 7:56
    You’ve stubbed out everything that doesn’t matter
  • 7:56 - 7:58
    and you’re focusing on only the behaviour of interest
  • 7:58 - 7:59
    So, if you’ve done a good job of doing that
  • 7:59 - 8:01
    when something goes wrong in one of your tests
  • 8:01 - 8:03
    there’s not a lot of places
  • 8:03 - 8:04
    that something could have gone wrong
  • 8:04 - 8:07
    In contrast, if you’re running a Cucumber scenario
  • 8:07 - 8:08
    that’s got, you know, 10 steps
  • 8:08 - 8:10
    and every step is touching
  • 8:10 - 8:11
    a whole bunch of pieces of the app
  • 8:11 - 8:12
    it could take a long time
  • 8:12 - 8:14
    to actually get to the bottom of a bug
  • 8:14 - 8:16
    So it is kind of a tradeoff
  • 8:16 - 8:17
    between how well you can localize errors
  • 8:17 - 8:20
    Coverage:
  • 8:20 - 8:23
    It’s possible if you write a good suite
  • 8:23 - 8:24
    of unit and functional tests
  • 8:24 - 8:26
    you can get really high coverage
  • 8:26 - 8:27
    You can run your SimpleCov report
  • 8:27 - 8:30
    and you can actually identify specific lines in your files
  • 8:30 - 8:32
    that have not been exercised by any test
  • 8:32 - 8:34
    and then you can go write tests that cover them
  • 8:34 - 8:36
    So, figuring out how to improve your coverage
  • 8:36 - 8:37
    for example at the C0 level
  • 8:37 - 8:40
    is something much more easily done with unit tests
  • 8:40 - 8:42
    whereas, with a Cucumber test—
  • 8:42 - 8:43
    with a Cucumber scenario—
  • 8:43 - 8:45
    you are touching a lot of parts of the code
  • 8:45 - 8:47
    but you are doing it very sparsely
  • 8:47 - 8:49
    So, if your goal is to get your coverage up
  • 8:49 - 8:51
    use the tools that are at the unit level
  • 8:51 - 8:53
    so that you can focus on understanding
  • 8:53 - 8:54
    what parts of my code are undertested
  • 8:54 - 8:56
    and then you can write very targeted tests
  • 8:56 - 8:58
    just to focus on them
  • 8:58 - 9:01
    And, sort of, you know, putting those pieces together
  • 9:01 - 9:03
    the unit tests
  • 9:03 - 9:05
    because of their isolation and their fine resolution
  • 9:05 - 9:07
    tend to use a lot of mocks
  • 9:07 - 9:09
    to isolate the behaviours you don’t care about
  • 9:09 - 9:11
    But that means that, by definition
  • 9:11 - 9:12
    you’re not testing the interfaces
  • 9:12 - 9:14
    and it’s sort of a “received wisdom” in software
  • 9:14 - 9:16
    that a lot of the interesting bugs
  • 9:16 - 9:18
    occur at the interfaces between pieces
  • 9:18 - 9:20
    and not sort of within a class or within a method—
  • 9:20 - 9:22
    those are sort of the easy bugs to track down
  • 9:22 - 9:24
    And at the other extreme
  • 9:24 - 9:26
    the more you get towards the integration testing extreme
  • 9:26 - 9:29
    you’re supposed to rely less and less on mocks
  • 9:29 - 9:30
    for that exact reason
  • 9:30 - 9:32
    Now we saw, if you’re testing something like
  • 9:32 - 9:34
    say, in a service-oriented architecture
  • 9:34 - 9:35
    where you have to interact with the remote site
  • 9:35 - 9:37
    you still end up
  • 9:37 - 9:38
    having to do a fair amount of mocking and stubbing
  • 9:38 - 9:40
    so that you don’t rely on the Internet
  • 9:40 - 9:41
    in order for your tests to pass
  • 9:41 - 9:43
    but, generally speaking
  • 9:43 - 9:47
    you’re trying to remove as many of the mocks as you can
  • 9:47 - 9:48
    and let the system run the way it would run in real life
  • 9:48 - 9:52
    So, the good news is you are testing the interfaces
  • 9:52 - 9:54
    but when something goes wrong in one of the interfaces
  • 9:54 - 9:57
    because your resolution is not as good
  • 9:57 - 10:00
    it may take longer to figure out what it is
  • 10:00 - 10:05
    So, what’s sort of the high-order bit from this tradeoff
  • 10:05 - 10:07
    is you don’t really want to rely
  • 10:07 - 10:08
    too heavily on any one kind of test
  • 10:08 - 10:10
    They serve different purposes and, depending on
  • 10:10 - 10:13
    whether you are trying to exercise your interfaces more
  • 10:13 - 10:15
    or trying to improve your fine-grained coverage
  • 10:15 - 10:18
    that affects how you develop your test suite
  • 10:18 - 10:20
    and you’ll evolve it along with your software
  • 10:20 - 10:24
    So, we’ve used a certain set of terminology in testing
  • 10:24 - 10:26
    It’s the terminology that, by and large
  • 10:26 - 10:29
    is most commonly used in the Rails community
  • 10:29 - 10:30
    but there’s some variation
  • 10:30 - 10:33
    [and] some other terms that you might hear
  • 10:33 - 10:35
    if you go get a job somewhere
  • 10:35 - 10:36
    and you hear about mutation testing
  • 10:36 - 10:38
    which we haven’t done
  • 10:38 - 10:40
    This is an interesting idea that was, I think, invented by
  • 10:40 - 10:43
    Ammann and Offutt, who have, sort of
  • 10:43 - 10:44
    the definitive book on software testing
  • 10:44 - 10:46
    The idea is:
  • 10:46 - 10:48
    Suppose I introduced a deliberate bug into my code
  • 10:48 - 10:49
    does that force some test to fail?
  • 10:49 - 10:53
    Because, if I changed, you know, “if x” to “if not x”
  • 10:53 - 10:56
    and no tests fail, then either I’m missing some coverage
  • 10:56 - 10:59
    or my app is very strange and somehow nondeterministic
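
A single mutant, sketched in Ruby (illustrative only):

      # Original check and its mutant; a thorough suite should contain
      # at least one test that passes for the original but fails for
      # the mutant
      def eligible?(x)
        x ? "yes" : "no"
      end

      def eligible_mutant?(x)   # "if x" mutated to "if not x"
        !x ? "yes" : "no"
      end
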
  • 10:59 - 11:03
    Fuzz testing, which Koushik Sen may talk more about
  • 11:03 - 11:07
    basically, this is the “10,000 monkeys at typewriters
  • 11:07 - 11:09
    throwing random input at your code”
  • 11:09 - 11:10
    What’s interesting about it is that
  • 11:10 - 11:11
    those tests we’ve been doing
  • 11:11 - 11:13
    essentially are crafted to test the app
  • 11:13 - 11:15
    the way it was designed
  • 11:15 - 11:16
    whereas, you know, fuzz testing
  • 11:16 - 11:19
    is about testing the app in ways it wasn’t meant to be used
  • 11:19 - 11:22
    So, what happens if you throw enormous form submissions at it?
  • 11:22 - 11:25
    What happens if you put control characters in your forms?
  • 11:25 - 11:27
    What happens if you submit the same thing over and over?
  • 11:27 - 11:29
    And, Koushik has a statistic that
  • 11:29 - 11:32
    Microsoft finds up to 20% of their bugs
  • 11:32 - 11:34
    using some variation of fuzz testing
  • 11:34 - 11:36
    and that about 25%
  • 11:36 - 11:39
    of the common Unix command-line programs
  • 11:39 - 11:40
    can be made to crash
  • 11:40 - 11:44
    [when] put through aggressive fuzz testing
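
A crude fuzz loop might look like the following sketch, where handle_form is a hypothetical entry point into the app:

      require 'securerandom'

      # Throw random byte strings at the handler and log anything that
      # raises: the essence of monkeys-at-typewriters testing
      1000.times do
        input = SecureRandom.random_bytes(rand(1..4096))
        begin
          handle_form(input)    # hypothetical app entry point
        rescue StandardError => e
          puts "crashed on a #{input.bytesize}-byte input: #{e.class}"
        end
      end
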
  • 11:44 - 11:46
    Defining-use coverage is something that we haven’t done
  • 11:46 - 11:48
    but it’s another interesting concept
  • 11:48 - 11:50
    The idea is that at any point in my program
  • 11:50 - 11:52
    there’s a place where I define—
  • 11:52 - 11:54
    or I assign a value to some variable—
  • 11:54 - 11:56
    and then there’s a place downstream
  • 11:56 - 11:57
    where presumably I’m going to consume that value—
  • 11:57 - 11:59
    someone’s going to use that value
  • 11:59 - 12:01
    Have I covered every pair?
  • 12:01 - 12:02
    In other words, do I have tests where every pair
  • 12:02 - 12:04
    of defining a variable and using it somewhere
  • 12:04 - 12:07
    is executed in some part of my test suite
  • 12:07 - 12:10
    It’s sometimes called DU-coverage
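
A small Ruby sketch of a define-use pair (charge is a hypothetical helper):

      # `total` is defined once, conditionally redefined, then used;
      # DU coverage asks for tests in which each (definition, use)
      # pair is actually executed
      def checkout(prices, coupon)
        total = prices.reduce(0, :+)   # definition 1
        total -= 5 if coupon           # definition 2, on the coupon path
        charge(total)                  # use: reached from both definitions?
      end
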
  • 12:10 - 12:14
    And other terms that I think are not as widely used anymore
  • 12:14 - 12:17
    blackbox versus whitebox, or blackbox versus glassbox
  • 12:17 - 12:20
    Roughly, a blackbox test is one that is written from
  • 12:20 - 12:22
    the point of view of the external specification of the thing
  • 12:22 - 12:24
    [For example:] “This is a hash table
  • 12:24 - 12:26
    When I put in a key I should get back a value
  • 12:26 - 12:28
    If I delete the key the value shouldn’t be there”
  • 12:28 - 12:29
    That’s a blackbox test because it doesn’t say
  • 12:29 - 12:32
    anything about how the hash table is implemented
  • 12:32 - 12:34
    and it doesn’t try to stress the implementation
  • 12:34 - 12:36
    A corresponding whitebox test might be:
  • 12:36 - 12:38
    “I know something about the hash function
  • 12:38 - 12:39
    and I’m going to deliberately create
  • 12:39 - 12:41
    hash keys in my test cases
  • 12:41 - 12:43
    that cause a lot of hash collisions
  • 12:43 - 12:45
    to make sure that I’m testing that part of the functionality”
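
Sketched in RSpec against a hypothetical HashTable class:

      describe HashTable do
        # Blackbox: written purely from the external specification
        it "returns the value stored under a key" do
          t = HashTable.new
          t['movie'] = 'Up'
          expect(t['movie']).to eq('Up')
        end

        # Whitebox: exploits implementation knowledge; these keys are
        # assumed to collide under the table's hash function
        it "keeps both values when keys collide" do
          t = HashTable.new
          t['Aa'] = 1
          t['BB'] = 2
          expect(t['Aa']).to eq(1)
          expect(t['BB']).to eq(2)
        end
      end
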
  • 12:45 - 12:49
    Now, a C0 test coverage tool, like SimpleCov
  • 12:49 - 12:52
    would reveal that if all you had is blackbox tests
  • 12:52 - 12:53
    you might find that
  • 12:53 - 12:55
    the collision-handling code wasn’t being hit very often
  • 12:55 - 12:56
    And that might tip you off and say:
  • 12:56 - 12:58
    “Ok, if I really want to strengthen that—
  • 12:58 - 13:00
    if I want to boost coverage of that code
  • 13:00 - 13:02
    now I have to write a whitebox or a glassbox test
  • 13:02 - 13:04
    I have to look inside, see what the implementation does
  • 13:04 - 13:05
    and find specific ways
  • 13:05 - 13:10
    to try to break the implementation in evil ways”
  • 13:10 - 13:13
    So, I think, testing is a kind of a way of life, right?
  • 13:13 - 13:16
    We’ve gotten away from the phase of
  • 13:16 - 13:18
    “We’d build the whole thing and then we’d test it”
  • 13:18 - 13:19
    and we’ve gotten into the phase of
  • 13:19 - 13:20
    “We’re testing as we go”
  • 13:20 - 13:22
    Testing is really more like a development tool
  • 13:22 - 13:24
    and like so many development tools
  • 13:24 - 13:25
    the effectiveness of it depends
  • 13:25 - 13:27
    on whether you’re using it in a tasteful manner
  • 13:27 - 13:31
    So, you could say: “Well, let’s see—I kicked the tires
  • 13:31 - 13:33
    You know, I fired up the browser, I tried a couple of things
  • 13:33 - 13:35
    (claps hand) Looks like it works! Deploy it!”
  • 13:35 - 13:38
    That’s obviously a little more cavalier than you’d want to be
  • 13:38 - 13:41
    And, by the way, one of the things that we discovered
  • 13:41 - 13:43
    with this online course just starting up
  • 13:43 - 13:45
    when 60,000 people are enrolled in the course
  • 13:45 - 13:48
    and 0.1% of those people have a problem
  • 13:48 - 13:50
    you’d get 60 emails
  • 13:50 - 13:53
    The corollary is: when your site is used by a lot of people
  • 13:53 - 13:55
    some stupid bug that you didn’t find
  • 13:55 - 13:57
    but that could have been found by testing
  • 13:57 - 13:59
    could very quickly generate *a lot* of pain
  • 13:59 - 14:02
    On the other hand, you don’t want to be dogmatic and say
  • 14:02 - 14:04
    “Uh, until we have 100% coverage and every test is green
  • 14:04 - 14:06
    we absolutely will not ship”
  • 14:06 - 14:07
    That’s not healthy either
  • 14:07 - 14:08
    And test quality
  • 14:08 - 14:10
    doesn’t necessarily correlate with statement coverage:
  • 14:10 - 14:11
    unless you can say something
  • 14:11 - 14:12
    about the quality of your tests
  • 14:12 - 14:14
    just because you’ve executed every line
  • 14:14 - 14:17
    doesn’t mean that you’ve tested the interesting cases
  • 14:17 - 14:18
    So, somewhere in between, you could say
  • 14:18 - 14:20
    “Well, we’ll use coverage tools to identify
  • 14:20 - 14:23
    undertested or poorly-tested parts of the code
  • 14:23 - 14:24
    and we’ll use them as a guideline
  • 14:24 - 14:27
    to sort of help improve our overall confidence level”
  • 14:27 - 14:29
    But remember, Agile is about embracing change
  • 14:29 - 14:30
    and dealing with it
  • 14:30 - 14:32
    Part of change is that things will change in ways that cause
  • 14:32 - 14:33
    bugs that you didn’t foresee
  • 14:33 - 14:34
    and the right reaction is:
  • 14:34 - 14:36
    Be comfortable enough with the testing tools
  • 14:36 - 14:37
    [so] that you can quickly find those bugs
  • 14:37 - 14:39
    Write a test that reproduces that bug
  • 14:39 - 14:40
    And then make the test green
  • 14:40 - 14:41
    Then you’ll really fix it
  • 14:41 - 14:43
    That means, the way that you really fix a bug is
  • 14:43 - 14:45
    if you created a test that correctly failed
  • 14:45 - 14:46
    to reproduce that bug
  • 14:46 - 14:48
    and then you went back and fixed the code
  • 14:48 - 14:49
    to make those tests pass
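
A sketch of that discipline, with hypothetical names: first write a spec that fails because it reproduces the report, then fix the code until it goes green:

      # Regression spec capturing a reported bug: requesting a missing
      # movie crashed instead of redirecting (hypothetical scenario)
      describe MoviesController do
        it "redirects with a warning for an unknown movie id" do
          get :show, id: 'no-such-id'
          expect(response).to redirect_to(movies_path)
          expect(flash[:warning]).to match(/no such movie/i)
        end
      end
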
  • 14:49 - 14:51
    Similarly, you don’t want to say
  • 14:51 - 14:53
    “Well, unit tests give you better coverage
  • 14:53 - 14:54
    They’re more thorough and detailed
  • 14:54 - 14:56
    So let’s focus all our energy on that”
  • 14:56 - 14:57
    as opposed to
  • 14:57 - 14:58
    “Oh, focus on integration tests
  • 14:58 - 15:00
    because they’re more realistic, right?
  • 15:00 - 15:01
    They reflect what the customer said they want
  • 15:01 - 15:03
    So, if the integration tests are passing
  • 15:03 - 15:05
    by definition we’re meeting a customer need”
  • 15:05 - 15:07
    Again, both extremes are kind of unhealthy
  • 15:07 - 15:09
    because each one of these can find problems
  • 15:09 - 15:11
    that would be missed by the other
  • 15:11 - 15:12
    So, having a good combination of them
  • 15:12 - 15:15
    is kind of what it is all about
  • 15:15 - 15:18
    The last thing I want to leave you with is, I think
  • 15:18 - 15:20
    in terms of testing, is “TDD versus
  • 15:20 - 15:22
    what I call conventional debugging—
  • 15:22 - 15:24
    i.e., the way that we all kind of do it
  • 15:24 - 15:25
    even though we say we don’t”
  • 15:25 - 15:26
    and we’re all trying to get better, right?
  • 15:26 - 15:27
    We’re all kind of in the gutter
  • 15:27 - 15:29
    Some of us are looking up at the stars
  • 15:29 - 15:31
    trying to improve our practices
  • 15:31 - 15:33
    But, having now lived with this for 3 or 4 years myself
  • 15:33 - 15:35
    and—I’ll be honest—3 years ago I didn’t do TDD
  • 15:35 - 15:37
    I do it now, because I find that it’s better
  • 15:37 - 15:40
    and here’s my distillation of why I think it works for me
  • 15:40 - 15:43
    Sorry, the colours are a little weird
  • 15:43 - 15:45
    but on the left column of the table
  • 15:45 - 15:46
    [it] says “Conventional debugging”
  • 15:46 - 15:47
    and the right side says “TDD”
  • 15:47 - 15:49
    So what’s the way I used to write code?
  • 15:49 - 15:51
    Maybe some of you still do this
  • 15:51 - 15:53
    I write a whole bunch of lines
  • 15:53 - 15:54
    maybe a few tens of lines of code
  • 15:54 - 15:55
    I’m sure they’re right—
  • 15:55 - 15:56
    I mean, I am a good programmer, right?
  • 15:56 - 15:57
    This is not that hard
  • 15:57 - 15:59
    I run it – It doesn’t work
  • 15:59 - 16:01
    Ok, fire up the debugger – Start putting in printf’s
  • 16:01 - 16:04
    If I’d been using TDD what would I do instead?
  • 16:04 - 16:08
    Well I’d write a few lines of code, having written a test first
  • 16:08 - 16:10
    So as soon as the test goes from red to green
  • 16:10 - 16:12
    I know I wrote code that works—
  • 16:12 - 16:15
    or at least the parts of the behaviour that I had in mind
  • 16:15 - 16:16
    Those parts of the behaviour work, because I had a test
  • 16:16 - 16:19
    Ok, back to conventional debugging:
  • 16:19 - 16:21
    I’m running my program, trying to find the bugs
  • 16:21 - 16:23
    I start putting in printf’s everywhere
  • 16:23 - 16:24
    to print out the values of things
  • 16:24 - 16:25
    which by the way is a lot of fun
  • 16:25 - 16:26
    when you’re trying to read them
  • 16:26 - 16:28
    out of the 500 lines of log output
  • 16:28 - 16:29
    that you’d get in a Rails app
  • 16:29 - 16:30
    trying to find your printf’s
  • 16:30 - 16:32
    you know, “I know what I’ll do—
  • 16:32 - 16:34
    I’ll put in 75 asterisks before and after
  • 16:34 - 16:36
    That will make it readable” (laughter)
  • 16:36 - 16:38
    Who don’t—Ok, raise your hands if you don’t do this!
  • 16:38 - 16:40
    Thank you for your honesty. (laughter) Ok.
  • 16:40 - 16:43
    Or— Or I could do the other thing, I could say:
  • 16:43 - 16:45
    Instead of printing the value of a variable
  • 16:45 - 16:47
    why don’t I write a test that inspects it
  • 16:47 - 16:48
    with an expectation of what it should be
  • 16:48 - 16:50
    and I’ll know immediately in bright red letters
  • 16:50 - 16:53
    if that expectation wasn’t met
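
That is, replace the printf with something like this (names hypothetical):

      # Instead of `puts "***** #{result} *****"` buried in the log,
      # a failed expectation reports itself loudly and repeatably
      expect(result).to eq(expected)
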
  • 16:53 - 16:56
    Ok, I’m back on the conventional debugging side:
  • 16:56 - 16:58
    I break out the big guns: I pull out the Ruby debugger
  • 16:58 - 17:02
    I set a debug breakpoint, and I now start tweaking and say
  • 17:02 - 17:04
    “Oh, let’s see, I have to get past that ‘if’ statement
  • 17:04 - 17:06
    so I have to set that thing
  • 17:06 - 17:07
    Oh, I have to call that method and so I need to…”
  • 17:07 - 17:08
    No!
  • 17:08 - 17:10
    I could instead—if I’m going to do that anyway—
  • 17:10 - 17:13
    let’s just do it in a file, set up some mocks and stubs
  • 17:13 - 17:16
    to control the code path, make it go the way I want
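
For example, rather than hand-steering the debugger past a conditional, a stub can force the branch (hypothetical names again):

      # Make the payment gateway appear down so the retry path runs,
      # no debugger session required
      allow(gateway).to receive(:available?).and_return(false)
      expect(order.submit).to eq(:queued_for_retry)
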
  • 17:16 - 17:19
    And now, “Ok, for sure I’ve fixed it!
  • 17:19 - 17:22
    I’ll get out of the debugger, run it all again!”
  • 17:22 - 17:24
    And, of course, 9 times out of 10, you didn’t fix it
  • 17:24 - 17:26
    or you kind of partly fixed it but you didn’t completely fix it
  • 17:26 - 17:30
    and now I have to do all these manual things all over again
  • 17:30 - 17:32
    or I already have a bunch of tests
  • 17:32 - 17:34
    and I can just rerun them automatically
  • 17:34 - 17:35
    and, if some of them fail
  • 17:35 - 17:36
    “Oh, I didn’t fix the whole thing
  • 17:36 - 17:38
    No problem, I’ll just go back!”
  • 17:38 - 17:39
    So, the bottom line is that
  • 17:39 - 17:41
    you know, you could do it on the left side
  • 17:41 - 17:45
    but you’re using the same techniques in both cases
  • 17:45 - 17:48
    The only difference is, in one case you’re doing it manually
  • 17:48 - 17:50
    which is boring and error-prone
  • 17:50 - 17:51
    In the other case you’re doing a little more work
  • 17:51 - 17:53
    but you can make it automatic and repeatable
  • 17:53 - 17:55
    and have, you know, some high confidence
  • 17:55 - 17:57
    that as you change things in your code
  • 17:57 - 17:58
    you are not breaking stuff that used to work
  • 17:58 - 18:00
    and basically it’s more productive
  • 18:00 - 18:02
    So you’re doing all the same things
  • 18:02 - 18:04
    but with a, kind of, “delta” extra work
  • 18:04 - 18:07
    you are using your effort at a much higher leverage
  • 18:07 - 18:10
    So that’s kind of my view of why TDD is a good thing
  • 18:10 - 18:11
    It’s really, it doesn’t require new skills
  • 18:11 - 18:15
    It just requires [you] to refactor your existing skills
  • 18:15 - 18:18
    I also tried when I—again, honest confessions, right?—
  • 18:18 - 18:19
    when I started doing this it was like
  • 18:19 - 18:21
    “Ok, I’m gonna be teaching a course on Rails
  • 18:21 - 18:22
    I should really focus on testing”
  • 18:22 - 18:24
    So I went back to some code I had written
  • 18:24 - 18:26
    that was working—you know, that was decent code—
  • 18:26 - 18:29
    and I started trying to write tests for it
  • 18:29 - 18:31
    and it was *so painful*
  • 18:31 - 18:33
    because the code wasn’t written in way that was testable
  • 18:33 - 18:34
    There were all kinds of interactions
  • 18:34 - 18:36
    There were, like, nested conditionals
  • 18:36 - 18:38
    And if you wanted to isolate a particular statement
  • 18:38 - 18:41
    and have a test trigger just that statement
  • 18:41 - 18:44
    the amount of stuff you’d have to set up in your test
  • 18:44 - 18:45
    to have it happen—
  • 18:45 - 18:46
    remember when we talked about mock train wrecks—
  • 18:46 - 18:48
    you have to set up all this infrastructure
  • 18:48 - 18:49
    just to get one line of code
  • 18:49 - 18:51
    and you do that and you go
  • 18:51 - 18:52
    “Gawd, testing is really not worth it!
  • 18:52 - 18:54
    I wrote 20 lines of setup
  • 18:54 - 18:56
    so that I could test two lines in my function!”
  • 18:56 - 18:58
    What that’s really telling you—as I now realize—
  • 18:58 - 19:00
    is your function is bad
  • 19:00 - 19:01
    It’s a badly written function
  • 19:01 - 19:02
    It’s not a testable function
  • 19:02 - 19:03
    It’s got too many moving parts
  • 19:03 - 19:06
    whose dependencies can’t be broken apart
  • 19:06 - 19:07
    There’s no seams in my function
  • 19:07 - 19:11
    that allow me to individually test the different behaviours
  • 19:11 - 19:12
    And once you start doing Test First Development
  • 19:12 - 19:15
    because you have to write your tests in small chunks
  • 19:15 - 19:17
    it kind of makes this problem go away
  • 19:17 -
    So that’s been my epiphany