5.8 - 5.11 - Coverage, Unit vs. Integration Tests, Other Testing Concepts, and Perspectives

  • 0:00 - 0:01
    So we spent a bunch of time
  • 0:01 - 0:03
    in the last couple of lectures
  • 0:03 - 0:05
    talking about different kinds of testing
  • 0:05 - 0:08
    about unit testing versus integration testing
  • 0:08 - 0:10
    We talked about how you use RSpec
  • 0:10 - 0:12
    to really isolate the parts of your code you want to test
  • 0:12 - 0:14
    you’ve also, you know, because of homework 3,
  • 0:14 - 0:18
    and other stuff, we have been doing BDD,
  • 0:18 - 0:20
    where we’ve been using Cucumber to turn user stories
  • 0:20 - 0:22
    into, essentially, integration and acceptance tests
  • 0:22 - 0:25
    So you’ve seen testing in a couple of different levels
  • 0:25 - 0:27
    and the goal here is sort of to make a few remarks
  • 0:27 - 0:29
    to, you know, let’s back up a little bit
  • 0:29 - 0:33
    and see the big picture, and tie those things together
  • 0:33 - 0:34
    So this sort of spans material
  • 0:34 - 0:37
    that covers three or four sections in the book
  • 0:37 - 0:39
    and I want to just hit the high points in lecture
  • 0:39 - 0:41
    So a question that comes up
  • 0:41 - 0:43
    I’m sure it’s come up for all of you
  • 0:43 - 0:44
    as you have been doing homework
  • 0:44 - 0:45
    is: “How much testing is enough?”
  • 0:45 - 0:48
    And, sadly, for a long time
  • 0:48 - 0:51
    kind of if you asked this question in industry
  • 0:51 - 0:52
    the answer was basically
  • 0:52 - 0:53
    “Well, we have a shipping deadline,
  • 0:53 - 0:54
    so however much testing we can do
  • 0:54 - 0:56
    before that deadline, that’s how much.”
  • 0:56 - 0:58
    That’s what you have time for.
  • 0:58 - 1:00
    So, you know, that’s a little flip
  • 1:00 - 1:01
    obviously not very good
  • 1:01 - 1:02
    So you can do a bit better, right?
  • 1:02 - 1:03
    There’re some static measures
  • 1:03 - 1:06
    like how many lines of code does your app have
  • 1:06 - 1:08
    and how many lines of tests do you have?
  • 1:08 - 1:10
    And it’s not unusual in industry
  • 1:10 - 1:12
    in a well-tested piece of software
  • 1:12 - 1:14
    for the number of lines of tests
  • 1:14 - 1:17
    to go far beyond the number of lines of code
  • 1:17 - 1:19
    So, integer multiples are not unusual
  • 1:19 - 1:21
    And I think even for sort of, you know,
  • 1:21 - 1:23
    research code or classwork
  • 1:23 - 1:26
    a ratio of, you know, maybe 1.5 is not unreasonable
  • 1:26 - 1:30
    so one and a half times the amount of test code
  • 1:30 - 1:32
    as you have application code
  • 1:32 - 1:34
    And in a lot of production systems
  • 1:34 - 1:35
    where they really care about testing
  • 1:35 - 1:36
    it is much higher than that
  • 1:36 - 1:38
    So maybe a better question to ask:
  • 1:38 - 1:39
    Rather than saying “How much testing is enough?”
  • 1:39 - 1:42
    is to ask “How good is the testing I am doing now?
  • 1:42 - 1:44
    How thorough is it?”
  • 1:44 - 1:45
    Later in this semester
  • 1:45 - 1:46
    Professor Sen will talk
  • 1:46 - 1:48
    a little bit about formal methods
  • 1:48 - 1:50
    and sort of what’s at the frontiers of testing and debugging
  • 1:50 - 1:52
    But a couple of things that we can talk about
  • 1:52 - 1:54
    based on what you already know
  • 1:54 - 1:57
    is some basic concepts about test coverage
  • 1:57 - 1:59
    And although I would say
  • 1:59 - 2:01
    you know, we’ve been saying all along
  • 2:01 - 2:03
    formal methods, they don’t really work on big systems
  • 2:03 - 2:05
    I think that statement, in my personal opinion
  • 2:05 - 2:07
    is actually a lot less true than it used to be
  • 2:07 - 2:09
    I think there are a number of specific places
  • 2:09 - 2:10
    especially in testing and debugging
  • 2:10 - 2:12
    where formal methods are actually making fast progress
  • 2:12 - 2:15
    and Koushik Sen is one of the leaders in that
  • 2:15 - 2:17
    So you’ll have the opportunity to hear more about that later
  • 2:17 - 2:21
    but for the moment I think, kind of bread and butter
  • 2:21 - 2:22
    is let’s talk about coverage measurement
  • 2:22 - 2:24
    because this is where the rubber meets the road
  • 2:24 - 2:26
    in terms of how you’d be evaluated
  • 2:26 - 2:28
    if you are doing this for real
  • 2:28 - 2:29
    So, what are some basics?
  • 2:29 - 2:30
    Here’s a really simple class you can use
  • 2:30 - 2:32
    to talk about different ways to measure
  • 2:32 - 2:34
    how our test covers this code
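
The slide code itself isn’t captured in these subtitles; a minimal Ruby sketch consistent with the discussion (two methods, one calling the other, and a compound conditional) might look like this:

      # Hypothetical stand-in for the class on the slide
      class CoverageExample
        def foo(x, y, z)
          bar(x, y, z)        # bar is also called from inside foo (matters for S1)
        end

        def bar(x, y, z)
          if x || (y && z)    # one statement for C0, two branches for C1
            "taken"
          else
            "not taken"
          end
        end
      end
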
  • 2:34 - 2:36
    And there’re a few different levels
  • 2:36 - 2:37
    with different terminologies
  • 2:37 - 2:40
    It’s not really universal across all software houses
  • 2:40 - 2:42
    But one common set of terminology
  • 2:42 - 2:43
    that the book exposes
  • 2:43 - 2:44
    is we could talk about S0
  • 2:44 - 2:47
    where we’d just mean you’ve called every method once
  • 2:47 - 2:50
    So you know, if you call foo, and you call bar, you’re done
  • 2:50 - 2:52
    That’s S0 coverage: not terribly thorough
  • 2:52 - 2:54
    A little more stringent, S1, is
  • 2:54 - 2:56
    you could say, we’re calling every method
  • 2:56 - 2:57
    from every place that it could be called
  • 2:57 - 2:58
    So what does that mean?
  • 2:58 - 3:00
    It means, for example
  • 3:00 - 3:01
    it’s not enough to call bar
  • 3:01 - 3:02
    You have to make sure that you call it
  • 3:02 - 3:05
    at least once from in here
  • 3:05 - 3:07
    as well as calling it once
  • 3:07 - 3:10
    from any exterior function that might call it
  • 3:10 - 3:12
    C0 which is what SimpleCov measures
  • 3:12 - 3:15
    (those of you who’ve gotten SimpleCov up and running)
  • 3:15 - 3:18
    basically says you’ve executed every statement
  • 3:18 - 3:20
    you’ve touched every statement in your code once
  • 3:20 - 3:22
    But the caveat there is that
  • 3:22 - 3:25
    conditionals really just count as a single statement
  • 3:25 - 3:28
    So, no matter which branch of this “if” you took
  • 3:28 - 3:31
    as long as you touched one or the other branch
  • 3:31 - 3:33
    you’ve executed the “if” statement
  • 3:33 - 3:35
    So even C0 is still, you know, sort of superficial coverage
  • 3:35 - 3:37
    But, as we will see
  • 3:37 - 3:39
    the way that you will want to read this information is:
  • 3:39 - 3:41
    if you are getting bad coverage at the C0 level
  • 3:41 - 3:44
    then you have really really bad coverage
  • 3:44 - 3:46
    So if you are not even meeting
  • 3:46 - 3:47
    this simple level of superficial coverage
  • 3:47 - 3:50
    then your testing is probably deficient
  • 3:50 - 3:51
    C1 is the next step up from that
  • 3:51 - 3:53
    We could say:
  • 3:53 - 3:55
    Well, we have to take every branch in both directions
  • 3:55 - 3:56
    So, when we are doing this “if” statement
  • 3:56 - 3:58
    we have to make sure that
  • 3:58 - 3:59
    we do the “if x” part once
  • 3:59 - 4:05
    and the “if not x” part at least once to meet C1
  • 4:05 - 4:08
    You can augment that with decision coverage
  • 4:08 - 4:09
    saying: Well, if we’re gonna…
  • 4:09 - 4:12
    If we have “if” statements where the condition
  • 4:12 - 4:13
    is made up of multiple terms
  • 4:13 - 4:15
    we have to make sure that every subexpression
  • 4:15 - 4:17
    has been evaluated both directions
  • 4:17 - 4:19
    In other words, that means that
  • 4:19 - 4:22
    if we’re going to fail this “if” statement
  • 4:22 - 4:24
    we have to make sure to fail it at least once
  • 4:24 - 4:26
    because y was false and at least once because z was false
  • 4:26 - 4:28
    In other words, any subexpression that could
  • 4:28 - 4:31
    independently change the outcome of the condition
  • 4:31 - 4:34
    has to be exercised in both directions
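
Concretely, for the hypothetical CoverageExample sketched above, decision coverage might be demonstrated with inputs chosen so that each subterm flips the outcome on its own:

      # Each subexpression of `x || (y && z)` independently determines
      # the outcome in at least one example below
      describe CoverageExample do
        subject { CoverageExample.new }

        it "takes the branch because x is true" do
          expect(subject.bar(true, false, false)).to eq("taken")
        end
        it "takes the branch because y and z are true" do
          expect(subject.bar(false, true, true)).to eq("taken")
        end
        it "fails the condition because y is false" do
          expect(subject.bar(false, false, true)).to eq("not taken")
        end
        it "fails the condition because z is false" do
          expect(subject.bar(false, true, false)).to eq("not taken")
        end
      end
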
  • 4:34 - 4:36
    And then,
  • 4:36 - 4:38
    kind of, the one that, you know, a lot of people aspire to
  • 4:38 - 4:41
    but there is disagreement on how much more valuable it is
  • 4:41 - 4:42
    is C2: you take every path through the code
  • 4:42 - 4:45
    Obviously, this is kind of difficult because
  • 4:45 - 4:48
    it tends to be exponential in the number of conditions
  • 4:48 - 4:53
    And in general it’s difficult
  • 4:53 - 4:55
    to evaluate if you’ve taken every path through the code
  • 4:55 - 4:57
    There are formal techniques that you can use
  • 4:57 - 4:58
    to tell you where the holes are
  • 4:58 - 5:01
    but the bottom line is that
  • 5:01 - 5:03
    in most commercial software houses
  • 5:03 - 5:04
    there is, I would say, not complete consensus
  • 5:04 - 5:06
    on how much more valuable C2 is
  • 5:06 - 5:08
    compared to C0 or C1
  • 5:08 - 5:10
    So, I think, for the purpose of our class
  • 5:10 - 5:11
    you get exposed to the idea
  • 5:11 - 5:13
    of how you use coverage information
  • 5:13 - 5:16
    SimpleCov takes advantage of some built-in Ruby features
  • 5:16 - 5:18
    to give you C0 coverage
  • 5:18 - 5:19
    [It] does really nice reports
  • 5:19 - 5:21
    We can sort of see it
  • 5:21 - 5:22
    at the level of individual lines in your file
  • 5:22 - 5:24
    You can see what your coverage is
  • 5:24 - 5:27
    and I think that’s kind of a, you know
  • 5:27 - 5:31
    a good start for where we are
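
For reference, the usual setup (assuming the standard SimpleCov gem) is two lines at the very top of spec/spec_helper.rb, before any application code is loaded:

      # spec/spec_helper.rb -- must run before the app code is required,
      # or those files won't show up instrumented in the coverage report
      require 'simplecov'
      SimpleCov.start 'rails'
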
  • 5:31 - 5:33
    So, having seen a sort of different flavours of tests
  • 5:33 - 5:37
    stepping back and looking at the big picture
  • 5:37 - 5:38
    what are the different kind of tests
  • 5:38 - 5:40
    that we’ve seen concretely?
  • 5:40 - 5:42
    and what are the tradeoffs
  • 5:42 - 5:43
    between using those different kinds of tests?
  • 5:43 - 5:47
    So we’ve seen at the level of individual classes or methods
  • 5:47 - 5:50
    we use RSpec, with extensive use of mocking and stubbing
  • 5:50 - 5:53
    So, for example, when we test methods in the model
  • 5:53 - 5:55
    that will be an example of unit testing
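
A minimal sketch of such a unit spec, assuming a hypothetical Movie.for_kids method and modern rspec-mocks syntax; the database call is stubbed out so only the method’s own logic is exercised:

      # Unit spec: isolate Movie.for_kids by stubbing its ActiveRecord
      # collaborator, so the test never touches the database
      describe Movie do
        it "selects only G and PG movies" do
          kids = double('Movie')
          allow(Movie).to receive(:where).with(rating: %w(G PG)).
                            and_return([kids])
          expect(Movie.for_kids).to eq([kids])
        end
      end
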
  • 5:55 - 5:59
    We also did something that is pretty similar to
  • 5:59 - 6:00
    functional or module testing
  • 6:00 - 6:02
    where there is more than one module participating
  • 6:02 - 6:04
    So, for example when we did controller specs
  • 6:04 - 6:07
    we saw that—we simulate a POST action
  • 6:07 - 6:09
    but remember that the POST action
  • 6:09 - 6:10
    has to go through the routing subsystem
  • 6:10 - 6:12
    before it gets to the controller
  • 6:12 - 6:14
    Once the controller is done it will try to render a view
  • 6:14 - 6:16
    So in fact there’s other pieces
  • 6:16 - 6:17
    that collaborate with the controller
  • 6:17 - 6:19
    that have to be working in order for controller specs to pass
  • 6:19 - 6:21
    So that’s somewhere in between:
  • 6:21 - 6:23
    where we’re doing more than a single method
  • 6:23 - 6:25
    touching more than a single class
  • 6:25 - 6:27
    but we’re still concentrating [our] attention
  • 6:27 - 6:28
    on a fairly narrow slice of the system at a time
  • 6:28 - 6:31
    and we’re still using mocking and stubbing extensively
  • 6:31 - 6:35
    to sort of isolate that behaviour that we want to test
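
A sketch of such a controller spec, with a hypothetical MoviesController and the positional-argument style of rspec-rails from this course’s era:

      # Controller spec: the simulated POST still exercises routing and
      # view rendering, but the model collaborator is mocked away
      describe MoviesController do
        it "creates a movie and redirects to the index" do
          allow(Movie).to receive(:create!).
            and_return(double('Movie', title: 'Up'))
          post :create, movie: { 'title' => 'Up' }
          expect(response).to redirect_to(movies_path)
        end
      end
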
  • 6:35 - 6:36
    And then at the level of Cucumber scenarios
  • 6:36 - 6:38
    these are more like integration or system tests
  • 6:38 - 6:41
    They exercise complete paths throughout the application
  • 6:41 - 6:43
    They probably touch a lot of different modules
  • 6:43 - 6:46
    They make minimal use of mocks and stubs
  • 6:46 - 6:48
    because part of the goal of an integration test
  • 6:48 - 6:50
    is exactly to test the interaction between pieces
  • 6:50 - 6:53
    So you don’t want to stub or control those interactions
  • 6:53 - 6:54
    You actually want to let the system do
  • 6:54 - 6:56
    what it would really do
  • 6:56 - 6:58
    if this was a scenario happening in production
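
For contrast, hypothetical Cucumber step definitions (plain Ruby driving Capybara) run the full stack with no mocks at all:

      # Steps behind a scenario like:
      #   When I add the movie "Up"
      #   Then I should see "Up"
      # No stubs anywhere: the real routing, controller, model, and
      # database all participate
      When /^I add the movie "(.*)"$/ do |title|
        visit new_movie_path
        fill_in 'Title', with: title
        click_button 'Save Changes'
      end

      Then /^I should see "(.*)"$/ do |text|
        expect(page).to have_content(text)
      end
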
  • 6:58 - 7:00
    So how would we compare these different kinds of tests?
  • 7:00 - 7:02
    There’s a few different axes we can look at
  • 7:02 - 7:05
    One of them is how long they take to run
  • 7:05 - 7:06
    Now, both RSpec and Cucumber
  • 7:06 - 7:09
    have, kind of, high startup times and stuff like that
  • 7:09 - 7:10
    But, as you’ll see
  • 7:10 - 7:11
    as you start adding more and more RSpec tests
  • 7:11 - 7:14
    and using autotest to run them in the background
  • 7:14 - 7:17
    by and large, once RSpec kind of gets off the launching pad
  • 7:17 - 7:19
    it runs specs really fast
  • 7:19 - 7:21
    whereas running Cucumber features just takes a long time
  • 7:21 - 7:24
    as it essentially fires up your entire application
  • 7:24 - 7:26
    And later in this semester
  • 7:26 - 7:28
    we’ll see a way to make Cucumber even slower—
  • 7:28 - 7:30
    which is to have it fire up an entire browser
  • 7:30 - 7:33
    basically act like a puppet, remote-controlling Firefox
  • 7:33 - 7:35
    so you can test Javascript code
  • 7:35 - 7:37
    We’ll do that when we actually—
  • 7:37 - 7:40
    I think we’ll be able to work with our friends at Sauce Labs
  • 7:40 - 7:42
    so you can do that in the cloud—That will be exciting
  • 7:42 - 7:45
    So, “run fast” versus “run slow”
  • 7:45 - 7:46
    Resolution:
  • 7:46 - 7:48
    If an error happens in your unit tests
  • 7:48 - 7:49
    it’s usually pretty easy
  • 7:49 - 7:52
    to figure out and track down what the source of that error is
  • 7:52 - 7:53
    because the tests are so isolated
  • 7:53 - 7:56
    You’ve stubbed out everything that doesn’t matter
  • 7:56 - 7:58
    and you’re focusing on only the behaviour of interest
  • 7:58 - 7:59
    So, if you’ve done a good job of doing that
  • 7:59 - 8:01
    when something goes wrong in one of your tests
  • 8:01 - 8:03
    there’s not a lot of places
  • 8:03 - 8:04
    that something could have gone wrong
  • 8:04 - 8:07
    In contrast, if you’re running a Cucumber scenario
  • 8:07 - 8:08
    that’s got, you know, 10 steps
  • 8:08 - 8:10
    and every step is touching
  • 8:10 - 8:11
    a whole bunch of pieces of the app
  • 8:11 - 8:12
    it could take a long time
  • 8:12 - 8:14
    to actually get to the bottom of a bug
  • 8:14 - 8:16
    So it is kind of a tradeoff
  • 8:16 - 8:17
    between how well you can localize errors
  • 8:17 - 8:20
    Coverage:
  • 8:20 - 8:23
    It’s possible if you write a good suite
  • 8:23 - 8:24
    of unit and functional tests
  • 8:24 - 8:26
    you can get really high coverage
  • 8:26 - 8:27
    You can run your SimpleCov report
  • 8:27 - 8:30
    and you can actually identify specific lines in your files
  • 8:30 - 8:32
    that have not been exercised by any test
  • 8:32 - 8:34
    and then you can go write tests that cover them
  • 8:34 - 8:36
    So, figuring out how to improve your coverage
  • 8:36 - 8:37
    for example at the C0 level
  • 8:37 - 8:40
    is something much more easily done with unit tests
  • 8:40 - 8:42
    whereas, with a Cucumber test—
  • 8:42 - 8:43
    with a Cucumber scenario—
  • 8:43 - 8:45
    you are touching a lot of parts of the code
  • 8:45 - 8:47
    but you are doing it very sparsely
  • 8:47 - 8:49
    So, if your goal is to get your coverage up
  • 8:49 - 8:51
    use the tools that are at the unit level
  • 8:51 - 8:53
    so that you can focus on understanding
  • 8:53 - 8:54
    what parts of my code are undertested
  • 8:54 - 8:56
    and then you can write very targeted tests
  • 8:56 - 8:58
    just to focus on them
  • 8:58 - 9:01
    And, sort of, you know, putting those pieces together
  • 9:01 - 9:03
    the unit tests
  • 9:03 - 9:05
    because of their isolation and their fine resolution
  • 9:05 - 9:07
    tend to use a lot of mocks
  • 9:07 - 9:09
    to isolate the behaviours you don’t care about
  • 9:09 - 9:11
    But that means that, by definition
  • 9:11 - 9:12
    you’re not testing the interfaces
  • 9:12 - 9:14
    and it’s sort of a “received wisdom” in software
  • 9:14 - 9:16
    that a lot of the interesting bugs
  • 9:16 - 9:18
    occur at the interfaces between pieces
  • 9:18 - 9:20
    and not sort of within a class or within a method—
  • 9:20 - 9:22
    those are sort of the easy bugs to track down
  • 9:22 - 9:24
    And at the other extreme
  • 9:24 - 9:26
    the more you get towards the integration testing extreme
  • 9:26 - 9:29
    you’re supposed to rely less and less on mocks
  • 9:29 - 9:30
    for that exact reason
  • 9:30 - 9:32
    Now we saw, if you’re testing something like
  • 9:32 - 9:34
    say, in a service-oriented architecture
  • 9:34 - 9:35
    where you have to interact with the remote site
  • 9:35 - 9:37
    you still end up
  • 9:37 - 9:38
    having to do a fair amount of mocking and stubbing
  • 9:38 - 9:40
    so that you don’t rely on the Internet
  • 9:40 - 9:41
    in order for your tests to pass
  • 9:41 - 9:43
    but, generally speaking
  • 9:43 - 9:47
    you’re trying to remove as many of the mocks as you can
  • 9:47 - 9:48
    and let the system run the way it would run in real life
  • 9:48 - 9:52
    So, the good news is you are testing the interfaces
  • 9:52 - 9:54
    but when something goes wrong in one of the interfaces
  • 9:54 - 9:57
    because your resolution is not as good
  • 9:57 - 10:00
    it may take longer to figure out what it is
  • 10:00 - 10:05
    So, what’s sort of the high-order bit from this tradeoff
  • 10:05 - 10:07
    is you don’t really want to rely
  • 10:07 - 10:08
    too heavily on any one kind of test
  • 10:08 - 10:10
    They serve different purposes and, depending on
  • 10:10 - 10:13
    whether you are trying to exercise your interfaces more
  • 10:13 - 10:15
    or trying to improve your fine-grained coverage
  • 10:15 - 10:18
    that affects how you develop your test suite
  • 10:18 - 10:20
    and you’ll evolve it along with your software
  • 10:20 - 10:24
    So, we’ve used a certain set of terminology in testing
  • 10:24 - 10:26
    It’s the terminology that, by and large
  • 10:26 - 10:29
    is most commonly used in the Rails community
  • 10:29 - 10:30
    but there’s some variation
  • 10:30 - 10:33
    [and] some other terms that you might hear
  • 10:33 - 10:35
    if you go get a job somewhere
  • 10:35 - 10:36
    and you hear about mutation testing
  • 10:36 - 10:38
    which we haven’t done
  • 10:38 - 10:40
    This is an interesting idea that was, I think, invented by
  • 10:40 - 10:43
    Ammann and Offutt, who have, sort of
  • 10:43 - 10:44
    the definitive book on software testing
  • 10:44 - 10:46
    The idea is:
  • 10:46 - 10:48
    Suppose I introduced a deliberate bug into my code
  • 10:48 - 10:49
    does that force some test to fail?
  • 10:49 - 10:53
    Because, if I changed, you know, “if x” to “if not x”
  • 10:53 - 10:56
    and no tests fail, then either I’m missing some coverage
  • 10:56 - 10:59
    or my app is very strange and somehow nondeterministic
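
A single mutant, sketched in Ruby (illustrative only):

      # Original check and its mutant; a thorough suite should contain
      # at least one test that passes for the original but fails for
      # the mutant
      def eligible?(x)
        x ? "yes" : "no"
      end

      def eligible_mutant?(x)   # "if x" mutated to "if not x"
        !x ? "yes" : "no"
      end
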
  • 10:59 - 11:03
    Fuzz testing, which Koushik Sen may talk more about
  • 11:03 - 11:07
    basically, this is the “10,000 monkeys at typewriters
  • 11:07 - 11:09
    throwing random input at your code”
  • 11:09 - 11:10
    What’s interesting about it is that
  • 11:10 - 11:11
    those tests we’ve been doing
  • 11:11 - 11:13
    essentially are crafted to test the app
  • 11:13 - 11:15
    the way it was designed
  • 11:15 - 11:16
    whereas, you know, fuzz testing
  • 11:16 - 11:19
    is about testing the app in ways it wasn’t meant to be used
  • 11:19 - 11:22
    So, what happens if you throw enormous form submissions at it?
  • 11:22 - 11:25
    What happens if you put control characters in your forms?
  • 11:25 - 11:27
    What happens if you submit the same thing over and over?
  • 11:27 - 11:29
    And, Koushik has a statistic that
  • 11:29 - 11:32
    Microsoft finds up to 20% of their bugs
  • 11:32 - 11:34
    using some variation of fuzz testing
  • 11:34 - 11:36
    and that about 25%
  • 11:36 - 11:39
    of the common Unix command-line programs
  • 11:39 - 11:40
    can be made to crash
  • 11:40 - 11:44
    [when] put through aggressive fuzz testing
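
A crude fuzz loop might look like the following sketch, where handle_form is a hypothetical entry point into the app:

      require 'securerandom'

      # Throw random byte strings at the handler and log anything that
      # raises: the essence of monkeys-at-typewriters testing
      1000.times do
        input = SecureRandom.random_bytes(rand(1..4096))
        begin
          handle_form(input)    # hypothetical app entry point
        rescue StandardError => e
          puts "crashed on a #{input.bytesize}-byte input: #{e.class}"
        end
      end
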
  • 11:44 - 11:46
    Defining-use coverage is something that we haven’t done
  • 11:46 - 11:48
    but it’s another interesting concept
  • 11:48 - 11:50
    The idea is that at any point in my program
  • 11:50 - 11:52
    there’s a place where I define—
  • 11:52 - 11:54
    or I assign a value to some variable—
  • 11:54 - 11:56
    and then there’s a place downstream
  • 11:56 - 11:57
    where presumably I’m going to consume that value—
  • 11:57 - 11:59
    someone’s going to use that value
  • 11:59 - 12:01
    Have I covered every pair?
  • 12:01 - 12:02
    In other words, do I have tests where every pair
  • 12:02 - 12:04
    of defining a variable and using it somewhere
  • 12:04 - 12:07
    is executed in some part of my test suite
  • 12:07 - 12:10
    It’s sometimes called DU-coverage
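
A small Ruby sketch of a define-use pair (charge is a hypothetical helper):

      # `total` is defined once, conditionally redefined, then used;
      # DU coverage asks for tests in which each (definition, use)
      # pair is actually executed
      def checkout(prices, coupon)
        total = prices.reduce(0, :+)   # definition 1
        total -= 5 if coupon           # definition 2, on the coupon path
        charge(total)                  # use: reached from both definitions?
      end
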
  • 12:10 - 12:14
    And other terms that I think are not as widely used anymore
  • 12:14 - 12:17
    blackbox versus whitebox, or blackbox versus glassbox
  • 12:17 - 12:20
    Roughly, a blackbox test is one that is written from
  • 12:20 - 12:22
    the point of view of the external specification of the thing
  • 12:22 - 12:24
    [For example:] “This is a hash table
  • 12:24 - 12:26
    When I put in a key I should get back a value
  • 12:26 - 12:28
    If I delete the key the value shouldn’t be there”
  • 12:28 - 12:29
    That’s a blackbox test because it doesn’t say
  • 12:29 - 12:32
    anything about how the hash table is implemented
  • 12:32 - 12:34
    and it doesn’t try to stress the implementation
  • 12:34 - 12:36
    A corresponding whitebox test might be:
  • 12:36 - 12:38
    “I know something about the hash function
  • 12:38 - 12:39
    and I’m going to deliberately create
  • 12:39 - 12:41
    hash keys in my test cases
  • 12:41 - 12:43
    that cause a lot of hash collisions
  • 12:43 - 12:45
    to make sure that I’m testing that part of the functionality”
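
Sketched in RSpec against a hypothetical HashTable class:

      describe HashTable do
        # Blackbox: written purely from the external specification
        it "returns the value stored under a key" do
          t = HashTable.new
          t['movie'] = 'Up'
          expect(t['movie']).to eq('Up')
        end

        # Whitebox: exploits implementation knowledge; these keys are
        # assumed to collide under the table's hash function
        it "keeps both values when keys collide" do
          t = HashTable.new
          t['Aa'] = 1
          t['BB'] = 2
          expect(t['Aa']).to eq(1)
          expect(t['BB']).to eq(2)
        end
      end
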
  • 12:45 - 12:49
    Now, a C0 test coverage tool, like SimpleCov
  • 12:49 - 12:52
    would reveal that if all you had is blackbox tests
  • 12:52 - 12:53
    you might find that
  • 12:53 - 12:55
    the collision-handling code wasn’t being hit very often
  • 12:55 - 12:56
    And that might tip you off and say:
  • 12:56 - 12:58
    “Ok, if I really want to strengthen that—
  • 12:58 - 13:00
    if I want to boost coverage of that code
  • 13:00 - 13:02
    now I have to write a whitebox or a glassbox test
  • 13:02 - 13:04
    I have to look inside, see what the implementation does
  • 13:04 - 13:05
    and find specific ways
  • 13:05 - 13:10
    to try to break the implementation in evil ways”
  • 13:10 - 13:13
    So, I think, testing is a kind of a way of life, right?
  • 13:13 - 13:16
    We’ve gotten away from the phase of
  • 13:16 - 13:18
    “We’d build the whole thing and then we’d test it”
  • 13:18 - 13:19
    and we’ve gotten into the phase of
  • 13:19 - 13:20
    “We’re testing as we go”
  • 13:20 - 13:22
    Testing is really more like a development tool
  • 13:22 - 13:24
    and like so many development tools
  • 13:24 - 13:25
    the effectiveness of it depends
  • 13:25 - 13:27
    on whether you’re using it in a tasteful manner
  • 13:27 - 13:31
    So, you could say: “Well, let’s see—I kicked the tires
  • 13:31 - 13:33
    You know, I fired up the browser, I tried a couple of things
  • 13:33 - 13:35
    (claps hand) Looks like it works! Deploy it!”
  • 13:35 - 13:38
    That’s obviously a little more cavalier than you’d want to be
  • 13:38 - 13:41
    And, by the way, one of the things that we discovered
  • 13:41 - 13:43
    with this online course just starting up
  • 13:43 - 13:45
    when 60,000 people are enrolled in the course
  • 13:45 - 13:48
    and 0.1% of those people have a problem
  • 13:48 - 13:50
    you’d get 60 emails
  • 13:50 - 13:53
    The corollary is: when your site is used by a lot of people
  • 13:53 - 13:55
    some stupid bug that you didn’t find
  • 13:55 - 13:57
    but that could have been found by testing
  • 13:57 - 13:59
    could very quickly generate *a lot* of pain
  • 13:59 - 14:02
    On the other hand, you don’t want to be dogmatic and say
  • 14:02 - 14:04
    “Uh, until we have 100% coverage and every test is green
  • 14:04 - 14:06
    we absolutely will not ship”
  • 14:06 - 14:07
    That’s not healthy either
  • 14:07 - 14:08
    And test quality
  • 14:08 - 14:10
    doesn’t necessarily correlate with statement coverage:
  • 14:10 - 14:11
    unless you can say something
  • 14:11 - 14:12
    about the quality of your tests
  • 14:12 - 14:14
    just because you’ve executed every line
  • 14:14 - 14:17
    doesn’t mean that you’ve tested the interesting cases
  • 14:17 - 14:18
    So, somewhere in between, you could say
  • 14:18 - 14:20
    “Well, we’ll use coverage tools to identify
  • 14:20 - 14:23
    undertested or poorly-tested parts of the code
  • 14:23 - 14:24
    and we’ll use them as a guideline
  • 14:24 - 14:27
    to sort of help improve our overall confidence level”
  • 14:27 - 14:29
    But remember, Agile is about embracing change
  • 14:29 - 14:30
    and dealing with it
  • 14:30 - 14:32
    Part of change is that things will change in ways that cause
  • 14:32 - 14:33
    bugs that you didn’t foresee
  • 14:33 - 14:34
    and the right reaction is:
  • 14:34 - 14:36
    Be comfortable enough with the testing tools
  • 14:36 - 14:37
    [so] that you can quickly find those bugs
  • 14:37 - 14:39
    Write a test that reproduces that bug
  • 14:39 - 14:40
    And then make the test green
  • 14:40 - 14:41
    Then you’ll really fix it
  • 14:41 - 14:43
    That means, the way that you really fix a bug is
  • 14:43 - 14:45
    if you created a test that correctly failed
  • 14:45 - 14:46
    to reproduce that bug
  • 14:46 - 14:48
    and then you went back and fixed the code
  • 14:48 - 14:49
    to make those tests pass
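
A sketch of that discipline, with hypothetical names: first write a spec that fails because it reproduces the report, then fix the code until it goes green:

      # Regression spec capturing a reported bug: requesting a missing
      # movie crashed instead of redirecting (hypothetical scenario)
      describe MoviesController do
        it "redirects with a warning for an unknown movie id" do
          get :show, id: 'no-such-id'
          expect(response).to redirect_to(movies_path)
          expect(flash[:warning]).to match(/no such movie/i)
        end
      end
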
  • 14:49 - 14:51
    Similarly, you don’t want to say
  • 14:51 - 14:53
    “Well, unit tests give you better coverage
  • 14:53 - 14:54
    They’re more thorough and detailed
  • 14:54 - 14:56
    So let’s focus all our energy on that”
  • 14:56 - 14:57
    as opposed to
  • 14:57 - 14:58
    “Oh, focus on integration tests
  • 14:58 - 15:00
    because they’re more realistic, right?
  • 15:00 - 15:01
    They reflect what the customer said they want
  • 15:01 - 15:03
    So, if the integration tests are passing
  • 15:03 - 15:05
    by definition we’re meeting a customer need”
  • 15:05 - 15:07
    Again, both extremes are kind of unhealthy
  • 15:07 - 15:09
    because each one of these can find problems
  • 15:09 - 15:11
    that would be missed by the other
  • 15:11 - 15:12
    So, having a good combination of them
  • 15:12 - 15:15
    is kind of what it is all about
  • 15:15 - 15:18
    The last thing I want to leave you with is, I think
  • 15:18 - 15:20
    in terms of testing, is “TDD versus
  • 15:20 - 15:22
    what I call conventional debugging—
  • 15:22 - 15:24
    i.e., the way that we all kind of do it
  • 15:24 - 15:25
    even though we say we don’t”
  • 15:25 - 15:26
    and we’re all trying to get better, right?
  • 15:26 - 15:27
    We’re all kind of in the gutter
  • 15:27 - 15:29
    Some of us are looking up at the stars
  • 15:29 - 15:31
    trying to improve our practices
  • 15:31 - 15:33
    But, having now lived with this for 3 or 4 years myself
  • 15:33 - 15:35
    and—I’ll be honest—3 years ago I didn’t do TDD
  • 15:35 - 15:37
    I do it now, because I find that it’s better
  • 15:37 - 15:40
    and here’s my distillation of why I think it works for me
  • 15:40 - 15:43
    Sorry, the colours are a little weird
  • 15:43 - 15:45
    but on the left column of the table
  • 15:45 - 15:46
    [it] says “Conventional debugging”
  • 15:46 - 15:47
    and the right side says “TDD”
  • 15:47 - 15:49
    So what’s the way I used to write code?
  • 15:49 - 15:51
    Maybe some of you still do this
  • 15:51 - 15:53
    I write a whole bunch of lines
  • 15:53 - 15:54
    maybe a few tens of lines of code
  • 15:54 - 15:55
    I’m sure they’re right—
  • 15:55 - 15:56
    I mean, I am a good programmer, right?
  • 15:56 - 15:57
    This is not that hard
  • 15:57 - 15:59
    I run it – It doesn’t work
  • 15:59 - 16:01
    Ok, fire up the debugger – Start putting in printf’s
  • 16:01 - 16:04
    If I’d been using TDD what would I do instead?
  • 16:04 - 16:08
    Well I’d write a few lines of code, having written a test first
  • 16:08 - 16:10
    So as soon as the test goes from red to green
  • 16:10 - 16:12
    I know I wrote code that works—
  • 16:12 - 16:15
    or at least the parts of the behaviour that I had in mind
  • 16:15 - 16:16
    Those parts of the behaviour work, because I had a test
  • 16:16 - 16:19
    Ok, back to conventional debugging:
  • 16:19 - 16:21
    I’m running my program, trying to find the bugs
  • 16:21 - 16:23
    I start putting in printf’s everywhere
  • 16:23 - 16:24
    to print out the values of things
  • 16:24 - 16:25
    which by the way is a lot of fun
  • 16:25 - 16:26
    when you’re trying to read them
  • 16:26 - 16:28
    out of the 500 lines of log output
  • 16:28 - 16:29
    that you’d get in a Rails app
  • 16:29 - 16:30
    trying to find your printf’s
  • 16:30 - 16:32
    you know, “I know what I’ll do—
  • 16:32 - 16:34
    I’ll put in 75 asterisks before and after
  • 16:34 - 16:36
    That will make it readable” (laughter)
  • 16:36 - 16:38
    Who don’t—Ok, raise your hands if you don’t do this!
  • 16:38 - 16:40
    Thank you for your honesty. (laughter) Ok.
  • 16:40 - 16:43
    Or— Or I could do the other thing, I could say:
  • 16:43 - 16:45
    Instead of printing the value of a variable
  • 16:45 - 16:47
    why don’t I write a test that inspects it
  • 16:47 - 16:48
    with an expectation of what it should be
  • 16:48 - 16:50
    and I’ll know immediately in bright red letters
  • 16:50 - 16:53
    if that expectation wasn’t met
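
That is, replace the printf with something like this (names hypothetical):

      # Instead of `puts "***** #{result} *****"` buried in the log,
      # a failed expectation reports itself loudly and repeatably
      expect(result).to eq(expected)
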
  • 16:53 - 16:56
    Ok, I’m back on the conventional debugging side:
  • 16:56 - 16:58
    I break out the big guns: I pull out the Ruby debugger
  • 16:58 - 17:02
    I set a debug breakpoint, and I now start tweaking and say
  • 17:02 - 17:04
    “Oh, let’s see, I have to get past that ‘if’ statement
  • 17:04 - 17:06
    so I have to set that thing
  • 17:06 - 17:07
    Oh, I have to call that method and so I need to…”
  • 17:07 - 17:08
    No!
  • 17:08 - 17:10
    I could instead—if I’m going to do that anyway—
  • 17:10 - 17:13
    let’s just do it in a file, set up some mocks and stubs
  • 17:13 - 17:16
    to control the code path, make it go the way I want
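
For example, rather than hand-steering the debugger past a conditional, a stub can force the branch (hypothetical names again):

      # Make the payment gateway appear down so the retry path runs,
      # no debugger session required
      allow(gateway).to receive(:available?).and_return(false)
      expect(order.submit).to eq(:queued_for_retry)
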
  • 17:16 - 17:19
    And now, “Ok, for sure I’ve fixed it!
  • 17:19 - 17:22
    I’ll get out of the debugger, run it all again!”
  • 17:22 - 17:24
    And, of course, 9 times out of 10, you didn’t fix it
  • 17:24 - 17:26
    or you kind of partly fixed it but you didn’t completely fix it
  • 17:26 - 17:30
    and now I have to do all these manual things all over again
  • 17:30 - 17:32
    or I already have a bunch of tests
  • 17:32 - 17:34
    and I can just rerun them automatically
  • 17:34 - 17:35
    and, if some of them fail
  • 17:35 - 17:36
    “Oh, I didn’t fix the whole thing
  • 17:36 - 17:38
    No problem, I’ll just go back!”
  • 17:38 - 17:39
    So, the bottom line is that
  • 17:39 - 17:41
    you know, you could do it on the left side
  • 17:41 - 17:45
    but you’re using the same techniques in both cases
  • 17:45 - 17:48
    The only difference is, in one case you’re doing it manually
  • 17:48 - 17:50
    which is boring and error-prone
  • 17:50 - 17:51
    In the other case you’re doing a little more work
  • 17:51 - 17:53
    but you can make it automatic and repeatable
  • 17:53 - 17:55
    and have, you know, some high confidence
  • 17:55 - 17:57
    that as you change things in your code
  • 17:57 - 17:58
    you are not breaking stuff that used to work
  • 17:58 - 18:00
    and basically it’s more productive
  • 18:00 - 18:02
    So you’re doing all the same things
  • 18:02 - 18:04
    but with a, kind of, “delta” extra work
  • 18:04 - 18:07
    you are using your effort at a much higher leverage
  • 18:07 - 18:10
    So that’s kind of my view of why TDD is a good thing
  • 18:10 - 18:11
    It’s really, it doesn’t require new skills
  • 18:11 - 18:15
    It just requires [you] to refactor your existing skills
  • 18:15 - 18:18
    I also tried when I—again, honest confessions, right?—
  • 18:18 - 18:19
    when I started doing this it was like
  • 18:19 - 18:21
    “Ok, I’m gonna be teaching a course on Rails
  • 18:21 - 18:22
    I should really focus on testing”
  • 18:22 - 18:24
    So I went back to some code I had written
  • 18:24 - 18:26
    that was working—you know, that was decent code—
  • 18:26 - 18:29
    and I started trying to write tests for it
  • 18:29 - 18:31
    and it was *so painful*
  • 18:31 - 18:33
    because the code wasn’t written in way that was testable
  • 18:33 - 18:34
    There were all kinds of interactions
  • 18:34 - 18:36
    There were, like, nested conditionals
  • 18:36 - 18:38
    And if you wanted to isolate a particular statement
  • 18:38 - 18:41
    and have a test trigger just that statement
  • 18:41 - 18:44
    the amount of stuff you’d have to set up in your test
  • 18:44 - 18:45
    to have it happen—
  • 18:45 - 18:46
    remember when we talked about mock train wrecks—
  • 18:46 - 18:48
    you have to set up all this infrastructure
  • 18:48 - 18:49
    just to get one line of code
  • 18:49 - 18:51
    and you do that and you go
  • 18:51 - 18:52
    “Gawd, testing is really not worth it!
  • 18:52 - 18:54
    I wrote 20 lines of setup
  • 18:54 - 18:56
    so that I could test two lines in my function!”
  • 18:56 - 18:58
    What that’s really telling you—as I now realize—
  • 18:58 - 19:00
    is your function is bad
  • 19:00 - 19:01
    It’s a badly written function
  • 19:01 - 19:02
    It’s not a testable function
  • 19:02 - 19:03
    It’s got too many moving parts
  • 19:03 - 19:06
    whose dependencies can’t be broken apart
  • 19:06 - 19:07
    There’s no seams in my function
  • 19:07 - 19:11
    that allow me to individually test the different behaviours
  • 19:11 - 19:12
    And once you start doing Test First Development
  • 19:12 - 19:15
    because you have to write your tests in small chunks
  • 19:15 - 19:17
    it kind of makes this problem go away
  • 19:17 -
    So that’s been my epiphany