RailsConf 2014 - Mutation Testing with Mutant by Erik Michaels-Ober

0:17 - 0:20

ERIK MICHAELS-OBER: OK. Is the mic live? Yeah?
We're good.
0:20 - 0:27

OK. Hi everybody. Welcome. Thank you for coming.
So,
0:29 - 0:34

this is gonna be a talk about tools. And
0:34 - 0:38

there's this common expression that says that
a carpenter
0:38 - 0:42

is only as good as his or her tools.
0:42 - 0:44

I'm not a carpenter, but that makes a lot
0:44 - 0:46

of sense to me. If your hammer is made
0:46 - 0:49

out of feathers, you're not gonna be able
to
0:49 - 0:51

build very much.
0:51 - 0:55

And I really think the same thing is true
0:55 - 0:59

for programmers. I know that that is true.
The
0:59 - 1:03

tools that we use really enable us to do
1:03 - 1:06

our job. And we use so many tools, it's
1:06 - 1:09

easy to sort of take for granted the tools
1:09 - 1:10

that we have and the tools that we do
1:10 - 1:13

use. And so I think it's worth sort of
1:13 - 1:15

thinking about the tools that we have and
how
1:15 - 1:19

they help us improve as a programmer. And
thinking
1:19 - 1:22

about what new tools we can use. In this
1:22 - 1:26

case, I'll be talking specifically about mutation
testing and
1:26 - 1:28

how that, as a tool, can really help us
1:28 - 1:32

all improve as programmers. Help us write
better tests.
1:32 - 1:35

But, I think, I just want to sort of
1:35 - 1:37

take some time to reflect and, and set a
1:37 - 1:40

little bit of a context for the tools that
1:40 - 1:42

we use every day and sort of, I think
1:42 - 1:44

take for granted a bit.
1:44 - 1:50

So, the first one is an editor. And it
1:50 - 1:52

seems like a very simple tool, right. You
just
1:52 - 1:54

type in text and it just shows up on
1:54 - 1:57

the screen. But it's incredibly sophisticated.
If you've ever
1:57 - 1:59

tried to write a text editor, if you've ever
1:59 - 2:01

read the source code of a text editor, most
2:01 - 2:04

text editors are like millions of lines of
code
2:04 - 2:08

to implement what seems like a relatively
simple thing.
2:08 - 2:10

And they help us. They provide us with things
2:10 - 2:15

like syntax highlighting, auto completion.
And this directly helps
2:15 - 2:18

us write better programs, right. We avoid
bugs. We'll
2:18 - 2:21

realize a bug in our editor before we, before
2:21 - 2:23

we deploy it to production. Before we even
run
2:23 - 2:26

tests, we'll find a bug in our editor. Because
2:26 - 2:30

our editor tells us about it.
2:30 - 2:37

This is an early version of Vim. So it
2:37 - 2:39

can, it can be really easy to forget sort
2:39 - 2:41

of what these tools used to look like, right.
2:41 - 2:44

This is how people used to write code. And
2:44 - 2:46

these look more like the sort of tools from
2:46 - 2:47

the wood shop than the tools that we're used
2:47 - 2:51

to using. So this is an early punch card
2:51 - 2:55

machine. The photo was taken in the, in the
2:55 - 2:59

Computer History Museum in Mountain View, California.
And I can
2:59 - 3:01

tell you for a fact that I would not
3:01 - 3:03

be a programmer today if this is how we
3:03 - 3:06

still had to write programs. And I suspect
that
3:06 - 3:09

many of you would not be programmers if this
3:09 - 3:12

was sort of the state-of-the-art in how it
was
3:12 - 3:12

done.
3:12 - 3:15

And so I think, like, I want to make
3:15 - 3:17

the case that sort of both the quality and
3:17 - 3:21

the quantity of software would be much worse
than
3:21 - 3:23

it is today, if not for sort of the
3:23 - 3:28

continued evolution of, of our tools.
3:28 - 3:30

Another tool I use every day is an interactive
3:30 - 3:35

debugger. So, sort of allows you to step through
3:35 - 3:37

your code, line by line, and better understand
how
3:37 - 3:39

it works. You can kind of get inside the
3:39 - 3:43

code, right. I'm not gonna spend too much
time
3:43 - 3:48

talking about debuggers. Sort of a public
service announcement,
3:48 - 3:52

next week, not next week. This week. Next
Thursday.
3:52 - 3:55

This Thursday. In this same room, I believe,
is
3:55 - 4:00

a great talk on debugger-driven development
with Pry. So,
4:00 - 4:02

if you're interested in hearing more about
that, you
4:02 - 4:03

should go to that.
4:03 - 4:07

So, what do we do when our code is
4:07 - 4:11

slow? What's the tool for that, right? We
have
4:11 - 4:14

profilers that tell us where time is being
spent
4:14 - 4:17

when we execute our code. And I wouldn't even
4:17 - 4:21

know how to start optimizing the program if
I
4:21 - 4:23

didn't have a profile, right, profiler. I
would be
4:23 - 4:27

a terrible optimizer without a profiler. I
guess I
4:27 - 4:30

would like start putting in, you know, t equals
4:30 - 4:33

time dot now, and then, like, at the end
4:33 - 4:36

of whatever I wanted to measure, I would subtract
4:36 - 4:40

the current time from the start time. But,
that's
4:40 - 4:43

crazy. Like, instrumenting your entire code
that way is,
4:43 - 4:47

yeah. Like, I wouldn't really know how to
optimize
4:47 - 4:49

code without a profiler. I wouldn't be as
good
4:49 - 4:52

at it. None of us would be.
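The manual approach described here can be sketched as follows. This is only an illustration: `slow_sum` is a hypothetical stand-in for whatever code you want to measure, and a profiler gives the same kind of information without hand-instrumenting every method.

```ruby
# Hypothetical workload standing in for "whatever I wanted to measure".
def slow_sum(n)
  (1..n).reduce(0) { |total, i| total + i * i }
end

# Record the start time, run the code, then subtract.
t = Time.now
result = slow_sum(100_000)
elapsed = Time.now - t # Float, in seconds

puts "slow_sum(100_000) = #{result}, took #{elapsed.round(4)}s"
```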
4:52 - 4:57

And another sort of tool that is very prevalent
4:57 - 5:01

in the, in the Ruby community is testing.
This
5:01 - 5:03

is an example of someone who should have done
5:03 - 5:10

more testing. So that, again, just. Yeah.
All right.
5:12 - 5:15

So I think this is a good illustration of
5:15 - 5:20

how testing can save you, right. Test so that
5:20 - 5:22

you find out before you sort of run it
5:22 - 5:24

in production. OK.
5:24 - 5:26

Enough of that.
5:26 - 5:28

So, I would, I'm actually gonna make the case
5:28 - 5:31

that, in the Ruby, Ruby toolbox, or maybe
in
5:31 - 5:34

the Rubyist's toolbox, tests are sort of like
the
5:34 - 5:35

hammer, right. Like, this is the thing you
turn
5:35 - 5:39

to all the time for all sorts of things.
5:39 - 5:42

We use them to prevent regressions. We use
them
5:42 - 5:45

to specify behavior. And we actually use them
to
5:45 - 5:50

drive development. DHH doesn't do this, but
many others
5:50 - 5:53

do. And find it useful.
5:53 - 5:57

So if we write tests, then we have perfect
5:57 - 6:00

code, right. If we have tests that verify
that
6:00 - 6:03

our code does what it's supposed to do, then
6:03 - 6:05

at the end of the day, we have perfect
6:05 - 6:10

code. Correct? Not correct.
6:10 - 6:13

This is the fundamental logical flaw with
testing, right.
6:13 - 6:17

You have some code. And you know that code
6:17 - 6:19

can have bugs. So you say I have an
6:19 - 6:23

idea, let's write some tests. But tests are
just
6:23 - 6:27

more code. And we know that code has bugs.
6:27 - 6:29

So we're screwed.
6:29 - 6:31

What's that?
6:31 - 6:36

Test your tests. That's right. So. I'm getting
there.
6:36 - 6:36

Patience.
6:36 - 6:40

So, like, one tool that people use to sort
6:40 - 6:43

of measure the effectiveness of their tests
is code
6:43 - 6:50

coverage. And it's sort of a metric that's
designed
6:50 - 6:52

to tell you whether your tests do what they're
6:52 - 6:56

supposed to do. But I'll show you, in a
6:56 - 6:58

moment, why I think it's a really flawed metric
6:58 - 7:00

and why it sort of can give you a
7:00 - 7:03

false sense of security, right. A lot of people
7:03 - 7:07

think that they have 100% code coverage, and
that
7:07 - 7:09

means, like, their code is perfect and bug-free.
Or
7:09 - 7:11

if they reach that level, then their code
will
7:11 - 7:14

be perfect and bug-free. But this is not true.
7:14 - 7:17

Right, like, this guy thinks he's covered
and he's
7:17 - 7:18

not.
7:18 - 7:21

And code coverage is actually, like, it's
something that
7:21 - 7:24

was built into Ruby, right. Like, in Ruby
1.9.3,
7:24 - 7:26

this is something that, like, we as a programmer
7:26 - 7:28

community said, like, we want to have. And
I'm
7:28 - 7:31

not against it. Like, I think it's good. But
7:31 - 7:33

I do think it can give you a false
7:33 - 7:34

sense of security, right.
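A minimal sketch of what line (C-zero) coverage measures, using Ruby's standard `coverage` library. The `absolute` method and its deliberate bug are made up for illustration; the point is only that every executable line can be run by a test that never notices the bug.

```ruby
require "coverage"
require "tempfile"

# Hypothetical code under test, written to a temp file because Coverage
# only tracks files that are loaded after Coverage.start.
source = <<~RUBY
  def absolute(n)
    n < 0 ? n : n # bug: the negative case should return -n
  end
RUBY

file = Tempfile.new(["absolute", ".rb"])
file.write(source)
file.close

Coverage.start
load file.path

# The only "test": it passes, and it executes every line of the method.
raise "expected 5" unless absolute(5) == 5

# Per-line execution counts for the temp file (nil = not an executable line).
_path, counts = Coverage.result.find do |path, _|
  File.basename(path) == File.basename(file.path)
end
executable = counts.compact
covered = executable.count { |hits| hits > 0 }
puts "line coverage: #{covered}/#{executable.size}"

# Every executable line was covered, yet the bug is still there.
puts "absolute(-5) = #{absolute(-5)}"
```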
7:34 - 7:38

I thought this was a funny Tweet.
7:38 - 7:43

So, you can have 100% code coverage and still
7:43 - 7:50

have completely bug-ridden code. So, so is
there hope
7:51 - 7:54

for us? Right? Like, how do we, how do
7:54 - 7:57

we test our tests? It's sort of this problem
7:57 - 8:00

of, like, who will watch the watchers, right?
Who
8:00 - 8:02

do we, who can we trust? If we can't
8:02 - 8:04

trust our tests, how, why, why are we even
8:04 - 8:06

writing them?
8:06 - 8:08

And I'm gonna try to make the case that
8:08 - 8:12

mutation testing is the sort of solution to
this
8:12 - 8:18

problem. So just like everything else, like
an editor,
8:18 - 8:22

like an interactive debugger, like a profiler,
like tests,
8:22 - 8:26

mutation testing is a tool. The basic idea
behind
8:26 - 8:28

it is that it takes your tests and it
8:28 - 8:32

runs them against your code, and they should
pass.
8:32 - 8:33

And if they do pass, then what it does,
8:33 - 8:35

is it takes your code and it makes a
8:35 - 8:39

modification to your code. It actually changes
your code
8:39 - 8:43

at runtime. And then it runs your tests again,
8:43 - 8:46

against the modified version of your code.
And the
8:46 - 8:48

idea is that when that code is modified, the
8:48 - 8:52

tests that previously passed should now fail,
right.
8:52 - 8:55

So the thing, your modified code is called
a
8:55 - 8:57

mutant, and the idea is that if that test
8:57 - 9:01

fails, you kill the mutant. Right. The mutant
dies.
9:01 - 9:03

But if that mutant survives, then that means
there's
9:03 - 9:05

something wrong with your tests. There might
not be
9:05 - 9:07

something wrong with your code. But there
is certainly
9:07 - 9:10

something wrong with your tests. Either you
have a
9:10 - 9:12

bug in your tests. You have missing tests.
Your
9:12 - 9:17

tests are either over-specified or under-specified.
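The run, mutate, re-run cycle described above can be sketched by hand. This is not how Mutant itself is implemented (it generates its mutations automatically); here the mutants are written out manually as lambdas, and the age check and its expectations are invented for illustration.

```ruby
# Hypothetical code under test: is this age an adult's?
original = ->(age) { age >= 18 }

# Hand-written mutants, each a small change to the original.
mutants = {
  ">= becomes >"      => ->(age) { age > 18 },
  ">= becomes <"      => ->(age) { age < 18 },
  "body becomes true" => ->(_age) { true }
}

# A tiny test suite: true when every expectation holds for the
# implementation it is given.
tests = ->(adult) { adult.call(30) == true && adult.call(5) == false }

# Step 1: the tests must pass against the unmodified code.
raise "tests fail on the original code" unless tests.call(original)

# Step 2: run the same tests against each mutant. A mutant is killed
# when the tests fail, and survives when they still pass.
killed, survived = mutants.partition { |_name, mutant| !tests.call(mutant) }

puts "killed:   #{killed.map(&:first).inspect}"
puts "survived: #{survived.map(&:first).inspect}"
```

The surviving mutant points at a missing test: nothing in this made-up suite checks the boundary value 18.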
9:17 - 9:19

So this is a technique, it's very helpful
for
9:19 - 9:22

sort of answering the question, what tests
should I
9:22 - 9:25

write? Which I think is a question that many
9:25 - 9:28

of us struggle with. It's certainly something
that beginners
9:28 - 9:30

struggle with when they're starting to program.
Like, how
9:30 - 9:33

do I, how do I write tests? What, what
9:33 - 9:35

do I test? Right.
9:35 - 9:37

And then there's also this question of like,
how
9:37 - 9:39

do I know when I'm done? How do I
9:39 - 9:42

know when the code is sufficiently tested?
And I
9:42 - 9:46

think these are actually hard questions to
ask, or
9:46 - 9:51

hard questions to answer, and mutation, mutation
testing provides
9:51 - 9:54

a, a quantitative answer to those questions.
You can
9:54 - 9:59

say, with confidence, that this code has 100%
mutation
9:59 - 10:02

coverage.
10:02 - 10:09

So, just to sort of give an example, here
10:09 - 10:14

is some code. And an assertion about the code.
10:14 - 10:17

So, I have a method, foo. It takes an
10:17 - 10:21

argument whose default is true. And the actual
method
10:21 - 10:26

body for foo is either return that argument
or
10:26 - 10:33

fail. And my assertion says assert_nothing_raised
if I call
10:33 - 10:38

the method foo without passing in any parameters.
10:38 - 10:40

And so, or without passing in any arguments
to
10:40 - 10:47

the, our parameter, rather. And so what, you
know,
10:48 - 10:52

this test will pass, right. Arg. You call
foo.
10:52 - 10:55

Arg is true. And it sort of short-circuits,
right.
10:55 - 11:00

It sees arg. It sees the or. And this
11:00 - 11:03

test passes. So maybe you think this is a
11:03 - 11:06

good test. Maybe you think you're done writing
your
11:06 - 11:08

tests. But you are not.
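The slide itself is not in the transcript, so the following is a reconstruction from the description, not the speaker's exact code. `assert_nothing_raised` comes from test-unit; to keep the sketch dependency-free it is replaced here by a small hypothetical helper, `nothing_raised?`.

```ruby
# Reconstructed from the description: returns the argument, or fails.
def foo(arg = true)
  arg or fail
end

# Hypothetical stand-in for test-unit's assert_nothing_raised.
def nothing_raised?
  yield
  true
rescue StandardError
  false
end

# The assertion from the talk: calling foo with no arguments raises nothing,
# because arg defaults to true and `or` short-circuits before reaching fail.
passed = nothing_raised? { foo }
puts passed
```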
11:08 - 11:11

And a mutant of that code, a small modification,
11:11 - 11:14

a sort of unit modification of that code might
11:14 - 11:17

look like this. And basically what it did
was
11:17 - 11:19

it just sort of took that or fail and
11:19 - 11:22

removed it. And the idea is like, if you
11:22 - 11:25

do that, at least one of your tests should
11:25 - 11:28

now, that was passing before, should now fail.
One
11:28 - 11:30

of your tests over that code, for that foo
11:30 - 11:34

method, should now fail. And if it does not,
11:34 - 11:38

then you are not testing your code sufficiently.
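Continuing the reconstruction of the slide (again an approximation, with `original_foo` and `mutant_foo` as illustrative names), the mutant drops `or fail`. The existing assertion cannot tell the two versions apart; an extra expectation about a falsy argument can.

```ruby
def original_foo(arg = true)
  arg or fail
end

# Mutant: the `or fail` part has been removed.
def mutant_foo(arg = true)
  arg
end

# Hypothetical helper: does the block raise?
def raises?
  yield
  false
rescue StandardError
  true
end

# Existing test (nothing raised with the default argument): passes for both
# versions, so the mutant survives.
existing_test = ->(impl) { !raises? { impl.call } }
puts existing_test.call(method(:original_foo))
puts existing_test.call(method(:mutant_foo))

# Additional test: a falsy argument must raise. It passes for the original
# and fails for the mutant, so the mutant is killed.
extra_test = ->(impl) { raises? { impl.call(false) } }
puts extra_test.call(method(:original_foo))
puts extra_test.call(method(:mutant_foo))
```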
11:38 - 11:42

So, this is called a statement deletion mutation.
There
11:42 - 11:47

are various other types of mutation. So, for
example,
11:47 - 11:50

there are mutations that would take that default
parameter
11:50 - 11:52

and change it from true to false, or from
11:52 - 11:56

true to nil, right. Which would also cause
failure,
11:56 - 11:59

in this case.
11:59 - 12:01

There's another mutation that will take the
or and
12:01 - 12:05

change it to an and, right. So any time
12:05 - 12:08

there is sort of a unit in your code,
12:08 - 12:10

it takes greater than signs and changes them
to
12:10 - 12:13

less than or equal to signs, et cetera. Right.
12:13 - 12:16

It takes ifs and changes them to unless. It
12:16 - 12:18

will take whole expressions and negate them
and make
12:18 - 12:21

sure that your tests fail when the negation
of
12:21 - 12:25

a statement is, when, when the method returns
the
12:25 - 12:28

negation of the statement instead of the statement,
right.
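A hedged sketch of a few of the other mutation types mentioned, applied by hand to the reconstructed `foo`. The labels are illustrative descriptions, not Mutant's actual operator names or output, and `nothing_raised?` is a hypothetical stand-in for `assert_nothing_raised`.

```ruby
# Hypothetical helper standing in for assert_nothing_raised.
def nothing_raised?
  yield
  true
rescue StandardError
  false
end

# Hand-written mutants of `def foo(arg = true); arg or fail; end`.
mutants = {
  "default true becomes false" => ->(arg = false) { arg or fail },
  "default true becomes nil"   => ->(arg = nil) { arg or fail },
  "or becomes and"             => ->(arg = true) { arg and fail }
}

# The existing test expects no exception when foo is called without
# arguments. A mutant is killed when that expectation no longer holds.
killed = mutants.reject { |_name, mutant| nothing_raised? { mutant.call } }.keys

puts "killed: #{killed.inspect}"
```

Each of these mutants raises when called with no arguments, so the single existing assertion is enough to kill all three.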
12:28 - 12:31

So that's, that's sort of the core idea behind
12:31 - 12:34

mutation testing. And so you end up sort of
12:34 - 12:38

writing these tests to cover all these cases
that,
12:38 - 12:40

and then you sort of know when you're done,
12:40 - 12:43

right. Like, you know when all of your tests,
12:43 - 12:47

when, when your code is fully mutation-covered.
12:47 - 12:52

This is another Tweet. It's one from Katrina
Owen.
12:52 - 12:54

And it's sort of this idea, it's kind of
12:54 - 12:57

like both horrifying and satisfying at the
same time.
12:57 - 12:59

But if you sort of add more granular tests,
12:59 - 13:03

you'll find more bugs. And in many cases,
mutant,
13:03 - 13:05

which is a mutation testing framework, will
find those
13:05 - 13:08

bugs for you. Right. That's cool.
13:08 - 13:10

OK. So I promised there would be live-coding.
This
13:10 - 13:14

is sort of. The introduction is over and now
13:14 - 13:17

we will write some code. Hopefully.
13:17 - 13:24

I'm just gonna switch to mirror display. Command
F1.
13:33 - 13:37

That is a protip. That's great. You're a pro.
13:37 - 13:41

I clearly am not. OK. Cool.
13:41 - 13:44

Cool. And a new version of mutant was, like,
13:44 - 13:49

just released a few minutes ago, in advance
of
13:49 - 13:53

this presentation. I am not the author of
mutant.
13:53 - 13:57

It's a great library by Markus Schirp, and
I
13:57 - 14:00

encourage you all to check it out. Version
zero
14:00 - 14:03

dot five dot eleven, hot off the presses.
14:03 - 14:08

So this is some code. So, like, the, this
14:08 - 14:11

sort of thrust behind this live-coding demo
is I
14:11 - 14:13

will not be live-coding code, I will be live-coding
14:13 - 14:17

tests. Because the idea is not to, like, mutant
14:17 - 14:19

doesn't verify that your code is correct.
It verifies
14:19 - 14:21

that your tests are correct. So you still
need
14:21 - 14:23

to write tests, right. Tests verify that your
code
14:23 - 14:26

is correct. Mutant verifies that your tests
are correct.
14:26 - 14:29

So this is the code. And it's pretty, pretty
14:29 - 14:33

simple. But we'll sort of walk through it
line-by-line.
14:33 - 14:34

Just to make sure everyone has a good understanding
14:34 - 14:39

of it. And so there's this module that represents
14:39 - 14:43

the universe, the entire universe, and inside
of the
14:43 - 14:45

universe we have planets. And that's what
this class
14:45 - 14:49

is all about. It's a pretty simple planet.
It
14:49 - 14:55

takes a radius and an area as parameters when
14:55 - 14:59

it's constructed and stores those in instance
variables. The
14:59 - 15:03

radius is the mean radius of the planet and,
15:03 - 15:06

in kilometers, and the area is sort of surface
15:06 - 15:10

area of the planet in square kilometers.
15:10 - 15:14

And then there's one sort of interesting method,
one
15:14 - 15:21

public method, spherical. And spherical will
return true if,
15:23 - 15:26

if the planet is a perfect sphere, or within
15:26 - 15:29

a particular tolerance of that. So the idea
is
15:29 - 15:33

we calculate the approximate area using four
pi r
15:33 - 15:37

squared, which is the formula to calculate
the area
15:37 - 15:44

of a sphere, and if the area sort of
15:44 - 15:47

matches that, then we know it's a sphere.
We
15:47 - 15:50

know it's spherical. This method returns true.
15:50 - 15:53

And if, if that's not true, then the planet
15:53 - 15:57

is not spherical. It's either oblate, like
the earth,
15:57 - 16:02

or prolate, and then this method will return
false.
16:02 - 16:04

So, yeah. We just sort of calculate the approximate
16:04 - 16:07

area and then we have this range private
method
16:07 - 16:10

that just generates a range. We need sort of
a
16:10 - 16:12

tolerance. The idea is you don't want it to
16:12 - 16:17

be too precise, because we're dealing with
pi, so
16:17 - 16:23

pi is, I mean, in actuality, it's a non-terminating
16:23 - 16:27

number. In Ruby, it has, like, ten digits
of
16:27 - 16:29

precision or something like that, right. Like
the constant
16:29 - 16:30

Math::PI.
16:30 - 16:33

But the idea is that, like, if it's close
16:33 - 16:36

enough to a sphere, within a particular tolerance,
then
16:36 - 16:40

we'll just call it round, basically. And so
we
16:40 - 16:44

generate this range, which is sort of the
approximate
16:44 - 16:47

area that we've calculated, based on the radius
plus
16:47 - 16:49

or minus the tolerance, and we see if the
16:49 - 16:53

area falls within those bounds. Does everyone
understand this
16:53 - 16:56

code? I think it is pretty simple. I tried
16:56 - 16:58

to make it fit on one screen. On one
16:58 - 16:59

slide.
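Since the slide is not part of the transcript, here is a rough reconstruction of the class from the walkthrough. The names `TOLERANCE`, `approximate_area`, and `range`, the default tolerance value, the `?` suffix on `spherical?`, and the exact signatures are all assumptions; only the overall shape (a radius and an area, one public predicate, a private range helper, and the formula 4πr²) comes from the talk.

```ruby
module Universe
  class Planet
    # Assumed default tolerance in square kilometers; the talk does not
    # state the actual value.
    TOLERANCE = 1.0

    # radius: mean radius in kilometers
    # area:   surface area in square kilometers
    def initialize(radius, area)
      @radius = radius
      @area = area
    end

    # True when the stored area is within the tolerance of the area a
    # perfect sphere of this radius would have.
    def spherical?(tolerance = TOLERANCE)
      range(approximate_area, tolerance).include?(@area)
    end

    private

    # Surface area of a sphere: 4 * pi * r^2.
    def approximate_area
      4 * Math::PI * @radius**2
    end

    # Inclusive range of acceptable areas around the computed one.
    def range(center, tolerance)
      (center - tolerance)..(center + tolerance)
    end
  end
end

unit_sphere = Universe::Planet.new(1.0, 4 * Math::PI)
flattened = Universe::Planet.new(1.0, 4 * Math::PI + 10)
puts unit_sphere.spherical?
puts flattened.spherical?
```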
16:59 - 17:01

Yeah?
17:01 - 17:05

OK. So if everyone understands it, I want
to
17:05 - 17:07

take a little bit of a poll. This is
17:07 - 17:09

kind of like the interactive part of the talk.
17:09 - 17:11

And you have to, like, everyone has to participate.
17:11 - 17:15

That's the, that's the goal. Everyone, people
like to
17:15 - 17:17

sort of sit by the sidelines and not commit,
17:17 - 17:19

but you have to commit. I'll be really angry
17:19 - 17:23

if you don't.
17:23 - 17:26

You don't want to see me angry.
17:26 - 17:29

So how many tests do you think you need
17:29 - 17:35

to fully cover this code? To cover the public
17:35 - 17:39

method, the, the spherical method, right,
so that it's
17:39 - 17:42

sort of fully exercised. Who thinks you need
zero
17:42 - 17:49

tests? Show of hands? Anybody? No. Good. I
agree.
17:49 - 17:52

You can't cover code without tests. So, that's
good.
17:52 - 17:54

You've been paying some attention.
17:54 - 17:57

Who thinks you can do it with one test?
17:57 - 18:00

Maybe, sort of, the happy path? Right. You
write
18:00 - 18:04

a test that says, you know, you expect some
18:04 - 18:09

planet to be spherical given radius and an
area,
18:09 - 18:16

and it is. All good. Who thinks that's sufficient?
18:18 - 18:20

Nobody. So.
18:20 - 18:24

You can actually get C-zero, 100% C-zero code
coverage
18:24 - 18:27

of this entire class with one test. With one
18:27 - 18:31

spec. Right. You won't have 100% mutation
coverage, but
18:31 - 18:33

I will show you, in a minute, you can
18:33 - 18:36

have 100% C-zero code coverage, despite the
fact that
18:36 - 18:39

nobody in this room thinks that that is sufficient
18:39 - 18:42

to cover this code. So. I will prove it
18:42 - 18:44

to you. But you all intuitively know this
to
18:44 - 18:47

be the case. And yet we all idolize this
18:47 - 18:51

C-zero code coverage metric as if it means
something,
18:51 - 18:54

when really it, it's a false sense of security,
18:54 - 18:56

right. You're the guy with the umbrella in
the
18:56 - 18:59

hurricane, and the umbrella is like destroyed
and inside
18:59 - 19:01

out.
19:01 - 19:05

OK. So how many people think you can do
19:05 - 19:10

it with two tests? OK. Somebody who's raising
your
19:10 - 19:12

hand. This gentleman in the front. What are
the
19:12 - 19:14

two tests that you would write? Just sort
of
19:14 - 19:18

roughly? Maybe the happy path and what other?
19:18 - 19:20

AUDIENCE: One that's spherical and one not.
19:20 - 19:22

E.M.: One that's spherical and one that's
not. OK.
19:22 - 19:24

I think that's good. How many people think
you
19:24 - 19:28

would need three to do it? K, maybe gentleman
19:28 - 19:30

there who thinks we need three. What's the
third
19:30 - 19:33

you would write?
19:33 - 19:37

AUDIENCE: [indecipherable - 00:19:41]
19:37 - 19:43

E.M.: Say it again? A value for tolerance?
19:43 - 19:46

AUDIENCE: A value that will blow up the computation.
19:46 - 19:47

E.M.: That will blow up the computation. How
would
19:47 - 19:49

you blow up the computation?
19:49 - 19:54

AUDIENCE: [indecipherable - 00:19:55]
19:54 - 19:56

E.M.: Passing in a string.
19:56 - 19:58

AUDIENCE: Yes.
19:58 - 20:02

E.M.: OK. Great. And what would you expect
the
20:02 - 20:04

result to be, like, what would your expectation,
what
20:04 - 20:06

would you assert? Like, I pass in a string
20:06 - 20:09

and I expect.
20:09 - 20:12

AUDIENCE: An exception to be raised.
20:12 - 20:13

E.M.: An exception. OK. And if you didn't
get
20:13 - 20:16

an exception then that would be a problem.
20:16 - 20:18

AUDIENCE: Yes.
20:18 - 20:24

E.M.: OK. OK. Who thinks four will do it?
20:24 - 20:26

Nobody thinks four will do it. A few people
20:26 - 20:34

do. Yeah. What additional tests would you
add?
20:34 - 20:40

AUDIENCE: Well, you're testing a range, so
you have-
20:40 - 20:40

E.M.: Hmm. Great.
20:40 - 20:40

AUDIENCE: -so there's two sides.
20:40 - 20:41

E.M.: I really like this. So, the comment
was
20:41 - 20:43

that you're testing a range, and there's sort
of
20:43 - 20:45

two sides. There's the, I'm on the low-end
of
20:45 - 20:47

the range and I am included, and I am
20:47 - 20:50

on the high-end of the range. So it would
20:50 - 20:53

be, there's two of those. Right. One for the
20:53 - 20:54

low-end and one for the high-end. Exactly.
20:54 - 20:57

So, it's sort of the happy path. The thing
20:57 - 21:00

is spherical. The sad path, the thing is not
21:00 - 21:05

spherical. And both sides of the range. I
like
21:05 - 21:08

that. Good. How many people think five? Five
or
21:08 - 21:11

more? How's that? Five or more. OK. Lots of
21:11 - 21:13

hands for five or more.
21:13 - 21:20

So, according to mutant, which is also software,
therefore
21:20 - 21:23

imperfect, you can, you can test this with
four,
21:23 - 21:26

and it will not handle things like you should,
21:26 - 21:29

like, it sort of assumes that the radius and
21:29 - 21:34

area are valid, right. Like, you can, although,
actually,
21:34 - 21:36

maybe that's. Well, we can try it. It's a
21:36 - 21:38

live coding thing. So let's just do it and
21:38 - 21:41

see what happens. But thank you for participating
in
21:41 - 21:44

that. I think it was an interesting exercise.
21:44 - 21:47

But, yeah. Basically, like, mutant says the
answer to
21:47 - 21:50

this question is four, right. It's basically
the happy
21:50 - 21:52

path, the sad path, and both sides of the
21:52 - 21:58

range. So yeah. Let's, let's sort of show
how
21:58 - 22:01

that works.
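One way to read "the happy path, the sad path, and both sides of the range" is as the four checks below. They are written as plain Ruby rather than RSpec so the sketch runs without any gems, and they target a simplified stand-in for the planet class (the `TOLERANCE` value and the method names are assumptions, not the speaker's code).

```ruby
module Universe
  class Planet
    TOLERANCE = 1.0 # assumed value

    def initialize(radius, area)
      @radius = radius
      @area = area
    end

    def spherical?
      center = 4 * Math::PI * @radius**2
      ((center - TOLERANCE)..(center + TOLERANCE)).include?(@area)
    end
  end
end

radius = 10.0
center = 4 * Math::PI * radius**2
tolerance = Universe::Planet::TOLERANCE

# Each entry is true when the corresponding expectation holds.
checks = {
  "happy path: exact sphere" =>
    Universe::Planet.new(radius, center).spherical? == true,
  "sad path: far outside the range" =>
    Universe::Planet.new(radius, center + 100).spherical? == false,
  "low end of the range is included" =>
    Universe::Planet.new(radius, center - tolerance).spherical? == true,
  "high end of the range is included" =>
    Universe::Planet.new(radius, center + tolerance).spherical? == true
}

checks.each { |name, passed| puts "#{passed ? 'ok  ' : 'FAIL'} #{name}" }
```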
22:01 - 22:08

OK. So I'm gonna start by just making a
22:08 - 22:10

gemfile, as you do. So let me, I can
22:10 - 22:16

just sort of show. It's a very simple layout
22:16 - 22:19

so far. I have a lib directory, which contains
22:19 - 22:23

universe dot rb, which you've all seen. And
a
22:23 - 22:26

spec directory which is empty. So, very little
up
22:26 - 22:28

my sleeve at this point.
22:28 - 22:35

I'm just gonna make a gemfile, as you do.
22:41 - 22:43

And at this point I'm just gonna add rspec,
22:43 - 22:46

cause I'm starting to write some tests, and
I'm
22:46 - 22:48

gonna add mutant.
22:48 - 22:55

OK. So, and we'll bundle install. Ah. Cool.
It
22:59 - 23:03

just installed that new version of mutant
that was
23:03 - 23:06

just released moments ago. Good. Let me just
see
23:06 - 23:12

what Ruby version I'm on. OK. That should
be
23:12 - 23:13

fine.
23:13 - 23:15

So. Let's write some specs. So we have the
23:15 - 23:21

spec directory. Let's write planet_spec dot
rb. And we'll
23:21 - 23:26

require rspec and we'll require our planet
file. I'll
23:26 - 23:29

just use require_relative for that, rather
than messing with
23:29 - 23:31

the load path or anything. So that's up a
23:31 - 23:36

directory in lib and I think it's called universe.
23:36 - 23:41

And now let's start writing our specs, right.
So
23:41 - 23:48

we're just gonna describe our planet in our
universe
23:48 - 23:55

module. And. So let's create a subject, which
is
23:59 - 24:02

just gonna be our planet. That's like the
main
24:02 - 24:05

thing that we're gonna be testing here. And
it's
24:05 - 24:08

initialized with a radius and an area. I believe
24:08 - 24:12

in that order. Yup.
24:12 - 24:19

Cool. So let's create a context. And let's
do
24:20 - 24:22

the happy path first, because that was kind
of,
24:22 - 24:24

like, we all agreed that the first path we
24:24 - 24:28

should write was the happy path. So in this
24:28 - 24:32

case, Venus is actually the happy path. Venus
is
24:32 - 24:37

pretty darn close to spherical. So in this
case
24:37 - 24:44

we'll define the radius to be. Oops. Cool.
25:01 - 25:05

And I think I said it's in meters, yeah?
25:05 - 25:11

So it'll be that. And then the area will
25:11 - 25:18

be. Eh, let's see. Wikipedia. OK. So the surface
25:27 - 25:34

area is, what is that? Four-hundred sixty
million? Which
25:34 - 25:37

is OK. But actually, like, I would like a
25:37 - 25:40

more precise number, because, like, I don't
want to
25:40 - 25:42

crank up our tolerance to some ridiculous
value to
25:42 - 25:45

make this true. So I actually found a more
25:45 - 25:48

precise number than the one that's on Wikipedia,
which
25:48 - 25:52

is this. So it's four-hundred sixty million,
two hundred
25:52 - 25:55

sixty-four thousand, seven-hundred forty.
Which is, you know, pretty
25:55 - 25:58

round number still, but it's more precise
than the
25:58 - 26:01

one on Wikipedia.
26:01 - 26:03

And now we'll have our assertion. So we'll
just
26:03 - 26:10

say it's spherical. Venus is spherical. We
expect our
26:10 - 26:17

subject to be spherical. Good? Is everyone
satisfied? Do
26:19 - 26:21

I, like, if people see bugs, call them out.
26:21 - 26:23

Like, does this look like a good happy path
26:23 - 26:28

test? Yes? This will pass?
26:28 - 26:35

Good. Let's run it. Yup. That should work.
Cool.
26:40 - 26:43

It passed. Hooray.
26:43 - 26:45

Let's do something else. Let's open up our
gemfile
26:45 - 26:49

again and add simplecov to measure the C-zero
code
26:49 - 26:56

coverage. And I guess here we can just say
26:57 - 27:03

require simplecov. SimpleCov.start. And so
now, if we run
27:03 - 27:10

our specs again, we'll get a little coverage
report.
27:11 - 27:17

Tada!
27:17 - 27:19

So for those who aren't that familiar with
simplecov,
27:19 - 27:24

basically it looks to make sure that your,
that
27:24 - 27:26

every line of code is executed, and if you
27:26 - 27:30

test the happy path, it totally is, right?
The
27:30 - 27:34

class, the module is loaded, the class is
loaded,
27:34 - 27:41

this constant is set. We initialize. We initialize
a
27:41 - 27:46

planet. I can turn on lines. We initialize
a
27:46 - 27:51

planet on line nine. We invoke this spherical
method
27:51 - 27:56

on line fifteen, in the assertion. And that
invokes
27:56 - 27:59

the range method. So we have, you can actually
27:59 - 28:02

see every line of code is executed precisely
one
28:02 - 28:04

time.
28:04 - 28:07

So we have, we're not over-testing. We're
not under-testing.
28:07 - 28:11

We have perfect, a hundred percent C-zero
code coverage.
28:11 - 28:14

But we all agreed that this was completely
insufficient.
28:14 - 28:15

So-
28:15 - 28:16

AUDIENCE: Ship it.
28:16 - 28:19

E.M.: Ship it. K. Right.
28:19 - 28:21

AUDIENCE: Force push.
28:21 - 28:27

E.M.: I'm gonna delete this simplecov stuff
cause it's
28:27 - 28:29

garbage.
28:29 - 28:34

OK. So let's write some more tests. So a
28:34 - 28:41

planet that's not spherical is. No. That's
my name.
28:41 - 28:48

Thank you. Is our home. The earth. Radius
of
28:53 - 29:00

the earth. Cool. I guess we could say point
29:06 - 29:13

one. Doesn't really matter. And. Oops. What's
the area?
29:22 - 29:29

Cool. So in square kilometers, it's five-hundred
ten. Five-hundred
29:32 - 29:37

ten million, rather.
29:37 - 29:38

So we, again, we could, like, try to find
29:38 - 29:40

a number that's more precise, but we actually,
like,
29:40 - 29:43

the whole point of this test is to test
29:43 - 29:45

a planet that is an oblate spheroid, not an
29:45 - 29:48

actual sphere. And so in this case, we want
29:48 - 29:51

to, so like, it's fine that the numbers are
29:51 - 29:56

not within the default tolerance. And so,
yeah. Basically
29:56 - 30:00

we want to say, like, it is oblate. Not
30:00 - 30:06

spherical.
30:06 - 30:11

So in this case, we expect our subject not
30:11 - 30:18

to be spherical. Cool. Look good? Let's run
it.
30:22 - 30:28

Cool. Our tests pass.
30:28 - 30:32

So this is, like, maybe your normal workflow.
You
30:32 - 30:33

would do this. A few of you would stop
30:33 - 30:35

at this point. I think there were probably
as
30:35 - 30:37

many hands for, like, I would stop at two,
30:37 - 30:39

or probably more hands, for like, I would
stop
30:39 - 30:42

at two, than I would stop at four or
30:42 - 30:45

three. But let me show, let me show what
30:45 - 30:46

mutant does.
30:46 - 30:48

Let me show sort of how this mutation testing
30:48 - 30:54

stuff works. So you're gonna say bundle exec.
Or,
30:54 - 30:58

I have it aliased to b-e. I can spell
30:58 - 31:02

that out. So this is the mutant command line,
31:02 - 31:05

and it takes a bunch of arguments. So you
31:05 - 31:07

have to give it a lib for the sort
31:07 - 31:09

of lib directory that you're testing so that
it
31:09 - 31:12

knows to add that to the load path. And
31:12 - 31:16

then you give it a require. So it's gonna
31:16 - 31:20

require some specific library, in this case
the universe
31:20 - 31:24

library that you wrote. And then you can say,
31:24 - 31:26

like, I want to test everything in universe,
or
31:26 - 31:30

you can say, like, with wild cards like colon
31:30 - 31:32

colon universe star. I can make that a little
31:32 - 31:34

smaller so it fits on one line.
31:34 - 31:37

Or you can say, like, I want to test
31:37 - 31:41

specifically the planet class, or you say,
like, I
31:41 - 31:42

want to test a particular method. So you can
31:42 - 31:45

say, like, I want to test spherical. Something
like
31:45 - 31:47

that. Right. But we want to test the whole
31:47 - 31:48

planet class.
31:48 - 31:51

Oh, and you also, there's an option to say
31:51 - 31:54

use rspec, so it knows what test framework
to
31:54 - 31:59

run. This is important, because it's testing
your tests.
31:59 - 32:01

And I am getting some sort of an error.
32:01 - 32:05

Ah. I am missing mutant-rspec in my gemfile.
That
32:05 - 32:10

is easy to fix. Right. So.
32:10 - 32:12

Rspec used to be built in. This has changed
32:12 - 32:16

recently. So basically there are other libraries.
There's like a
32:16 - 32:18

plugin library. So if you want to write, if
32:18 - 32:21

you use some crazy test-framework, you can
just write
32:21 - 32:23

a gem that adds mutant support for that test
32:23 - 32:26

framework. So this happens to be the one for
32:26 - 32:29

rspec. But you can use one for test-unit or
32:29 - 32:30

anything else.
32:30 - 32:33

So, b-i is just a shortcut for bundle install.
32:33 - 32:38

And we'll do this. Cool.
32:38 - 32:40

So what it is doing, you're like, what, this
32:40 - 32:43

is crazy. We only wrote two tests. Why are
32:43 - 32:45

there all those little green dots and Fs flying
32:45 - 32:52

by? So basically what's happening is we, it's
taking
32:54 - 32:59

our two tests and it's running through these
various
32:59 - 33:01

mutations. In this case, it made eighty-three
mutations to
33:01 - 33:05

our code, based on what we used, right. Like,
33:05 - 33:08

so, depending on, like, if you use an and,
33:08 - 33:09

it will convert it to an or. But if
33:09 - 33:12

you don't use that, it can't do that
33:12 - 33:12

mutation.
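
A minimal sketch of the and-to-or mutation just described, written out by hand. The module and method names (OperatorDemo, habitable?) are hypothetical and only for illustration; mutant generates variants like this automatically and runs your real test suite against each one.

```ruby
# Hand-written illustration of one mutation operator (and -> or).
# OperatorDemo and its method names are hypothetical, not part of mutant.
module OperatorDemo
  module_function

  # Original code under test.
  def habitable?(has_water, has_air)
    has_water && has_air
  end

  # The same method after the and -> or mutation.
  def habitable_mutant?(has_water, has_air)
    has_water || has_air
  end

  # A test that only exercises the case where both inputs are true.
  def weak_test(impl)
    impl.call(true, true) == true
  end

  # A test that also checks a case where the two operators disagree.
  def strong_test(impl)
    impl.call(true, true) == true && impl.call(true, false) == false
  end
end

original = OperatorDemo.method(:habitable?)
mutant   = OperatorDemo.method(:habitable_mutant?)

puts OperatorDemo.weak_test(original)   # passes on the original
puts OperatorDemo.weak_test(mutant)     # also passes: the mutant survives
puts OperatorDemo.strong_test(original) # passes on the original
puts OperatorDemo.strong_test(mutant)   # fails: the mutant is killed
```

A mutant counts as killed when at least one test fails against it. The weak test cannot tell the two versions apart, so the mutant survives it.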
33:12 - 33:17

So, in this case, there was eighty-three mutations.
Eighty-three
33:17 - 33:19

sort of mutants. And eighty-two of those mutants
were
33:19 - 33:24

killed. So there, in this case, was one that
33:24 - 33:27

was not. And you get this really cool output,
33:27 - 33:31

diff output. So it basically says, this is
the
33:31 - 33:34

mutation we did that was not killed. We took,
33:34 - 33:40

what is it, line twenty-four? Was that? Is
there
33:40 - 33:45

a comment? We took line twenty-five, right,
this range
33:45 - 33:50

method, and we deleted the code that you wrote
33:50 - 33:52

and we mutated it in this way. We got
33:52 - 33:54

rid of that minus T. And it turned out
33:54 - 33:57

that even after we made that mutation, all
of
33:57 - 34:00

your tests still passed.
34:00 - 34:03

Actually, maybe it would be helpful, like,
I can
34:03 - 34:10

show with earth. So before we do earth, this
34:10 - 34:12

is what the mutation output would look like.
Right.
34:12 - 34:13

So. I just want to give you a sense
34:13 - 34:16

of, like, all the different mutations and
kind of
34:16 - 34:18

how they work and what the output looks like.
34:18 - 34:20

So if we don't have the sort of unhappy
34:20 - 34:23

path where it returns false, these are the
various
34:23 - 34:26

mutations it runs. So there was this one,
which
34:26 - 34:27

we saw earlier, where it removes the minus
T
34:27 - 34:30

from the range and it still passes because
we're
34:30 - 34:33

sort of in the top half of that range.
34:33 - 34:36

There's this other one where it gets rid of
34:36 - 34:38

the n, so the beginning part of the range,
34:38 - 34:40

and it just puts in t there.
34:40 - 34:44

Here, it actually gets rid of that call to
34:44 - 34:48

dot cover, and it turns out that, because
the
34:48 - 34:50

range returns true and you haven't put in
a
34:50 - 34:53

thing that says it should return false, that
this
34:53 - 34:56

also passes, right. So, in this case, you're
just
34:56 - 34:59

returning the range. But that is truthy. And
so
34:59 - 35:03

this, this test still passes.
35:03 - 35:05

If you wanted to write a more precise test,
35:05 - 35:08

instead of saying. No, I guess that's right.
So,
35:08 - 35:12

in this case it's just gonna check whether
that
35:12 - 35:14

method is truthy or falsey, and in this case
35:14 - 35:16

it's truthy if it just returns the range.
Right?
35:16 - 35:18

And you're not testing that it would ever
be
35:18 - 35:20

falsey.
35:20 - 35:24

Also, if you just return the instance variable
area,
35:24 - 35:27

so if you basically throw away everything
except that
35:27 - 35:31

last argument to the cover method, this turns
out
35:31 - 35:35

to also, like, you have no test that covers
35:35 - 35:39

this. And actually you can delete that whole
line,
35:39 - 35:42

and the previous line, approximate area, like
you get
35:42 - 35:44

the same result. Like, the fact that you have
35:44 - 35:46

an approximate area and that is truthy and
you
35:46 - 35:48

are only testing that this method returns
a truthy
35:48 - 35:52

value means that this test will pass.
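
A rough sketch of why these mutants survive a happy-path-only test. The demo code is not in the transcript, so everything here is an assumption: the shape of the method (a range around the ideal area, checked with cover?), the names, and the numbers are reconstructed from the description above, and the mutants are written out by hand instead of being generated by mutant.

```ruby
# Hypothetical reconstruction; names and numbers are assumptions,
# not the code shown on screen during the talk.
module SurvivorDemo
  module_function

  # Surface area of a perfect sphere with the given radius.
  def ideal_area(radius)
    4 * Math::PI * radius**2
  end

  # Assumed shape of the method under test.
  ORIGINAL = lambda do |radius, area, tolerance|
    n = SurvivorDemo.ideal_area(radius)
    (n - tolerance..n + tolerance).cover?(area)
  end

  # Three of the mutations described above, written out by hand.
  MUTANTS = {
    # Removes "- tolerance" from the start of the range.
    drop_minus_t: lambda do |radius, area, tolerance|
      n = SurvivorDemo.ideal_area(radius)
      (n..n + tolerance).cover?(area)
    end,
    # Removes the call to cover? and returns the range itself.
    drop_cover_call: lambda do |radius, _area, tolerance|
      n = SurvivorDemo.ideal_area(radius)
      (n - tolerance..n + tolerance)
    end,
    # Throws away everything except the area.
    return_area: ->(_radius, area, _tolerance) { area }
  }.freeze

  # Happy path: area slightly above ideal, only truthiness is checked.
  HAPPY_PATH = lambda do |impl|
    impl.call(1.0, SurvivorDemo.ideal_area(1.0) + 0.5, 1.0) ? true : false
  end

  # Unhappy path: area far outside the tolerance must not be spherical.
  UNHAPPY_PATH = lambda do |impl|
    !impl.call(1.0, SurvivorDemo.ideal_area(1.0) + 5.0, 1.0)
  end

  # Names of the mutants that pass every given test.
  def survivors(tests)
    MUTANTS.select { |_name, impl| tests.all? { |test| test.call(impl) } }.keys
  end
end

p SurvivorDemo.survivors([SurvivorDemo::HAPPY_PATH])
# => [:drop_minus_t, :drop_cover_call, :return_area]
p SurvivorDemo.survivors([SurvivorDemo::HAPPY_PATH, SurvivorDemo::UNHAPPY_PATH])
# => [:drop_minus_t]
```

With only the happy path, anything truthy passes, so all three mutants live. Adding the unhappy path kills the two that always return a truthy value, and leaves the range mutant.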
35:52 - 35:59

So I just wanted to show that. I can
35:59 - 36:02

bring this back. Cool.
36:02 - 36:05

So now we're in a place where, oops. OK.
36:05 - 36:12

So our tests will pass. And we have one
36:12 - 36:16

mutant that we need to kill. So does anyone
36:16 - 36:19

have an idea for how to kill this mutant?
36:19 - 36:26

AUDIENCE: Pass in a tolerance. [indecipherable
- 00:36:27] Pass
36:27 - 36:29

in zero tolerance.
36:29 - 36:33

E.M.: So the suggestion was to pass in a
36:33 - 36:36

zero tolerance. So let's try that. So should
I
36:36 - 36:40

just, should we make up a planet or, how
36:40 - 36:41

do you want to do that? We could do
36:41 - 36:42

Mars, maybe?
36:42 - 36:44

AUDIENCE: Venus shouldn't be spherical with
a tolerance of
36:44 - 36:46

E.M.: Ah. Venus shouldn't be spherical with
a tolerance
36:46 - 36:51

of zero. So that's true. So we can sort
36:51 - 36:54

of change this one to be, it is spherical,
36:54 - 36:55

given the default tolerance.
36:55 - 36:57

AUDIENCE: Yes.
36:57 - 37:01

E.M.: That's what that tests. Right. It's
spherical-ish. I
37:01 - 37:04

like that. Ish.
37:04 - 37:11

But is not perfectly spherical. And so here
we
37:16 - 37:18

would expect this not to be spherical, given
a
37:18 - 37:22

tolerance of zero. Yeah? So let's first run
that
37:22 - 37:29

test. Cool. So that passes. It is not perfectly
37:30 - 37:35

spherical, and it is spherical-ish. We didn't
break that
37:35 - 37:38

test. OK, so now let's do the same thing
37:38 - 37:45

with our mutant command.
37:45 - 37:51

So the mutant still lives. Why?
37:51 - 37:53

So to make this fail, what we need to
37:53 - 37:56

do is we need to pass in a tolerance
37:56 - 37:59

that falls in the bottom half of the range.
37:59 - 38:03

So, in this case, Venus is slightly, the area
38:03 - 38:09

of Venus is slightly above the perfect sphericism
or
38:09 - 38:14

whatever, right. It's not, it's on the high-end
of
38:14 - 38:16

the range. So what we need to do is
38:16 - 38:20

we need to find a planet that is actually
38:20 - 38:22

on the low-end of the range, right, where
it's
38:22 - 38:29

less. It's spherical, but within the tolerance,
but it's,
38:29 - 38:34

yeah. On the low-end of the range. Make sense?
38:34 - 38:40

So yeah. I don't know. Like, what we could
38:40 - 38:43

do to test, like, we could, I, I don't
38:43 - 38:45

want to necessarily like look up more planets
and
38:45 - 38:49

their radiuses. But we could do something
like this.
38:49 - 38:53

So this is, sorry, that's not earth. This
is,
38:53 - 38:54

like.
38:54 - 38:57

AUDIENCE: Rubinius 5.
38:57 - 39:00

E.M.: Rubinius 5. I like that. Thank you for
39:00 - 39:05

the suggestion from the audience. And Rubinius
5. Let's
39:05 - 39:07

sort of make it easy for ourselves. So we'll
39:07 - 39:14

say the radius is zero point five, right.
So
39:14 - 39:17

if we put that in our formula, zero point
39:17 - 39:21

five squared is a quarter, and then a quarter,
39:21 - 39:26

when it sort of cancels out the multiply by
39:26 - 39:29

four. You div, you're dividing by four basically.
So
39:29 - 39:33

the, we know that the actual area should be
39:33 - 39:36

pi. So then we can just say something like,
39:36 - 39:44

let the area be Math::PI. And we want it
39:45 - 39:48

to fall, we want the area to be below
39:48 - 39:49

the range. Right, so we want it to be
39:49 - 39:52

like, Math::PI minus, like, some amount that
falls within
39:52 - 39:57

the tolerance or whatever. Right? Make sense?
39:57 - 40:02

And then we expect that this is gonna be
40:02 - 40:09

spherical. Ish. Within the default tolerance.
Cool. OK. So
40:20 - 40:26

let's run that. Specs pass. And have we killed
40:26 - 40:33

the last mutant? Nice. Yeah.
40:33 - 40:37

Yeah! So.
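
A small sketch of why the Rubinius 5 test kills the last mutant while the zero-tolerance test did not. As before, this is a hand-written approximation: the method shape, the names, and the default tolerance of 0.1 are assumptions, not the code from the demo.

```ruby
# Hypothetical reconstruction; the tolerance value and names are assumptions.
module RangeMutantDemo
  module_function

  TOLERANCE = 0.1 # assumed default tolerance

  # Surface area of a perfect sphere: 4 * pi * r^2.
  def ideal_area(radius)
    4 * Math::PI * radius**2
  end

  # Assumed original: area must lie within +/- tolerance of the ideal.
  def spherical?(radius, area, tolerance = TOLERANCE)
    n = ideal_area(radius)
    (n - tolerance..n + tolerance).cover?(area)
  end

  # The surviving mutant: "- tolerance" removed from the range start.
  def spherical_mutant?(radius, area, tolerance = TOLERANCE)
    n = ideal_area(radius)
    (n..n + tolerance).cover?(area)
  end
end

# With radius 0.5, r^2 is a quarter, which cancels the factor of four.
p RangeMutantDemo.ideal_area(0.5) == Math::PI # => true

# Audience suggestion: area above the ideal, tolerance of zero.
high_area = RangeMutantDemo.ideal_area(1.0) + 0.05
p RangeMutantDemo.spherical?(1.0, high_area, 0.0)        # => false
p RangeMutantDemo.spherical_mutant?(1.0, high_area, 0.0) # => false, same answer

# Rubinius 5: area below the ideal but within the default tolerance.
low_area = Math::PI - 0.05
p RangeMutantDemo.spherical?(0.5, low_area)        # => true
p RangeMutantDemo.spherical_mutant?(0.5, low_area) # => false, mutant killed
```

The zero-tolerance case sits above the ideal area, where the original and the mutant agree. Only an area in the bottom half of the range makes them disagree, and that disagreement is what the new test detects.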

Title:: RailsConf 2014 - Mutation Testing with Mutant by Erik Michaels-Ober
Duration:: 41:02
