-
AUSTIN PUTMAN: This is the last session before
happy hour.
-
I appreciate all of you for hanging around
-
this long. Maybe you're here because,
-
I don't know, there's a bar on the
-
first floor of this hotel. I think that
-
is where the main track is currently taking
place.
-
I am Austin Putman. I am the VP of
-
engineering for Omada Health. At Omada, we
help people
-
at risk of chronic disease, like diabetes,
make crucial
-
behavior changes and have longer, healthier
lives. So, it's
-
pretty awesome.
-
I'm gonna start with some spoilers, because
I want
-
you to have an amazing RailsConf. So if this
-
is not what you're looking for, don't be shy
-
about finding that bar track. We're gonna
spend some
-
quality time with Capybara and Cucumber, whose
flakiness is
-
legendary, for very good reasons.
-
Let me take your temperature. Can I see hands?
-
How many people have had problems with random
failures
-
in Cucumber or Capybara? Yeah. Yeah. This
is reality,
-
folks.
-
We're also gonna cover the ways that Rspec
does
-
and does not help us track down test pollution.
-
How many folks out there have had a random
-
failure problem in the Rspec suite, like in
your
-
models or your controller tests? OK, still
a lot
-
of people, right. It happens. But we don't
talk
-
about it.
-
So in between, we're gonna review some problems
that
-
can dog any test suite. This is like, random
-
data, time zone heck, external dependencies.
All this leads
-
to pain. There was a great talk before about
-
external dependencies.
-
Just, here's just a random one. How many people
-
here have had a test fail due to a
-
daylight saving time issue? Yeah. Ben Franklin,
you are
-
a menace.
-
Let's talk about eliminating inconsistent
failures in your tests,
-
and on our team, we call that fighting randos.
-
And I'm here to talk about this, because I
-
was stupid and short-sighted, and random failures
caused us
-
a lot of pain. I chose to try to
-
hit deadlines instead of focusing on build
quality, and
-
our team paid a terrible price.
-
Anybody out there paying that price? Anybody
out there
-
feel me on this? Yeah. It's, it sucks.
-
So let's do some science. Some projects seem
to
-
have more random failure problems than others.
I want
-
to gather some data. So first, if you write
-
tests on a regular basis, raise your hand.
Right?
-
Wow. I love RailsConf. Keep your hand up if
-
you believe you have experienced a random
test failure.
-
The whole room.
-
Now, if you think you're likely to have one
-
in the next, like, four weeks. Who's out there?
-
It's still happening, right. You're in the
middle of
-
it. OK, so this is not hypothetical for this
-
audience. This is a widespread problem. But
I don't
-
see a lot of people talking about it.
-
And the truth is, while being a great tool,
-
a comprehensive integration suite is like
a breeding ground
-
for baffling Heisenbugs.
-
So, to understand how test failures become
a chronic
-
productivity blocker, I want to talk a little
bit
-
about testing culture, right. Why is this
even bad?
-
So, we have an automated CI machine that runs
-
our full test suite every time a commit is
-
pushed. And every time the build passes, we
push
-
the new code to a staging environment for
acceptance.
-
Right, that's our process. How many people
out there
-
have a setup that's kind of like that? OK.
-
Awesome. So a lot of people know what I'm
-
talking about.
-
So, in the fall of 2012, we started seeing
-
occasional, unreproducible failures of the
test suite in Jenkins.
-
And we were pushing to get features out the
-
door for January first. And we found that
we
-
could just rerun the build and the failure
would
-
go away.
-
And we got pretty good at spotting the two
-
or three tests where this happened. So, we
would
-
check the output of a failed build, and if
-
it was one of the suspect tests, we would
-
just run the build again. Not a problem. Staging
-
would deploy. We would continue our march
towards the
-
launch.
-
But by the time spring rolled around, there
were
-
like seven or eight places causing problems
regularly. And
-
we would try to fix them, you know, we
-
wouldn't ignore them. But the failures were
unreliable. So
-
it was hard to say if we had actually
-
fixed anything.
-
And eventually we just added a gem called
Cucumber-Rerun.
-
Yeah. And this just reruns the failed specs
if
-
there's a problem. And when it passed the
second
-
time, it's good. You're fine. No big deal.
-
And then some people on our team got ambitious,
-
and they said, we could make it faster. We
-
could make CI faster with the parallel_test
gem, which
-
is awesome. But Cucumber-Rerun and parallel_test
are not compatible.
-
And so we had a test suite that ran
-
three times faster, but failed twice as often.
-
And as we came into the fall, we had
-
our first Bad Jenkins week. On a fateful Tuesday,
-
4 PM, the build just stopped passing. And
there
-
were anywhere from like thirty to seventy
failures. And
-
some of them were our usual suspects, and
dozens
-
of them were, like, previously good tests.
Tests we
-
trusted.
-
And, so none of them failed in isolation,
right.
-
And after like two days of working on this,
-
we eventually got a clean Rspec build, but
Cucumber
-
would still fail. And the failures could not
be
-
reproduced on a dev machine, or even on the
-
same CI machine, outside of the, the whole
build
-
running.
-
So, over the weekend, somebody pushes a commit
and
-
we get a green build. And there's nothing
special
-
about this commit, right. Like, it was like,
a
-
comment change. And we had tried a million
things,
-
and no single change obviously led to the
passing
-
build.
-
And the next week, we were back to like,
-
you know, fifteen percent failure rate. Like,
pretty good.
-
So, we could push stories to staging again,
and
-
we're still under the deadline pressure, right.
So, so
-
we shrugged. And we moved on, right. And maybe
-
somebody wants to guess, what happened next?
Right?
-
Yeah. It happened again, right. A whole week
of
-
just no tests pass. The build never passes.
So
-
we turn off parallel_tests, right. Because
we can't even
-
get like a coherent log of which tests are
-
causing errors, and then we started commenting
out the
-
really problematic tests, and there were still
these like
-
seemingly innocuous specs that failed regularly
but not consistently.
-
So these are tests that have enough business
value
-
that we are very reluctant to just, like,
delete
-
them.
-
And so we reinstated Cucumber-Rerun, and its
buddy Rspec-Rerun.
-
And this mostly worked, right. So we were
making
-
progress. But the build issues continued to
show up
-
in the negative column in our retrospectives.
And that
-
was because there were several problems with
this situation,
-
right. Like, reduced trust. When build failures
happen four
-
or five times a day, those aren't a red
-
flag. Those are just how things go. And everyone
-
on the team knows that the most likely explanation
-
is a random failure.
-
And the default response to a build failure
becomes,
-
run it again. So, just run it again, right.
-
The build failed. Whatever. So then, occasionally,
we break
-
things for real. But we stopped noticing because
we
-
started expecting CI to be broken. Sometimes
other pairs
-
would pull the code and they would see the
-
legitimate failures. Sometimes we thought
we were having a
-
Bad Jenkins week, and on the third or fourth
-
day we realized we were having actual failures.
-
This is pretty bad, right.
-
So our system depends on green builds to mark
-
the code that can be deployed to staging and
-
production, and without green builds, stories
can't get delivered
-
and reviewed. So we stopped getting timely
feedback. Meanwhile,
-
the reviewer gets, like, a week's worth of
stories.
-
All at once. Big clump.
-
And that means they have less time to pay
-
attention to detail on each delivered feature.
And that
-
means that the product is a little bit crappier
-
every week. So, maybe you need a bug fix.
-
Fast. Forget about that. You've got, like,
a twenty
-
percent chance your bug fix build is gonna
fail
-
for no reason.
-
Maybe the code has to ship, because the app
-
is mega busted. In this case, we would rerun
-
the failed tests on our local machine, and
then
-
cross our fingers and deploy. So, in effect,
our
-
policy was, if the code works on my machine,
-
it can be deployed to production.
-
So. At the most extreme, people lose faith
in
-
the build, and eventually they just forget
about testing.
-
And this didn't happen to us, but I had
-
to explain to management that key features
couldn't be
-
shipped because of problems with the test
server. And
-
they wanted to know a lot more about the
-
test server. And it was totally clear that
while
-
a working test server has their full support,
an
-
unreliable test server is a business liability
and needs
-
to be resolved.
-
So, the test server is supposed to solve problems
-
for us, and that is the only story that
-
I like to tell about it. So, we began
-
to fight back. And we personified the random
failures.
-
They became randos. A rando attack. A rando
storm.
-
And most memorably, Rando Backstabbian. Intergalactic
randomness villain.
-
We had a pair working on the test suite
-
full time for about three months trying to
resolve
-
these issues. We tried about a thousand things,
and
-
some of them worked. And I'm gonna pass along
-
the answers we found, and my hypothesis that
we
-
didn't disprove. Honestly, I'm hoping that
you came to
-
this talk because you've had similar problems
and you
-
found better solutions. So, this is just what
we
-
found.
-
I, I have a very important tool for this
-
section of the talk. It's the finger of blame.
-
We use this a lot when we were like,
-
hey, could the problem be Cucumber? And then
we
-
would go after that. So here comes finger
of
-
blame.
-
Cucumber! Capybara. Poltergeist. Definitely
part of the problem. I've
-
talked to enough other teams that use these
tools
-
extensively, and seen the evidence from this audience, to
-
know that the results are just not as deterministic
-
as we want. When you're using multiple threads
and
-
you're asserting against a browser environment,
you're gonna have
-
some issues, right.
-
And one of those is browser environment, right.
Browser
-
environment is a euphemism for, like, a complicated
piece
-
of software that itself is a playground for
network
-
latency issues and rendering hiccups and a
callback soup.
-
So your tests have to be written in a
-
very specific way to prevent all the threads
and
-
all the different layers of code from getting
confused
-
and smashing into each other.
-
You know, some of you maybe are lucky, and
-
you use the right style most of the time
-
by default. Maybe you don't see that many
problems.
-
A few things you gotta never assume.
-
Never assume the page has loaded. Never assume
the
-
markup you are asserting against exists. Never
assume your
-
AJAX request actually finished, and never
assume the speed
-
at which things happen, because until you
bolt it
-
down, you just don't know.
-
So, always make sure the markup exists before
you
-
assert against it. New Capybara is supposed
to be
-
better at this, and it, it's improved. But
I
-
do not trust them. I am super paranoid about
-
this stuff. This is a good example of a
-
lurking rando, due to a race condition, in
your
-
browser.
-
Capybara is supposed to wait for the page
to
-
load before it continues after the visit method,
but
-
I find it has sort of medium success with
-
doing that. Bolt it down, right. We used to
-
have something called the wait_until block,
and that would
-
stop execution until a condition was met.
And that
-
was great. Cause it replaced, like, sleep
statements, which
-
is what we used before that.
-
Modern Capybara, no more wait_until block.
It's inside the
-
have_css and have_content matchers. So, always
assert that something
-
exists before you try to do anything with
it.
-
And sometimes it might take a long time. The
-
default timeout for that, for those Capybara
assertions, is
-
like five seconds. And sometimes, you need
twenty seconds.
-
Usually, for us, that's because we're doing
like a
-
file upload or another lengthy operation.
But, again, never
-
assume that things are gonna take a normal
amount
-
of time.
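-
To make that concrete, here is a minimal sketch of that style in a hypothetical Capybara step; the selectors, messages, and the wait value are made up for illustration, and newer Capybara versions accept the wait option on matchers.

    # Assert the markup exists before interacting with it. have_css and
    # have_content retry until the element appears or the timeout expires.
    expect(page).to have_css("form#profile")
    click_button "Save"

    # Give a lengthy operation, like a file upload, extra time by
    # overriding the default (roughly five second) wait.
    expect(page).to have_content("Profile updated", wait: 20)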
-
Race conditions. I would be out of line to
-
give this talk without talking explicitly
about race conditions,
-
right. Whenever you create a situation
where a
-
sequence of key events doesn't happen in a
predetermined
-
order, you've got a potential race condition.
-
So the winner of the race is random. And
-
that can create random outcomes in your test
suite.
-
So what's an example of one of those? AJAX.
-
Right? In AJAX, your JavaScript running in
Firefox may
-
or may not complete its AJAX call and render
-
the response before the test thread makes
its assertions.
-
Now, Capybara tries to fix this by retrying
the
-
assertions. But that doesn't always work.
So, say you're
-
clicking a button to submit a form, and then
-
you're going to another page or refreshing
the page.
-
This might cut off that post request, whether
it's
-
from a regular form or an AJAX form, but
-
especially if it's an AJAX request. As soon
as
-
you say, visit, all the outstanding AJAX requests
cancel
-
in your browser.
-
So, you can fix this by adding an explicit
-
wait into your Cucumber step, right. When
you need
-
to rig the race, jQuery provides this handy
counter,
-
dollar dot active. That's all the XHR requests
that
-
are outstanding. So, it's really not hard
to keep
-
an eye on what's going on.
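-
A sketch of that kind of explicit wait, assuming jQuery is loaded on the page; the helper name and step wording here are hypothetical.

    require "timeout"

    # Block until jQuery reports no outstanding XHR requests, so a later
    # visit or click can't cut off an in-flight AJAX call.
    def wait_for_ajax(timeout = 10)
      Timeout.timeout(timeout) do
        sleep 0.1 until page.evaluate_script("jQuery.active").zero?
      end
    end

    When(/^I wait for outstanding AJAX requests$/) do
      wait_for_ajax
    end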
-
Here's another offender. Creating database
objects from within the
-
test thread, right. What's wrong with this
approach? Now,
-
if you're using MySQL, maybe nothing's wrong
with this,
-
right. And that's because MySQL has the transaction
hygiene
-
of a roadside diner, right. There's no separation.
If
-
you're using Postgres, which we are, it has
stricter
-
rules about the transactions. And this can
create a
-
world of pain.
-
So, the test code and the Rails server are
-
running in different threads. And this effectively
means different
-
database connections, and that means different
transaction states. Now
-
there is some shared database connection code
out there.
-
And I've had sort of mixed results with it.
-
I've heard this thing, right, about shared
mutable resources
-
between threads being problematic. Like, they
are. So let's
-
say you're lucky, and both threads are in
the
-
same database transaction. Both the test thread
and the
-
server thread are issuing checkpoints and
rollbacks against
-
the same connection. So sometimes one thread
will reset
-
to a checkpoint after the other thread has
already
-
rolled back the entire transaction. Right?
And that's how
-
you get a rando.
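-
For reference, the shared-connection code that gets passed around looks roughly like this. It forces the test thread and the server thread onto one ActiveRecord connection, which is exactly where those dueling checkpoints and rollbacks come from.

    # The widely circulated monkeypatch: every thread shares a single
    # ActiveRecord connection. Use with caution, for the reasons above.
    class ActiveRecord::Base
      mattr_accessor :shared_connection
      @@shared_connection = nil

      def self.connection
        @@shared_connection || retrieve_connection
      end
    end

    ActiveRecord::Base.shared_connection = ActiveRecord::Base.connection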
-
So, you want to create some state within your
-
application to run your test against, but
you can't
-
trust the test thread and the server thread
to
-
read the same database state, right. What
do you
-
do?
-
So in our project, we use a single set
-
of fixture data that's fixed at the beginning
of
-
the test run. And, essentially, the server
thread, or
-
the test thread, sorry, treats the database
as immutable.
-
It is read only, and any kind of verification
-
of changes has to happen via the browser.
-
So, we do this using RyanD's fixture_builder
gem, to
-
combine the maintainable characteristics of
factoried objects with the,
-
like, set it and forget it simplicity of fixtures.
-
So, any state that needs to exist across multiple
-
tests is stored in a set of fixtures, and
-
those are used throughout the test suite.
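-
A small sketch of a fixture_builder setup along those lines; the record name and factory below are hypothetical, not our actual definitions.

    # spec/support/fixture_builder.rb
    FixtureBuilder.configure do |fbuilder|
      # Rebuild the fixture set when the factories or schema change.
      fbuilder.files_to_check += Dir["spec/factories/*.rb", "db/schema.rb"]

      fbuilder.factory do
        # Build records from factories once, dump them as fixtures, and
        # treat them as read-only for the rest of the suite.
        fbuilder.name(:enrolled_user, FactoryGirl.create(:user, state: "CA"))
      end
    end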
-
And this is great, except it's also terrible.
Unfortunately,
-
our fixture_builder definition file is like
900 lines long.
-
And it's as dense as a Master's thesis, right.
-
It takes about two minutes to rebuild the
fixture
-
set. And this happens when we rebundle, change
the
-
factories, change the schema. Fortunately,
that only happens a
-
couple of times a day, right. So mostly we're
-
saving time with it. But seriously? Two minutes
as
-
your overhead to run one test is brutal.
-
So, at our stage, we think the right solution
-
is to use fixture_builder sparingly, right.
Use it for
-
Cucumber tests, because they need an immutable
database. And
-
maybe use it for core shared models for Rspec,
-
but whatever you do, do not create like a
-
DC Comics multiverse in your fixture setup
file, with
-
like different versions for everything, because
that leads to
-
pain.
-
Another thing you want to do is Mutex it.
-
So, a key technique we've used to prevent
database
-
collisions is to put a Mutex on access to
-
the database. And this is crazy, but, you
know,
-
an app running in the browser can make more
-
than one connection to the server at once
over
-
AJAX. And that's a great place to breed race
-
conditions.
-
So, unless you have a Mutex, to ensure the
-
server only responds to one request at a time,
-
you don't necessarily know the order in which
things
-
are gonna happen, and that means you're gonna
get
-
unreproducible failures.
-
In effect, we use a Mutex to rig
-
the race. You can check it out on GitHub.
-
It's just a sketch of the code we're using.
-
It's on omadahealth slash capybara_sync.
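-
The repository is only a sketch, and so is this: the core idea is a piece of Rack middleware that serializes requests to the app under test. The class name and wiring here are illustrative, not the actual capybara_sync code.

    # Serialize requests to the server under test, so overlapping AJAX
    # calls can't interleave their work on the database connection.
    class SynchronizedServer
      MUTEX = Mutex.new

      def initialize(app)
        @app = app
      end

      def call(env)
        MUTEX.synchronize { @app.call(env) }
      end
    end

    # Hand Capybara the wrapped app.
    Capybara.app = SynchronizedServer.new(Rails.application)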
-
Faker. Some of the randomness in our test
suite
-
was due to inputs that we gave it. Our
-
code depends on factories. And the factories
used randomly
-
generated fake data to fill in names, zip
codes,
-
all the text fields. And there are good reasons
-
to use random data.
-
It regularly exercises your edge cases. Engineers
don't have
-
to think of all possible first names you could
-
use. The code should work the same regardless
of
-
what zip code someone is in. But sometimes
it
-
doesn't.
-
For example, did you know that Faker includes
Guam
-
and Puerto Rico in the states that it might
-
generate for someone? And we didn't include
those in
-
our states dropdown. So when a Cucumber test
edits
-
an account for a user that Faker placed in
-
Guam, their state is not entered when
you
-
try to click save. And that leads to a
-
validation failure, and that leads to Cucumber
not seeing
-
the expected results, and a test run from
-
a new factory will not reproduce that failure,
right.
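-
One way to hedge against that class of surprise is to constrain what the factories can generate. A hypothetical example; the factory, fields, and state list are made up.

    FactoryGirl.define do
      factory :account do
        name  { Faker::Name.name }
        # Only generate states the account form actually offers, so a
        # random territory like Guam can't break the save.
        state { %w[CA NY TX WA].sample }
      end
    end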
-
Something crazy happened. Here we go.
-
Times and dates. Oh, we're out of sync. Let
-
me just. Momentary technical difficulties.
Mhmm.
-
Cool.
-
OK. Times and dates. Another subtle input
to your
-
code is the current time. Our app sets itself
-
up to be on the user's time zone, to
-
prevent time-dependent data, like which week
of our program
-
you are on, from changing in the middle of Saturday night.
-
And this was policy. We all knew about this.
-
We always used zone-aware time calls.
-
Except that we didn't. Like, when I audited
it,
-
I found over a hundred places where we neglected
-
to use zone-aware time calls. So most of these
-
are fine. There's usually nothing wrong with
epoch seconds.
-
But it only takes one misplaced call to time
-
dot now to create a failure. It's really best
-
to just forget about time dot now. Search
your
-
code base for it and eliminate it. Always
use
-
time dot zone dot now. Same thing for date
-
dot today. That's time zone dependent. You
want to
-
use time dot zone dot today.
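-
In code form, with a quick way to audit for offenders (in Rails, Time.current is shorthand for Time.zone.now):

    Time.zone.now     # instead of Time.now
    Time.zone.today   # instead of Date.today

    # Hunt down the stragglers:
    #   git grep -nE "Time\.now|Date\.today" app lib spec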
-
Unsurprisingly, I found a bunch of this class
of
-
failure when I was at RubyConf in Miami. So
-
these methods create random failures. Because
your database objects
-
can be in a different time zone than your
-
machine's local time zone.
-
External dependencies. Any time you depend
on a third
-
party service in your test, you introduce
a possible
-
random element, right. S3, Google Analytics,
Facebook. Any of
-
these things can go down. They can be slow.
-
They can be broken. Additionally, they all
depend on
-
the quality of your local internet connection.
-
So, I'm gonna suggest that if you are affected
-
by random failures, it's important to reproduce
the failure.
-
It is possible. It is possible. It is not
-
only possible. It is critical. And any problem
that
-
you can reproduce, reliably, can be solved.
Well, at
-
least, if you can reproduce it, you have a
-
heck of a lot better chance of solving it.
-
So, you have to bolt it all down. How
-
do you fix the data? When you're trying to
-
reproduce a random failure, you're gonna need
the same
-
database objects used by the failing test.
So if
-
you used factories, and there's not a file
system
-
record when a test starts to fail randomly,
you're
-
gonna want to document the database state
at the
-
time of failure.
-
And that's gonna mean YAML fixtures or, like, an
-
SQL dump, or something else clever. You have
to
-
find a way to re-establish that same state
that
-
was created at the moment that you had the
-
failure. And the network. Great talk before
about how
-
to nail down the network. API calls and responses
-
are input for your code. WebMock, VCR, and other
libraries
-
exist to replay third party service responses.
-
So, if you're trying to reproduce a failure
in
-
a test that has any third party dependencies,
you're
-
gonna wanna use a library to capture and replay
-
those responses.
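-
A minimal sketch of that kind of setup with the VCR gem; the example spec and the Uploader class are hypothetical.

    require "vcr"

    VCR.configure do |c|
      c.cassette_library_dir = "spec/cassettes"
      c.hook_into :webmock            # block real HTTP, replay recordings
      c.configure_rspec_metadata!     # enable the :vcr example tag
    end

    # The first run records the third-party response to a cassette;
    # later runs replay it, so the network can't randomize the result.
    describe "S3 uploads", :vcr do
      it "stores the document" do
        expect(Uploader.new.store("report.pdf")).to be_truthy
      end
    end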
-
Also, share buttons, right. In your Cucumber
tests, you're
-
gonna wanna remove the calls to Google Analytics,
Facebook
-
like buttons, all that stuff from the browser.
These
-
slow down your page load time, and they create
-
unnecessary failures because of that.
-
But, if you're replaying all your network
calls, how
-
do you know the external API hasn't changed,
right?
-
You want to test the services that your code
-
depends on, too. So you need a build that
-
does that. But it shouldn't be the main build.
-
The purpose of the main build is to let the
-
team know when their code is broken.
-
And it should do that as
-
quickly as possible.
-
And then we have a separate, external build
that
-
tests the interactions with third party services.
So, essentially,
-
external communication is off in one build and on in the other,
and we
-
check build results for both.
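-
One way to wire that up, sketched with an RSpec tag; the tag name and environment variable are assumptions, not our exact configuration.

    RSpec.configure do |config|
      # The main build skips anything tagged :external. A separate build
      # sets EXTERNAL_BUILD=1 and runs only those specs.
      if ENV["EXTERNAL_BUILD"]
        config.filter_run_including :external
      else
        config.filter_run_excluding :external
      end
    end

    describe "Facebook Graph API", :external do
      it "fetches the user's profile" do
        # talks to the real service; excluded from the main build
      end
    end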
-
So, I want to talk about another reason that
-
tests fail randomly. Rspec runs all your tests
in
-
a random order every time. And obviously this
introduces
-
randomness. But, there is a reason for that,
and
-
the reason is to help you stay on top
-
of test pollution.
-
Test pollution is when state that is changed
in
-
one test persists and influences the results
of other
-
tests. Changed state can live in process memory,
in
-
a database, on the file system, in an external
-
service. Right. Lots of places.
-
Sometimes, the polluted state causes the subsequent
test to
-
fail incorrectly. And sometimes it causes
the subsequent test
-
to pass incorrectly. And this was such a rampant
-
issue in the early days of Rspec that the
-
Rspec team made running the tests in a random
-
order the default as of Rspec 2. So, thank
-
you Rspec.
-
Now, any test pollution issues should stand
out. But
-
what do you think happens if you ignore random
-
test failures for like a year or so? Yeah.
-
Here's some clues that your issue might be
test
-
pollution, right.
-
With test pollution, the effected tests never
fail when
-
they're run in isolation. Not ever. And rather
than
-
throwing an unexpected exception, a test pollution
failure usually
-
takes the form of returning different data
than what
-
you expected.
-
And finally, the biggest clue that you might
have
-
a test pollution issue is that you haven't
really
-
been checking for test pollution. So, we gotta
reproduce
-
test pollution issues. Which means we have
to run
-
the test suite in the same order, and we
-
have to use the fixture or database data and
-
the network data from the failed build.
-
So, first you have to identify the random
seed.
-
Maybe you've seen this cryptic line at the
end
-
of your Rspec test output. This is not completely
-
meaningless. 22164 is your magic key to rerun
the
-
tests in the same order as the build that
-
just ran. So you want to modify your dot
-
Rspec file to include the seed value. Be sure
-
to change the format to documentation
as
-
well as adding the seed. That will make it
-
more readable, for you, so that you can start
-
to think about the order that things are running
-
in and what could possibly be causing your
pollution
-
problem.
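-
For example, a .rspec file along those lines, using the seed from that output:

    --seed 22164
    --format documentation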
-
So, the problem with test pollution is fundamentally
about
-
incorrectly persisted state, so the state
that's being
-
persisted is important. You want to ensure
that the
-
data is identical to the failed build. And
there's
-
lots of ways to do this.
-
So you've got your random seed. You've got
your
-
data from the failed build, and then you rerun
-
the specs. And if you see the failure repeated,
-
you should celebrate, right. You've correctly
diagnosed that the
-
issue is test pollution and you are on your
-
way to fixing it.
-
And if you don't see the failure, maybe it's
-
not test pollution. Maybe there's another
aspect of your
-
build environment that needs to be duplicated,
right. But
-
even then, say you've reproduced the problem.
Now what?
-
You still have to diagnose what is causing
the
-
pollution. You know that running the tests
in a
-
particular order creates a failure. The problem
with test
-
pollution is that there is a non-obvious connection
between
-
where the problem appears in the failed test
and
-
its source in another test case.
-
And you can find out about the failure using
-
print statements or a debugger, using whatever
tools you want.
-
But, maybe you get lucky and you are able
-
to just figure it out. But in a complex
-
code base with thousands of tests the source
of
-
the pollution can be tricky to track down.
-
So, just running through the suite to reproduce
the
-
failure might take ten minutes. And this is
actually
-
terrible, right. Waiting ten minutes for feedback?
This is
-
a source of cognitive depletion. All of the
stack
-
you've built up in your brain to solve this
-
problem is disintegrating over that ten minutes.
You're gonna
-
work on other problems. You're gonna check
Facebook while
-
those tests are running. And you're gonna
lose your
-
focus, right. And that is, essentially, how
rando wins.
-
Fortunately, we can discard large amounts
of complexity and
-
noise, by using a stupid process that we don't
-
have to think about. Binary search. In code,
debugging
-
via binary search is a process of repeatedly
dividing
-
the search space in half, until you locate
the
-
smallest coherent unit that exhibits the desired
behavior.
-
OK. So we have the output of a set
-
of specs that we ran in documentation mode.
This
-
is sort of a high level overview that you
-
might see in Sublime, right. And in the middle
-
here, this red spot is where the failure occurs.
-
So we know the cause has to happen before
-
the failure, because causality. So the
green block,
-
at the top, is the candidate block,
or
-
the search space.
-
So, practically, we split the search space
in half,
-
and remove half of it. And if the failure
-
reoccurs when we rerun with this configuration,
we know
-
that the cause is in that remaining block,
right.
-
But sometimes you've got more problems than
you know.
-
So it's good to test the other half of
-
the search space as well.
-
So if your failure appeared in step zero,
you
-
expect not to see the failure here. If you
-
also see the failure here, you might have
multiple
-
sources of test pollution or, more likely,
test pollution
-
isn't really your problem, and the problem
is actually
-
outside of the search space.
-
So here's a hiccup. Binary search requires
us to
-
remove large segments of the test suite to
narrow
-
in on the test that causes the pollution.
And
-
this creates a problem, because random ordering
in the
-
test suite changes when you remove tests.
Completely. Remove
-
one test, the whole thing reshuffles on the
same
-
seed. So there's no way to effectively perform
a
-
binary search using a random seed.
-
So here's the good news. It is possible to
-
manually declare the ordering of your Rspec
tests, using
-
this undocumented configuration option, order_examples.
So, config dot order_examples
-
takes a block, and that'll get the whole collection
-
of Rspec examples after Rspec has loaded the
specs
-
to be run. And then you just reorder the
-
examples in whatever order you want them to
be
-
ordered in and return that set from the block.
-
So, that sounds simple.
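-
A sketch of that idea, assuming the failed build's documentation output was saved to a file; the file name and the matching on full_description are assumptions.

    # Replay the exact example order from the failed build.
    failed_order = File.readlines("tmp/failed_build_order.log").map(&:strip)

    RSpec.configure do |config|
      config.order_examples do |examples|
        examples.sort_by do |example|
          failed_order.index(example.full_description) || failed_order.size
        end
      end
    end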
-
I, I made a little proto-gem for this. It's
-
called rspec_manual_order, and basically it
takes the output of
-
the documentation format from the test that
you ran
-
earlier, and turns that into an ordering list.
So,
-
if you, if you log the output of, of
-
your Rspec suite with the failure to a file,
-
you'll be able to replay it using rspec_manual_order,
and
-
you can check that out on GitHub.
-
So it's possible to reduce the search space
and
-
do a binary search on Rspec. And once you've
-
reduced the search space to a single spec
or
-
a suite of examples that all cause the problem,
-
you put your monkey brain in a position to shine
-
against your test pollution issue, right.
This is where
-
it actually becomes possible to figure it
out by
-
looking at the context.
-
I've gone in depth into test pollution, because
it's
-
amenable to investigation using simple techniques,
right. Binary search
-
and reproducing the failure state are key
debugging skills
-
that you will improve with practice. When
I started
-
looking into our random failures, I didn't
know we
-
had test pollution issues. Turned out we weren't
resetting
-
the global time zone correctly between tests.
-
This was far from the only problem I found.
-
But without fixing this one, our suite would
never
-
be clean. So, every random failure that you
are
-
chasing has its own unique story. There are
some
-
in our code that we haven't figured out yet,
-
and there are some in your code that I
-
hope I never see.
-
The key to eliminating random test failures
is don't
-
give up, right. Today we've covered things
that go
-
wrong in Cucumber and Capybara. Things that
go wrong
-
in Rspec and just general sources of randomness
in
-
your test suite. And hopefully you're walking
out of
-
here with at least one new technique to improve
-
the reliability of your tests.
-
We've been working with ours for about eight
months,
-
and we're in a place where random failures
occur
-
like, less than five percent of the time.
And
-
we set up a tiered build system to run
-
the tests sequentially when the fast parallel
build fails.
-
So, the important thing is that when new random
-
failures occur, we reliably assign a team
to hunt
-
them down.
-
And if you keep working on your build, eventually
-
you'll figure out a combination of tactics
that will
-
lead to a stable, reliable test suite, that
will
-
have the trust of your team. So thank you.