AUSTIN PUTMAN: This is the last session before happy hour. I appreciate all of you for hanging around this long. Maybe you're here because you don't know there's a bar on the first floor of this hotel. I think that is where the main track is currently taking place. I am Austin Putman. I am the VP of engineering for Omada Health. At Omada, we support people at risk of chronic disease, like diabetes, make crucial behavior changes and have longer, healthier lives. So, it's pretty awesome.

I'm gonna start with some spoilers, because I want you to have an amazing RailsConf. So if this is not what you're looking for, don't be shy about finding that bar track. We're gonna spend some quality time with Capybara and Cucumber, whose flakiness is legendary, for very good reasons.

Let me take your temperature. Can I see hands? How many people have had problems with random failures in Cucumber or Capybara? Yeah. Yeah. This is reality, folks.

We're also gonna cover the ways that RSpec does and does not help us track down test pollution. How many folks out there have had a random failure problem in the RSpec suite, like in your models or your controller tests? OK, still a lot of people, right. It happens. But we don't talk about it.

So in between, we're gonna review some problems that can dog any test suite: random data, time zone heck, external dependencies. All of this leads to pain. There was a great talk before this one about external dependencies.

Here's just a random one. How many people here have had a test fail due to a daylight saving time issue? Yeah. Ben Franklin, you are a menace.

Let's talk about eliminating inconsistent failures in your tests. On our team, we call that fighting randos. And I'm here to talk about this because I was stupid and short-sighted, and random failures caused us a lot of pain. I chose to try to hit deadlines instead of focusing on build quality, and our team paid a terrible price.

Anybody out there paying that price? Anybody out there feel me on this? Yeah. It sucks.

So let's do some science.
Some people seem to have more random failure problems than others, and I want to gather some data. So first, if you write tests on a regular basis, raise your hand. Right? Wow. I love RailsConf. Keep your hand up if you believe you have experienced a random test failure. The whole room.

Now, if you think you're likely to have one in the next, like, four weeks, who's out there? It's still happening, right. You're in the middle of it. OK, so this is not hypothetical for this audience. This is a widespread problem. But I don't see a lot of people talking about it.

And the truth is, while it's a great tool, a comprehensive integration suite is also a breeding ground for baffling Heisenbugs.

So, to understand how test failures become a chronic productivity blocker, I want to talk a little bit about testing culture, right. Why is this even bad?

We have an automated CI machine that runs our full test suite every time a commit is pushed. And every time the build passes, we push the new code to a staging environment for acceptance. Right, that's our process. How many people out there have a setup that's kind of like that? OK. Awesome. So a lot of people know what I'm talking about.

In the fall of 2012, we started seeing occasional, unreproducible failures of the test suite in Jenkins. And we were pushing to get features out the door for January first. And we found that we could just rerun the build and the failure would go away.

And we got pretty good at spotting the two or three tests where this happened. So we would check the output of a failed build, and if it was one of the suspect tests, we would just run the build again. Not a problem. Staging would deploy. We would continue our march towards the launch.

But by the time spring rolled around, there were like seven or eight places causing problems regularly. And we would try to fix them, you know, we wouldn't ignore them. But the failures were unreliable, so it was hard to say if we had actually fixed anything.

And eventually we just added a gem called Cucumber-Rerun. Yeah.
And this just reruns the failed specs if there's a problem. And when it passes the second time, it's good. You're fine. No big deal.

And then some people on our team got ambitious, and they said, we could make it faster. We could make CI faster with the parallel_tests gem, which is awesome. But Cucumber-Rerun and parallel_tests are not compatible. And so we had a test suite that ran three times faster, but failed twice as often.

And as we came into the fall, we had our first Bad Jenkins week. On a fateful Tuesday at 4 PM, the build just stopped passing. And there were anywhere from like thirty to seventy failures. Some of them were our usual suspects, and dozens of them were, like, previously good tests. Tests we trusted.

And none of them failed in isolation, right. After like two days of working on this, we eventually got a clean RSpec build, but Cucumber would still fail. And the failures could not be reproduced on a dev machine, or even on the same CI machine, outside of the whole build running.

So, over the weekend, somebody pushes a commit and we get a green build. And there's nothing special about this commit, right. Like, it was a comment change. And we had tried a million things, and no single change obviously led to the passing build.

And the next week, we were back to like, you know, a fifteen percent failure rate. Like, pretty good. So we could push stories to staging again, and we're still under the deadline pressure, right. So we shrugged. And we moved on, right. And maybe somebody wants to guess what happened next? Right?

Yeah. It happened again, right. A whole week of just no tests pass. The build never passes. So we turned off parallel_tests, right, because we can't even get like a coherent log of which tests are causing errors. And then we started commenting out the really problematic tests, and there were still these seemingly innocuous specs that failed regularly but not consistently.

So these are tests that have enough business value that we are very reluctant to just, like, delete them.

And so we reinstated Cucumber-Rerun, and its buddy RSpec-Rerun. And this mostly worked, right.
So we were making progress. But the build issues continued to show up in the negative column in our retrospectives. And that was because there were several problems with this situation, right.

Like, reduced trust. When build failures happen four or five times a day, those aren't a red flag. Those are just how things go. And everyone on the team knows that the most likely explanation is a random failure. And the default response to a build failure becomes: run it again. So, just run it again, right. The build failed. Whatever.

So then, occasionally, we break things for real. But we stopped noticing, because we started expecting CI to be broken. Sometimes other pairs would pull the code and they would see the legitimate failures. Sometimes we thought we were having a Bad Jenkins week, and on the third or fourth day we realized we were having actual failures. This is pretty bad, right.

Our system depends on green builds to mark the code that can be deployed to staging and production, and without green builds, stories can't get delivered and reviewed. So we stopped getting timely feedback. Meanwhile, the reviewer gets, like, a week's worth of stories. All at once. Big clump. And that means they have less time to pay attention to detail on each delivered feature. And that means that the product is a little bit crappier every week.

So, maybe you need a bug fix. Fast. Forget about that. You've got, like, a twenty percent chance your bug fix build is gonna fail for no reason.

Maybe the code has to ship, because the app is mega busted. In this case, we would rerun the failed tests on our local machine, and then cross our fingers and deploy. So, in effect, our policy was: if the code works on my machine, it can be deployed to production.

At the most extreme, people lose faith in the build, and eventually they just forget about testing. And this didn't happen to us, but I had to explain to management that key features couldn't be shipped because of problems with the test server. And they wanted to know a lot more about the test server.
And it was totally clear that while a working test server has their full support, an unreliable test server is a business liability and needs to be resolved.

The test server is supposed to solve problems for us, and that is the only story that I like to tell about it. So, we began to fight back. And we personified the random failures. They became randos. A rando attack. A rando storm. And most memorably, Rando Backstabbian, intergalactic randomness villain.

We had a pair working on the test suite full time for about three months trying to resolve these issues. We tried about a thousand things, and some of them worked. And I'm gonna pass along the answers we found, and the hypotheses that we didn't disprove. Honestly, I'm hoping that you came to this talk because you've had similar problems and you found better solutions. So, this is just what we found.

I have a very important tool for this section of the talk. It's the finger of blame. We used this a lot when we were like, hey, could the problem be Cucumber? And then we would go after that. So here comes the finger of blame.

Cucumber! Capybara. Poltergeist. Definitely part of the problem. I've talked to enough other teams that use these tools extensively, and seen the evidence from this audience, to know that the results are just not as deterministic as we want. When you're using multiple threads and you're asserting against a browser environment, you're gonna have some issues, right.

And one of those is the browser environment itself, right. "Browser environment" is a euphemism for a complicated piece of software that is its own playground for network latency issues, rendering hiccups, and callback soup. So your tests have to be written in a very specific way to prevent all the threads and all the different layers of code from getting confused and smashing into each other. You know, some of you maybe are lucky, and you use the right style most of the time by default. Maybe you don't see that many problems.

A few things you gotta never assume. Never assume the page has loaded. Never assume the markup you are asserting against exists.
Never assume your AJAX request actually finished. And never assume the speed at which things happen, because until you bolt it down, you just don't know.

So, always make sure the markup exists before you assert against it. New Capybara is supposed to be better at this, and it's improved. But I do not trust them. I am super paranoid about this stuff. This is a good example of a lurking rando, due to a race condition in your browser.

Capybara is supposed to wait for the page to load before it continues after the visit method, but I find it has sort of medium success with doing that. Bolt it down, right. We used to have something called the wait_until block, and that would stop execution until a condition was met. And that was great, because it replaced sleep statements, which is what we used before that.

Modern Capybara: no more wait_until block. The waiting lives inside the have_css and have_content matchers. So, always assert that something exists before you try to do anything with it.

And sometimes it might take a long time. The default timeout for those Capybara assertions is like five seconds. And sometimes you need twenty seconds. Usually, for us, that's because we're doing a file upload or another lengthy operation. But, again, never assume that things are gonna take a normal amount of time.
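Here's a minimal sketch of what bolting it down looks like, assuming Capybara 2.1 or later, where the matchers take a :wait option; the selectors and messages are made up:

    # Assert the markup exists before interacting with it. have_css retries
    # until it appears or Capybara's default wait time runs out.
    expect(page).to have_css("form#account-settings")
    click_button "Save"

    # For genuinely slow operations, like a file upload, stretch the wait
    # on that one assertion instead of sprinkling sleeps around.
    expect(page).to have_content("Upload complete", wait: 20)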
Race conditions. I would be out of line to give this talk without talking explicitly about race conditions, right. Whenever you create a situation where a sequence of key events doesn't happen in a predetermined order, you've got a potential race condition. So the winner of the race is random. And that can create random outcomes in your test suite.

So what's an example of one of those? AJAX. Right? With AJAX, your JavaScript running in Firefox may or may not complete its AJAX call and render the response before the test thread makes its assertions. Now, Capybara tries to fix this by retrying assertions. But that doesn't always work. So, say you're clicking a button to submit a form, and then you're going to another page or refreshing the page. This might cut off that POST request, whether it's from a regular form or an AJAX form, but especially if it's an AJAX request. As soon as you say visit, all the outstanding AJAX requests cancel in your browser.

So, you can fix this by adding an explicit wait into your Cucumber step, right. When you need to rig the race, jQuery provides this handy counter, $.active. That's all the XHR requests that are outstanding. So, it's really not hard to keep an eye on what's going on.
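A hedged sketch of that kind of step, assuming the app uses jQuery and you're on a JavaScript-capable driver; the step name and polling interval are made up:

    require "timeout"

    # Block until jQuery reports zero outstanding XHR requests, so a visit
    # or an assertion can't cut off an in-flight AJAX call.
    When(/^I wait for AJAX to finish$/) do
      Timeout.timeout(Capybara.default_wait_time) do
        sleep 0.1 until page.evaluate_script("jQuery.active").zero?
      end
    end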
Here's another offender: creating database objects from within the test thread, right. What's wrong with this approach? Now, if you're using MySQL, maybe nothing's wrong with this, right. And that's because MySQL has the transaction hygiene of a roadside diner, right. There's no separation. If you're using Postgres, which we are, it has stricter rules about transactions. And this can create a world of pain.

The test code and the Rails server are running in different threads. And this effectively means different database connections, and that means different transaction states. Now, there is some shared database connection code out there, and I've had sort of mixed results with it. I've heard this thing, right, about shared mutable resources between threads being problematic. Like, they are. So let's say you're lucky, and both threads are in the same database transaction. Both the test thread and the server thread are issuing checkpoints and rollbacks against the same connection. So sometimes one thread will reset to a checkpoint after the other thread has already rolled back the entire transaction. Right? And that's how you get a rando.

So, you want to create some state within your application to run your test against, but you can't trust the test thread and the server thread to read the same database state, right. What do you do?

In our project, we use a single set of fixture data that's fixed at the beginning of the test run. And, essentially, the test thread treats the database as immutable. It is read only, and any kind of verification of changes has to happen via the browser.

We do this using RyanD's fixture_builder gem, to combine the maintainable characteristics of factoried objects with the set-it-and-forget-it simplicity of fixtures. Any state that needs to exist across multiple tests is stored in a set of fixtures, and those are used throughout the test suite.

And this is great, except it's also terrible. Unfortunately, our fixture_builder definition file is like 900 lines long. And it's as dense as a Master's thesis, right. It takes about two minutes to rebuild the fixture set, and that happens whenever we rebundle, change the factories, or change the schema. Fortunately, that's only a couple of times a day, right, so mostly we're saving time with it. But seriously? Two minutes as your overhead to run one test is brutal.

So, at our stage, we think the right solution is to use fixture_builder sparingly, right. Use it for Cucumber tests, because they need an immutable database. And maybe use it for core shared models for RSpec. But whatever you do, do not create a DC Comics multiverse in your fixture setup file, with different versions of everything, because that leads to pain.

Another thing you want to do is Mutex it. A key technique we've used to prevent database collisions is to put a Mutex on access to the database. And this is crazy, but, you know, an app running in the browser can make more than one connection to the server at once over AJAX. And that's a great place to breed race conditions. So, unless you have a Mutex to ensure the server only responds to one request at a time, you don't necessarily know the order in which things are gonna happen, and that means you're gonna get unreproducible failures.

In effect, we use a Mutex to rig the race. You can check it out on GitHub. It's just a sketch of the code we're using. It's at omadahealth/capybara_sync.
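The real code is in that repo; as a rough sketch of the idea, you can picture a piece of Rack middleware, used only in the test environment, that wraps every request in one global Mutex. The class name and the wiring here are made up:

    # Serialize requests so overlapping AJAX calls can't interleave their
    # database work with each other or with the test thread.
    class SerializeRequests
      MUTEX = Mutex.new

      def initialize(app)
        @app = app
      end

      def call(env)
        MUTEX.synchronize { @app.call(env) }
      end
    end

    # config/environments/test.rb (hypothetical wiring):
    # config.middleware.insert_before 0, SerializeRequests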
Faker. Some of the randomness in our test suite was due to inputs that we gave it. Our code depends on factories, and the factories use randomly generated fake data to fill in names, zip codes, all the text fields. And there are good reasons to use random data. It regularly exercises your edge cases. Engineers don't have to think of all possible first names you could use. The code should work the same regardless of what zip code someone is in. But sometimes it doesn't.

For example, did you know that Faker includes Guam and Puerto Rico in the states that it might generate for someone? And we didn't include those in our states dropdown. So when a Cucumber test edits an account for a user that Faker placed in Guam, their state is not entered when you try to click save. And that leads to a validation failure, and that leads to Cucumber not seeing the expected results. And a test run with a new factory will not reproduce that failure, right.

Times and dates. Another subtle input to your code is the current time. Our app sets itself up to be on the user's time zone, to prevent time-dependent data, like which week of our program you are on, from shifting in the middle of Saturday night. And this was policy. We all knew about this. We always used zone-aware time calls.

Except that we didn't. Like, when I audited it, I found over a hundred places where we neglected to use zone-aware time calls. Most of these are fine; there's usually nothing wrong with epoch seconds. But it only takes one misplaced call to Time.now to create a failure. It's really best to just forget about Time.now. Search your code base for it and eliminate it. Always use Time.zone.now. Same thing for Date.today. That's time zone dependent. You want to use Time.zone.today.

Unsurprisingly, I found a bunch of this class of failure when I was at RubyConf in Miami. These methods create random failures, because your database objects can be in a different time zone than your machine's local time zone.
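In other words, something like this, where the timestamps are just illustrative:

    # Local-zone calls: these follow the machine the tests happen to run on.
    Time.now         # => 2014-04-25 23:30:00 -0700 on the CI box
    Date.today       # => Fri, 25 Apr 2014

    # Zone-aware calls: these follow Time.zone, which our app sets per user.
    Time.zone.now    # => Sat, 26 Apr 2014 02:30:00 EDT -04:00
    Time.zone.today  # => Sat, 26 Apr 2014

    # Near midnight the two disagree about what "today" is, and that's a rando.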
External dependencies. Any time you depend on a third party service in your test, you introduce a possible random element, right. S3, Google Analytics, Facebook. Any of these things can go down. They can be slow. They can be broken. Additionally, they all depend on the quality of your local internet connection.

So, I'm gonna suggest that if you are affected by random failures, it's important to reproduce the failure. It is possible. It is not only possible, it is critical. And any problem that you can reproduce reliably can be solved. Well, at least, if you can reproduce it, you have a heck of a lot better chance of solving it.

So, you have to bolt it all down. How do you fix the data? When you're trying to reproduce a random failure, you're gonna need the same database objects used by the failing test. So if you used factories, and there's no file system record when a test starts to fail randomly, you're gonna want to document the database state at the time of failure. And that's gonna mean YAML fixtures or a SQL dump or something else clever. You have to find a way to re-establish the same state that existed at the moment you had the failure.

And the network. There was a great talk before this one about how to nail down the network. API calls and responses are input for your code. WebMock, VCR, and other libraries exist to replay third party service responses. So, if you're trying to reproduce a failure in a test that has any third party dependencies, you're gonna wanna use a library to capture and replay those responses.

Also, share buttons, right. In your Cucumber tests, you're gonna wanna remove the calls to Google Analytics, Facebook like buttons, all that stuff from the browser. These slow down your page load time, and they create unnecessary failures because of that.

But if you're replaying all your network calls, how do you know the external API hasn't changed, right? You want to test the services that your code depends on, too. So you need a build that does that. But it shouldn't be the main build. The purpose of the main build is to let the team know when their code is broken, and it should do that as quickly as possible. And then we have a separate, external build that tests the interactions with third party services. So, essentially, external communication is off and then on, and we check build results for both.
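A minimal sketch of that setup with VCR and WebMock, where the cassette directory and the ALLOW_EXTERNAL toggle are made-up names:

    require "vcr"

    VCR.configure do |c|
      c.cassette_library_dir = "spec/cassettes"
      c.hook_into :webmock                 # block real HTTP in the main build
      c.configure_rspec_metadata!          # tag examples with :vcr to replay cassettes

      # In the separate external build, re-record against the live services.
      c.default_cassette_options = {
        record: ENV["ALLOW_EXTERNAL"] ? :all : :once
      }
    end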
So, I want to talk about another reason that tests fail randomly. RSpec runs all your tests in a random order every time. And obviously this introduces randomness. But there is a reason for that, and the reason is to help you stay on top of test pollution.

Test pollution is when state that is changed in one test persists and influences the results of other tests. Changed state can live in process memory, in a database, on the file system, in an external service. Right. Lots of places. Sometimes the polluted state causes the subsequent test to fail incorrectly. And sometimes it causes the subsequent test to pass incorrectly. And this was such a rampant issue in the early days of RSpec that the RSpec team made running the tests in a random order the default as of RSpec 2. So, thank you, RSpec.

Now, any test pollution issues should stand out. But what do you think happens if you ignore random test failures for, like, a year or so? Yeah.

Here are some clues that your issue might be test pollution, right. With test pollution, the affected tests never fail when they're run in isolation. Not ever. And rather than throwing an unexpected exception, a test pollution failure usually takes the form of returning different data than what you expected. And finally, the biggest clue that you might have a test pollution issue is that you haven't really been checking for test pollution.

So, we gotta reproduce test pollution issues. Which means we have to run the test suite in the same order, and we have to use the fixture or database data and the network data from the failed build.

First you have to identify the random seed. Maybe you've seen this cryptic line at the end of your RSpec test output. This is not completely meaningless. 22164 is your magic key to rerun the tests in the same order as the build that just ran. So you want to modify your .rspec file to include the seed value. Be sure to change the format to documentation as well as adding the seed. That will make the output more readable for you, so that you can start to think about the order that things are running in and what could possibly be causing your pollution problem.
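So your .rspec ends up looking something like this, with the seed copied from the failed build's output:

    --seed 22164
    --format documentation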
The problem with test pollution is fundamentally about incorrectly persisted state, so the state that's being persisted is important. You want to ensure that the data is identical to the failed build. And there's lots of ways to do this.

So you've got your random seed. You've got your data from the failed build, and then you rerun the specs. And if you see the failure repeated, you should celebrate, right. You've correctly diagnosed that the issue is test pollution and you are on your way to fixing it. And if you don't see the failure, maybe it's not test pollution. Maybe there's another aspect of your build environment that needs to be duplicated, right.

But even then, say you've reproduced the problem. Now what? You still have to diagnose what is causing the pollution. You know that running the tests in a particular order creates a failure. The problem with test pollution is that there is a non-obvious connection between where the problem appears in the failing test and its source in another test case. And you can track down the failure using print statements or a debugger, using whatever tools you want. Maybe you get lucky and you are able to just figure it out. But in a complex code base with thousands of tests, the source of the pollution can be tricky to track down.

So, just running through the suite to reproduce the failure might take ten minutes. And this is actually terrible, right. Waiting ten minutes for feedback? This is a source of cognitive depletion. All of the stack you've built up in your brain to solve this problem is disintegrating over those ten minutes. You're gonna work on other problems. You're gonna check Facebook while those tests are running. And you're gonna lose your focus, right. And that is, essentially, how rando wins.

Fortunately, we can discard large amounts of complexity and noise by using a stupid process that we don't have to think about: binary search.
In code, debugging via binary search is a process of repeatedly dividing the search space in half until you locate the smallest coherent unit that exhibits the behavior you're hunting.

OK. So we have the output of a set of specs that we ran in documentation mode. This is sort of a high-level overview that you might see in Sublime, right. And in the middle here, this red spot is where the failure occurs. We know the cause has to happen before the failure, because causality. So the green block at the top, that's the candidate block, or the search space.

Practically, we split the search space in half and remove half of it. And if the failure recurs when we rerun with this configuration, we know that the cause is in the remaining block, right. But sometimes you've got more problems than you know, so it's good to test the other half of the search space as well. If your failure appeared in step zero, you expect not to see the failure here. If you also see the failure here, you might have multiple sources of test pollution or, more likely, test pollution isn't really your problem, and the problem is actually outside of the search space.

So here's a hiccup. Binary search requires us to remove large segments of the test suite to narrow in on the test that causes the pollution. And this creates a problem, because random ordering in the test suite changes completely when you remove tests. Remove one test, and the whole thing reshuffles on the same seed. So there's no way to effectively perform a binary search using a random seed.

So here's the good news. It is possible to manually declare the ordering of your RSpec tests, using an undocumented configuration option, order_examples. config.order_examples takes a block, and that block gets the whole collection of RSpec examples after RSpec has loaded the specs to be run. And then you just reorder the examples into whatever order you want and return that set from the block. So, that sounds simple.
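As a hedged sketch, assuming RSpec 2.x where that option exists, it looks something like this; the file name and the idea of matching on full_description are mine:

    RSpec.configure do |config|
      # Reorder the loaded examples to match a list captured from a previous
      # documentation-format run, one full example description per line.
      config.order_examples do |examples|
        wanted = File.readlines("tmp/example_order.txt").map(&:strip)
        examples.sort_by { |ex| wanted.index(ex.full_description) || wanted.size }
      end
    end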
I made a little proto-gem for this. It's called rspec_manual_order, and basically it takes the output of the documentation format from the test run that you did earlier and turns that into an ordering list. So if you log the output of your RSpec suite with the failure to a file, you'll be able to replay it using rspec_manual_order, and you can check that out on GitHub.

So it's possible to reduce the search space and do a binary search on RSpec. And once you've reduced the search space to a single spec, or a suite of examples that all cause the problem, you put your monkey brain in the position to shine against your test pollution issue, right. This is where it actually becomes possible to figure it out by looking at the context.

I've gone in depth into test pollution because it's amenable to investigation using simple techniques, right. Binary search and reproducing the failure state are key debugging skills that you will improve with practice. When I started looking into our random failures, I didn't know we had test pollution issues. It turned out we weren't resetting the global time zone correctly between tests; a sketch of the kind of reset that fixes it follows below. This was far from the only problem I found. But without fixing this one, our suite would never be clean.
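For the record, the fix for that particular pollution is tiny. A hedged version, using an RSpec around hook to restore whatever zone the spec started with:

    RSpec.configure do |config|
      config.around(:each) do |example|
        original_zone = Time.zone
        begin
          example.run
        ensure
          Time.zone = original_zone   # put the global zone back, pass or fail
        end
      end
    end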
So, every random failure that you are chasing has its own unique story. There are some in our code that we haven't figured out yet, and there are some in your code that I hope I never see.

The key to eliminating random test failures is: don't give up, right. Today we've covered things that go wrong in Cucumber and Capybara, things that go wrong in RSpec, and just general sources of randomness in your test suite. And hopefully you're walking out of here with at least one new technique to improve the reliability of your tests.

We've been working on ours for about eight months, and we're in a place where random failures occur less than five percent of the time. And we set up a tiered build system to run the tests sequentially when the fast parallel build fails. The important thing is that when new random failures occur, we reliably assign a team to hunt them down.

And if you keep working on your build, eventually you'll figure out a combination of tactics that will lead to a stable, reliable test suite that has the trust of your team. So thank you.