AUSTIN PUTMAN: This is the last session before happy hour. I appreciate all of you for hanging around this long. Maybe you're here because you don't know there's a bar on the first floor of this hotel. I think that is where the main track is currently taking place. I am Austin Putman. I am the VP of engineering for Omada Health. At Omada, we support people at risk of chronic disease, like diabetes, make crucial behavior changes and have longer, healthier lives. So, it's pretty awesome.

I'm gonna start with some spoilers, because I want you to have an amazing RailsConf. So if this is not what you're looking for, don't be shy about finding that bar track. We're gonna spend some quality time with Capybara and Cucumber, whose flakiness is legendary, for very good reasons.

Let me take your temperature. Can I see hands? How many people have had problems with random failures in Cucumber or Capybara? Yeah. Yeah. This is reality, folks.

We're also gonna cover the ways that RSpec does and does not help us track down test pollution. How many folks out there have had a random failure problem in the RSpec suite, like in your models or your controller tests? OK, still a lot of people, right. It happens. But we don't talk about it.

So in between, we're gonna review some problems that can dog any test suite: random data, time zone heck, external dependencies. All of this leads to pain. There was a great talk before this one about external dependencies.

Here's just a random one. How many people here have had a test fail due to a daylight saving time issue? Yeah. Ben Franklin, you are a menace.

Let's talk about eliminating inconsistent failures in your tests. On our team, we call that fighting randos. And I'm here to talk about this because I was stupid and short-sighted, and random failures caused us a lot of pain. I chose to try to hit deadlines instead of focusing on build quality, and our team paid a terrible price.

Anybody out there paying that price? Anybody out there feel me on this? Yeah. It sucks.

So let's do some science.
Some people seem to have more random failure problems than others, and I want to gather some data. So first, if you write tests on a regular basis, raise your hand. Right? Wow. I love RailsConf. Keep your hand up if you believe you have experienced a random test failure. The whole room.

Now, if you think you're likely to have one in the next, like, four weeks, who's out there? It's still happening, right. You're in the middle of it. OK, so this is not hypothetical for this audience. This is a widespread problem. But I don't see a lot of people talking about it.

And the truth is, while it's a great tool, a comprehensive integration suite is also a breeding ground for baffling Heisenbugs.

So, to understand how test failures become a chronic productivity blocker, I want to talk a little bit about testing culture, right. Why is this even bad?

We have an automated CI machine that runs our full test suite every time a commit is pushed. And every time the build passes, we push the new code to a staging environment for acceptance. Right, that's our process. How many people out there have a setup that's kind of like that? OK. Awesome. So a lot of people know what I'm talking about.

In the fall of 2012, we started seeing occasional, unreproducible failures of the test suite in Jenkins. And we were pushing to get features out the door for January first. And we found that we could just rerun the build and the failure would go away.

And we got pretty good at spotting the two or three tests where this happened. So we would check the output of a failed build, and if it was one of the suspect tests, we would just run the build again. Not a problem. Staging would deploy. We would continue our march towards the launch.

But by the time spring rolled around, there were like seven or eight places causing problems regularly. And we would try to fix them, you know, we wouldn't ignore them. But the failures were unreliable, so it was hard to say if we had actually fixed anything.

And eventually we just added a gem called Cucumber-Rerun. Yeah.
And this just reruns the failed specs if there's a problem. And when it passes the second time, it's good. You're fine. No big deal.

And then some people on our team got ambitious, and they said, we could make it faster. We could make CI faster with the parallel_tests gem, which is awesome. But Cucumber-Rerun and parallel_tests are not compatible. And so we had a test suite that ran three times faster, but failed twice as often.

And as we came into the fall, we had our first Bad Jenkins week. On a fateful Tuesday at 4 PM, the build just stopped passing. And there were anywhere from like thirty to seventy failures. Some of them were our usual suspects, and dozens of them were, like, previously good tests. Tests we trusted.

And none of them failed in isolation, right. After like two days of working on this, we eventually got a clean RSpec build, but Cucumber would still fail. And the failures could not be reproduced on a dev machine, or even on the same CI machine, outside of the whole build running.

So, over the weekend, somebody pushes a commit and we get a green build. And there's nothing special about this commit, right. Like, it was a comment change. And we had tried a million things, and no single change obviously led to the passing build.

And the next week, we were back to like, you know, a fifteen percent failure rate. Like, pretty good. So we could push stories to staging again, and we're still under the deadline pressure, right. So we shrugged. And we moved on, right. And maybe somebody wants to guess what happened next? Right?

Yeah. It happened again, right. A whole week of just no tests pass. The build never passes. So we turned off parallel_tests, right, because we can't even get like a coherent log of which tests are causing errors. And then we started commenting out the really problematic tests, and there were still these seemingly innocuous specs that failed regularly but not consistently.

So these are tests that have enough business value that we are very reluctant to just, like, delete them.

And so we reinstated Cucumber-Rerun, and its buddy RSpec-Rerun. And this mostly worked, right.
So we were making progress. But the build issues continued to show up in the negative column in our retrospectives. And that was because there were several problems with this situation, right.

Like, reduced trust. When build failures happen four or five times a day, those aren't a red flag. Those are just how things go. And everyone on the team knows that the most likely explanation is a random failure. And the default response to a build failure becomes: run it again. So, just run it again, right. The build failed. Whatever.

So then, occasionally, we break things for real. But we stopped noticing, because we started expecting CI to be broken. Sometimes other pairs would pull the code and they would see the legitimate failures. Sometimes we thought we were having a Bad Jenkins week, and on the third or fourth day we realized we were having actual failures. This is pretty bad, right.

Our system depends on green builds to mark the code that can be deployed to staging and production, and without green builds, stories can't get delivered and reviewed. So we stopped getting timely feedback. Meanwhile, the reviewer gets, like, a week's worth of stories. All at once. Big clump. And that means they have less time to pay attention to detail on each delivered feature. And that means that the product is a little bit crappier every week.

So, maybe you need a bug fix. Fast. Forget about that. You've got, like, a twenty percent chance your bug fix build is gonna fail for no reason.

Maybe the code has to ship, because the app is mega busted. In this case, we would rerun the failed tests on our local machine, and then cross our fingers and deploy. So, in effect, our policy was: if the code works on my machine, it can be deployed to production.

At the most extreme, people lose faith in the build, and eventually they just forget about testing. And this didn't happen to us, but I had to explain to management that key features couldn't be shipped because of problems with the test server. And they wanted to know a lot more about the test server.
And it was totally clear that while a working test server has their full support, an unreliable test server is a business liability and needs to be resolved.

The test server is supposed to solve problems for us, and that is the only story that I like to tell about it. So, we began to fight back. And we personified the random failures. They became randos. A rando attack. A rando storm. And most memorably, Rando Backstabbian, intergalactic randomness villain.

We had a pair working on the test suite full time for about three months trying to resolve these issues. We tried about a thousand things, and some of them worked. And I'm gonna pass along the answers we found, and the hypotheses that we didn't disprove. Honestly, I'm hoping that you came to this talk because you've had similar problems and you found better solutions. So, this is just what we found.

I have a very important tool for this section of the talk. It's the finger of blame. We used this a lot when we were like, hey, could the problem be Cucumber? And then we would go after that. So here comes the finger of blame.

Cucumber! Capybara. Poltergeist. Definitely part of the problem. I've talked to enough other teams that use these tools extensively, and seen the evidence from this audience, to know that the results are just not as deterministic as we want. When you're using multiple threads and you're asserting against a browser environment, you're gonna have some issues, right.

And one of those is the browser environment itself, right. "Browser environment" is a euphemism for a complicated piece of software that is its own playground for network latency issues, rendering hiccups, and callback soup. So your tests have to be written in a very specific way to prevent all the threads and all the different layers of code from getting confused and smashing into each other. You know, some of you maybe are lucky, and you use the right style most of the time by default. Maybe you don't see that many problems.

A few things you gotta never assume. Never assume the page has loaded. Never assume the markup you are asserting against exists.
Never assume your AJAX request actually finished. And never assume the speed at which things happen, because until you bolt it down, you just don't know.

So, always make sure the markup exists before you assert against it. New Capybara is supposed to be better at this, and it's improved. But I do not trust them. I am super paranoid about this stuff. This is a good example of a lurking rando, due to a race condition in your browser.

Capybara is supposed to wait for the page to load before it continues after the visit method, but I find it has sort of medium success with doing that. Bolt it down, right. We used to have something called the wait_until block, and that would stop execution until a condition was met. And that was great, because it replaced sleep statements, which is what we used before that.

Modern Capybara: no more wait_until block. The waiting lives inside the have_css and have_content matchers. So, always assert that something exists before you try to do anything with it.

And sometimes it might take a long time. The default timeout for those Capybara assertions is like five seconds. And sometimes you need twenty seconds. Usually, for us, that's because we're doing a file upload or another lengthy operation. But, again, never assume that things are gonna take a normal amount of time.
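Here's a minimal sketch of what bolting it down looks like, assuming Capybara 2.1 or later, where the matchers take a :wait option; the selectors and messages are made up:

    # Assert the markup exists before interacting with it. have_css retries
    # until it appears or Capybara's default wait time runs out.
    expect(page).to have_css("form#account-settings")
    click_button "Save"

    # For genuinely slow operations, like a file upload, stretch the wait
    # on that one assertion instead of sprinkling sleeps around.
    expect(page).to have_content("Upload complete", wait: 20)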
Race conditions. I would be out of line to give this talk without talking explicitly about race conditions, right. Whenever you create a situation where a sequence of key events doesn't happen in a predetermined order, you've got a potential race condition. So the winner of the race is random. And that can create random outcomes in your test suite.

So what's an example of one of those? AJAX. Right? With AJAX, your JavaScript running in Firefox may or may not complete its AJAX call and render the response before the test thread makes its assertions. Now, Capybara tries to fix this by retrying assertions. But that doesn't always work. So, say you're clicking a button to submit a form, and then you're going to another page or refreshing the page. This might cut off that POST request, whether it's from a regular form or an AJAX form, but especially if it's an AJAX request. As soon as you say visit, all the outstanding AJAX requests cancel in your browser.

So, you can fix this by adding an explicit wait into your Cucumber step, right. When you need to rig the race, jQuery provides this handy counter, $.active. That's all the XHR requests that are outstanding. So, it's really not hard to keep an eye on what's going on.
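A hedged sketch of that kind of step, assuming the app uses jQuery and you're on a JavaScript-capable driver; the step name and polling interval are made up:

    require "timeout"

    # Block until jQuery reports zero outstanding XHR requests, so a visit
    # or an assertion can't cut off an in-flight AJAX call.
    When(/^I wait for AJAX to finish$/) do
      Timeout.timeout(Capybara.default_wait_time) do
        sleep 0.1 until page.evaluate_script("jQuery.active").zero?
      end
    end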
Here's another offender: creating database objects from within the test thread, right. What's wrong with this approach? Now, if you're using MySQL, maybe nothing's wrong with this, right. And that's because MySQL has the transaction hygiene of a roadside diner, right. There's no separation. If you're using Postgres, which we are, it has stricter rules about transactions. And this can create a world of pain.

The test code and the Rails server are running in different threads. And this effectively means different database connections, and that means different transaction states. Now, there is some shared database connection code out there, and I've had sort of mixed results with it. I've heard this thing, right, about shared mutable resources between threads being problematic. Like, they are. So let's say you're lucky, and both threads are in the same database transaction. Both the test thread and the server thread are issuing checkpoints and rollbacks against the same connection. So sometimes one thread will reset to a checkpoint after the other thread has already rolled back the entire transaction. Right? And that's how you get a rando.

So, you want to create some state within your application to run your test against, but you can't trust the test thread and the server thread to read the same database state, right. What do you do?

In our project, we use a single set of fixture data that's fixed at the beginning of the test run. And, essentially, the test thread treats the database as immutable. It is read only, and any kind of verification of changes has to happen via the browser.

We do this using RyanD's fixture_builder gem, to combine the maintainable characteristics of factoried objects with the set-it-and-forget-it simplicity of fixtures. Any state that needs to exist across multiple tests is stored in a set of fixtures, and those are used throughout the test suite.

And this is great, except it's also terrible. Unfortunately, our fixture_builder definition file is like 900 lines long. And it's as dense as a Master's thesis, right. It takes about two minutes to rebuild the fixture set, and that happens whenever we rebundle, change the factories, or change the schema. Fortunately, that's only a couple of times a day, right, so mostly we're saving time with it. But seriously? Two minutes as your overhead to run one test is brutal.

So, at our stage, we think the right solution is to use fixture_builder sparingly, right. Use it for Cucumber tests, because they need an immutable database. And maybe use it for core shared models for RSpec. But whatever you do, do not create a DC Comics multiverse in your fixture setup file, with different versions of everything, because that leads to pain.

Another thing you want to do is Mutex it. A key technique we've used to prevent database collisions is to put a Mutex on access to the database. And this is crazy, but, you know, an app running in the browser can make more than one connection to the server at once over AJAX. And that's a great place to breed race conditions. So, unless you have a Mutex to ensure the server only responds to one request at a time, you don't necessarily know the order in which things are gonna happen, and that means you're gonna get unreproducible failures.

In effect, we use a Mutex to rig the race. You can check it out on GitHub. It's just a sketch of the code we're using. It's at omadahealth/capybara_sync.
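The real code is in that repo; as a rough sketch of the idea, you can picture a piece of Rack middleware, used only in the test environment, that wraps every request in one global Mutex. The class name and the wiring here are made up:

    # Serialize requests so overlapping AJAX calls can't interleave their
    # database work with each other or with the test thread.
    class SerializeRequests
      MUTEX = Mutex.new

      def initialize(app)
        @app = app
      end

      def call(env)
        MUTEX.synchronize { @app.call(env) }
      end
    end

    # config/environments/test.rb (hypothetical wiring):
    # config.middleware.insert_before 0, SerializeRequests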
Faker. Some of the randomness in our test suite was due to inputs that we gave it. Our code depends on factories, and the factories use randomly generated fake data to fill in names, zip codes, all the text fields. And there are good reasons to use random data. It regularly exercises your edge cases. Engineers don't have to think of all possible first names you could use. The code should work the same regardless of what zip code someone is in. But sometimes it doesn't.

For example, did you know that Faker includes Guam and Puerto Rico in the states that it might generate for someone? And we didn't include those in our states dropdown. So when a Cucumber test edits an account for a user that Faker placed in Guam, their state is not entered when you try to click save. And that leads to a validation failure, and that leads to Cucumber not seeing the expected results. And a test run with a new factory will not reproduce that failure, right.

Times and dates. Another subtle input to your code is the current time. Our app sets itself up to be on the user's time zone, to prevent time-dependent data, like which week of our program you are on, from shifting in the middle of Saturday night. And this was policy. We all knew about this. We always used zone-aware time calls.

Except that we didn't. Like, when I audited it, I found over a hundred places where we neglected to use zone-aware time calls. Most of these are fine; there's usually nothing wrong with epoch seconds. But it only takes one misplaced call to Time.now to create a failure. It's really best to just forget about Time.now. Search your code base for it and eliminate it. Always use Time.zone.now. Same thing for Date.today. That's time zone dependent. You want to use Time.zone.today.

Unsurprisingly, I found a bunch of this class of failure when I was at RubyConf in Miami. These methods create random failures, because your database objects can be in a different time zone than your machine's local time zone.
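In other words, something like this, where the timestamps are just illustrative:

    # Local-zone calls: these follow the machine the tests happen to run on.
    Time.now         # => 2014-04-25 23:30:00 -0700 on the CI box
    Date.today       # => Fri, 25 Apr 2014

    # Zone-aware calls: these follow Time.zone, which our app sets per user.
    Time.zone.now    # => Sat, 26 Apr 2014 02:30:00 EDT -04:00
    Time.zone.today  # => Sat, 26 Apr 2014

    # Near midnight the two disagree about what "today" is, and that's a rando.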
External dependencies. Any time you depend on a third party service in your test, you introduce a possible random element, right. S3, Google Analytics, Facebook. Any of these things can go down. They can be slow. They can be broken. Additionally, they all depend on the quality of your local internet connection.

So, I'm gonna suggest that if you are affected by random failures, it's important to reproduce the failure. It is possible. It is not only possible, it is critical. And any problem that you can reproduce reliably can be solved. Well, at least, if you can reproduce it, you have a heck of a lot better chance of solving it.

So, you have to bolt it all down. How do you fix the data? When you're trying to reproduce a random failure, you're gonna need the same database objects used by the failing test. So if you used factories, and there's no file system record when a test starts to fail randomly, you're gonna want to document the database state at the time of failure. And that's gonna mean YAML fixtures or a SQL dump or something else clever. You have to find a way to re-establish the same state that existed at the moment you had the failure.

And the network. There was a great talk before this one about how to nail down the network. API calls and responses are input for your code. WebMock, VCR, and other libraries exist to replay third party service responses. So, if you're trying to reproduce a failure in a test that has any third party dependencies, you're gonna wanna use a library to capture and replay those responses.

Also, share buttons, right. In your Cucumber tests, you're gonna wanna remove the calls to Google Analytics, Facebook like buttons, all that stuff from the browser. These slow down your page load time, and they create unnecessary failures because of that.

But if you're replaying all your network calls, how do you know the external API hasn't changed, right? You want to test the services that your code depends on, too. So you need a build that does that. But it shouldn't be the main build. The purpose of the main build is to let the team know when their code is broken, and it should do that as quickly as possible. And then we have a separate, external build that tests the interactions with third party services. So, essentially, external communication is off and then on, and we check build results for both.
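A minimal sketch of that setup with VCR and WebMock, where the cassette directory and the ALLOW_EXTERNAL toggle are made-up names:

    require "vcr"

    VCR.configure do |c|
      c.cassette_library_dir = "spec/cassettes"
      c.hook_into :webmock                 # block real HTTP in the main build
      c.configure_rspec_metadata!          # tag examples with :vcr to replay cassettes

      # In the separate external build, re-record against the live services.
      c.default_cassette_options = {
        record: ENV["ALLOW_EXTERNAL"] ? :all : :once
      }
    end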
So, I want to talk about another reason that tests fail randomly. RSpec runs all your tests in a random order every time. And obviously this introduces randomness. But there is a reason for that, and the reason is to help you stay on top of test pollution.

Test pollution is when state that is changed in one test persists and influences the results of other tests. Changed state can live in process memory, in a database, on the file system, in an external service. Right. Lots of places. Sometimes the polluted state causes the subsequent test to fail incorrectly. And sometimes it causes the subsequent test to pass incorrectly. And this was such a rampant issue in the early days of RSpec that the RSpec team made running the tests in a random order the default as of RSpec 2. So, thank you, RSpec.

Now, any test pollution issues should stand out. But what do you think happens if you ignore random test failures for, like, a year or so? Yeah.

Here are some clues that your issue might be test pollution, right. With test pollution, the affected tests never fail when they're run in isolation. Not ever. And rather than throwing an unexpected exception, a test pollution failure usually takes the form of returning different data than what you expected. And finally, the biggest clue that you might have a test pollution issue is that you haven't really been checking for test pollution.

So, we gotta reproduce test pollution issues. Which means we have to run the test suite in the same order, and we have to use the fixture or database data and the network data from the failed build.

First you have to identify the random seed. Maybe you've seen this cryptic line at the end of your RSpec test output. This is not completely meaningless. 22164 is your magic key to rerun the tests in the same order as the build that just ran. So you want to modify your .rspec file to include the seed value. Be sure to change the format to documentation as well as adding the seed. That will make the output more readable for you, so that you can start to think about the order that things are running in and what could possibly be causing your pollution problem.
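So your .rspec ends up looking something like this, with the seed copied from the failed build's output:

    --seed 22164
    --format documentation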
The problem with test pollution is fundamentally about incorrectly persisted state, so the state that's being persisted is important. You want to ensure that the data is identical to the failed build. And there's lots of ways to do this.

So you've got your random seed. You've got your data from the failed build, and then you rerun the specs. And if you see the failure repeated, you should celebrate, right. You've correctly diagnosed that the issue is test pollution and you are on your way to fixing it. And if you don't see the failure, maybe it's not test pollution. Maybe there's another aspect of your build environment that needs to be duplicated, right.

But even then, say you've reproduced the problem. Now what? You still have to diagnose what is causing the pollution. You know that running the tests in a particular order creates a failure. The problem with test pollution is that there is a non-obvious connection between where the problem appears in the failing test and its source in another test case. And you can track down the failure using print statements or a debugger, using whatever tools you want. Maybe you get lucky and you are able to just figure it out. But in a complex code base with thousands of tests, the source of the pollution can be tricky to track down.

So, just running through the suite to reproduce the failure might take ten minutes. And this is actually terrible, right. Waiting ten minutes for feedback? This is a source of cognitive depletion. All of the stack you've built up in your brain to solve this problem is disintegrating over those ten minutes. You're gonna work on other problems. You're gonna check Facebook while those tests are running. And you're gonna lose your focus, right. And that is, essentially, how rando wins.

Fortunately, we can discard large amounts of complexity and noise by using a stupid process that we don't have to think about: binary search.
In code, debugging via binary search is a process of repeatedly dividing the search space in half until you locate the smallest coherent unit that exhibits the behavior you're hunting.

OK. So we have the output of a set of specs that we ran in documentation mode. This is sort of a high-level overview that you might see in Sublime, right. And in the middle here, this red spot is where the failure occurs. We know the cause has to happen before the failure, because causality. So the green block at the top, that's the candidate block, or the search space.

Practically, we split the search space in half and remove half of it. And if the failure recurs when we rerun with this configuration, we know that the cause is in the remaining block, right. But sometimes you've got more problems than you know, so it's good to test the other half of the search space as well. If your failure appeared in step zero, you expect not to see the failure here. If you also see the failure here, you might have multiple sources of test pollution or, more likely, test pollution isn't really your problem, and the problem is actually outside of the search space.

So here's a hiccup. Binary search requires us to remove large segments of the test suite to narrow in on the test that causes the pollution. And this creates a problem, because random ordering in the test suite changes completely when you remove tests. Remove one test, and the whole thing reshuffles on the same seed. So there's no way to effectively perform a binary search using a random seed.

So here's the good news. It is possible to manually declare the ordering of your RSpec tests, using an undocumented configuration option, order_examples. config.order_examples takes a block, and that block gets the whole collection of RSpec examples after RSpec has loaded the specs to be run. And then you just reorder the examples into whatever order you want and return that set from the block. So, that sounds simple.
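As a hedged sketch, assuming RSpec 2.x where that option exists, it looks something like this; the file name and the idea of matching on full_description are mine:

    RSpec.configure do |config|
      # Reorder the loaded examples to match a list captured from a previous
      # documentation-format run, one full example description per line.
      config.order_examples do |examples|
        wanted = File.readlines("tmp/example_order.txt").map(&:strip)
        examples.sort_by { |ex| wanted.index(ex.full_description) || wanted.size }
      end
    end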
I made a little proto-gem for this. It's called rspec_manual_order, and basically it takes the output of the documentation format from the test run that you did earlier and turns that into an ordering list. So if you log the output of your RSpec suite with the failure to a file, you'll be able to replay it using rspec_manual_order, and you can check that out on GitHub.

So it's possible to reduce the search space and do a binary search on RSpec. And once you've reduced the search space to a single spec, or a suite of examples that all cause the problem, you put your monkey brain in the position to shine against your test pollution issue, right. This is where it actually becomes possible to figure it out by looking at the context.

I've gone in depth into test pollution because it's amenable to investigation using simple techniques, right. Binary search and reproducing the failure state are key debugging skills that you will improve with practice. When I started looking into our random failures, I didn't know we had test pollution issues. It turned out we weren't resetting the global time zone correctly between tests; a sketch of the kind of reset that fixes it follows below. This was far from the only problem I found. But without fixing this one, our suite would never be clean.
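For the record, the fix for that particular pollution is tiny. A hedged version, using an RSpec around hook to restore whatever zone the spec started with:

    RSpec.configure do |config|
      config.around(:each) do |example|
        original_zone = Time.zone
        begin
          example.run
        ensure
          Time.zone = original_zone   # put the global zone back, pass or fail
        end
      end
    end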
So, every random failure that you are chasing has its own unique story. There are some in our code that we haven't figured out yet, and there are some in your code that I hope I never see.

The key to eliminating random test failures is: don't give up, right. Today we've covered things that go wrong in Cucumber and Capybara, things that go wrong in RSpec, and just general sources of randomness in your test suite. And hopefully you're walking out of here with at least one new technique to improve the reliability of your tests.

We've been working on ours for about eight months, and we're in a place where random failures occur less than five percent of the time. And we set up a tiered build system to run the tests sequentially when the fast parallel build fails. The important thing is that when new random failures occur, we reliably assign a team to hunt them down.

And if you keep working on your build, eventually you'll figure out a combination of tactics that will lead to a stable, reliable test suite that has the trust of your team. So thank you.