Fixing a State-Related Random RSpec Failure

Today I fixed a random test failure in RSpec. An Active Record callback was being fired despite an if: :bla condition that returned false, causing a not_to method expectation to fail. Futhermore, the example passed when running the example directly. It was only when running the whole test suite that it failed, sometimes.

These types of failures are a huuuge pain in the ass, so I wanted to document my thought process and the steps I took to fix it, to hopefully get my future self and anyone reading this a few steps ahead next time it happens.

First thing we know: it’s random. What is random in our test suite? The first thing that pops to mind is the order of examples. There definitely could be other random pieces too. For example, at one point we randomly assigned made-up names to a title attribute on out of our factories. That caused a spec failure occasionally on a uniqueness constraint. We quickly did away with it. I had a pretty good hunch it was the order of examples, a good starting point anyway.
Based on that, I decided to use the --seed option of RSpec to try to reproduce the failure. I looked to one of our random failures and pulled the seed #, which determines the order of examples. Running the full 7-minute suite with the seed passed in, I saw the failure every time. Yess!
Even though I reproduced it at this point, running the whole suite for every cycle of trying was too slow. The spec file in question was a model spec, so I tried running just model specs via rspec spec/models/ --seed #. A bit of a lucky break, but this still caused the failure every time. That reduced the cycle to only about 20 seconds, and further isolated the issue to a small subset of specs. Great!
I then put in a fail call in the bla method that determined if the callback ran (if: :bla). Running the suite again, I saw no RuntimeError, meaning that fail did not get called. This told me something else was calling the Active Record callback in question. Whaaaat? How could that be?
Knowing that some spec executed before this one somehow messed with this callback, the only question was which one. I produced a list of the spec file ordering for that particular seed by running rspec spec/models/ --seed # -f json > /tmp/out.json, and then parsing through it via ruby:
```
j = JSON.parse(File.read('/tmp/out.json')).with_indifferent_access
j[:examples].map { |a| a[:file_path] }.uniq
```
Armed with the list of files, I opened each file, deleted the contents, and re-ran model specs only. It only took a few tries before the original spec passed as I looked first to specs could be related to the Active Record model in question, or specs that are big/complex. It turned out to be a spec helper that we'd written to temporarily skip this callback:
```
TheModel.skip_callback(:validation, :after, :the_callback)
yield
TheModel.set_callback(:validation, :after, :the_callback)
```

That's it! Any examples that test the_callback will mostly fail after running this bit of a hackjob as set_callback will redefine the callback without the original if: :bla condition.

Happy friday!