"randomly" failing CI tests (was: Re: [Rails-core] Re: [CruiseControl] RubyOnRails build 8889 failed)

Joy. That happens on our CI builds at least a few times a month.
Could be anything - architecture differences that cause different test
ordering, machine speed, slightly different dependency versions (use
GemInstaller!), the phase of the moon...

What I usually do is keep a separate checkout of the project on the CI
box itself, and run the cruise task and the individual test manually
from the command line. If the failure is reproducible that way (it
often is), then find out why. If it is not (after several runs), then
disable the test under CI. If that makes you feel dirty - good! You can
create a separate build with the test enabled to keep digging into the
failure, without spamming people or reducing the quality of the 'real'
build.
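
For what it's worth, the "run it several times and see" step is easy to
script. Here is a rough sketch in Python (picked only to keep it
neutral; it is not part of ccrb or Rails). The names classify_failure
and command_runner are made up for illustration, and the number of
attempts is an arbitrary assumption:

```python
import subprocess
from typing import Callable, Sequence


def classify_failure(run_once: Callable[[], bool], attempts: int = 5) -> str:
    """Run a check several times and summarise how it behaved.

    run_once returns True when the test passes and False when it fails.
    """
    results = [run_once() for _ in range(attempts)]
    failures = results.count(False)
    if failures == 0:
        return "not reproduced"   # candidate for disabling under CI
    if failures == attempts:
        return "always fails"     # reproducible: go find out why
    return "intermittent"         # fails sometimes: keep digging


def command_runner(command: Sequence[str], cwd: str) -> Callable[[], bool]:
    """Build a run_once callable that runs command inside cwd.

    Both command and cwd are whatever you use on your own CI box;
    nothing here is specific to any particular build tool.
    """
    def run_once() -> bool:
        # Exit status 0 is treated as a passing run.
        return subprocess.run(command, cwd=cwd).returncode == 0
    return run_once
```

On the CI box you would call something like
classify_failure(command_runner(["rake", "cruise"], cwd="/path/to/checkout")),
where both the command and the path are placeholders for your own
setup.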

Of course, this approach requires you to have access to the CI box,
which presents a problem in this case since Alexey/Thoughtworks owns
it.

I've held off saying this in the past, but why doesn't Rails run its
CI on a dedicated box that Rails owns (and can pick the architecture
for, if Red Hat-ish things cause regular problems)? This could
introduce new problems (like nobody caring enough to fix regular CI
box issues, as Alexey so kindly does), but it could fix others (like
core committers wondering why the build is failing only on CI, but not
being able to find out why) and reduce demands on Alexey's time (like
getting a stable release branch CI build running).

Initially, this could even be a parallel environment that only emails
Rails core or a smaller audience until we see if it works. I can set
this up (I didn't say maintain it forever) if pointed to a box and given
Alexey's current ccrb config files. ccrb is pretty simple, and if it
is on a dedicated box, we can invite other interested parties to help
support it without major security concerns.

-- Chad