How to run unit tests w/o first blowing away test db?

The documentation on fixtures makes a tantalizing claim:

If you preload your test database with all fixture data
(probably in the Rakefile task) and use transactional
fixtures, then you may omit all fixtures declarations
in your test cases since all the data’s already there
and every case rolls back its changes.

Yup! That's exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter. I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test

But then how do I run unit tests while preserving my carefully preloaded
data?
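
(For context, my test/test_helper.rb is more or less the stock Rails 2.x
one -- transactional fixtures on, no fixtures declarations anywhere:)

  ENV["RAILS_ENV"] = "test"
  require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
  require 'test_help'

  class ActiveSupport::TestCase
    # Every test runs inside a transaction that gets rolled back afterwards,
    # so reads of the preloaded tables are cheap and writes never stick.
    self.use_transactional_fixtures = true
    self.use_instantiated_fixtures  = false
    # note: no "fixtures :all" here -- the data is already loaded
  end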

- ff

[some details...]

I know I can't simply use 'rake test:units', defined as:

  Rake::TestTask.new(:units => "db:test:prepare") do |t|
    t.libs << "test"
    t.pattern = 'test/unit/**/*_test.rb'
    t.verbose = true
  end
  Rake::Task['test:units'].comment = "Run the unit tests in test/unit"

The dependency on "db:test:prepare" clears the database. I could define
a rake task "test:units_without_blowing_away_my_data_please" that
eliminates that dependency, or perhaps better, define a task that
loads the fixtures before running the unit tests.
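
Something along these lines, for instance (task names invented, and I
haven't actually committed to this):

  namespace :test do
    # Same as test:units, but without the db:test:prepare dependency,
    # so the preloaded data survives.
    Rake::TestTask.new(:units_preserving_data) do |t|
      t.libs << "test"
      t.pattern = 'test/unit/**/*_test.rb'
      t.verbose = true
    end
    Rake::Task['test:units_preserving_data'].comment =
      "Run the unit tests in test/unit without rebuilding the test db"

    desc "Reload the fixtures into the test db, then run the unit tests"
    task :units_with_preloaded_fixtures do
      system("rake db:fixtures:load RAILS_ENV=test") || abort("fixture load failed")
      Rake::Task['test:units_preserving_data'].invoke
    end
  end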

But the comment in the fixture documentation makes me think that there's
an easier way. Is there?

Fearless Fool wrote:

The documentation on fixtures makes a tantalizing claim:

> If you preload your test database with all fixture data
> (probably in the Rakefile task) and use transactional
> fixtures, then you may omit all fixtures declarations
> in your test cases since all the data’s already there
> and every case rolls back its changes.

Yup! That's exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter.

No you don't. You don't need all the large constant table data to run
your tests.

Besides, as you already know, fixtures are crap. What you want to do is
use factories to craft exactly the data you need for each individual
unit test. This will normally only be a few records at a time.
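
For example, with factory_girl (the model and attributes here are
invented purely for illustration):

  # test/factories.rb
  Factory.define :user do |f|
    f.name  "Alice Example"
    f.email "alice@example.com"
  end

  # in an individual test, create only the records that test needs:
  alice = Factory.create(:user)
  bob   = Factory.create(:user, :name => "Bob", :email => "bob@example.com")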

I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test

But then how do I run unit tests while preserving my carefully preloaded
data?

You don't. Good test isolation *requires* clearing the database for
every test.

Best,

Good ideal to strive for... but not absolutely necessary all the time.
99% of the time, whether in testing Rails apps or anything else,
wisdom lies in knowing where to draw the line... and it's rarely all
the way to one side or the other.

-Dave, foolish about many things, but at least possessed of this one
nugget of wisdom :slight_smile:

Dave Aronson wrote:

Fearless Fool wrote:

But then how do I run unit tests while preserving my carefully preloaded
data?

You don't. Good test isolation *requires* clearing the database for
every test.

Good ideal to strive for... but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
*absolutely necessary all the time* to reset the DB to a known state.
Usually that means clearing it.

Ah, there's the key: "Usually". So, the line is at least a smidgen
never-ward of always. I'll grant you at least a few "nines" of
always-ness, but it's still not 100%. Except by rounding. :slight_smile:

*If* a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF's situation, then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.
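
Just to illustrate what I mean (PostgreSQL syntax, and all the names are
invented -- I'm not claiming it's worth the hassle):

  class MakeReferenceDataReadOnly < ActiveRecord::Migration
    def self.up
      execute <<-SQL
        CREATE FUNCTION refuse_writes() RETURNS trigger AS $$
        BEGIN
          RAISE EXCEPTION 'reference_data is read-only';
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER reference_data_read_only
          BEFORE INSERT OR UPDATE OR DELETE ON reference_data
          FOR EACH STATEMENT EXECUTE PROCEDURE refuse_writes();
      SQL
    end

    def self.down
      execute "DROP TRIGGER reference_data_read_only ON reference_data"
      execute "DROP FUNCTION refuse_writes()"
    end
  end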

On the other claw, if there aren't a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,
or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),
the point may be moot....

-Dave

Dave Aronson wrote:

Good ideal to strive for... but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
*absolutely necessary all the time* to reset the DB to a known state.
Usually that means clearing it.

Ah, there's the key: "Usually". So, the line is at least a smidgen
never-ward of always. I'll grant you at least a few "nines" of
always-ness, but it's still not 100%. Except by rounding. :slight_smile:

Setting the database to a known state must be 100%. Clearing (as
opposed to resetting to known constant data) may be only 99.999...%.

*If* a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF's situation,

But there is no way to guarantee that the "constant" table is in fact
constant, unless we get into using permissions in the DB (which will
probably cause more problems than they solve). The values in it could
get accidentally changed at any time.

then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.

That is only true insofar as the table can be guaranteed read-only,
which it usually can't be. Transactional tests will probably work here,
though.

On the other claw, if there aren't a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,

The number of tests has nothing to do with it. I routinely run fairly
large test suites with a small database (by using factories, so that
each test only has a few records in the DB while it runs).

or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),

What, set up multiple test databases? Yuck. (Of course, it's entirely
possible that the DBMS is doing this and abstracting it anyway.)

the point may be moot....

-Dave

Best,

So what happens if there is a bug that accidentally writes to that
table during a test?

Colin

Reread last sentence. :slight_smile:

-Dave

Dave Aronson wrote:

So what happens if there is a bug that accidentally writes to that
table during a test?

Reread last sentence. :slight_smile:

I can't. You didn't quote it.

Are you referring to the bit about the triggers? If so, do you *really*
think that that's any less work than using transactions or clearing the
database? (Hint: it isn't. Triggers sound great in theory, but they
are a pain in all implementations that I'm aware of.)

-Dave

Best,

I can't. You didn't quote it.

Eh? It's quoted in the copy I have here....

Are you referring to the bit about the triggers? If so, do you *really*
think that that's any less work than using transactions or clearing the
database? (Hint: it isn't. Triggers sound great in theory, but they
are a pain in all implementations that I'm aware of.)

Work is not the only dimension being considered here. It seemed to me
that the OP's objective was to avoid reloading the database for each
test, which isn't any more *human* work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably *speed*, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn't supposed to happen, is
bound to be faster than frequently reloading a huge database.

-Dave

Dave Aronson wrote:
[...]

It seemed to me
that the OP's objective was to avoid reloading the database for each
test, which isn't any more *human* work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably *speed*, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn't supposed to happen, is
bound to be faster than frequently reloading a huge database.

Faster than reloading? Perhaps. Faster than rolling back a
transaction? Doubtful. Faster than truncating a bunch of tables and
writing tests so they don't involve lots of unnecessary records? Almost
certainly not.

(Yes, I said "unnecessary". You really don't need the entire constant
table for tests.)

-Dave

Best,

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
  http://www.ralphkimball.com/html/booksDWT2.html
  http://philip.greenspun.com/sql/data-warehousing.html
]

Apart from the large "readonly" tables, I avoid preloaded fixtures for
all the reasons Marnen and others have pointed out: they're "brittle",
they tend to obscure tests by separating "expected" and "observed"
results, etc.

But in this case, it really makes sense to preload the db with large
static tables. The alternative -- reloading the tables for every test
-- would not give me more insights about the business logic in my
application, and would be prohibitively slow.

Peace out.

- ff

Fearless Fool wrote:

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test.

But you don't need to do that. For any given test, I'll bet that you
don't need more than 2 or 3 TimeDimension records. And you can create
those on the fly. You don't need 85,000 records of data to test with.
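
Something like this right in the test (the class and attribute names are
guesses based on your description):

  class ScheduleTest < ActiveSupport::TestCase
    def setup
      # two hand-picked rows are enough to exercise weekday vs. holiday logic
      @weekday = TimeDimension.create!(:timestamp   => Time.utc(2010, 6, 7, 9),
                                       :day_of_week => 'Monday',
                                       :holiday     => false)
      @holiday = TimeDimension.create!(:timestamp   => Time.utc(2010, 7, 5, 9),
                                       :day_of_week => 'Monday',
                                       :holiday     => true)
    end
  end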

Actually, you probably don't need them to deploy with either, but that's
another issue.

[...]

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
  http://www.ralphkimball.com/html/booksDWT2.html
  http://philip.greenspun.com/sql/data-warehousing.html
]

Skimmed the Greenspun. I don't buy it at all, particularly not for date
information. You may need fact tables for some things, but your
TimeDimension seems to be along the lines of

x | y | sum
--+---+----
1 | 1 |   2
1 | 2 |   3

But I don't want to lose the primary topic here...

[...]

But in this case, it really makes sense to preload the db with large
static tables.

No, it really doesn't. It really only makes sense to create a few
representative records for your actual tests.

Best,

Fearless Fool wrote:
[...]

It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

Oh, yeah, one other thing. For a whole variety of performance and
maintainability reasons, you probably should not be using MySQL for
something like this. Check out PostgreSQL.

Best,

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

I think I recall this kind of question being answered before. The gist was that you can define a 'facts' section of your database.yml (similar to your 'development' or 'test' section) and create the connection to the special 'facts' database for that model. It would be unaffected by the initialization of the 'test' database.

The implementation is something like:

class Fact < ActiveRecord::Base
  # Read-only "reference" models connect to the separate 'facts' database
  # in the test environment, so db:test:prepare never touches their data.
  establish_connection "facts" if
    RAILS_ENV == 'test' && configurations.has_key?('facts')
end

class TimeDimension < Fact
  # ...
end

Then uses of TimeDimension in the test environment will use the explicitly established connection. (Of course, you could do the same thing regardless of environment, too.)
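
The database.yml side is just another top-level entry next to
'development' and 'test' (the adapter and credentials below are
placeholders):

  # config/database.yml
  facts:
    adapter:  mysql
    database: myapp_facts
    username: root
    password:
    host:     localhost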

-Rob

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
http://www.ralphkimball.com/html/booksDWT2.html
http://philip.greenspun.com/sql/data-warehousing.html
]

Apart from the large "readonly" tables, I avoid preloaded fixtures for
all the reasons Marnen and others have pointed out: they're "brittle",
they tend to obscure tests by separating "expected" and "observed"
results, etc.

But in this case, it really makes sense to preload the db with large
static tables. The alternative -- reloading the tables for every test
-- would not give me more insights about the business logic in my
application, and would be prohibitively slow.

Peace out.

- ff

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

Rob Biedenharn wrote:

class Fact < ActiveRecord::Base
   establish_connection "facts" [snip]
end

class TimeDimension < Fact
end

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

I like it! That's an elegant approach, and I'm putting that on the
refactoring TODO list.

Many thanks.

- ff

Rob Biedenharn wrote:
[...]

I think I recall this kind of question being answered before. The
gist was that you can define a 'facts' section of your database.yml
(similar to your 'development' or 'test' section) and create the
connection to the special 'facts' database for that model. It would
be unaffected by the initialization of the 'test' database.

What, create a second database for that data? Why? More overhead, more
difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,

Perhaps you missed the part about this being a data warehouse. It's probably already a separate database. Technically, it's just a separate schema within a database. And why does this have to be different from production? If this is truly read-only data, then production or any environment can use exactly the same technique. (And multiple environments could use one golden copy of that reference data, too.)

-Rob

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

Rob Biedenharn wrote:

What, create a second database for that data? Why? More overhead, more
difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

Perhaps you missed the part about this being a data warehouse.

I didn't miss it. I don't think it's particularly relevant to *testing
approaches*, though.

It's
probably already a separate database.

The OP has given us no reason to believe this.

Technically, it's just a
separate schema within a database. And why does this have to be
different from production?

Because as far as we know, the OP is using one database for production.

If this is truly read-only data, then
production or any environment can use exactly the same technique. (And
multiple environments could use one golden copy of that reference
data, too.)

That's certainly true.

I still think, though, that all this runs into the expensive setup smell
that we generally try to avoid when testing. Your points are valid for
production, but no test environment needs 85,000 reference records.

-Rob

Best,

Marnen, Rob:

I agree with Rob that it makes good sense to partition constant data
into a separate db and use that consistently for development, test and
production.

Marnen, maybe I misunderstand you, but even if I knew a priori which
TimeDimension records my tests were going to use, it's a PITA to
generate TimeDimension records piecemeal. (What day did Easter fall on
in 2005?)

Furthermore, I have other large constant tables, such as the location of
FAA-certified airports in North America, cobbled from multiple sources.
Again, it would be a pain to figure out exactly which ones my test code
will reference and then ETL their records on demand.

ANYWAY, I'm in the process of setting up a "constant" database (separate
from development, test or production) and I have questions about that.
But I'll start a new post for that one. Watch for it... :slight_smile:

Thanks, y'all.

- ff