How to run unit tests w/o first blowing away test db?

The documentation on fixtures makes a tantalizing claim:

If you preload your test database with all fixture data
(probably in the Rakefile task) and use transactional
fixtures, then you may omit all fixtures declarations
in your test cases since all the data’s already there
and every case rolls back its changes.

Yup! That's exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter. I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test

But then how do I run unit tests while preserving my carefully preloaded
data?
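
(For context, my test/test_helper.rb is more or less the stock Rails 2.x
one -- transactional fixtures on, no fixtures declarations anywhere:)

  ENV["RAILS_ENV"] = "test"
  require File.expand_path(File.dirname(__FILE__) + "/../config/environment")
  require 'test_help'

  class ActiveSupport::TestCase
    # Every test runs inside a transaction that gets rolled back afterwards,
    # so reads of the preloaded tables are cheap and writes never stick.
    self.use_transactional_fixtures = true
    self.use_instantiated_fixtures  = false
    # note: no "fixtures :all" here -- the data is already loaded
  end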

- ff

[some details...]

I know I can't simply use 'rake test:units', defined as:

  Rake::TestTask.new(:units => "db:test:prepare") do |t|
    t.libs << "test"
    t.pattern = 'test/unit/**/*_test.rb'
    t.verbose = true
  end
  Rake::Task['test:units'].comment = "Run the unit tests in test/unit"

The dependency on "db:test:prepare" clears the database. I could define
a rake task "test:units_without_blowing_away_my_data_please" that
eliminates that dependency, or perhaps better, define a task that
loads the fixtures before running the unit tests.
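
Something along these lines, for instance (task names invented, and I
haven't actually committed to this):

  namespace :test do
    # Same as test:units, but without the db:test:prepare dependency,
    # so the preloaded data survives.
    Rake::TestTask.new(:units_preserving_data) do |t|
      t.libs << "test"
      t.pattern = 'test/unit/**/*_test.rb'
      t.verbose = true
    end
    Rake::Task['test:units_preserving_data'].comment =
      "Run the unit tests in test/unit without rebuilding the test db"

    desc "Reload the fixtures into the test db, then run the unit tests"
    task :units_with_preloaded_fixtures do
      system("rake db:fixtures:load RAILS_ENV=test") || abort("fixture load failed")
      Rake::Task['test:units_preserving_data'].invoke
    end
  end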

But the comment in the fixture documentation makes me think that there's
an easier way. Is there?

Fearless Fool wrote:

The documentation on fixtures makes a tantalizing claim:

> If you preload your test database with all fixture data
> (probably in the Rakefile task) and use transactional
> fixtures, then you may omit all fixtures declarations
> in your test cases since all the data’s already there
> and every case rolls back its changes.

Yup! That's exactly what I want: I have some LARGE constant tables that
are used pervasively, so I want to load them ONCE and exploit the
efficiency of transactional fixtures thereafter.

No you don't. You don't need all the large constant table data to run
your tests.

Besides, as you already know, fixtures are crap. What you want to do is
use factories to craft exactly the data you need for each individual
unit test. This will normally only be a few records at a time.
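
For example, with factory_girl (the model and attributes here are
invented purely for illustration):

  # test/factories.rb
  Factory.define :user do |f|
    f.name  "Alice Example"
    f.email "alice@example.com"
  end

  # in an individual test, create only the records that test needs:
  alice = Factory.create(:user)
  bob   = Factory.create(:user, :name => "Bob", :email => "bob@example.com")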

I know I can preload
the test database from fixtures like this:

% rake db:fixtures:load RAILS_ENV=test

But then how do I run unit tests while preserving my carefully preloaded
data?

You don't. Good test isolation *requires* clearing the database for
every test.

Best,

Good ideal to strive for... but not absolutely necessary all the time.
99% of the time, whether in testing Rails apps or anything else,
wisdom lies in knowing where to draw the line... and it's rarely all
the way to one side or the other.

-Dave, foolish about many things, but at least possessed of this one
nugget of wisdom :slight_smile:

Dave Aronson wrote:

Fearless Fool wrote:

But then how do I run unit tests while preserving my carefully preloaded
data?

You don't. Good test isolation *requires* clearing the database for
every test.

Good ideal to strive for... but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
*absolutely necessary all the time* to reset the DB to a known state.
Usually that means clearing it.

Ah, there's the key: "Usually". So, the line is at least a smidgen
never-ward of always. I'll grant you at least a few "nines" of
always-ness, but it's still not 100%. Except by rounding. :slight_smile:

*If* a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF's situation, then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.
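
Just to illustrate what I mean (PostgreSQL syntax, and all the names are
invented -- I'm not claiming it's worth the hassle):

  class MakeReferenceDataReadOnly < ActiveRecord::Migration
    def self.up
      execute <<-SQL
        CREATE FUNCTION refuse_writes() RETURNS trigger AS $$
        BEGIN
          RAISE EXCEPTION 'reference_data is read-only';
        END;
        $$ LANGUAGE plpgsql;

        CREATE TRIGGER reference_data_read_only
          BEFORE INSERT OR UPDATE OR DELETE ON reference_data
          FOR EACH STATEMENT EXECUTE PROCEDURE refuse_writes();
      SQL
    end

    def self.down
      execute "DROP TRIGGER reference_data_read_only ON reference_data"
      execute "DROP FUNCTION refuse_writes()"
    end
  end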

On the other claw, if there aren't a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,
or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),
the point may be moot....

-Dave

Dave Aronson wrote:

Good ideal to strive for... but not absolutely necessary all the time.

Wrong! If the test has any potential of touching the database, it is
*absolutely necessary all the time* to reset the DB to a known state.
Usually that means clearing it.

Ah, there's the key: "Usually". So, the line is at least a smidgen
never-ward of always. I'll grant you at least a few "nines" of
always-ness, but it's still not 100%. Except by rounding. :slight_smile:

Setting the database to a known state must be 100%. Clearing (as
opposed to resetting to known constant data) may be only 99.999...%.

*If* a given database (or table or set of tables) is absolutely
constant, so that it will not be added to or subtracted from during
the course of any test, only referenced, as seems to be the case in
FF's situation,

But there is no way to guarantee that the "constant" table is in fact
constant, unless we get into using permissions in the DB (which will
probably cause more problems than they solve). The values in it could
get accidentally changed at any time.

then it is (IMVHO) sufficient to only clear-and-load
it at the start of the entire suite, rather than incur the penalties
of doing so on every test. Perhaps that could even be enforced (at
least for the duration of the tests) with some triggers.

That is only true insofar as the table can be guaranteed read-only,
which it usually can't be. Transactional tests will probably work here,
though.

On the other claw, if there aren't a large number of tests, or he can
make do with a much smaller (i.e., faster to clear and load) database,

The number of tests has nothing to do with it. I routinely run fairly
large test suites with a small database (by using factories, so that
each test only has a few records in the DB while it runs).

or the database clearing and loading can be done by a much faster
mechanism (such as pointing a symlink to a different set of files?),

What, set up multiple test databases? Yuck. (Of course, it's entirely
possible that the DBMS is doing this and abstracting it anyway.)

the point may be moot....

-Dave

Best,

So what happens if there is a bug that accidentally writes to that
table during a test?

Colin

Reread last sentence. :slight_smile:

-Dave

Dave Aronson wrote:

So what happens if there is a bug that accidentally writes to that
table during a test?

Reread last sentence. :slight_smile:

I can't. You didn't quote it.

Are you referring to the bit about the triggers? If so, do you *really*
think that that's any less work than using transactions or clearing the
database? (Hint: it isn't. Triggers sound great in theory, but they
are a pain in all implementations that I'm aware of.)

-Dave

Best,

I can't. You didn't quote it.

Eh? It's quoted in the copy I have here....

Are you referring to the bit about the triggers? If so, do you *really*
think that that's any less work than using transactions or clearing the
database? (Hint: it isn't. Triggers sound great in theory, but they
are a pain in all implementations that I'm aware of.)

Work is not the only dimension being considered here. It seemed to me
that the OP's objective was to avoid reloading the database for each
test, which isn't any more *human* work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably *speed*, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn't supposed to happen, is
bound to be faster than frequently reloading a huge database.

-Dave

Dave Aronson wrote:
[...]

It seemed to me
that the OP's objective was to avoid reloading the database for each
test, which isn't any more *human* work than doing it once (at least,
having it in the standard setup section versus each test). Hence his
objective is probably *speed*, which is also important for tests, in
order to run them frequently. Having a trigger, or constraint, or
other such mechanism, on something that isn't supposed to happen, is
bound to be faster than frequently reloading a huge database.

Faster than reloading? Perhaps. Faster than rolling back a
transaction? Doubtful. Faster than truncating a bunch of tables and
writing tests so they don't involve lots of unnecessary records? Almost
certainly not.

(Yes, I said "unnecessary". You really don't need the entire constant
table for tests.)

-Dave

Best,

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
  http://www.ralphkimball.com/html/booksDWT2.html
  http://philip.greenspun.com/sql/data-warehousing.html
]

Apart from the large "readonly" tables, I avoid preloaded fixtures for
all the reasons Marnen and others have pointed out: they're "brittle",
they tend to obscure tests by separating "expected" and "observed"
results, etc.

But in this case, it really makes sense to preload the db with large
static tables. The alternative -- reloading the tables for every test
-- would not give me more insights about the business logic in my
application, and would be prohibitively slow.

Peace out.

- ff

Fearless Fool wrote:

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test.

But you don't need to do that. For any given test, I'll bet that you
don't need more than 2 or 3 TimeDimension records. And you can create
those on the fly. You don't need 85,000 records of data to test with.
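
Something like this right in the test (the class and attribute names are
guesses based on your description):

  class ScheduleTest < ActiveSupport::TestCase
    def setup
      # two hand-picked rows are enough to exercise weekday vs. holiday logic
      @weekday = TimeDimension.create!(:timestamp   => Time.utc(2010, 6, 7, 9),
                                       :day_of_week => 'Monday',
                                       :holiday     => false)
      @holiday = TimeDimension.create!(:timestamp   => Time.utc(2010, 7, 5, 9),
                                       :day_of_week => 'Monday',
                                       :holiday     => true)
    end
  end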

Actually, you probably don't need them to deploy with either, but that's
another issue.

[...]

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
  http://www.ralphkimball.com/html/booksDWT2.html
  http://philip.greenspun.com/sql/data-warehousing.html
]

Skimmed the Greenspun. I don't buy it at all, particularly not for date
information. You may need fact tables for some things, but your
TimeDimension seems to be along the lines of

x | y | sum
--+---+----
1 | 1 |   2
1 | 2 |   3

But I don't want to lose the primary topic here...

[...]

But in this case, it really makes sense to preload the db with large
static tables.

No, it really doesn't. It really only makes sense to create a few
representative records for your actual tests.

Best,

Fearless Fool wrote:
[...]

It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

Oh, yeah, one other thing. For a whole variety of performance and
maintainability reasons, you probably should not be using MySQL for
something like this. Check out PostgreSQL.

Best,

In the spirit of peace love and understanding, I'll start by saying that
I appreciate Marnen's philosophy, and I agree that "if the test has any
potential of touching the database, it is *absolutely necessary all the
time* to reset the DB to a known state."

But in my application, I have a few huge tables that are used
exclusively as reference data, are required for running any meaningful
tests and are never modified by the application.

For example, one is a TimeDimension table with one entry per hour for 10
years (that's 85K+ rows). Each row is a "highly decorated TimeStamp"
which maps a timestamp to day of week as a string, whether or not it's a
holiday and 17 other columns of dimensional goodness. It would take a
LONG time to set it up and tear it down for each test. In comparison to
loading the table, MySQL's rollback mechanism is breathtakingly fast.

I think I recall this kind of question being answered before. The gist was that you can define a 'facts' section of your database.yml (similar to your 'development' or 'test' section) and create the connection to the special 'facts' database for that model. It would be unaffected by the initialization of the 'test' database.

The implementation is something like:

class Fact < ActiveRecord::Base
  # Read-only "reference" models connect to the separate 'facts' database
  # in the test environment, so db:test:prepare never touches their data.
  establish_connection "facts" if
    RAILS_ENV == 'test' && configurations.has_key?('facts')
end

class TimeDimension < Fact
  # ...
end

Then uses of TimeDimension in the test environment will use the explicitly established connection. (Of course, you could do the same thing regardless of environment, too.)
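
The database.yml side is just another top-level entry next to
'development' and 'test' (the adapter and credentials below are
placeholders):

  # config/database.yml
  facts:
    adapter:  mysql
    database: myapp_facts
    username: root
    password:
    host:     localhost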

-Rob

[Before you ask why the heck a giant table is preferable to simply using
the functions associated with Date and Time, I'll refer to you to:
http://www.ralphkimball.com/html/booksDWT2.html
http://philip.greenspun.com/sql/data-warehousing.html
]

Apart from the large "readonly" tables, I avoid preloaded fixtures for
all the reasons Marnen and others have pointed out: they're "brittle",
they tend to obscure tests by separating "expected" and "observed"
results, etc.

But in this case, it really makes sense to preload the db with large
static tables. The alternative -- reloading the tables for every test
-- would not give me more insights about the business logic in my
application, and would be prohibitively slow.

Peace out.

- ff

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

Rob Biedenharn wrote:

class Fact < ActiveRecord::Base
   establish_connection "facts" [snip]
end

class TimeDimension < Fact
end

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

I like it! That's an elegant approach, and I'm putting that on the
refactoring TODO list.

Many thanks.

- ff

Rob Biedenharn wrote:
[...]

I think I recall this kind of question being answered before. The
gist was that you can define a 'facts' section of your database.yml
(similar to your 'development' or 'test' section) and create the
connection to the special 'facts' database for that model. It would
be unaffected by the initialization of the 'test' database.

What, create a second database for that data? Why? More overhead, more
difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,

Perhaps you missed the part about this being a data warehouse. It's probably already a separate database. Technically, it's just a separate schema within a database. And why does this have to be different from production? If this is truly read-only data, then production or any environment can use exactly the same technique. (And multiple environments could use one golden copy of that reference data, too.)

-Rob

Rob Biedenharn
Rob@AgileConsultingLLC.com http://AgileConsultingLLC.com/
rab@GaslightSoftware.com http://GaslightSoftware.com/

Rob Biedenharn wrote:

What, create a second database for that data? Why? More overhead, more
difference from production, more general hassle.

I still maintain that the right answer is not to test with all 85,000
records. Just create what you need for a particular test.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
marnen@marnen.org

Perhaps you missed the part about this being a data warehouse.

I didn't miss it. I don't think it's particularly relevant to *testing
approaches*, though.

It's
probably already a separate database.

The OP has given us no reason to believe this.

Technically, it's just a
separate schema within a database. And why does this have to be
different from production?

Because as far as we know, the OP is using one database for production.

If this is truly read-only data, then
production or any environment can use exactly the same technique. (And
multiple environments could use one golden copy of that reference
data, too.)

That's certainly true.

I still think, though, that all this runs into the expensive setup smell
that we generally try to avoid when testing. Your points are valid for
production, but no test environment needs 85,000 reference records.

-Rob

Best,

Marnen, Rob:

I agree with Rob that it makes good sense to partition constant data
into a separate db and use that consistently for development, test and
production.

Marnen, maybe I misunderstand you, but even if I knew a priori which
TimeDimension records my tests were going to use, it's a PITA to
generate TimeDimension records piecemeal. (What day did Easter fall on
in 2005?)

Furthermore, I have other large constant tables, such as the location of
FAA-certified airports in North America, cobbled from multiple sources.
Again, it would be a pain to figure out exactly which ones my test code
will reference and then ETL their records on demand.

ANYWAY, I'm in the process of setting up a "constant" database (separate
from development, test or production) and I have questions about that.
But I'll start a new post for that one. Watch for it... :slight_smile:

Thanks, y'all.

- ff