Testing without Fixtures (?)

I'm writing unit tests for a Rails app whose behavior depends on the relationships of tens of thousands of items in a DB. I need this data in the test DB for a lot of tests. Storing this data as fixtures, keeping it in sync with the production DB, and waiting HOURS for the fixtures to load whenever the test suite is run are all untenable options.

Here is what I *think* I want to do:
* Create a reference archive of the test DB once all the data is loaded. This archive will be checked into the repo so anyone can load it (quickly and easily, with a Capistrano task) before running the test suite.

* Trick Rails into believing the data was loaded from fixtures, so tests will behave normally. Presumably this means overriding the standard procedure of deleting, re-inserting, and instantiating test data that happens before each test method. I suspect this could be done with a test_helper method, without having to modify ActiveRecord.
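One way to get the effect of the second bullet without faking fixtures: load the archive into the test DB once, and let each test run inside a transaction that is rolled back afterward. That is what Rails' transactional fixtures setting (`use_transactional_fixtures`, renamed `use_transactional_tests` in Rails 5) already does, and as far as I know it leaves alone any table that has no fixture file. The plain-Ruby sketch below models that "load once, reset cheaply" pattern without Rails or a database; `ReferenceStore` and the sample data are hypothetical.

```ruby
# Hypothetical, DB-free model of "load the reference data once, then give
# every test a clean copy". In a real Rails suite the test database plays the
# role of the snapshot and a rolled-back transaction plays the role of the copy.
class ReferenceStore
  attr_reader :load_count

  # loader: a block that performs the expensive load (e.g. restoring an archive).
  def initialize(&loader)
    @loader = loader
    @load_count = 0
    @snapshot = nil
  end

  # Yields a deep copy of the reference data. Whatever a test changes is
  # thrown away afterward, the way ROLLBACK discards a test's writes.
  def with_fresh_copy
    yield Marshal.load(snapshot)
  end

  private

  # Runs the loader only the first time and caches a serialized snapshot.
  def snapshot
    @snapshot ||= begin
      @load_count += 1
      Marshal.dump(@loader.call)
    end
  end
end

# Example: a tiny stand-in for the item/category join data.
store = ReferenceStore.new { { "cheddar" => %w[cheeses dairy] } }
store.with_fresh_copy { |db| db["cheddar"] << "snacks" }   # one test mutates its copy
categories = store.with_fresh_copy { |db| db["cheddar"] }  # the next test sees clean data
puts categories.inspect
puts store.load_count
```

The design choice mirrors transactional tests: pay for the expensive load once per run, and make the per-test reset cheap.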

Concerns:
* This seems to be in violation of the Rails philosophy of "make the right things easy and the wrong things hard."

* I would expect this to be a fairly common problem, but after a lot of Google searching, I have found no one else trying to do this sort of thing.

This makes me think there is a better way, or at least a different way that someone has already implemented.

What I am looking for:
* A reality check. Is this the right way to handle this problem?

* Technical guidance. I am a relative novice (read: n00b) at Ruby development, so any advice on things to watch out for as I work this out would be helpful.

- OR -

* Sample code. If someone has already solved this problem and talked about it somewhere, just point me in their direction.

Thanks, John

> I'm writing unit tests for a Rails app whose behavior depends on the relationships of tens of thousands of items in a DB. I need this data in the test DB for a lot of tests. Storing this data as fixtures, keeping it in sync with the production DB, and waiting HOURS for the fixtures to load whenever the test suite is run are all untenable options.

This is the first red flag: test data shouldn't need to be kept "in sync with the production DB"; that's why it's TEST DATA.

> Concerns:
> * This seems to be in violation of the Rails philosophy of "make the right things easy and the wrong things hard."

This does sound hard, and I think it's the wrong thing...

Even when the production DB is large, there should be a way to trim the data down to a usable set of test fixtures. For example, I've recently been working with code that computes zip code distances; the production table mapping zips to coordinates has 50k+ records, but my fixture contains only the few zips needed by the rest of the test data set. I'd suggest a similar strategy for your data set.
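A minimal plain-Ruby sketch of that trimming strategy: keep only the lookup rows that the rest of the test data references, then shape them like a Rails YAML fixture file (label => column values). The method names, the `zip`/`lat`/`lon` columns, the `stores` records, and the coordinate values are all made up for illustration.

```ruby
require "yaml"

# Keep only the lookup rows whose keys are referenced elsewhere in the
# test data. Fails loudly if the test data points at a key that does not exist.
def trim_lookup(full_table, referenced_keys)
  wanted = referenced_keys.uniq
  missing = wanted - full_table.keys
  raise ArgumentError, "unknown keys: #{missing.join(', ')}" unless missing.empty?
  full_table.slice(*wanted)
end

# Shape the trimmed rows like a Rails YAML fixture file: label => column values.
def zip_fixtures(rows)
  rows.each_with_object({}) do |(zip, (lat, lon)), fixtures|
    fixtures["zip_#{zip}"] = { "zip" => zip, "lat" => lat, "lon" => lon }
  end
end

# Made-up sample data: a "production" table and the test records that use it.
all_zips = {
  "02139" => [42.36, -71.10],
  "10001" => [40.75, -73.99],
  "94103" => [37.77, -122.41],
}
stores = [{ name: "Store A", zip: "02139" }, { name: "Store B", zip: "94103" }]

trimmed = trim_lookup(all_zips, stores.map { |s| s[:zip] })
fixtures = zip_fixtures(trimmed)
puts fixtures.to_yaml  # contents for a hypothetical test/fixtures/zips.yml
```

Raising on unknown keys is a deliberate choice: if the trimmed fixture and the rest of the test data drift apart, the generator fails instead of silently producing dangling references.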

--Matt Jones

Because of an NDA, I can't say specifically what is in the DB or why it all needs to be there, but I'll try to explain by analogy.

Suppose I have an app that tries to intelligently generate a grocery shopping list. The DB has a table for all the items sold by the grocery store. There is also a table for categories of groceries, and a join table that associates the items with the categories. Cheddar, for instance, could be joined with the categories cheeses, dairy, items_that_require_refrigeration, etc. We also have a table for the different scenarios we might be shopping for, e.g. general_grocery, camping_trip, dinner_party, christmas_baking, etc., and some items are joined with certain scenarios.

The way this might play out in our app: We select dinner_party as our scenario and indicate that we need to buy some cheese. The app provides a list of cheeses to pick from. If we add camembert, perhaps the list suggests a bottle of white wine that pairs well with the cheese, but if we add parmesan, the list gives us the option of adding spaghetti, pasta sauce, and/or all the raw ingredients to make our own pasta sauce.

Suppose the scenario is camping_trip, and we add graham crackers. Chocolate and marshmallows are added to the list automatically. However, if we add graham crackers while shopping for general_grocery, those items are not added.

Some of that behavior may sound a bit obnoxious for a shopping list, but remember this is an analogy; in the real app, that sort of thing is more appropriate. The point is that the application's behavior depends heavily on the grouping relationships of a large data set. So our tests really need to verify the data relationships as much as the code, and I can't do that if I use a different data set or restrict it to a smaller, more manageable one.

There are tools that can trim down the data in a way that respects the relationships between the tables (http://jailer.sf.net/, for instance).
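The core idea behind relationship-aware subsetting can be sketched as a graph walk: start from a few seed rows, always follow foreign keys to the parent rows they point at (so nothing dangles), and follow only selected foreign keys in reverse to pull in child rows. This is a hypothetical plain-Ruby illustration using made-up tables from the grocery analogy (`scenarios`, `items`, and a `scenario_items` join table); it is not Jailer's code or configuration format.

```ruby
require "set"

# Hypothetical relationship-aware subsetting.
#   tables:  { "table" => { id => row_hash } }
#   edges:   [[child_table, fk_column, parent_table], ...]
#   seeds:   [[table, id], ...] rows the tests care about
#   reverse: [[child_table, fk_column], ...] edges also walked child-ward
def subset(tables, edges, seeds, reverse: [])
  keep = Hash.new { |h, table| h[table] = Set.new }
  queue = seeds.dup

  until queue.empty?
    table, id = queue.shift
    next unless keep[table].add?(id)  # skip rows already collected
    row = tables.fetch(table).fetch(id)

    edges.each do |child, fk, parent|
      # Parent direction: always followed, so no kept row has a dangling key.
      queue << [parent, row[fk]] if child == table && row[fk]

      # Child direction: only for edges the caller opted into, since following
      # every edge this way tends to drag in most of the database.
      next unless parent == table && reverse.include?([child, fk])
      tables.fetch(child).each do |child_id, child_row|
        queue << [child, child_id] if child_row[fk] == id
      end
    end
  end

  keep.to_h { |table, ids| [table, tables[table].slice(*ids.to_a)] }
end

tables = {
  "scenarios" => { 1 => { "name" => "camping_trip" }, 2 => { "name" => "dinner_party" } },
  "items" => {
    10 => { "name" => "graham crackers" },
    11 => { "name" => "marshmallows" },
    12 => { "name" => "camembert" },
  },
  "scenario_items" => {
    100 => { "scenario_id" => 1, "item_id" => 10 },
    101 => { "scenario_id" => 1, "item_id" => 11 },
    102 => { "scenario_id" => 2, "item_id" => 12 },
  },
}
edges = [
  ["scenario_items", "scenario_id", "scenarios"],
  ["scenario_items", "item_id", "items"],
]

# Everything needed to test the camping_trip scenario, and nothing else.
camping = subset(tables, edges, [["scenarios", 1]],
                 reverse: [["scenario_items", "scenario_id"]])
camping.each { |table, rows| puts "#{table}: #{rows.keys.sort.inspect}" }
```

Against a real database each step would be a query per table rather than an in-memory scan, but the traversal rule is the same: the subset stays referentially intact while leaving out unrelated scenarios and items.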

Thanks, that's quite helpful.