I'm using Rails 4 and the testing framework with which it ships.
I'm using a relational database that needs to be rigorously tested for
internal consistency. In this case, it's a sports statistics database
with player stats that are updated nightly.
I need to run tests like the following when new data arrives each night:
--That the sum of player stats equals the team's.
--That the sum of team stats equals the league's.
--That the sum of wins and losses equals games played.
--etc.
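For concreteness, here is a pure-Ruby sketch of the kinds of checks I mean, with plain hashes standing in for ActiveRecord models and made-up stat names:

```ruby
# Plain-hash sketch of the nightly consistency checks; in the real app
# these would read from ActiveRecord models. Stat names are hypothetical.
def player_sums_match_team?(team)
  %i[points rebounds].all? do |stat|
    team[:players].sum { |p| p[stat] } == team[stat]
  end
end

def wins_losses_match_games?(team)
  team[:wins] + team[:losses] == team[:games_played]
end
```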
For now I'm copying my development database over to testing and running
these alongside my unit tests in the /test/models/ directory.
This is an awkward set-up: my database-testing code doesn't consist of
unit tests in the proper sense, and it doesn't rely on fixtures, which
is what the Rails documentation suggests this directory is for.
My question is: In Rails, what is the best practice for database testing
like that which I describe above?
I would start by, as far as practical, normalising the database so
that there are no such potential inconsistencies.
Next, if it is suggested that fully normalising it is not practical,
probe the logic behind that claim carefully to make sure it is really
true. Do not just assume that something that seems like a good idea
from a performance standpoint is necessarily worthwhile. Decades of
experience in the software industry have shown time and again that the
bottlenecks in an application are virtually never where they are
initially anticipated.
Next, if fully normalising it really is not practical, wrap all
updates in transactions, so that if an inconsistency ever arises it
must be a bug.
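To illustrate the all-or-nothing idea, here is a toy in-memory store; in Rails you would wrap the nightly import in `ActiveRecord::Base.transaction` instead, and the database would do the rollback for you:

```ruby
# Toy all-or-nothing update. A failed import restores the snapshot, so
# a crash partway through can never leave a partial (inconsistent) state.
class Store
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Snapshot the state, yield, and restore the snapshot on any error.
  def transaction
    snapshot = Marshal.load(Marshal.dump(@rows))
    yield self
  rescue StandardError
    @rows = snapshot
    raise
  end

  def upsert(key, value)
    @rows[key] = value
  end
end
```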
Next, have a comprehensive set of automated tests that attempt to get
inconsistent data into the database.
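Such tests would try to defeat a guard like the following hypothetical one, and assert that the bad record is rejected:

```ruby
# Hypothetical guard the import calls on every record: refuse any row
# whose wins + losses disagree with its games-played count.
def validate_team!(team)
  unless team[:wins] + team[:losses] == team[:games_played]
    raise ArgumentError, "inconsistent record for #{team[:name]}"
  end
  team
end
```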
Finally, have a task that runs on the server at appropriate times
(possibly via cron, or invoked automatically when data are updated),
performs the checks, and sounds a siren if a bug is detected.
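Such an audit might look like the following sketch (in Rails, a rake task run from cron; the league/team structure and stat names are hypothetical). "Sounding the siren" would then be raising, emailing, or paging whenever the returned list is non-empty:

```ruby
# Periodic audit over a whole league. Returns human-readable failure
# messages; an empty array means every check passed.
def nightly_audit(league)
  failures = []
  league[:teams].each do |team|
    player_points = team[:players].sum { |p| p[:points] }
    if player_points != team[:points]
      failures << "#{team[:name]}: player points != team points"
    end
    if team[:wins] + team[:losses] != team[:games_played]
      failures << "#{team[:name]}: wins + losses != games played"
    end
  end
  if league[:teams].sum { |t| t[:points] } != league[:points]
    failures << "league points != sum of team points"
  end
  failures
end
```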
It's not practical to change the database, as the data arrive from a
3rd-party provider with a particular structure.
Your note made me realize that I'm going about this the wrong way,
however. It didn't even occur to me to use transactions on insert; I'm
accustomed to an old system that doesn't support transactions, so it
didn't enter my thinking. What I'm trying to accomplish shouldn't
really be part of my testing code.
Long experience has taught me that if you're dealing with data with a crappy structure, do not try to clean it up outside the db. Use the tools that a powerful database like PG provides. Create a set of tables which mirrors *exactly* the structure of the data you get, no matter how awful it is. Add appropriate indexes. Test everything you can think of. Then move the verified data into a set of reasonably normalized tables.
Now that does assume a pretty bad data structure being handed to you, which is more often than not the case when dealing with legacy systems. But if the data as you're getting it is actually structured reasonably well, but you just want to verify consistency, feel free to ignore that advice and skip the "staging" tables. I just wanted to point out that you really don't have to be bound by the structure of what you get.
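A toy version of that staging-table pipeline, with in-memory arrays standing in for the real PG staging and normalized tables (names and the single check are hypothetical):

```ruby
# Keep raw rows exactly as received, and promote only rows that pass
# verification into the clean set; rejects stay behind for inspection.
class Importer
  attr_reader :staging, :clean

  def initialize
    @staging = []
    @clean = []
  end

  def ingest(rows)
    @staging.concat(rows)   # store the feed verbatim, however ugly
  end

  # Move verified rows to the clean set; leave rejects in staging so
  # they can be examined and taken up with the data provider.
  def promote!
    verified, rejects = @staging.partition do |r|
      r[:wins] + r[:losses] == r[:games_played]
    end
    @clean.concat(verified)
    @staging = rejects
  end
end
```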