We have a sql server database which can have something like 50 meg of records of any one type. The way that the database was originally designed with activerecord/sql is that every time you get a couple of million records or so, it creates a new shard or a new table.
In effect for one record type we might have 20 or 30 different tables (or shards) that all have the same schema. When you need to look up
a particular record, you get it’s sched field and see which shard (tables) it is in and query those shards. This also involves a manual step
in that someone has to monitor the system and go to the web server and click on “create new shard” whenever there are too many records in
the currently active shard which is where newly created records go.
We are looking at a new approach that involves clustered indexes or non clustered indexes built on some of the fields we typically
query on. It then occurred to me however that if we created a parent record for each sched and had the records use a foriegn key to that
parent, perhaps we could get similar performance for our data and we could not have to worry about the shards ?
This seems like more of a rails activerecord approach and I am very familiar with that way of doing things, but somewhat less familiar with larger data sets such as what I am describing, though when I thought about it, I do recall that there are large data sets used by some
companies, though I forget if there was other strategies such as caching and such used as well.
We currently have 4 different sets of sharded record types. One of them looks something like below for where clauses and group/order.
Where I have a * it indicates that in the Activerecord find(:all) call it may or may not be passed, that is what is in the conditions part can vary. Such a lookup can return many thousands of records all having the same sched, script etc.
where clause: sched, *script
group/order: message_id, timeslice, script, label, scale, scaled_units, *type
My first pass was to create a clustered indexd on sched, script, and label. However I am wondering if there are easier approaches such as I indicated above that might make more sense ?