lookups on large sql server data base with Activerecord that is currently using shards

Jedrin · March 14, 2013, 2:32pm

We have a sql server database which can have something like 50 meg of records of any one type. The way that the database was originally designed with activerecord/sql is that every time you get a couple of million records or so, it creates a new shard or a new table. In effect for one record type we might have 20 or 30 different tables (or shards) that all have the same schema. When you need to look up a particular record, you get it’s sched field and see which shard (tables) it is in and query those shards. This also involves a manual step in that someone has to monitor the system and go to the web server and click on “create new shard” whenever there are too many records in the currently active shard which is where newly created records go.

We are looking at a new approach that involves clustered indexes or non clustered indexes built on some of the fields we typically query on. It then occurred to me however that if we created a parent record for each sched and had the records use a foriegn key to that parent, perhaps we could get similar performance for our data and we could not have to worry about the shards ?

This seems like more of a rails activerecord approach and I am very familiar with that way of doing things, but somewhat less familiar with larger data sets such as what I am describing, though when I thought about it, I do recall that there are large data sets used by some companies, though I forget if there was other strategies such as caching and such used as well.

We currently have 4 different sets of sharded record types. One of them looks something like below for where clauses and group/order. Where I have a * it indicates that in the Activerecord find(:all) call it may or may not be passed, that is what is in the conditions part can vary. Such a lookup can return many thousands of records all having the same sched, script etc.

where clause: sched, *script

group/order: message_id, timeslice, script, label, scale, scaled_units, *type

My first pass was to create a clustered indexd on sched, script, and label. However I am wondering if there are easier approaches such as I indicated above that might make more sense ?

Topic		Replies	Views
Automatic database shard switching with rails rubyonrails-talk activerecord	0	73	November 7, 2024
Handling Hot Data with ActiveRecord rubyonrails-talk	10	198	July 23, 2015
distribution entries for model over multiple tables rubyonrails-talk	1	165	November 22, 2008
[Feature Proposal] define a shards method for ActiveRecord abstract classes rubyonrails-core activerecord	5	1011	December 11, 2020
Best way to go - query large database? rubyonrails-talk	7	149	January 17, 2007

lookups on large sql server data base with Activerecord that is currently using shards

Related topics

More Resources