dynamic tablenames

Hi List!

May I tap into your combined common sense?

I have a legacy logging app that needs to be modernized. The data will be stored in 5 tables; one table will contain more than 200 million rows and 2 others about 60 million each. MySQL will be used, unless someone sees a major advantage in using something else (at the moment the data is stored in a raw positional file format, which I'm not keeping).

For performance reasons I'm thinking about storing the data for each entity in separate tables. So instead of a 'logdata' table with >200 million rows, I'd create 'logdata_001', 'logdata_002', ... tables, where the row count of even the biggest entity's logdata table stays below 3 million.

Rails seems to have no problem with me setting the table name (set_table_name) on each incoming request before accessing the database.
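For reference, computing the per-entity table name is just string formatting; a minimal sketch (the zero-padded three-digit suffix and the helper name are assumptions taken from the naming above):

```ruby
# Pure string formatting -- the zero-padded width is an assumption taken
# from the 'logdata_001' naming above.
def logdata_table_for(entity_id)
  format("logdata_%03d", entity_id)
end

# In the model, per request, one would then do something like:
#   Logdata.set_table_name(logdata_table_for(current_entity_id))
```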

I have this gut feeling though I’m overlooking something nasty.

Anyone have any comments/suggestions? Am I about to create my own personal maintenance nightmare?

Thanks in advance for your insights!

Piet.

I’m doing a similar thing [setting information about the database on the fly], but I’m having to change a model’s table_name_prefix. The one concern I have [though it hasn’t come up so far] is the possibility of two requests being made at, or close to, the same time and one mistakenly using the other’s setup. I’m thinking of using transactions, perhaps. I dunno, but I too am interested in whether anyone else has done on-the-fly database configuring.
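To spell the concern out: a model's table name is class-level state, so it's shared by everything running in one interpreter. A tiny plain-Ruby illustration (no Rails involved; FakeModel is invented for the example):

```ruby
# No Rails here -- FakeModel is invented to show that a class attribute,
# like ActiveRecord's table name, is state shared by the whole process.
class FakeModel
  class << self
    attr_accessor :table_name
  end
  self.table_name = "site_a_posts"
end

# "Request 1" switches the table name...
FakeModel.table_name = "site_b_posts"

# ...and any other request handled by this same process now sees it too.
FakeModel.table_name  # => "site_b_posts"
```

Whether that bites you depends on whether two requests can actually share a process, which is what the "shared nothing" reply below this addresses.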

RSL

Nothing from one request is shared with another one (it's the basis of the "shared nothing" architecture) *.

If you use a before_filter on the controller you should be OK (this is also how acts_as_authenticated works).

bye Luca

* With the exception of the session data, which is shared by all the requests coming from the same browser session.

Why not just have a table with >200m rows? As far as I’m aware, there are no real performance advantages to splitting it up. The point of a database is that it can just eat up data without causing any performance issues.

Although I can’t imagine what on earth you’d want 200m rows for! Modeling the cells of a small rodent, maybe?

-Nathan

> Hi List!
>
> May I tap into your combined common sense ? I have a legacy logging app that needs to be modernized. At the moment, Data is going to be stored in 5 tables, one table is going to contain more than 200 million and 2 others about 60 million rows. Mysql will be used, unless someone sees a major advantage in using something else (at the moment data is stored in a raw positioned file format, which I'm not keeping).

I'd go with Postgres unless you have some really solid reasons for staying with MySQL.

I suggest you set aside a couple of days to configure both DBs (pg ships with very modest defaults -- performance will suffer if you don't give it a bit more shared and working memory, and do turn on autovacuum), load the data, run VACUUM ANALYZE, add some indexes, and then try to run a few queries.
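For reference, those knobs live in postgresql.conf; illustrative values only, not recommendations -- sensible numbers depend entirely on your hardware, and units vary by version (older 8.x releases take shared_buffers in 8 kB pages rather than memory units):

```
# postgresql.conf -- illustrative starting points only
shared_buffers = 128MB     # default is far lower; give pg real shared memory
work_mem = 16MB            # per-sort/hash working memory
autovacuum = on            # automatic dead-tuple cleanup and stats updates
```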

Try some fancy subqueries, for instance. IIRC, MySQL performance suffers badly from not being able to use an index more than once per query, or something along those lines.

> For performance reasons I'm thinking about storing the data for each entity in separate tables. So instead of a 'logdata' table with >200 million rows, I'd create 'logdata_001', 'logdata_002',... tables where the rowcount of even the biggest entity logdata stays below 3 million.

> Rails seems to have no problem with me setting the tablename (set_table_name) on each incoming request before accessing the database. I have this gut feeling though I'm overlooking something nasty. Anyone have any comments/suggestions ? Am I about to create my own personal maintenance nightmare ?

Keep the data in rows; that's what an RDBMS is made for.

Just make sure you add indexes on the FKs. Periodically clustering on that index could be beneficial too, if you need to do a lot of sequential scans on single entities.
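For the archives, adding that FK index from a Rails migration is a one-liner; a hedged sketch, with the table and column names borrowed from the 'logdata' example earlier in the thread:

```ruby
# In a Rails migration, indexing the foreign key is one call:
#
#   class AddEntityIdIndexToLogdata < ActiveRecord::Migration
#     def self.up
#       add_index :logdata, :entity_id
#     end
#
#     def self.down
#       remove_index :logdata, :entity_id
#     end
#   end
#
# which produces DDL roughly like this (Rails' conventional index name):
ddl = "CREATE INDEX index_logdata_on_entity_id ON logdata (entity_id)"
```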

There's no benefit to splitting the data across multiple tables and it makes your app more complex. And don't forget about what happens when you're asked to do something that spans all those entities, for whatever reason.

Isak

This is incredibly great news for me! Thanks so much Luca and thanks Piet for bringing up what I’m just now getting around to coding.

RSL

FWIW, there can be significant performance gains using partitioned tables, assuming you have something to key on, such as a year or something which can be logically segmented.

It appears that MySQL 5.1 includes partitioned tables, based on the documentation here: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
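To make that concrete, here's a hedged sketch of what range partitioning looks like in MySQL 5.1 DDL -- the table and column names are invented for illustration; in a Rails migration you would pass a string like this to execute:

```ruby
# MySQL 5.1 range partitioning by year -- names are illustrative only.
ddl = <<-SQL
  CREATE TABLE logdata (
    id        INT  NOT NULL,
    entity_id INT  NOT NULL,
    logged_at DATE NOT NULL
  )
  PARTITION BY RANGE (YEAR(logged_at)) (
    PARTITION p2005 VALUES LESS THAN (2006),
    PARTITION p2006 VALUES LESS THAN (2007),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  )
SQL
# In a migration: execute(ddl)
```

The win over hand-rolled 'logdata_001'-style tables is that the database routes rows and prunes partitions itself, so the app still sees one logical table.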

As for having 200m rows, it is common in data warehouses when you are dealing with historical data over a 5-year or greater time frame. Hopefully I'll get the opportunity to speak about it at RailsConf 2007, and if so, come up and say hi. :)

V/r Anthony Eden

How are you all setting table_name or table_name_prefix dynamically? I initially thought I could just set a session variable (since I want this value to persist for a given user across multiple requests until it is reset by the user) and then use it in my model:

class TestTable < ActiveRecord::Base
  self.table_name_prefix = session[:my_little_variable] + '_'
end

But this doesn't work, since apparently session data is not directly available in the models. Not every model I'm working with needs a table prefix, so I don't want to set this globally. How do I take a variable selected by the user and cleanly use this as a table name prefix for the models that need it?

What are the criteria for which models need a new table name/prefix? If it centers around another model, then you can put a table_prefix attribute on that model and call it thusly:

def self.set_tables
  OtherModel.set_table_name "#{table_prefix}_othermodels"
end

Don’t use table_name_prefix, as it sets it on ActiveRecord::Base itself and not the specific model as you might think. Lots of fun confusion on that for me. Also, you can’t [or I couldn’t figure out how to] set the table_name on the model itself. For me, the models that needed new table names [databases actually] were all related to what site was being used, so my Site model has the set_tables method. Call that method from a filter on the appropriate controller actions and voila! On-the-fly table name changes. If you have a relationship between the models, you might also need to tweak your associations code like this:

class Whatever
  has_many :othermodels,
    :finder_sql => 'SELECT #{table_name}_othermodels.* FROM #{table_name}_othermodels WHERE #{table_name}_othermodels.whatever_id = #{id}'
end

I hope that’s not too obtuse. It’s just that my code uses this functionality to change databases and I don’t want to paste it and have you thinking that it’s merely for changing table names. However, if anyone is wondering... if you do need to change the database, you don’t have to establish a new connection but can use Model.set_table_name "#{db_name}.table_name". You can use the same trick in the association to select across two databases.

God, I hope that wasn’t too obscure. Apologies in advance.

RSL

Hmm, I posted a reply using the Google Groups interface and it’s not appearing (even though my first reply to this post came through just fine). Lots of posts elsewhere about this annoying Google Groups issue. So at the risk of repeating myself, I’ll try sending an email reply to the group to see if it goes through or disappears into the ether.

Anyway, first off, thanks for your reply. My particular situation is that I have separate tables for individual US states, with the two-letter state abbreviation as table prefixes (e.g. ‘ny_things’, ‘ca_things’). For a variety of reasons these tables can’t be merged into a single table. Rather than define (essentially the same) models for every state, I’d rather just have the user select a state, which will then set the state abbreviation in the user’s session data. From then on, all tables needing a prefix will be dynamically set according to the session data. Some tables do not need a prefix at all, so yes, I ran into the same problem trying to set ‘table_name_prefix’ on an individual model and found it was being prefixed to all table names.

So for my first stab at a solution, I’m using a before_filter in my controllers which calls a private method in application.rb (so it can be shared between several controllers), which in turn sets the table_name on those models that need it:

class ApplicationController < ActionController::Base

  def set_table_names
    Thing.table_name = "#{session[:state_abbreviation]}_things"
  end

end

Before this method is called, I call another before_filter method that makes sure the session variable is set and valid. If not, it redirects the user to select an appropriate value. This is working perfectly for now. I’m using a regular expression to ensure that the state abbreviation is in fact two and only two lower-case letters before it is set in the session, so I think I’m fairly safe with this approach.
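That check is worth getting exactly right, since the value ends up inside a table name; a pure-Ruby sketch of the whitelist idea (the helper name is made up):

```ruby
# Accept exactly two lower-case letters -- anything else is rejected.
# \A and \z anchor the match to the whole string, which matters for
# input that will be spliced into a table name.
def valid_state_abbreviation?(value)
  !!(value =~ /\A[a-z]{2}\z/)
end
```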

As far as setting the table name within a model, I’ve always been able to just do:

class ExampleModel < ActiveRecord::Base
  self.table_name = "some_crazy_table_name_not_conforming_to_conventions"
end

Cheers,

John-Scott

I’ve come to the same result as you with setting the table_name via a filter, but I still go through the Site model to change the child table_names because, in my case, the child table names all use an attribute of the site instance [Site#db_name, which in turn comes from Site#host].

As for the self.table_name methodology: I had a lot of trouble with that and finally gave up, because it wants to set the name when the model loads, and since I’m actually calling an attribute from another model to get the string, the whole thing would collapse on me. I had the same problem with custom finders until I came across a post somewhere that talked about using single rather than double quotes. I really should have mentioned that earlier too.
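The single-versus-double-quote point deserves spelling out: with double quotes Ruby interpolates #{...} immediately, at class-load time, while single quotes leave the literal #{...} in the string so Rails can interpolate it later, at query time, in the record's own context. A plain-Ruby illustration:

```ruby
id = 42

eager = "WHERE whatever_id = #{id}"   # double quotes: interpolated right now
lazy  = 'WHERE whatever_id = #{id}'   # single quotes: the literal survives

# eager is now the string WHERE whatever_id = 42, while lazy still
# contains the untouched literal, which Rails' :finder_sql machinery
# interpolates later against the owning record.
```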

RSL