Dynamic Fields and Models Architecture

Hi Everyone,

I have to design a system in which different datasets are added periodically, and users should be able to perform a predefined set of operations on the data. The datasets are large enough that it makes sense to keep them in separate tables/databases for performance reasons.

The problem I'm facing is how this fits into the Rails ActiveRecord architecture. Do I generate new models for the newly created tables every time a dataset is added? Even if I could somehow do that, I would still need to provide a generic interface for all datasets. I'm not sure what the architecture should look like while following Rails conventions. It would also be preferable to add new datasets without having to change any code.

Any kind of help will be appreciated.

Regards,

Usman

I would take a serious look at views or materialized views, if these are reporting data rather than something your users will be adding rows to. That would let you abstract away the implementation details of the database layer from your application. You have a single stable model, and its "table" is a SQL view of one or more actual tables in the DB. We use this all the time for read-only data.
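
For example, a minimal sketch of that arrangement, assuming Rails 5+ and a hypothetical `product_stats` view (the view name, columns, and underlying query are placeholders):

```ruby
# Migration that creates a SQL view over one or more real tables.
class CreateProductStatsView < ActiveRecord::Migration[5.2]
  def up
    execute <<~SQL
      CREATE VIEW product_stats AS
      SELECT id, name, category, price FROM products;
    SQL
  end

  def down
    execute "DROP VIEW product_stats;"
  end
end

# A single, stable model whose "table" is actually the view.
class ProductStat < ApplicationRecord
  self.table_name = "product_stats"

  def readonly?
    true # reporting data only; no writes
  end
end
```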

Walter

It's not a performance issue. The problem is that I don't have one database but multiple databases added at run-time. Each database is independent, and I don't know what they look like; however, I need to perform similar operations on each of them, such as averages, min, max, count, selections, and other read operations. I want to be able to do this without changing any code when a new database is added. I'm thinking of going with a schema-less, JSON-based approach.
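
A rough sketch of what I mean, assuming a single `records` table with a Postgres jsonb `data` column (the table and field names are just placeholders):

```ruby
# One model for every dataset; each dataset's own fields live in jsonb.
# Assumed table: records(dataset: string, data: jsonb)
class Record < ApplicationRecord
end

# Generic read operations work against keys inside the jsonb document:
Record.where(dataset: "cellphones")
      .where("data->>'brand' = ?", "Acme")
      .count

# Numeric aggregates need a cast, since ->> returns text:
Record.where(dataset: "cellphones")
      .average(Arel.sql("(data->>'price')::numeric"))
```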

How does that work? How are the databases added while the app is running? What does the adding? There's nothing in Rails itself that will help you do this (as far as I know). If the databases are being added by another process while the Rails app is running (remember: production Rails apps only scan the database for structure once, at startup), then you're going to need that other process to kick off some notification to your Rails app so that it can be forced to reload and pick up the new structure. Otherwise, what it sounds like you need is some abstraction layer above the database layer, so that you can talk to it through a single API rather than relying on ActiveRecord for this. There's nothing magical about JSON or NoSQL storage engines here, either. If it quacks like ActiveRecord, then it's designed to read its structure during app startup and stick to that structure while it's running.
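
If you do stay on ActiveRecord, one escape hatch is to define model classes at runtime rather than writing one per table. A rough sketch (the helper name and table names are hypothetical, and it assumes the new tables live in the app's own database):

```ruby
# Builds (and caches) an ActiveRecord model for a table discovered at runtime.
# Hypothetical helper, not part of Rails itself.
def model_for(table_name)
  class_name = table_name.classify # "televisions" => "Television"
  return Object.const_get(class_name) if Object.const_defined?(class_name)

  klass = Class.new(ActiveRecord::Base) do
    self.table_name = table_name
  end
  Object.const_set(class_name, klass)
end

tvs = model_for("televisions")
tvs.reset_column_information # force a fresh read of the schema
tvs.count
```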

Walter

OK, I'll explain this with a hypothetical scenario. Suppose the requirement is that we need to create a portal that provides stats on different products available in the market, and suppose we get data periodically from different places in spreadsheets. Each spreadsheet contains data for one kind of product, e.g. cellphones, televisions, etc., and each kind of product has different fields. There can be hundreds of thousands of records in one spreadsheet, so the data must be structured for efficient retrieval; however, the data is read-only, so there will be no write operations.

On each dataset we need to be able to perform generic, predefined read operations: e.g. selecting subsets that fulfill certain criteria, sorting, grouping, and aggregates. For example, I should be able to write a generic function that takes a table name and a field name and returns the count of unique values in that field. The same function could then count televisions by resolution and books by author; here resolution and author are fields unique to their respective datasets.
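
A sketch of that generic function, assuming a helper like the `model_for` above that returns a model for a given table name (all names here are hypothetical):

```ruby
# Number of distinct values in a field, e.g. distinct resolutions.
def unique_value_count(table_name, field_name)
  model_for(table_name).distinct.count(field_name)
end

# Per-value breakdown, e.g. how many televisions per resolution.
def counts_by(table_name, field_name)
  model_for(table_name).group(field_name).count
end

counts_by("televisions", "resolution") # => { "1080p" => 120, "4K" => 45, ... }
counts_by("books", "author")
```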

The problem I'm facing is that I have to create different tables and corresponding models every time a new product type is added, even though the models essentially perform the same operations.

Thanks for the response.

Regards,

Usman

Can anyone offer a solution? Any kind of help will be appreciated.

I would just make an importer to put all the data in a normalized set of fields in a single table. Then you can have one set of fields and one model to wrap around them. Parsing Excel is pretty easy these days, because it's just XML.
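
A minimal sketch of that importer, assuming one shared `products` table and using CSV rather than Excel for brevity; the per-dataset column maps are placeholders:

```ruby
require "csv"

# Maps each dataset's spreadsheet headers onto the shared columns.
COLUMN_MAPS = {
  "cellphones"  => { "Model"     => :name, "Price (USD)" => :price },
  "televisions" => { "Model No." => :name, "MSRP"        => :price }
}.freeze

def import(dataset, path)
  map = COLUMN_MAPS.fetch(dataset)
  CSV.foreach(path, headers: true) do |row|
    attrs = { dataset: dataset }
    map.each { |source_header, column| attrs[column] = row[source_header] }
    Product.create!(attrs)
  end
end
```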

Walter

Parsing isn't the issue. The problem is that I don't want to rewrite code every time a new dataset is added. I want to provide an abstract interface into which I can plug any dataset.

The query interface you're describing sounds a lot like Elastic (formerly Elasticsearch): https://www.elastic.co/
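
For instance, the count-of-unique-values operation described above might look like this with the elasticsearch-ruby gem (the index and field names are hypothetical, and the field is assumed to be aggregatable, i.e. keyword / not_analyzed):

```ruby
require "elasticsearch"

client = Elasticsearch::Client.new

# Count televisions per resolution, with no schema declared up front.
response = client.search(
  index: "televisions",
  body: {
    size: 0, # we only want the aggregation, not the documents
    aggs: {
      by_resolution: { terms: { field: "resolution" } }
    }
  }
)

response["aggregations"]["by_resolution"]["buckets"]
# => [{ "key" => "1080p", "doc_count" => 120 }, ...]
```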

–Matt Jones

Thanks Matt. This looks very promising.