Dealing with an EAV database

I've inherited an EAV database and there's really no option to remodel it. Data is stored as key_name/key_value pairs. Trying to return a meaningful, unified recordset is far too complex to be efficient, so I've decided to use the MySQL GROUP_CONCAT and CONCAT_WS functions to return a fast query result, with the consolidated fields as one string. The final data is to be displayed and downloaded (CSV), but not all fields will be available for each "row", which makes serialization awkward. I can't decide where the "heavy lifting" should go to parse and process this data for screen and download.
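For illustration, here's a minimal sketch of how I'm thinking of consuming the consolidated string. The delimiters (`=` and `||`), the `attrs` alias, and the table/column names are my own placeholders, not anything fixed:

```ruby
# Hypothetical sketch: parse the consolidated string produced by a query like
#   SELECT entity_id,
#          GROUP_CONCAT(CONCAT_WS('=', key_name, key_value) SEPARATOR '||') AS attrs
#   FROM eav_table GROUP BY entity_id
# The delimiters ('=' and '||') are assumptions; they must be characters
# that cannot appear inside the stored values.
def parse_attrs(concatenated)
  concatenated.split('||').each_with_object({}) do |pair, attrs|
    key, value = pair.split('=', 2)
    attrs[key] = value
  end
end

parse_attrs("name=Alice||age=42")  # => {"name"=>"Alice", "age"=>"42"}
```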

Does anyone have a good strategy for this?

jrq wrote:

I've inherited an EAV database and there's really no option to remodel it.

Why not? Are others depending on the existing schema?

Data is stored as key_name/key_value pairs. Trying to return a meaningful, unified recordset is far too complex to be efficient, so I've decided to use the MySQL GROUP_CONCAT and CONCAT_WS functions to return a fast query result, with the consolidated fields as one string. The final data is to be displayed and downloaded (CSV), but not all fields will be available for each "row", which makes serialization awkward. I can't decide where the "heavy lifting" should go to parse and process this data for screen and download.

Does anyone have a good strategy for this?

If you can't remodel the DB, decline the project. :slight_smile:

If you can't decline the project, export the data into a reasonable schema (a non-SQL database like MongoDB may be a good option here). A Google search also turned up the acts_as_eav_model plugin, but I know nothing about it.

Another alternative: build a RESTful service (perhaps without Rails) on top of the EAV database, then connect Rails to it through ActiveResource.

Best,

Marnen Laibow-Koser wrote:

jrq wrote:

I've inherited an EAV database and there's really no option to remodel it.

[...]

If you can't remodel the DB, decline the project. :slight_smile:

Hmmm. This question and others like "Is it possible to query on virtual attributes?" on Ruby-Forum are making me start to think that I should put the effort into that ORM library I've been meaning to try to write...

Best,

Good and reasonable suggestions. I'd rather avoid supplementing the database end of things. The complete data set is fairly small (< 20,000 rows), and queries are likely to return only 1,000 rows at a time. I don't think the string parsing is *that* complex; a couple of seconds would be OK, 10 seconds, not so much. It's for read-only reports, so I don't really care about preserving the record integrity. I'm wondering whether it makes sense to do half the processing in the controller/model, and half in some helper (for the screen aspect).

Please quote when replying; otherwise, the discussion becomes difficult to follow.

jrq wrote:

Good and reasonable suggestions. I'd rather avoid supplementing the database end of things.

I'm not sure what you mean by this.

Basically, if you're going to use Rails, you have to make sure that it and the DB understand each other.

The complete data set is fairly small (< 20,000 rows), and queries are likely to return only 1,000 rows at a time. I don't think the string parsing is *that* complex; a couple of seconds would be OK, 10 seconds, not so much.

But you'd be munging all the fields together in the query, then separating them in the app. That's a lot of unnecessary string processing. It would be better to avoid it.

It's for read-only reports, so I don't really care about preserving the record integrity.

Huh?

I'm wondering whether it makes sense to do half the processing in the controller/model,

Controllers shouldn't do processing.

and half in some helper (for the screen aspect).

Unlikely. Dealing with the DB is really the job of the model. Helpers are only for presentation formatting.

Best,

By "avoid supplementing the database end of things", I mean I want to avoid adding more tables or databases to the database engine; that is, inserting an additional database layer.

Yes, I'm currently intending to "munge" the unorganized fields into a string, and deal with the parsing/processing at another stage. Yes, it's inefficient, but it's a relatively small amount of processing, and hopefully it will get me to where I need to be within a short amount of time, which is a factor in this situation.

By "not preserving record integrity", I mean that I am not updating any of these records, it's read-only. I want to pull the data and generate reports, both for screen and CSV stream.
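To illustrate the CSV side of it, here's a rough sketch of what I mean, assuming the data has already been parsed into hashes with uneven keys (the data shape is my assumption):

```ruby
require 'csv'

# Given rows as hashes with uneven keys (typical once the EAV pairs are
# consolidated per entity), emit a CSV whose header is the union of all
# keys and whose missing cells are left blank.
def rows_to_csv(rows)
  headers = rows.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |h| row[h] } }
  end
end
```

So a row without an "age" key simply produces an empty cell in that column rather than breaking the output.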

Yes, most of the processing would be in the Model, not the Controller, but since the model is not a good representation of the existing database schema, there's going to be some fudging going on. The question is whether it's worth bothering to model the final data output at all.

I know that helpers are for presentation formatting, which is why I specified that the helper process would be for the "screen".

Thanks.

Again: please quote when replying!

jrq wrote:

By "avoid supplementing the database end of things", I mean I want to avoid adding more tables or databases to the database engine, inserting an additional database layer.

But that's the right way to do it if you can't just get rid of the EAV schema. There is no benefit to avoiding DB objects just for the sake of avoiding DB objects.

Yes, I'm currently intending to "munge" the unorganized fields into a string, and deal with the parsing/processing at another stage. yes, it's inefficient, but it's a relatively small amount of processing,

It's not just inefficient; it will be inconvenient. Don't do it.

and hopefully it will get me to where I need to be within a short amount of time, which is a factor in this situation.

No. It will just complicate your life immensely. It may look simpler now, but once you try to do anything non-trivial, I think you'll find that it's a lot harder.

By "not preserving record integrity", I mean that I am not updating any of these records, it's read-only. I want to pull the data and generate reports, both for screen and CSV stream.

Then do it right and transform the data into a non-EAV representation once you pull it.
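Something along these lines, assuming the raw rows come back as hashes with entity_id/key_name/key_value columns (the names here are just placeholders):

```ruby
# Sketch: pivot raw EAV rows -- each a hash with "entity_id", "key_name"
# and "key_value" (column names are assumptions) -- into one attributes
# hash per entity. A hash per entity is far easier to manipulate than a
# delimited string you have to re-parse.
def pivot_eav(rows)
  rows.each_with_object(Hash.new { |h, k| h[k] = {} }) do |row, out|
    out[row["entity_id"]][row["key_name"]] = row["key_value"]
  end
end
```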

Yes, most of the processing would be in the Model, not the Controller, but since the model is not a good representation of the existing database schema, there's going to be some fudging going on.

Then write a model that *is* a good representation of it (perhaps supplementing or completely ditching ActiveRecord).

The question is whether it's worth bothering to model the final data output at all.

Of course it is. That way you can manipulate it more easily than with string parsing.

I know that helpers are for presentation formatting, which is why I specified that the helper process would be for the "screen".

Thanks.

Best,

Have you considered using database views? I have never been in this situation, so take my suggestions with a grain of salt. Here's what I'd try:

Use database views to map the EAV (non-)schema to sane relational schema.

You didn't say if you need to insert into that database. If you do, see if MySQL supports updatable views. If it does, give them a try, otherwise handle inserts yourself.

If none of this works, consider not using ActiveRecord at all; something like Sequel may be a better fit: http://sequel.rubyforge.org/

Michael

Michael Schuerig wrote:

Yes, I'm currently intending to "munge" the unorganized fields into a string, and deal with the parsing/processing at another stage. Yes, it's inefficient, but it's a relatively small amount of processing, and hopefully it will get me to where I need to be within a short amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this situation, so take my suggestions with a grain of salt. Here's what I'd try:

I've been looking at that. I'm not sure how that gets around the complex queries, unless MySQL supports Materialized Views, which I don't think it does (and they are not a great solution). Or are you assuming that the caching will get around this problem, once it's built the first time?

Use database views to map the EAV (non-)schema to sane relational schema.

You didn't say if you need to insert into that database. If you do, see if MySQL supports updatable views. If it does, give them a try, otherwise handle inserts yourself.

I don't need to do any inserts, so that should not be an issue.

jrq wrote:

Yes, I'm currently intending to "munge" the unorganized fields into a string, and deal with the parsing/processing at another stage. Yes, it's inefficient, but it's a relatively small amount of processing, and hopefully it will get me to where I need to be within a short amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this situation, so take my suggestions with a grain of salt. Here's what I'd try:

I've been looking at that. I'm not sure how that gets around the complex queries, unless MySQL supports Materialized Views, which I don't think it does (and they are not a great solution). Or are you assuming that the caching will get around this problem, once it's built the first time?

I don't know. Give it a try. Define just the views and do some queries. You wrote earlier that your DB has < 20,000 rows -- in other words, it is almost trivially small and surely fits into memory. That is, unless the attribute values are extremely large.

I don't need to do any inserts, so that should not be an issue.

Does that mean that your app does not change the database or that it does not change at all? Either way, I'd try views first and if that's not fast enough already, there's the option of materializing them yourself simply by copying the views to real tables.

Michael

I've read the rest of the thread with interest, but I just wanted to go back to the start for a second.

I've inherited an EAV database and there's really no option to remodel it.

Where have you inherited this data from? Is there an application that's still populating it? Or is it an export of data (from something like an online survey/questionnaire system)?

Trying to return a meaningful, unified recordset is far too complex to be efficient.

Using what approach? The data was set up as EAV for a reason, and would have probably had tools to query and manipulate it - do you not have any access to the original tools?

I've decided to make use of the MySQL GROUP_CONCAT, and CONCAT_WS functions to return a fast query result, with the consolidated fields as one string. The final data is to be displayed and downloaded (CSV), but not all fields will be available for each "row", which makes serialization awkward.

So what led you to decide to use Rails for this? Given Rails has *no* practical EAV support (just a bit of munging with plugins), it strikes me as a strange decision to try to use it to analyse your data. Is there no "EAV data analysis" tool you can find to do the job, rather than having to create huge numbers of DB views and/or lots of model manipulation?

I'm not saying to *not* continue what you're doing - just curious that it seems you've picked a route that needs more work than should be reasonable, and don't seem to be looking at any other options. It's always worth considering admitting that you might be making lots of work for yourself and a new start may be more economical in the long run.

FWIW Many (6-ish) years ago I got an EAV data export from SurveyMonkey dumped on my desk, and after importing it into a SQL DB, I produced Excel cross-tabulated reports with a combination of a big plain old SQL query (with *lots* of joins! :slight_smile: ) to get the data into a "relational" equivalent, and used Excel's pivot-table functionality to do the "heavy lifting" of getting information from those rows.

Yes, I'm currently intending to "munge" the unorganized fields into a string, and deal with the parsing/processing at another stage. Yes, it's inefficient, but it's a relatively small amount of processing, and hopefully it will get me to where I need to be within a short amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this situation, so take my suggestions with a grain of salt. Here's what I'd try:

I've been looking at that. I'm not sure how that gets around the complex queries, unless MySQL supports Materialized Views, which I don't think it does (and they are not a great solution). Or are you assuming that the caching will get around this problem, once it's built the first time?

I don't know. Give it a try. Define just the views and do some queries. You wrote earlier that your DB has < 20,000 rows -- in other words, it is almost trivially small and surely fits into memory. That is, unless the attribute values are extremely large.

I don't need to do any inserts, so that should not be an issue.

Does that mean that your app does not change the database or that it does not change at all? Either way, I'd try views first and if that's not fast enough already, there's the option of materializing them yourself simply by copying the views to real tables.

My app will not change the tables involved. However, there are other apps that do change this data. This is not a high-volume transaction system, so faking a materialized view may prove to be an option.

I've read the rest of the thread with interest, but I just wanted to go back to the start for a second.

I've inherited an EAV database and there's really no option to remodel it.

Where have you inherited this data from? Is there an application that's still populating it? or is it an export of data (from something like an online survey/questionnaire system)?

It's not a static export, and yes, there is an application still populating it.

Trying to return a meaningful, unified recordset is far too complex to be efficient.

Using what approach? The data was set up as EAV for a reason, and would have probably had tools to query and manipulate it - do you not have any access to the original tools?

Using SQL joins to build a recordset that represents the dataset. As far as I can gauge, the data was set up this way to allow an "easy" way to construct a custom set of fields for a relatively non-technical user/admin. It certainly wasn't designed to handle the reporting function I'm being asked to create, or, I suspect, the volume of data that is being stored in it.
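For what it's worth, the join-based approach looked roughly like this sketch, which builds the classic EAV self-join pivot query for a given list of keys. The table and column names (eav_data, entity_id, key_name, key_value) are placeholders, and the keys must come from a trusted whitelist, since they're interpolated straight into the SQL:

```ruby
# Sketch: generate the self-join pivot SELECT for a list of EAV keys.
# One LEFT JOIN per key means each entity becomes one "wide" row, with
# NULL for keys that entity doesn't have. Names are assumptions; never
# feed user input into `keys` (it's interpolated into the SQL).
def pivot_sql(keys, table = "eav_data")
  selects = keys.map { |k| "#{k}.key_value AS #{k}" }
  joins = keys.map do |k|
    "LEFT JOIN #{table} #{k} ON #{k}.entity_id = e.entity_id " \
    "AND #{k}.key_name = '#{k}'"
  end
  "SELECT e.entity_id, #{selects.join(', ')}\n" \
  "FROM (SELECT DISTINCT entity_id FROM #{table}) e\n" +
  joins.join("\n")
end
```

With many keys this is exactly the "lots of joins" query described above, which is why it gets slow and unwieldy as the attribute list grows.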

Yes, the right thing to do would be to replace large chunks of this and design a better-normalized data solution that satisfies all the systems involved. Unfortunately, in this instance I don't have the luxury of time or money to go that route. I wouldn't describe myself as an experienced developer, but I have been in this business for a long time, and I know the kind of hole I'm digging for myself by choosing this method.

I've decided to make use of the MySQL GROUP_CONCAT and CONCAT_WS functions to return a fast query result, with the consolidated fields as one string. The final data is to be displayed and downloaded (CSV), but not all fields will be available for each "row", which makes serialization awkward.

So what lead you to decide to use Rails for this? Given Rails has *no* practical EAV support (just a bit of munging with plugins), it strikes me as a strange decision to try to use it to analyse your data. Is there no "EAV data analysis" tool you can find to do the job, rather than having to create huge amounts of DB views and/or model manipulation.

It's more complex (isn't it always?). A large part of the database is well-formed, normalized and friendly. This little area that only recently came to my attention is not (it's EAV). Rather than re-tool my entire solution design, I'm sticking to the original plan and shoehorning this section in (and it is a relatively small section in the scheme of things).

I freely admit that Rails is not the best solution for this. I'd even go as far as to say that it falls into the category of "right now I have a hammer, so everything looks like a nail", by which I mean: I don't do a lot of development, especially web-app development, and I spent some time at the beginning of the year learning RoR. I liked it, even if I wasn't always using it in the manner for which it's most suitable. It's been suggested that something like Sinatra would be a much better fit for this, and a quick peek seems to show that it probably would. However, in the short time available to complete this project, I can't deal with struggling through all the "gotchas" of another framework.

Lame excuses, I know. Completely at odds with the dogma of RoR development.

I'm not saying to *not* continue what you're doing - just curious that it seems you've picked a route that needs more work than should be reasonable, and don't seem to be looking at any other options. It's always worth considering admitting that you might be making lots of work for yourself and a new start may be more economical in the long run.

FWIW Many (6-ish) years ago I got an EAV data export from SurveyMonkey dumped on my desk, and after importing it into a SQL DB, I produced Excel cross-tabulated reports with a combination of a big plain old SQL query (with *lots* of joins! :slight_smile: ) to get the data into a "relational" equivalent, and used Excel's pivot-table functionality to do the "heavy lifting" of getting information from those rows.

With data migration, I generally opt to get down to the raw data and manipulate it all in code. I know it can be a little less efficient, but when it's my code, at least I know I can trust it, or at least fix it when it breaks. Having said that, the pivot table stuff in Excel can be pretty useful.

Okay, well, given you want to make your life easier in a very specific use case, I can now see a bit more of the angle you're coming at this problem from. If you're going to solve this programmatically in a potentially quick-and-dirty way (because the powers-that-be must be demanding the "reports"), try seeing if OpenStructs [1] will help you build up collections of pseudo-relational records made up from the EAV records. They're a nice solution for creating "something" that behaves like an AR object most of the time; you could even extend one of the "tableless model" [2] implementations to keep your interfaces more standard.

[1] Class: OpenStruct (Ruby documentation)

[2] "ActiveRecord::Base without table" on Stack Overflow, or Google for others...
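Something like this minimal sketch, assuming the data has already been pivoted into per-entity attribute hashes:

```ruby
require 'ostruct'

# Sketch of the OpenStruct idea: wrap each pivoted attributes hash in an
# OpenStruct so views can call record.name, record.age, etc., and simply
# get nil back for any key a given "row" doesn't have.
def to_records(attr_hashes)
  attr_hashes.map { |attrs| OpenStruct.new(attrs) }
end

records = to_records([{ "name" => "Alice" }, { "name" => "Bob", "age" => 42 }])
records.first.name  # => "Alice"
records.first.age   # => nil (missing fields just come back nil)
```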

Interesting. OpenStruct might very well come in handy. Thanks.