Dealing with an EAV database

I've inherited an EAV database and there's really no option to remodel
it. Data is stored as key_name/key_value pairs.
Trying to return a meaningful, unified recordset is far too complex to
be efficient.
I've decided to make use of the MySQL GROUP_CONCAT and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each "row", which makes serialization
awkward. I can't decide where the "heavy lifting" should go to parse
and process this data for screen and download.
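For concreteness, the consolidation query I have in mind looks something like the sketch below. The table name ("attributes") and columns ("entity_id", "key_name", "key_value") are stand-ins for my actual schema; each entity comes back as one row with its fields packed into a single "key=value|key=value" string.

```ruby
# Sketch of the GROUP_CONCAT/CONCAT_WS consolidation query.
# Table and column names are hypothetical placeholders.
packed_query = <<~SQL
  SELECT entity_id,
         GROUP_CONCAT(CONCAT_WS('=', key_name, key_value)
                      ORDER BY key_name SEPARATOR '|') AS packed
  FROM attributes
  GROUP BY entity_id
SQL
```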

Does anyone have a good strategy for this?

jrq wrote:

I've inherited an EAV database and there's really no option to remodel
it.

Why not? Are others depending on the existing schema?

Data is stored as key_name, key_value pairs
Trying to return a meaningful, unified recordset is far too complex to
be efficient.
I've decided to make use of the MySQL GROUP_CONCAT and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each "row", which makes serialization
awkward. I can't decide where the "heavy lifting" should go to parse
and process this data for screen and download.

Does anyone have a good strategy for this?

If you can't remodel the DB, decline the project. :slight_smile:

If you can't decline the project, export the data into a reasonable
schema (a non-SQL database like MongoDB may be a good option here). A
Google search also turned up the acts_as_eav_model plugin, but I know
nothing about it.

Another alternative: build a RESTful service (perhaps without Rails) on
top of the EAV database, then connect Rails to it through
ActiveResource.

Best,

Marnen Laibow-Koser wrote:

jrq wrote:

I've inherited an EAV database and there's really no option to remodel
it.

[...]

If you can't remodel the DB, decline the project. :slight_smile:

Hmmm. This question and others like
http://www.ruby-forum.com/topic/218057 are making me start to think that
I should put the effort into that ORM library I've been meaning to try
to write...

Best,

Good and reasonable suggestions. I'd rather avoid supplementing the
database end of things. The complete data set is fairly small (under
20,000 rows) and queries are likely to return only 1,000 rows at a
time. I don't think the string parsing is *that* complex; a couple of
seconds would be OK, 10 seconds not so much. It's for read-only
reports, so I don't really care about preserving the record integrity.
I'm wondering whether it makes sense to do half the processing in the
controller/model, and half in some helper (for the screen aspect).

Please quote when replying; otherwise, the discussion becomes difficult
to follow.

jrq wrote:

Good and reasonable suggestions. I'd rather avoid supplementing the
database end of things.

I'm not sure what you mean by this.

Basically, if you're going to use Rails, you have to make sure that it
and the DB understand each other.

The complete data set is fairly small (under 20,000 rows) and queries
are likely to return only 1,000 rows at a time. I don't think the
string parsing is *that* complex; a couple of seconds would be OK, 10
seconds not so much.

But you'd be munging all the fields together in the query, then
separating them in the app. That's a lot of unnecessary string
processing. It would be better to avoid it.

It's for read-only
reports, so I don't really care about preserving the record integrity.

Huh?

I'm wondering whether it makes sense to do half the processing in the
controller/model,

Controllers shouldn't do processing.

and half in some helper (for the screen aspect).

Unlikely. Dealing with the DB is really the job of the model. Helpers
are only for presentation formatting.

Best,

By "avoid supplementing the database end of things", I mean I want to
avoid adding more tables or databases to the database engine, or
inserting an additional database layer.

Yes, I'm currently intending to "munge" the unorganized fields into a
string, and deal with the parsing/processing at another stage. Yes,
it's inefficient, but it's a relatively small amount of processing,
and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.
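For what it's worth, the parse-then-serialize step can stay quite small in plain Ruby. A minimal sketch, assuming each packed string holds key=value pairs separated by "|" (the sample data is invented); missing fields just become blank CSV cells:

```ruby
require 'csv'

# Invented sample "packed" strings, as GROUP_CONCAT might return them
packed_rows = [
  "city=Boston|name=Alice",
  "name=Bob|phone=555-0100"
]

# Split each packed string back into a key/value hash
records = packed_rows.map do |packed|
  packed.split('|').map { |pair| pair.split('=', 2) }.to_h
end

# The CSV header is the union of all keys seen; rows missing a
# field get a blank cell for it
headers = records.flat_map(&:keys).uniq.sort
csv = CSV.generate do |out|
  out << headers
  records.each { |r| out << headers.map { |h| r[h] } }
end
```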

By "not preserving record integrity", I mean that I am not updating
any of these records, it's read-only. I want to pull the data and
generate reports, both for screen and CSV stream.

Yes, most of the processing would be in the Model, not the Controller,
but since the model is not a good representation of the existing
database schema, there's going to be some fudging going on. The
question is whether it's worth bothering to model the final data
output at all.

I know that helpers are for presentation formatting, which is why I
specified that the helper process would be for the "screen".

Thanks.

Again: please quote when replying!

jrq wrote:

By "avoid supplementing the database end of things", I mean I want to
avoid adding more tables or databases to the database engine, or
inserting an additional database layer.

But that's the right way to do it if you can't just get rid of the EAV
schema. There is no benefit to avoiding DB objects just for the sake of
avoiding DB objects.

Yes, I'm currently intending to "munge" the unorganized fields into a
string, and deal with the parsing/processing at another stage. Yes,
it's inefficient, but it's a relatively small amount of processing,

It's not just inefficient; it will be inconvenient. Don't do it.

and hopefully it will get me to where I need to be within a short
amount of time, which is a factor in this situation.

No. It will just complicate your life immensely. It may look simpler
now, but once you try to do anything non-trivial, I think you'll find
that it's a lot harder.

By "not preserving record integrity", I mean that I am not updating
any of these records, it's read-only. I want to pull the data and
generate reports, both for screen and CSV stream.

Then do it right and transform the data into a non-EAV representation
once you pull it.
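One way to do that transformation in plain Ruby, assuming the rows come back as (entity_id, key_name, key_value) triples (names invented):

```ruby
# Invented EAV triples, as they might come straight from the table
rows = [
  [1, "name", "Alice"], [1, "city", "Boston"],
  [2, "name", "Bob"],   [2, "phone", "555-0100"]
]

# Pivot the triples into one hash per entity -- a non-EAV,
# row-per-record representation
entities = rows.group_by(&:first).map do |id, triples|
  { "id" => id }.merge(triples.to_h { |(_id, k, v)| [k, v] })
end
```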

Yes, most of the processing would be in the Model, not the Controller,
but since the model is not a good representation of the existing
database schema, there's going to be some fudging going on.

Then write a model that *is* a good representation of it (perhaps
supplementing or completely ditching ActiveRecord).
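Such a model can be as small as a plain Ruby class wrapping an attribute hash -- a sketch, not any particular plugin's API:

```ruby
# A minimal tableless "model": wraps a string-keyed attribute hash
# and exposes each attribute as a reader method, AR-style
class ReportRow
  def initialize(attrs)
    @attrs = attrs
  end

  def [](key)
    @attrs[key]
  end

  def method_missing(name, *args)
    key = name.to_s
    @attrs.key?(key) ? @attrs[key] : super
  end

  def respond_to_missing?(name, include_private = false)
    @attrs.key?(name.to_s) || super
  end
end

row = ReportRow.new("name" => "Alice", "city" => "Boston")
```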

The
question is whether it's worth bothering to model the final data
output at all.

Of course it is. That way you can manipulate it more easily than with
string parsing.

I know that helpers are for presentation formatting, which is why I
specified that the helper process would be for the "screen".

Thanks.

Best,

Have you considered using database views? I have never been in this
situation, so take my suggestions with a grain of salt. Here's what I'd
try:

Use database views to map the EAV (non-)schema to sane relational
schema.
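The usual way to express such a view is one MAX(CASE ...) column per attribute. Table, column, and key names here are hypothetical -- you'd list one line per attribute you need:

```ruby
# One output column per EAV key; MAX collapses the per-key rows
# produced by the GROUP BY into a single value per entity
create_view = <<~SQL
  CREATE VIEW entities_flat AS
  SELECT entity_id,
         MAX(CASE WHEN key_name = 'name' THEN key_value END) AS name,
         MAX(CASE WHEN key_name = 'city' THEN key_value END) AS city
  FROM attributes
  GROUP BY entity_id
SQL
```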

You didn't say if you need to insert into that database. If you do, see
if MySQL supports updatable views. If it does, give them a try,
otherwise handle inserts yourself.

If none of this works, consider not using ActiveRecord at all;
something like Sequel may be a better fit:
http://sequel.rubyforge.org/

Michael

Michael Schuerig wrote:

> Yes, I'm currently intending to "munge" the unorganized fields into a
> string, and deal with the parsing/processing at another stage. Yes,
> it's inefficient, but it's a relatively small amount of processing,
> and hopefully it will get me to where I need to be within a short
> amount of time, which is a factor in this situation.

Have you considered using database views? I have never been in this
situation, so take my suggestions with a grain of salt. Here's what I'd
try:

I've been looking at that. I'm not sure how that gets around the
complex queries, unless MySQL supports materialized views, which I
don't think it does (and they're not a great solution). Or are you
assuming that the caching will get around this problem, once it's
built the first time?

Use database views to map the EAV (non-)schema to sane relational
schema.

You didn't say if you need to insert into that database. If you do, see
if MySQL supports updatable views. If it does, give them a try,
otherwise handle inserts yourself.

I don't need to do any inserts, so that should not be an issue.

jrq wrote:

> > Yes, I'm currently intending to "munge" the unorganized fields
> > into a string, and deal with the parsing/processing at another
> > stage. Yes, it's inefficient, but it's a relatively small
> > amount of processing, and hopefully it will get me to where I
> > need to be within a short amount of time, which is a factor in
> > this situation.
>
> Have you considered using database views? I have never been in this
> situation, so take my suggestions with a grain of salt. Here's what
> I'd try:
I've been looking at that. I'm not sure how that gets around the
complex queries, unless MySQL supports Materialized Views, which I
don't think it does (and they're not a great solution). Or are
you assuming that the caching will get around this problem, once it's
built the first time?

I don't know. Give it a try. Define just the views and do some queries.
You wrote earlier that your DB has < 20,000 rows -- in other words, it
is almost trivially small and surely fits into memory. That is, unless
the attribute values are extremely large.

I don't need to do any inserts, so that should not be an issue.

Does that mean that your app does not change the database or that it
does not change at all? Either way, I'd try views first and if that's
not fast enough already, there's the option of materializing them
yourself simply by copying the views to real tables.
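Copying a view into a real table can be done with CREATE TABLE ... AS, rerun whenever the report data should be refreshed (the view and table names are hypothetical):

```ruby
# "Materialize" a view by hand: drop and rebuild a real table from it
refresh_sql = <<~SQL
  DROP TABLE IF EXISTS entities_flat_mat;
  CREATE TABLE entities_flat_mat AS SELECT * FROM entities_flat;
SQL
```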

Michael

I've read the rest of the thread with interest, but I just wanted to
go back to the start for a second.

I've inherited an EAV database and there's really no option to remodel
it.

Where have you inherited this data from? Is there an application
that's still populating it? or is it an export of data (from something
like an online survey/questionnaire system)?

Trying to return a meaningful, unified recordset is far too complex to
be efficient.

Using what approach? The data was set up as EAV for a reason, and
would have probably had tools to query and manipulate it - do you not
have any access to the original tools?

I've decided to make use of the MySQL GROUP_CONCAT and CONCAT_WS
functions to return a fast query result, with the consolidated fields
as one string.
The final data is to be displayed and downloaded (CSV), but not all
fields will be available for each "row", which makes serialization
awkward.

So what led you to decide to use Rails for this? Given Rails has *no*
practical EAV support (just a bit of munging with plugins), it strikes
me as a strange decision to try to use it to analyse your data. Is
there no "EAV data analysis" tool you can find to do the job, rather
than having to create huge amounts of DB views and/or model
manipulation?

I'm not saying to *not* continue what you're doing - just curious that
it seems you've picked a route that needs more work than should be
reasonable, and don't seem to be looking at any other options. It's
always worth admitting that you might be making lots of work for
yourself and that a new start may be more economical in the long run.

FWIW Many (6-ish) years ago I got an EAV data export from SurveyMonkey
dumped on my desk, and after importing it into a SQL DB, I produced
Excel cross-tabulated reports with a combination of a big plain old
SQL query (with *lots* of joins! :slight_smile:) to get the data into
a "relational" equivalent, and used Excel's pivot-table functionality
to do the "heavy lifting" of getting information from those rows.
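The "lots of joins" shape referred to above is one self-join per attribute, roughly like this (table, column, and key names are hypothetical):

```ruby
# Classic EAV-to-columns query: LEFT JOIN the attributes table once
# per key, so entities missing a key still appear (with NULLs)
joins_sql = <<~SQL
  SELECT e.entity_id,
         a1.key_value AS name,
         a2.key_value AS city
  FROM (SELECT DISTINCT entity_id FROM attributes) e
  LEFT JOIN attributes a1
         ON a1.entity_id = e.entity_id AND a1.key_name = 'name'
  LEFT JOIN attributes a2
         ON a2.entity_id = e.entity_id AND a2.key_name = 'city'
SQL
```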

> > > Yes, I'm currently intending to "munge" the unorganized fields
> > > into a string, and deal with the parsing/processing at another
> > > stage. Yes, it's inefficient, but it's a relatively small
> > > amount of processing, and hopefully it will get me to where I
> > > need to be within a short amount of time, which is a factor in
> > > this situation.

> > Have you considered using database views? I have never been in this
> > situation, so take my suggestions with a grain of salt. Here's what
> > I'd try:
> I've been looking at that. I'm not sure how that gets around the
> complex queries, unless MySQL supports Materialized Views, which I
> don't think it does (and they're not a great solution). Or are
> you assuming that the caching will get around this problem, once it's
> built the first time?

I don't know. Give it a try. Define just the views and do some queries.
You wrote earlier that your DB has < 20,000 rows -- in other words, it
is almost trivially small and surely fits into memory. That is, unless
the attribute values are extremely large.

> I don't need to do any inserts, so that should not be an issue.

Does that mean that your app does not change the database or that it
does not change at all? Either way, I'd try views first and if that's
not fast enough already, there's the option of materializing them
yourself simply by copying the views to real tables.

My app will not change the tables involved. However, there are other
apps that do change this data. This is not a high-volume transaction
system, so faking a materialized view may prove to be an option.

I've read the rest of the thread with interest, but I just wanted to
go back to the start for a second.

> I've inherited an EAV database and there's really no option to remodel
> it.

Where have you inherited this data from? Is there an application
that's still populating it? or is it an export of data (from something
like an online survey/questionnaire system)?

It's not a static export, and yes there is an application still
populating it.

> Trying to return a meaningful, unified recordset is far too complex to
> be efficient.

Using what approach? The data was set up as EAV for a reason, and
would have probably had tools to query and manipulate it - do you not
have any access to the original tools?

Using SQL joins to build a recordset that represents the dataset. As
far as I can gauge, the data was set up this way to allow an "easy"
way to construct a custom set of fields for a relatively non-technical
user/admin. It certainly wasn't designed to handle the reporting
function I'm being asked to create, or, I suspect, the volume of data
that is being stored in it.

Yes, the right thing to do would be to replace large chunks of this
and design a better-normalized data solution that satisfied all the
systems involved. Unfortunately, in this instance I don't have the
luxury of time or money to go that route. I wouldn't describe myself
as an experienced developer, but I have been in this business for a
long time, and I know the kind of hole I'm digging for myself by
choosing this method.

> I've decided to make use of the MySQL GROUP_CONCAT and CONCAT_WS
> functions to return a fast query result, with the consolidated fields
> as one string.
> The final data is to be displayed and downloaded (CSV), but not all
> fields will be available for each "row", which makes serialization
> awkward.

So what led you to decide to use Rails for this? Given Rails has *no*
practical EAV support (just a bit of munging with plugins), it strikes
me as a strange decision to try to use it to analyse your data. Is
there no "EAV data analysis" tool you can find to do the job, rather
than having to create huge amounts of DB views and/or model
manipulation?

It's more complex (isn't it always). A large part of the database is
well-formed, normalized and friendly. This little area that only
recently came to my attention is not (it's EAV). Rather than re-tool
my entire solution design, I'm sticking to the original plan and
shoehorning this section in (and it is a relatively small section in
the scheme of things).

I freely admit that Rails is not the best solution for this. I'd even
go as far as to say that it falls into the category of "right now I
have a hammer, so everything looks like a nail", by which I mean: I
don't do a lot of development, especially web-app development, and
spent some time at the beginning of the year learning RoR. I liked
it, even if I wasn't always using it in the manner for which it's most
suitable. It's been suggested that something like Sinatra would be a
much better fit for this, and a quick peek seems to show that it
probably would. However, in the short time available to complete this
project, I can't deal with struggling to get through all the "gotchas"
of another framework.

Lame excuses, I know. Completely at odds with the dogma of RoR
development.

I'm not saying to *not* continue what you're doing - just curious that
it seems you've picked a route that needs more work than should be
reasonable, and don't seem to be looking at any other options. It's
always worth admitting that you might be making lots of work for
yourself and that a new start may be more economical in the long run.

FWIW Many (6-ish) years ago I got an EAV data export from SurveyMonkey
dumped on my desk, and after importing it into a SQL DB, I produced
Excel cross-tabulated reports with a combination of a big plain old
SQL query (with *lots* of joins! :slight_smile:) to get the data into
a "relational" equivalent, and used Excel's pivot-table functionality
to do the "heavy lifting" of getting information from those rows.

With data migration, I generally opt to get down to the raw data and
manipulate it all in code. I know it can be a little less efficient,
but when it's my code, at least I know I can trust it, or at least fix
it when it breaks. Having said that, the pivot table stuff in Excel
can be pretty useful.

Okay, well, given you want to try to make your life easier in a very
special and specific use case, and I can now see a bit more of the
angle you're coming at this problem from: if you're going to
programmatically solve this in a potentially quick-and-dirty way,
because the powers-that-be must be demanding the "reports", try seeing
if OpenStructs [1] will help you build up collections of
pseudo-relational records made up from the EAV records. They're a
nice solution for creating "something" that behaves like an AR object
the majority of the time - you could even extend one of the "tableless
model" [2] implementations to keep your interfaces more standard.

[1] http://ruby-doc.org/stdlib/libdoc/ostruct/rdoc/classes/OpenStruct.html

[2] http://stackoverflow.com/questions/937429/activerecordbase-without-table-rails
or Google for others...
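A quick illustration of the OpenStruct idea with invented data -- fields absent from a given record simply come back as nil, which suits the "not all fields available per row" situation:

```ruby
require 'ostruct'

# Invented per-entity hashes, e.g. pivoted out of the EAV table
rows = [
  { "name" => "Alice", "city" => "Boston" },
  { "name" => "Bob",   "phone" => "555-0100" }
]

# One OpenStruct per record; accessors are generated from the keys
records = rows.map { |r| OpenStruct.new(r) }
```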

Interesting. OpenStruct might very well come in handy. Thanks.