database design question: building features

I am trying to decide between two different database architectures for an application to manage properties. For capturing the property features, I have two options: a single model/table with all the features as fields, or a set of joined models - property, feature, feature data. (see attached ERD diagram)

OPTION A Properties name address has_pool has_doorman has_etc. (a bunch of values, not all boolean)

=OR=

OPTION B Properties id name address

Features id title featurable_type (polymorphic, in this case points to "property.id") data_type

Feature_Data id feature_id featureable_id (polymorphic, in this case points to "property.id")

My ingoing assumption is that we do not know all the features that can be associated with a property, and as time goes by new features will be requested by users and need to be added. Here is my question: is going to the hassle of joining two extra models just to make the features a bit more independent worth the difficulty of implementation versus just keeping everything in the main property model?

Let's assume that this application will be scaled to a high level. Are there any performance issues associated with OPTION B that should be considered? Is there another way to model this data that makes more sense?

Thanks in advance.

Taylor

Attachments: http://www.ruby-forum.com/attachment/4417/ERD_sample.png

Option B is the better choice. If a change is ever made to the potential feature set, it is far more manageable to be able to manipulate the data as opposed to having the alter the structure of the DB itself. It's logically a one-to-many relationship, so I'd reflect that fully in your schema. Plus, because of the fields that allow polymorphic behavior you'll have the benefit of data independence, in that features could potentially belong to other types of models in the future.

Option B is certainly more flexible, but I'm unclear on why you have these associations set up as polymorphic. Are you planning on having other things with features besides Property? Note that subclasses of Property don't count, as they'll work with plain associations.

I'm also guessing there's at least one typo above - Feature shouldn't actually be referring to Property, right? You've essentially got a "decorated join table" situation, with FeatureData joining Property to Feature.

--Matt Jones

Do you really need the flexible and dynamic scheme of option 2?

As long as you only have a few settings I would stick to option 1. You can always migrate your data and move to scheme 2 if required.

/K

Seirie - thanks for the response!

Matt - you are right about the typos - featureable_type in Features would be referring to "property" rather than "property.id" and Feature_Data should have a field "value." The ERD is correct. I set this up as polymorphic since at least two more models will have Features: Units and Rooms.

I broke out Features into two models (Features and Feature_Data) for two reasons: I wanted to normalize the feature titles and wanted to have data_type which could record whether the data is an integer, boolean, string, etc. (since I assume I will have to store everything in Feature_Data as a string and convert back using rails) - a real pain!

Thinking through it again, since I would have an "unlimited" number of features available through a polymorphic association, maybe I should make all the data boolean and rework as such: (is there a more graceful way?)

Properties id yadda yadda

Properties_Features property_id feature_id

Features id featurable_type (polymorphic - property, unit, or room) title value (boolean, always true?)

Kristian - thanks for the reply. I think it would be easier, too! But... I was wondering if I am violating some kind of best practice to put all the fields in one model even if half of them will be empty most of the time?

Taylor Strait wrote:

Kristian - thanks for the reply. I think it would be easier, too! But... I was wondering if I am violating some kind of best practice to put all the fields in one model even if half of them will be empty most of the time?

i think it can sometimes be a fault to over-engineer a solution beyond the 'fit for purpose' threshold :wink: A textbook solution may be option B and i think it looks the more interesting challenge but in reality you know your system best. If option A is the best fit (and consider your longer term plans as well here) then why not.

I would look into understanding how to normalize your databases.

The problems I have personally experienced with normalizing a database is that oftentimes you can over-normalize a database to the point that it creates havoc on response times.

I would read up on how to normalize your database up to 3NF for this particular situation.

Taylor Strait wrote:

Kristian - thanks for the reply. I think it would be easier, too! But... I was wondering if I am violating some kind of best practice to put all the fields in one model even if half of them will be empty most of the time?

They won't be empty; they'll be false. Big difference.

Best,

You could also check up document oriented and schema free databases like couchdb and mongodb if you would like to see your design problem from another perspective.

Good luck, Kristian

Taylor Strait wrote:

I am trying to decide between two different database architectures for an application to manage properties. For capturing the property features, I have two options: a single model/table with all the features as fields, or a set of joined models - property, feature, feature data. (see attached ERD diagram)

You're right that option A is pretty much a bad idea. I can't tell you how much time I have spent refactoring databases that were *guaranteed* never to change. In that sense option B is much better.

However, there may be a solution between the two.

Consider the following table:

Features id name (the name of the feature: 'pool', 'doorman', or whatever) featureable_type ('property', 'unit', or 'room') featureable_id (property.id, unit.id, or room.id)

With this, you just add whatever feature you want. If the record exists the property has the feature. The name field can be user enterable, or you can fill it from a listbox if you want to control what the user can enter. The list can even be editable by an admin if you want the control and the extensibility.

Will Merrell wrote:

Taylor Strait wrote:

I am trying to decide between two different database architectures for an application to manage properties. For capturing the property features, I have two options: a single model/table with all the features as fields, or a set of joined models - property, feature, feature data. (see attached ERD diagram)

You're right that option A is pretty much a bad idea. I can't tell you how much time I have spent refactoring databases that were *guaranteed* never to change.

That shouldn't be a problem. Broadly speaking, it is better to refactor a database tomorrow than to overdesign it today.

In that sense option B is much better.

However, there may be a solution between the two.

Consider the following table:

Features id name (the name of the feature: 'pool', 'doorman', or whatever) featureable_type ('property', 'unit', or 'room') featureable_id (property.id, unit.id, or room.id)

With this, you just add whatever feature you want. If the record exists the property has the feature. The name field can be user enterable, or you can fill it from a listbox if you want to control what the user can enter. The list can even be editable by an admin if you want the control and the extensibility.

That's terrible! It defeats the point of having a database. Don't ever do that unless there's absolutely no alternative.

Best,

Marnen Laibow-Koser wrote:

Will Merrell wrote:   

Taylor Strait wrote:     

I am trying to decide between two different database architectures for an application to manage properties. For capturing the property features, I have two options: a single model/table with all the features as fields, or a set of joined models - property, feature, feature data. (see attached ERD diagram)       

You're right that option A is pretty much a bad idea. I can't tell you how much time I have spent refactoring databases that were *guaranteed* never to change.      That shouldn't be a problem. Broadly speaking, it is better to refactor a database tomorrow than to overdesign it today.    I'm certainly not in favor of over design, which is why I suggested something in between. That said, I have rarely seen a case where wide and shallow is the proper solution. The OPs problem looks like it needs some kind of normalization.

In that sense option B is much better.

However, there may be a solution between the two.

Consider the following table:

Features id name (the name of the feature: 'pool', 'doorman', or whatever) featureable_type ('property', 'unit', or 'room') featureable_id (property.id, unit.id, or room.id)

With this, you just add whatever feature you want. If the record exists the property has the feature. The name field can be user enterable, or you can fill it from a listbox if you want to control what the user can enter. The list can even be editable by an admin if you want the control and the extensibility.      That's terrible! It defeats the point of having a database. Don't ever do that unless there's absolutely no alternative.    Could you say a little more about which part of this you find terrible. I have used techniques like this for some situations, and have seen others use it also. If I'm missing something I want to know. If I misspoke, I want to clean it up.

-- Will

Will Merrell wrote:

Marnen Laibow-Koser wrote:

You're right that option A is pretty much a bad idea. I can't tell you how much time I have spent refactoring databases that were *guaranteed* never to change.      That shouldn't be a problem. Broadly speaking, it is better to refactor a database tomorrow than to overdesign it today.    I'm certainly not in favor of over design, which is why I suggested something in between. That said, I have rarely seen a case where wide and shallow is the proper solution.

Likewise. But the OP's problem -- lots of independent attributes -- is one such case.

The OPs problem looks like it needs some kind of normalization.

Nope! The table is already well normalized, I think. If you disagree, please tell me what normalization condition is being violated.

featureable_type ('property', 'unit', or 'room')

do that unless there's absolutely no alternative.    Could you say a little more about which part of this you find terrible. I have used techniques like this for some situations, and have seen others use it also.

Oh, it gets used, all right -- by people who either are dealing with unusual situations or don't know how to use a database properly.

If I'm missing something I want to know. If I misspoke, I want to clean it up.

Then just avoid this pattern altogether. See discussion at http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/2ee6d1546a20409e?fwc=1 for more information.

Best,

Marnen Laibow-Koser wrote:

Will Merrell wrote:   

Marnen Laibow-Koser wrote:     

That shouldn't be a problem. Broadly speaking, it is better to refactor a database tomorrow than to overdesign it today.   

I'm certainly not in favor of over design, which is why I suggested something in between. That said, I have rarely seen a case where wide and shallow is the proper solution.      Likewise. But the OP's problem -- lots of independent attributes -- is one such case.

The OPs problem looks like it needs some kind of normalization.      Nope! The table is already well normalized, I think. If you disagree, please tell me what normalization condition is being violated.   

If the OP's problem really is lots of truly independent attributes, selected from a finite list, with a limited and known number of repeated attributes, then yes, wide and flat, (the OP's option A) is probably an optimal solution.

But, that is not what I read in his post. How many pools should he be able to capture for a property? Can a property have six doormen? What about the property with the helipad? The problem with wide and flat is that it requires a fixed problem domain. Cleaver as I am, I can not fully anticipate what crazy examples the real world of the users is going to through at my system.

As I read the OP's description I saw that the list of features he needs to capture is not fixed and that some features could occur multiple times. To me that demands that features be objects in their own right and that the database design should capture the relationship between properties (et al) and features. There are at least two tables here, properties and features. The exact structure (and number) of these tables is a second question.

featureable_type ('property', 'unit', or 'room')         

do that unless there's absolutely no alternative.   

Could you say a little more about which part of this you find terrible. I have used techniques like this for some situations, and have seen others use it also.      Oh, it gets used, all right -- by people who either are dealing with unusual situations or don't know how to use a database properly.

If I'm missing something I want to know. If I misspoke, I want to clean it up.      Then just avoid this pattern altogether. See discussion at http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/2ee6d1546a20409e?fwc=1 for more information.   

Thank you for the link, I see what you're driving at. I mostly agree. Certainly in the case discussed in the thread you linked to the key/value pattern is a bad idea, and should be avoided.

There were really two intents in my original post. The first, and most important, is that there are other ways to solve the OP's problem beyond the two that he presented. His first seemed to me too simplistic and headed for trouble (as discussed above) and the second seemed too complex and also headed for trouble. I simply wanted to suggest that there may be a third choice. Whether my suggestion is that better choice is a separate issue.

The second point i wanted to make is that the recording of a feature need not be very complex. The presence or absence of a record may be sufficient. What the OP needs to record about a given feature probably needs a good deal more thought and discussion, but the existence of the feature can be very simple, and just naming it may be enough.

I really do agree that recording a feature as a collection of key/value pairs is almost certainly a bad idea, and I did not intend to suggest such a structure. If you heard that in what I posted then I did misspeak.

I think what we both mean to say is that good database design requires a good understanding the problem space. Duct tape and bailing wire works for MacGuyver, but not for the rest of us.

-- Will

Will Merrell wrote:

Marnen Laibow-Koser wrote:

and shallow is the proper solution.

Nope! The table is already well normalized, I think. If you disagree, please tell me what normalization condition is being violated.   

If the OP's problem really is lots of truly independent attributes, selected from a finite list, with a limited and known number of repeated attributes, then yes, wide and flat, (the OP's option A) is probably an optimal solution.

That is what the OP said his problem is.

But, that is not what I read in his post. How many pools should he be able to capture for a property? Can a property have six doormen? What about the property with the helipad?

What about it? Those fields could be integers.

The problem with wide and flat is that it requires a fixed problem domain.

The OP seems to think he has one.

Cleaver as I am, I can not fully anticipate what crazy examples the real world of the users is going to through at my system.

From what the OP said, I don't see where you get the requirement of the users specifying custom attributes. You are trying to solve a different problem.

To be sure, he said it was his "ingoing assumption", but I read that as meaning he's guessing.

As I read the OP's description I saw that the list of features he needs to capture is not fixed and that some features could occur multiple times.

Nowhere does he say anything about features occurring multiple times.

To me that demands that features be objects in their own right

Relational databases don't have objects.

and that the database design should capture the relationship between properties (et al) and features. There are at least two tables here, properties and features. The exact structure (and number) of these tables is a second question.

Yes -- if a feature could occur multiple times. But that is not the OP's use case.

Oh, it gets used, all right -- by people who either are dealing with   

Thank you for the link, I see what you're driving at. I mostly agree. Certainly in the case discussed in the thread you linked to the key/value pattern is a bad idea, and should be avoided.

There were really two intents in my original post. The first, and most important, is that there are other ways to solve the OP's problem beyond the two that he presented. His first seemed to me too simplistic and headed for trouble (as discussed above)

Then it can be refactored. Always start with the simplest approach.

and the second seemed too complex and also headed for trouble.

Actually, option B is better normalized than your idea. I think it's preferable.

I simply wanted to suggest that there may be a third choice. Whether my suggestion is that better choice is a separate issue.

I don't think it is.

The second point i wanted to make is that the recording of a feature need not be very complex. The presence or absence of a record may be sufficient.

Or, for that matter, an integer in a field!

What the OP needs to record about a given feature probably needs a good deal more thought and discussion, but the existence of the feature can be very simple, and just naming it may be enough.

Right.

I really do agree that recording a feature as a collection of key/value pairs is almost certainly a bad idea, and I did not intend to suggest such a structure. If you heard that in what I posted then I did misspeak.

Not a collection. Your proposal, though, was something very close to the key/value schema.

I think what we both mean to say is that good database design requires a good understanding the problem space. Duct tape and bailing wire works for MacGuyver, but not for the rest of us.

:slight_smile:

-- Will

Best,