Learning some best DB practices

I am completely new to rails, and I have a very useful way of dealing with database structures from my background in php. I don't want to be one of those who brings bad habits into rails...so I am curious if my practices below violate rails' best practices.

(SQL is below, but I try to explain here for clarity) For example, let's say I want a user's table in the database. I will usually create it with only necessary information. Name, username, password, timestamps, etc. Then I set up a "users_profiles" table that is essentially a table with only three columns (1. A reference to the user_id, 2. A 'key' column (varchar), and 3. A 'value' column (text).). The primary key is a combo of the referenced user_id and the 'key' column. Within my "user" model, I would instantiate an new "users_profile", by calling $this->profile = new Users_Profiles(). Then, if I want to set an email, all I have to do is call $this->user-

profile->email = "example email". Then $this->user->save();. This

technique, allowed for extreme flexibility. Note that any call just to $this->user->name would have to match the database table column name, but with the key/value setup in the "users_profiles" table (using a few setters and getters) I am able to set the "key" to email without any change to the database.

I appreciated this flexibility and decreased need to alter the database. Since I see huge advantage in the "Rails Way"...particularly in ActiveRecord, I would like to adhere to "best- practices." Does anyone know if this would be considered a poor way of managing data in the "rails way". It seems to me that it would function similarly, in that it would save a lot of updates with the db- migrations.

Thank you in advance for your help!

Andrew Pace

Here are the actual SQL statements for a real site.

Andrew Pace wrote: [...]

(SQL is below, but I try to explain here for clarity) For example, let's say I want a user's table in the database. I will usually create it with only necessary information. Name, username, password, timestamps, etc.

What do you mean by "only necessary information"? If information is unnecessary, you shouldn't be storing it. If it is necessary, you should bother to design a proper DB schema. This is true regardless of whether you're using PHP, Ruby, Python, Perl, BASIC, or Lolcode for your application. :slight_smile:

Then I set up a "users_profiles" table that is essentially a table with only three columns (1. A reference to the user_id, 2. A 'key' column (varchar), and 3. A 'value' column (text).).

So the key column is a type descriptor for the value column, so that you might have

user_id | key. | value 1 | email | joe@aol.com 1 | phone | (555) 123-4567 1 | birthday | 4 Jul 1976 2 | birthday | 02/12/1982

?

If so, then let me tell you that in most cases, this is a bad habit in any language context. It basically defeats the purpose of having a structured database by munging all the data into a single text column. This makes queries harder. For example, how would you search for all users with birthdays greater than a certain date? Heck, how do you store data in a consistent way (was user 2 born on 2 Dec or 12 Feb)?

There are situations where extremely flexible databases like CouchDB without a consistent record schema may be beneficial. But storing user information is probably not one of them.

[...]

Does anyone know if this would be considered a poor way of managing data in the "rails way".

This would be considered a poor way of managing data, period, because it doesn't manage the data -- it just stores it in a messy way that makes querying difficult.

(If you *must* do something like this in Rails, consider serialize. But avoid it if at all possible)

It seems to me that it would function similarly, in that it would save a lot of updates with the db- migrations.

There's no advantage to saving migrations just for the sake of saving migrations. The structure of the DB should, as far as possible, reflect the structure of the data, not a half-assed design that a lazy programmer came up with because he didn't want to bother figuring out what he needed to store.

Thank you in advance for your help!

Andrew Pace

Best,

Thanks for your feedback.

It is possible to ensure certain formats for the text column prior to saving it to a database. This makes querying this data quite easy in fact. Just because the user entered in a form like 2/12/1980, doesn't mean it has to get stored in that fashion....I would hope that is obvious, as this should be a concern no matter how you structure your database. A simple validation check solves that one.

Finding a value is then as simple as: select all from users_profile where profile_key = "birthday" and profile value =,>,< "whatever". Not too difficult.

Here is an SQL statement that works perfectly:     $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'email \' and profile_value = \'andrewppace@gmail.com\';

Another example.     $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'zipcode \' and profile_value > \'70000\';

The data is not mangled at all, and is still easily retrievable. It is, however, very flexible. I accept that searching through all text fields can be slower from a performance standpoint than using other types of fields.

Andrew

Thanks for your feedback.

It is possible to ensure certain formats for the text column prior to saving it to a database.

Yes, but that puts all the type-checking in the application layer and defeats one of the big advantages of using a database instead of a flat text file.

This makes querying this data quite easy in fact. Just because the user entered in a form like 2/12/1980, doesn't mean it has to get stored in that fashion....I would hope that is obvious, as this should be a concern no matter how you structure your database. A simple validation check solves that one.

No it doesn't. 2/12/1980 is ambiguous no matter how much validation you throw at it.

Finding a value is then as simple as: select all from users_profile where profile_key = "birthday" and profile value =,>,< "whatever". Not too difficult.

No. > and < don't work on dates stored as strings, which means you need a date parsing function. IIRC, this is generally not present in SQL, which means you have to involve the application layer to complete the query, which is a performance and architectural problem. If you had a date field in the DB for birthday, this query would be absolutely trivial and would be possible in the DB alone.

Here is an SQL statement that works perfectly: $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'email \' and profile_value = \'andrewpp...@gmail.com\';

Another example. $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'zipcode \' and profile_value > \'70000\';

The data is not mangled at all,

Yes it is. If the data is numeric or another non-text type, then it is mangled by being declared as text. There's a difference between "4" and the number 4, and between "10/10/1974" and the date of 10 October 1974.

and is still easily retrievable.

Only for the simplest queries. This sort of schema effectively prevents you from harnessing the power of the database for complex queries, because the database cannot reason about the different fields that you are using (since they're all stored as if they're the same field).

It is, however, very flexible. I accept that searching through all text fields can be slower from a performance standpoint than using other types of fields.

Yes, that's one problem -- even if you index the field, the index will take much longer to search than necessary. There is a more fundamental problem, though, and that's that the "data model" you have simply does not model the data -- for example, it represents text, numbers, and dates all as if they were text, and it blurs the difference between fields. There is more to good database design than simply providing enough space to store the data.

Really, I know this may look simpler, but the fact is that it is a bad idea for most use cases, and it does not scale past the most trivial queries. I am hard put to think of a single good reason for the design you describe. Just don't do it.

Andrew

Best,

Also consider the demands of a very very rapidly changing field, like medical informatics. In this area, many prefer the flexibility of a key/value system because adding columns constantly would be a nightmare. Think off adding columns for every new lab test, imaging type, procedure, etc. This is where a system like this is extremely helpful.

Another addition that can sometimes be helpful is to add an additional column that holds a "datatype" that represents a traditional database constraint. This allows for easy programmatic testing against this datatype before the data enters the database. Obviously this requires some programming work up-front, but this can be exceedingly useful once it is built. Again, flexibility is the key here.

Some links:

http://www.cdc.gov/nedss/DataModels/index.html

Andrew

Even a wikipedia article on it.

Andrew

In this case (medical informatics), then a document oriented database like CouchDB (as Marnen pointed out already) may be the best solution.

Also consider the demands of a very very rapidly changing field, like medical informatics. In this area, many prefer the flexibility of a key/value system because adding columns constantly would be a nightmare.

First of all, in a typical application, user profiles contain a well- defined set of data, unlike the case you are describing, and so it should be possible to design a real schema for the data without too much trouble.

Second, part of the point of Rails' system of migrations is to make database changes *not* a nightmare.

Think off adding columns for every new lab test, imaging type, procedure, etc.

If you're doing that, then you're doing something wrong. This is where a bunch of linked tables come in handy, as well as (perhaps) some limited use of the key-value pattern, or perhaps some serialized arrays to represent complex results. But this is a solution of last resort when the data does not lend itself to a less amorphous schema.

This is where a system like this is extremely helpful.

Perhaps. But this is not your use case as you've described it, so your bringing it up is a red herring.

Another addition that can sometimes be helpful is to add an additional column that holds a "datatype" that represents a traditional database constraint.

And at that point, you might just as well build a proper database schema. Outside of a couple of special cases, there's so much wrong with this idea that I don't even know where to start.

This allows for easy programmatic testing against this

datatype before the data enters the database. Obviously this requires some programming work up-front, but this can be exceedingly useful once it is built. Again, flexibility is the key here.

Some links:

Biomedical Informatics & Data Science < Biomedical Informatics & Data Science http://www.cdc.gov/nedss/DataModels/index.html IBM Research Publications | IBM Research

These basically seem to talk about reinventing database features within tables -- even to the point of storing metadata. That way lies abstraction inversion. Don't reimplement what the database will already do for you -- unless you have a very good reason to do so. In your use case, you don't.

Andrew

Best,

Thanks