Learning some best DB practices

I am completely new to rails, and I have a very useful way of dealing
with database structures from my background in php. I don't want to
be one of those who brings bad habits into rails...so I am curious if
my practices below violate rails' best practices.

(SQL is below, but I try to explain here for clarity) For example,
let's say I want a user's table in the database. I will usually
create it with only necessary information. Name, username, password,
timestamps, etc. Then I set up a "users_profiles" table that is
essentially a table with only three columns (1. A reference to the
user_id, 2. A 'key' column (varchar), and 3. A 'value' column
(text).). The primary key is a combo of the referenced user_id and
the 'key' column. Within my "user" model, I would instantiate an new
"users_profile", by calling $this->profile = new Users_Profiles().
Then, if I want to set an email, all I have to do is call $this->user-

profile->email = "example email". Then $this->user->save();. This

technique, allowed for extreme flexibility. Note that any call just
to $this->user->name would have to match the database table column
name, but with the key/value setup in the "users_profiles" table
(using a few setters and getters) I am able to set the "key" to email
without any change to the database.

I appreciated this flexibility and decreased need to alter the
database. Since I see huge advantage in the "Rails
Way"...particularly in ActiveRecord, I would like to adhere to "best-
practices." Does anyone know if this would be considered a poor way
of managing data in the "rails way". It seems to me that it would
function similarly, in that it would save a lot of updates with the db-
migrations.

Thank you in advance for your help!

Andrew Pace

Here are the actual SQL statements for a real site.

Andrew Pace wrote:
[...]

(SQL is below, but I try to explain here for clarity) For example,
let's say I want a user's table in the database. I will usually
create it with only necessary information. Name, username, password,
timestamps, etc.

What do you mean by "only necessary information"? If information is
unnecessary, you shouldn't be storing it. If it is necessary, you
should bother to design a proper DB schema. This is true regardless of
whether you're using PHP, Ruby, Python, Perl, BASIC, or Lolcode for your
application. :slight_smile:

Then I set up a "users_profiles" table that is
essentially a table with only three columns (1. A reference to the
user_id, 2. A 'key' column (varchar), and 3. A 'value' column
(text).).

So the key column is a type descriptor for the value column, so that you
might have

user_id | key. | value
1 | email | joe@aol.com
1 | phone | (555) 123-4567
1 | birthday | 4 Jul 1976
2 | birthday | 02/12/1982

?

If so, then let me tell you that in most cases, this is a bad habit in
any language context. It basically defeats the purpose of having a
structured database by munging all the data into a single text column.
This makes queries harder. For example, how would you search for all
users with birthdays greater than a certain date? Heck, how do you
store data in a consistent way (was user 2 born on 2 Dec or 12 Feb)?

There are situations where extremely flexible databases like CouchDB
without a consistent record schema may be beneficial. But storing user
information is probably not one of them.

[...]

Does anyone know if this would be considered a poor way
of managing data in the "rails way".

This would be considered a poor way of managing data, period, because it
doesn't manage the data -- it just stores it in a messy way that makes
querying difficult.

(If you *must* do something like this in Rails, consider serialize. But
avoid it if at all possible)

It seems to me that it would
function similarly, in that it would save a lot of updates with the db-
migrations.

There's no advantage to saving migrations just for the sake of saving
migrations. The structure of the DB should, as far as possible, reflect
the structure of the data, not a half-assed design that a lazy
programmer came up with because he didn't want to bother figuring out
what he needed to store.

Thank you in advance for your help!

Andrew Pace

Best,

Thanks for your feedback.

It is possible to ensure certain formats for the text column prior to
saving it to a database. This makes querying this data quite easy in
fact. Just because the user entered in a form like 2/12/1980, doesn't
mean it has to get stored in that fashion....I would hope that is
obvious, as this should be a concern no matter how you structure your
database. A simple validation check solves that one.

Finding a value is then as simple as: select all from users_profile
where profile_key = "birthday" and profile value =,>,< "whatever".
Not too difficult.

Here is an SQL statement that works perfectly:
    $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'email
\' and profile_value = \'andrewppace@gmail.com\';

Another example.
    $sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'zipcode
\' and profile_value > \'70000\';

The data is not mangled at all, and is still easily retrievable. It
is, however, very flexible. I accept that searching through all text
fields can be slower from a performance standpoint than using other
types of fields.

Andrew

Thanks for your feedback.

It is possible to ensure certain formats for the text column prior to
saving it to a database.

Yes, but that puts all the type-checking in the application layer and
defeats one of the big advantages of using a database instead of a
flat text file.

This makes querying this data quite easy in
fact. Just because the user entered in a form like 2/12/1980, doesn't
mean it has to get stored in that fashion....I would hope that is
obvious, as this should be a concern no matter how you structure your
database. A simple validation check solves that one.

No it doesn't. 2/12/1980 is ambiguous no matter how much validation
you throw at it.

Finding a value is then as simple as: select all from users_profile
where profile_key = "birthday" and profile value =,>,< "whatever".
Not too difficult.

No. > and < don't work on dates stored as strings, which means you
need a date parsing function. IIRC, this is generally not present in
SQL, which means you have to involve the application layer to complete
the query, which is a performance and architectural problem. If you
had a date field in the DB for birthday, this query would be
absolutely trivial and would be possible in the DB alone.

Here is an SQL statement that works perfectly:
$sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'email
\' and profile_value = \'andrewpp...@gmail.com\';

Another example.
$sql = 'SELECT * FROM `users_profile` WHERE profile_key = \'zipcode
\' and profile_value > \'70000\';

The data is not mangled at all,

Yes it is. If the data is numeric or another non-text type, then it
is mangled by being declared as text. There's a difference between
"4" and the number 4, and between "10/10/1974" and the date of 10
October 1974.

and is still easily retrievable.

Only for the simplest queries. This sort of schema effectively
prevents you from harnessing the power of the database for complex
queries, because the database cannot reason about the different fields
that you are using (since they're all stored as if they're the same
field).

It
is, however, very flexible. I accept that searching through all text
fields can be slower from a performance standpoint than using other
types of fields.

Yes, that's one problem -- even if you index the field, the index will
take much longer to search than necessary. There is a more
fundamental problem, though, and that's that the "data model" you have
simply does not model the data -- for example, it represents text,
numbers, and dates all as if they were text, and it blurs the
difference between fields. There is more to good database design than
simply providing enough space to store the data.

Really, I know this may look simpler, but the fact is that it is a bad
idea for most use cases, and it does not scale past the most trivial
queries. I am hard put to think of a single good reason for the
design you describe. Just don't do it.

Andrew

Best,

Also consider the demands of a very very rapidly changing field, like
medical informatics. In this area, many prefer the flexibility of a
key/value system because adding columns constantly would be a
nightmare. Think off adding columns for every new lab test, imaging
type, procedure, etc. This is where a system like this is extremely
helpful.

Another addition that can sometimes be helpful is to add an additional
column that holds a "datatype" that represents a traditional database
constraint. This allows for easy programmatic testing against this
datatype before the data enters the database. Obviously this requires
some programming work up-front, but this can be exceedingly useful
once it is built. Again, flexibility is the key here.

Some links:

http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm
http://www.cdc.gov/nedss/DataModels/index.html
http://www.research.ibm.com/journal/sj/461/eggebraaten.html

Andrew

Even a wikipedia article on it.

http://en.wikipedia.org/wiki/Entity-attribute-value_model

Andrew

In this case (medical informatics), then a document oriented database
like CouchDB (as Marnen pointed out already) may be the best solution.

Also consider the demands of a very very rapidly changing field, like
medical informatics. In this area, many prefer the flexibility of a
key/value system because adding columns constantly would be a
nightmare.

First of all, in a typical application, user profiles contain a well-
defined set of data, unlike the case you are describing, and so it
should be possible to design a real schema for the data without too
much trouble.

Second, part of the point of Rails' system of migrations is to make
database changes *not* a nightmare.

Think off adding columns for every new lab test, imaging
type, procedure, etc.

If you're doing that, then you're doing something wrong. This is
where a bunch of linked tables come in handy, as well as (perhaps)
some limited use of the key-value pattern, or perhaps some serialized
arrays to represent complex results. But this is a solution of last
resort when the data does not lend itself to a less amorphous schema.

This is where a system like this is extremely
helpful.

Perhaps. But this is not your use case as you've described it, so
your bringing it up is a red herring.

Another addition that can sometimes be helpful is to add an additional
column that holds a "datatype" that represents a traditional database
constraint.

And at that point, you might just as well build a proper database
schema. Outside of a couple of special cases, there's so much wrong
with this idea that I don't even know where to start.

This allows for easy programmatic testing against this

datatype before the data enters the database. Obviously this requires
some programming work up-front, but this can be exceedingly useful
once it is built. Again, flexibility is the key here.

Some links:

http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm
http://www.cdc.gov/nedss/DataModels/index.html
http://www.research.ibm.com/journal/sj/461/eggebraaten.html

These basically seem to talk about reinventing database features
within tables -- even to the point of storing metadata. That way lies
abstraction inversion. Don't reimplement what the database will
already do for you -- unless you have a very good reason to do so. In
your use case, you don't.

Andrew

Best,

Thanks