Patch Testers Sought

Hey guys,

I've posted a fairly large refactoring of rails' attribute
related-methods to trac at

http://dev.rubyonrails.org/ticket/9241

As it makes a few non-minor changes to the core of activerecord, I
thought I'd post it for review here. I'm especially interested in any
breakage that you notice related to either the NoMethodError or
@attributes_cache changes.

The patch itself isn't done yet, so don't worry about any
'peculiarities' of the implementation, what I'm concerned about is
whether the subtly different semantics and caching introduce any bugs
in your applications.

== description follows ==

This patch refactors the attribute-generation methods in
ActiveRecord::Base in the following ways:

    * Move generation from instance methods to class methods
    * In addition to accessors, generate mutators
    * Always generate accessors and mutators, the overhead in
development mode is minimal.

It includes one functional change, results of typecasted attributes
are cached in a new hash @attributes_cache. The rationale for this
change is that repeated access to columns requiring typecasting (such
as Times) can be a significant overhead. Code like the following will
repeatedly call the Time.parse functions.

  50.times { object.created_at }

This can contribute to performance degradation in datetime-heavy applications.

Finally the behaviour of the respond_to? and method_missing methods
has changed when you use find with :select.

   # OLD
   @customer = Customer.find(1, :select=>"first_name").
   @customer.respond_to?("last_name") # => false
   @customer.has_attribute?("last_name") # => false
   @customer.last_name # raises NoMethodError

   # NEW
   @customer = Customer.find(1, :select=>"first_name").
   @customer.respond_to?("last_name") # => true
   @customer.has_attribute?("last_name") # => false
   @customer.last_name # raises MissingAttributeError

This change simplifies the code somewhat, and also makes it easier to
track down bugs in this area. A simple hook is provided so plugin
authors can experiment with lazily fetching the missing columns. While
this could be useful for large TEXT or BLOB attributes, I don't intend
to include functionality like that without battle-testing it in some
way.

Hi Koz,

Your proposal looks sound to me, however I think it only goes half
way. In my view there are two main issues with active record:

* The mixing of Ruby objects and raw database values (strings) in the
attributes hash table

* The lack of proper column objects

For more details, I wrote up some of my thoughts at
http://cfis.savagexi.com/articles/2007/08/11/making-rails-better-fixing-architecture-flaws-in-active-record.

But in a nutshell, in think much of the grunginess of ActiveRecord
could be eliminated by introducing a column object per data type. So
FloatColumn, StringColumn, BooleanColumn, etc. Each column would
provide an api for serializing/deserializing Ruby objects to and from
the database's string representation. That would clean up ActiveRecord
by removing a large number of case statements in the connection
adapters and schema_definitions.rb. And more importantly, it would
make it much easier for developers to add their own custom data types
when needed. To see what I mean, try to add your own custom data type
(say Postgresql PostGIS data types) to ActiveRecord and see how
difficult it is (without of course modifying ActiveRecord itself).

Adding a column type directly influences your proposal, because the
serialization methods would be used to convert to and from the
attributes hash (sounds like you are using that for the raw value from
the database) and the attributes_cache hash (which stores the
serialized form as a Ruby object). So something like:

attributes_cache['foo'] =
self.columns['foo'].serialize(attributes['foo'])

And of course vice-versa.

Last, it would be really useful to add a "changed" flag for each field
in each record. Then when ActiveRecord creates an UPDATE query it
would only modify the values that actually changed. Right now,
ActiveRecord updates all columns, which turns out to be a pain because
it has a tendency to botch values for columns that have a default
expression or have a type that it doesn't know about (postgis fields
for example).

So just my two cents - happy to discuss more if these seem like useful
ideas.

Charlie

Your proposal looks sound to me, however I think it only goes half
way. In my view there are two main issues with active record:

* The mixing of Ruby objects and raw database values (strings) in the
attributes hash table

* The lack of proper column objects

The column objects we currently have are certainly anemic, and I agree
it'd be nice if the type casting code was pushed out of Base and into
the adapters. All things in good time, assuming the caching approach
works, I'd definitely like to tidy the implementation a little more.
I definitely like the general shape of:

attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])

So assuming we don't see any breakage, I'll investigate that kind of a change.

Last, it would be really useful to add a "changed" flag for each field
in each record. Then when ActiveRecord creates an UPDATE query it
would only modify the values that actually changed. Right now,
ActiveRecord updates all columns, which turns out to be a pain because
it has a tendency to botch values for columns that have a default
expression or have a type that it doesn't know about (postgis fields
for example).

We've discussed partial-row updates in the past here, and they're
incompatible with model validations. Concurrent updates can each
see a valid state, issue an update, and leave the record invalid. So
it's unlikely we'll support them within the 2.0 timeframe, however it
could definitely be put into a plugin, and I'm happy to roll in any
hooks needed to simplify that.

As for change tracking, jeremy has had a plugin around for ages
(http://code.bitsweat.net/svn/dirty/) which does the job nicely the
few times I've used it. The fact it hasn't gathered a large number of
users indicates to me that perhaps it's a bit of a niche feature.

Michael Koziarski wrote:


Your proposal looks sound to me, however I think it only goes half
way. In my view there are two main issues with active record:
* The mixing of Ruby objects and raw database values (strings) in the
attributes hash table
* The lack of proper column objects

 The column objects we currently have are certainly anemic, and I agree
it'd be nice if the type casting code was pushed out of Base and into
the adapters. All things in good time, assuming the caching approach
works, I'd definitely like to tidy the implementation a little more.
I definitely like the general shape of:
attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])
So assuming we don't see any breakage, I'll investigate that kind of a change.
Last, it would be really useful to add a "changed" flag for each field
in each record. Then when ActiveRecord creates an UPDATE query it
would only modify the values that actually changed. Right now,
ActiveRecord updates all columns, which turns out to be a pain because
it has a tendency to botch values for columns that have a default
expression or have a type that it doesn't know about (postgis fields
for example).
We've discussed partial-row updates in the past here, and they're
incompatible with model validations. Concurrent updates can each
see a valid state, issue an update, and leave the record invalid. So
it's unlikely we'll support them within the 2.0 timeframe, however it
could definitely be put into a plugin, and I'm happy to roll in any
hooks needed to simplify that.

What about requiring optimistic locking for partial row updates? That
would seem to allow it to work without greatly contorting the rest of
ActiveRecord.

As for change tracking, jeremy has had a plugin around for ages
() which does the job nicely the
few times I've used it. The fact it hasn't gathered a large number of
users indicates to me that perhaps it's a bit of a niche feature.

So just my two cents - happy to discuss more if these seem like useful
ideas.
Charlie

Jack

The column objects we currently have are certainly anemic, and I agree
it'd be nice if the type casting code was pushed out of Base and into
the adapters. All things in good time, assuming the caching approach
works, I'd definitely like to tidy the implementation a little more.
I definitely like the general shape of:

attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])

So assuming we don't see any breakage, I'll investigate that kind of a change.

Excellent - that would be great.

We've discussed partial-row updates in the past here, and they're
incompatible with model validations. Concurrent updates can each
see a valid state, issue an update, and leave the record invalid.

Hmm. Thinking out loud, the validations would only check values that
have changed, so that seems ok. Then when you do an update, you'd
send the key fields and the changed fields.

Now if 2 people tried to update the same record, and ActiveRecord was
doing partial updates, then I could see how might end up in an invalid
state on the database side. Of course that's easily solved on the
database side - but there would be no way to solve it on the rails
side.

could definitely be put into a plugin, and I'm happy to roll in any
hooks needed to simplify that.

A possible hook would be a setter method you could override, or for a
cleaner implementation, fire a changed event when a field is updated.
Then you could roll your own "changed" hash table.

The fact it hasn't gathered a large number of
users indicates to me that perhaps it's a bit of a niche feature.

Interesting - ok.

Anyway, I'd agree keeping track of changes is not nearly as important
as separating out serialized/unserialized values in the attributes
hash table and having a good column implementation.

I'm excited you're looking at this, this will be a big improvement for
ActiveRecord.

Charlie

What about requiring optimistic locking for partial row updates? That would
seem to allow it to work without greatly contorting the rest of
ActiveRecord.

Yeah, if someone wanted to implement this kind of behaviour with a
plugin, they'd definitely need to be checking locks_optimistically? or
whatever that boolean is. Starting with Jeremy's dirty plugin, I
imagine it won't be a huge amount of work.