Patch Testers Sought

Hey guys,

I've posted a fairly large refactoring of rails' attribute related-methods to trac at

http://dev.rubyonrails.org/ticket/9241

As it makes a few non-minor changes to the core of activerecord, I thought I'd post it for review here. I'm especially interested in any breakage that you notice related to either the NoMethodError or @attributes_cache changes.

The patch itself isn't done yet, so don't worry about any 'peculiarities' of the implementation, what I'm concerned about is whether the subtly different semantics and caching introduce any bugs in your applications.

== description follows ==

This patch refactors the attribute-generation methods in ActiveRecord::Base in the following ways:

    * Move generation from instance methods to class methods     * In addition to accessors, generate mutators     * Always generate accessors and mutators, the overhead in development mode is minimal.

It includes one functional change, results of typecasted attributes are cached in a new hash @attributes_cache. The rationale for this change is that repeated access to columns requiring typecasting (such as Times) can be a significant overhead. Code like the following will repeatedly call the Time.parse functions.

  50.times { object.created_at }

This can contribute to performance degradation in datetime-heavy applications.

Finally the behaviour of the respond_to? and method_missing methods has changed when you use find with :select.

   # OLD    @customer = Customer.find(1, :select=>"first_name").    @customer.respond_to?("last_name") # => false    @customer.has_attribute?("last_name") # => false    @customer.last_name # raises NoMethodError

   # NEW    @customer = Customer.find(1, :select=>"first_name").    @customer.respond_to?("last_name") # => true    @customer.has_attribute?("last_name") # => false    @customer.last_name # raises MissingAttributeError

This change simplifies the code somewhat, and also makes it easier to track down bugs in this area. A simple hook is provided so plugin authors can experiment with lazily fetching the missing columns. While this could be useful for large TEXT or BLOB attributes, I don't intend to include functionality like that without battle-testing it in some way.

Hi Koz,

Your proposal looks sound to me, however I think it only goes half way. In my view there are two main issues with active record:

* The mixing of Ruby objects and raw database values (strings) in the attributes hash table

* The lack of proper column objects

For more details, I wrote up some of my thoughts at Making Rails Better – Fixing Architecture Flaws in Active Record | CFIS.

But in a nutshell, in think much of the grunginess of ActiveRecord could be eliminated by introducing a column object per data type. So FloatColumn, StringColumn, BooleanColumn, etc. Each column would provide an api for serializing/deserializing Ruby objects to and from the database's string representation. That would clean up ActiveRecord by removing a large number of case statements in the connection adapters and schema_definitions.rb. And more importantly, it would make it much easier for developers to add their own custom data types when needed. To see what I mean, try to add your own custom data type (say Postgresql PostGIS data types) to ActiveRecord and see how difficult it is (without of course modifying ActiveRecord itself).

Adding a column type directly influences your proposal, because the serialization methods would be used to convert to and from the attributes hash (sounds like you are using that for the raw value from the database) and the attributes_cache hash (which stores the serialized form as a Ruby object). So something like:

attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])

And of course vice-versa.

Last, it would be really useful to add a "changed" flag for each field in each record. Then when ActiveRecord creates an UPDATE query it would only modify the values that actually changed. Right now, ActiveRecord updates all columns, which turns out to be a pain because it has a tendency to botch values for columns that have a default expression or have a type that it doesn't know about (postgis fields for example).

So just my two cents - happy to discuss more if these seem like useful ideas.

Charlie

Your proposal looks sound to me, however I think it only goes half way. In my view there are two main issues with active record:

* The mixing of Ruby objects and raw database values (strings) in the attributes hash table

* The lack of proper column objects

The column objects we currently have are certainly anemic, and I agree it'd be nice if the type casting code was pushed out of Base and into the adapters. All things in good time, assuming the caching approach works, I'd definitely like to tidy the implementation a little more. I definitely like the general shape of:

attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])

So assuming we don't see any breakage, I'll investigate that kind of a change.

Last, it would be really useful to add a "changed" flag for each field in each record. Then when ActiveRecord creates an UPDATE query it would only modify the values that actually changed. Right now, ActiveRecord updates all columns, which turns out to be a pain because it has a tendency to botch values for columns that have a default expression or have a type that it doesn't know about (postgis fields for example).

We've discussed partial-row updates in the past here, and they're incompatible with model validations. Concurrent updates can each see a valid state, issue an update, and leave the record invalid. So it's unlikely we'll support them within the 2.0 timeframe, however it could definitely be put into a plugin, and I'm happy to roll in any hooks needed to simplify that.

As for change tracking, jeremy has had a plugin around for ages (http://code.bitsweat.net/svn/dirty/) which does the job nicely the few times I've used it. The fact it hasn't gathered a large number of users indicates to me that perhaps it's a bit of a niche feature.

Michael Koziarski wrote:


Your proposal looks sound to me, however I think it only goes half
way. In my view there are two main issues with active record:
* The mixing of Ruby objects and raw database values (strings) in the
attributes hash table
* The lack of proper column objects

 The column objects we currently have are certainly anemic, and I agree
it'd be nice if the type casting code was pushed out of Base and into
the adapters. All things in good time, assuming the caching approach
works, I'd definitely like to tidy the implementation a little more.
I definitely like the general shape of:
attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])
So assuming we don't see any breakage, I'll investigate that kind of a change.
Last, it would be really useful to add a "changed" flag for each field
in each record. Then when ActiveRecord creates an UPDATE query it
would only modify the values that actually changed. Right now,
ActiveRecord updates all columns, which turns out to be a pain because
it has a tendency to botch values for columns that have a default
expression or have a type that it doesn't know about (postgis fields
for example).
We've discussed partial-row updates in the past here, and they're
incompatible with model validations. Concurrent updates can each
see a valid state, issue an update, and leave the record invalid. So
it's unlikely we'll support them within the 2.0 timeframe, however it
could definitely be put into a plugin, and I'm happy to roll in any
hooks needed to simplify that.

What about requiring optimistic locking for partial row updates? That would seem to allow it to work without greatly contorting the rest of ActiveRecord.

As for change tracking, jeremy has had a plugin around for ages
() which does the job nicely the
few times I've used it. The fact it hasn't gathered a large number of
users indicates to me that perhaps it's a bit of a niche feature.

So just my two cents - happy to discuss more if these seem like useful
ideas.
Charlie

Jack

The column objects we currently have are certainly anemic, and I agree it'd be nice if the type casting code was pushed out of Base and into the adapters. All things in good time, assuming the caching approach works, I'd definitely like to tidy the implementation a little more. I definitely like the general shape of:

attributes_cache['foo'] = self.columns['foo'].serialize(attributes['foo'])

So assuming we don't see any breakage, I'll investigate that kind of a change.

Excellent - that would be great.

We've discussed partial-row updates in the past here, and they're incompatible with model validations. Concurrent updates can each see a valid state, issue an update, and leave the record invalid.

Hmm. Thinking out loud, the validations would only check values that have changed, so that seems ok. Then when you do an update, you'd send the key fields and the changed fields.

Now if 2 people tried to update the same record, and ActiveRecord was doing partial updates, then I could see how might end up in an invalid state on the database side. Of course that's easily solved on the database side - but there would be no way to solve it on the rails side.

could definitely be put into a plugin, and I'm happy to roll in any hooks needed to simplify that.

A possible hook would be a setter method you could override, or for a cleaner implementation, fire a changed event when a field is updated. Then you could roll your own "changed" hash table.

The fact it hasn't gathered a large number of users indicates to me that perhaps it's a bit of a niche feature.

Interesting - ok.

Anyway, I'd agree keeping track of changes is not nearly as important as separating out serialized/unserialized values in the attributes hash table and having a good column implementation.

I'm excited you're looking at this, this will be a big improvement for ActiveRecord.

Charlie

What about requiring optimistic locking for partial row updates? That would seem to allow it to work without greatly contorting the rest of ActiveRecord.

Yeah, if someone wanted to implement this kind of behaviour with a plugin, they'd definitely need to be checking locks_optimistically? or whatever that boolean is. Starting with Jeremy's dirty plugin, I imagine it won't be a huge amount of work.