detecting width overflow in serialized column with mysql

So I've got an ActiveRecord model pointing to a MySQL db, with an auto-serialized column ("serialize :columnName").

Thing is, MySQL, depending on how it's configured (like, by default), has a bad habit of just truncating your data if it's too wide for the column, with no error raised. Yeah, I can probably reconfigure MySQL and/or my AR connection to it. But I'm wanting to distribute this code to all kinds of people who won't think of that or won't want to do that, I'd really like it to work with the default setup.

But obviously, if a yaml serialization gets truncated, it's not going to unserialize very well.

At the point of insert/update, is there any good way for me to detect that the serialization was too big for the column, even though MySQL isn't complaining?

Jonathan Rochkind wrote:

So I've got an ActiveRecord model pointing to a MySQL db, with an auto-serialized column ("serialize :columnName").

Thing is, MySQL, depending on how it's configured (like, by default), has a bad habit of just truncating your data if it's too wide for the column, with no error raised. Yeah, I can probably reconfigure MySQL and/or my AR connection to it. But I'm wanting to distribute this code to all kinds of people who won't think of that or won't want to do that, I'd really like it to work with the default setup.

But obviously, if a yaml serialization gets truncated, it's not going to unserialize very well.

At the point of insert/update, is there any good way for me to detect that the serialization was too big for the column, even though MySQL isn't complaining?

This may do it:

def validate    if columnName.to_yaml > column_for_attribute(:columnName).limit      errors.add(:columnName, 'columnName is too big to be serialized')    end end

It'd be good if something like this was added automatically by serialize.

Interesting, thanks, lots of tricks there I wouldn'thave figured out for myself. But am I being really inefficient now by calling to_yaml an extra time? I'll call it once to validate, and then throw away the results. Then serialize will call it again when it actually wants to save. Calling it twice for no reason. But I guess to_yaml shouldn't be _that_ expensive?

Now I wonder if I should abandon the automated serialization altogether, and just handle it myself manually through callbacks. Or maybe I should try monkey-patching the auto-serialization stuff to perform this check itself?

Jonathan

Mark Reginald James wrote:

Jonathan Rochkind wrote:

Interesting, thanks, lots of tricks there I wouldn'thave figured out for myself. But am I being really inefficient now by calling to_yaml an extra time? I'll call it once to validate, and then throw away the results. Then serialize will call it again when it actually wants to save. Calling it twice for no reason. But I guess to_yaml shouldn't be _that_ expensive?

Now I wonder if I should abandon the automated serialization altogether, and just handle it myself manually through callbacks. Or maybe I should try monkey-patching the auto-serialization stuff to perform this check itself?

Yeah, the double yaml is a waste, but don't worry about it unless this is being done sufficiently often to slow down your app in a way that generates more work or cost.

See if you can come up for a patch for Rails that automatically adds a such a validation for serialized attributes, where this validation method and the attributes_with_quotes method that generates the DB request both draw from a cache of YAMLized attribute values.

Here's a different (easier :slight_smile: ) tack to monkey patching AR to solve this problem efficiently. Turns out the actual serialization is done in quote_value (and interestingly, done whether or not the column is declared serialized; the serialized decleration only effects de-serialization). Does anyone think this following approach is a good idea?

Does anyone think it's a good enough idea I should submit it as a patch to AR? I've never submitted a patch before, not sure how it's done. I'm still using Rails 1.8.x, not sure if this is still a problem in Rails 2, so that might make it hard to submit a patch.

class ActiveRecord::Base     alias :old_quote_value :quote_value     def quote_value(value, column = nil)         if column && serialized_attributes[column.name]             serialization = value.to_yaml             raise new ActiveRecord::StatementInvalid("Can not serialize column #{column.name}, length #{serialization.length} is greater than column limit of #{column.limit}") if if serialization.length > column.limit             "#{connection.quoted_string_prefix}'#{connection.quote_string(serialization)}'"         else             old_quote_value         end     end end

Mark Reginald James wrote:

Jonathan Rochkind wrote:

Here's a different (easier :slight_smile: ) tack to monkey patching AR to solve this problem efficiently. Turns out the actual serialization is done in quote_value (and interestingly, done whether or not the column is declared serialized; the serialized decleration only effects de-serialization). Does anyone think this following approach is a good idea?

Does anyone think it's a good enough idea I should submit it as a patch to AR? I've never submitted a patch before, not sure how it's done. I'm still using Rails 1.8.x, not sure if this is still a problem in Rails 2, so that might make it hard to submit a patch.

class ActiveRecord::Base     alias :old_quote_value :quote_value     def quote_value(value, column = nil)         if column && serialized_attributes[column.name]             serialization = value.to_yaml             raise new ActiveRecord::StatementInvalid("Can not serialize column #{column.name}, length #{serialization.length} is greater than column limit of #{column.limit}") if if serialization.length > column.limit             "#{connection.quoted_string_prefix}'#{connection.quote_string(serialization)}'"         else             old_quote_value         end     end end

Jonathan, that's a fine solution if a database exception is a better way to handle it than an automatic validation. The validation route may however be more compatible with the current save! idiom.

Also note that ActiveRecord trunk does the YAML conversion in the attributes_with_quotes method instead.

Here's a different (easier :slight_smile: ) tack to monkey patching AR to solve
this problem efficiently. Turns out the actual serialization is done in quote_value (and interestingly, done whether or not the column is declared serialized; the serialized decleration only effects de-serialization). Does anyone think this following approach is a good idea?

Does anyone think it's a good enough idea I should submit it as a
patch to AR? I've never submitted a patch before, not sure how it's done.
I'm still using Rails 1.8.x, not sure if this is still a problem in
Rails 2, so that might make it hard to submit a patch.

You're almost certainly not using Rails 1.8.x because that version has
never existed (but you are probably running ruby 1.8.x (where x is
probably 6)) Patches can be submitted at rails.lighthouseapp.com and/or discussed
on the rubyonrails-core google group mailing list

Fred