Bug or misunderstanding with serialize?

Hi, I get a strange bug when using "serialize" in a model. Here's an example.

1. creating a table

  create_table "objs" do |t|
    t.column "a", :string
    t.column "b", :string
  end

2. code using the table

  class Obj < ActiveRecord::Base
    serialize :b
  end

  x = Obj.new
  x.a = "foo #f"
  puts x.a # prints 'foo #f'
  x.b = "foo #f"
  puts x.b # prints 'foo'

It seems like strings containing # aren't serialized correctly? Everything after # seems to be ignored, as if the string was ruby code. Any comments on this?

(using activerecord-1.14.4, ruby 1.8.4)

Have a nice day,
T

At first glance, what you're doing there doesn't appear to make sense.

You're supposed to use ActiveRecord::Base.serialize to save arbitrary Ruby instances to the database. If, however, your aim is to save strings, you shouldn't serialize but simply use the default accessor, i.e. just drop the "serialize :b" in your above paste.

Moreover, even if you wanted to serialize a String instance (which you have no reason to, as it is automatically treated correctly), you should have specified the String class like so:

serialize :b, String

-Chris

Looks like serialize assumes you'll be serializing things other than strings (after all, they're already serialized, aren't they?).

You should use the serialized column as you would any other data structure EXCEPT for strings.

eg,

  x.b = { :this => 'should', :come => 'out', :as => 'a hash' }
  p x.b # => {:come=>"out", :as=>"a hash", :this=>"should"}
  x.save

  y = Obj.find(x.id)
  p y.b # => {:come=>"out", :as=>"a hash", :this=>"should"}

I am using serialize to save arbitrary Ruby instances. String is one of them. In some other cases I store a Hash in the same field.

I get the same results in my example, even when I specify the String class.

  serialize :b, String

T

The bug seems to be that "serialize" doesn't serialize strings, but it _deserializes_ them. It can be illustrated with:

  YAML.load "foo #f".to_yaml  # => "foo #f"

versus

  YAML.load "foo #f"  # => "foo"

T
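For anyone who wants to try the comparison above outside of Rails, here is a small standalone sketch using only Ruby's bundled yaml library. The variable names are mine, and I'm assuming a reasonably current Ruby rather than the 1.8.4 setup from the original post.

```ruby
require "yaml"

original = "foo #f"

# Round trip: dump first, then load. The dumped form quotes the string,
# so the '#' is kept as data.
dumped = original.to_yaml
round_tripped = YAML.load(dumped)

# Load only: the raw string is parsed as a YAML document, and in YAML
# a '#' preceded by whitespace starts a comment.
loaded_raw = YAML.load(original)

puts round_tripped.inspect
puts loaded_raw.inspect
```

The first value comes back intact, the second loses everything from the '#' onwards.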

'The bug seems to be that "serialize" doesn't serialize strings, but it _deserializes_ them.'

ActiveRecord::Base.serialize wraps the column accessor with a method that pulls the string from the column in the database and deserializes it.

I.e. any string that you store in a serialize'd column would by design be deserialized.

This makes sense as people were not expected to use serialize for Strings.

-Chris
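To make that description concrete, here is a rough model of the behaviour in plain Ruby. This is not ActiveRecord's code: SketchColumn and its methods are made up for illustration, and the fallback to the raw text when parsing fails is an assumption on my part.

```ruby
require "yaml"

# Hypothetical stand-in for a serialized column; not ActiveRecord code.
class SketchColumn
  def initialize
    @raw = nil
  end

  # Write side: strings are stored as-is, anything else is dumped to YAML.
  def write(value)
    @raw = value.is_a?(String) ? value : value.to_yaml
  end

  # Read side: always try to parse the stored text as YAML.
  # Assumption: if parsing fails, hand back the raw text unchanged.
  def read
    YAML.load(@raw)
  rescue StandardError
    @raw
  end
end

column = SketchColumn.new

column.write("foo #f")
string_result = column.read

column.write({ "come" => "out" })
hash_result = column.read

puts string_result.inspect
puts hash_result.inspect
```

The hash survives because it was dumped before being stored. The string was never dumped, so the read side parses text that was never YAML in the first place.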

I am using serialize to save arbitrary Ruby instances. String is one of them. In some other cases I store a Hash in the same field.

You'll have to overwrite the default accessor then:

  class Obj
    serialize :b

    def b=(s)
      self[:b] = s.is_a?(String) ? s.to_yaml : s
    end
  end

I agree with you that it is sorta strange that serialize doesn't support your case right out of the box. I see no reason for it not to... Maybe you can file a bug on dev.rubyonrails.org?

If you have a lot of fields, you can patch ActiveRecord::Base directly:

  class ActiveRecord::Base
    private

    alias_method :ofy, :object_from_yaml

    def object_from_yaml(str)
      obj = ofy(str)
      obj.is_a?(String) ? str : obj
    end
  end

It sort of defeats the Ruby Way if we need to keep track of what objects we want to store in a general purpose field and do different things depending on the object type. Or am I missing something?

I understand that for a normal String field it would be insane to serialize to and from yaml but in a more general case where the field is a container for something that *sometimes* happens to be a String and sometimes not you would think it would be better to handle it uniformly so you don't break those cases where it actually is a String. Having a switch on the type feels kind of ugly and definitely isn't especially rubyesque.

/M

I.e. any string that you store in a serialize'd column would by design be deserialized.

This doesn't really make sense. If I have a ruby object that happens to be a String, I'd expect it to be serialized and deserialized without being mangled in any way.

eg,

  b = Obj.new
  myString = String.new("# of new ruby users")
  b.b = myString
  b.b

=> false

What? Why did my String get turned into a FalseClass? Even worse:

  b.b = "a: *b"
  b.b

=> {"a"=>#<YAML::Syck::BadAlias:0xb735dab4 @name="b">}

You only get the right string back if YAML::load() fails:

  b.b = "a: *b b #f"
  b.b

=> "a: *b b #f"

Those are some pretty weird semantics... Why can't we operate under the assumption that serialize allows you to store any ruby object (including strings).

eden li wrote:

This doesn't really make sense. If I have a ruby object that happens to be a String, I'd expect it to be serialized and deserialized without being mangled in any way.

I'm not shy to designate Rails' shortcomings as such. This however isn't one of them.

The working assumption of ActiveRecord::Base.serialize is that it would be used to persist instances of arbitrary classes _that are not_ the basic datatypes that are automagically typecast. Frankly, it seems strange to me that anyone would want to ActiveRecord::Base.serialize a String. The only remotely reasonable reason to do it is if that particular column might contain *any* kind of object, and sometimes Strings. And I have to admit this sounds slightly insane, and if you're doing that kind of crazy stuff you should expect extra effort. With such an expectation, the necessary monkey-patch to ActiveRecord::Base.serialize would seem pretty easy.

-Chris

Chris Pearl wrote:

The working assumption of ActiveRecord::Base.serialize is that it would be used to persist instances of arbitrary classes _that are not_ the basic datatypes that are automagically typecast. Frankly, it seems strange to me that anyone would want to ActiveRecord::Base.serialize a String. The only remotely reasonable reason to do it is if that particular column might contain *any* kind of object, and sometimes Strings. And I have to admit this sounds slightly insane, and if you're doing that kind of crazy stuff you should expect extra effort. With such an expectation, the necessary monkey-patch to ActiveRecord::Base.serialize would seem pretty easy.

I would agree with all of this if the second argument (class_name) to serialize was mandatory. I think the can of worms is already open, and I don't quite understand why ONLY String gets excluded from being stored in a column that can store pretty much anything else.

Great, thanks for the answer! I just figured out though that the first solution only works until you look at the value, since ActiveRecord then stores the unyamled expression in the variable again.

The second solution works until you try to store a string looking like "foo : bar" since YAML will parse this as a Hash.
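The limit of the second solution can be reproduced without Rails. In the sketch below, patched_read is a made-up stand-in for the patched object_from_yaml (the real method lives inside ActiveRecord and may differ), and I use "foo: bar" without the space before the colon to keep the YAML unambiguous.

```ruby
require "yaml"

# Hypothetical stand-in for the patched object_from_yaml; not ActiveRecord code.
def patched_read(raw)
  obj = YAML.load(raw)
  # If YAML produced a String, assume the raw text was the intended value.
  obj.is_a?(String) ? raw : obj
end

comment_case = patched_read("foo #f")   # parses to a String, so the raw text wins
mapping_case = patched_read("foo: bar") # parses to a Hash, so the Hash is returned

puts comment_case.inspect
puts mapping_case.inspect
```

The '#' case is fixed, but any string that happens to parse as a mapping, a sequence, a number and so on still comes back as something other than a String.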

My current ugly solution is:

  class ActiveRecord::Base
    class << self
      def serialize(attr_name, class_name = Object)
        class_eval <<-EOS
          def #{attr_name}=(s)
            self[:#{attr_name}] = s.to_yaml
          end
          def #{attr_name}
            YAML.load self[:#{attr_name}]
          end
        EOS
      end
    end
  end

I haven't taken the time to figure out where ActiveRecord serializes the data, but that's probably where the code should be changed.

Regards, T
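The idea behind that override (always dump on write, always load on read) can be tried in plain Ruby. SketchRecord and yaml_attribute below are hypothetical names, the @columns hash merely plays the role of the database row, and reading an attribute that was never written is not handled.

```ruby
require "yaml"

# Hypothetical stand-in for a model; @columns plays the role of the database row.
class SketchRecord
  # Defines a writer that always dumps to YAML and a reader that always loads.
  def self.yaml_attribute(name)
    define_method("#{name}=") { |value| @columns[name] = value.to_yaml }
    define_method(name) { YAML.load(@columns[name]) }
  end

  def initialize
    @columns = {}
  end

  yaml_attribute :b
end

record = SketchRecord.new
samples = ["foo #f", "foo: bar", { "come" => "out" }, ["a", 1]]

# Write each sample and read it straight back.
results = samples.map do |value|
  record.b = value
  record.b
end

results.each { |r| puts r.inspect }
```

Because strings go through the same dump step as everything else, there is no type switch and no special case.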

eden li wrote:

Chris Pearl wrote:
> The working assumption of ActiveRecord::Base.serialize is that it would
> be used to persist instances of arbitrary classes _that are not_ the
> basic datatypes that are automagically typecast. Frankly, it seems
> strange to me that anyone would want to ActiveRecord::Base.serialize a
> String. The only remotely reasonable reason to do it is if that
> particular column might contain *any* kind of object, and sometimes
> Strings. And I have to admit this sounds slightly insane, and if you're
> doing that kind of crazy stuff you should expect extra effort. With
> such an expectation, the necessary monkey-patch to
> ActiveRecord::Base.serialize would seem pretty easy.

I would agree with all of this if the second argument (class_name) to serialize was mandatory. I think the can of worms is already open, and I don't quite understand why ONLY String gets excluded from being stored in a column that can store pretty much anything else.

I don't think the second argument being mandatory has much to do with what I said. Though personally, I agree the default (Object) is kind of silly - when was the last time anyone used direct Object instances as persistence-worthy business objects?!

The problem with String is definitely an implementation limitation, not a design decision. But it's a reasonable limitation to have, for the above stated reasons. If a lot of people needed to serialize Strings, I suppose the Rails Core developers would bother to write the extra code to enable it.

-Chris

I haven't taken the time to figure out where ActiveRecord serializes the data, but that's probably where the code should be changed.

It looks like the assumption that you're not serializing a string runs pretty deep. The yamling doesn't happen until AR gets ready to save the field and quotes it for the database in ActiveRecord::ConnectionAdapters::Quoting#quote. At that point you probably can't know if the field is supposed to be serialized or not...

Your method works except that it thwarts the class_name check, but that's probably OK given the argument that Chris lays out in other posts on this thread.
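If the class check matters, it could in principle be put back on the read side. The helper below is purely illustrative: load_checked is a made-up name, and raising TypeError is my choice here, not what ActiveRecord does.

```ruby
require "yaml"

# Hypothetical helper, not part of ActiveRecord: load a YAML string and
# verify that the result is an instance of the expected class.
def load_checked(raw_yaml, expected_class = Object)
  value = YAML.load(raw_yaml)
  unless value.is_a?(expected_class)
    raise TypeError, "expected #{expected_class}, got #{value.class}"
  end
  value
end

stored = "foo #f".to_yaml
checked = load_checked(stored, String)
puts checked.inspect
```

With the default of Object the check accepts anything, which matches the "general purpose field" use discussed in this thread.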