Bug or misunderstanding with serialize?

Hi, I get a strange bug when using "serialize" in a model. Here's an
example.

1. creating a table

  create_table "objs" do |t|
     t.column "a", :string
     t.column "b", :string
  end

2. code using the table

  class Obj < ActiveRecord::Base
    serialize :b
  end

  x = Obj.new
  x.a = "foo #f"
  puts x.a # prints 'foo #f'
  x.b = "foo #f"
  puts x.b # prints 'foo'

It seems like strings containing # aren't serialized correctly?
Everything after # seems to be ignored, as if the string was ruby code.
Any comments on this?

(using activerecord-1.14.4, ruby 1.8.4)
Have a nice day,
T

At first glance, what you're doing there doesn't appear to make sense.

ActiveRecord::Base.serialize is meant for saving arbitrary Ruby objects
to the database. If your aim is just to save strings, you shouldn't
serialize at all; simply use the default accessor, i.e. drop the
"serialize :b" from your paste above.

Moreover, even if you wanted to serialize a String instance (which you
have no reason to, as strings are already handled correctly), you
should have specified the String class, like so:

serialize :b, String

-Chris

Looks like serialize assumes you'll be serializing things other than
strings (after all, they're already serialized, aren't they?).

You should use the serialized column as you would any other data
structure EXCEPT for strings.

eg,

x.b = { :this => 'should', :come => 'out', :as => 'a hash' }
p x.b # => {:come=>"out", :as=>"a hash", :this=>"should"}
x.save

y = Obj.find(x.id)
p y.b # => {:come=>"out", :as=>"a hash", :this=>"should"}

I am using serialize to save arbitrary Ruby instances. String is one of
them. In some other cases I store a Hash in the same field.

I get the same results in my example, even when I specify the String
class.

  serialize :b, String

T

The bug seems to be that "serialize" doesn't serialize strings, but it
_deserializes_ them. It can be demonstrated with:

YAML.load "foo #f".to_yaml
=> "foo #f"

versus

YAML.load "foo #f"
=> "foo"
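
The same asymmetry can be reproduced outside Rails. This is only a
minimal sketch, assuming a current Ruby whose standard yaml library is
Psych rather than the Syck parser that shipped with ruby 1.8.4; the
variable names are made up for the illustration.

```ruby
require 'yaml'

original = "foo #f"

# Dump first, then load: the emitter quotes the string, so it survives.
round_tripped = YAML.load(original.to_yaml)

# Load the raw string as if it were already YAML: " #" starts a comment,
# so everything from there on is dropped.
parsed_raw = YAML.load(original)

puts round_tripped.inspect
puts parsed_raw.inspect
```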

T

'The bug seems to be that "serialize" doesn't serialize strings, but it
_deserializes_ them.'

ActiveRecord::Base.serialize wraps the column accessor with a method
that pulls the string from the column in the database and deserializes
it.

I.e. any string that you store in a serialize'd column is, by design,
deserialized when you read it back.

This makes sense as people were not expected to use serialize for
Strings.
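
To make the described behaviour concrete, here is a rough model in
plain Ruby. It is not ActiveRecord's source: FakeRecord and its @raw
variable are hypothetical stand-ins, and the rule "strings are written
untouched, everything read back is parsed as YAML" is an assumption
taken from the descriptions in this thread.

```ruby
require 'yaml'

# Hypothetical stand-in for a row with one serialized column.
class FakeRecord
  def initialize
    @raw = nil # what would be stored in the database column
  end

  # Write side: strings go to the column untouched,
  # everything else is dumped to YAML.
  def b=(value)
    @raw = value.is_a?(String) ? value : value.to_yaml
  end

  # Read side: whatever text is in the column is parsed as YAML;
  # if parsing raises, the raw text is returned instead.
  def b
    return @raw unless @raw.is_a?(String)
    begin
      YAML.load(@raw)
    rescue StandardError
      @raw
    end
  end
end

rec = FakeRecord.new

rec.b = { "colour" => "red" }
hash_back = rec.b     # dumped on write, parsed on read

rec.b = "foo #f"
string_back = rec.b   # never dumped, but still parsed on read

puts hash_back.inspect
puts string_back.inspect
```

In this model a Hash round-trips, while a String is parsed on read
without ever having been dumped, which is the mismatch T ran into.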

-Chris

> I am using serialize to save arbitrary Ruby instances. String is one of
> them. In some other cases I store a Hash in the same field.

You'll have to overwrite the default accessor then:

class Obj
  serialize :b
  def b=(s)
    self[:b] = s.is_a?(String) ? s.to_yaml : s
  end
end

I agree with you that it is sorta strange that serialize doesn't
support your case right out of the box. I see no reason for it not
to... Maybe you can file a bug on dev.rubyonrails.org?

If you have a lot of fields, you can patch ActiveRecord::Base directly:

class ActiveRecord::Base
  private
    alias_method :ofy, :object_from_yaml
    # Parse the column text, but keep the raw text if the result is a String.
    def object_from_yaml(str)
      obj = ofy(str)
      obj.is_a?(String) ? str : obj
    end
end

It sort of defeats the Ruby Way if we need to keep track of what
objects we want to store in a general purpose field and do different
things depending on the object type. Or am I missing something?

I understand that for a normal String field it would be insane to
serialize to and from YAML. But in the more general case, where the
field is a container for something that *sometimes* happens to be a
String and sometimes not, you would think it better to handle it
uniformly, so you don't break the cases where it actually is a String.
Having a switch on the type feels kind of ugly and definitely isn't
especially rubyesque.

/M

> I.e. any string that you store in a serialize'd column would by design
> be deserialized.

This doesn't really make sense. If I have a ruby object that happens
to be a String, I'd expect it to be serialized and deserialized without
being mangled in any way.

eg,

b = Obj.new
myString = String.new("# of new ruby users")
b.b = myString
b.b

=> false

What? Why did my String get turned into a FalseClass? Even worse:

b.b = "a: *b"
b.b

=> {"a"=>#<YAML::Syck::BadAlias:0xb735dab4 @name="b">}

You only get the right string back if YAML::load() fails:

b.b = "a: *b b #f"
b.b

=> "a: *b b #f"

Those are some pretty weird semantics... Why can't we operate under the
assumption that serialize allows you to store any ruby object
(including strings)?

eden li wrote:

> This doesn't really make sense. If I have a ruby object that happens
> to be a String, I'd expect it to be serialized and deserialized without
> being mangled in any way.

I'm not shy to designate Rails' shortcomings as such. This however
isn't one of them.

The working assumption of ActiveRecord::Base.serialize is that it would
be used to persist instances of arbitrary classes _that are not_ the
basic datatypes that are automagically typecast. Frankly, it seems
strange to me that anyone would want to ActiveRecord::Base.serialize a
String. The only remotely reasonable reason to do it is if that
particular column might contain *any* kind of object, and sometimes
Strings. And I have to admit this sounds slightly insane, and if you're
doing that kind of crazy stuff you should expect extra effort. With
such an expectation, the necessary monkey-patch to
ActiveRecord::Base.serialize would seem pretty easy.

-Chris

Chris Pearl wrote:

> The working assumption of ActiveRecord::Base.serialize is that it would
> be used to persist instances of arbitrary classes _that are not_ the
> basic datatypes that are automagically typecast. Frankly, it seems
> strange to me that anyone would want to ActiveRecord::Base.serialize a
> String. The only remotely reasonable reason to do it is if that
> particular column might contain *any* kind of object, and sometimes
> Strings. And I have to admit this sounds slightly insane, and if you're
> doing that kind of crazy stuff you should expect extra effort. With
> such an expectation, the necessary monkey-patch to
> ActiveRecord::Base.serialize would seem pretty easy.

I would agree with all of this if the second argument (class_name) to
serialize was mandatory. I think the can of worms is already open, and
I don't quite understand why ONLY String gets excluded from being
stored in a column that can store pretty much anything else.

Great, thanks for the answer! I just figured out though that the first
solution only works until you look at the value, since ActiveRecord
then stores the unyamled expression in the variable again.

The second solution works until you try to store a string looking like
"foo : bar" since YAML will parse this as a Hash.

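The weakness of deciding by the type of the parse result can be shown
without Rails. In this sketch read_column is a hypothetical helper that
only mirrors the logic of the patched reader (parse, then fall back to
the raw text if the result is a String); it assumes a current Ruby with
Psych as the YAML library.

```ruby
require 'yaml'

# Hypothetical helper: parse the column text, but hand back the raw
# text whenever the parse result is a String.
def read_column(raw)
  parsed = YAML.load(raw)
  parsed.is_a?(String) ? raw : parsed
end

# The parse result is a String, so the raw text wins and "#f" survives.
comment_like = read_column("foo #f")

# The parse result is a Hash, so the original string is lost.
mapping_like = read_column("foo : bar")

puts comment_like.inspect
puts mapping_like.inspect
```
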
My current ugly solution is:

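class ActiveRecord::Base
  class << self
    # Always YAML-dump on write and YAML-load on read, Strings included.
    # Note: class_name is accepted but no longer checked.
    def serialize(attr_name, class_name = Object)
      class_eval <<-EOS
        def #{attr_name}=(s)
          self[:#{attr_name}] = s.to_yaml
        end
        def #{attr_name}
          raw = self[:#{attr_name}]
          raw.nil? ? nil : YAML.load(raw)  # a new record has no value yet
        end
      EOS
    end
  end
end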

I haven't taken the time to figure out where ActiveRecord serializes
the data, but that's probably where the code should be changed.

Regards,
T

eden li wrote:

> Chris Pearl wrote:
>> The working assumption of ActiveRecord::Base.serialize is that it would
>> be used to persist instances of arbitrary classes _that are not_ the
>> basic datatypes that are automagically typecast. Frankly, it seems
>> strange to me that anyone would want to ActiveRecord::Base.serialize a
>> String. The only remotely reasonable reason to do it is if that
>> particular column might contain *any* kind of object, and sometimes
>> Strings. And I have to admit this sounds slightly insane, and if you're
>> doing that kind of crazy stuff you should expect extra effort. With
>> such an expectation, the necessary monkey-patch to
>> ActiveRecord::Base.serialize would seem pretty easy.
>
> I would agree with all of this if the second argument (class_name) to
> serialize was mandatory. I think the can of worms is already open, and
> I don't quite understand why ONLY String gets excluded from being
> stored in a column that can store pretty much anything else.

I don't think the second argument being mandatory has much to do
with what I said. Though personally, I agree the default (Object) is
kind of silly - when was the last time anyone used direct Object
instances as persistence-worthy business objects?!

The problem with String is definitely an implementation limitation, not
a design decision. But it's a reasonable limitation to have, for the
above stated reasons. If a lot of people needed to serialize
Strings, I suppose the Rails Core developers would bother to write the
extra code to enable it.

-Chris

> I haven't taken the time to figure out where ActiveRecord serializes
> the data, but that's probably where the code should be changed.

It looks like the assumption that you're not serializing a string runs
pretty deep. The yamling doesn't happen until AR gets ready to save the
field and quotes it for the database in
ActiveRecord::ConnectionAdapters::Quoting#quote. At that point you
probably can't know if the field is supposed to be serialized or not...

Your method works except that it thwarts the class_name check, but
that's probably OK given the argument that Chris lays out in other
posts on this thread.