Optimization: Marshal serialized attributes

Here's a ticket for a simple patch to use Marshal instead of YAML for attribute serialization. Marshaling is significantly faster (see the link) and fixes some YAML load issues (including an outstanding ticket).

http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1191-marshal-serialized-attributes
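For readers unfamiliar with the two formats, here is a minimal sketch of the swap being proposed. It uses plain Ruby rather than the actual ActiveRecord patch, and the two helper methods are hypothetical; they only show that both formats round-trip a simple hash:

```ruby
require 'yaml'

# Hypothetical helpers standing in for the two strategies in the ticket:
# YAML (current text format) and Marshal (proposed binary format).
def yaml_roundtrip(value)
  YAML.load(YAML.dump(value))
end

def marshal_roundtrip(value)
  Marshal.load(Marshal.dump(value))
end

prefs = { 'theme' => 'dark', 'per_page' => 25, 'tags' => %w[ruby rails] }

puts yaml_roundtrip(prefs) == prefs     # true
puts marshal_roundtrip(prefs) == prefs  # true
```

The data is deliberately limited to strings, integers, arrays and hashes, since newer Psych versions restrict which classes `YAML.load` will rebuild.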

Simple enough?

Stephen

While the option is fair enough, I don't think all existing apps would want this turned on "silently":

- if other people use your database, YAML is fine since there are parsers for it in many languages, whereas Marshal would be a PITA;
- if your existing column is not a blob column (which it didn't need to be previously, since YAML generates plain text), the database will throw a hissy fit (or just truncate the data) when you try to insert a character that is not legal in the charset used;
- should you be calling string_to_binary if the column supports it?
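To make the charset point concrete, here is a small plain-Ruby sketch (not ActiveRecord) that reinterprets each payload as UTF-8, roughly what a text column with a UTF-8 charset would expect. The row contents are arbitrary; the integer 200 is chosen because it ends up as the single byte 0xC8 in the Marshal dump:

```ruby
require 'yaml'

row = { 'name' => 'widget', 'count' => 200 }

yaml_text     = YAML.dump(row)     # plain text
marshal_bytes = Marshal.dump(row)  # raw bytes, tagged with the binary encoding

# Reinterpret each payload as UTF-8 (on a copy, since force_encoding mutates).
yaml_ok    = yaml_text.dup.force_encoding(Encoding::UTF_8).valid_encoding?
marshal_ok = marshal_bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?

puts marshal_bytes.encoding == Encoding::BINARY  # true
puts yaml_ok     # true
puts marshal_ok  # false: the lone 0xC8 byte is not valid UTF-8
```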

Fred

To add another point for -1: Marshal uses a binary format and does not guarantee compatibility with anything except the Ruby on the computer which created it. Marshal uses internal format version numbers which do not correspond to Ruby versions. This means your database backups are potentially non-portable to a different OS/Ruby/computer. This in itself makes this option a non-starter, imho.
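The versioning mentioned above is visible in the first two bytes of any Marshal dump. This sketch reads them and compares them with the running interpreter's `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION`. The guard method is a hypothetical helper that mirrors the rule `Marshal.load` applies (a different major version, or a newer minor version, is rejected with a TypeError):

```ruby
# Hypothetical guard: same major version, and a minor version no newer
# than the one this interpreter writes.
def marshal_header_compatible?(bytes)
  major, minor = bytes.unpack('CC')
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

bytes = Marshal.dump([1, 2, 3])
major, minor = bytes.unpack('CC')

# The header holds Marshal's own format version, independent of RUBY_VERSION.
puts "Marshal format #{major}.#{minor}, written by Ruby #{RUBY_VERSION}"

forged = "\x05\x00".b + bytes[2..-1]  # same body, unknown major version
puts marshal_header_compatible?(bytes)   # true
puts marshal_header_compatible?(forged)  # false
```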

You could add the option, but it should never become a default. And
even an option should come with a big warning.
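An opt-in option with a warning could look something like the following. Everything here is hypothetical: `serialized_attribute_coder` is not a Rails API, just a plain-Ruby illustration in which any object responding to `dump` and `load` acts as a coder:

```ruby
require 'yaml'

# Hypothetical helper: YAML stays the default; Marshal must be requested
# explicitly and always prints a portability warning to stderr.
def serialized_attribute_coder(format = :yaml)
  case format
  when :yaml
    YAML     # YAML.dump / YAML.load
  when :marshal
    warn 'WARNING: Marshal output is binary, needs a binary column, ' \
         'and cannot be read outside Ruby.'
    Marshal  # Marshal.dump / Marshal.load
  else
    raise ArgumentError, "unknown serialization format: #{format.inspect}"
  end
end

coder  = serialized_attribute_coder  # default: YAML
stored = coder.dump({ 'theme' => 'dark' })
p coder.load(stored)
```

Keeping YAML as the no-argument default means existing apps see no change unless they ask for Marshal.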

izidor

Yes; after Fred's post I did a bit more research and realized it's a big no for portable apps. It's not an impossible hurdle to deal with, but it's not a desirable default. I also realized that AR in its current state has no simple interface for such an option: the serialization done in quoting is abstracted away from table and model information. While MySQL handled Marshal swimmingly in string columns, I don't think other adapters would agree.

I've tried to stay away from serialize where possible, but I'm dealing with it on a current project and have noticed others complaining about how slowly large groups of objects with serialized attributes are instantiated from the database. YAML has been the culprit.
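A rough way to reproduce this slowdown is to time loading many serialized values, as happens when a large result set is instantiated. This is a standalone sketch, not ActiveRecord code; the row count and payload are arbitrary assumptions, and timings will vary by machine and Ruby version:

```ruby
require 'benchmark'
require 'yaml'

# Simulated rows, each holding one serialized attribute (a small hash).
ROWS    = 2_000
payload = { 'theme' => 'dark', 'per_page' => 25, 'tags' => %w[ruby rails] }

yaml_rows    = Array.new(ROWS) { YAML.dump(payload) }
marshal_rows = Array.new(ROWS) { Marshal.dump(payload) }

yaml_loaded = marshal_loaded = nil
Benchmark.bm(8) do |x|
  # Only the load side is timed, since that is what instantiation pays for.
  x.report('yaml')    { yaml_loaded    = yaml_rows.map { |s| YAML.load(s) } }
  x.report('marshal') { marshal_loaded = marshal_rows.map { |s| Marshal.load(s) } }
end
```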

Stephen

There are also faster YAML dumpers out there. A recent Portland code sprint produced ZAML, a work in progress that offers (if I recall correctly) something like a 14x speed boost over vanilla YAML.dump.

http://github.com/hallettj/zaml/tree/master/zaml.rb

YAML’s slowness has been a pain point, but we don’t have to sacrifice portability.

I did a quick benchmark of ZAML and found it to be slower:

http://pastie.org/288592

I swapped the IO for a StringIO and ZAML was then more than twice as fast:

              user     system      total        real
    yaml  0.360000   0.010000   0.370000 (  0.375689)
    zaml  0.150000   0.000000   0.150000 (  0.154976)

Making the dumped object a little less trivial (it seems to me that the initial test only really measures the overhead of getting things set up) further increases the difference:

    to_dump = ['foo', 'bar', 'baz', Time.now, {'key' => 'value', 'bar' => {1 => 'hello', 2 => 'world'}}]*5

              user     system      total        real
    yaml  9.260000   0.060000   9.320000 (  9.413077)
    zaml  2.790000   0.010000   2.800000 (  2.827708)
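The setup-overhead reasoning can be checked with a sketch along these lines, using only Ruby's bundled YAML library (ZAML is left out rather than guessing at its interface). Both payloads are dumped into a StringIO, as in the modified benchmark; `N` is an arbitrary iteration count:

```ruby
require 'benchmark'
require 'stringio'
require 'yaml'

N = 500

trivial = 'foo'
# The "less trivial" payload from the thread (25 elements).
to_dump = ['foo', 'bar', 'baz', Time.now,
           { 'key' => 'value', 'bar' => { 1 => 'hello', 2 => 'world' } }] * 5

Benchmark.bm(10) do |x|
  # YAML.dump(obj, io) writes to the given IO instead of returning a String.
  x.report('trivial') { N.times { YAML.dump(trivial, StringIO.new) } }
  x.report('to_dump') { N.times { YAML.dump(to_dump, StringIO.new) } }
end
```

If per-call setup dominates, the trivial case will not be much cheaper than the larger one; a dumper whose advantage grows with payload size, as reported above, is winning on the serialization work itself.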