Optimization: Marshal serialized attributes

Here's a ticket for a simple patch to use Marshal instead of YAML for
attribute serialization. Marshaling is significantly faster (see the
benchmarks in the link), and it fixes some YAML load issues (including
an outstanding ticket).

http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1191-marshal-serialized-attributes

Simple enough?

Stephen
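
For readers who want to try the comparison themselves, here is a minimal sketch of the kind of measurement described above. It is not the benchmark from the ticket: the payload and the iteration count are assumptions chosen only for illustration, and the timings will vary by Ruby version and machine.

```ruby
require 'yaml'
require 'benchmark'

# Hypothetical sample payload; a real serialized attribute may look different.
payload = { 'name' => 'example', 'tags' => %w[a b c], 'counts' => [1, 2, 3] }

# Both formats round-trip this simple structure.
yaml_roundtrip    = YAML.load(YAML.dump(payload))
marshal_roundtrip = Marshal.load(Marshal.dump(payload))

iterations = 1_000 # assumed; raise it for more stable numbers

Benchmark.bm(8) do |x|
  x.report('yaml')    { iterations.times { YAML.load(YAML.dump(payload)) } }
  x.report('marshal') { iterations.times { Marshal.load(Marshal.dump(payload)) } }
end
```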

While the option is fair enough, I don't think all existing apps
would want this turned on "silently":
- if other people use your database, YAML is OK, as there are parsers
for it in many languages, whereas Marshal would be a PITA
- if your existing column is not a blob column (which it didn't have
to be previously, since YAML generates plain text), the database will
throw a hissy fit (or just truncate the data) when you try to insert a
character that is not legal in the charset used
- should you be calling string_to_binary if the column supports it?

Fred
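
The text-column concern can be illustrated with a small sketch. It only inspects the dumped strings and does not touch a database; the sample value is an assumption picked so that the Marshal output contains a byte that is not valid UTF-8.

```ruby
require 'yaml'

# Hypothetical attribute value; 200 is encoded by Marshal as the byte 0xC8.
value = { 'count' => 200 }

yaml_text     = YAML.dump(value)
marshal_bytes = Marshal.dump(value)

# YAML output is plain text that fits in an ordinary string column.
yaml_is_text = yaml_text.dup.force_encoding('UTF-8').valid_encoding?

# Marshal output is tagged as binary, and here it is not valid UTF-8,
# which is what a text column with a UTF-8 charset would be asked to store.
marshal_encoding  = marshal_bytes.encoding
marshal_is_utf8   = marshal_bytes.dup.force_encoding('UTF-8').valid_encoding?

puts "yaml valid UTF-8:    #{yaml_is_text}"
puts "marshal encoding:    #{marshal_encoding}"
puts "marshal valid UTF-8: #{marshal_is_utf8}"
```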

To add another point for -1: Marshal uses a binary format and does
not guarantee any compatibility with anything except the Ruby on the
computer which created it. Marshal uses internal format version
numbers which do not correspond to Ruby versions. This means your
database backups are potentially non-portable to another OS, Ruby
version, or computer. This in itself makes this option a non-starter, imho.

You could add the option, but it should never become a default. And
even an option should come with a big warning.

izidor
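
The format versioning mentioned above can be seen directly in the dumped bytes. The following is a small sketch, not part of the proposed patch: it reads the two leading version bytes and then simulates data written with a different major version by altering the first byte, which Marshal.load rejects.

```ruby
data = Marshal.dump('hello')

# The first two bytes of every dump are the format's major and minor version.
major, minor = data.bytes.first(2)
puts "dump version:    #{major}.#{minor}"
puts "runtime version: #{Marshal::MAJOR_VERSION}.#{Marshal::MINOR_VERSION}"

# Simulate data from an incompatible format by bumping the major version byte.
tampered = data.dup
tampered.setbyte(0, major + 1)

load_error = begin
  Marshal.load(tampered)
  nil
rescue TypeError => e
  e
end

puts "load failed: #{load_error.class}" if load_error
```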

Yes; after Fred's post I did a bit more research and realized it's a
big no for portable apps. Not an impossible hurdle to deal with, but
not a desirable default. I also realized that AR in its current state
has no simple interface for such an option: quoting's serialization is
abstracted away from table and model information. While MySQL handled
Marshal swimmingly in string columns, I don't think other adapters
would agree.

I've tried to stay away from serialize where possible, but am dealing
with it on a current project and noticed complaints from others
regarding the speed at which large groups of objects with serialized
attributes are instantiated from the database. YAML has been the
culprit.

Stephen

There are also faster YAML dumpers out there. A recent Portland code sprint produced ZAML, a work in progress that offers (if I recall correctly) something like a 14x speed boost over vanilla YAML.dump.

http://github.com/hallettj/zaml/tree/master/zaml.rb

YAML’s slowness has been a pain point, but we don’t have to sacrifice portability.

I did a quick benchmark of ZAML and found it to be slower:

http://pastie.org/288592

I swapped the IO for a StringIO and ZAML was then twice as fast:

           user     system      total        real
yaml   0.360000   0.010000   0.370000 (  0.375689)
zaml   0.150000   0.000000   0.150000 (  0.154976)

Making the dumped object a little less trivial (it seems to me that the
initial test only really tests the overhead of getting things set up)
further increases the difference:

    to_dump = ['foo', 'bar', 'baz', Time.now, {'key' => 'value', 'bar' =>
    {1 => 'hello', 2 => 'world'}}]*5

           user     system      total        real
yaml   9.260000   0.060000   9.320000 (  9.413077)
zaml   2.790000   0.010000   2.800000 (  2.827708)
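
The original benchmark script lives in the pastie and is not reproduced here, so the following is only a sketch of how the StringIO variant might be set up for the stock YAML dumper. The iteration count is an assumption, and ZAML is left out because its API is not shown in this thread; a second report line calling it in the same way would complete the comparison.

```ruby
require 'yaml'
require 'stringio'
require 'benchmark'

# The less trivial object from the post above.
to_dump = ['foo', 'bar', 'baz', Time.now,
           { 'key' => 'value', 'bar' => { 1 => 'hello', 2 => 'world' } }] * 5

iterations = 1_000 # assumed; the original count is not stated in the thread

# Dump once into an in-memory buffer to confirm the output looks sane.
buffer = StringIO.new
YAML.dump(to_dump, buffer)
sample_output = buffer.string

Benchmark.bm(6) do |x|
  # A fresh StringIO per iteration keeps the buffer from growing between runs.
  x.report('yaml') { iterations.times { YAML.dump(to_dump, StringIO.new) } }
end
```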