to_json performance

Originally posted on GitHub, now reported in the right place.

I would like to open a discussion about how `to_json` and `as_json` operate in Rails from a performance standpoint.

I'm using Rails 3.2 but this issue applies to almost all versions of Rails.

Our use case presents the challenge of sending out potentially large JSON bodies (or XML, but we'll focus on JSON rendering here) as the response of API calls that run through a normal Rails controller.

Someone will state that if you are sending out more than 1MB of data "you're doing it wrong". I respectfully disagree: there are different cases, depending on what an application should do and is designed for.

There can be a few major bottlenecks in rendering JSON as fast as possible from start to finish; three of them are:

* database query
* AR object instantiation
* collection transformation into JSON

I'll leave the database query timings aside, so let's tackle the next problem: AR objects.

Having a result set of 50k entries is obviously painful, and loading it through AR will kill your instance, no questions asked. Worse, in my use case this result set can be requested frequently by the client. This can, however, be mitigated with a judicious mixture of `caches_action` and `stale?`, although there may be a few key issues when using memcached and friends: especially in managed implementations, they may not support cache values larger than 1MB.
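As a rough illustration of the caching side (a sketch with hypothetical names, not the actual app code), the cache key and conditional-GET wiring could look something like this:

```ruby
# Plain helper: derive a cache key for a playlist's JSON body from its
# id and last-modified time, so updating the playlist busts the entry.
# (memcached's default item size limit is 1MB, hence the concern above.)
def songs_cache_key(playlist_id, updated_at)
  "playlists/#{playlist_id}-#{updated_at.to_i}/songs"
end

# In the controller it would be wired up roughly like this (sketch,
# hypothetical model/action names):
#
#   caches_action :songs, cache_path: proc { |c|
#     pl = Playlist.find(c.params[:id])
#     songs_cache_key(pl.id, pl.updated_at)
#   }
#
#   def songs
#     @playlist = Playlist.find(params[:id])
#     # stale? sets ETag/Last-Modified; on a conditional hit it renders
#     # an empty 304, skipping both the transformation and the encoding.
#     if stale?(last_modified: @playlist.updated_at, etag: @playlist)
#       render json: build_songs_hash(@playlist)  # hypothetical helper
#     end
#   end

puts songs_cache_key(42, Time.at(1_700_000_000))
```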

Since instantiating AR objects is out of the question, let's focus on getting what we really want for our tests: an 'attributes' Hash to convert to JSON. Valium helps a bit here (it should be in AR core, in my opinion): it can selectively pick the columns you need and returns the values as one Array per record, so we get something like this: `[['a', 'b', 'c'], ['a', 'c', 'd']]`

With a little sorcery we can transform each Array of record values into a Hash with the keys mapped, like so:

    class Hash
      def self.transpose(keys, values)
        self[*keys.zip(values).flatten]
      end
    end

    keys = %W{token artist album genre release_year title duration duration_in_seconds position play_count favorited disc_number}
    my_hash = @playlist.media_files.ordered.values_of(*keys).collect { |v| Hash.transpose(keys, v) }

Nice: now we have an Array of Hashes, with all the info we need about our objects, without instantiating any AR objects. This operation proved to be _extremely_ fast even when transforming 50k records.
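As a quick sanity check, the `Hash.transpose` helper above can be exercised outside of Rails entirely (the values here are made up):

```ruby
class Hash
  # Same helper as above: pair up column names and row values, then
  # build a Hash from the flattened [key, value, key, value, ...] list.
  def self.transpose(keys, values)
    self[*keys.zip(values).flatten]
  end
end

keys = %w[token artist album]
row  = ['abc123', 'Foo Band', 'First LP']

p Hash.transpose(keys, row)
# => {"token"=>"abc123", "artist"=>"Foo Band", "album"=>"First LP"}
```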

Now, on to the rendering:

    render json: my_hash

This should end up calling `to_json` on the Hash.

For a few Rails versions now, JSON encoding has been delegated to `MultiJson`, which is fine. Needless to say I'm using and requiring `yajl-ruby`, so I'm pretty confident that MultiJson will pick it up as my engine, and in fact it does:

    1.9.3p0 :001 > MultiJson.engine
     => MultiJson::Engines::Yajl

The key issue here is that with a large Hash the render action takes 36786ms to finish (of which 8253.6ms is spent on the database query; that's high because of multiple ORDER BY clauses, but that's another matter).

So I tried another way:

    render text: Yajl::Encoder.encode(my_hash)

Using `Yajl::Encoder.encode` (or `MultiJson.encode`, for that matter) yields a rather different result: 12614ms (with the same 8253.6ms spent on the database query).
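The gap is easy to reproduce outside Rails. The sketch below uses the stdlib `JSON` generator as a stand-in for yajl (one C-level pass over the whole structure) against a crude pure-Ruby walk that mimics what a recursive per-element encoder does; the data shape and sizes are illustrative, not my real dataset:

```ruby
require "benchmark"
require "json"

# Crude stand-in for a recursive, per-element Ruby encoder: walk the
# structure in Ruby, encoding each scalar individually.
def ruby_walk_encode(obj)
  case obj
  when Array then "[" + obj.map { |e| ruby_walk_encode(e) }.join(",") + "]"
  when Hash  then "{" + obj.map { |k, v| "#{k.to_s.inspect}:#{ruby_walk_encode(v)}" }.join(",") + "}"
  else obj.inspect
  end
end

keys = %w[token artist album genre release_year title]
rows = Array.new(50_000) do |i|
  keys.each_with_object({}) { |k, h| h[k] = "#{k}-#{i}" }
end

walk   = Benchmark.realtime { ruby_walk_encode(rows) }
direct = Benchmark.realtime { JSON.generate(rows) }
puts format("ruby walk: %.3fs, direct C encode: %.3fs", walk, direct)
```

The two encoders produce equivalent JSON for this plain string data; the difference is only in how many Ruby-level method calls the structure traversal costs.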

Is there something we can do about this?

Note: some may suggest trying gems like `rabl` or `acts_as_api`, but don't forget they are mainly for presentation purposes, and in this regard their performance is comparable to `to_json`.

Thanks for reading.

Skimming through the code in encoding.rb (in ActiveSupport), it looks like the `to_json` method Rails adds ignores `ActiveSupport::JSON.backend` and just does the encoding itself. I can only imagine this is because for most people the bottleneck is parsing rather than generation.

Fred

"Valium helps a bit here (it should be in AR core in my opinion), it

can selectively pick the columns you need and it will return back

values in an Array per every object, so we can have something like

this: [['a', 'b', 'c'], ['a', 'c', 'd']] "

ActiveRecord now has `pluck`, which I think does the same.

http://guides.rubyonrails.org/active_record_querying.html#pluck

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation/calculations.rb#L169
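One caveat on `pluck`: in Rails 3.2 it accepts a single column only; multi-column `pluck` (returning one Array per row, like Valium's `values_of`) arrived in Rails 4. Either way, the rows can feed the same `Hash.transpose` trick from earlier in the thread; here the plucked rows are simulated with plain arrays so the snippet is self-contained:

```ruby
class Hash
  # Same helper as in the original post.
  def self.transpose(keys, values)
    self[*keys.zip(values).flatten]
  end
end

keys = %w[token artist album]
# Stand-in for `MediaFile.pluck(:token, :artist, :album)` — hypothetical
# model; multi-column pluck requires Rails 4+:
rows = [%w[abc123 Foo LP1], %w[def456 Bar LP2]]

hashes = rows.map { |row| Hash.transpose(keys, row) }
p hashes.first
# => {"token"=>"abc123", "artist"=>"Foo", "album"=>"LP1"}
```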