ParamsParser and request body streams

Hey everyone, I was wondering what the possibility was of changing these lines:

  body = request.raw_post
  Hash.from_xml(body)

or

  data = ActiveSupport::JSON.decode(body)

so that the parsers are instead handed the env['rack.input'] IO itself?

Since both calls (from_xml and decode) are handled by Rails elsewhere, we could move responsibility for parsing the request body IO to the swappable XML/JSON backends.

I'm asking because I'm looking to integrate yajl-ruby as one of the JSON backends for ActiveSupport in Rails 3. It can parse JSON directly off the IO as a stream, and this change would let it perform at its best (and keep memory usage very low for large request bodies).

Doing the same for the XML backends would open up the possibility of the same kind of stream parsing for XML.
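To make the XML side concrete: Ruby's REXML library includes a SAX-style StreamParser that reads from an IO and fires callbacks as it goes. The sketch below assumes plain REXML rather than any particular XmlMini backend, and collects the text of leaf elements without first reading the whole body into a String or building a DOM tree. `LeafTextListener` and `stream_leaf_values` are hypothetical names used only for illustration.

```ruby
require 'stringio'
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'

# Hypothetical listener: records "element name => text" for leaf elements
# as SAX-style events arrive, without building a DOM tree.
class LeafTextListener
  include REXML::StreamListener

  attr_reader :values

  def initialize
    @values = {}
    @current = nil
  end

  def tag_start(name, _attrs)
    @current = name
  end

  def text(text)
    # Skip whitespace between tags and text outside any open element.
    return if @current.nil? || text.strip.empty?
    @values[@current] = text.strip
  end

  def tag_end(_name)
    @current = nil
  end
end

# StreamParser takes the IO as its source and pulls input from it itself,
# so the caller never calls #read on the body.
def stream_leaf_values(io)
  listener = LeafTextListener.new
  REXML::Parsers::StreamParser.new(io, listener).parse
  listener.values
end

body = StringIO.new('<person><name>Brian</name><lang>ruby</lang></person>')
values = stream_leaf_values(body)
```

A real params parser would also need to handle attributes, repeated elements and nesting; this only shows the event-driven shape that streaming makes possible.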

Comments?

This sounds like a great idea to me, without much in the way of downsides.

So give it a go and see what breaks.

Check out how the current ParamsParser middleware works.

You can use whatever parser to extract the request params in your middleware and Rails will be fooled into using that instead.

The problem is that the ParamsParser reads the entire body into a string (from what I can tell by the line "body = request.raw_post") before handing it to the parsing backend. My suggestion is that it hand the IO object (env['rack.input'] I assume?) to the parser instead. This way the parser can either read the entire string, then parse - OR if it supports parsing as a stream, start doing so directly off the IO.
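A minimal sketch of what the proposal means for a Rack middleware, assuming a simplified stand-in rather than Rails' actual ParamsParser: the decoder registered for the request's media type is handed `env['rack.input']` itself and decides how to read it. `IOParamsParser`, the `decoders` hash and the `example.request_parameters` env key are all hypothetical.

```ruby
require 'json'
require 'stringio'

# Hypothetical, simplified params-parsing middleware (not Rails' ParamsParser).
# It passes the raw rack.input IO to the decoder instead of reading the body
# into a String first.
class IOParamsParser
  # Made-up env key used only for this sketch.
  PARAMS_KEY = 'example.request_parameters'

  def initialize(app, decoders)
    @app = app
    @decoders = decoders # media type => callable that accepts an IO
  end

  def call(env)
    media_type = env['CONTENT_TYPE'].to_s.split(';').first
    decoder = @decoders[media_type]
    if decoder
      io = env['rack.input']
      io.rewind if io.respond_to?(:rewind)
      env[PARAMS_KEY] = decoder.call(io) # the decoder decides how to read
    end
    @app.call(env)
  end
end

# This decoder cannot stream, so it reads the whole IO itself.
json_decoder = ->(io) { JSON.parse(io.read) }
app = ->(env) { [200, {}, [env[IOParamsParser::PARAMS_KEY].inspect]] }
stack = IOParamsParser.new(app, { 'application/json' => json_decoder })

env = {
  'CONTENT_TYPE' => 'application/json; charset=utf-8',
  'rack.input'   => StringIO.new('{"user":{"name":"brian"}}')
}
status, _headers, _body = stack.call(env)
```

The JSON decoder here still calls `io.read`, which is the fallback a non-streaming backend would use; a streaming backend such as yajl-ruby could instead consume the IO incrementally, with no change to the middleware.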

Ah, sure. I was thinking about something different.

Yeah, sure we could just hand off the raw IO object to the parser instead.

I guess the real patch would allow "ActiveSupport::JSON.decode" to accept an IO object as well. Then we could just pass that directly in.

Yeah, ideally both the JSON and XML parsers would accept an IO, and the 'read into a string' logic would live in the implementations that don't support streaming.
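A sketch of that split, assuming a backend only has to expose `decode(input)`: a backend that cannot stream normalizes its input by calling `read` when it is given an IO, so callers can pass either a String or the request body IO. `StringOnlyBackend` is a hypothetical name, and it uses Ruby's stdlib `json` rather than ActiveSupport.

```ruby
require 'json'
require 'stringio'

# Hypothetical non-streaming JSON backend. Because it cannot parse from an
# IO, the "read into a string" step lives here rather than in the middleware.
module StringOnlyBackend
  def self.decode(input)
    # Duck-type on #read so any IO-like object (File, StringIO, rack.input) works.
    json = input.respond_to?(:read) ? input.read : input
    JSON.parse(json)
  end
end

from_string = StringOnlyBackend.decode('{"name":"brian"}')
from_io     = StringOnlyBackend.decode(StringIO.new('{"name":"brian"}'))
```

A streaming backend would implement the same `decode` signature but skip the `read` call and hand the IO straight to its parser.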

Exactly.

What's the next step here? I can fork Rails and work on a patch, though I'm unsure how many/which tests will need to be refactored.

I don’t think existing tests need changes. New tests have to be added:

  1. that JSON parser works with an IO;
  2. that XML parser works with an IO;
  3. that the ParamsParser middleware doesn’t read the entire stream into a string itself.

OK, I forked Rails and patched the ParamsParser to just pass request.body to the parsers. I did it for JSON, XML and YAML (using YAML.load_stream instead of just load). I also patched the XmlMini and JSON decoders to parse from an IO in addition to a string. As a result (like you said), I didn't have to refactor any tests; I just added the ones for parsing from an IO. The one test I haven't figured out how to write is "the ParamsParser middleware doesn't read the entire stream into a string itself". I can imagine how I might do it with RSpec/Mocha, but I have no idea how to do it with Test::Unit.
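One way to write that third test without RSpec or Mocha is a hand-rolled spy: a decoder object that records what it was handed, plus an IO subclass that counts `read` calls. The sketch assumes a simplified stand-in middleware (`PassThroughParamsParser`, hypothetical, not the real ParamsParser) so it runs on its own; `ReadCountingIO` and `SpyDecoder` are likewise made-up names. In a Test::Unit case, the final checks would become `assert_same io, spy.received` and `assert_equal 0, spy.reads_before_call`.

```ruby
require 'stringio'

# Hypothetical stand-in for a middleware that should pass rack.input through.
class PassThroughParamsParser
  def initialize(app, decoder)
    @app = app
    @decoder = decoder
  end

  def call(env)
    env['example.params'] = @decoder.call(env['rack.input'])
    @app.call(env)
  end
end

# An IO that counts calls to #read (other methods such as #gets are not counted).
class ReadCountingIO < StringIO
  attr_reader :reads

  def initialize(*args)
    super(*args)
    @reads = 0
  end

  def read(*args)
    @reads += 1
    super(*args)
  end
end

# Spy decoder: records its argument and how many reads had already happened.
class SpyDecoder
  attr_reader :received, :reads_before_call

  def call(input)
    @received = input
    @reads_before_call = input.respond_to?(:reads) ? input.reads : nil
    {}
  end
end

io  = ReadCountingIO.new('{"a":1}')
spy = SpyDecoder.new
app = ->(_env) { [200, {}, []] }
PassThroughParamsParser.new(app, spy).call({ 'rack.input' => io })

handed_the_io    = spy.received.equal?(io)    # decoder got the IO object itself
nothing_read_yet = spy.reads_before_call == 0 # middleware did not read it first
```

The same spy catches a regression: if the middleware called `read` and passed a String, `received` would no longer be the IO and `reads_before_call` would be `nil`.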

Here's the commit: http://github.com/brianmario/rails/commit/c63703489eb1f3f4dd96885e1e223126e5208638

Feedback?

Also, should I make a Lighthouse ticket for this?

-Brian

Yeah, create a LH patch and assign it to me plz

Done: #2659 ParamsParser and request body streams

Let me know if you need anything else.

-Brian

Thanks for committing that, but why did you remove the yajl JSON backend and the updated JSON test? I should have noted somewhere that the JSON test and the yajl.rb backend were originally written by Rick Olson, and that I modified them to support being passed an IO.

-Brian

I've updated the LH ticket with another patch - finishing off the changes to actually parse from the IO (instead of converting to a string first) for the rexml, libxml and nokogiri backends. Should this be in another ticket?
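For the rexml case, REXML::Document.new already accepts an IO, so a backend can hand it the request body directly instead of converting it to a String first. A rough sketch, assuming a hypothetical `element_to_hash` helper that only maps element names to their text (no attributes, type casting or arrays, unlike Hash.from_xml):

```ruby
require 'stringio'
require 'rexml/document'

# Hypothetical helper, loosely in the spirit of Hash.from_xml but much simpler:
# leaf elements become text, repeated child names overwrite each other.
def element_to_hash(element)
  children = element.elements.to_a
  return element.text.to_s if children.empty?
  children.each_with_object({}) { |child, h| h[child.name] = element_to_hash(child) }
end

# REXML::Document.new reads from the IO itself, so the caller never
# builds the request body String.
def xml_io_to_hash(io)
  root = REXML::Document.new(io).root
  root ? { root.name => element_to_hash(root) } : {}
end

params = xml_io_to_hash(StringIO.new('<user><name>brian</name><lang>ruby</lang></user>'))
```

Note that this is still DOM parsing: the whole tree ends up in memory, so it avoids the intermediate String but does not give the memory profile of a true streaming parser.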

-Brian