ParamsParser and request body streams

Hey everyone,
I was wondering about the possibility of changing these lines:

body = request.raw_post
Hash.from_xml(body)
or
data = ActiveSupport::JSON.decode(body)

to instead be handed the env['rack.input'] IO itself?

Since both calls (from_xml and decode) are handled by Rails
elsewhere, we could move the responsibility for parsing the request
body IO to the swappable XML/JSON backends.

I'm asking because I'm looking to integrate yajl-ruby as one of the
JSON backends for ActiveSupport in Rails 3. It's capable of parsing
JSON right off the IO as a stream, and this change would allow it to
perform at its best (and keep memory usage very low for large request
bodies).

As for doing it for the XML backends as well, it opens up the
possibility of doing the same kind of stream parsing for XML.

Comments?

This sounds like a great idea to me, without much in the way of downsides.

So give it a go and see what breaks.

Check out how the current ParamsParser middleware works.

You can use whatever parser to extract the request params in your
middleware and Rails will be fooled into using that instead.

The problem is that the ParamsParser reads the entire body into a
string (from what I can tell by the line "body = request.raw_post")
before handing it to the parsing backend. My suggestion is that it
hand the IO object (env['rack.input'] I assume?) to the parser
instead. This way the parser can either read the entire string, then
parse - OR if it supports parsing as a stream, start doing so directly
off the IO.
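A minimal sketch of the difference described above, in plain Ruby rather than Rails code. The env hash, the two lambdas, and the chunk size of 4 are assumptions for illustration; the chunked consumer only stands in for a real streaming backend.

```ruby
require "json"
require "stringio"

# Non-streaming parser: drains the IO, then parses the resulting String.
slurping_parser = ->(io) { JSON.parse(io.read) }

# Stand-in for a streaming parser: consumes the IO in small chunks.
# A real streaming backend would feed each chunk to its parser; here we only
# record chunk sizes to show the body never has to exist as one String.
chunked_consumer = lambda do |io|
  sizes = []
  while (chunk = io.read(4))
    sizes << chunk.bytesize
  end
  sizes
end

body = '{"a": 1}'
# Hypothetical Rack-style env; StringIO plays the role of env['rack.input'].
env = { "rack.input" => StringIO.new(body) }

parsed = slurping_parser.call(env["rack.input"])
env["rack.input"].rewind
sizes = chunked_consumer.call(env["rack.input"])
```

Either parser receives the same IO object; only the parser decides whether the whole body is ever held in memory at once.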

Ah, sure. I was thinking about something different.

Yeah, sure we could just hand off the raw IO object to the parser instead.

I guess the real patch would allow "ActiveSupport::JSON.decode" to
accept an IO object as well. Then we could just pass that directly in.

Yeah, ideally both the JSON and XML parsers would accept an IO, and
the 'read into a string' logic would live in the implementations that
don't support streaming.

Exactly.
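As a rough sketch of that idea, a backend that cannot stream could drain the IO itself. The module name NonStreamingJSONBackend is hypothetical, and the stdlib JSON.parse stands in for whatever the backend really uses; this is not the actual ActiveSupport code.

```ruby
require "json"
require "stringio"

# Hypothetical non-streaming backend: it owns the "read into a string" step.
module NonStreamingJSONBackend
  def self.decode(source)
    # Accept either a String or anything that responds to #read (an IO).
    source = source.read if source.respond_to?(:read)
    JSON.parse(source)
  end
end

from_string = NonStreamingJSONBackend.decode('{"a": 1}')
from_io     = NonStreamingJSONBackend.decode(StringIO.new('{"a": 1}'))
```

A streaming backend would keep the same decode signature but pass the IO straight to its parser instead of calling read on it.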

What's the next step here?
I can fork Rails and work on a patch, though I'm unsure how many/which
tests will need to be refactored.

I don’t think existing tests need changes. New tests have to be added:

  1. that JSON parser works with an IO;
  2. that XML parser works with an IO;
  3. that the ParamsParser middleware doesn’t read the entire stream into a string itself.

OK, I forked and patched the ParamsParser to just pass request.body to
the parsers. I did it for JSON, XML and YAML (using YAML.load_stream
instead of just load).
I also patched the XmlMini and JSON decoders for parsing from an IO,
in addition to a string. And as a result (like you said), I didn't
have to refactor any tests. Just added the ones regarding parsing from
an IO.
The one test I haven't figured out how to write is the "that the
ParamsParser middleware doesn't read the entire stream into a string
itself" test. I can imagine how I might do it using rspec/mocha, but I
have no idea how to do it using Test::Unit.
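One possible way to check this without a mocking library is a hand-rolled recording wrapper around the IO. The sketch below is plain Ruby; RecordingIO and hand_off are hypothetical names, and hand_off only stands in for the middleware step, so the same checks would need to be pointed at the real ParamsParser inside a Test::Unit test method.

```ruby
require "stringio"

# Hypothetical spy: wraps an IO and records every call to #read.
class RecordingIO
  attr_reader :calls

  def initialize(io)
    @io = io
    @calls = []
  end

  def read(*args)
    @calls << [:read, args]
    @io.read(*args)
  end
end

# Stand-in for the middleware step under test (not Rails code):
# it passes the IO to the parser without reading it first.
def hand_off(env, parser)
  parser.call(env["rack.input"])
end

spy = RecordingIO.new(StringIO.new('{"a": 1}'))
received = nil
calls_before_parse = nil

# Fake parser: notes what it was given and what had been read so far.
parser = lambda do |io|
  received = io
  calls_before_parse = spy.calls.dup
  io.read
end

result = hand_off({ "rack.input" => spy }, parser)
```

The test then asserts that the parser received the very same IO object and that no read had happened before the parser was invoked.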

Here's the commit:
http://github.com/brianmario/rails/commit/c63703489eb1f3f4dd96885e1e223126e5208638

Feedback?

Also, should I make a Lighthouse ticket for this?

-Brian

Yeah, create a LH patch and assign it to me plz

Done: https://rails.lighthouseapp.com/projects/8994/tickets/2659-paramsparser-and-request-body-streams#ticket-2659-3

Let me know if you need anything else.

-Brian

Thanks for committing that, but what was the reason you removed the
yajl JSON backend and the updated JSON test?
I should have noted somewhere that the JSON test and yajl.rb backend
were originally written by Rick Olson, and I made modifications to
support the ability to be passed an IO.

-Brian

I've updated the LH ticket with another patch - finishing off the
changes to actually parse from the IO (instead of converting to a
string first) for the rexml, libxml and nokogiri backends.
Should this be in another ticket?

-Brian