Streaming a large XML file; optimizing large file downloads in Rails

I occasionally need to stream a large XML file that represents key data in a DB. I'm porting an application over from PHP Symfony, and with my initial implementation it takes around 7 times as long with Rails. Also, with Symfony, data begins to download almost as soon as I invoke the URL, whereas with Rails all the data is processed on the server side before the client gets the first byte. I have a hand-crafted query to hit the DB and use fetch_hash to work with the raw data from the mysql gem, and that renders extremely quickly. I've also tried writing only a tiny subset of the XML while still reading the entire result set; that gives much faster performance, but of course that way the complete XML never arrives.

I spent most of the past weekend trying to work out how to optimize this (hoping to do at least as well as PHP Symfony) but can't do it.

I tried:

- render :text => (lambda do |response, output| ... )
- Ruby 1.8.7 vs. Ruby 1.9.2
- Rails 2.3.5 vs. Rails 3
- XmlBuilder vs. Nokogiri::XML::Builder
- HAML vs. ERB
- Passenger vs. script/server

Nothing honestly moved the performance needle in a serious way.

I've finally come to the conclusion that rails does not stream out as I'd expect. Here's a look at the perf stats rendered as the request runs:

    Rendered hgrants/_request_detail (2.2ms)
    Rendered hgrants/_request_detail (3.9ms)
    Rendered hgrants/_request_detail (2.4ms)
    Rendered hgrants/_request_detail (2.3ms)
    Rendered hgrants/_request_detail (242.7ms)
    Rendered hgrants/_request_detail (2.2ms)
    Rendered hgrants/_request_detail (1.9ms)
    Rendered hgrants/_request_detail (1.8ms)

We went from an average of 2ms up to 242ms and then back down. I saw this sporadically throughout the ~1000 template renderings, which suggests to me that garbage collection is kicking in. Also, I'm invoking the request from curl, and it reports no data downloaded until after my logfile shows that Rails has finished processing all records in the view. The model IDs that produce the oversized timings vary from one request to another, so I'm convinced nothing in the app itself is doing this. I even tested this by removing the call to the HAML template and replacing it with a block of generated text, and observed similar behavior.

This is how I'm invoking HAML from the XML Builder template:

    xml << render(:partial => 'hgrants/request_detail.html.haml',
                  :locals => { :model => model })

I also tried using this trick to get it to stream, but I observed exactly the same behavior; no data showed up in curl until all records had been processed:

    render :text => (lambda do |response, output|
      extend ApplicationHelper

      xml = Builder::XmlMarkup.new(
        :target => StreamingOutputWrapper.new(output),
        :indent => 2)
      eval(default_template.source, binding, default_template.path)
    end)

(Also, in Rails 3, render :text with a Proc renders the Proc via to_str rather than calling it.)

This particular issue I can certainly work around, but it's disappointing if it's true that there's no way in Rails to stream output to the browser for large pages, and particularly disappointing if PHP/Symfony can outgun Rails at streaming. I've been using Rails since 2006, and most requests have fairly small responses, so maybe the answer is to defer to a different technology for streaming larger files. But it seems like there should be a good solution for streaming data and flushing the output stream.

Any help is greatly appreciated! Eric

I occasionally need to stream a large XML data file that represents key data in a DB. I'm porting over an application from PHP Symfony, and with my initial implementation, it takes around 7 times as long with rails.

[...]

I've finally come to the conclusion that rails does not stream out as I'd expect.

[...]

Have you tried send_data? I think that's what most people use to stream dynamic content.
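
For reference, a hedged sketch of what the send_data route might look like (the Hgrant model and the XML layout are hypothetical stand-ins for the app's actual data). Note that this builds the full document in memory before handing it off, so it avoids template overhead but does not stream:

```ruby
# Build the whole XML payload as a single string.
def hgrants_xml(records)
  xml = "<hgrants>\n"
  records.each do |r|
    xml << %(  <hgrant id="#{r[:id]}">#{r[:name]}</hgrant>\n)
  end
  xml << "</hgrants>\n"
end

# In a controller action (hypothetical model name):
#   send_data hgrants_xml(Hgrant.all.map { |g| { :id => g.id, :name => g.name } }),
#             :type => "application/xml", :filename => "hgrants.xml"
```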

Alternatively, how does Symfony do its streaming? Can you write something equivalent for Rails?

Best,

Nothing honestly moved the performance needle in a serious way.

I've finally come to the conclusion that rails does not stream out as I'd expect. Here's a look at the perf stats rendered as the request runs:

It doesn't. Rails 3.1 will change some of that, apparently (http://yehudakatz.com/2010/09/07/automatic-flushing-the-rails-3-1-plan/).

If you drop down to the Rack level (i.e. write this as a Rails Metal) you should be able to stream responses: the Rack body can be anything that responds to #each, and Rack will keep calling that each method until you're done.
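
For what it's worth, here's a minimal, self-contained sketch of that Rack contract (the record data and tag names are made up): the body object's #each yields one chunk at a time, and a Rack-compliant server can write each chunk to the client as it is produced.

```ruby
# Sketch of a streaming Rack body: Rack only requires that the body
# respond to #each, yielding strings; the server flushes chunks as they come.
class XmlStreamBody
  def initialize(records)
    @records = records
  end

  def each
    yield "<hgrants>\n"
    @records.each do |r|
      yield %(  <hgrant id="#{r[:id]}">#{r[:name]}</hgrant>\n)
    end
    yield "</hgrants>\n"
  end
end

# The Rack app itself: [status, headers, body-that-responds-to-#each].
app = lambda do |env|
  records = [{ :id => 1, :name => "alpha" }, { :id => 2, :name => "beta" }]
  [200, { "Content-Type" => "application/xml" }, XmlStreamBody.new(records)]
end
```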

The docs also say that render :text => lambda { ... } allows streaming, but there are various conflicting opinions from actual users (I've never tried it). This may also depend on the server (Mongrel, Thin, etc.) you use: it's no good you streaming data to Rack if the next person down the chain sits on it until it's done.

Fred

Hi Fred-

What you're saying makes a lot of sense. As the automatic-flushing-the-rails-3-1-plan article you linked explains, for most Rails interactions it's difficult to stream because of all the evaluation that needs to occur. Large file downloads really are a special case. Using Rails Metal to respond seems logical.

When I get a moment I'll create a brand new Rails app and see if I can get Rails to stream as I'd expect; perhaps there is something in Rack that is preventing the streaming.

In Rails 3, render :text => lambda { ... } is definitely broken.
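
Since the lambda form is out on Rails 3, one alternative worth trying (hedged sketch, assuming Rails 3's response_body accepts any object that responds to #each) is to hand it a lazy enumerator; the record data here is made up:

```ruby
# Build a body whose #each yields XML chunks one record at a time,
# so nothing is buffered beyond the current chunk.
records = [{ :id => 1, :name => "a" }, { :id => 2, :name => "b" }]

body = Enumerator.new do |yielder|
  yielder << "<hgrants>\n"
  records.each do |r|
    yielder << %(  <hgrant id="#{r[:id]}">#{r[:name]}</hgrant>\n)
  end
  yielder << "</hgrants>\n"
end

# In the controller action:
#   self.response_body = body
```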

Thanks for the help!

--> Eric

I occasionally need to stream a large XML data file that represents key data in a DB. I'm porting over an application from PHP Symfony,

[...]

This particular issue I can certainly work around but it's disappointing if it's true that there's no way in rails to stream output to the browser for large pages. And particularly disappointing if PHP/Symfony can outgun rails for streaming. I've been using rails since 2006 and most requests have fairly small responses so maybe the answer is to defer to a different technology for streaming larger files. But it seems like there should be a good solution for streaming data and flushing the output stream.

I'm in the same boat. On Rails 2-3-stable, output.flush is said to be deprecated and no longer works, but it seems that render :text => proc { |response, output| ... } doesn't send streamed data at all. I also tried send_data, without luck.

After some research I expected the flush to happen after an output.write, but that does not seem to be the case, at least where I looked.

We have potentially very large Ajax requests (3 MB), and from a Java server we were able to cut the action time down greatly by manipulating the response; I'm trying to achieve the same from Rails, but nothing I've tried so far works.

Claudio Poli wrote in post #949941: [...]

We have potentially very large ajax requests (3mb)

It sounds like Rails' streaming needs to improve, but a 3 MB Ajax request is a huge design problem! For performance reasons, it should rarely be necessary to request more than 100 KB or so through Ajax.

Best,

ehansen486 wrote in post #949547:

In rails 3, the render :text => lambda { ... } is definitely broken.

I suppose then it might not be a bad idea to submit a documentation patch to either remove this tip or note that it is broken in Rails 3.0:

    send_data
    [...]
    Tip: if you want to stream large amounts of on-the-fly generated data
    to the browser, then use render :text => proc { ... } instead. See
    ActionController::Base#render for more information.
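
To illustrate the contract that tip describes (the one reported broken on Rails 3.0 earlier in this thread): Rails 2.x calls the proc with the response and output objects, and the proc writes chunks to the output directly. A pure-Ruby simulation, with StringIO standing in for the socket:

```ruby
require "stringio"

# The proc Rails 2.x would call with (response, output); it writes
# XML chunks to the output stream as they are generated.
streamer = proc do |response, output|
  output.write("<hgrants>\n")
  3.times { |i| output.write(%(  <hgrant id="#{i}"/>\n)) }
  output.write("</hgrants>\n")
end

# Simulate the call; in Rails the server would flush each write.
out = StringIO.new
streamer.call(nil, out)
```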