Calling an external API as quickly as possible

Hi

I'm a Java developer working with a team that's moving over to Ruby
and Ruby on Rails - we're really excited!

We are writing a replacement for a large, Java-based, e-commerce
website for a client who is based in the United Kingdom. This will be
the first web site of its kind written in RoR.

One of the things we need to do is to access various external APIs to
help us build each web page. We are required to support a number of
different text-based APIs (XML, key/value pairs, etc.) via HTTP.

Typically, we will call an API to get some data, let's say a list of
Countries, for example. When we receive the data we build a list of
Country objects. If we simulate the same thing using ActiveRecord
(SELECT * FROM Countries), we find that the task is about 5 times
faster.

I am assuming that the difference is that ActiveRecord creates its
objects in-line, while the data is arriving, whereas the HTTP approach
does not start building objects until the entire response has been
received.

Can anyone suggest how we might go about doing this type of in-line
processing while reading the HTTP response? I do not want to have this
kind of low-level code in every controller, so we would probably put
it in some sort of helper.

I would recommend caching the results. Going through an HTTP request/response cycle on every page view for infrequently changing data, such as a list of countries, is very time-consuming. Another strategy you might examine is DRb: run a worker that hits the Web services periodically for a refresh, and have your Rails application query the local DRb instance instead. If DRb is too big a hammer, just run a cron job that periodically updates a memcached server.
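
To make the caching idea concrete, here is a minimal sketch of an in-process cache with a time-to-live. TtlCache and the fetch_countries_from_api call in the usage comment are made-up names for illustration, not Rails or memcached APIs, and a real deployment with several Rails processes would more likely use a shared store.

```ruby
# Minimal in-memory cache with a time-to-live.
# TtlCache is a hypothetical helper, not part of Rails.
class TtlCache
  # clock is injectable so the expiry logic can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
    @lock = Mutex.new
  end

  # Returns the cached value for key. The block is only run when the
  # entry is missing or at least ttl_seconds old.
  def fetch(key)
    @lock.synchronize do
      now = @clock.call
      entry = @store[key]
      if entry.nil? || now - entry[:at] >= @ttl
        entry = { value: yield, at: now }
        @store[key] = entry
      end
      entry[:value]
    end
  end
end

# Usage (fetch_countries_from_api is an assumed method of your own):
#   COUNTRIES = TtlCache.new(300)
#   countries = COUNTRIES.fetch(:countries) { fetch_countries_from_api }
```

The lock is held while the block runs, so a slow refresh blocks other callers. That keeps the sketch simple, but it may not be what you want under load.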

To extract the code from your controllers, use a before_filter in app/controllers/application.rb; the request to construct your objects will then be made before each action begins.

Still, it seems like a very expensive way to retrieve what seems to be somewhat static data.

Hope this helps.

Many thanks for the response. I only used a country list as an
example. In fact, the API calls will retrieve dynamic data. All I
want to do is to start reading the stream of data that comes down the
socket before the stream has finished.
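
For what it's worth, Net::HTTP can hand you the body in fragments as they are read from the socket if you pass a block to read_body. Below is a rough sketch, assuming a line-oriented "key=value" payload. LineStreamer and each_pair_from are names invented for this example, and for XML a stream-style parser would have to take the place of the line splitter.

```ruby
require 'net/http'
require 'uri'

# Splits an incoming stream into complete lines as chunks arrive.
# (Hypothetical helper for this sketch.)
class LineStreamer
  def initialize(&on_line)
    @buffer = String.new
    @on_line = on_line
  end

  # Feed one chunk; every complete line now in the buffer is passed on.
  def <<(chunk)
    @buffer << chunk
    while (line = @buffer.slice!(/\A.*?\n/))
      @on_line.call(line.chomp)
    end
    self
  end

  # Pass on a trailing line that was not terminated by a newline.
  def finish
    return if @buffer.empty?
    @on_line.call(@buffer.chomp)
    @buffer.clear
  end
end

# Yields [key, value] for each "key=value" line while the response
# body is still arriving. Lines without "=" are skipped.
def each_pair_from(url)
  uri = URI.parse(url)
  streamer = LineStreamer.new do |line|
    key, value = line.split('=', 2)
    yield key, value unless value.nil?
  end
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request_get(uri.request_uri) do |response|
      response.read_body { |chunk| streamer << chunk }
    end
  end
  streamer.finish
end
```

A helper like this could live outside the controllers and be called with a block that builds your objects one at a time.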

Have you considered serving each section of the page asynchronously as
an Ajax request? Basically, load the page with a bunch of empty divs.
Then, on Prototype's dom:loaded event, issue a bunch of Ajax
requests to fill these divs in. What happens is that each is populated
by a different HTTP request, allowing for a perceived performance
improvement because the page loads and content populates as it is
available.

Be aware that although it is perceived as loading faster, it’ll probably be slower and put a bigger strain on your server and database.

Three factors come into play:

  • Compression (deflate) by Apache will be less effective, since each response has less data to work with (the less data, the less efficient the compression usually is, especially with text)

  • Browsers limit the number of simultaneous connections to the same domain; that’s why Rails 2 has the asset host feature, to work around that limit

  • The number of extra database hits you’ll make:

Example:

Loading the page in one request: fetch the article with the id specified in the URL and eagerly load the associated comments and pictures: 1 database hit, 1 rendering cycle

Loading the page using several requests: fetch article, render page, fetch article and comments, render comments section, fetch article and pictures, render picture section: 3 database hits, 3 rendering cycles

The last two can be worked around and optimized (balancing over several virtual hostnames, caching of asynchronous pages), but one may start wondering if it’s worth going through all this trouble.

On top of that, browsers that have javascript disabled will get a largely empty page.

Best regards

Peter De Berdt

Swordfish wrote:

One of the things we need to do is to access various external APIs to
help us build each web page. We are required to support a number of
different text-based APIs (XML, key/value pairs, etc.) via HTTP.
  
If you do XML parsing with Ruby it can become slow enough to be noticeable in an interactive context. I recently saw a 100ms delay for in-memory parsing of small and simple content (three or four levels of elements, probably fewer than 100 elements) with the built-in parser, and I was really surprised.

I'm not sure how you can get around that. Manual parsing with regexes might be at least one order of magnitude faster, but the code can become a mess if the XML is complex.

In your position I'd benchmark your XML parsing on actual data held in memory, which takes network latency out of the picture, to find out whether parsing is one of the things that slow you down.
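
Something along these lines would do for a first measurement. It is only a sketch: SAMPLE_XML is a made-up payload standing in for a real API response, parse_countries and average_parse_time are names chosen for this example, and REXML is assumed to be the parser in question (it ships with Ruby as a bundled gem from 3.0 on).

```ruby
require 'benchmark'
require 'rexml/document'

# Made-up payload; replace it with a captured response from the real API.
SAMPLE_XML = <<~XML
  <countries>
    <country code="GB"><name>United Kingdom</name></country>
    <country code="FR"><name>France</name></country>
    <country code="BE"><name>Belgium</name></country>
  </countries>
XML

# Parses the document and pulls out the country names via XPath.
def parse_countries(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//country/name').map(&:text)
end

# Average wall-clock seconds per parse over a number of runs.
def average_parse_time(xml, runs = 200)
  Benchmark.realtime { runs.times { parse_countries(xml) } } / runs
end

puts format('%.3f ms per parse', average_parse_time(SAMPLE_XML) * 1000)
```

Since the string is already in memory, whatever time this reports is parsing alone, which you can then compare with the total time of a real request.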

I've also seen some XPath queries being really slow on Ruby compiled with pthread support. All my systems now have Ruby compiled without pthread support for a minor global speed boost, so I can't test whether the problem was really in the XPath queries or in the basic XML parsing, but taking pthread out of the picture gave me 100x the performance on the Ruby script I have in mind.

Lionel

You raise good points. However, unless I completely missed Swordfish's question, the goal was to have the web service requests fulfilled quickly and probably asynchronously. With Rails, options are limited WRT any asynchronous execution. It seems one architecture that stays inside the Rails framework would be something along the lines of:

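before_filter :do_ws_fetches
# NOTE: after filters run once the action (including its render) has
# finished, so if the view needs the fetched data, call wait_for_fetches
# before rendering instead.
after_filter :wait_for_fetches

# ...

def do_ws_fetches
   @ws = []
   @ws << Thread.new do
     # code to do first ws fetch
   end
   @ws << Thread.new do
     # code to do second ws fetch
   end
   # ... and so on ...
   # don't join so everything else can go ahead pretty much as planned, except render.
end

def fetches_still_happening?
   @ws.any? { |t| t.alive? }
end

def wait_for_fetches
   # subject to some give-up / timeout criteria;
   # timeout_in_milliseconds is assumed to be defined elsewhere
   0.upto(timeout_in_milliseconds / 10) do
     break if !fetches_still_happening?
     sleep(0.01) # poll every 10 ms
   end
end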

Of course, this whole idea leads us down the Rails and concurrency path, but if all that is being done is foraging for data from remote sources, some careful programming can reduce the risk of stepping on data.
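
Outside of Rails, the same fan-out-and-wait idea can be tried in plain Ruby. The sketch below uses Thread#join with a time limit instead of a sleep loop. fetch_all and the job names are invented for the example, and the lambdas stand in for real web service calls.

```ruby
# Runs each job in its own thread and collects the results, giving up on
# any job that has not finished by a shared deadline.
# jobs is a Hash of name => callable (hypothetical shape for this sketch).
def fetch_all(jobs, timeout_seconds)
  threads = jobs.map { |name, job| [name, Thread.new { job.call }] }
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_seconds

  threads.each_with_object({}) do |(name, thread), results|
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    # join returns nil if the thread is still running when the limit expires
    if thread.join([remaining, 0].max)
      results[name] = thread.value
    else
      thread.kill
      results[name] = nil
    end
  end
end

# Usage with stand-in jobs: the slow one is abandoned at the deadline.
results = fetch_all(
  { prices: -> { sleep 0.01; [9.99, 14.5] }, stock: -> { sleep 2; :too_late } },
  0.3
)
p results
```

Each thread writes only to its own return value, which is one way to keep the fetches from stepping on shared data. An exception raised inside a job is re-raised by join, so real code would want a rescue around each fetch.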

WDYT?

Very interesting use of filters. This is exactly the kind of approach
I was thinking about. I will have a go.

Many thanks.