HTML snapshots for crawlable AJAX

Hi, there doesn't seem to be any reference for taking HTML snapshots from within a Rails server. How could one implement Google's crawlable AJAX spec (see "Understand JavaScript SEO Basics" in the Google Search Central documentation) on a Rails application?

To summarize: I have a Rails application with a JavaScript front-end that uses lots of AJAX. I need Google to index the AJAX content, hence the need to implement the above spec. Now, I can send an AJAX request to Rails for a link that the crawler asks for; I need the Rails server to respond with an HTML snapshot. Can this be handled by a single Rails instance running on nginx? Or do we need to send the link to an HTMLUnit headless browser to take a snapshot?

Has anyone done this for a Rails app?
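For reference, the crawlable AJAX scheme works on "#!" URLs: when the crawler sees http://example.com/app#!key=value it requests http://example.com/app?_escaped_fragment_=key=value instead and expects an HTML snapshot back. Whatever takes the snapshot first has to turn that request back into the "#!" URL. Here is a minimal sketch of that step in plain Ruby; the method name pretty_url_for and the example.com URLs are made up for illustration.

```ruby
require "uri"

# Convert the URL a crawler requests (with _escaped_fragment_ in the query)
# back into the "#!" URL the JavaScript front-end understands.
# Returns nil when the URL is not a snapshot request.
def pretty_url_for(crawler_url)
  uri = URI.parse(crawler_url)
  pairs = URI.decode_www_form(uri.query.to_s)
  fragment_pair = pairs.find { |key, _| key == "_escaped_fragment_" }
  return nil unless fragment_pair

  # Keep any other query parameters, drop only the crawler's one.
  remaining = pairs.reject { |key, _| key == "_escaped_fragment_" }
  uri.query = remaining.empty? ? nil : URI.encode_www_form(remaining)
  uri.fragment = nil
  "#{uri}#!#{fragment_pair[1]}"
end

puts pretty_url_for("http://example.com/app?_escaped_fragment_=key=value")
```

The resulting URL is what a headless browser (HTMLUnit or anything else) would be asked to load and render.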

Hi, there doesn’t seem to be any reference for taking HTML snapshots from within a Rails server. How could one implement Google’s crawlable AJAX spec (http://code.google.com/web/ajaxcrawling/docs/learn-more.html) on a Rails application?

As always, there are several ways…

To summarize: I have a Rails application with a JavaScript front-end that uses lots of AJAX. I need Google to index the AJAX content, hence the need to implement the above spec. Now, I can send an AJAX request to Rails for a link that the crawler asks for; I need the Rails server to respond with an HTML snapshot. Can this be handled by a single Rails instance running on nginx? Or do we need to send the link to an HTMLUnit headless browser to take a snapshot?

Does your “JavaScript front-end with lots of AJAX” create or render lots of new HTML content? Or is your AJAX the kind that mostly manipulates the DOM by getting new HTML document fragments via XHR requests?

As the Google docs on the subject themselves mention, if it is the former case then you may want to consider a server-side “browser” like HTMLUnit. Otherwise, you might want to focus more on your actual Rails code. Even within the framework of Rails conventions, there is so much latitude in how sites implement AJAX applications that there are lots of possible answers.

For example, I’ve got a Rails app (it’s an older Rails 2 app) that has a fair amount of AJAX. I first developed it statically and used “progressive enhancement” techniques to add AJAX functionality. The result is that in many cases I have controller actions that, when executed, may “return” (render) either a full HTML document or a document fragment, depending on whether the request is an XHR. If I were updating this site (quick and dirty) to support this Google spec, I’d simply make it so that said actions return a full HTML document when a request has the special _escaped_fragment_ parameter.
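To make the quick-and-dirty idea concrete, here is a minimal sketch of the decision in plain Ruby. The method name render_mode and the symbols it returns are made up for illustration; `params` stands in for the Rails params hash and `xhr` for the result of request.xhr?.

```ruby
# Decide whether an action should render the full page or only a fragment.
# A crawler snapshot request always gets the full document, even if it
# otherwise looks like an XHR; normal XHR requests still get a fragment.
def render_mode(params, xhr)
  return :full_document if params.key?("_escaped_fragment_")

  xhr ? :fragment : :full_document
end

# Hypothetical use inside a controller action (not run here):
#   render :layout => (render_mode(params, request.xhr?) == :full_document)
```

This only helps when the server can already produce the complete HTML itself. It does nothing for an app whose markup is built entirely in the browser.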

However, I can conceive of several different techniques (and have used different ones to various degrees) that would require a different approach.

This might be an area that would be good for some kind of Rails (and/or Rack) gem built around a specific set of AJAX conventions and design patterns, one that integrates with or is written solely to implement this Google spec, if indeed such a beast doesn’t already exist. Such a solution would still only work for those who want to, are willing to, or already do adhere to the chosen conventions. But then, Rails users do the same for web app development in general.

Anyone else care to let their mind wander too?

Kendall Gifford wrote in post #999301:

Does your "JavaScript front-end with lots of AJAX" _create_ or _render_ lots of new HTML content? Or is your AJAX the kind that mostly manipulates the DOM by getting new HTML document fragments via XHR requests?

It's the former: a minified JavaScript web application that manages the entire front-end.

As the Google docs on the subject themselves mention, if it is the former case then you may want to consider a server-side "browser" like HTMLUnit. Otherwise, you might want to focus more on your actual Rails code. Even within the framework of Rails conventions, there is so much latitude in how sites implement AJAX applications that there are lots of possible answers.

It's the former one, hence the need for HTMLUnit. I came across this (http://tinyurl.com/6yxrch7), which implements HTMLUnit snapshots on GWT. I'm not an expert in GWT, so I'm deferring it for now until I can find a better solution.

For example, I've got a Rails app (it's an older Rails 2 app) that has a fair amount of AJAX. I first developed it statically and used "progressive enhancement" techniques to add AJAX functionality. The result is that in many cases I have controller actions that, when executed, may "return" (render) either a full HTML document or a document fragment, depending on whether the request is an XHR. If I were updating this site (quick and dirty) to support this Google spec, I'd simply make it so that said actions return a full HTML document when a request has the special _escaped_fragment_ parameter.

Mine is a Rails 2 app too. I return raw data to the client, where it gets put into my custom templates. I reckon you mean RJS by "return a full HTML document", though it would get very complicated if I were to build the styles in a few Rails views.

However, I can conceive of several different techniques (and have used different ones to various degrees) that would require a different approach.

This might be an area that would be good for some kind of Rails (and/or Rack) gem built around a specific set of AJAX conventions and design patterns, one that integrates with or is written solely to implement this Google spec, if indeed such a beast doesn't already exist. Such a solution would still only work for those who want to, are willing to, or already do adhere to the chosen conventions. But then, Rails users do the same for web app development in general.

Anyone else care to let their mind wander too?

I've come across Crawljax, which seems to be mostly for testing. I agree, it'd be great if there were a gem that ushered _escaped_fragment_ requests to a GAE app and returned the HTML snapshot. It wouldn't need to handle high traffic volume, since it would run only for crawler requests. Without such a solution, folks like me will go ahead and build their own GAE app, I imagine... unless there is another way!
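For what it's worth, the Rails-side half of such a gem could be quite small. Below is a rough sketch as Rack middleware, not an existing gem: the class name SnapshotProxy is made up, and the renderer is left as any callable that takes a "#!" URL and returns HTML (for instance a client for a GAE/HTMLUnit service, which is not implemented here).

```ruby
require "uri"

# Hypothetical Rack middleware: hands crawler snapshot requests to an
# injected renderer and passes every other request to the app untouched.
class SnapshotProxy
  FRAGMENT_KEY = "_escaped_fragment_"

  # renderer: any object responding to call(url) and returning an HTML string.
  def initialize(app, renderer)
    @app = app
    @renderer = renderer
  end

  def call(env)
    pairs = URI.decode_www_form(env["QUERY_STRING"].to_s)
    fragment = pairs.assoc(FRAGMENT_KEY)
    return @app.call(env) unless fragment

    # Rebuild the "#!" URL the JavaScript front-end expects.
    # Other query parameters are dropped here to keep the sketch short.
    url = "#{env['rack.url_scheme']}://#{env['HTTP_HOST']}" \
          "#{env['PATH_INFO']}#!#{fragment[1]}"
    html = @renderer.call(url)
    [200, { "content-type" => "text/html; charset=utf-8" }, [html]]
  end
end
```

In a config.ru this would be wired in with something like `use SnapshotProxy, renderer`. Since only crawler requests ever reach the renderer, the snapshot service itself can stay small.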