Rails 3 application capable of generating an offline version of itself for download as a zip archive

I'm still kind of a newbie with RoR and I'm having a hard time figuring out how I should implement this. I'm writing an application to store and display information about insects and their distribution. Currently I have almost all the functionality implemented, except for a **very** important one: the application must be capable of "crawling" itself and generating a zip archive for download. Actually, "crawling" isn't quite accurate, since the views must be slightly different (e.g. don't offer functionality that isn't available without an Internet connection, indicate in the title that the page is an offline copy, etc.).

The question is: Do you have any suggestions as to how I should implement this?

One approach I had in mind (although I don't know how to program it) would be to have the controller that triggers the archive generation call into all the publicly accessible controllers, each of which would provide a non-routable method that uses the offline templates. This method would be index-like, except that it would repeatedly render the offline view for #show to a string and store the result in a stream.
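Roughly, I picture something like this (just a sketch of the idea, not working code; I'm assuming the stream would be something like rubyzip's Zip::OutputStream, and the names are made up):

```ruby
# Hypothetical non-routable method inside e.g. InsectsController.
# It would render the offline variant of #show for every record and
# append the HTML to whatever zip stream the triggering controller passes in.
def offline_dump(zip_stream)
  Insect.find_each do |insect|
    html = render_to_string(
      template: 'insects/offline_show',   # offline version of the show view
      layout:   'offline',                # layout without online-only features
      locals:   { insect: insect }
    )
    zip_stream.put_next_entry("insects/#{insect.id}.html")
    zip_stream.write(html)
  end
end
```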

Another approach: let the zip-generation controller access the views of all the resources and iterate over all the model data itself, which makes for one really big, centralized controller.

And lastly, make all public controllers check for "/static/" in the request path to tell them to use the offline templates, and then do some self-crawling by iterating over all the model data. (For some reason, trying to self-crawl didn't work for me in development, even when using Thin, which explicitly advertises ">> Maximum connections set to 1024" when it boots up. The problem would probably solve itself with delayed_job, though I haven't tested that yet.)

All the controllers dealing with resources were created with "rails g scaffold ...".

In all the "solutions" above (except maybe in the third method) are missing a procedure to include all assets in the zip archive properly (i.e. enumerate them all and store them with the correct file name).

I'll be extremely grateful for any suggestions about how I should tackle this problem!

I would check out something like Jekyll.

A tip: when searching for an RoR version of XYZ, Google something like "XYZ ruby rails github gem". There is a gem for most complex/mundane tasks.

I would let something like Jekyll handle the static site creation. You could then have a Ruby script create a custom config file for each type of site, or actually create another gem that handles that.

Have everything run in the background like so:

  1. A request comes in for a new static build.
  2. The Ruby script initializes and sets any custom variables.
  3. The Ruby script creates a unique temp folder. Keep track of this folder so it can be logged and deleted.
  4. Write to the database log that the process has started.
  5. The Ruby script runs the jekyll command to create the static site.
  6. If there are no errors, write to the DB log that the process completed; otherwise log the error and notify the admin via email.
  7. Zip the temp directory and record in the DB log that the process is complete and the zip is ready.
  8. Notify the user via email or a prompt that the zip is ready.
  9. Push the zip to the browser with a new, pretty file name.
  10. Delete the temp zip and directory. Log in the DB that the file was successfully delivered.

Under heavy server load this process can take a while. Use a background task gem that logs to a DB, or roll your own.

Utilize your Linux server via bash as much as possible. You can control your server via Ruby. The system in many instances will run tasks like zipping way faster than Ruby. Use Cron for scheduled tasks and the “God” gem for crash detection.
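To make that concrete, here is a rough sketch of steps 3–7 as a background job, assuming delayed_job and shelling out to jekyll and zip (the model, mailer and paths are placeholders for whatever you actually use):

```ruby
require 'tmpdir'

# Illustrative only -- StaticBuild and AdminMailer are hypothetical.
class StaticBuildJob < Struct.new(:build_id)
  def perform
    build    = StaticBuild.find(build_id)                 # your own status/log model
    source   = Rails.root.join('jekyll_source').to_s
    tmp_dir  = Dir.mktmpdir('static-build')               # unique temp folder
    site_dir = File.join(tmp_dir, 'site')
    build.update_attributes(status: 'started', tmp_dir: tmp_dir)

    # Let the system do the heavy work: jekyll renders, zip compresses.
    # (Jekyll 1.0+ uses `jekyll build`; older versions are just `jekyll`.)
    ok = system("jekyll build --source #{source} --destination #{site_dir}") &&
         system("cd #{tmp_dir} && zip -r site.zip site")

    if ok
      build.update_attributes(status: 'zipped', zip_path: File.join(tmp_dir, 'site.zip'))
    else
      build.update_attributes(status: 'failed')
      AdminMailer.build_failed(build).deliver              # hypothetical mailer
    end
  end
end

# Enqueued from the controller action that receives the build request:
# Delayed::Job.enqueue StaticBuildJob.new(build.id)
```

Delivery and clean-up (steps 8–10) would then hang off the status written to the log model.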

Use good error handling as much as possible and log all “states” so you can debug where issues arise.

Good Luck. I hope I addressed your issue.

My suggestion is, I hope, simple: use wget to crawl/mirror the site, using a query string parameter to indicate that you want the "offline" views (you still need to implement those if they are different enough) by checking that the special parameter is set. You should be able to set it just once for the session and have wget use cookies to maintain the session info.

An alternative to the query string parameter would be to key off the user agent string wget sends, and always deliver the "offline" version for that UA string.
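On the Rails side, the check could be as small as this (a minimal sketch, assuming you flag the session either from a `?offline=1` parameter or from wget's user agent):

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  before_filter :detect_offline_mode

  private

  # Flag the session when ?offline=1 is present (or when wget is crawling),
  # so every subsequent request in the same session gets the offline views.
  def detect_offline_mode
    session[:offline] = true if params[:offline] == '1' ||
                                request.user_agent.to_s =~ /wget/i
  end

  def offline?
    session[:offline].present?
  end
  helper_method :offline?
end
```

Your layouts and views can then branch on `offline?` (or pick an offline layout) to hide the online-only functionality.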

The mirroring will pull all the URLs included under the main one. If your assets are not under that main URL, this won't work. You can tell wget to pull from elsewhere too, but that can easily get out of hand.
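The mirroring itself could be kicked off from Ruby along these lines (the flags are standard wget options; the host URL and cookie path are just examples, and `--adjust-extension` is called `--html-extension` on older wget versions):

```ruby
cookies = '/tmp/offline_cookies.txt'

# 1. Hit the site once with ?offline=1 so the session gets flagged,
#    saving the session cookie for the crawl.
system("wget --save-cookies #{cookies} --keep-session-cookies " \
       "-O /dev/null 'http://localhost:3000/?offline=1'")

# 2. Mirror the whole site, pulling page requisites (CSS, JS, images)
#    and rewriting links so the copy works locally.
system("wget --mirror --page-requisites --convert-links --adjust-extension " \
       "--load-cookies #{cookies} -P /tmp/offline_site http://localhost:3000/")
```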

Hope this helps.

Just throwing in an idea, assuming you would want to download these for documentation/printing purposes:

Why not generate PDFs of all the entries on your website using a gem, save them to a specific folder, zip them up with another gem, and then force a download when the user hits a specific URL / triggers a specific action?

Here's some information to get your adventure started:

http://rubyzip.sourceforge.net/
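A rough sketch of the idea, assuming Prawn for the PDFs and rubyzip for the archive (the model fields are placeholders, and on older rubyzip versions the class is `Zip::ZipFile` instead of `Zip::File`):

```ruby
require 'fileutils'
require 'prawn'
require 'zip'   # rubyzip; older versions: require 'zip/zip'

pdf_dir = Rails.root.join('tmp', 'insect_pdfs')
FileUtils.mkdir_p(pdf_dir)

# Generate one PDF per record (name/distribution are hypothetical attributes).
Insect.find_each do |insect|
  Prawn::Document.generate(pdf_dir.join("insect_#{insect.id}.pdf").to_s) do |pdf|
    pdf.text insect.name, size: 20
    pdf.text insect.distribution.to_s
  end
end

# Zip the folder up.
zip_path = Rails.root.join('tmp', 'insects.zip')
Zip::File.open(zip_path.to_s, Zip::File::CREATE) do |zip|
  Dir[pdf_dir.join('*.pdf').to_s].each { |f| zip.add(File.basename(f), f) }
end

# In a controller action you would then just:
# send_file zip_path, filename: 'insects.zip', type: 'application/zip'
```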

Good luck! If you have more questions, let me know.

Cheers, Nick

It sounds like you might want to implement an offline app instead. Most browsers support the HTML5 application cache (appcache) to make all your assets available offline, and the JavaScript property navigator.onLine will tell you whether you are offline or not. You'll have to create and maintain an appcache manifest file. There is a gem that handles that, but I ended up just creating an appcache controller and serving the manifest myself.
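For reference, a stripped-down version of the kind of controller I mean (the route, asset list and cache policy are up to you):

```ruby
# config/routes.rb: match 'application.manifest' => 'appcache#manifest'
class AppcacheController < ApplicationController
  def manifest
    public_dir = Rails.root.join('public').to_s
    assets = Dir.glob("#{public_dir}/**/*.{css,js,png,jpg,gif}").map do |f|
      f.sub(public_dir, '')                 # e.g. "/stylesheets/app.css"
    end

    lines = ['CACHE MANIFEST',
             '# rev 1 -- bump this comment to force clients to re-download',
             *assets,
             'NETWORK:',
             '*']

    render text: lines.join("\n"), content_type: 'text/cache-manifest'
  end
end
```

The layout then points at it with `<html manifest="/application.manifest">`.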

This is quite a fundamental architecture change, though, so you might be too far along in development to do it. You could create a small sample app to prove the concept, then drag all your existing code into it.

See http://railscasts.com/episodes/247-offline-apps-part-1 for an introduction.