I am building an app that lets users import a data set (as a CSV)
exported from another application and then view reports that help
them make educated decisions. The most common task users will
perform is importing CSV files.
An average CSV file will contain 1,000 to 2,500 rows. The system
will need to import approximately 50 CSV files per hour at first,
and that could easily grow to 5,000+ files per hour, since I intend
to make the basic plan (with 60% of the features) available for free.
The CSV files will all be standardized. I don't need to worry about
column variations or dirty data for version 1.0.
What I am wondering is whether FasterCSV is the right tool to use in
this scenario.
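For context, the import I have in mind is a plain row-by-row loop; a
minimal sketch (the DataPoint model and its column names are just
placeholders):

  require 'fastercsv'

  # :headers => true makes each row a FasterCSV::Row keyed by the
  # header names in the file.
  FasterCSV.foreach("report.csv", :headers => true) do |row|
    DataPoint.create!(:name => row["name"], :value => row["value"])
  end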
In addition, there will be lots of data crunching to prepare the
reports. Is RoR even the right solution for this problem? I think it
is, but what do you think?
The ar-extensions project is worth a look, as it can drastically speed up data loads in Rails. One very happy user wrote about it here:
http://www.jobwd.com/article/show/31
The developer’s blog articles about it are here:
http://continuousthinking.com/tags/arext
Beware, though, that some links, like those to the RDocs, aren’t working at the moment. I’ve written to the developer about that.
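In any case, the heart of ar-extensions is the bulk "import" class
method it adds to ActiveRecord models, which replaces thousands of
single-row INSERTs with one multi-row statement. A minimal sketch,
with a placeholder DataPoint model and columns:

  require 'ar-extensions'

  columns = [:name, :value]
  rows    = []
  FasterCSV.foreach("report.csv", :headers => true) do |row|
    rows << [row["name"], row["value"]]
  end

  # One multi-row INSERT instead of one per row; :validate => false
  # skips per-record validations for extra speed.
  DataPoint.import(columns, rows, :validate => false)

At 1,000 to 2,500 rows per file, that difference adds up fast.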
Regards,
Craig
Thanks, Craig. I'll check out ar-extensions. Anyone who has used it
want to comment with their thoughts?
I would think you'd use something like "spawn"
http://spawn.rubyforge.org/svn/spawn/
to push the import into the background and use Ajax (or, even better,
Comet) to notify the user that their import is complete. If you go
with a fully asynchronous model, you don't even really need spawn.
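In a controller that might look roughly like this (the controller,
the save_upload helper, and the CsvImporter class are all made up for
illustration):

  class ImportsController < ApplicationController
    def create
      # Hypothetical helper that writes the upload to disk.
      path = save_upload(params[:csv])

      # spawn forks a child process, so the request returns right
      # away while the import runs in the background.
      spawn do
        CsvImporter.run(path)  # hypothetical class wrapping the CSV loop
      end

      render :text => "Import started"
    end
  end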
Under this or a similar architecture you could do the import with
native bulk loaders, C, Ruby, Rails; whatever suits your fancy. You
could even (and probably should) move the processing onto a different
box. Web servers should be fulfilling HTTP requests, not messing with
potentially long-running batch jobs.
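For instance, assuming MySQL with local_infile enabled, the whole
file can go in as a single LOAD DATA statement (the table and column
names are placeholders):

  sql = <<-SQL
    LOAD DATA LOCAL INFILE '/data/report.csv'
    INTO TABLE data_points
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    IGNORE 1 LINES
    (name, value)
  SQL
  ActiveRecord::Base.connection.execute(sql)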
My $0.02.