Updating database via cron?

Hi there

I have a Rails app running that needs to have its database
periodically updated from an uploaded text file.

The data file is in UIEE format, and I'm working on a Ruby script that
parses the file and inserts the data into MySQL. That part should be
OK, I think, at this stage.

However, I need to automate this process: basically, the client
uploads the UIEE file to a prescribed directory on the remote
server. I then need either to detect that there has been a fresh
upload, OR to rely on cron to look into the directory and parse the
fresh file (if present).

Am I on the right track looking at cron to do this? Or would I be
better off building an admin page where the user can manually trigger
the database update script? Or is there another unix command that
can detect a change in a directory and thereby trigger the script?

Any clues on how best to approach this situation would be appreciated.

Cron is a fine approach if you want an action to be time-based.
If you want the action to be triggered by something the user does,
just redirect_to it after the file is uploaded.

If you're using cron, it's probably best to use a rake task. Rake tasks
are quite easy to write, even if you haven't done one before; it's much
like writing a little controller. A rake task gives you access to the
Rails stack without going through your web server.

You can call the rake task from your crontab file. Don't forget to
use absolute paths and a subshell. I do it like this (after the time
fields):

  system_user_name (cd /path/to/rails/app; /usr/bin/rake rake_task_name)
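
To make the rake suggestion concrete, here is a minimal sketch of such a task. The namespace and task name `uiee:import` are hypothetical, and the `:environment` task at the top is only a stand-in for the one Rails provides (which loads your app), so the sketch runs on its own. In a real app you would put just the `namespace` part in a file such as lib/tasks/uiee.rake and drop the last line.

```ruby
require 'rake'
extend Rake::DSL # only needed outside a Rakefile / .rake file

# Stand-in for the :environment task that Rails defines (it loads the app).
# Present only so this sketch runs outside Rails; leave it out of lib/tasks.
task :environment do
  $sketch_env_loaded = true
end

# Hypothetical task name; in a Rails app this would live in lib/tasks/uiee.rake
namespace :uiee do
  desc 'Parse and import any newly uploaded UIEE files'
  task :import => :environment do
    # In the real app this would call your import code, e.g. Foo.process_data
    $uiee_import_runs = ($uiee_import_runs || 0) + 1
  end
end

# Demo only: invoke the task the way `rake uiee:import` would
Rake::Task['uiee:import'].invoke
```

With a task like that, the subshell in the crontab line would run something like `/usr/bin/rake uiee:import RAILS_ENV=production`; rake treats `NAME=value` arguments as environment variables.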

I'd look at the File class if you're having trouble with your
particular file encoding.

In the worst case, you'll have to run something like
system("some_unix_utility_that_will_convert_your_file some_arguments")
to convert the file before you open it in Ruby.
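
For the File-class route, Ruby can transcode while reading if you give the source encoding in the open mode. This is a minimal sketch that assumes the client's UIEE export is ISO-8859-1 (Latin-1); that is a guess, so check what their software actually writes. The helper name `read_as_utf8` is hypothetical.

```ruby
# Read a file written in source_encoding and return its contents as UTF-8.
# ASSUMPTION: the default ISO-8859-1 is only a guess at the export's encoding.
def read_as_utf8(path, source_encoding = 'ISO-8859-1')
  # "r:EXTERNAL:INTERNAL" makes Ruby transcode from EXTERNAL to INTERNAL
  File.open(path, "r:#{source_encoding}:UTF-8", &:read)
end
```

If Ruby doesn't know the encoding at all, the external-utility fallback could be, for example, `system('iconv', '-f', 'ISO-8859-1', '-t', 'UTF-8', src, out: dest)`, which writes the converted copy to `dest`.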

The rest should just be string manipulation and normal record creation
the Rails way.
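
As an illustration of the string-manipulation part, here is a sketch of a parser that turns the file's text into one hash per record. It ASSUMES each line looks like `TAG|value` and that records are separated by blank lines; verify that against your client's actual UIEE files before relying on it. The method name `parse_uiee` is hypothetical.

```ruby
# Parse UIEE-style text into an array of hashes, one per record.
# ASSUMPTION: each line is "TAG|value" and records are separated by blank
# lines; a tag repeated within one record is collected into an array.
def parse_uiee(text)
  records = []
  current = {}
  text.each_line do |line|
    line = line.chomp # strips "\n" or "\r\n"
    if line.strip.empty?
      # A blank line ends the current record
      records << current unless current.empty?
      current = {}
      next
    end
    tag, value = line.split('|', 2)
    next if value.nil? # skip lines without a "|" separator
    tag = tag.strip
    if current.key?(tag)
      current[tag] = Array(current[tag]) << value
    else
      current[tag] = value
    end
  end
  records << current unless current.empty? # last record may lack a blank line
  records
end
```

Each hash could then be mapped onto your model's columns and saved with the usual `Foo.create!(...)`; which tag goes to which column depends on your schema.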


One of the things I've loved about rails is how easily you can
leverage your existing app code for shell/cli/scripting purposes
using ./script/console and ./script/runner. So I highly recommend
implementing your parsing/processing code within your rails app,
rather than writing separate ruby code that parses/processes the
data and persists it directly to the db.

The major benefits of doing this are flexibility and DRY-ness of your
code: you leverage all of your existing code/rules/etc. for persisting
such data in the db; you can easily test all of the pieces that make
up that processing, just like any other part of your rails app; you
can call the processing from within your rails app via a controller,
or via console or runner; you can easily run it against test, dev, or
prod dbs; and so on.

So, say the model you need to process data for is Foo, the dir your
client uploads new data files to is ./private/newdata under your
project root, a successfully processed file is mv'd to
./private/processeddata, and you log processing attempts in
./log/foo_processor.log:

  # in ./app/models/foo.rb
  class Foo < ActiveRecord::Base
    PROJ_DIR = File.expand_path("#{File.dirname(__FILE__)}/../..")
    NEWDATA_DIR = "#{PROJ_DIR}/private/newdata"
    PROCESSEDDATA_DIR = "#{PROJ_DIR}/private/processeddata"
    PROCESSOR_LOG = "#{PROJ_DIR}/log/foo_processor.log"

    def Foo.process_data(somefile=nil, is_debug=false)
      # if not somefile, grab list of un-processed NEWDATA_DIR files,
      # ... and process data ....
    end
  end
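
The body of `Foo.process_data` is left open above. One way to fill in the directory handling it describes (pick up new files, process each, mv it on success, log the attempt) is sketched below. It is a hypothetical stand-alone version: the directories are parameters instead of Foo's constants, `is_debug` is dropped, and record creation is left to the caller's block, so it runs outside Rails.

```ruby
require 'fileutils'
require 'logger'

# Hypothetical stand-in for the body of Foo.process_data.
module FooProcessor
  def self.process_data(newdata_dir, processed_dir, log_path, somefile = nil)
    logger = Logger.new(log_path)
    # One named file, or every regular (non-dot) file waiting in newdata_dir
    files = if somefile
              [File.join(newdata_dir, somefile)]
            else
              Dir.glob(File.join(newdata_dir, '*')).select { |f| File.file?(f) }.sort
            end
    processed = []
    files.each do |path|
      begin
        yield File.read(path)             # caller parses and creates records
        FileUtils.mv(path, processed_dir) # only moved after a successful import
        logger.info("processed #{path}")
        processed << File.basename(path)
      rescue StandardError => e
        # Leave the file in newdata_dir so the next run retries it
        logger.error("failed on #{path}: #{e.message}")
      end
    end
    processed
  ensure
    logger.close if logger
  end
end
```

Because a failed file stays put and is retried, in the real model you would probably wrap the record creation for each file in `Foo.transaction do ... end` so a half-imported file doesn't leave duplicates behind.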

You could then call that class method from some controller for
uploading/processing new data via your app:

  # in ./app/controllers/some_such_controller.rb

  def upload_newdata
    # after saving successfully uploaded datafile in Foo::NEWDATA_DIR ...
    Foo.process_data(saved_filename)  # saved_filename is hypothetical
  end

or call it in some console session:

  $ ./script/console development
  Foo.process_data('some_datafile.txt', true)

or call it from shell/cli via runner:

  $ ./script/runner -e development 'Foo.process_data("some_other_datafile.txt", true)'

or call it via cron:

  # in the system crontab (/etc/crontab or a file in /etc/cron.d, since
  # this line includes a user field) ...
  # at 2:03am every night, process all new datafiles in production env:
  3 2 * * * appuser /path/to/proj/script/runner -e production 'Foo.process_data' 2>&1
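
If you schedule the job more often than nightly (say every few minutes, to pick up uploads quickly), one run could still be busy when the next starts. A minimal sketch of guarding against that with Ruby's File#flock follows; the helper name `with_exclusive_lock` and any lock path you pass it are hypothetical.

```ruby
# Run the block only if no other process holds the lock on lock_path.
# Returns true if the block ran, false if another run already holds the lock.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |f|
    # LOCK_NB makes flock return false instead of waiting
    return false unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end # closing the file releases the lock
end
```

The runner command could then call something like `with_exclusive_lock("#{Foo::PROJ_DIR}/tmp/foo_processor.lock") { Foo.process_data }`, so an overlapping run simply exits.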