Updating database via cron?

Hi there

I have a Rails app running that needs to have its database periodically updated from an upload of a text file.

The data file is in UIEE format, and I'm working on a Ruby script that parses the file and inserts the data into MySQL. That part should be OK at this stage, I think.

However, I need to automate this process: the client uploads the UIEE file to a prescribed directory on the remote server. I then need to either detect when there has been a fresh upload, or rely on cron to look in the directory and parse the fresh file (if present).

Am I on the right track looking at cron for this? Or am I better off building an admin page where the user can manually trigger the database update script? Or is there another Unix command that can detect a change in a directory and thereby trigger the script?

Any clues on how best to approach this situation would be appreciated...

Cron is a fine approach if you want an action to be based on time. If you want some action to be based on a user action, just redirect_to it after the file is uploaded.

If you're using cron, it's likely best to use a rake task. Quite easy, if you haven't done it before; much like writing a little controller. This gives you access to the Rails stack without dealing with your web server. You can call the rake task from your crontab file. Don't forget to use absolute paths and a subshell. I do it like this (after the time declaration):

  system_user_name (cd /path/to/rails/app; /usr/bin/rake rake_task_name)

I'd be looking at the File class if you're having trouble with your particular file encoding. Worst case there is that you have to use "system some_unix_utility_that_will_convert_your_file some_arguments" to do some converting before you open the file using Ruby. The rest should just be string manipulation and normal record creation the Rails way.
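For what it's worth, a bare-bones rake task along those lines might look something like this; the import:uiee task name and the private/newdata upload directory are just placeholders I've picked for illustration, not anything from your setup:

  # in lib/tasks/import_uiee.rake -- a rough sketch, not a drop-in solution
  namespace :import do
    desc "Parse any newly uploaded UIEE files and load them into MySQL"
    task :uiee => :environment do
      upload_dir = File.join(RAILS_ROOT, "private", "newdata")  # assumed upload location
      Dir.glob(File.join(upload_dir, "*")).each do |path|
        puts "processing #{path}"
        # parse the UIEE file and create records through your models here
      end
    end
  end

With that in place, the crontab line above would just call /usr/bin/rake import:uiee inside the subshell.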

One of the things I've loved about Rails is the ease with which you can leverage your existing app code for shell/CLI/scripting purposes using ./script/console and ./script/runner. As such, I highly recommend implementing your parsing/processing code within your Rails app, versus having some separate Ruby code that parses/processes data and persists it directly in the db.

The major benefits of doing this are flexibility and DRY-ness of your code: you leverage all of your existing code/rules/etc. for persisting such data in the db; you can easily test all of the pieces that make up that processing just like any other part of your Rails app; you can call such processing from within your Rails app via a controller, or via console or runner; and you can easily perform such processing against test, dev, or prod dbs; ....

So, say the model object you need to process data for is Foo, and the dir that your client is uploading new data files to is found under your proj root in ./private/newdata, and when a file is successfully processed it is mv'd to ./private/processeddata, and you log processing attempts in ./log/foo_processor.log, ....:

  # in ./app/models/foo.rb
  ...
  PROJ_DIR = File.expand_path("#{File.dirname(__FILE__)}/../..")
  NEWDATA_DIR = "#{PROJ_DIR}/private/newdata"
  PROCESSEDDATA_DIR = "#{PROJ_DIR}/private/processeddata"
  PROCESSOR_LOG = "#{PROJ_DIR}/log/foo_processor.log"
  ...

  def Foo.process_data(somefile=nil, is_debug=false)
    # if not somefile, grab list of un-processed NEWDATA_DIR files,
    # ... and process data ....
    ...
  end
  ...
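If it helps, here's a rough sketch of one way that method body could go; the Logger and FileUtils handling are my own assumptions, and the actual UIEE parsing is still left out:

  # still in ./app/models/foo.rb -- a sketch only; logging and file moves
  # are assumptions, and the UIEE parsing itself is elided
  require 'fileutils'
  require 'logger'

  def Foo.process_data(somefile=nil, is_debug=false)
    log = Logger.new(PROCESSOR_LOG)
    files = somefile ? [File.join(NEWDATA_DIR, somefile)] : Dir.glob("#{NEWDATA_DIR}/*")
    files.each do |path|
      log.info("processing #{path}")
      # parse the UIEE file here and create/update records the normal Rails way
      FileUtils.mv(path, PROCESSEDDATA_DIR) unless is_debug
    end
  end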

You could then call that class method in some controller for uploading/processing new data via your app:

  # in ./app/controllers/some_such_controller.rb
  ...

  def upload_newdata
    ...
    # after saving successfully uploaded datafile in NEWDATA_DIR ...
    Foo.process_data(datafile_name)
    ...
  end
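A rough version of that upload action, with the surrounding pieces filled in, might look like this; the :datafile param name, the flash message, and the redirect target are assumptions on my part:

  def upload_newdata
    upload = params[:datafile]                    # assumed form field name
    datafile_name = File.basename(upload.original_filename)
    File.open(File.join(Foo::NEWDATA_DIR, datafile_name), "wb") do |f|
      f.write(upload.read)
    end
    Foo.process_data(datafile_name)
    flash[:notice] = "#{datafile_name} uploaded and processed"
    redirect_to :action => 'index'                # assumed destination
  end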

or call it in some console session:

$ ./script/console development
...

Foo.process_data('some_datafile.txt', true)

...

or call it from shell/cli via runner:

$ ./script/runner -e development 'Foo.process_data("some_other_datafile.txt", true)'
...

or call it via cron:

# in appropriate crontab ...
...
# at 2:03am every night, process all new datafiles in production env:
3 2 * * * appuser /path/to/proj/script/runner -e production 'Foo.process_data' 2>&1
...

Jeff