I'm working on an app that uses info from a government database. For a
variety of reasons (not the least of which is that their database is in
MS Access), I can't simply write a wrapper around the legacy database.
Instead, we're looking to have an "import" script in the Rails
environment (it's fine if it's run from the server and through the web)
that will open up a connection to another MySQL schema and selectively
import data into the Rail app's database.
What's a good way to go about this? From the message board, it looks
like the best route is to make a script that instantiates an
ActiveRecord::Base connection, and then loads information out of the
government database (exported to MySQL) into the appropriate
ActiveRecord objects, and saves these objects to the new database. I'm
unclear as to how to set this up though. Will Rails magically load the
environment so I can just create a new Foo object and save it, or will
I need to require some files? Where is an appropriate place to put this
script? The lib folder, vendor folder, elsewhere?
Any other suggestions on how to go about this would be equally
appreciated. Thanks!
Hello,
I'm working on an app that uses info from a government database. For a
variety of reasons (not the least of which is that their database is in
MS Access), I can't simply write a wrapper around the legacy database.
Instead, we're looking to have an "import" script in the Rails
environment (it's fine if it's run from the server and through the web)
that will open up a connection to another MySQL schema and selectively
import data into the Rail app's database.
What's a good way to go about this? From the message board, it looks
like the best route is to make a script that instantiates an
ActiveRecord::Base connection, and then loads information out of the
government database (exported to MySQL) into the appropriate
ActiveRecord objects, and saves these objects to the new database. I'm
unclear as to how to set this up though. Will Rails magically load the
environment so I can just create a new Foo object and save it, or will
I need to require some files? Where is an appropriate place to put this
script? The lib folder, vendor folder, elsewhere?
You have two main options and your situation will determine which is better for you.
In both options below, the first CSV step assumes you just do a simple 'export to CSV' from Access rather than writing a script to pull out only the data you want. If you do the latter instead you can skip the first CSV step.
Option 1
[snip]
You would write the script to read the legacy CSV files and create
ActiveRecord objects. The data are validated along the way. In
order for your script to work with the AR objects, include your /
models directory on the load path. E.g:
$ ruby -I lib:/path/to/your/app/models lib/data_migration.rb
[snip]
Forgot to mention that you also need in your script to connect to your database via ActiveRecord. This does the trick:
ActiveRecord::Base.establish_connection(
:adapter => settings['adapter'],
:host => settings['host'],
:username => settings['username'],
:password => settings['password'],
:database => settings['database'])
where settings are pulled out of my Rails app's config/database.yml for the appropriate environment.
Regards,
Andy Stewart
Andy,
Your email is a very good write up - you should publish it somewhere
perminantly so it can be referenced in the future.
I just wanted to add that I have the beginnings of a Ruby ETL tool in
the ActiveWarehouse Subversion repository. The description of the ETL
tool is here:
http://activewarehouse.rubyforge.org/etl/
The code in there is relatively new, but it may be of use.
V/r
Anthony Eden
Andy,
Your email is a very good write up - you should publish it somewhere
perminantly so it can be referenced in the future.
Thank you for your kind words. I have followed your advice and written up my email here:
http://blog.airbladesoftware.com/2006/12/5/migrating-data-into-rails
I just wanted to add that I have the beginnings of a Ruby ETL tool in
the ActiveWarehouse Subversion repository. The description of the ETL
tool is here:
http://activewarehouse.rubyforge.org/etl/
The code in there is relatively new, but it may be of use.
This does look useful. I have included ETL in a 'further reading' section in my article.
Regards,
Andy Stewart