How can I import an Excel file with a migration?

11175 · March 28, 2008, 12:39pm

I have an Excel file and I want to do a migration with it. How can I do it?

Peter_De_Berdt · March 28, 2008, 2:30pm

Converting it to a more Rails-friendly format is indeed the way to go. However, if Excel import is a feature you want to support in your application, you could have a look at http://roo.rubyforge.org/

But for your migration import, please go for a csv solution (or even go a step further if it’s a one time operation and convert it to something even more ruby like using regex or some text processing script such as yaml or even Rails commands).

bphogan · March 28, 2008, 2:39pm

I’d add that this should not be part of a migration. Migrations are for defining your database, not loading data. If you need to load data, consider a Rake task

lib/tasks/import.rake

namespace :db do

desc “load data from excel” task :load_from_csv => :environment do # your code goes here end

end

Then rake db:load_from_csv

Peter_De_Berdt · March 28, 2008, 2:50pm

I don’t agree.

Migrations can be used for anything that has to do with setting up your application:

define database structure, indices, …
alter the database structure
convert data in later migrations (e.g. convert a one-to-many to a habtm) if the need arises
add default data

Best regards

Peter De Berdt

mridelmann · March 28, 2008, 3:10pm

Hi Peter,

You should use FasterCSV and create your migration like so:

require 'fastercsv'

class LoadContacts < ActiveRecord::Migration

def self.up FasterCSV.foreach('db/data/contacts.csv') do |row|

Assignment.create(:name=>row[0], :address=>row[1], :phone=>row[2], :mobile=>row[3], :city=>row[4]) end

end

def self.down Contact.delete_all end end

In your CSV file you would add data like so: "some data","some more data"

The run rake db:migrate

Hope this helps,

Mike

Peter_De_Berdt · March 28, 2008, 3:22pm

Hi Peter,

You should use FasterCSV and create your migration like so:

…

In your CSV file you would add data like so: “some data”,“some more data”

The run rake db:migrate

Sure, and you’re loading data in your migration when you to this.

Best regards

Peter De Berdt

bphogan · March 31, 2008, 1:11pm

@Peter De Berdt

You can disagree all you want, but imagine this scenario:

Let’s say you have a migration that creates a user table Then you have another migration that adds three users to your table.

User.create :login =>“homer”, :email=>“homer@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234” User.create :login =>“BART”, :email=>“bart@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234” User.create :login =>“lisa”, :email=>“lisa@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234”

Life is good…

Then you add a new field called “gender” to the user model

Then you take that new field and add validates_presence_of :gender

Now, let’s say it’s time to deploy.

You’ve been told by DHH and others that you’re SUPPOSED to use rake db:schema:load to setup your database. Of course, that doesn’t contain your “add users” step, so you just run your migrations on the server to build and populate the database.

Guess what happens? Massive and epic FAIL. Your migration that adds records breaks because your model is validating a field that isn’t added until a later migration. You could argue that moving them to a rake task wouldn’t fix this problem… but I argue it does. You are not supposed to go back and edit a previous migration.

Using Rake to boostrap your database allows you to

separate loading of data from schema
choose when you want to boostrap the database with fake data.

I used to use migrations for this all the time. I know better now after working on projects with multiple people that this practice is bad. Just because you can add records in a migration doesn’t mean you should.

So, do you still disagree?

(Just my .02 on this…)

-Brian

Peter_De_Berdt · March 31, 2008, 2:42pm

You can disagree all you want, but imagine this scenario:

Let’s say you have a migration that creates a user table Then you have another migration that adds three users to your table.

User.create :login =>“homer”, :email=>“homer@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234” User.create :login =>“BART”, :email=>“bart@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234” User.create :login =>“lisa”, :email=>“lisa@simpsons.com”, :password=>“1234”, :password_confirmation=>“1234”

Life is good…

Then you add a new field called “gender” to the user model

Then you take that new field and add validates_presence_of :gender

Now, let’s say it’s time to deploy.

You’ve been told by DHH and others that you’re SUPPOSED to use rake db:schema:load to setup your database. Of course, that doesn’t contain your “add users” step, so you just run your migrations on the server to build and populate the database.

Guess what happens? Massive and epic FAIL. Your migration that adds records breaks because your model is validating a field that isn’t added until a later migration. You could argue that moving them to a rake task wouldn’t fix this problem… but I argue it does. You are not supposed to go back and edit a previous migration.

… unless the application has never been deployed before.

Using Rake to boostrap your database allows you to

separate loading of data from schema

choose when you want to boostrap the database with fake data.

I mentioned adding default data, not fake data. I agree fake data should be loaded through a rake task.

I used to use migrations for this all the time. I know better now after working on projects with multiple people that this practice is bad. Just because you can add records in a migration doesn’t mean you should.

So, do you still disagree?

respond_to? as a condition on the validation can help you out here.

That said, I do understand your gripes with populating the database from migrations, but having had to take over an existing rails project with “data import from a bunch of yaml files” (or any other importable file for that matter), let me plot out what happened to me there:

it used an import of several linked tables, forcing the original developer to fill in id and foreign keys by himself to support the relationships. That led to numerous problems on existing deploys as default data was being added along the way. The same problems you mention started popping up, but effectively made things worse. Some of the yaml files were imported and linked to wrong records while others failed because certain records already existed.

The key rule to follow here is to make sure your data is pretty stable after import to be sure. Default data to me is for example: belgian postal code tables (which is just a simple lookup table), default preferences, where I’m using a custom Ruby class to handle preference saving in certain fields, basically there’s no way it can go wrong there.

To ensure migrations don’t break, we have some tests in place that basically migrate from a number of different migrations along the way, to see if they still run and return the same result.

Best regards

Peter De Berdt

Frederick_Cheung · March 31, 2008, 2:56pm

@Peter De Berdt

Guess what happens? Massive and epic FAIL. Your migration that adds
records breaks because your model is validating a field that isn't
added until a later migration. You could argue that moving them to a
rake task wouldn't fix this problem.... but I argue it does. You are
not supposed to go back and edit a previous migration.

Using Rake to boostrap your database allows you to 1. separate loading of data from schema 2. choose when you want to boostrap the database with fake data.

I used to use migrations for this all the time. I know better now
after working on projects with multiple people that this practice is
bad. Just because you *can* add records in a migration doesn't mean
you *should*.

So, do you still disagree?

You definitely need to be careful when using models in migrations.
Stick Class User < ActiveRecord::Base; end; in your migration and
you're good to go (assuming you were using the model purely as a
facilitator for inserting the data in the database. It also depends to an extent on what type of app you're working on.
There is only one 'live' deployment of one of the apps I work on, it's
a custom built app that runs our systems. No one is ever going to
create an empty database and run rake db:migrate. Having things like
this in migrations means that when we rollout a new version to our
servers it's basically impossible for us to forget to run data
insertion/massage tasks, and any that makes deployment less risky is a
win for me.

Fred

bphogan · March 31, 2008, 5:35pm

@Fred:

Julian_Leviston · April 2, 2008, 2:03am

That’s why you should re-define your classes that may be affected from inside the migration.

For instance,

in the model

class User < ActiveRecord::Base

validates_presence_of :password

end

class User < ActiveRecord::Base; end

class AddFieldToSomething < ActiveRecord::Migration

def self.up

# do something

end

def self.down

# undo something

end

That way, the migration won’t pick up the validations at all, and you’ll be able to do whatever you need to within them. Works well within data migrations, too.

The reason this is the case is that ActiveRecord uses lazy loading and also dynamically finds your class files, so what normally happens when one of your AR classes is referenced, is that it lazy loads it at the time it’s needed. However, if you’ve already defined the class in the migration file, it won’t bother to lazy load it, because you’ve essentially already “loaded” it (ie defined the class).

Julian.

Learn Ruby on Rails! CHECK OUT THE FREE VIDS (LIMITED TIME) NEW VIDEO OUT 3rd APRIL

http://sensei.zenunit.com/

bphogan · April 2, 2008, 3:43am

Yes, that works… today. Will it work later? It’s been mentioned twice on this thread already… data in migrations breaks down during deployment and migrations should not be used for deployment. if you are using migrations to build a states table and populate it, for example, you’re going against the grain. Migrations are developer tools, not deployment tools.

I recommend this well-written and thought-provoking discussion - http://quotedprintable.com/2007/11/16/seed-data-in-rails

Peter_De_Berdt · April 2, 2008, 7:32am

Oh my, the fixtures loading solution.

How will you load default data when two of the tables are interlinked in some way using a foreign key? Manually enter them from your fixture? Use the new fixture system where a number of totally random ids are filled in, but you can use the yaml key as the foreign key?

Best regards

Peter De Berdt

bphogan · April 2, 2008, 6:50pm

Sigh… the discussion was good. I don’t necessarily agree with the use of fixtures… but the fact that he used a rake task to do the loading was my point.

I’m just saying that “The Rails Way” (at least according to the docs) is to not use migrations for rebuilding your database when you deploy. That infers that loading data in a migration is not an optimal practice.

Take a look at Mephisto. There’s a bootstrap rake task that you use to set up your database.

But there’s nothing preventing people from going against the opinions of the Rails team. So you do what you want, and I’ll do what I want, and we can agree to disagree.

Have a good day. If you’re going to RailsConf, let’s have a drink on me!

Topic		Replies	Views
Massive data insert during migration rubyonrails-talk	1	117	September 14, 2006
Migrate Excel (MS Access) to Rails rubyonrails-talk	3	185	January 7, 2008
Migration: upload a large set of data via a separate sql file? rubyonrails-talk	2	139	May 12, 2007
Using Migrations to load user data? rubyonrails-talk	3	164	November 14, 2014
Patch: import csv on migration rubyonrails-talk	1	200	September 4, 2006

How can I import an Excel file with a migration?

Related topics

More Resources