Commit every Nth INSERT in migration

Hugh_Sasse · June 19, 2007, 3:43pm

Ruby. I extracted the import of a single table into a script and trimmed the CSV down to some 5000 rows ... it still takes some 157s. While a raw DBI+DBD::msyql import in Perl takes some 72s with autocommit and 1-2s while commiting only once every 100 rows.

You may find that (Perl|Python|Lua|...) are quicker than ruby for some tasks. If performance is your biggest constraint then it may be worth choosing the fastest tool for each bit, till the others catch up... But, is there anything else to be done? [...]

I tried to do the commit trick, but seems Ruby doesn't like the least bit more than I like Ruby and it doesn't work. What am I doing wrong?

require 'rubygems' require 'active_record' require 'yaml' require 'logger' require 'fastercsv'

dbconfig = YAML::load(File.open('M:\ecom\config\database.yml'))

[OT, but may be worth having a bit more error checking in there]

ActiveRecord::Base.establish_connection(dbconfig['development']) #ActiveRecord::Base.logger = Logger.new('zkSqlBatch2.log', File::WRONLY > File::APPEND | File::CREAT) #ActiveRecord::Base.logger.level = Logger::DEBUG

#ActiveRecord::Base.colorize_logging = false

class ZipCode < ActiveRecord::Base end

ZipCode.delete_all

All that looks fairly normal to me...

started = Time.new()

i = 0 ZipCode.connection.begin_db_transaction()

To squeeze this further, put the result of zipconde.connection in a variable.

FasterCSV.foreach("M:/ecom/db/csv_lookup_data/postal_codes_some.csv", :col_sep => "|") do |row|

FasterCSV is certainly faster than CSV, but it may be quicker to just split lines on /\|/, if they are sufficiently simple.

M: -- speed of access to that drive isn't a problem? If you've tried Perl etc then probably not. (Rhetorical question, BYTK)

ZipCode.create( :postal_code => row[1] , :country_id => 159, :city => row[3], :latitude => row[4], :longitude=> row[5],:state_id => row[6] ) i+=1 if (i > 100) then i = 0 ZipCode.connection.commit_db_transaction ZipCode.connection.begin_db_transaction end end ZipCode.connection.commit_db_transaction

puts "Taken #{Time.new - started} s"

As this is ruby code, I suggest you run it with -rprofile to find out which bits are actually slow. The profiler will slow things down of course, but you'll at least see where the bottlenecks are so you can work on those.

Hugh

Trevor_Squires · June 19, 2007, 4:50pm

Hey,

Sorry if this has already been mentioned.

If you manually create a couple of zipcodes like this

ZipCode.transaction do ZipCode.create(:foo => 'data', :bar => 'data') ZipCode.create(:foo => 'data', :bar => 'data') end

The logs will show you that only one transaction was issued.

This is (in rails) by far the most common strategy used for taking control of your transactions: passing a block to transaction().

To take advantage of it you're going to have to partition your input csv rows into groups of 100 - something like this:

all_your_csv_input_rows.each_slice(100) do |batch| ZipCode.transaction do batch.each do |row| ZipCode.create(:foo => row[1], :bar => row[2]) end end end

HTH, Trevor

Topic		Replies	Views
Commit every Nth INSERT in migration rubyonrails-talk	0	106	June 19, 2007
How I Import file CSV into database rubyonrails-talk	1	227	March 2, 2007
report rubyonrails-talk	0	143	November 27, 2008
Inserting A Large Amount of Records into DB using Rake, Ruby and ActiveRecord? Help :-) rubyonrails-talk	1	135	November 23, 2006
migration rubyonrails-talk	1	106	October 3, 2008

Commit every Nth INSERT in migration

Related topics

More Resources