Commit every Nth INSERT in migration

Ruby. I extracted the import of a single table into a script and trimmed the CSV down to some 5000 rows ... it still takes some 157s, while a raw DBI+DBD::mysql import in Perl takes some 72s with autocommit and 1-2s when committing only once every 100 rows.

You may find that (Perl|Python|Lua|...) are quicker than Ruby for some tasks. If performance is your biggest constraint then it may be worth choosing the fastest tool for each bit, till the others catch up... But is there anything else to be done? [...]

I tried to do the commit trick, but it seems Ruby doesn't like me the least bit more than I like Ruby, and it doesn't work. What am I doing wrong?

require 'rubygems'
require 'active_record'
require 'yaml'
require 'logger'
require 'fastercsv'

dbconfig = YAML::load(File.open('M:\ecom\config\database.yml'))

[OT, but may be worth having a bit more error checking in there]

ActiveRecord::Base.establish_connection(dbconfig['development'])
#ActiveRecord::Base.logger = Logger.new('zkSqlBatch2.log', File::WRONLY | File::APPEND | File::CREAT)
#ActiveRecord::Base.logger.level = Logger::DEBUG

#ActiveRecord::Base.colorize_logging = false

class ZipCode < ActiveRecord::Base
end

ZipCode.delete_all

All that looks fairly normal to me...

started = Time.new()

i = 0
ZipCode.connection.begin_db_transaction()

To squeeze this further, put the result of ZipCode.connection in a variable instead of looking it up on every row.
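A minimal sketch of that change (same calls as in the script, just cached in a local):

conn = ZipCode.connection
conn.begin_db_transaction
# ... do the inserts, committing every 100 rows as below ...
conn.commit_db_transaction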

FasterCSV.foreach("M:/ecom/db/csv_lookup_data/postal_codes_some.csv", :col_sep => "|") do |row|

FasterCSV is certainly faster than CSV, but it may be quicker to just split lines on /\|/, if they are sufficiently simple.
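A minimal sketch of that approach, assuming the fields never contain a quoted or escaped '|' (if they might, stick with FasterCSV):

File.foreach("M:/ecom/db/csv_lookup_data/postal_codes_some.csv") do |line|
  row = line.chomp.split('|')  # plain split instead of full CSV parsing
  # ... same ZipCode.create call as below ...
end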

M: -- speed of access to that drive isn't a problem? If you've tried Perl etc then probably not. (Rhetorical question, BYTK)

  ZipCode.create(
    :postal_code => row[1], :country_id => 159, :city => row[3],
    :latitude => row[4], :longitude => row[5], :state_id => row[6]
  )
  i += 1
  if i > 100
    i = 0
    ZipCode.connection.commit_db_transaction
    ZipCode.connection.begin_db_transaction
  end
end
ZipCode.connection.commit_db_transaction

puts "Taken #{Time.new - started} s"

As this is Ruby code, I suggest you run it with -rprofile to find out which bits are actually slow. The profiler will slow things down, of course, but you'll at least see where the bottlenecks are so you can work on those.
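For example (the script name here is just illustrative):

ruby -rprofile import_zipcodes.rb

When the script exits, the profile library prints a per-method timing table, so the slowest calls are easy to spot.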

        Hugh

Hey,

Sorry if this has already been mentioned.

If you manually create a couple of zipcodes like this

ZipCode.transaction do
  ZipCode.create(:foo => 'data', :bar => 'data')
  ZipCode.create(:foo => 'data', :bar => 'data')
end

The logs will show you that only one transaction was issued.

This is (in rails) by far the most common strategy used for taking control of your transactions: passing a block to transaction().

To take advantage of it you're going to have to partition your input CSV rows into groups of 100 - something like this:

all_your_csv_input_rows.each_slice(100) do |batch|
  ZipCode.transaction do
    batch.each do |row|
      ZipCode.create(:foo => row[1], :bar => row[2])
    end
  end
end
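One caveat: on Rubies older than 1.8.7, each_slice lives in the enumerator extension, so you may need a require 'enumerator' at the top. Also, collecting all the CSV rows up front only works if the file fits comfortably in memory; for a few thousand rows that should be fine.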

HTH, Trevor