Commit every Nth INSERT in migration

Ruby. I extracted the import of a single table into a script and trimmed
the CSV down to some 5000 rows ... it still takes some 157s. While a raw
DBI+DBD::msyql import in Perl takes some 72s with autocommit and 1-2s
while commiting only once every 100 rows.

You may find that (Perl|Python|Lua|...) are quicker than ruby for some tasks.
If performance is your biggest constraint then it may be worth choosing
the fastest tool for each bit, till the others catch up... But, is there
anything else to be done?
        [...]

I tried to do the commit trick, but seems Ruby doesn't like the least
bit more than I like Ruby and it doesn't work. What am I doing wrong?

require 'rubygems'
require 'active_record'
require 'yaml'
require 'logger'
require 'fastercsv'

dbconfig = YAML::load(File.open('M:\ecom\config\database.yml'))

[OT, but may be worth having a bit more error checking in there]

ActiveRecord::Base.establish_connection(dbconfig['development'])
#ActiveRecord::Base.logger = Logger.new('zkSqlBatch2.log', File::WRONLY
> File::APPEND | File::CREAT)
#ActiveRecord::Base.logger.level = Logger::DEBUG

#ActiveRecord::Base.colorize_logging = false

class ZipCode < ActiveRecord::Base
end

ZipCode.delete_all

All that looks fairly normal to me...

started = Time.new()

i = 0
ZipCode.connection.begin_db_transaction()

To squeeze this further, put the result of zipconde.connection in
a variable.

FasterCSV.foreach("M:/ecom/db/csv_lookup_data/postal_codes_some.csv",
:col_sep => "|") do |row|

FasterCSV is certainly faster than CSV, but it may be quicker to just
split lines on /\|/, if they are sufficiently simple.

M: -- speed of access to that drive isn't a problem? If you've tried
Perl etc then probably not. (Rhetorical question, BYTK)

  ZipCode.create(
    :postal_code => row[1] , :country_id => 159, :city => row[3],
:latitude => row[4],
    :longitude=> row[5],:state_id => row[6]
  )
  i+=1
  if (i > 100) then
    i = 0
    ZipCode.connection.commit_db_transaction
    ZipCode.connection.begin_db_transaction
  end
end
ZipCode.connection.commit_db_transaction

puts "Taken #{Time.new - started} s"

As this is ruby code, I suggest you run it with -rprofile to find
out which bits are actually slow. The profiler will slow things down
of course, but you'll at least see where the bottlenecks are so you
can work on those.

        Hugh

Hey,

Sorry if this has already been mentioned.

If you manually create a couple of zipcodes like this

ZipCode.transaction do
  ZipCode.create(:foo => 'data', :bar => 'data')
  ZipCode.create(:foo => 'data', :bar => 'data')
end

The logs will show you that only one transaction was issued.

This is (in rails) by far the most common strategy used for taking
control of your transactions: passing a block to transaction().

To take advantage of it you're going to have to partition your input
csv rows into groups of 100 - something like this:

all_your_csv_input_rows.each_slice(100) do |batch|
  ZipCode.transaction do
    batch.each do |row|
      ZipCode.create(:foo => row[1], :bar => row[2])
    end
  end
end

HTH,
Trevor