Ruby. I extracted the import of a single table into a script and trimmed the CSV down to some 5000 rows ... it still takes some 157s. While a raw DBI+DBD::msyql import in Perl takes some 72s with autocommit and 1-2s while commiting only once every 100 rows.
You may find that (Perl|Python|Lua|...) are quicker than ruby for some tasks. If performance is your biggest constraint then it may be worth choosing the fastest tool for each bit, till the others catch up... But, is there anything else to be done? [...]
I tried to do the commit trick, but seems Ruby doesn't like the least bit more than I like Ruby and it doesn't work. What am I doing wrong?
require 'rubygems' require 'active_record' require 'yaml' require 'logger' require 'fastercsv'
dbconfig = YAML::load(File.open('M:\ecom\config\database.yml'))
[OT, but may be worth having a bit more error checking in there]
ActiveRecord::Base.establish_connection(dbconfig['development']) #ActiveRecord::Base.logger = Logger.new('zkSqlBatch2.log', File::WRONLY > File::APPEND | File::CREAT) #ActiveRecord::Base.logger.level = Logger::DEBUG
#ActiveRecord::Base.colorize_logging = false
class ZipCode < ActiveRecord::Base end
ZipCode.delete_all
All that looks fairly normal to me...
started = Time.new()
i = 0 ZipCode.connection.begin_db_transaction()
To squeeze this further, put the result of zipconde.connection in a variable.
FasterCSV.foreach("M:/ecom/db/csv_lookup_data/postal_codes_some.csv", :col_sep => "|") do |row|
FasterCSV is certainly faster than CSV, but it may be quicker to just split lines on /\|/, if they are sufficiently simple.
M: -- speed of access to that drive isn't a problem? If you've tried Perl etc then probably not. (Rhetorical question, BYTK)
ZipCode.create( :postal_code => row[1] , :country_id => 159, :city => row[3], :latitude => row[4], :longitude=> row[5],:state_id => row[6] ) i+=1 if (i > 100) then i = 0 ZipCode.connection.commit_db_transaction ZipCode.connection.begin_db_transaction end end ZipCode.connection.commit_db_transaction
puts "Taken #{Time.new - started} s"
As this is ruby code, I suggest you run it with -rprofile to find out which bits are actually slow. The profiler will slow things down of course, but you'll at least see where the bottlenecks are so you can work on those.
Hugh