Generate 50,000+ line CSV

Hi,

My best effort so far is to use Model.connection.select_all, supplying the appropriate SQL to get the data into a hash, and then iterate over it building a CSV string by appending each item onto a result string with <<. This is still taking over 15 seconds.
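Roughly something like this (Order, the columns and the file name below are just placeholders, not my real schema):

# build the whole CSV in Ruby, row by row (this is the slow part)
rows = Order.connection.select_all("SELECT id, name, total FROM orders")
csv = ""
rows.each do |row|
  csv << row["id"].to_s << "," << row["name"].to_s << "," << row["total"].to_s << "\n"
end
File.open("orders.csv", "w") { |f| f.write(csv) }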

Well... you can use the old approach of relying on your db when you need to deal with large amounts of data, since DBMSs are pretty good at that.

You could do something like this:

ActiveRecord::Base.connection.select_all("select concat(first_field , ',' , second_field,',',you_get_the_idea) as csv_row from your_table where you_tell_me")

That way you get back an array with a CSV string in each row. You just need to write it out to your file and that's it.
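A rough sketch of that (table and column names are placeholders):

# MySQL builds each CSV row; Ruby only writes the strings out
rows = ActiveRecord::Base.connection.select_all(
  "SELECT CONCAT(first_field, ',', second_field) AS csv_row FROM your_table"
)
File.open("export.csv", "w") do |f|
  rows.each { |row| f.puts row["csv_row"] }
end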

I don't know how fast it will be, but it should certainly be better than before.

Regards,

javier ramirez

javier ramirez wrote:

I don't know how fast it will be, but it should certainly be better than before.

Just out of curiosity, I tried making 50,000 inserts in my db (well, they were actually 70K), then selecting them and doing a .join("\n") to get the string you'd need to write to your file, and it takes a bit over 2 seconds (and we're talking Windows here). So it should come in below the 2 seconds you wanted on Linux or on a faster server (I'm running MySQL on the same box as everything else, so performance could definitely be better).

If you want to go even faster, you can dump directly from MySQL to a file with SELECT ... INTO OUTFILE, but that limits you to the server where the db runs.
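Something along these lines (the path has to be writable by the mysqld process on the db server, the file must not already exist, and the table/column names are placeholders):

# dump straight from MySQL to a file on the *database server*
ActiveRecord::Base.connection.execute(<<-'SQL')
  SELECT first_field, second_field
  INTO OUTFILE '/tmp/export.csv'
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  FROM your_table
SQL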

regards,

javier ramirez

Don't forget CSV escaping of " characters.
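e.g. the CONCAT can quote the field and double any embedded quotes itself, roughly like this (column names are made up):

# wrap the field in quotes and double any " inside it so the CSV stays valid
sql = %q{
  SELECT CONCAT('"', REPLACE(text_field, '"', '""'), '"', ',', other_field) AS csv_row
  FROM your_table
}
rows = ActiveRecord::Base.connection.select_all(sql)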

Hi,

There is some mention in the MySQL docs that the command mysql -e SELECT... > file_name would let you generate the file on the client machine, but I couldn't seem to get that to work from Rails.

Yeah, I didn't mention the -e option because it's a bit harder to implement and maintain.

-e just tells mysql to execute the statement you give it, so you could make a system call from Ruby to path/mysql -e "whatever". The problem here is that you were outputting into a file, so with that syntax you would still be writing on the server.

You could try and do something like mysql -u user -pPassword -e "select concat(a,',',b) from table" db_name --skip-column-names -s > output_file

which tells mysql to connect to the db with that user and password, execute the select, output no column names, and run in silent mode (meaning no characters to format the output as an ASCII table), and then you just redirect that output to a file.

That leaves you with a bunch of things to pass as parameters if you want to do it right (path to mysql, path to the file, db_name, user, password, which can be taken from the config loaded from database.yml), and it is also sensitive to permissions on the server for the output.
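A rough sketch of wiring that up from Ruby, pulling the settings from the loaded config (the query, table and output path are placeholders, and the mysql binary is assumed to be on the PATH):

# shell out to the mysql client and redirect its output to a local file
db = ActiveRecord::Base.configurations[RAILS_ENV]
cmd = %(mysql -u #{db['username']} -p#{db['password']} -h #{db['host']} ) +
      %(--skip-column-names -s -e "SELECT CONCAT(a, ',', b) FROM your_table" ) +
      %(#{db['database']} > /tmp/export.csv)
system(cmd)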

If writing the file directly on the server is something you'd rather avoid (you could always set up an NFS mount so your db server can write to your machine), and you really need to optimize beyond the select-and-concat-and-write approach, then I'd say this is the option to go with. But if you don't really need it, it's the more complex one.

regards,

javier ramírez