ActiveRecord, PostgreSQL question

I'm using ActiveRecord's connection to execute a custom query (which runs fine in psql) like this:

result = ActiveRecord::Base.connection.execute("COPY (select * from users) TO STDOUT WITH CSV;")

  SQL (0.8ms)  COPY (select * from users) TO STDOUT WITH CSV;

=> #<PGresult:0x2589ad8>

result.nfields

=> 39

result.ntuples

=> 0

An instance of PGresult is returned. How can I get data out of it?

Thanks!


Load up the rdocs for the postgres gem... they will tell you. But why do it this way? Why not use AR to get your records and FasterCSV to convert them to CSV? Much more portable...
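
Something like this, perhaps (untested; I'm guessing at your model and column names):

require 'fastercsv'

csv_string = FasterCSV.generate do |csv|
  csv << %w(email fname lname created_at)              # header row
  User.find(:all).each do |user|                       # instantiates every record
    csv << [user.email, user.fname, user.lname, user.created_at]
  end
end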

Thanks Philip,

The Postgres gem rdoc hasn't been much help. All of the PGresult instance methods that retrieve values require a tuple number. My PGresult has a number of fields, but no tuples (rows).

This query (when run in psql) returns a large block of text. For example:

email,fname,lname,created_at
foo@example.com,Foo,Fooster,2009-07-07 17:00:41.929865
bar@example.com,Bar,Barbie,2009-07-01 20:31:08.659965
user@example.com,User,User,2009-07-07 20:33:53.293606
admin@example.com,Admin,Admin,2009-07-07 20:33:53.760538
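
To illustrate what I mean about tuple numbers (a sketch): getvalue is the gem's (tuple, field) accessor, and it works fine for an ordinary SELECT, but my COPY result has zero tuples to index into.

result = ActiveRecord::Base.connection.execute("SELECT email FROM users")
result.getvalue(0, 0)   # => first row, first column of a normal SELECT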

I agree 100% that using FasterCSV and ActiveRecord to pull the data is much more portable (and elegant). In fact, that's how I'm doing it now. However, this is a huge dataset that causes server timeouts and hogs memory. I'm investigating CSV generation in Postgres, as it takes a fraction of the time and resources because no ActiveRecord objects get instantiated.

Any thoughts?

I found a temporary work-around where I ask PostgreSQL to save the output in a file instead of stdout...

sql = "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"

=> "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"

ActiveRecord::Base.connection.execute(sql)

  SQL (0.2ms)  SET client_min_messages TO 'panic'
  SQL (0.1ms)  SET client_min_messages TO 'notice'
  SQL (2.6ms)  COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;

=> #<PGresult:0x2650188>

csv_string = File.read '/tmp/file.csv'

=> "email,fname,lname,created_at \nfoo@example.com,Foo,Fooster, 2009-07-07 17:00:41.929865\nbar@example.com,Bar,Barbie,2009-07-01 20:31:08.659965\nuser@example.com,User,User,2009-07-07 20:33:53.293606\nadmin@example.com,Admin,Admin,2009-07-07 20:33:53.760538\n"

This is not ideal (COPY ... TO writes the file on the database server's filesystem, as the database server user), but it's a start.
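
A slightly tidier variant might look like this (also untested; it assumes Rails and PostgreSQL share a filesystem and that the server's postgres user can write to /tmp):

path = '/tmp/file.csv'
ActiveRecord::Base.connection.execute(
  "COPY (select * from users) TO '#{path}' WITH CSV HEADER;"
)
csv_string = File.read(path)
File.delete(path)   # don't leave the export lying around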

Any other ideas?


Skip AR and use the select_values() call perhaps. That will skip any object instantiation at least.
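
Roughly like this (untested sketch; note select_values only returns the first column, so for whole rows its cousin select_all, which returns plain hashes instead of AR objects, is a better fit for CSV):

require 'fastercsv'

cols = %w(email fname lname created_at)
rows = ActiveRecord::Base.connection.select_all(
  "SELECT #{cols.join(', ')} FROM users"
)
csv_string = FasterCSV.generate do |csv|
  csv << cols                                       # header row
  rows.each { |row| csv << row.values_at(*cols) }   # no AR instantiation
end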

If your dataset truly is huge, an approach that relies on first extracting the entire dataset and then serving it is not going to work reliably. If it doesn't fail outright due to a timeout, it is still going to strain your memory considerably.

As far as I can see, there are two ways around this.

If the export doesn't have to be current to this very instant, set up a background job that exports it regularly to a file and leave the serving to the web server, along these lines:
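
(Untested sketch; the task name and path are made up. You would trigger it from cron or a job queue.)

namespace :export do
  desc "Dump users to a CSV file for the web server to serve"
  task :users => :environment do
    ActiveRecord::Base.connection.execute(
      "COPY (SELECT * FROM users) TO '/tmp/users_export.csv' WITH CSV HEADER;"
    )
  end
end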

If that doesn't meet your needs, consider streaming the data. Have a look at the send_data (controller) method. It may even be sensible to handle this in Metal, i.e. a Rack module. I haven't tried any of this myself; the following link might be helpful:

http://amberbit.com/blog/2009/04/15/ruby-flv-pseudostreaming-implemented-using-sinatra-and-rack-evil-useful-for-rails-too/
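
As a rough starting point (untested sketch; send_file is a sibling of send_data that streams the file from disk in chunks rather than loading it all into memory, and the controller name and path are made up):

class ExportsController < ApplicationController
  def users
    path = '/tmp/users_export.csv'
    ActiveRecord::Base.connection.execute(
      "COPY (SELECT * FROM users) TO '#{path}' WITH CSV HEADER;"
    )
    send_file path, :type => 'text/csv', :filename => 'users.csv'
  end
end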

Michael

Thanks Sijo,

I ended up using something very similar:

pg_conn = self.connection.instance_variable_get(:@connection)   # raw PGconn behind AR
pg_conn.exec("COPY (#{sql}) TO STDOUT WITH CSV HEADER")
csv = []
while (line = pg_conn.getline) != "\\."   # "\\." is the end-of-copy marker
  csv << line
end
pg_conn.endcopy
csv.join("\n")

It's a hack but it works for now.