I'm using ActiveRecord's connection to execute a custom query (which
runs fine in psql) like this:
result = ActiveRecord::Base.connection.execute("COPY (select * from users) TO STDOUT WITH CSV;")
SQL (0.8ms) COPY (select * from users) TO STDOUT WITH CSV;
=> #<PGresult:0x2589ad8>
result.nfields
=> 39
result.ntuples
=> 0
An instance of PGresult is returned. How can I get data out of it?
Thanks!
Load up the rdocs for the postgres gem... it will tell you.... but why do it this way? Why not use AR to get your records and fastercsv to convert it to CSV? Much more portable...
Thanks Philip,
The Postgres gem rdoc hasn't been much help. All of the PGresult
instance methods that retrieve values require a tuple number. My
PGresult has a number of fields, but not tuples (rows).
This query (when run in psql) returns a large block of text. For
example:
email,fname,lname,created_at
foo@example.com,Foo,Fooster,2009-07-07 17:00:41.929865
bar@example.com,Bar,Barbie,2009-07-01 20:31:08.659965
user@example.com,User,User,2009-07-07 20:33:53.293606
admin@example.com,Admin,Admin,2009-07-07 20:33:53.760538
I agree 100% that using FasterCSV and ActiveRecord to pull the data is
much more portable (and elegant). In fact, that's how I'm doing it
now. However, this is a huge dataset that is causing server timeouts
and hogs memory. I'm investigating csv generation in Postgres as it
takes a fraction of the time and resources because each object isn't
getting instantiated.
Any thoughts?
I found a temporary work-around where I ask PostgreSQL to save the
output in a file instead of stdout...
sql = "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"
=> "COPY (select * from users) TO '/tmp/file.csv' WITH CSV HEADER;"
ActiveRecord::Base.connection.execute(sql)
SQL (0.2ms) SET client_min_messages TO 'panic'
SQL (0.1ms) SET client_min_messages TO 'notice'
SQL (2.6ms) COPY (select * from users) TO '/tmp/file.csv' WITH CSV
HEADER;
=> #<PGresult:0x2650188>
csv_string = File.read '/tmp/file.csv'
=> "email,fname,lname,created_at \nfoo@example.com,Foo,Fooster,
2009-07-07 17:00:41.929865\nbar@example.com,Bar,Barbie,2009-07-01
20:31:08.659965\nuser@example.com,User,User,2009-07-07
20:33:53.293606\nadmin@example.com,Admin,Admin,2009-07-07
20:33:53.760538\n"
This is not ideal, but it's a start.
Any other ideas?
Skip AR and use the select_values() call perhaps. That will skip any
object instantiation at least.
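To make that suggestion concrete, here is a minimal sketch. It assumes
select_rows rather than select_values, since select_values returns only
the first column of each row. rows_to_csv is a hypothetical helper name,
and conn is assumed to be anything that responds to select_rows(sql) the
way ActiveRecord::Base.connection does.

```ruby
require "csv"

# Hypothetical helper: builds CSV text from raw rows without
# instantiating any ActiveRecord model objects.
# `conn` is assumed to respond to select_rows(sql) and return an
# array of row arrays (as ActiveRecord::Base.connection does).
# `header` is an array of column names supplied by the caller.
def rows_to_csv(conn, sql, header)
  lines = [CSV.generate_line(header)]
  conn.select_rows(sql).each do |row|
    lines << CSV.generate_line(row)
  end
  lines.join
end
```

Note that this still pulls the whole result set into memory at once; it
only avoids the cost of building a model object per row.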
If your dataset truly is huge, an approach that relies on first
extracting the entire dataset and then serving it is not going to work
reliably. If it doesn't fail outright due to a timeout, it is still
going to strain your memory considerably.
As far as I can see, there are two ways around this.
If the export doesn't have to be current to this very instant, set up a
background job that exports it regularly to a file and leave the serving
to the web server.
If that doesn't meet your needs, consider streaming the data. Have a
look at the send_data (controller) method. It may even be sensible to
handle this in Metal, i.e. a Rack module. I haven't tried any of this
myself, but the following link might be helpful:
http://amberbit.com/blog/2009/04/15/ruby-flv-pseudostreaming-implemented-using-sinatra-and-rack-evil-useful-for-rails-too/
Michael
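A rough sketch of the streaming idea at the Rack level, untested against
a real server. Rack only requires that a response body respond to each
and yield strings, so lines can be handed over as they are produced.
CsvStream and the placeholder data are made up for illustration.

```ruby
# Hypothetical Rack-style response body. Rack only requires that the
# body respond to #each and yield strings, so lines can be sent as
# they are produced instead of being collected into one big string.
class CsvStream
  # `lines` is assumed to be any Enumerable of CSV lines without
  # trailing newlines (for example, something reading from a cursor).
  def initialize(lines)
    @lines = lines
  end

  def each
    @lines.each { |line| yield "#{line}\n" }
  end
end

# A Rack app is any object that responds to call(env) and returns
# [status, headers, body].
app = lambda do |_env|
  lines = ["email,fname", "foo@example.com,Foo"] # placeholder data
  [200, { "content-type" => "text/csv" }, CsvStream.new(lines)]
end
```

Whether the client actually receives the data incrementally depends on
the web server and any buffering middleware in between.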
Thanks Sijo,
I ended up using something very similar:
pg_conn = self.connection.instance_variable_get(:@connection)
pg_conn.exec("COPY (#{sql}) TO STDOUT WITH CSV HEADER")
csv = []
while (line = pg_conn.getline) != "\\."
csv << line
end
pg_conn.endcopy
csv.join("\n")
It's a hack but it works for now.
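For anyone on the newer pg gem, roughly the same thing can be sketched
with copy_data and get_copy_data, which hand back one row at a time.
copy_csv is a hypothetical helper name, and pg_conn is assumed to be a
PG::Connection obtained however you like.

```ruby
# Hypothetical helper: streams COPY output one row at a time.
# `pg_conn` is assumed to be a PG::Connection from the `pg` gem,
# which provides copy_data and get_copy_data.
# `sql` is interpolated directly, so it must be trusted SQL.
def copy_csv(pg_conn, sql)
  pg_conn.copy_data("COPY (#{sql}) TO STDOUT WITH CSV HEADER") do
    # get_copy_data returns nil once the COPY stream is finished.
    while (row = pg_conn.get_copy_data)
      yield row # each row is one CSV line, including its newline
    end
  end
end
```

Called as copy_csv(pg_conn, "select * from users") { |row| file << row },
this never holds the whole export in memory.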