PostgreSQL aggregate function inconsistent (returns strings)

I wrote this question on RoRTalk back in August but haven't heard back
yet:
http://tinyurl.com/4ohxdnf. So I think I must've been unclear.

Assume you have a Sale model with just a product Id and a quantity
sold. You want to see a total number of sales for each product.

Product.group(:product_id).select("product_id, sum(quantity) as
total_quantity")

Let's collect just the totals to see what they look like in irb:
Product.group(:product_id).select("product_id, sum(quantity) as
total_quantity").map(&:total_quantity)

In SQLite (and MySQL I think) I get the following:
=> [293.00, 4.00, 76.00, 9.00, 370.25, 71.00]

BUT! PostgreSQL returns this:
=> ["293.00", "4.00", "76.00", "9.00", "370.25", "71.00"]

Strings! Why strings!? Am I doing something wrong? Why is this
happening, how do I fix it, and why doesn't ActiveRecord protect poor
little me from the mean world of db inconsistencies? :wink:

Thank in advance.
PS Quantity is a decimal in the schema.

I wrote this question on RoRTalk back in August but haven't heard back
yet:http://tinyurl.com/4ohxdnf. So I think I must've been unclear.

Assume you have a Sale model with just a product Id and a quantity
sold. You want to see a total number of sales for each product.

Product.group(:product_id).select("product_id, sum(quantity) as
total_quantity")

Let's collect just the totals to see what they look like in irb:
Product.group(:product_id).select("product_id, sum(quantity) as
total_quantity").map(&:total_quantity)

In SQLite (and MySQL I think) I get the following:
=> [293.00, 4.00, 76.00, 9.00, 370.25, 71.00]

BUT! PostgreSQL returns this:
=> ["293.00", "4.00", "76.00", "9.00", "370.25", "71.00"]

Strings! Why strings!? Am I doing something wrong? Why is this
happening, how do I fix it, and why doesn't ActiveRecord protect poor
little me from the mean world of db inconsistencies? :wink:

In general AR doesn't know the type of non column expressions.
If you did something like Product..group(:product_id).sum(:quantity)
then AR knows you're doing a sum, and it knows that the sum of
decimals should be decimals so it would cast what it got back from the
db to the appropriate type

Fred

As always, Fred, thanks for your reply.

The example you give works until you include a join, then you get
strings again.

Order.joins(:sales).group(:product_id).sum(:quantity)

That returns strings again. I don't think I did anything AR shouldn't
be aware of. BTW, the product_id is returned as a string too. I've
verified that SQLite3 returns numbers for both. This really seems
broken to me.

Order has_many :sales
Sale belongs_to :order
Order has a ordered_at datetime and the seller_id, Sale has the
product_id and quantity. This is why I need the join. (Oh, and Sale is
actually LineItem/line_item, although I doubt that makes a
difference.)

d.

As always, Fred, thanks for your reply.

The example you give works until you include a join, then you get
strings again.

Order.joins(:sales).group(:product_id).sum(:quantity)

Quantity isn't on the model actually being queried so this doesn't
surprise me. It does suck though. It looks like the sqlite3 driver is
just smarter about asking the db for the types of the columns (I think
that with sqlite3 you sort of don't have a choice the way the api is
written, whereas with mysql you get all the columns as strings "for
free". I could be wrong though. I don't know what the postgres api is
like at all).

Fred

Okay, I understand what you are saying about :quantity not being on
the Order table. (Interesting, though... I just tried replacing "sum"
with "count" and guess what: numeric values come back.)

So two possible solutions: use ruby (in the model) to "fix" the hash
after the query, or use hardcoded SQL that explicitly declares the
return type instead of letting AR construct the SQL. Sounds about
right?

Okay, I understand what you are saying about :quantity not being on
the Order table. (Interesting, though... I just tried replacing "sum"
with "count" and guess what: numeric values come back.)

So two possible solutions: use ruby (in the model) to "fix" the hash
after the query, or use hardcoded SQL that explicitly declares the
return type instead of letting AR construct the SQL. Sounds about
right?

I'm coming into this way late, but what happens if you...

....sum('sales.quantity')

Would that give AR enough of a hint to figure out what table/type to cast it to?

Can you post the actual SQL being generated? I didn't see it in the archives...

You're not too late and I appreciate your feedback.

But it gives the same stringified results. So, I don't think AR is
tries to interpret the types at all. It always thought it did in order
to provide a consistent interface. The fact that count works as
expected confuses me though. I'd think count and sum would behave
identically (except for the result, obviously).

One other thing... the SQL generated by AR returns the correct column
types if pasted in the pgsql shell or in pgAdmin.

You're not too late and I appreciate your feedback.

But it gives the same stringified results. So, I don't think AR is
tries to interpret the types at all. It always thought it did in order
to provide a consistent interface. The fact that count works as
expected confuses me though. I'd think count and sum would behave
identically (except for the result, obviously).

Well, count() is just counting rows... it has to be an integer. SUM() might be counting other things (decimals say) so maybe AR can't decide.

Still, it should be able to do it..

What's the SQL being generated?

Any difference if instead of joins() you includes() ?

Right about count() being an integer and sum() needing to be smarter.
That makes sense.

The answers to your questions are, AR generated this SQL:
SELECT SUM(quantity) AS sum_quantity, product_id AS product_id FROM
"organizations" INNER JOIN "orders" ON "orders"."organization_id" =
"organizations"."id" INNER JOIN "line_items" ON
"line_items"."order_id" = "orders"."id" GROUP BY product_id;

(It's also curious that product_id comes back as a string too.)

Second, includes() returns strings (for both id and sum) as well. The
sql for that is, predictably:
SELECT SUM(quantity) AS sum_quantity, product_id AS product_id FROM
"organizations" LEFT OUTER JOIN "orders" ON "orders"."organization_id"
= "organizations"."id" LEFT OUTER JOIN "line_items" ON
"line_items"."order_id" = "orders"."id" GROUP BY product_id;

I dug into the source code and think I have a workaround.

First, there is a more succinct syntax for aggregate calculations, in
this case:
Sale.sum(:quantity, :joins => :order, :group => :product_id)

This returns the decimals like I want, instead of the strings I don't.

Second, you'll notice this starts on the Sale model, not the Order
model. I think that's where AR is getting confused: Sale is query
table's primary model and the model with the column to be summed. If
you reverse it into:
Order.sum(:quantity, :joins => :sales, group => :product_id)

You'll get strings again. Same results, except stringified.

That's satisfies my dilema. I hope it can help someone else. And
thanks Frederick and Philip for your help (and Ryan Bates for the
railscast 239 on reading rails source that just happened to use
ActiveRecord as the example -- he's always doing that!).

Interesting. I don't think AR will ever get the right data types since it has no idea what "sum_quantity" should translate to. In Rails 3, Postgresql 9.x I have User and BlogPost. Users have many blog posts. Use also has an 'old_accounts_id' which is an integer and we'll use for summing... I think this matches your setup...

ruby-1.8.7-p330 :023 > BlogPost.sum(:old_accounts_id, :joins => :user, :group => 'blog_posts.id').first

  SQL (3.8ms) SELECT SUM(old_accounts_id) AS sum_old_accounts_id, blog_posts.id AS blog_posts_id FROM "blog_posts" INNER JOIN "users" ON "users"."id" = "blog_posts"."user_id" GROUP BY blog_posts.id

=> [5603, "0"]

ruby-1.8.7-p330 :024 > User.sum(:old_accounts_id, :joins => :blog_posts, :group => 'blog_posts.id').first

  SQL (5.0ms) SELECT SUM("users"."old_accounts_id") AS sum_old_accounts_id, blog_posts.id AS blog_posts_id FROM "users" INNER JOIN "blog_posts" ON "blog_posts"."user_id" = "users"."id" GROUP BY blog_posts.id

=> [5603, 0]

Note that by switching the query around so I run the query on the model I'm summing and join the other one I get back integers.

The only difference that I can think of is that in the former AR knows about 'BlogPost' and 'sum_old_accounts_id' and even with it's magic can't relate the two. In the latter (i'm guessing, haven't checked the code) that it knows 'User' and 'sum_old_accounts_id'. It knows that it called 'sum' to turns it into 'old_accounts_id' which it knows is a field of User so type casts it.

Maybe. Would be interesting to dig into the source and see....

But perhaps you can flip your query around the other way and get it to work....

-philip

You've pretty much hit it on the head - digging into the 2.3.10 source
(what I've got handy) shows the code calling column_for on the given
name, which will *only* return a column object if it's a field on the
model you called .sum on. That column is then used to cast the result
to the correct type. [lines 241-312 in calculations.rb, 2.3.10
version]

The SQlite3 issue is a red herring - there's a whole different typing
mechanism in play there:
http://www.sqlite.org/datatype3.html

It might be a worthwhile patch to try looking up the column in the
*right* table, but that's only going to be obvious when the name is
qualified (:'users.old_account_id') instead of just :old_account_id
(allowed by SQL, if it's not ambiguous).

--Matt Jones

It turns out Philip and I were drafting our posts at the same time,
which is why they might sound redundant.

Matt, I'm using Rail 3.0.3 but was looking in the same file as you.
Perhaps the correct return type could be inferred if the aliased
column name was constructed to include the type, in addition to the
column and function names. In Philip's example it would change from:
  sum_old_accounts_id
...to...
  sum_old_accounts_id_integer

That seems pretty deterministic, yes?

Wait, that can't really happen where I thought it could.

It seems like sum etc. should optionally take a hash for the first
parameter, as in table => column:
BlogPost.sum(:user => :old_accounts_id, :group =>
'blog_posts.id').first

The resultant SQL would start out with:
SELECT SUM(users.old_accounts_id) AS sum_old_accounts_id...

This syntax would allow us to assume :joins => :user, unless :includes
=> :user was added.

The table and column would be available to get the correct type.

There is the possibility of identically named columns in two or more
tables, right? So, without something like this, AR is just hoping the
database will figure out which column to sum anyway.

Wait, that can't really happen where I thought it could.

It seems like sum etc. should optionally take a hash for the first
parameter, as in table => column:
BlogPost.sum(:user => :old_accounts_id, :group =>
'blog_posts.id').first

The resultant SQL would start out with:
SELECT SUM(users.old_accounts_id) AS sum_old_accounts_id...

This syntax would allow us to assume :joins => :user, unless :includes
=> :user was added.

The table and column would be available to get the correct type.

There is the possibility of identically named columns in two or more
tables, right? So, without something like this, AR is just hoping the
database will figure out which column to sum anyway.

That's my take yeah. After the call Rails could look at all the tables involved and if it finds one and only one match, cast it. If it finds multiple, don't touch it, but I doubt that will ever happen as the database should have already freaked out about that case...