Pagination. This comes up all the time for us: we have a query that paginates through table A, and we also want to grab the total number of table B records owned by each A, so we have a joins().group().select(), maybe even a having(). Then count() comes along and ruins our party by returning a hash instead of using a subquery to count the number of results that would be returned by the original scope.
Two separate things have been conflated for historical reasons - counting results which may use grouping themselves, and getting grouped counts.
So I don’t think we should fix #size, I think we should fix #count. IMHO there should be a #grouped_count or something to get the current behavior, and #count should always find the count of things in the relation.
I’m against making count and size work the same, since I’d want to know how many grouped records I have (i.e. how many roles I have in a given relation).
I think count and size are both working as they should.
Although, if you want to count all records of a grouped relation, you have to remove the group or just sum the counts:
User.count
or:
User.group(:role).count.values.sum
Not elegant, but if you can’t remove the group for that count, that’s what you have to do.
One possible option would be for Rails to have something like regroup(false), analogous to reorder(false), but again I think just summing the values is easier for everyone.
That’s why I said that in this specific case it was like an “alias”, not exactly an alias. Count will always query the DB anyway, but let’s hear more thoughts on this one.
I’m not sure if you get what .values.sum would do:
c = User.group(:role).count
=> {:normal=>3, :admin=>4}
c.values.sum
=> 7
Even a hash with thousands of keys/values would sum its values very fast.
And I don’t get what you mean by “is actually changing what would be counted”.
Considering c = User.group(:role).count, c.values.sum would return exactly the number of records that would be returned, and c.keys.count would return exactly the number of groups, no?
Yes, but it will have to load all the rows from the database into Ruby. If we have a table with a few million rows, which is pretty small for a real app, and we’re doing a join and a group to get the count of some associated record, then calling .values.sum would load all those millions of rows and then add up the values.
Returning the millions of rows (even if it’s just the IDs and counts) is many orders of magnitude slower than having the database tell us how many rows would be in the result set, and it’s going to use a lot of memory too.
Yes, if your app only has thousands of rows, then it won’t matter. But this is not a good argument for a database library to be inefficient.
The point is that count should always return a count of the results in a relation, irrespective of what is involved in the relation. That’s the point of relational algebra; you can combine two operations and you get the logical composite.
Returning the count of each grouping is a logically separate thing to do, and it should have a separate API call.
I realise that that is the main use case for grouped counts, but I’m not sure you read through the scenario I described. If we are using joins + group to add data from another table to every row in the primary table, and not grouping any further (i.e. grouping by primary_table.id), then the number of rows in the result set is still the number of rows in the primary table.
This is not what you want to do when you’re just trying to find out how many rows will be in the table join so you can paginate over them (for example).
Again, both use cases for grouping are valid, but they are completely separate, and #count should not do both, as that means you can’t assume that you get a useful result out of calling count on an arbitrary relation.