Feature idea: ActiveRecord::Relation.size always returns an integer

ActiveRecord::Relation.size returns a Hash when I use a group method.

I think that a size method is expected to return integer typically.

I propose this implementation:

def size
  if loaded?
    @records.length
  else
    count_value = count(:all)
    count_value.is_a?(Hash) ? count_value.length : count_value
  end
end

``

This is a current implementation:

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation.rb

# Returns size of the records.
def size
  loaded? ? @records.length : count(:all)
end

``

A count method is implemented on calculations.rb.

https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation/calculations.rb

What do you think?

Thank you

size in this case is an pretty much an “alias” for count, thus it’s gonna work the same way as count when grouping.

Is there any particular reason to know how many records returned if you are grouping?

Pagination. Comes up all the time for us - we have a query which paginates through table A, and we want to also grab the total of table B records owned by each A, so we have a joins().group().select(), maybe even a having(). Then count() comes along and runs our party by returning a hash instead of using a subquery to count the number of results that would returned by the original scope.

Two separate things have been conflated for historical reasons - counting results which may use grouping themselves, and getting grouped counts.

So I don’t think we should fix #size, I think we should fix #count. IMHO there should be a #grouped_count or something to get the current behavior, and #count should always find the count of things in the relation.

size in this case is an pretty much an “alias” for count, thus it’s gonna work the same way as count when grouping.

I think that current #size don’t behave like an alias for #count.

g = User.group(:role); g.load; g.count
=> {:normal=>3, :admin=>4}

g = User.group(:role); g.load; g.size
=> 2

``

When I call #load before #count or #size, a return value of #count don’t equals to a return value of #size.

Is there any particular reason to know how many records returned if you are grouping?

I expect #count and #size to return same value.

It is not the case that I want to know how many records the grouping query returns.

My ideal implementation:

g = User.group(:role); g.load; g.count
=> {:normal=>3, :admin=>4}

g = User.group(:role); g.load; g.size
=>{:normal=>3, :admin=>4}

``

or

g = User.group(:role); g.load; g.count
=> 2

g = User.group(:role); g.load; g.size
=>2

``

What do you think?

Thank you

2014年9月12日金曜日 8時21分39秒 UTC+9 Carlos Antonio da Silva:

I’m against to count and size works equally since I would want to know how many grouped records do I have (i.e.: how many roles do I have on determining relation).

I think that count and size is working as both should.

Although, if you want to count all records of a grouped relation, you have to remove the group or just sum the counts:

User.count

or:

User.group(:role).count.values.sum

Not elegant but if you can’t remove the group for that count, you have to do that.

One possible option would rails have something like regroup(false) as reorder(false), but again I think that just adding the values is more easier for everyone :slight_smile:

That’s why I said that In this specific case it was like an “alias”, not exactly an alias. Count will always query the DB anyway, but let’s hear more thoughts on this one.

Carlos Antonio da Silva - via celular

Calling .values.sum might have to load a LOT of (id, count) rows so it isn’t a very scalable solution.

Calling regroup or something is actually changing what would be counted - the group can affect the count, especially if you use HAVING clauses.

There’s nothing wrong with having the grouped count functionality, but it’s messing up the #count API IMHO.

I’m not sure if you get what .values.sum would do:

c = User.group(:role).count
=> {:normal=>3, :admin=>4}

c.values.sum

=> 7

Even a hash with thousands of keys/values would sum its values very fast.

And I don’t get what you mean with “is actually changing what would be counted”.

Considering c = User.group(:role).count, c.values.sum would return the exactly number of records that would be returned and c.keys.sum would return exactly number of group param that would be returned, no?

Yes, but it will have to load all the rows from the database into ruby. If we have a table with a few million of rows, which is pretty small for a real app, and we’re doing a join and a group to get the count of some associated record, then calling .values.sum would load all those millions of rows, and then add up the values.

Returning the millions of rows (even if it was just the IDs and counts) is many orders of magnitude slower than getting the database to tell us how many rows would be in the rowset, and it’s going to use a lot of memory too.

Yes, if your app only has thousands of rows, then it won’t matter. But this is not a good argument for a database library to be inefficient.

The point is that count should always return a count of the results in a relation, irrespective of what is involved in the relation. That’s the point of relational algebra; you can combine two operations and you get the logical composite.

Returning the count of each grouping is a logically separate thing to do, and it should have a separate API call.

That’s the point, relation.count will always perform a COUNT on database.

I think there is a misunderstanding going on here.

relation.count will always perform a count on database, it won’t load every row in our ruby process:

(main) normal> u = User.group(:profile_id); u.count

(0.4ms) SELECT COUNT(*) AS count_all, profile_id AS profile_id FROM “users” GROUP BY profile_id

=> {1=>4, 2=>1}

(main) normal> u.count

(0.4ms) SELECT COUNT(*) AS count_all, profile_id AS profile_id FROM “users” GROUP BY profile_id

=> {1=>4, 2=>1}

(main) normal> u.count.values.sum

(0.3ms) SELECT COUNT(*) AS count_all, profile_id AS profile_id FROM “users” GROUP BY profile_id

=> 5

Performing u.count.values.sum will also perform a grouped count and just sum the quantity of rows for every key.

I realise that that is the main use case for grouped count, but I’m not sure that you read through the scenario I described. If we are using joins + group to add data from another table to every row in the primary table, and not grouping any further (ie. group by primary_table.id) then the number of rows in the resultset is still the number of rows in the primary table.

This is not what you want to do when you’re just trying to find out how many rows will be in the table join so you can paginate over them (for example).

Again, both use cases for grouping are valid, but they are completely separate, and #count should not do both, as that means you can’t assume that you get a useful result out of calling count on an arbitrary relation.