Implement #sample on ActiveRecord::Relation to get a random record via database, not ruby

Hi, fellows!

Sometimes we need to get a random record from some table. The common way to get it is to use Array#sample method on a relation. Or, if your table is quiet large, you can use one of the ways your DB provides to do such thing. It could be MyModel.order(‘RAND()’).limit(1)

``

if you use MySQL, or

MyModel.order(‘RANDOM()’).limit(1)

``

if you use Postgresql.

This way is much more fast than using #sample method, even if your table is less then 1000 rows. But it has two downsides.

First, this code is now database-dependent. So if your code will some day migrate on an another DB, you will have to change it.

Second, it is not obvious for some rails novices, that doing MyModel.all.sample you actually use Array#sample method, that loads all relation collection in memory. If rails tend to be easy-to-start and clear for developer, it should not act this way.

So, my suggestion is to implement #sample method somewhere in ActiveRecord so that it will generate database query to get random row. There are functions to do it in all common DB distributions (MySQL, Postgresql, MS SQL, IBM DB2, Oracle). The usage could look like this:

MyModel.sample # returns one random record MyModel.sample(10) # returns 10 random records MyModel.all.sample # works identical to the first two examples; and this is not the Array#sample method. MyModel.some_scope.sample # returns one random record from a scope

``

and so on.

I think the performance benefit from such enhancement is quite obvious, but anyway, here is a benchmark for a Postgres table with 180 000 rows. 2.1.5 (main):0 > Benchmark.bm do |x|

2.1.5 (main):0 * x.report(‘sample’) { LessonVersion.all.sample }

2.1.5 (main):0 * x.report(‘RANDOM()’) { LessonVersion.order(‘RANDOM()’).limit(1) }

2.1.5 (main):0 * end

user system total real

sample 9.810000 8.690000 18.500000 ( 29.282437)

RANDOM() 0.000000 0.000000 0.000000 (0.000187)

=> [

[0] #<Benchmark::Tms:0x007fe7c699c568 @label=“sample”, @real=29.282437, @cstime=0.0, @cutime=0.0, @stime=8.69, @utime=9.81, @total=18.5>,

[1] #<Benchmark::Tms:0x007fe7c69a4420 @label=“RANDOM()”, @real=0.000187, @cstime=0.0, @cutime=0.0, @stime=0.0, @utime=0.0, @total=0.0>

]

``

I’ve already set up rails-dev-box on my laptop and cloned rails repository, but there is a CONTRIBUTING.md file, that recommends to get positive feedback before writing code. So, here is my proposal. I think I can write it myself, but ofcourse implementation suggestions are very welcome. Especially since I’m not very comfortible with ActiveRecord sources yet.

ORDER BY random() is actually not very performant at the DB level unless you have a small number of rows. This is because the database (at the very least Postgres) must implement this with a full table scan, since there is no other way to determine what all physical rows on disk are visible to the query.

This is such a big problem that the SQL standard itself provides a special way (rather than ordering by random) to sample tables. Postgres added support for some of these features to 9.5. See http://blog.2ndquadrant.com/tablesample-in-postgresql-9-5-2/

I’m not personally a fan of trying to abstract every database feature into AR. I don’t think it’s at all plausible that any decent-sized application that’s doing anything interesting in the database will really be able to use a drop-in-replacement database kind of mentality.

ORDER BY random() is actually not very performant at the DB level

It is, but it is definitely much more performant than load a whole query result into a ruby array.

Dear sirs contributors,

Was the answer “no, you shouldn’t write this code”, or I should wait a little for somebody with more than 8 commits to answer my proposal?